Teradata Columnar

Teradata Confidential — Copyright © 2011-2012 Teradata Corp. — All Rights Reserved

Teradata Columnar

By: Paul Sinclair and Carrie Ballinger Date: January 16, 2012

Doc: 541-0009036A02

Release: Teradata 14.0

Abstract: Teradata Columnar is introduced in Teradata 14.0 and consists of column partitioning, a choice of columnar or row storage for a column partition, autocompression for columnar storage, and other supporting capabilities. This orange book provides usage considerations, examples, recommendations, and technical details for Teradata Columnar.

This document is intended for use by Teradata Customers Only

In no case will you cause this document or its contents to be disseminated to any third party, reproduced or copied by any means (in whole or in part) without Teradata's prior written consent. Please read the detailed copyright notice on the following page.

541-0009036A02 Teradata Columnar

Teradata Confidential — Copyright © 2011-2012 Teradata Corp. — All Rights Reserved 541-0009036 A02

TERADATA CONFIDENTIAL

Copyright © 2011-2012 by Teradata Corporation.

All Rights Reserved.

This document, which includes the information contained herein,: (i) is the exclusive property of Teradata Corporation; (ii) constitutes Teradata confidential information; (iii) may not be disclosed by you to third parties; (iv) may only be used by you for the exclusive purpose of facilitating your internal Teradata-authorized use of the Teradata product(s) described in this document to the extent that you have separately acquired a written license from Teradata for such product(s); and (v) is provided to you solely on an "AS-IS" basis. In no case will you cause this document or its contents to be disseminated to any third party, reproduced or copied by any means (in whole or in part) without Teradata's prior written consent. Any copy of this document, or portion thereof, must include this notice, and all other restrictive legends appearing in this document. Note that any product, process or technology described in this document may be the subject of other intellectual property rights reserved by Teradata and are not licensed hereunder. No license rights will be implied. Use, duplication, or disclosure by the United States government is subject to the restrictions set forth in DFARS 252.227-7013(c)(1)(ii) and FAR 52.227-19. Other brand and product names used herein are for identification purposes only and may be trademarks of their respective companies.

Revision History

Revision/Version

Author(s) Date Comments

A01 Paul Sinclair Carrie Ballinger

07/29/11 Initial review version.

A02 Paul Sinclair Carrie Ballinger

01/16/12 Updated in response to A01 review comments.

Teradata Columnar 541-0009036A02

541-0009036 A02 Teradata Confidential — Copyright © 2011-2012 Teradata Corp. — All Rights Reserved I

TABLE OF CONTENTS

Chapter 1: Introduction ............................................................................................ 1 1.1 Audience ............................................................................................................. 1 1.2 Additional Information ......................................................................................... 1

Chapter 2: Why Teradata Columnar? ...................................................................... 2 2.1 Column vs. Row Partitioning ............................................................................... 2 2.2 Efficiencies of Column Partitioning ...................................................................... 4 2.3 Telcom Use Case ................................................................................................ 6

Chapter 3: Storing Column-Partitioned Data on Disk ............................................ 8 3.1 How Column Partitions are Formatted ................................................................ 8 3.1.1 COLUMN Format .......................................................................................................... 8 3.1.2 ROW Format ................................................................................................................. 9 3.1.3 COLUMN and ROW Formats ..................................................................................... 10 3.2 Columns and Column Partition Numbers .......................................................... 10 3.3 Rowids for NoPI Tables .................................................................................... 10 3.4 Applying NoPI Rowid Conventions to a Column-Partitioned Table ................... 11

Chapter 4: Autocompression ................................................................................. 13

Chapter 5: Reading from Column-Partitioned Tables .......................................... 15 5.1 Scanning Column-Partitioned Data ................................................................... 15 5.2 Bringing Together Columns from the Same Logical Row ................................. 17 5.3 Reading Column-Partitioned Data using a Secondary Index ............................ 18 5.4 Joins with a Column-Partitioned Table .............................................................. 19 5.5 Row Hash Locking ............................................................................................ 19

Chapter 6: Loading and Maintenance Operations ............................................... 20 6.1 INSERT-SELECT .............................................................................................. 20 6.2 Deleting Rows ................................................................................................... 21 6.3 The Delete Column Partition ............................................................................. 22 6.4 Updating Rows .................................................................................................. 22

Chapter 7: Row Partitioning with Column Partitioning ........................................ 23 7.1 Determining the Column-Partitioning Level ....................................................... 23 7.2 Avoiding Over Partitioning a Table .................................................................... 24

Chapter 8: EXPLAIN Terminology ......................................................................... 25 8.1 Small SELECT from a Column-Partitioned Table ............................................. 25 8.2 Large Selection from a Column-Partitioned Table ............................................ 26 8.3 Select All Rows from a Column/Row-Partitioned Table .................................... 27

Chapter 9: Guidelines for Use ................................................................................ 29 9.1 General Considerations .................................................................................... 29 9.2 CP and NoPI Common Considerations ............................................................. 31 9.3 Differences between CP and NoPI Tables ........................................................ 32 9.4 Space Usage Considerations ............................................................................ 32

Chapter 10: Performance Considerations .............................................................. 33

Chapter 11: Tuning Opportunities ........................................................................... 36



11.1 Autocompression On or Off .............................................................................. 37 11.2 COLUMN vs. ROW Format ............................................................................... 38 11.2.1 COLUMN Format ........................................................................................................ 38 11.2.2 ROW Format ............................................................................................................... 39 11.3 Grouping Columns into a Column Partition ....................................................... 40 11.4 PPICacheThrP .................................................................................................. 41 11.5 DATABLOCKSIZE/PermDBSize ....................................................................... 42 11.6 FREESPACE/FreeSpacePercent ..................................................................... 42

Chapter 12: Final Thoughts ..................................................................................... 43

Appendix A: Comparative Performance Tests ........................................................ 44 A.1 Size Comparisons ............................................................................................. 44 A.2 Full-Table Scan Comparison ............................................................................ 45 A.3 Simple Aggregation Comparison ...................................................................... 46 A.4 Rollup Query Comparison ................................................................................. 47 A.5 Join Comparison ............................................................................................... 49 A.6 I/O Intensive Request ....................................................................................... 51 A.7 INSERT-SELECT Comparisons ....................................................................... 52 A.8 Conclusions ...................................................................................................... 53

Appendix B: Frequently Ask Questions ................................................................... 54 B.1 What is Teradata Columnar? ............................................................................ 54 B.2 Is Teradata Columnar enabled by default? ....................................................... 55 B.3 Does enabling Teradata Columnar mean all tables will be columnar? ............. 55 B.4 Can I alter an existing table to be a CP table? .................................................. 55 B.5 Should I change all my tables to be column partitioned? .................................. 55 B.6 Why use a CP table or join index? .................................................................... 55 B.7 Why is there an increase in CPU usage? ......................................................... 55 B.8 Why no primary index (NoPI)? .......................................................................... 56 B.9 Can a CP table have a PPI or MLPPI? ............................................................. 56 B.10 Can NoPI table have a PPI or MLPPI? ............................................................. 57 B.11 Can a CP table be temporal? ............................................................................ 57 B.12 Can a GLOBAL TEMPORARY or VOLATILE table be column partitioned? ..... 57 B.13 Why is dictionary autocompression local to a container? ................................. 57 B.14 Are rows inserted using round robin distribution to the AMPs? ........................ 58 B.15 Are CP tables usable when data is highly volatile? .......................................... 58 B.16 Why is space not reclaimed for a DELETE? ..................................................... 58 B.17 Why isn’t FastLoad supported for a CP table? ................................................. 59 B.18 Why isn’t MultiLoad supported for a CP table? ................................................. 59 B.19 Why isn’t the upsert form of UPDATE supported for a CP table? ..................... 59 B.20 Why isn’t MERGE supported for a CP table? ................................................... 59 B.21 Why may data skew occur for restore and copy of a CP table? ....................... 59 B.22 Why does data skew occur for Reconfig? ......................................................... 59 B.23 Why may data skew occur for INSERT-SELECT into a CP table? ................... 59 B.24 Why may data skew occur for Down-AMP recovery? ....................................... 59

Appendix C: DDL Details ........................................................................................... 60 C.1 Column Partitioning Syntax ............................................................................... 60 C.2 Column Partitioning ........................................................................................... 62 C.3 CREATE TABLE Statement .............................................................................. 63 C.4 CREATE JOIN INDEX Statement ..................................................................... 68 C.5 Replication ........................................................................................................ 69


541-0009036 A02 Teradata Confidential — Copyright © 2011-2012 Teradata Corp. — All Rights Reserved I

C.6 ALTER TABLE Statement ................................................................................. 70 C.6.1 Adding Columns to a Table ......................................................................................... 70 C.6.2 Dropping Columns from a Column-Partitioned Table ................................................. 70 C.6.3 RI Error Table .............................................................................................................. 71 C.6.4 REVALIDATE .............................................................................................................. 71 C.6.5 MODIFY Primary Index and/or Partitioning ................................................................ 72 C.6.6 ALTER TABLE ... TO CURRENT Statement .............................................................. 72 C.6.7 ALTER TABLE Examples ........................................................................................... 72 C.7 COLLECT/DROP/HELP STATISTICS Statements ........................................... 78

Appendix D: Loading a Column-Partitioned Table .................................................. 79 D.1 Load Utilities ...................................................................................................... 81 D.2 TPump Array INSERT into a CP Table ............................................................. 81 D.2.1 SERIALIZE .................................................................................................................. 83 D.2.2 TPump Sessions ......................................................................................................... 83

Appendix E: EXPLAIN Phrases and Examples ........................................................ 84 E.1 EXPLAIN Phrases ............................................................................................. 84 E.2 EXPLAIN Examples .......................................................................................... 91

Appendix F: Miscellaneous Topics ........................................................................ 100 F.1 Column-Partitioned Table as a Source Table ................................................. 100 F.2 Archive and Restore ........................................................................................ 100 F.3 CheckTable ..................................................................................................... 100 F.4 Data Skewing .................................................................................................. 101 F.4.1 Restore/Copy ............................................................................................................ 101 F.4.2 Reconfig .................................................................................................................... 102 F.4.3 INSERT-SELECT (CP/NoPI Target Table) ............................................................... 102 F.4.4 Down-AMP Recovery ................................................................................................ 103

Appendix G: Partitioning Meta Data ....................................................................... 105 G.1 System-Derived Column PARTITION[#Ln] ..................................................... 105 G.2 HELP COLUMN Statement ............................................................................. 105 G.3 SHOW TABLE, JOIN INDEX, and DML Statements ....................................... 106 G.4 HELP INDEX Statement ................................................................................. 107 G.5 DBC.TVM System Table ................................................................................. 107 G.6 DBC.TablesV[X] System View ........................................................................ 107 G.7 DBC.TVFields System Table .......................................................................... 107 G.8 DBC.ColumnsV[X] System View ..................................................................... 108 G.9 DBC.TableConstraints System Table ............................................................. 108 G.10 DBC.PartitioningConstraintsV[X] System Views ............................................. 110 G.11 DBC.DBQLStepTbl ......................................................................................... 111 G.12 DBC.QryLogStepsV System View .................................................................. 112 G.13 Query Capture Database ................................................................................ 112 G.14 XML Plan ......................................................................................................... 112

Appendix H: System Settings ................................................................................. 113 H.1 DBS Control Fields .......................................................................................... 113 H.1.1 PPICacheThrP .......................................................................................................... 113 H.1.2 PrimaryIndexDefault ................................................................................................. 113 H.2 Cost Profile Constants .................................................................................... 113 H.2.1 PartitioningConstraintForm ....................................................................................... 114 H.2.2 PPICacheThrP .......................................................................................................... 114

Glossary ............................................................................................................. 115



Table of Figures

Figure 1: Row Partitioning by Date .............................................................................................. 2 Figure 2: Column Partitioning ...................................................................................................... 3 Figure 3: Column and Row Partitioning Combined ...................................................................... 4 Figure 4: Comparison of Physical Database Design Choices ..................................................... 5 Figure 5: Using Containers for Column Partitions ........................................................................ 8 Figure 6: Using Subrows for Column Partitions ........................................................................... 9


541-0009036 A02 Teradata Confidential — Copyright © 2011-2012 Teradata Corp. — All Rights Reserved 1

Chapter 1: Introduction

Teradata 14.0 introduces Teradata Columnar – an option for organizing the data of a user-defined table or join index on disk.

Teradata Columnar offers the ability to partition a table or join index by column. It includes column-storage as an alternative choice to row-storage for a column partition and autocompression. Column partitioning can be used alone in a single-level partitioning definition or with row partitioning in a multilevel partitioning definition.

Teradata Columnar is a new paradigm for partitioning, storing data, and compression that changes the cost-benefit tradeoffs of the available physical database design choices and their combinations. Teradata Columnar provides a benefit to the user by reducing I/O for certain classes of queries while at the same time decreasing space usage.

A column-partitioned (CP) table or join index has several key characteristics which are explained in further detail in this orange book:

1. It does not have a primary index.1

2. Each of its column partitions can be composed of a single column or multiple columns.

3. Each column partition usually contains multiple physical rows.

4. A new physical row format COLUMN may be utilized for a column partition; such a physical row is called a container. This is used to implement column-storage, row header compression, and autocompression for a column partition.

5. Alternatively, a column partition may have physical rows with ROW format that are used to implement row-storage; such a physical row is called a subrow.

This orange book describes how Teradata Columnar works within the Teradata Database. Additionally, Chapter 9: Guidelines for Use and Chapter 10: Performance Considerations provide must read guidance on the use of this physical database design feature.

1.1 Audience

This book is targeted to experienced Teradata Database administrators and those with a background in physical database design concepts. However, the content is designed to be readily understood by any reader with a reasonable background in database technologies.

1.2 Additional Information

To get the most from this book, it is recommended that the reader be familiar with row partitioning and multilevel partitioning as described in the Orange Book: Partitioned Primary Index Usage (Single-Level and Multilevel Partitioning). For more information about 8-byte partitioning, see the Orange Book: Increased Partition Limit and other Partitioning Enhancements. A familiarity with no primary index (NoPI) tables is also recommended; see the Orange Book: No Primary Index Table User’s Guide.

For terms that may be unfamiliar, refer to the glossary on page 115.

1 See section B.8 for why a primary index is not allowed in Teradata 14.0 for a column-partitioned (CP)

table.


2 Teradata Confidential — Copyright © 2011-2012 Teradata Corp. — All Rights Reserved 541-0009036 A02

Chapter 2: Why Teradata Columnar?

When becoming acquainted with the concepts of Teradata Columnar, and specifically column partitioning, it may be helpful to consider how column partitioning is similar to, yet different from, row partitioning. Row partitioning is defined by a partitioning expression in a PARTITION BY clause and is the partitioning used with a partitioned primary index (PPI).

2.1 Column vs. Row Partitioning

Row partitioning allows you to partition the data of a table horizontally. Each row partition clusters together a subset of the table’s rows that are assigned to an AMP, for example one day of transaction data. If you have a query that specifies a value or a range of values for the partitioning column, fewer rows need to be accessed compared to doing a full-table scan.

With a PPI table, each table row corresponds to a physical row.2 The only thing that is different from a non-PPI table is that the rowid for each physical row in the table carries a nonzero internal partition number. A given row for both PPI and nonpartitioned primary-indexed (PI) tables is physically stored on an AMP’s disks first by internal partition number (which is always zero for a nonpartitioned PI table), then by row hash, and finally by uniqueness (that is, in rowid order). With row partitioning, the administrator can define one or several partitioning levels and, for each level, define a partitioning expression which is used to compute the partition number for a row. For example,

CREATE TABLE Sales_PPI ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PRIMARY INDEX (TxnNo), PARTITION BY RANGE_N(TxnDate BETWEEN DATE '2011-01-01' AND DATE '2011-12-31' EACH INTERVAL '1' DAY);

A simple example of the data for the above PPI table is shown in the following Figure 1. In this example, the table is partitioned by row on TxnDate.

QuantityItemNoTxnDateTxnNo

281505-30-2011530

111005-30-2011450

143705-30-2011290

312405-29-2011100

175605-29-2011100


281505-30-2011530

111005-30-2011450

143705-30-2011290

312405-29-2011100

175605-29-2011100 One row partition

One row partition

Figure 1: Row Partitioning by Date

With column partitioning, each column or groups of columns in the table becomes a partition containing the column partition values of that column partition. This is a simpler partitioning approach since there is no need to define partitioning expressions. In addition, only one of the partitioning levels may be column partitioned, and determining partition elimination is very simple.

2 See glossary on page 115 for the definitions of physical row and table row.



For example,

CREATE TABLE Sales_CP ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PARTITION BY COLUMN;

This creates a column-partitioned (CP) table that partitions the data of the table vertically. Note that a primary index is not specified so this is NoPI table. Moreover, a primary index must not be specified if the table is column partitioned.

The following Figure 2 illustrates some sample data for the above CP table with each column in its own column partition (so a column partition value is just a value of that column).


281505-30-2011530

111005-30-2011450

143705-30-2011290

312405-29-2011100

175605-29-2011100


281505-30-2011530

111005-30-2011450

143705-30-2011290

312405-29-2011100

175605-29-2011100

One column partition




Figure 2: Column Partitioning

On an AMP, the column data for a column partition is clustered together instead of the rows.

To better understand the differences between row and column partitioning, consider an architectural example where there are two different, contrasting approaches to constructing living spaces.

Partitioning by family—separate dwellings

This is similar to row partitioning. Each unit contains all the necessary components of household living—a living room, a bedroom, a bathroom, a kitchen, a laundry room. Each house, or bundled set of these different components, is located physically separate from any other house. Once you are in the house, you can easily move from kitchen to living room, to bedroom.

Partitioning by function—dormitories

This is like column partitioning. Various rooms that people live in are grouped together. Bedrooms are congregated in one section of the structure. There is a separate shared dining area and a large group laundry area in the basement. When you enter a dormitory you can either enter the sleeping area, the eating area, or the shared living area, but moving between the different functional areas is more of an effort.

As with database design, there are tradeoffs in selecting the right living space for you. Dormitories have economy of scale advantages, particularly in the area of energy conservation, landscaping, mail delivery, more efficient use of space, and cost to live there. But separate dwellings offer more privacy, more control over your environment, ease of moving between functional areas, and the ability to customize your surroundings.



2.2 Efficiencies of Column Partitioning

The key benefit in defining row partitioning for a table is when queries access a subset of rows based on constraints on the one or more partitioning columns. The major advantage of using column partitioning for a table is to improve the performance of queries that access a subset of the columns from a table either for predicates or projections. Because sets of one or more columns can be stored in separate column partitions, only the column partitions that contain the columns referenced by the query need to be accessed.

The advantages of both can be combined3 further reducing I/O. Fewer data blocks need to be read since more data of interest is packed together into data blocks. For example,

CREATE TABLE Sales_CPRP ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PARTITION BY ( COLUMN, RANGE_N(TxnDate BETWEEN DATE '2011-01-01' AND DATE '2011-12-31' EACH INTERVAL '1' DAY) );

Looking at the following Figure 3, assume that you want a list of all the items sold on May 29, 2011. With both column and row partitioning defined on the table, the query only needs to access column partitions containing items that are associated with the date specified.


281505-30-2011530

111005-30-2011450

143705-30-2011290

312405-29-2011100

175605-29-2011100


281505-30-2011530

111005-30-2011450

143705-30-2011290

312405-29-2011100

175605-29-2011100 One row partition

One row partition





Figure 3: Column and Row Partitioning Combined

Another way to look at the advantages of partitioning is to contrast the data that is accessed when different types of partitioning are defined. Consider this table definition with various physical database designs (shown in Figure 4 on the next page):

CREATE TABLE mytable (A INT, B INT, C CHAR(100),D INT, E INT, F INT, G INT, H INT, I INT, J INT, K INT, L INT);

and the following query based on the above table:

SELECT SUM(F) FROM mytable WHERE B BETWEEN 4 AND 7;

Only columns F and B are referenced by the query even though the table has 12 columns. The different examples in Figure 4 on the next page illustrate the data that has to be accessed by this query when there is no partitioning, when there is row partitioning on column B, when there is column partitioning, and when there are both column and row partitioning.

3 A fuller discussion of multilevel partitioning that combines column and row partitioning takes place in

Chapter 7: Row Partitioning with Column Partitioning.



A

1

2

B

5

9

13

4

C

a 3

D

q

d

8 m

5

1

7

E

9

4

F

9

6

31

3

G

4 6

H

3

3

9 4

8

4

1

1

4

3

5

2

9

66

7

5 1

2

9

0

7

3

5

6

7

3

6

2

48

f

r 1

2

e

u

0

9

I

2

5

J

7

1

87

4

K

4 5

L

1

2

2 8

2

9

6

2

8

5

4

2

1

10

7

8 3

3

6

2

4

7

6 3 3 89 2 d 3 7 5 1 2

Column names

Column values

Column names

Column values

No Partitioning Row Partitioning

A B C D E F G H I J K

1 5 a 3 9 9 4 6 2 7 4 5

L

4 2 5 16 6 r 1 8 2 8 3

66 0 348 u 9 10 2 7

Partitioning column

Primary Index column

A B

5

9

1

C D

8

E F

9

G H

3

6

2

4

I J K L

2

1

2

Column names

Column values

NoPI with Column Partitioning

A B C D E F G H I J K L

1

9

2

Column names

Column values

NoPI with Column and Row Partitioning

5

6

4

1 2

3 4

Figure 4: Comparison of Physical Database Design Choices

Based on the Figure 4 above, Example 1 shows that if your table had no partitioning of any kind, all the data is accessed. Example 2 shows that, by using row partition elimination, only 3 rows are accessed, but with all the columns in those 3 rows included. Example 3 shows column partitioning results in accessing all values in the predicate column (column B) but only the values in column F (in the select list) that correspond to the table rows that met the predicate’s criteria. Example 4 shows what happens when you combine both column and row partitioning (a CP/RP table) with a further reduction in I/O.

If the table is populated with 4 million rows of generated data, the query reads about 9,987 data blocks for the PI or NoPI; about 4,529 data blocks for the PPI table; about 281 data blocks for the CP table; and about 171 data blocks for the CP/RP table.4 The decreased I/O comes with higher CPU usage for this example. Since I/O is often relatively expensive compared to CPU (and CPU is getting faster at a much higher rate than I/O), this can be a reasonable tradeoff in many cases.

Double click on the following left icon to see an animated comparison of accessing a PI, NoPI, CP, and CP/RP table (while viewing, single-click when l is displayed in the lower right corner). The right icon contains slide versions of the figures in this orange book.

Columnar Animiated

Columnar Figures

4 These results are specific to this example. Other cases may have different results.



2.3 Telcom Use Case

A common table in a database for Telcom applications is a call detail record (CDR) table. A row is inserted into the table for each call made. A row in the CDR table includes the call time, call duration, caller’s number, callee’s number, and other attributes of a call.

The following lists common characteristics of a typical CDR table with the advantages (indicated by a +) and disadvantages (indicated by a –) for a column-partitioned CDR table. For a disadvantage, tradeoffs may need to be made or additional tuning may be needed to provide acceptable performance. Why column partitioning is advantageous or not for a characteristic will become clearer as the following chapters are read (after reading the chapters, you may want to revisit this use case). Chapters and sections which provide more information are referenced. Chapters 9 (for guidelines), 10 (for performance considerations), and 11 (for tuning opportunities) are applicable to all these characteristics.

Characteristic Reference An hourly INSERT-SELECT of millions of rows

+ This is the recommended method for loading data into a CP table. Section 6.1 Appendix D

Inserts complete within the maintenance windows

– Longer insert times may occur due to converting rows to columns and applying autocompression.

+ Inserts still complete within the maintenance window with an acceptable impact on the system workload.

Section 6.1 Section A.7Appendix D

Table is large – 3 years of history, 100’s of billions rows

+ Space usage is reduced using autocompression that is enhanced with user-specified MVC and ALC. Also, temperature based block compression (BLC) for automatic compression of data blocks for column partitions that are rarely accessed is used.

Chapter 4

Table is row partitioned by call DATE (by day) or TIMESTAMP (by hour)

+ Row partitioning is supported with column partitioning.

+ Row partition elimination reduces I/O.

Section 2.1 Chapter 7

No or a few updates

– Updates are expensive (handled as a delete of the old row and an insert of the updated row) – space for a deleted old row is not reclaimed for the update. But since updates are rare, this has minimal impact on workload performance and space usage.

+ Updates are allowed for a CP table but it is recommended that they be infrequent.

Section 6.4

No or a few deletes that aren’t whole row partition deletes

– Space for such deleted rows is not reclaimed by the delete. But since deletes are rare, this has minimal impact on space usage.

– When data blocks are read, they may include some logically deleted data which increases the I/O. But since deletes are rare this has minimal impact on performance.

+ Deletes are allowed for a CP table but it is recommended that they be infrequent.

Section 6.4

Old data deleted one or more whole row partitions at a time

+ Delete uses fastpath to delete the row partitions.

+ Delete reclaims storage for the deleted row partitions.

Section 6.2



Characteristic Reference Most queries reference a subset of the columns such that there are sufficient available column partition contexts to access those columns.

+ Column partition elimination reduces the I/O since only the columns referenced by the query need to be read.

Chapter 5 Chapter 8

Section A.3Section A.4

Most queries select a subset of the table rows based on one or more selective predicates and often on a date or timestamp range predicate that leads to row partition elimination

+ The I/O is reduced since columns are only read to retrieve values if all preceding predicates do not disqualify the logical row.

+ Row partition elimination further reduces the I/O since predicates only need to be evaluated for the columns values within the noneliminated row partitions.

Section 2.1 Section 2.2 Chapter 5

SELECTs of all or most columns and all or most rows5 are rare

– When a SELECT of all or most columns and all or most rows is needed, it is more costly than for a non-CP table in order to reconstruct the rows from the columns. The cost increases if the number of column partitions that need to be accessed exceeds the number of available column partition contexts.

+ These queries are allowed but it is recommended that they be infrequent.

Chapter 5 Chapter 8

Section A.2

Queries that would benefit from the CDR table having a primary index are rare or, if more common, perform adequately as a reasonable tradeoff with a column-partitioned CDR table

– Secondary or join indexes can be added to improve the performance of these queries but this does require additional storage and maintenance overhead. Note that these might not be needed if the queries perform adequately without them.

– Also, even though these queries might not perform as well as when the table has a PI or PPI, their performance is acceptable considering the overall workload improvements when using column partitioning.

Chapter 5 Section A.5

Other tables for a Telcom database or tables for other industries that have similar characteristics and advantages may also be good candidates for column partitioning. Particularly, this is the case when the disadvantages are not pronounced or can be compensated for with other complementary physical database design choices.

5 For example, a SELECT * FROM CDR_table; query.



Chapter 3: Storing Column-Partitioned Data on Disk

Column partitioning relies on many already-established capabilities in Teradata. It makes use of the existing file system including rowid structures, data manipulation commands, secondary index definitions, and statistics collection routines. This chapter looks at how some of the usual Teradata functionality is used and enhanced to support column partitioning.

3.1 How Column Partitions are Formatted

The database communicates with the disk subsystem using the file system. The file system performs the physical reads and writes of data blocks based on direct access using a rowid or by sequentially reading through the data blocks (that each consist of one or more physical rows) for a table. The file system always stores physical rows in rowid sequence. A physical row’s rowid and row length are indicated in the physical row’s row header.

There are several different physical row formats. Some existing formats are:

Regular row: The physical row contains a series of values for different columns representing one table row. A PI table, a PPI table, or a NoPI table (that doesn’t have column partitioning) uses this format for the primary and fallback data.

Table header: The physical row contains table header information in a prescribed sequence.

Secondary index: The physical row contains one indexed value plus one or more rowids indicating the table rows in the base table that match to this value.

For a CP table, two new physical row formats are introduced: COLUMN and ROW format.

3.1.1 COLUMN Format

A physical row with COLUMN format supports column-storage for a column partition. A physical row with COLUMN format is referred to as container and each container holds a series of column partition values for a column partition. The following Figure 5 illustrates containers with different sized column partition values for the first two column partitions of a table with fifteen table rows.

Col1_Value1, Col1_Value2, Col1_Value3, Col1_Value4, Col1_Value5

Column Partition #1 requires three containers

Col2_Value1, Col2_Value2, Col2_Value3



Column Partition #2 requires five containers



Column partition values are narrow, more values can fit into one container

Column partition values are wide, fewer values can fit into one container, more containers are requiredCol2_Value10, Col2_Value11, Col2_Value12


Figure 5: Using Containers for Column Partitions



The number of column partition values is the same for each column partition; in Figure 5 on the previous page, this is 15 column partitions values for each column partition since 15 table rows are represented.

The column or group of columns whose column partition values a container represents is recognized based on the internal partition number assigned to that container. When a column partition is stored on disk, one or more containers may be needed to hold all the column partition values of the column partition. Since a container is a physical row, the size of a container may vary but is limited by the maximum physical row size of 65KB.

Note that a typical container (unlike in Figure 5 on the previous page) for the table would contain 1000’s of column partition values for these two column partitions if the table was populated with a large number of rows.

See also section 11.2.1 for more information about COLUMN format.

3.1.2 ROW Format

As an alternative to COLUMN format, a single column partition value for a column partition may be held in a physical row using ROW format. A physical row with ROW format supports row-storage for a column partition and is referred to as a subrow. Each subrow holds only one column partition value for a column partition. A subrow has the same format as a regular row except that it is a subset of the columns for a table row. The following Figure 6 illustrates subrows for column partition #3 with one VARCHAR(1000) column and multicolumn partition #4 with two INTEGER columns and one CHAR(500) column. The table has fifteen table rows (as in section 3.1.1) so there are 15 subrows (not all shown) in each column partition.

Col3_Value1

Column Partition #3 requires 15 subrows

Col4_Value1 Col5_Value1 Col6_Value1

Column Partition #4 requires 15 subrows

Each subrow contains one variable-length column value

Each subrow contains 3 fixed-length column values

Col3_Value2

Col3_Value15

...

...Col4_Value2 Col5_Value2 Col6_Value2

Col4_Value15 Col5_Value15 Col6_Value15

Figure 6: Using Subrows for Column Partitions

The column or group of columns whose column partition value a subrow represents is recognized based on the internal partition number assigned to that subrow. When a column partition with ROW format is stored on disk, as many subrows as there are table rows for the table are needed to hold all the column partition values of the column partition. Since a subrow is a physical row, the size of a subrow, just like a container, may vary but it is limited by the maximum physical row size of 65KB.6

See also section 11.2.2 for more information about ROW format.

6 Note that the table row limit of 65KB would be exceeded before the subrow limit could be exceeded.



3.1.3 COLUMN and ROW Formats

A column partition may have COLUMN format or ROW format but not a mix of both. However, different column partitions in a CP table may have different formats. By default, the system determines the format it considers to be best for a column partition; the system will usually correctly determine the most appropriate format. See section 11.2 for a discussion of when you might consider overriding the system’s choice of format.

For a column partition with ROW format, the subrow for a specific rowid (and therefore its column partition value) can be directly accessed by the file system. For a column partition with COLUMN format, the container for a rowid can also be directly accessed by the file system but then the corresponding column partition value for the rowid must be found within the container.

A container also requires extra bytes to manage multiple column partition values (even when there is only one value in the container) and autocompression. If the containers for a column partition only hold one or a few column partition values, the column partition doesn’t benefit from row header compression but carries the overhead of these extra bytes.

3.2 Columns and Column Partition Numbers

For a CP table, the table header carries the mapping between a column and the specific column partition number to which the column has been assigned.7 More than one column may be assigned to the same column partition. Within the table header, there is a field descriptor for each column, as usual. A column partition number is added to each field descriptor for a CP table. Since the table header for a table being accessed by a query is required to be in the memory of each AMP, the correlation of a column to a column partition number is always readily available system-wide without any additional effort.

3.3 Rowids for NoPI Tables

A CP table, by definition, has no primary index; that is, it is a NoPI table. In order to understand how containers and subrows for column partitions are stored on disk, it is useful to review the differences between a rowid of a PI table and that of a NoPI table.

For a row of a PI table, a primary index value is passed through the hashing algorithm to produce a hash value of which the hash bucket portion determines which AMP owns the row based on the primary hash bucket map. For a NoPI table, there is no primary index value. When a NoPI table is loaded, blocks of rows are randomly assigned to AMPs without undergoing hashing, redistribution, or sorting. In order to accommodate this situation and to satisfy the system’s need to associate a row to a single AMP, a rowid of a NoPI table is formed differently than a rowid of a PI table.

With a NoPI table, all rows owned by a single AMP initially use the lowest hash bucket associated to that AMP by the NoPI hash map in their rowid.8 The NoPI hash map only includes hash buckets from the regular PI hash map where the AMP owning this hash bucket does not change if the number of AMPs increases; this avoids having to move rows to a different AMP

7 There is no particular significance to which number is assigned or the order of the column partitions by

this number. A column partition just needs a number to uniquely identify it internally to the system. Numbers are initially assigned sequentially but they could just as well be assigned arbitrarily. As the table is altered by ALTER TABLE statements, column partitions may be assigned to different numbers and then the assignment will become to appear arbitrary.

8 The next hash bucket from the NoPI hash map is used only if the uniqueness reaches its maximum value. There are 64 hash buckets for a system with 20-bit hash buckets and 4 hash buckets for a system with 16-bit hash buckets available for each AMP from the NoPI hash map.



when doing a Reconfig or restore/copy. The purpose of carrying a hash bucket in the rowid is to have a means to identify the AMP on which a row resides when there is indexed access from a secondary or a join index. Since hashing a primary index value does not take place with a NoPI table, indicating the row’s owning AMP via the hash bucket is all that is required of a rowid for a NoPI table.

With only a hash bucket being carried instead of a full row hash value, the number of bits for the uniqueness can be increased without increasing the size of a rowid. Instead of 32 bits of uniqueness that are available for a PI or PPI table, 44 bits are available for the uniqueness for a NoPI table (if the system has 16-bit hash buckets, 4 bits of the rowid are unused). If the uniqueness reaches the maximum for a hash bucket, the next hash bucket for the AMP from the NoPI hash map is used for the next row with the uniqueness reset to 1. As a result, there is a limit of 1,125,899,906,842,560 rows per AMP 9 for a NoPI table. It is unlikely this limit would be exceeded and even unlikely that more than one hash bucket would be needed; so usually, all rows on an AMP for a NoPI table have the same hash bucket. For more information on NoPI tables, see the Orange Book: No Primary Index Table User’s Guide.

3.4 Applying NoPI Rowid Conventions to a Column-Partitioned Table

The following Figure 7 contrasts the composition of the rowid for PI tables, PPI tables, NoPI tables, and CP tables. “Partition number” represents the internal partition number formed from the partition numbers at each of the partitioning levels.10 For a nonpartitioned table, the internal partition number is zero.

Rowid

Part # Row Hash Uniqueness

Row-Partitioned Table

1st hash bucket on the AMP

NoPI Table

Incremented for each row on that AMP (a Row #)


Incremented for each row on that AMP (a Row #)

Rowid

Part # HB Uniqueness

Always Zero

Column-Partitioned Table

Rowid


Hash Bucket

Nonpartitioned Table

Rowid


Always Zero

Hashed PI value

Hash Bucket

Hashed PI value

Figure 7: Partitioned vs. Nonpartitioned Tables

9 For systems with a 16-bit hash bucket, the limit is reduced to 70,368,744,177,660 rows per AMP. 10 The internal partition number effectively orders physical rows by the partition number of the first level,

then within that, by the partition number of the second level, etc. for each physical row.



As seen in Figure 7 on the previous page, a table with column partitioning uses the same rowid format as does a NoPI table.11 However, in the case of a partitioned table, the internal partition number is nonzero. That internal partition number logically always comes first in the rowid, the same as the rowid of any other table. This ensures that all physical rows within the same combined partition are stored together in the same or adjacent data blocks since physical rows are stored in rowid order. A difference to be aware of with a CP table is that each column partition value has its own rowid, whereas with a nonpartitioned NoPI table each row as its own rowid, as illustrated in the following Figure 8.

Rowid1 HB = 12 Unique = 1

RowidRowid1 HB = 12 Unique = 1

Rowid1st logical row’s column rowids

TxnNo column TxnDate column ItemNo column Quantity column

2nd logical row’s column rowids

Rowid2 HB = 12 Unique = 1

Rowid Rowid3 HB = 12 Unique = 1

Rowid Rowid4 HB = 12 Unique = 1

Rowid

Rowid

1 HB = 12 Unique = 2

RowidRowid


Rowid Rowid


Rowid Rowid


Rowid Rowid


Rowid

Figure 8: Rowids for Column Partition Values

The implications of this rowid format for a CP table are:

1. The rowids for the different column partition values of a logical row are the same except for the column partition number which is unique to each column partition. This means the row partition numbers (if any), hash bucket, and uniqueness are the same for each column partition value of a logical row.

2. Each logical row has a distinct uniqueness (for a given combination of internal partition number and hash bucket value) starting at 1 and incrementing by 1 for each row appended by an INSERT or UPDATE statement.12

3. A logical row has a unique rowid, referred to as a logical rowid. This rowid is the same as the rowids for each of the column partitions values of the logical row except that the column partition number in the rowid is one. The rowid of a specific column partition value of a logical row can be derived by modifying the column partition number in a logical rowid to be the column partition number for that column partition value. A reference to a logical row in an index uses the logical rowid of that logical row.

4. Conversely, every column partition value has as an associated logical rowid which is the logical rowid of the logical row to which the column partition value belongs.

5. There is a maximum of 1,125,899,906,842,560 column partition values13 per combined partition per AMP.

Note that the 32 bits in a NoPI or CP table’s rowid corresponding to the 32 bits used for a row hash in a PI table’s rowid are referred to as the row hash of a NoPI or CP table’s rowid even though it is not actually a hash value computed from values of the row. For instance, row hash locks for a NoPI or CP table are placed on these 32-bits from the rowid (see section 5.5).

11 Conceptually, a NoPI table can be considered to be a column-partitioned table with only one column

partition having a column partition number of 0, containing all the columns of the table, having ROW format with NO AUTO COMPRESS, and each column partition value having its own rowid.

12 Note that an UPDATE is handled as delete of the old row followed by an insert of the updated row. 13 This limit increases by 60 for a system with 16-bit hash buckets.



Chapter 4: Autocompression

Autocompression, the default14 for column partitions, is applied automatically to a container. Column partition values are appended without any autocompression until a container is full. Then the form of autocompression is determined for the container and the container is compressed. Subsequent column partitions values are appended using the determined form of autocompression until the container is again full. When more column partition values are to be appended, another container is started and the process repeats. Note that, as for a NoPI table, column partition values are only appended (i.e., not inserted between two values as compared to PI table where a row may be inserted between two existing rows).

Each container is assessed separately to see how, and if, it can be compressed. Several available compression techniques are considered for compressing a container but, unless there is some size reduction, no compression is performed. If a container is compressed, the needed data is automatically uncompressed as it is read. Autocompression is most effective when the column partition consists of only a single column and is less effective as more columns are included in the column partition.

Some of the compression techniques that may be selected by the system include:

1. Null compression.

2. Run-length compression, where a count of the number of consecutive occurrences is included with a value instead of including each consecutive occurrence of the value.

3. Local value-list compression, where often occurring column values are placed in a dictionary local to the container, eliminating nonsequential repeating values from having to be represented over and over.

4. Trim compression, where high-order zero bytes of numeric values and trailing pad bytes of character data are removed.

5. Delta from mean, useful for a limited range of numeric values where the arithmetic mean for the column values in the container is stored once and the delta from the arithmetic mean is stored for each column value in the container.

6. UNICODE to UTF8 compression (when ASCII is stored in UNICODE columns).

The compression techniques used are system determined based on the characteristics of the data within a container and may differ from container to container.

User-specified multivalue compression (MVC) and algorithmic compression (ALC) are honored and carried forward if they help compress a container. If block level compression is specified, it applies for data blocks holding the physical rows of the table independent of whether autocompression is applied or not.

Note that autocompression is applied locally to a container based on column partition values (which may be multicolumn) while user-specified MVC and ALC are applied globally for a column and are applicable to both containers and subrows.

14 See Chapter 11: Tuning Opportunities for a discussion of when you might consider overriding the

default of autocompression for a column partition.



Autocompression is differentiated from block level compression in several key ways:

1. Autocompression requires no parameter setting, but rather is completely transparent to the user while block level compression is a result of the appropriate settings of parameters.

2. Autocompression acts on a container (a physical row) while block level compression acts on a data block (which consists of one or more physical rows).

3. Decompressing a column partition value in a container has little overhead while software-based block level compression incurs noticeable decompression overhead.

4. Only column partition values that are needed by the query are decompressed. BLC has to decompress the entire data block even if only one or a few values are needed from the data block.

5. Determining the autocompression to use for a container, compressing a container, and compressing additional values to be inserted into the container can cause an increase in the CPU needed for appending values to column partitions.

You can expect additional CPU to be required when inserting rows into a CP table that uses autocompression. This is similarly to the increase in CPU if MVC or ALC compression is added to the columns.

Autocompression can be an easy way to obtain significant compression for a CP table. However, autocompression may not be the solution for all your compression needs. Complementing autocompression with user-specified MVC and ALC, BLC, and temperature based BLC can provide even further compression. See also Chapter 11: Tuning Opportunities.



Chapter 5: Reading from Column-Partitioned Tables

The key advantage of column partitioning is efficiency of access. Reduced I/O can be realized if only a subset of the columns in a table needs to be read when those column values are held in separate column partitions. Data is stored on disk by partition, so when partition elimination takes place, data blocks in the eliminated partitions are simply not read.

Reduced I/O can also be realized if there are predicates that select a subset of the table rows even if a large number of column values are needed. Only the data for those columns from the selected table rows needs to be retrieved.

I/O is reduced the most when both of the above occur. Also, autocompression, row header compression, and the other compression options can contribute to further reducing I/O.

It may be less obvious how data from different column partitions can be brought together to satisfy a query when the query accesses columns from multiple column partitions of a CP table (how this is done is discussed in the following sections). This question does not come up with other kinds of tables. Each table row of a PI, PPI, or non-CP NoPI table is complete within a physical row; such a physical row contains all the column values for a table row and satisfies any query no matter what columns it references – the downside is that column values not needed must be read as part of the physical row increasing I/O.

There are three ways to initiate read access to data within a CP table: a full-column partition scan, indexed access (using a secondary or join index), or a direct join to the CP table. Both unique and nonunique secondary indexes are allowed on CP tables, as are join indexes. There is no primary index access path to provide direct access from the query to the database since there is no primary index for a CP table.

First, we look at the column partition scan approach, then how secondary index access takes place, followed by discussions on joins and row hash locking. All examples in this chapter assume that row partitioning is not being used.

5.1 Scanning Column-Partitioned Data

If indexing is not available, a way to get the data out of a CP table is to start out by scanning a column partition on all the AMPs in parallel. The following describes the scanning of CP data:

1. Columns within the table definition that aren’t referenced in the query are ignored due to partition elimination.

2. If there is a predicate column in the query, its column partition is read.

3. Values within the predicate column partition are examined and compared against the value passed in the query WHERE clause.

4. Each time a qualifying value is located, the next step is building up a row for the output spool.

5. All the column partition values for a logical row have the same rowid except for the column partition number (see section 3.4). The rowid associated with each predicate column value that matches the constraint in the query becomes the link to other column partition values of the same logical row by simply modifying the column partition number of the rowid to the column partition number for each of these other column partition values.



The following Figure 9 shows how a predicate column value can be used to select a column value in a different column partition for the same logical row (only the uniqueness of the rowid is shown assuming the rest of rowid doesn’t change for this example).

SELECT State FROM Stores WHERE Code = 5;

3, 2, 5, 1, 8, 9, 1, 1, 5, 7, 6, 6, 4 Scan all containers in the Code PartitionGet uniqueness for each value of 5

Uniqueness = 36 Uniqueness = 42

CA, NY, OR, FL, SC, MN, WY, MA, CO, AK

Code Partition

State Partition

Scan

Direct Access Go directly to the matching uniqueness values in the State Partition

...

...

...

...

Figure 9: Scanning a Predicate on a Column Partition

If there is a single predicate column in the query that can be used to qualify rows, it is chosen as the initial access point and column partition to scan – in Figure 9, this is the Code column. A projected column – in Figure 9, this is the State column – can be found by going to the matching column partition value with the same uniqueness as the qualifying value in the Code column. That is, modify the column partition number in the rowid of the Code value to be the column partition number of the State partition and use the modified rowid to directly access the corresponding container15 in the State partition. Then the corresponding column partition value is found within that container.

If there is more than one predicate column in the query that can be used to disqualify rows, the column for one of these predicates is chosen and its column partition is scanned. Statistics, as well as other heuristics, are used by the optimizer to pick the most selective and least costly predicate. Once that decision has been made, only that single column partition needs to be scanned. A column for another predicate or projected column (if not in the same column partition) can be found by going to the matching column partition value using the modified rowid of a qualifying column partition value for the predicate column being scanned. If the evaluation of another predicate disqualifies the logical row, this eliminates the rowid from further consideration and the scan continues for a qualifying value for the first predicate. If a logical row has satisfied all the predicates, any projected columns for the logical row are then accessed to form a row for the output spool.

If there are no useful predicate columns in the query (for instance, OR’ed predicates), one column partition is chosen to be scanned and for each of its column partition values additional corresponding column partition values are accessed until either predicate evaluation disqualifies the logical row or all the projected column values have been retrieved and brought together to form rows for the output spool. Chapter 8: EXPLAIN Terminology discusses how this is accomplished.

Queries are best suited for accessing a CP table under these conditions:

1. From all the predicates for a query, there are one or a few predicates that are very selective in combination.

15 The internal partition number (formed from the column partition number and the row partition numbers,

if any), hash bucket, and uniqueness form a rowid. The file system uses the B*-tree implemented in the master index (which is in memory) and cylinder indexes (which are maintained on disk; the cylinder index that indexes this rowid is read into the FSG cache if it is not already there) to find the data block (which is read into the FSG cache if it is not already there) that contains the physical row with the column partition value for this rowid.



2. If the number of columns accessed is small enough that a spool is not required for their consolidation or, if a spool is required, the spool (indicated as a CP merge spool in an EXPLAIN of a query) can be held in memory.

5.2 Bringing Together Columns from the Same Logical Row

A container holds multiple column partition values for the same column or columns of a CP table. (In this section, the assumption is being made that each column partition contains only a single column so a column partition value is the same as a column value.) Each of these column values has the same rowid except for the column partition number that differentiates it from other column values for the same column partition within the container being read (see section 3.4). The rowid in the row header of that container is the rowid for the first value in the container. For the other values in the container, the internal partition and hash bucket are the same as for this first rowid. The uniqueness for other values is based on a value’s position in the container.16 Therefore, the only rowid that is explicit is the one in the row header of the container. This is illustrated in the following Figure 10 (for simplicity, only the uniqueness of a row header is shown).


Column Partition #1, three containers




Column Partition #2, five containers



Value 1 Uniqueness

Value 6 Uniqueness

Value 11 Uniqueness

Value 1 Uniqueness

Value 4 Uniqueness

Value 7 Uniqueness



Value 10 Uniqueness

Value 13 Uniqueness

Figure 10: Uniqueness in a Container’s Row Header

When a rowid for a predicate column is being used to locate a different column’s value for the same logical row, the first step in the process is to locate the correct container in that column partition. A container’s row header carries the rowid (consisting of the internal partition number that indicates the column partition, hash bucket, and uniqueness) reflected by the first column partition value in the container. Using this rowid, the file system knows which container holds the desired column partition value. The exact location of the column partition value is known based on relative position within the container. For example, if the uniqueness of the rowid in the container’s row header is 201 and a column partition value with a uniqueness of 205 needs to be located, the 5th entry in that container is the corresponding column partition value.

16 Note that for a container, all the column partition values must have the same internal partition number

and hash bucket. If a different internal partition number or hash bucket is needed for a column partition value, it must be placed in another container.



5.3 Reading Column-Partitioned Data using a Secondary Index

Secondary index access to a CP table uses the same techniques as secondary index access to a nonpartitioned table. Take the example of a nonunique secondary index (NUSI). On a primary-indexed table, each NUSI subtable row contains two pieces of information:

1. The indexed value

2. The list of rowids of the base table rows that carry that value

Because NUSI’s are AMP-local, these rowids point only to base table rows that are on the same AMP as the referencing NUSI subtable row.

With a CP table, NUSI rows look the same as they look for a NoPI table except rowids in the NUSI subtable rowid list are logical rowids (see section 3.4) that indicate the logical rows that carry that indexed value.

A logical rowid is the important link between the NUSI subtable and the column values of interest for the query. The table header is used, in combination with the columns referenced in the query, to obtain the column partition numbers of column partition values from just the subset of column partitions needed by the query. The column partition number (which is 1) in the logical rowid can be modified to be the column partition number for each of those column partition values to get their rowids.

The following Figure 11 contrasts a NUSI for a nonpartitioned table to CP table.

Column-partitioned Table


Incremented for each logical row

on that AMP


Always column

partition #1

(Repeated)NUSI Value Rowid List


Always Zero

(Repeated)NUSI Value Rowid List

Nonpartitioned Table

Hash Bucket

Hashed PI value

Figure 11: Nonunique Secondary Index Row Layout

After accessing the NUSI subtable, each logical rowid in the rowid list is used to locate the values for the referenced logical rows from column partitions that reflect the columns referenced in the query. Since multiple I/Os may be needed to locate those column values, there is some overhead involved in building up the required column values for the query. For this reason, a query that accesses a large number of columns from a table does not perform as well against a CP table as a table that is not column partitioned.

When you create a nonunique secondary index on a CP table, the effort of reading the data from the table is significantly less. Instead of scanning the table and consolidating the values for all the columns in each logical row, only the column partitions that represent that NUSI need to be read.

There may be less need for NUSIs on a CP table. Accessing data is expected to be much faster on such a table when a limited number of columns are requested. This shifts the tradeoffs in defining NUSIs as NUSIs come with maintenance overhead. Don’t assume NUSIs are needed on a CP table; add them only if they prove to provide a performance benefit that offsets the extra space usage and maintenance cost.



5.4 Joins with a Column-Partitioned Table

The optimizer knows how to build an intelligent query plan for CP tables that join to other tables.

A CP object (a table or join index) can be directly accessed for a dynamic hash join17 or product join. This means it must be joining to a duplicated spool and that spool must be relatively small for the join to be efficient.

A CP object can also be directly accessed to do a rowid join. In this case, the rowids must come from a secondary index or join index on the CP object, or from a previous retrieve or join to the CP object.

For joins with a CP table, there is an optimization which delays spooling of those columns that are needed to process the query but are not needed to process a join until the join has completed finding the rows that qualify the join conditions. A rowid spool containing rowids of the qualified rows is then joined back to the CP table to retrieve the columns needed for subsequent joins and/or for the final result spool. This is a cost-based optimization that is expected to be applied when the join requires spooling of the CP table and the join is selective. There must be statistics on the join columns so costing can be done.

Other joins methods are possible but, in order to use those join methods, the selected rows with the projected columns need to be constructed from the column partitions and spooled first (possibly with a redistribution and local AMP sort or duplication to all AMPs) or, if an index is applicable (for example, a join index), make use of the index. In the former case, this may be a reasonable plan if few rows are selected and/or few columns are needed from the CP object.

Note that any autocompression for the CP table is not carried over to the spool so the spool may be much larger than the compressed data selected from the CP table. User-specified compression is carried over to the spool; therefore, if spool usage (in space and I/O) becomes an issue, applying user-specified compression to the CP table may be beneficial.

5.5 Row Hash Locking

The lowest locking granularity for a table is row hash locking. In the more familiar primary-indexed table, the row hash is the hashed primary index value. Therefore, any row that has this hash value is locked even if it has a different internal partition number or uniqueness.

Even though the rowid for a CP or NoPI table doesn’t have an actual hash value, the 32 bits of the rowid that correspond to the row hash for a PI table are used as the value on which to place the lock. For a CP or NoPI table, this means the lock is placed on the hash bucket combined with 16 or 12 bits (depending on the hash bucket size) of the uniqueness. Therefore, any row that has this value in its rowid, even if it has a different internal partition number or other bits in the uniqueness are different, will also be locked.

Typically, the hash value portion of a rowid will be the same for every rowid of a NoPI or CP table (see section 3.3). This means that for a NoPI or CP table a row hash lock can, and usually does, place a lock on all the rows for the table on an AMP.

Think about a CP table as though it were a very highly nonunique NUPI table. A large number of rows are likely to be involved when such a row hash lock is set. In addition, with column partitioning, you may have multiple containers within each column partition and multiple column partitions within the table. All are usually included under a single row hash level lock, whether it is an access lock, a read lock, a write lock, or exclusive lock.

17 This is some times referred to as a hash join on the fly.



Chapter 6: Loading and Maintenance Operations

This chapter briefly discusses loading and maintaining a CP table using inserts, deletes, and updates. Section F.1, section F.4.3, Appendix D: Loading a Column-Partitioned Table, and the Orange Book: No Primary Index Table User’s Guide provide additional details for loading and maintenance operations.

6.1 INSERT-SELECT

INSERT-SELECT is the expected and most efficient method of loading data into a CP table. All the rows of the table may be deleted and the INSERT-SELECT repeated periodically in order to refresh the data. If the data originates from an external source, use FastLoad, MultiLoad, or TPump to load the data into a staging table from which the INSERT-SELECT can take place.

For an INSERT-SELECT into a CP or NoPI target table, rows are by default copied locally on each AMP from the source to the target table. Therefore, if the source is skewed (for example, the source is a SELECT that includes several joins resulting in skewed data), the rows will also be skewed in the target table. See section F.4.3.

The following options can be added to the INSERT-SELECT statement to avoid a skewed CP table, improve the effectiveness of autocompression, or improve performance for some cases:

HASH BY: The selected rows are redistributed by the hash to remove skew. Also, hashing on a column places rows with the same value for the column on the same AMP increases the number of occurrences of the value on the AMP making autocompression more effective for that column. Alternatively, HASH BY RANDOM can be used to have data blocks redistributed randomly – this can be more efficient due to the block-at-a-time processing. In some cases of a NoPI to a row-partitioned CP table, this may improve performance of the INSERT-SELECT (see Appendix D: Loading a Column-Partitioned Table).

LOCAL ORDER BY: A local sort is done on each AMP before physically storing the rows. This could help autocompression to be more effective by ensuring like values of the sorting columns appear together.

For example, if a source NoPI table is not skewed, simply insert into the CP table.

INSERT INTO Sales_CP SELECT * FROM Sales_staging;

If the source is skewed, add a HASH BY RANDOM clause.

INSERT INTO Sales_CP SELECT * FROM Sales_staging HASH BY RANDOM;

Alternatively, if the source is skewed, add a HASH BY clause on the TxnNo column. This is a unique column so it provides even distribution to the AMPs. However, this is less efficient than the INSERT-SELECT above since it distributes a row at a time instead of a data block at a time.

INSERT INTO Sales_CP SELECT * FROM Sales_staging HASH BY TxnNo;

To improve run-length autocompression for the TxnDate column if the source is not skewed, add a LOCAL ORDER BY clause:

INSERT INTO Sales_CP SELECT * FROM Sales_staging LOCAL ORDER BY TxnDate;

To handle a skewed source and improve run-length autocompression for the TxnDate column, add a HASH BY RANDOM and a LOCAL ORDER BY clause:

INSERT INTO Sales_CP SELECT * FROM Sales_staging HASH BY RANDOM LOCAL ORDER BY TxnDate;



The following might improve run-length autocompression for the TxnDate column even more by distributing rows with the same value of TxnDate to the same AMP and then locally ordering the rows on each AMP such that rows with the same value of TxnDate are grouped together. However, this may introduce unacceptable skew for the space usage of the table and/or skew (hot AMPs) for queries against the table.

INSERT INTO Sales_CP SELECT * FROM Sales_staging HASH BY TxnDate LOCAL ORDER BY TxnDate;

In one method for processing an INSERT-SELECT, each source row is read locally, and its columns individually appended to the column partitions to which they belong on the same AMP as the source row. On each AMP (in parallel), as many column partition values as can fit are built up simultaneously in memory, and written out to disk when the buffer is full. This is illustrated in the following Figure 12.

Col1_Value1, Col1_Value2,


Part #1



Part #2

Part #3

Part #4

Place Column Values at end of Current Containers NoPI Source Data Bring 1 row into

memory

Col1 Col2 Col3 Col4

Figure 12: Building Multiple Containers during an INSERT-SELECT

Since there is a limited amount of memory available to process the INSERT-SELECT, if the CP table being loaded has a large number of column partitions, additional scans of the source may be required to append the column values to their respective column partitions with each scan processing a subset of the column partitions.

The above method and another method for processing an INSERT-SELECT are discussed in Appendix D: Loading a Column-Partitioned Table.

6.2 Deleting Rows

Rows can be deleted from a CP table en masse using the DELETE ALL, or selectively using DELETE. The former, the unconstrained delete, uses the standard fast-path delete as is done on a primary-indexed table (the delete must be the last statement in a request that ends the transaction). If a CP table also happens to include row partitioning, the same fast-path delete can be applied to one or more row partitions. Space is immediately reclaimed.

The selective DELETE requires a scan of a column partition or indexed access to the CP table. In this case, the row being deleted is not physically removed, but only flagged as having been deleted (see section 6.3). The space taken by a row being deleted is scattered across multiple column partitions and is not reclaimed at the time of the deletion. This form of delete should only be used to delete a small percentage of rows.

During a delete operation, all large objects (CLOBs and BLOBs) are immediately deleted, as are entries in secondary indexes. Join indexes are updated to reflect the change as it happens.



6.3 The Delete Column Partition

Each CP table has one delete column partition, in addition to the user-specified column partitions. It holds information about deleted table rows so they do not get included in an answer set. When a single row delete takes place in a CP table, rather than removing each deleted value across all the column partitions (whether or not the format is COLUMN or ROW), which would involve multiple physical row updates, a single action is performed: one bit in the delete column partition is set for the row partition numbers (if any), hash bucket, and uniqueness of the table row.

This delete column partition is accessed any time a query is made against a CP table. At the time the predicate partition is scanned, the delete column partition is checked to make sure a table row being requested by the query has not been deleted (if it has, the value is skipped). This additional partition access is included in the count of accessed column partitions in the EXPLAIN text as illustrated in Chapter 8: EXPLAIN Terminology.

6.4 Updating Rows

Updating a logical row that is represented in a CP table involves marking the appropriate bit in the delete column partition, and then re-inserting columns for the new updated version of the table row. An update to a row in a CP table is similar to an “unreasonable update” in which the primary index value of a PI table is changed. However, this is a mild form of an unreasonable update since in the case of a CP table update, the deletion and re-insertion takes place on the same AMP.

The part of the update that re-inserts a new table row is essentially a re-append. The highest uniqueness on that AMP is incremented by one, and all the column values for that updated row are appended to their corresponding column partitions. The space that is being used by the old row is not reclaimed, but a delete bit is turned on in the delete column partition, indicating that the old version of the row is obsolete.

Because multiple I/Os are performed in doing this re-append and space for the old rows is not reclaimed, row-at-a-time updates on CP tables should be approached with caution. An UPDATE statement should only be used to update a small percentage of rows. An unconstrained update like the following is not recommended due to performance and space usage concerns:

UPDATE employee SET salary = salary * 1.05;

This would cause the space usage for the table to double and the update would be costly as every existing row is logically deleted, reconstructed, updated, and re-inserted back into the column partitions. In this case, it would be better to create a new table and INSERT-SELECT into the table making the changes to the salary in the select list – this is still costly but at least the space can be reclaimed by deleting the original table.

A CP table is not allowed with an UPDATE statement that uses the upsert form. This is because there is no primary index on a CP table as required by the upsert form of an UPDATE.



Chapter 7: Row Partitioning with Column Partitioning

Row partitioning can be combined with column partitioning on the same table. This allows queries to read only noneliminated combined partitions. Such partitions are defined by the intersection of the columns referenced in the query and any partitioning column selection criteria.

There is usually an advantage to putting the column partitioning at level one of the combined partitioning scheme. The following Figure 13 illustrates the difference. In the first layout, column partitioning is placed at the first level. Then below it, row partitioning by State is placed at the first level.

Col ANY

Col ACA

Col AIL

Col BNY

Col BCA

Col BIL

Col CNY

Col CCA

Col CIL

NY Col A

NY Col B

NY Col C

CA Col A

CA Col B

CA Col C

ILCol A

IL Col B

IL Col C

Partition Layout when Column is the 1st Partitioning Level

Combined partitions containing Col A are together on disk

Partition Layout when Row is the 1st Partitioning Level

Combined partitions containing the State = NY are together on disk

Col B Col C

State = CA State = IL

Figure 13: Column Partitioning in a Multilevel-Partitioned Table

When accessing a multilevel-partitioned table as shown in Figure 13 above, combined partitions are grouped on disk based on level 1 first. If you expect to get a greater degree of partition elimination based on columns, it is more efficient to have column partitioning at level 1. All combined partitions that include Col A, for example, are co-located on disk.

There are two key considerations when combining row and column partitioning:

1. Whether row or column partitioning is the first level

2. How to best avoid over partitioning the table

7.1 Determining the Column-Partitioning Level

It is initially recommended that column partitioning either be defined as the first level or, if not as the first level, at least as the second level. When column partitioning is defined as the first level it is easier for the file system to locate related data that is from the same logical row of the table. When column partitioning is defined at a lower level, more boundary checks have to be made, possibly causing an impact on performance.

If you are inserting a new table row, it takes more effort if the column partitioning is not the first level. Values of columns from the newly-inserted table row need to be appended at the end of each column partition. If column-partitioning is not first, it is necessary to read through several combined partitions to find the correct container that represents the end point.

On the other hand, if you have row partitioning at the second or a lower level partitioning level so that column partitioning can be at the first level, this can be less efficient when row partition elimination based on something like a date range is taking place.



Some considerations that might lead to putting the column partitioning at a lower level are having date or timestamp partitioning for the first level to provide potential improvements in the effectiveness of cylinder migration and temperature based block level compression for hot and cold data. By having the row partitioning first, old data is grouped together rather than being spread across the column partitions when column partitioning is first.

7.2 Avoiding Over Partitioning a Table

Rows from a partitioned table, just like rows from a nonpartitioned table, are spread across all AMPs in the configuration. Each AMP has a subset of the table’s rows, reducing the over-all partition size from the perspective of the AMP. You may have 100,000 rows overall that fall into your daily sales date partition for last Tuesday, but in actuality only 200 rows in Tuesday’s partition are likely to be located on any given AMP if you have 500 AMPs. To take this example further, say you have 40 column partitions that have COLUMN format. A container for such column partitions can easily hold 200 column partition values if the column partition values are narrow. Each combined partition on each AMP then has only 1 small physical row which only partially fills a data block; physical rows for preceding or following combined partitions may be placed in this same data block in order to fill up the data block. To read values of a small combined partition, the entire data block containing it must be read minimizing the effectiveness of column partition elimination.

Three factors increase the risk of having under-sized, and therefore inefficient, combined partitions:

1. A large number of AMPs in the configuration

2. Multiple levels of partitioning resulting in a very large number of combined partitions

3. The increased partition limit feature of Teradata 14.0 makes it easy to over partition a table

Over-partitioning exists if nonempty combined partitions typically have less than 10 data blocks. On average ½ of the first data block and the ½ of the last data block with the physical rows of a combined partition are for preceding and following combined partitions. At 10 data blocks for a combined partition, 10% of the data read is not needed – this is maybe acceptable but as the number of data blocks decreases more data is read that is not needed. In the worst case, even if there is column partition elimination, all the data blocks of the table must be read.

Over-partitioning can be problematic for a column partition with COLUMN format (see section 11.2.1) because a new container must be started for every combined partition. This makes full containers less likely. With completely full containers, the row header overhead is reduced to a single row header for the entire container. This reduction is possible because a given column partition value can be located within a container based on its position within the container from the first entry. This elimination of row headers provides a significant space savings benefit when using COLUMN format if the containers contain many values.

If more containers are required, each supporting fewer actual column partition values, the row header compression advantage is lost. In addition, more I/O is required to access the same amount of useful data. A data block may contain a mix of eliminated and noneliminated combined partitions for a query. But to read the noneliminated combined partitions, the entire data block must be read18 and, therefore, eliminated combined partitions in the data block are also being read unnecessarily.

18 The file system does I/O in units of data blocks, not physical rows. A boundary between combined

partitions may occur within a data block but not within a physical row.



Chapter 8: EXPLAIN Terminology

This chapter examines some of the new phraseology present in query EXPLAIN text when accessing a CP table, and offers further explanations about how column-partitioned data is processed by the Teradata Database. Further discussion of the EXPLAIN phraseology and additional examples can be found in Appendix E: EXPLAIN Phrases and Examples.

The first two examples use this table definition, which includes one column for ever letter in the alphabet (26 columns in all). Based on the PARTITION BY COLUMN clause, this table has one column partition for each of the 26 columns plus one for the delete column partition.

CREATE TABLE Table1 ( a INT, b INT, c INT, d INT, e INT, f INT, g INT, h INT, i INT, j INT, k INT, l INT, m INT, n INT, o INT, p INT, q INT, r INT, s INT, t INT, u int, v INT, w INT, x INT, y INT, z INT) PARTITION BY COLUMN;

8.1 Small SELECT from a Column-Partitioned Table

This example looks at the EXPLAIN text for a simple query that accesses four columns with no predicate.

EXPLAIN SELECT a, b, g, p FROM Table1; *** Help information returned. 12 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.Table1. 2) Next, we lock PLS.Table1 for read. 3) We do an all-AMPs RETRIEVE step from 5 column partitions of PLS.Table1 by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (614 bytes). The estimated time for this step is 0.01 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.01 seconds.

Four columns (a, b, g, and p) are being selected in the query above. Yet step 3 of the EXPLAIN text above states that 5 column partitions are accessed. The fifth partition is the delete column partition (see section 6.3), which is included when column partitions are accessed, in order to eliminate any logically deleted rows from the answer set.

Because there are a low number of column partitions involved, all five column partitions can be read at the same time. A container from each column partition is held in AMP memory until all column partitions have been completely processed, and the relevant data has been brought together. Column partition values within different column partitions are recognized as belonging to the same table row based on having the same logical rowid (see section 3.4).

This query is a good use of a CP table since it accesses a small subset of the column partitions.



8.2 Large Selection from a Column-Partitioned Table

The following is an example of EXPLAIN text that shows a table scan accessing 20 columns of Table1 which are to be included in the query’s answer set.

EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r, s, t, u FROM Table1; *** Help information returned. 13 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.Table1. 2) Next, we lock PLS.Table1 for read. 3) We do an all-AMPs RETRIEVE step from 21 column partitions (20 contexts) of PLS.Table1 using covering CP merge Spool 2 (2 subrow partitions and Last Use) by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (614 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

As shown in the EXPLAIN text above, this query reads from 21 column partitions: 20 for the user-defined columns and one for the delete column partition. Notice that there are more column partitions to access in this query compared to the previous one. On this platform, there are only 20 available column partition contexts.19 Consequently, a subset of 20 or less column partitions are processed at a time and then written as subrows to a CP merge spool. This is illustrated in the following Figure 14.

Part #1 Part #2 Part #3 Part #4

Col A Col B Col C Col D

Spool subrows in Spool 2, Partition # 1

Part #5 Part #6 Part #7 Part #8

Col E Col F Col G Col H

Spool subrows inSpool 2, Partition # 2

Uniqueness #1

Uniqueness #2

Uniqueness #3

Uniqueness #4

Uniqueness #5

Uniqueness #6

Uniqueness #7

Final Answer Set in Spool 1

Column Partition Merge

Uniqueness #1

Uniqueness #2

Uniqueness #3

Uniqueness #4

Uniqueness #5

Uniqueness #6

Uniqueness #7

Figure 14: Merging Subsets of Column Partitions

The EXPLAIN text reports how many contexts are available to satisfy this request. That number depends on the hardware configuration as well as how system parameters have been set (see section 11.4 and Appendix H: System Settings).

19 In an actual production system, the number of available column partition contexts is usually higher.



The following terminology is specific to reading a column table and can be seen in the EXPLAIN text above:

covering CP merge. This is a merge of column partitions to produce subrow partitions that are built up by matching uniqueness across as many column partitions for which there are available contexts. These subrow partitions are contained within the CP merge spool. The “CP” indicates that the spool is column partitioned, and “covering” says that all the information required by the query exists in the CP merge spool. There is no additional access of the base table required once the CP merge spool is created.

contexts. The EXPLAIN indicates how many column partition contexts (concurrent reading of column partitions) are available to satisfy this query. Only if the number of contexts is less than the number of column partitions required does the number of contexts appear in the EXPLAIN.

subrow partitions. This indicates the number of subrow column partitions that have to be built to satisfy the query. This is less than or equal to the number of available column partition contexts.

This spool of subrows that is built in the above EXPLAIN is created and then read (Last Use) all in the same step. The purpose of this intra-step spool is to consolidate two subsets of column partition data that have to be built up one at a time in memory. The intermediate answer sets are held temporarily in spool.

Each subrow represents a row in a column partition of the CP merge spool. As such, each subrow has its own row header and its own rowid. This spool subrow rowid has these characteristics:

1. Each spool subrow has the same logical rowid that indicates the logical row representing the table row being consolidated.

2. Each subrow’s rowid has a partition number that represents a numbering among the various different subrow partitions that may be built.

This query is not an ideal use of a CP table since it accesses most of the columns and all the rows of the table.

Having the number of contexts in the EXPLAIN may indicate a non-optimal number of column partitions are being accessed. This may be acceptable if the query only accesses a small subset of the rows. But if such a query is not selective (as in the example above), the query might not be making good use of column partitioning. If these types of queries are typical for this table, consider not using column partitioning or adding appropriate indexes to improve their performance.

8.3 Select All Rows from a Column/Row-Partitioned Table

This third example illustrates a full-table scan with a predicate. Table4 is a CP table that is also row partitioned.

CREATE TABLE Table4 as Table1 WITH NO DATA NO PRIMARY INDEX PARTITION BY (COLUMN, RANGE_N(b BETWEEN 1 AND 1000 EACH 1));

The query accesses all columns in the table, but only where the partitioning column is between 4 and 5. There is no column partition elimination, but there is row partition elimination.

EXPLAIN SELECT * FROM Table4 WHERE b BETWEEN 4 AND 5; *** Help information returned. 14 rows.



*** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.Table4. 2) Next, we lock PLS.Table1 for read. 3) We do an all-AMPs RETRIEVE step from 54 combined partitions (27 column partitions and 20 contexts) of PLS.Table4 using covering CP merge Spool 2 (2 subrow partitions and Last Use) with a condition of ("(PLS.Table4.b <= 5) AND (PLS.Table4.b >= 4)") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (307 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

The 26 columns in the table plus the delete column partition must be read since there is no column partition elimination. Row partition elimination indicates only 2 row partitions need to be read. The text above specifies that 54 combined partitions are read (2 row partitions times 27 column partitions). As in the preceding example, there are not enough contexts for the column partitions and a CP merge spool is used to consolidate column values.

This query is a reasonable use of a CP table. Even though it accesses all the columns of the table and exceeds the number of available column partition contexts, it is selective due to the range constraint in the predicate, so only a relatively few rows need to be reconstructed for the result.



Chapter 9: Guidelines for Use

These guidelines serve as a basic starting point but may not be applicable for all workloads. As experience is gained using column partitioning and innovative approaches emerge, alternate choices may prove more appropriate in some cases.

9.1 General Considerations

1. Column partitioning should only be used where appropriate.

For a given workload, some tables may be suitable for column partitioning and some not.

For example, a large table (such as a fact table or call detail record table) with wide rows used by an analytic workload is a potential candidate for using column partitioning. Teradata provides the flexibility for a query to access both CP and non-CP tables.

Column partitioning is unlikely to be suitable for a table with highly volatile data, with relatively few or narrow rows, or that is used in tactical query workloads.

For a workload that depends heavily on access, joins, and aggregations based on a primary index of a table, column partitioning is unlikely to be suitable for that table since a CP table does not have a primary index.20 However, a NUSI or join index may be able to provide acceptable performance for those queries while column partitioning the base table may provide superior performance for other classes of workloads.

If the table needs to be a temporal table (i.e., the table has VALIDTIME or TRANSACTIONTIME columns), it cannot be column partitioned.

2. Column partitioning is intended to reduce I/O for queries.

This occurs when there is column partition elimination, row partition elimination, and/or selective predicates for a query. Row header compression, autocompression, and user-specified compression can also contribute to reducing I/O.

3. Column partitioning is intended to reduce disk space usage for tables.

This occurs when row header and autocompression are effective – see also point 14 on the next page.

4. Column partitioning is not intended to be a CPU savings feature.

Though there are cases where CPU usage decreases for queries against a CP table, CPU may actually increase for some operations (such as inserts) involving a CP table. Trading I/O for CPU may be beneficial for an I/O-bound system. See Chapter 10: Performance Considerations.

For a CPU-bound system, column partitioning may not provide any benefit and may actually degrade the performance. An exception would be if there is a subset of the workload that is I/O bound (even if overall the system is CPU bound) for which column partitioning could be applicable.

20 See section B.8 for why a primary index is not allowed in Teradata 14.0 for a column-partitioned (CP)

table.



5. A CP table (or row partitions of the CP table) should primarily be periodically refreshed or appended to using large INSERT-SELECTs.

Date or timestamp row partitioning may help to improve CP table maintenance.

Always measure the insert cost for tables that are candidates for column partitioning. Do not use column partitioning when the increased cost of inserting data is not acceptable or is not offset by improved performance in the query workload.

6. For a CP table, updates should be rare and deletes should be for the entire table or for entire row partitions.

7. For all or most queries, the number of column partitions accessed by a query should not exceed the number of available column partition contexts,21 or the query should be very selective.

In the best case, queries are expected to not need more than the number of available column partition contexts and to be very selective.

Ideally, queries in this workload that do not conform to this recommendation should rarely occur, as their performance may be degraded. In order to support acceptable performance for queries that don’t have the desired profile, physical database design options such as secondary and join indexes may be helpful. However, before you add these database structures, make sure you understand any additional maintenance costs that may be incurred.

8. While column partitioning can be defined at any level, it is recommended in most cases to put the column-partitioning level either before any row partitioning or as the second level after DATE/TIMESTAMP row partitioning. See section 7.1 for more information.

9. If row partitioning is specified with the column partitioning for a table, consider specifying an ADD option for the levels that may need to have their number of partitions increased (see section C.1).

10. In general, the defaults when simply specifying PARTITION BY COLUMN are appropriate and should not be overridden without due consideration (see Chapter 11: Tuning Opportunities for discussion of when you might consider overriding a default).

11. Specify NOT NULL for columns that should not be null. Avoid a column being nullable unless there is a sound reason for doing so. A nullable column can decrease the effectiveness of autocompression.

12. Where applicable, specify CHECK constraints for columns.

13. Where applicable, specify PRIMARY KEY, UNIQUE, and referential constraints.

14. Specify MVC for known high-occurrence values and/or ALC for columns where known they provide effective compression. For example, if it is known that a column is limited to a few values, MVC should be specified for those values. Also, use an appropriate ALC for UNICODE columns that contain a majority of non-ASCII characters. See also section 11.1 in regard to whether to turn off autocompression for a column partition if other compression techniques are being used.

21 In an actual production system, the number of available column partition contexts may be as high as 40

or more depending on the memory configured, system settings, and, for read partition contexts, the maximum multirow data block size for the table (see section 11.4 and section 11.5).



15. Collect statistics on the columns and indexes following the same guidelines as for other tables.

16. Always collect statistics on the system-derived column PARTITION.

17. Last but not least, any physical database design should be tested and proven (preferably first on a test system with a valid sample of the data and then on the production system with the full data before releasing into the production environment). This includes testing query, workload, load, maintenance, and archive/restore/copy performance. Also, check the space usage of the tables.

9.2 CP and NoPI Common Considerations

Many considerations for a NoPI table without column partitioning apply to a CP table since it is also a NoPI table:

1. Fallback, unique secondary indexes, nonunique secondary indexes, join indexes and reference indexes are all allowed.

2. Large Objects (LOBs) are allowed. However, there is a limit of 268435455 rows per rowkey (internal partition number and hash value22) per AMP with LOBs. With a CP table, there is normally one hash value23 on each AMP so the limit is 268435455 column partition values per combined partition per AMP with LOBs.

3. Primary key and foreign key constraints are allowed.

4. A CP table is created as a MULTISET table when neither MULTISET nor SET is specified and a CP table cannot be specified to be a SET table. Since a CP table is a MULTISET table, checks for duplicate rows are not done.

5. A CP table does not have a primary index and a primary index must not be specified. Therefore, if the typical queries of workload against the table would benefit from doing accesses, joins, and aggregations using a primary index, having a CP table instead of a PI table may not be a good choice for that workload or an alternate approach (such as using secondary or join indexes) may be needed to achieve performance goals with a CP table.

This also implies queries against a CP table are all-AMP operations unless access is via a unique secondary index or via a join index that can provide single-AMP or few-AMP access.

6. The upsert form of the UPDATE statement, the MERGE statement, and MultiLoad, since they are based on the target table having a primary index, are not allowed if the target table is a CP table.

7. Permanent journaling is not allowed.

8. The table may be skewed across the AMPs after certain operations. See section F.4.

22 For a CP or NoPI table, the hash value (even though not actually a computed hash value) is the 32 bits

of the rowid that correspond to the row hash for PI table. See section 3.4. 23 For the hash value to change, there would need to be over 4 billion column partition values on the AMP

for a combined partition.



9.3 Differences between CP and NoPI Tables

Unlike a NoPI table without column partitioning, a CP table

1. Cannot be created as a global temporary or volatile table

2. Can have an identity column

3. Does not require explicit specification of NO PRIMARY INDEX

For a CP table, the system default behavior for a CREATE TABLE without specifying the primary index is NO PRIMARY INDEX (the setting of the PrimaryIndexDefault General field in DBS Control does not affect this behavior).

9.4 Space Usage Considerations

As described in Chapter 6: Loading and Maintenance Operations, inserts to CP tables act more like appends at the ends of the column partitions (or at the ends of combined partitions if the table is also row partitioned), and are therefore less likely to cause block splits so the average data block size can be expected to remain high.

In some cases, the total space taken by a CP table could end up larger than its equivalent PI table. A container can only hold column partition values of rows that have the same internal partition number and hash bucket value. If only a few values are appended into a given container, more containers are needed to store the data (with each container having a rowid). In addition, column partitioning takes place on each AMP individually. Partitioning a medium or small table by column may be inefficient in terms of space usage for systems with large numbers of AMPs and low numbers of logical rows per AMP. See section 7.2.

You may also see an increase in space required by the table if the column partitions are wide such that only a few column partition values can fit into one container. The solution in that case (where the system determined to use COLUMN format, which is unlikely, or COLUMN format was specified explicitly) is to use ROW format for the physical rows of that column partition. By default, the system determines the format for a column partition; see Chapter 11: Tuning Opportunities for a discussion of when you might consider overriding the default.



Chapter 10: Performance Considerations

Column partitioning has the following positive and negative performance impacts which are important to be understood before deciding to use column partitioning for a table.

1. A significant I/O performance improvement for queries that access a variable but small subset of the columns (either in predicates and/or projected) and rows of a CP table or join index

For example, if 10% of the data in rows is needed for a query, the I/O for a CP table should be approximately 10% of the I/O for the table without column partitioning. However, be aware that some additional I/O may occur to put rows back together if many columns are projected or used in predicates. This would be the case if the number of accessed column partition exceeds the number of available contexts (see discussion later in this section).

2. A reduction in I/O depending on the effectiveness of autocompression

3. A potential negative performance impact on queries that access more than a small subset of the columns and/or rows of a CP table or join index

4. There may be a reduction in CPU

However, CPU may increase to process the column partitions, containers, autocompression, and decompression.

For example, if selecting more that 10% or 15% of the columns of a CP table, the savings in I/O may be offset by the increase in CPU.

Inserting into a column-partitioned (CP) table increases CPU as the number of column partitions increases and due to applying autocompression.

With a reduction in I/O and a possible increase in CPU, workloads can change from being I/O bound to CPU bound. Performance of CPU-bound workloads may possibly be worse with column partitioning.

5. A potential negative performance impact on queries where the query plan takes advantage of a primary index of a table and the table is changed to be column partitioned

In particular, single-AMP queries become all-AMP queries, alternative join plans or additional pre-join preparation may be needed to perform joins, etc. Section 5.4 discusses more considerations for joins.

To offset this impact, secondary and join indexes may be needed on the CP table (for instance, to support tactical queries). Or it may be better to leave the table as a PI table and consider using a CP join index on the table that can be used for queries where column partitioning provides an advantage. However, secondary and join indexes require additional space and have maintenance costs that have to be considered.

6. A potential negative performance impact on inserts for a CP table or join index

Single-row inserts can be particularly costly. This is less so for bulk (block-at-a-time) inserts such as array INSERT and INSERT-SELECT.

For an INSERT-SELECT, the CPU cost increases as the number of column partitions increases since there is a cost to split a row into multiple column partitions. Also, if other



than a source nonpartitioned NoPI table for a target column and row partitioned table, the source is scanned CEILING(number of user-specified column partitions / available column partition contexts) times; this may be less of a factor to consider than the increase in CPU as column partitions are increased. If the source is a nonpartitioned NoPI table for a target column and row partitioned table, the source rows are buffered; if this buffer is not large enough, there can be degradation in performance due to re-reading and re-writing of data blocks to append values. See Appendix D: Loading a Column-Partitioned Table for more information how best to load a CP table.

It is recommended that tradeoffs be fully understood when considering the number of column partitions for a table. For inserts, fewer column partitions may be better for CPU usage but having columns in individual partitions may be better for space usage and decreasing I/O, so an appropriate balance must be determined. A good candidate for column partitioning is a table where the workload is heavily query oriented and benefits from the column partitioning making the increased CPU cost to load the data a good tradeoff.

Compressing values (for example, using multivalue compression) and autocompression can have a negative impact on CPU for insertion of rows, similar to the impact on a regular table that has compression. Since autocompression is applied by default to every column partition, this can cause a significant CPU increase compared to multivalue compression that is applied selectively to columns. However, compression can reduce the space usage and decrease the I/O needed for the insert and for subsequent queries. So this tradeoff of CPU and I/O must be considered. See also section 11.1 in regard to whether to turn off autocompression for a column partition.

Note that FastLoad and MultiLoad are not supported for a CP table. However, FastLoad can be used to load into a staging table and then an INSERT-SELECT from the staging into the CP table can be submitted. TPump is supported (see section D.2).

7. A potentially negative performance impact for updates that select a large subset of rows that are updated

Note that an update is done as a delete (by marking the corresponding bit in the delete column partitioned) followed by an insert (see section 6.4) and, therefore, all the columns of rows selected for update must be accessed, reconstructed, updated, and re-inserted (appended) to the table. This may be acceptable if a few rows are updated. But, if many rows are updated, this can lead to a large increase in space usage (since space for logically deleted rows is not reclaimed) and a large amount of CPU and I/O. The cost of performing updates that involve many rows when the table is column-partitioned might not be acceptable and needs to be avoided. For these kinds of updates, doing an insert-select into a copy of the table may be a better alternative.

8. A relative increase in spool size compared to the size of a source CP table

When data from an autocompressed CP table is spooled, the autocompression is not carried over to the spool (the spool is row-oriented) which may lead to a large spool relative to the compressed data in the CP table. User-specified compression is carried over to the spool; therefore, if spool usage (in space and I/O) becomes an issue, applying user-specified compression to the CP table may be beneficial.

9. A potential for an over-partitioned table

Row partitioning may cause over-partitioning (see section 7.2) such that only a few values with the same combined partition number occur and, therefore, only a few values



go into each of the containers reducing the effectiveness of row header compression. In extreme cases of over-partitioning, a CP table may be up to 22 times larger than a non-CP table.

The performance impact of using a CP table is likely to range from one or more orders of magnitude of improved performance to one to more orders of magnitude of degraded performance depending on the workload. Obviously, column partitioning should not be used when performance is degraded so severely and which cannot be offset by complementing with additional physical database design choices such as join indexes. The most improvement occurs when there is a highly selective predicate on a column in a single-column partition of a table with 100’s or 1000’s of columns and only a few columns are projected. The worst case is when these conditions are all in place:

Most or all columns are projected

The query is not very selective

There are 1000’s of column partitions

There are only the minimum number (8) column partition contexts available (see section 11.4)

The table is row partitioned such that there are very few physical rows in nonempty combined partitions, and the physical rows contain one or very few column partition values



Chapter 11: Tuning Opportunities

This following are some of the tuning options to consider for a CP table:

1. Adding NOT NULL, CHECK, and UNIQUE constraints

These can improve the plans generated by the optimizer and NOT NULL can improve the effectiveness of autocompression. For example, if a column is not supposed to be null, specify NOT NULL to override the default of nullable.

2. Specifying multivalue compression (MVC) and algorithmic compression (ALC)

These can complement autocompression or provide compression if autocompression is not effective for a column partition (such as when a column partition has ROW format or is a multicolumn partition) or is too expensive – see section 11.1. For example,

Apply multivalue compression for known common values of a column and let autocompression find additional common values.

Apply an appropriate ALC for UNICODE columns that contain a majority of non-ASCII characters.

3. Specifying block level compression (BLC) or temperature based BLC for a CP table

The same tradeoff considerations apply as with a non-CP table. For a CP table, consider whether the space savings of BLC or temperature based BLC offset the additional overhead of compressing and decompressing data blocks in order to access or update24 the data blocks. With temperature based BLC, this overhead may be minimal since the BLC would only be applied to data that is infrequently accessed.

4. Tuning data loading

See Appendix D: Loading a Column-Partitioned Table.

5. Adding join indexes or secondary indexes

These can be alternative access paths which can potentially improve the performance of queries that would have benefited from the table having a primary index. In some cases, it may be better to keep the table with a primary index and have a join index (possibly sparse) that is column partitioned (or vice versa).

6. Modifying the defaults for PARTITION BY COLUMN

In general, these defaults are appropriate and should not be overridden without due consideration. Overriding these defaults should be done only as exception and to resolve a particular performance or space usage issue.

a) Specifying NO AUTO COMPRESS for a column partition – see section 11.1

b) Explicitly specifying COLUMN or ROW format for a column partition – see section 11.2

c) Grouping of columns – see section 11.3

d) Modifying DBS Control fields or Cost Profile constants – see section 11.4, section 11.5, and section 11.6

24 Typically for a CP table, updating a data block would not occur since data is inserted by appending to

the ends of combined partitions.



11.1 Autocompression On or Off

In order to make a decision to turn off autocompression,25 the administrator has to perform analysis on each column partition to consider if the column partition benefits from autocompression. Accepting the default lets the system do this analysis automatically on a container by container basis; the system autocompresses a container only if a space savings is going to result for the container.

On the other hand, there is overhead when the system goes through the effort to determine whether or not there are autocompression techniques that effectively compress a container. If multiple techniques are applicable, further analysis is made about which autocompression technique or combinations of techniques are the most effective for the container.

Note that autocompression is most effective for single-column partitions with COLUMN format, less so for multicolumn partitions (especially as the number of columns increases) with COLUMN format. When considering to group columns (see section 11.3), this tradeoff must also be considered.

If the autocompression does not effectively compress a column partition with COLUMN format, the analysis overhead done by the system can be eliminated by specifying the NO AUTO COMPRESS option for the column partition. Note that it is unnecessary to specify NO AUTO COMPRESS for a column partition with ROW format (currently, autocompression is not applied in this case so there is no additional overhead whether or not NO AUTO COMPRESS is specified). Note that if other compression techniques are being applied (such MVC, ALC, or BLC), they may provide adequate compression such that the extra overhead of autocompression is not worth the benefit of any additional compression obtained with autocompression; in this case, consider setting NO AUTO COMPRESS for one or more of the column partitions (in particular, consider NO AUTO COMPRESS for the column partitions that already have MVC or ALC or, if BLC is being used, all the column partitions).

If NO AUTO COMPRESS is specified, user-specified MVC or ALC compression, if present, is still applied and row header compression26 for column partitions with COLUMN format is still applied. Note that NO AUTO COMPRESS is specified a column partition at a time. Examples are shown in Appendix C: DDL Details.

To check the effectiveness of autocompression and row header compression for a column partition, one of the following can be done:

1. Collect statistics on PARTITION for the table. Then examine the collected statistics (using SHOW STATISTICS VALUES or a dictionary view on the statistics) to see the estimated compression ratio for each of the column partitions. This is fairly accurate for fixed-width column partitions but may be less accurate for variable-width column partitions. This is because the compression ratio is computed as

(compressed size of the column partition)

(estimate of the column partition width) * (number of logical rows)

The estimate for a variable width column assumes the average width is the maximum column size / 3 + 2 which may significantly under or overestimate the average.

25 See Chapter 4: Autocompression for a discussion of autocompression. 26 Note that row header compression is different than autocompression. Row header compression is only

applied to a column partition that has COLUMN format. If it is determined to be necessary to disable row header compression for a column partition, this can be done by specifying ROW format for that column partition (see section 11.2 and section 3.1).



2. Create three CP tables each with a single column partition defined the same as the column partition of interest in the original table (including any user-specified MVC and ALC) except use COLUMN format with autocompression for the 1st table, COLUMN format with NO AUTO COMPRESS for the 2nd table, and ROW format for the 3rd table. Then INSERT-SELECT into each of these tables from the original table and check the space usage of each table. If both the 1st and 2nd tables are not significantly smaller than the 3rd table, using ROW format for the column partition may be a better choice. Otherwise, if the 1st table is not significantly smaller than the 2nd, using NO AUTO COMPRESS for the column partition may be a better choice.

11.2 COLUMN vs. ROW Format

As discussed in section 3.1.1, a column partition of a CP table can use one of two different formats to store column partition values:

COLUMN. Implemented as physical rows that are referred to as containers.

ROW. Implemented as physical rows that are referred to as subrows.

By default, the system determines the format it considers to be best for a column partition. Most often it is expected that the system correctly determines the most appropriate format. When the COLUMN or ROW format is system-determined, the system bases it choice on the size of a column partition value for the column partition and other factors such as whether a column partition value for the column partition has fixed or variable length and whether the column partition is a single-column or multicolumn partition. Generally, a narrow column partition is determined to have COLUMN format and a wide column partition is determined to have ROW format. The HELP COLUMN statement can be used or a data dictionary view queried to find out what format was chosen for a column partition (see Appendix G: Partitioning Meta Data).

If the system chooses COLUMN format when ROW format is more appropriate, the impact in most cases is minimal. However, if the system chooses ROW format when COLUMN format is more appropriate, the impact could be significant. For example, the system may choose ROW format for a multicolumn partition having a column with a VARCHAR, VARGRAPHIC, or VARBYTE data type defined with a large maximum length since it estimates that column partition values are wide; but if the column partition values are actually very narrow in most cases, COLUMN format may be more appropriate.

See section 11.1 for methods to determine whether COLUMN or ROW format is more appropriate for a column partition if there is concern about the format picked by the system.

If the system-determined format is ROW for a column partition but you determine COLUMN format is more appropriate (for example, benefits from row header compression and autocompression), specify COLUMN explicitly (see Appendix C: DDL Details).

If the system-determined format is COLUMN for a column partition but you determine ROW format is more appropriate (for example, provides faster access to a column partition value) specify ROW explicitly (see Appendix C: DDL Details).

COLUMN and ROW format are discussed further in the following two sections.

11.2.1 COLUMN Format

The COLUMN format packs column partition values into a physical row, referred to as a container, up to a system-determined limit. Whether or not to change a column partition to use ROW format depends on the whether the benefit of row header compression and autocompression can be realized or not.



A row header consists of either 14 or 20 bytes. If 2-byte partitioning is adequate for the table, the row header consists of 14 bytes but, if 8-byte partitioning is required because of the number of combined partitions in the table, 20 bytes are needed for the row header. The row header consists of a length, rowid, flag byte, and 1st presence byte. When the row header is for a container, this first presence byte carries information about autocompression, such as the value-list dictionary for the container, or other details.

A row header occurs once for a container, with the rowid of the first column partition value becoming the rowid of the container itself. In a CP table, each column partition value is assigned its own rowid, but in a container these rowids are implicit except for the first one. The subsequent rowids can be determined from the position of a column partition value relative to the first column partition value.

If many column partition values can be packed into a container, this form of compression (referred to as row header compression) can reduce the space needed for a CP table compared to the table without column partitioning. If only a few column partition values (because they are wide) can be placed in a container, there can be an increase in the space needed for the table compared to the table without column partitioning. In this case, ROW format may be more appropriate to avoid this increase and to obtain the advantages of ROW format (see section 11.2.2).

For COLUMN format, the column partition value must be located within the container. Depending on the autocompression types used, if any, for the container, this may be as simple as indexing into the container or require a sequential access through bits indicating how a value is compressed and/or sequential access through the column partition values to locate the specific column partition value that is to be accessed

Autocompression has some space overhead in the header. If the container has autocompression, two bytes are used as an offset to compression bits, one or more bytes indicate the autocompression types and their arguments, if any, for the container, one or more bytes (depending on the number of column partition values) of autocompression bits, zero or more bytes are used for a local value-list dictionary (depending the autocompression type), and zero or more bytes are used for present column partition values.

If the container does not have autocompression (either because NO AUTO COMPRESS is specified for the column partition or no autocompression types are applicable for the column partition values of this container), zero or more bytes are used for present (non-null) column partition values.

A container requires a larger row header to identify the autocompression used for the container. This takes space from the column partition values carried in the row, and adds more overhead; however, this overhead is negligible if many column partition values can be placed in the container. In addition, analysis of row header bits and sequential access into the container is usually required to find a column partition value for a specific uniqueness. The advantage of a container is that it provides the potential for row header compression and autocompression. Row header compression used for COLUMN format is especially effective compared to using ROW format when the column partition values are narrow.

11.2.2 ROW Format

A physical row of a column partition with ROW format is referred to as a subrow. A subrow has a format that is the same as a regular row (except it only has the values of a subset of the columns). Each column partition value is in its own subrow with a row header. Subrows are not subject to autocompression but may be in the future.



If ROW format is used for a column partition of a CP table with many narrow column partition values, a large increase in the size can occur for a CP table compared to the table without column partitioning. It is not recommended to use ROW format for narrow column partitions since each column partition value has a row header (14 or 20 bytes) as compared to the table without column partitioning. A non-CP table only has a row header (12, 14, or 20 bytes) for each table row which is represented by one physical row that has regular row format.

For example, consider a table with 100 columns where each column is BYTEINT. The non-CP table would have 112 bytes per table row. For the CP table with every column in its own partition and the column partitions are explicitly specified to have ROW format, it would require 1600 bytes per logical row (16 bytes per value due to a row header per value and rounding up to an even number of bytes); that is, the CP table would be over 14 times bigger than the non-CP table. If both tables are also row partitioned with 8-byte partitioning, the non-CP table would have 120 bytes per table row and the CP table would have 2200 bytes per table row; that is, the CP table would be over 18 times bigger than the non-CP table.

ROW format is applicable for wide column partitions where, if COLUMN format was used, one or only a few column partition values would fit in a container and there is little benefit (or a negative impact) from autocompression or row header compression as described in section 11.2.1. ROW format provides quicker and more direct access to a specific column partition value than COLUMN format.

11.3 Grouping Columns into a Column Partition

When a table is defined with column partitioning, by default each column becomes its own column partition. For columns that are often referenced in queries but where the specific set of columns referenced varies from query to query, having those columns in single-column partitions is probably the appropriate choice. Also, autocompression is usually more effective for single-column partitions (see section 11.1).

However, the administrator may group columns, using parentheses as shown in section C.1, so that more than one column resides in a column partition. This has the result of fewer column partitions with more data held within each column partition. Grouping columns into fewer column partitions may be appropriate in these situations:

1. When the table has a large number of columns

Grouping columns to reduce the number of column partitions may need to be considered to improve INSERT-SELECT performance to an acceptable level. See Chapter 10: Performance Considerations.

2. When access to the table often involves a large percentage of the columns and the access is not very selective

3. When a common subset of columns are frequently accessed together

4. When a subset of columns are infrequently accessed (cold data)

5. When a multicolumn NUSI is created on a group of columns

6. When there are too few available column partition contexts to access all the needed column partitions for queries (see section 11.4)

Autocompression may be less effective if columns are grouped together instead of being in their own column partitions; this tradeoff must be considered when deciding whether to group columns or not. If autocompression is not effective for a column partition, user-specified MVC or ALC for the columns, or the use of BLC or temperature based BLC, may be an effective



alternative. In this case, specify NO AUTO COMPRESS for the column partition if it has COLUMN format to avoid the overhead of checking containers for autocompression opportunities (see section 11.1). See also Chapter 9: Guidelines for Use.

11.4 PPICacheThrP

The default setting of 1% is recommended for PPICacheThrP.27 Only in exceptional cases should PPICacheThrP be changed.

PPICacheThrP specifies the percentage of an AMP’s FSG Cache memory or an AMP’s allocatable memory that are made available for multicontext operations. Multicontext operations are operations such as joins and aggregations on PPI tables and join indexes (for more information on PPICacheThrP for a PPI table, see the section on PPICacheThrP in the DBS Control chapter of the Teradata Utilities Manual). Multicontext operations are also used when accessing a CP table to consolidate columns from multiple column partitions and when performing an INSERT-SELECT into a CP table.

For a CP table, the number of available contexts (referred to as column partition contexts when being used for a CP table) is the size of the available memory divided by the amount of memory needed to process a column partition. However, there are a minimum and a maximum number of available contexts no matter how PPICacheThrP is set. The minimum is 8 and the maximum is 256. This means that if the number of available contexts determined using PPICacheThrP is less than 8, 8 contexts are available for the request; if the number of contexts determined using PPICacheThrP is more than 256, only 256 contexts are made available. With the default of 1%, 30 to 80 available column partition contexts are expected to be available for a properly configured system.

Each column partition context is used to read a column partition or to write to a column partition. The number of available column partition contexts defines how many column partitions (sometimes this includes the delete column partition) can be processed at the same time. A file context for a data block in FSG cache is associated with each column partition context for read operations; for write cases, a buffer is allocated from the AMP’s allocatable memory and associated with each column partition context. Changing the maximum multirow data block size can affect the number of read partitions contexts (see section 11.5). For instance, changing the block size from 127.5KB to 64KB can double the number of read partitions contexts. Note changing the block size has no impact on the number of write column partition contexts.

Ideally, the number of column partition contexts should be at least equal to the number of column partitions (sometimes this includes the delete column partition) accessed by a query or in a target CP table for an INSERT-SELECT; otherwise, performance can degrade since not all the needed column partitions can be processed at one time.

An EXPLAIN of a query or INSERT-SELECT indicates the number of available column partition contexts when it is less than the number of column partitions that need to be processed (see Chapter 8: EXPLAIN Terminology and Appendix E: EXPLAIN Phrases and Examples). Having the number of contexts in the EXPLAIN may indicate a non-optimal number of column partitions need to be processed. For a query, this may be acceptable if the query only accesses a small subset of the rows. But if such a query is not selective, the query might not be making good use of column partitioning. For an INSERT-SELECT, this may be acceptable if the re-reading of the source for each set of column partitions in the target CP table that can be processed at the same time has an acceptable impact on the performance of the INSERT-SELECT.

27 Although PPICacheThrP was initially intended for use with PPI tables, which accounts for its name

beginning with “PPI,” it applies to both row-partitioned (RP) and CP tables.



If the number of contexts is being included frequently in EXPLAINs, you may need to evaluate whether this is causing an unacceptable impact on performance. If it is, you may need to do one or more of the following:

1. Decrease the maximum multirow data block size for the table (see above and section 11.5).

2. Reduce the number of column partitions for the CP table.

3. Modify queries to access fewer column partitions or make sure the queries are highly selective on a predicate that references one or a few column partitions.

4. Add a join index that is not column partitioned.

5. Increase the value of PPICacheThrP such that the number of available column partition contexts is increased. However, you need to monitor the performance and memory usage to make sure it is not set too high. If PPICacheThrP is set too high, performance and memory usage can be impacted – this may lead to memory thrashing or a system crash. The default of 1% is conservative, and intended to avoid such memory problems but still be applicable for most workloads.

6. Remove column partitioning from the table and add a CP join index.

7. Reconsider the applicability of using column partitioning for the table.

If memory trashing or a system crash occurs due to running out memory, PPICacheThrP may need to be decreased to resolve the problem. See Appendix H: System Settings for information on setting PPICacheThrP.

11.5 DATABLOCKSIZE/PermDBSize

For the table-level option DATABLOCKSIZE or DBS Control File System field PermDBSize, it is recommended to use the maximum multirow data block size (127.5KB) for a CP table unless performance analysis indicates otherwise. Changing the maximum multirow data block size can affect the number of read partitions contexts (see section 11.4). For instance, changing the block size from 127.5KB to 64KB can double the number of read partitions contexts. Note changing the block size has no impact on the number of write column partition contexts.

11.6 FREESPACE/FreeSpacePercent

The table-level option FREESPACE or DBS Control File System field FreeSpacePercent specifies the amount of space on each cylinder that is to be left unused during load operations. The reserved free space allows table data to expand on current table cylinders, preventing or delaying the need for additional table cylinders to be allocated, therefore preventing or delaying data migration operations associated with new cylinder allocations. Keeping new table data physically close to existing table data, and avoiding data migrations, can improve overall system performance.

For a CP table or join index with small combined partitions (which might be the case if it is also row partitioned), it may need to have some free space allocated if data is added incrementally. However, as noted in section 7.2, this is not an ideal use case for a CP table or join index.

If a large INSERT-SELECT is done and combined partitions are large (or empty), little or no free space is needed.



Chapter 12: Final Thoughts

Teradata Columnar can be beneficial but must be used for appropriate workloads – see Chapter 9: Guidelines for Use and Chapter 10: Performance Considerations. See Chapter 11: Tuning Opportunities in order to take the best advantage of Teradata Columnar.

Key points to remember about column partition tables that were discussed in the previous chapters include the following:

1. Column partitioning offers a particular performance benefit for a very particular selection of database tables. Column partitioning should not be applied indiscriminately.

2. While possible, it is unlikely that all aspects of an application will improve with the introduction of column partitioning on an existing table. Expect a mix of performance benefits and compromises across queries and maintenance activities involved with a CP table.

3. Planning and tradeoff analysis is strongly recommended when implementing column partitioning. Consider this an initial checklist of questions to ask yourself as you approach column partitioning implementation:

a) Overall, is the table a suitable candidate for column partitioning?

b) Will there be a benefit in including row partitioning along with column partitioning?

c) Will the estimated size of the combined partitions be appropriate?

d) Can data loading be accomplished solely using insert-selects?

e) Is there any apparent skew coming from the load source that needs to be removed by a redistribution of the rows before insertion into the CP table?

f) Will lack of primary index access need to be ameliorated?

g) What is the extent of updates or nonpartitioned deletes against the table?

h) Will other forms of compression supplement autocompression?

i) Is it necessary to combine some columns into multicolumn partitions?

4. Verify your physical database design choices since you may not always be able to anticipate all the impacts (both positive and negative) of using a CP table. Be prepared to adjust and tune your physical database design as needed – in some cases, it may be determined that a table should not be column partitioned.

For an existing workload, adding a CP join index to a table rather than making the table column partitioned itself may be a good approach to obtain the benefits of column partitioning while minimizing the negative impacts.

The following appendixes are provided as a reference for comparative performance test results, frequently asked questions, data definition language (DDL) statements for column partitioning, loading a CP table, EXPLAIN phrases and examples, some miscellaneous topics, how to obtain information about CP tables and join indexes, and system settings. The document ends with a glossary of terms.



Appendix A: Comparative Performance Tests

In order to observe the impact of column partitioning, several SQL requests were executed against an Orders table, Items table, or both. Metrics are reported for each query using five different variations of the table or tables being accessed:

PI: Traditional primary index (no partitioning)

PPI: Partitioned primary index table with 84 monthly partitions

CP: Column partitioned, with each column as its own partition

CR: Multilevel partitioning with column partitioning first, then row partitioning (by month)

RC: Multilevel partitioning with row partitioning (monthly) first, then column partitioning

I/O counts, AMP CPU seconds, and elapsed times are captured for each query using DBQL.

Please note: All queries described in this appendix were run stand-alone using beta software on a 6650 Enterprise Data Warehouse platform where all resources were fully available for just the one query. Because of this, comparative increases in CPU usage for the column-partitioned examples did not always result in equivalent increases in elapsed time since idle CPU was available. Similar queries run in a production environment with a mixed workload or a different platform may exhibit different elapsed time characteristics. Other differences in the metrics may also occur due to corrections to the beta software.

A.1 Size Comparisons

The first useful observation is the reduction in perm size for the column-partitioned variations of the Items table as shown in the following chart.

Note that the CP, CR, and RC tables are close to 1/3rd the size of PI and PPI tables. This is a result of autocompression that has effectively reduced the space required to store the rows. Further compression may be possible if combined with user-specified compression and block level compression.

The slight additional reduction in size for the CR and RC tables is due to the fact that when row partitioning is present the rows are sorted and stored by the partitioning expression (based on a date column in this case). When the table has been sorted by a specific column, such as date,



that column can greatly benefit from run-length compression which suppresses like values that appear one after the other within the column partition’s container.

Note that the PPI table is somewhat larger than the PI table. This is due to the PPI table’s row headers being 2 bytes longer to account for the internal partition number.

The following chart below contrasts the sizes for the variations of the Orders table.

Note that this is similar to the Items table except that autocompression is not as effective for the Orders table. The observation here is that the effectiveness of autocompression may vary.

A.2 Full-Table Scan Comparison

This first query comparison is a simple select with no predicates accessing all 16 columns of the Items table using a full-table scan. A BTEQ RETLIMIT command is used to reduce the number of rows returned to the client in the answer set.

The elapsed time is essentially the same for the PI and the PPI tables. However, for the three column-partitioned tables, the elapsed time is about 50% longer since additional CPU is used to bring together all the columns and reconstruct each row in the answer set.

562 558

812856 844

0

200

400

600

800

1000

PI PPI CP CR RC

Elapsed Tim

e

Full‐Table Scan



The way column partitions are formatted produces some savings in I/O for the CP, CR, and RC tables. Column-partitioned data can be stored more compactly and, as a result, the query requires less I/O when scanning. The detailed metrics associated with each variation of the table are as follows:

Table Type I/O Count AMP CPU Time Elapsed Time PI 32,771,499 26,417 562

PPI 32,957,752 26,476 558 CP 24,205,741 46,736 812 CR 24,082,499 46,574 856 RC 24,494,340 45,718 844

Notice that the CPU has increased to a greater degree (78%) than the I/O has been reduced (26%) for the CP, CR, and RC tables; this contributes to the longer elapsed times for those three tables.

As expected, this query is not particularly suited to column partitioning (especially on a CPU-bound system) since there is no column partition elimination and all rows are selected. Even though this is a worst case type of query for column partitioning, the performance may be reasonable if this type of query is rarely expected to occur. On an I/O-bound system, the decreased I/O may actually be beneficial and reduce the elapsed time.

A.3 Simple Aggregation Comparison

In this comparison, a simple aggregation on the Items table is performed using 2 GROUP BY columns (which only produce four distinct values) and an average of a quantity column. Only 3 columns are accessed by the first query variation (labeled 3 of 16 columns). In the second and third query variations (labeled 9 of 16 columns and 15 of 16 columns, respectively), the query references an increasing number of columns in the table.

As expected for the PI and PPI tables, the elapsed time is about the same no matter how many columns are accessed since all columns are read even if very few are actually used. However, CP, CR, and RC tables show a predictable pattern of longer elapsed times as more columns have to be accessed. This increase represents the overhead in CPU to assemble all the referenced columns into a single output row. The detailed metrics for each query variation are provided on the next page.



3 of 16 columns:


PPI 13,764,320 2,629 113 CP 173,833 5,941 86 CR 180,117 5,900 86 RC 250,996 5,953 86

Note the significantly lower I/O but higher CPU for the CP, CR, and RC tables compared to the PI and PPI tables.

9 of 16 columns:


PPI 13,764,320 2,827 113 CP 1,388,846 10,679 156 CR 1,632,828 10,560 156 RC 1,858,108 10,143 161

Note the expected increase in both I/O and CPU for the CP, CR, and RC tables as the number of columns accessed increases. The elapsed time is now higher for the CP, CR, and RC tables compared to the PI and PPI tables due to the higher CPU even though the I/O is still much less.

15 of 16 columns: Below are the metrics for this case. Note that this continues the trend discussed in the previous case.


PPI 13,764,320 5,116 116 CP 2,697,405 18,491 266 CR 2,541,000 18,458 270 RC 2,924,652 18,465 269

A.4 Rollup Query Comparison

This comparison uses a complex rollup query against the Items table. The query includes 12 different functions in the SELECT list, including KURTOSIS, SKEW, and ABS among others. In addition, it contains 13 different predicate conditions that reduce the rows of interest to 2% of the table. This same basic query is executed with 4 different variations:

Few columns, all rows: Only 4 columns are referenced, all rows qualify

Few columns, 1 month: Same as above but only one month out of 7 years of data is requested

Many columns, all rows: 14 out of 16 columns are accessed, all rows qualify

Many columns, 1 month: Same as above but only one month of out of 7 years of data is requested

To keep it simple, the chart on the next page focuses on just the PI and CP variations of the Items table, showing the elapsed times across 4 different query variations. As can be seen, this type of query is good for a column-partitioned table across the board. Following the chart, the detailed metrics for each query variation are discussed.



Few columns, all rows:


PPI 13,764,254 1,153 113 CP 1,371,560 2,890 42 CR 754,132 2,858 42 RC 876,867 2,775 42

With only four columns accessed, it is understandable why the elapsed times for the CP, CR, and RC tables is better than for the PI and PPI tables. For the CP, CR, and CR tables, only the column partitions for those four columns are accessed. The PI and PPI tables incur the overhead of reading all the columns even if they are not actually used in the query.

Few columns, 1 month:

Table Type I/O Count AMP CPU Time Elapsed Time PI 13,578,001 945 111

PPI 172,663 21 2 CP 1,090,431 1,963 30 CR 10,006 56 1 RC 10,552 57 1

The query against the CR and RC tables runs extremely fast – 1 second. As in the case for the PPI table, the CR and RC tables are able to benefit from row partition elimination by month. Also, the CR and RC tables benefit from run-length compression for the date column containers. Run-length compression for a table partitioned by date reduces the size of that column partition to practically nothing since all values for the same date end up being represented in a single data block. (With run-length compression, each value following the first appearance of a value that is identical to this first value is removed from the container completely.) For that reason, the elapsed time for the CR and RC tables is shorter than when all rows are accessed as in the previous case (1 second vs. 42 seconds).

The CP table does well also. The CP elapsed time is shorter than the PI time (30 seconds vs. 111 seconds) since, even though the CP table is not partitioned by date, only that single date column partition (the only predicate) is scanned. Once the appropriate date values are found, values from the three other columns are directly accessed within their containers without further scanning.

However, the reduction in elapsed time and I/O comes with increased CPU for the CP table compared to the PI table, and for the CR and RC tables compared to the PPI table.



Many columns, all rows:


PPI 13,764,254 1,438 113 CP 4,089,172 5,143 75 CR 3,434,071 5,045 75 RC 3,783,586 4,959 74

Notice how the I/O for the CR and RC tables have gone up more than 4 times compared to the few columns, all rows equivalent shown earlier in this section. This is due to the additional effort to access the greater number of columns. CPU is almost two times greater using the same comparison. That increased CPU is the overhead of bringing together the larger number of column values.

Of interest is that the elapsed times for the CP, CR, and RC tables are still well below that of the PI and PPI tables even though 14 of the 16 columns are being accessed. One reason is that the CP, CR, and RC tables consume 1/3 of the I/O compared to the PI and PPI tables. The I/O is lower due to the impact of autocompression in reducing the space required to hold the data. So while there is some additional CPU overhead, it is not enough to cause elapsed times for the CP, CR, and RC tables to exceed the elapsed times of the PI and PPI tables.

As for all these cases, the query is running stand-alone so it can make use of unused CPU on the platform. Having CPU available softens the overhead of re-assembling columns. In a mixed workload environment, the elapsed time for this type of query may increase if there is less available CPU.

Note that the Items table has only 16 columns which could be regarded as a low number of columns. It is not uncommon for Teradata sites to have tables containing 50, 100, or even more columns. You should expect the CPU overhead involved in bringing together all the required columns to be greater when tables have a larger number of columns and more of those columns are accessed by a query. On the other hand, queries often access only a small subset of the columns of a table and a PI or PPI table would have to read all the columns while column-partitioned table would only read the columns needed by the query.

Many columns, 1 month:


PPI 178,355 26 2 CP 3,256,328 2,114 32 CR 47,492 88 2 RC 48,472 88 2

The short elapsed times for the PPI, CR, and RC tables are described in the few columns, 1 month case above. However, in this case, more columns are accessed so there is slightly more CPU and about 4 times as many I/Os. Note that the CR and RC tables required 3 times the CPU as the PPI table; this is to re-assemble the accessed columns.

A.5 Join Comparison

In this comparison, the Orders and Items tables are joined on the column that happens to be the primary index for the PI and PPI variations and a simple average of one column from the join result is calculated. Each query variation joins tables that have the same table variation. For example in the PI case, both Orders and Items are defined with a PI without partitioning; in the CP case, they are both defined with column partitioning but without row partitioning.



The chart below shows the elapsed times for the query when three columns are accessed across the two tables (labeled Few columns), and then when 15 out of 16 columns in the Items table and 8 out of 9 columns in the Orders table are accessed (labeled Many columns).

As expected, this query is not particularly suited to column partitioning as shown in the above results. Adding a NUSI or join index on the join column might help improve this query for a column-partitioned table. For column-partitioning, a large CP table joined to smaller table where dynamic hash join could be used would be a more suitable type of query.

The following discusses the detailed metrics for each case.

Few columns:


PPI 39,411,626 15,900 337 CP 24,419,299 35,757 693 CR 25,147,819 37,440 711 RC 25,341,401 37,457 717

Interestingly, I/O is about the same for the CP, CR, and RC case compared to the PI case. But CPU is more than 4 times higher with the column-partitioned tables than with the PI case, even when few columns are involved. This increase in CPU is the reason for elapsed times that are 2.8 times longer than for the PI case and 2.1 times longer than the PPI case. The added CPU represents the overhead of doing the row redistributions and sorts that are required to perform a join when there is no primary index defined on either table for the join column.

For the PPI case, there are too many row partitions to be able to do a sliding-window merge join directly between the Orders and Items tables; in this case, the columns needed for the join for both tables are spooled, the spools sorted, and a merge join is done between the spools. For the PI case, the Orders and Items tables can be directly merged joined without spooling.

Many columns:


PPI 39,825,215 22,057 343 CP 17,904,275 53,188 837 CR 18,373,294 59,457 943 RC 18,999,061 59,527 927



CPU goes up for the CP, CR, and RC tables, as does elapsed time, compared to the few columns case above. For the PI case because of the additional predicates, the Orders table is spooled locally while applying predicates and leads to reduced I/O. For CP, CR, and tables, the additional predicates also help reduce the I/O.

A.6 I/O Intensive Request

This comparison is for a multistatement request that includes a full-table scan of the Items table and a full-table scan of the smaller Orders table that are performed in parallel. There is a predicate that cannot be satisfied on each table that prevents any rows from being selected. In the few columns case, only two columns from each table are accessed (one for the predicate and one projected in the result). In the many columns case, 15 out of 16 columns for the Items table are accessed as well as 8 out of 9 columns for the Orders table (for each table, one column is projected and the other columns are accessed in predicates).

The following chart illustrates the differences in elapsed times for the two variations of this request across all five table variations. For a given request case, both tables used in the request have the same table variation. For example, in the PI example of this multistatement request, both tables being accessed are defined with a primary index but without partitioning; in the CP case, both tables are defined with column partitioning but without row partitioning.

All tables defined with column partitioning perform exceptionally well with both variations of this I/O intensive request. The following discusses the detailed metrics for each case.

Few columns:


PPI 16,739,228 1,086 115 CP 108,301 1,928 28 CR 186,485 1,938 28 RC 245,331 1,936 28

Notice how much less I/O there is for the column-partitioned tables (about 150 times fewer I/Os) even though the CPU is close to being doubled. The huge reduction in I/O is responsible for the much lower elapsed times for the CP, CR, and RC tables compared to the PI and PPI tables. This reduced I/O comes from having to only access one column from each table to evaluate the



predicate (which is always false so other columns, including the projected column, do not need to be accessed). Autocompression for this column also helps to further reduce the I/O.

It is interesting to note that the PI and PPI tables the metrics are almost identical. Slightly more effort is expended with the PPI table due to the somewhat larger size of the table that comes from carrying an extra 2 bytes per row in the row header for each row of the table.

Many columns:


PPI 16,739,228 1,166 115 CP 283,151 3,044 44 CR 363,781 3,058 44 RC 490,637 3,057 44

Even though more columns are touched in each table, the elapsed time for the column partitioned tables, when comparing few columns vs. many columns, are only slightly longer (28 seconds vs. 44 seconds) and remain well below comparative PI and PPI elapsed times. Elapsed times remain low in the many columns case because the I/O is significantly less even though I/O is 2 to 3 times higher than in the few columns case. The I/O increases because the predicate that always evaluates to false is not the first predicate to be evaluated for the Orders table but is the fourth predicate (so for a row, 1, 2, 3, or 4 columns need to be accessed depending on whether the first three predicates evaluate to false or not). The observation here is that collecting statistics is important so that the optimizer can order the predicates with the most selective predicate evaluated first.

A.7 INSERT-SELECT Comparisons

This comparison considers an INSERT-SELECT into the Items table, and then again for the smaller Orders table. In both cases, the source data comes from a table defined with no primary index (a NoPI table). The chart below compares the PI and CP variations of the two tables and is followed by the detailed metrics for each table.

Items Table:

Table Type I/O Count AMP CPU

Time Elapsed

Time Spool Usage

CPU Utilization

PI 143,763,636 43,325 4466 876,983,687,168 15% CP 53,548,028 113,141 1743 0 91%



Note that no spool file is required when insert-selecting from a NoPI table into a CP table. Since a CP table is also a NoPI table, the INSERT-SELECT is AMP-local. For a PI table, rows being inserted from a NoPI table must be hash redistributed and then sorted on the primary index value, requiring a large intermediate spool file.

Because of the extra steps involved, the PI table requires over 2 ½ times more I/O and the elapsed time for the PI table is also over 2 ½ times longer than the CP case even though the CP table requires over 2 ½ time more CPU. But notice the CPU utilization differences. While the PI case uses very little CPU on the platform (17%), CPU utilization for the CP case is above 90%. If the CP case had been run on a busy platform where there was little additional CPU available, the elapsed time would likely be longer than what is reported here.

Orders Table:

Table Type I/O Count AMP CPU

Time Elapsed

Time Spool Usage

CPU Utilization

PI 28,856,078 10453 947 189,199,163,392 16% CP 11,614,404 17491 280 0 77%

The Orders table is about 1/4th the size of the Items table and it has fewer columns (9 vs. 16). With fewer columns to manage when inserting into a column-partitioned table, the CPU utilization is somewhat lower than the case with the Items table (77% vs. 91%). For the same reason, the increase in CPU comparing CP vs. PI is less in this case since fewer columns are involved. When inserting into tables that contain an even larger number of columns than these examples provide, expect the CPU to increase. Note that the CPU would also increase for the PI case if MVC and ALC compression is used, as is typical in many environments.

A.8 Conclusions

These test results highlight particular characteristics about column-partitioned tables described earlier in this orange book:

1. They can result in significantly smaller table sizes.

2. Table scans that access few rows and columns use much less I/O and may perform better than their PI counterparts.

3. Full-table scans that access all columns and all rows consume more CPU and tend to have longer elapsed times.

4. The greater the number of columns accessed, the greater the CPU usage and the longer the elapsed time.

5. Joins on what would be the primary index for PI or PPI table involve more CPU and have a longer elapsed time even when few columns are touched.

6. Basic INSERT-SELECT processing is faster when there is enough available CPU because no spool file is needed and there is no redistribution or sort activity.

7. Collecting statistics on columns used in single-table predicates against a CP table can help improve predicate ordering leading to better performance.

8. These results are based on beta software and a specific hardware platform. Your results may differ.

Be careful in generalizing these conclusions. For instance, point 5 should not be generalized to “a column-partitioned does not perform well for joins;” other types of joins may benefit significantly or perform adequately when using column-partitioning.



Appendix B: Frequently Ask Questions

B.1 What is Teradata Columnar?

Teradata Columnar is an option introduced in Teradata 14.0 for organizing the data of a user-defined table or join index on disk.

Teradata Columnar offers the ability to partition a table or join index by column. It includes column-storage as an alternative choice to row-storage for a column partition and autocompression. Column partitioning can be used alone in a single-level partitioning definition or with row partitioning in a multilevel partitioning definition.

Teradata Columnar is a new paradigm for partitioning, storing data, and compression that changes the cost-benefit tradeoffs of the available physical database design choices and their combinations. Teradata Columnar provides a benefit to the user by reducing I/O for certain classes of queries while at the same time decreasing space usage.

A column-partitioned (CP) table or join index has several key characteristics:

It does not have a primary index.

Each of its column partitions can be composed of a single column or multiple columns.

Each column partition usually contains multiple physical rows.

A new physical row format COLUMN may be utilized for a column partition; such a physical row is called a container. This is used to implement column-storage, row header compression, and autocompression for a column partition.

Alternatively, a column partition may have physical rows with ROW format that are used to implement row-storage; such a physical row is called a subrow.

A CP table is just another type of table that can be accessed by a query. Multiple kinds of tables including CP tables can be accessed by a single query.

For example,

CREATE TABLE Sales_CR ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PARTITION BY COLUMN;

The following adds a level of row partitioning using multilevel partitioning:

CREATE TABLE Sales_CPRP ( TxnNo INTEGER, TxnDate DATE, ItemNo INTEGER, Quantity INTEGER ) PARTITION BY ( COLUMN, RANGE_N(TxnDate BETWEEN DATE '2011-01-01' AND DATE '2011-12-31' EACH INTERVAL '1' DAY) );

Additional options are provided to group multiple columns into column partitions, explicitly specify whether column-storage or row-storage is used for a column partition, and suppress autocompression for a column partition. Note that most other Teradata table options are allowed with a CP table. These include user-specified multivalue compression, ALC compression, block level compression, identity columns, constraints, etc.



B.2 Is Teradata Columnar enabled by default?

No. Teradata Columnar is optional and disabled by default. Teradata Columnar can be enabled by the GSC or an authorized Teradata representative. Note that it may require a specific purchase or license fee before it will be enabled.

B.3 Does enabling Teradata Columnar mean all tables will be columnar?

No. A table must be created or altered explicitly to have a PARTITION BY COLUMN clause for the table to have columnar capabilities; otherwise, tables are just as before.

A CP table works with NoPI/PI/PPI/MLPPI tables, other CP tables, secondary indexes, and join indexes. That is, a query can reference multiple kinds of tables and the optimizer will figure out the best plan to access the tables, do the joins, use join indexes instead of a base table, etc. in order to be able to execute the query.

B.4 Can I alter an existing table to be a CP table?

Yes, but only if the existing table is empty. If the table has a primary index, ALTER TABLE must specify NO PRIMARY INDEX along with a PARTITION BY COLUMN clause so that the primary index is removed. If the table is nonempty, you will need to create a new table and copy the data from the old table to the new table. Allowing a nonempty table to be altered to be a CP table is deferred to a future release.

B.5 Should I change all my tables to be column partitioned?

No. Column partitioning is not suitable for all tables just as a specific PPI or MLPPI is not suitable for all tables. If tables and the workload are highly bound to having a PI it might not be a good candidate for CP unless you can use a join index so that advantages of both column partitioning and a PI/PPI/MLPPI can be obtained. A CP table or join index is usually more useful for ad hoc, analytic, or non-tactical query workloads.

B.6 Why use a CP table or join index?

A CP table or join index is used to reduce I/O for queries. This occurs when there is column partition elimination, row partition elimination, and/or selective predicates for a query. Row header, auto, and user-specified compression can also contribute to reducing I/O.

Column partitioning is also used to reduce disk space usage for tables. This occurs when row header and autocompression are effective.

B.7 Why is there an increase in CPU usage?

Column partitioning (as is the traditional row partitioning for PPI and MLPPI) is targeted to reduce I/O; it was not targeted to reduce CPU. In some cases, CPU usage decreases but there are cases where CPU usage increases. There is extra work to process the column partitions that is sometimes not offset by the reduced data that that needs to be processed. Some operations (such as inserts) involving a CP table may see a significant increase in CPU usage.

Trading I/O for CPU may be beneficial for an I/O-bound system. For a CPU-bound system, column partitioning may not provide any benefit and may actually degrade the performance. An exception would be if there is a subset of the workload that is I/O bound (even if overall the system is CPU bound) for which column partitioning could be applicable.

As the feature matures, opportunities to minimize CPU usage will be explored.



B.8 Why no primary index (NoPI)?

Let's start by understanding what a primary index does. A primary index (PI) is used to distribute the rows of a table to the AMPs and on each AMP to order the rows by hash value within the combined partitions defined by a PARTITION BY clause (if this clause is not specified, there is just one partition, that is, the entire table).

One purpose of column-storage (implemented using physical rows called containers) for a column partition is to provide row header compression by having sequential values with increasing uniqueness without having an explicit row header for each value. This means that, for a column-partitioned table with a PI, a container would only be able to contain values with the same internal partition number and hash value. This is likely to cause over-partitioning (partitioning to too fine of a granularity). That is, each container would only have a few values. The row header overhead for these containers would cause a large increase (up to 23 times) in space usage. The maintenance overhead could be very expensive.

Some of the complications of having a PI for a CP table could be avoided if the PI was only used for distribution of rows to AMPs but not to use it to order rows within combined partitions on the AMPs. This would allow single-AMP access improving tactical queries against a CP table. Also, this would allow for local-based joins on the PI columns. However, merge joins where the rows are ordered by the hash of the PI columns would not be applicable and other local join methods would need to be modified or implemented.

While there are some use cases that might benefit when rows are distributed by a hash value (instead of randomly or locally) and either ordered by the hash value within a combined partition or not, they require further implementation effort. As the feature matures, enhancement opportunities will be explored.

Note: all join methods are supported with a CP table. However for some cases, selected rows from the CP table may need to be distributed/spooled/sorted prior to a join step (such as a merge join). Since only the needed columns are scanned, a scan of the CP table can be much more efficient than scanning a non-CP table where entire rows must be read.

If a PI or partitioned PI (PPI) is needed for some queries (tactical queries, primary index joins, etc.), a CP table can have a join index on it where the join index does have a PI, PPI, or MLPPI (but not column partitioning). Alternatively, a PI, PPI, or MLPPI table could have a join index on it where the join index (possibly sparse) has column partitioning (but not a primary index) plus optionally one or more levels of row partitioning. Either of these would allow the benefits of both PI/PPI/MLPPI and column partitioning, albeit with the extra space usage and maintenance of the join index. The optimizer would pick the table or the join index depending on which one was better suited for the query.

B.9 Can a CP table have a PPI or MLPPI?

Technically, no. PPI stands for partitioned primary index. MLPPI stands for multilevel partitioned primary index. This would mean a table with a PPI or MLPPI has a primary index (PI) with one or more levels of row partitioning (using partitioning expressions). But currently a CP table can’t have a PI so saying it can have PPI or MLPPI is a misuse of these acronyms.

However, a CP table is sometimes loosely said to have a PPI or MLPPI. In this case, what is meant is the CP table has the first P of PPI or the MLP of MLPPI – that is, the CP table has multilevel partitioning with column partitioning and one or more levels of row partitioning (using RANGE_N or CASE_N partitioning expressions) but no PI.



B.10 Can NoPI table have a PPI or MLPPI?

No. A NoPI table can’t have a partitioned primary index (PPI) since it doesn’t have a primary index (PI). While a misuse of the acronym PPI, saying a NoPI table has a PPI would mean that the NoPI table has the first P of PPI – that is, the NoPI table would have row partitioning but no PI. Currently, a NoPI table can't have row partitioning unless it also has column partitioning. While technically a NoPI table could have row partitioning, this is deferred to a future release.

B.11 Can a CP table be temporal?

No. The use of VALIDTIME and TRANSACTIONTIME columns in a CP table is currently disabled in TD 14.0. Attempting to create/alter a CP table having these columns causes error 9318 ”A table without a primary index cannot be a temporal table.” to occur. This will be enabled (as long as Teradata Columnar and Temporal are both enabled) in a forthcoming release of Teradata pending completion of feature testing of this capability.

See also Is Teradata Columnar enabled by default? in section B.2.

B.12 Can a GLOBAL TEMPORARY or VOLATILE table be column partitioned?

No. This capability is deferred to a future release.

B.13 Why is dictionary autocompression local to a container?

A container-specific compression dictionary can have compression tokens that are specific to the values actually in the container. A token must be large enough to have a value for each compressible value – for example, the token would be one byte to be able to compress 255 values. Also, with autocompression, a larger dictionary is supported; it is not limited to 255 values as currently with user-specified multivalue compression (MVC).

The drawback is that a value that occurs often also occurs once in each local dictionary reducing the compression obtained for that value. If the value only occurs a few times in a container, the compression might not be as effective as when using a global compression dictionary. But note the following for a global dictionary:

1. A token occurs for every value in the table and a value of this token is reserved for each compressible value even if a compressible value does not occur in the container. For a value that is not compressed, the token still occurs for that value making that value actually take more space.

2. Generating a global dictionary requires seeing all the data then going back and compressing it all. If data is added to the table, this may need to be repeated to update the global dictionary and recompress all the data.

3. A global dictionary can be created using MVC, albeit user-specified. Autocompression uses this dictionary if available plus adds a local dictionary for additional values that occur frequently in a container.

Since local compression has its advantages and a global dictionary support is already available in Teradata, adding local compression was considered to be a higher priority for TD 14.0 than automating a global dictionary especially considering the potential performance impact. Automating global dictionary creation may be considered for a future release.

Autocompression can be an easy way to obtain significant compression for a CP table. However, autocompression may not be the solution for all your compression needs. Complementing autocompression with user-specified MVC, ALC, BLC and temperature based



BLC may provide even further compression. Also, there are use cases where suppressing autocompression for a column partition may be appropriate (see section 11.1).

B.14 Are rows inserted using round robin distribution to the AMPs?

No. For a CP table and a NoPI table, INSERT VALUES rows are actually distributed randomly, not using round robin. A round robin method was initially used for NoPI sometime before GCA of NoPI but it was changed to random prior to GCA since it provided a more even distribution.

Note that an insert-select copies locally – no redistribution randomly or otherwise. However, TD 14.0 adds an option to INSERT-SELECT to force a random or hash redistribution (HASH BY clause) which useful for getting rid of skew that may be in the result of the SELECT.

B.15 Are CP tables usable when data is highly volatile?

Usually, no. For CP, updates are allowed but currently an update is done by logically deleting the old row and appending the updated row. And, fastpath delete all and fastpath row partition delete are actual deletes, not logical deletes; other deletes are logical – the space, other than for LOBs is not reclaimed until a fastpath delete occurs.

Therefore, updates and logically deleting rows are recommended to be relatively rare at least in TD 14.0. Preferably, loading and maintenance should be done in one of the following ways:

1. Large INSERT-SELECTs into (preferably) empty row partitions (probably by partitioned by date or timestamp) with old data being deleted using a fastpath DELETE by ranges that map to row partitions.

2. A large INSERT-SELECT into an empty table with a periodic fastpath DELETE ALL and INSERT-SELECT to refresh the data.

Using a CP table when the data is highly volatile should be considered only if the impacts of updates, deletes, and refreshes of the entire table or partitions are acceptable.

Enhancements (such an in-place update) to improve the handling of volatile data for a CP table will be explored in the future.

B.16 Why is space not reclaimed for a DELETE?

Fastpath delete all and fastpath row partition delete do reclaim the space for the deleted rows; however, other deletes are logical – the space, other than for LOBs is not reclaimed until a fastpath delete occurs.

Actually deleting the column partition values in each of the column partitions for the deleted row would be expensive and, for containers, the containers would have to be split into two containers so the following column partition value could have a row header that specifies its rowid so that rowids don’t change. But splitting the container in each column partition could actually require more space than what is reclaimed by removing the column partition values.

Providing capabilities that would reclaim space (such as, initiating a scan of a CP table to reclaim space where enough contiguous column partition values have been logically deleted) will be explored in the future.



B.17 Why isn’t FastLoad supported for a CP table?

A CP table cannot be sorted in place. Sorting in place is required by the current architecture of FastLoad if the target table is row partitioned – it is expected that CP tables will often be row partitioned. Also, sorting will often be required to obtain good compression. To do the sort, the incoming rows would have to be stored in a staging table (or subtable), sorted and then split up and transformed into column partitions – this is basically the current recommended approach for loading a CP table. Even if sorting was not needed, FastLoad would at minimum need to be modified to split up rows and transform them into column partitions. Supporting CP tables would require re-architecting FastLoad to have a staging area internally and insert into a CP table under the covers; even then there would be subtle differences in error messages, recovery processes, etc. compared to FastLoad into a non-CP table. Alternatively, a caching approach to avoid a staging table might be possible to help speed up performance if there is large enough cache but, if there isn’t, performance could be much worse than with the staging/INSERT-SELECT approach.

FastLoad into a CP table would not be applicable or provide any performance advantage in many of the use cases for a CP table where the CP table is row partitioned, the rows need to be ordered to get better row compression, the CP table is not empty, source rows are coming from other tables on the same system, etc. In some cases, FastLoad into a CP table might be useful but only if it performed significantly faster that loading into a staging table and then doing an INSERT-SELECT.

Opportunities to enhance FastLoad to support CP tables and improve performance over loading into a staging table and doing an INSERT-SELECT will be explored in the future.

B.18 Why isn’t MultiLoad supported for a CP table?

MultiLoad works based on matching primary index values. Since a CP table has no primary index, MultiLoad is not applicable for a CP table.

B.19 Why isn’t the upsert form of UPDATE supported for a CP table?

The upsert form of the UPDATE statement works based on matching primary index values. Since a CP table has no primary index, upsert is not applicable for a CP table.

B.20 Why isn’t MERGE supported for a CP table?

The MERGE statement works based on matching primary index values. Since a CP table has no primary index, the MERGE statement is not applicable for a CP table.

B.21 Why may data skew occur for restore and copy of a CP table?

See section F.4 and specifically section F.4.1.

B.22 Why does data skew occur for Reconfig?


B.23 Why may data skew occur for INSERT-SELECT into a CP table?


B.24 Why may data skew occur for Down-AMP recovery?




Appendix C: DDL Details

This appendix provides details about DDL statements of interest for Teradata Columnar.

C.1 Column Partitioning Syntax

Additional syntax for a CP table is in bold font.

CREATE MULTISET TABLE t ( (p INTEGER, c INTEGER) ,d1 DATE ,d2 BYTEINT ,d3 SMALLINT ,ROW(a1 CHAR(100), a2 VARCHAR(1000)) ) NO PRIMARY INDEX PARTITION BY ( COLUMN ADD 50, RANGE_N(d1 BETWEEN DATE '2006-01-01' AND DATE '2010-12-31' EACH INTERVAL '1' MONTH), RANGE_N(d2 BETWEEN 1 AND 4 EACH 1) ) UNIQUE INDEX (p, c);

The following describes the various syntax elements shown in the above example.

MULTISET

For a CP table, MULTISET is the default for both Teradata (BTET) and ANSI mode and MULTISET may be explicitly specified. SET is not allowed for a CP table.

(p INTEGER, c INTEGER)

The grouping of columns in parentheses defines a set of columns for a column partition. Since this is a narrow column partition, the system determines COLUMN format is to be used for the physical rows of this column partition. The HELP COLUMN statement can be used or a data dictionary view queried to find out what format was chosen for a column partition (see Appendix G: Partitioning Meta Data). The column partition is autocompressed by default. Grouping columns into a column partition is useful if these columns are usually accessed together, for cold columns, or to reduce the number of column partitions (see section 11.3). Note that as the number of columns increases for a column partition, the effectiveness of autocompression decreases; in this case, apply MVC, ALC, BLC, or temperature based BLC to provide compression. Additional options include:

COLUMN(column_definition_list) [NO AUTO COMPRESS]

COLUMN indicates a series of column partition values per physical row (i.e., a container); this option is useful if the system-determined format is ROW and this needs to be overridden. NO AUTO COMPRESS may be specified if autocompression is not effective to avoid the overhead of checking containers for autocompression opportunities (see section 11.1).

ROW(column_definition_list) [NO AUTO COMPRESS]

ROW indicates one column partition value per physical row (using regular row format); this option is useful if the system-determined format is COLUMN and needs to be overridden. NO AUTO COMPRESS may be specified; however, note that autocompression is currently not applied to column partitions with ROW format.



ROW(a1 CHAR(100), a2 VARCHAR(1000))

ROW is explicitly specified for this column partition. Since this is a wide multicolumn partition, the system would have determined to use ROW format for the physical rows of this column partition if ROW had been omitted. The column partition is autocompressed by default; however, autocompression is currently not applied to a column partition with ROW format. Apply MVC, ALC, BLC, or temperature based BLC to provide compression for a column partition with ROW format.

d1 DATE

This defines a single-column partition. Since this is a narrow column partition, the system determines COLUMN format is to be used for physical rows of the column partition. Also, the column partition is autocompressed by default. This is useful if queries access a variety of column subsets of a table. To specify COLUMN format explicitly, ROW format, or NO AUTO COMPRESS, put the definition in parentheses. For example,

COLUMN(d1 DATE) NO AUTO COMPRESS

NO PRIMARY INDEX

This may be omitted. A CP table is a NO PRIMARY table by default. A PRIMARY INDEX cannot be specified.

PARTITION BY

This specifies one or more levels of partitioning. One level may specify column partitioning. Using multilevel partitioning, other levels may specify row partitioning using a RANGE_N or CASE_N function as the partitioning expression.

COLUMN

This specifies that the table has column partitioning. Note the two uses of COLUMN; one indicates column partitioning, as in this case, and the other indicates that a column partition has COLUMN format as discussed above.

Additional options may follow COLUMN if grouping of columns is not defined in the column definition list:

COLUMN [ALL BUT] ([COLUMN|ROW]{(column,...)| column} [NO AUTO COMPRESS], ...)

For each column list or column, the one or more columns listed are included in a column partition with default system-determined COLUMN or ROW format and autocompression unless COLUMN, ROW, or NO AUTO COMPRESS is specified for the column list or column.

If ALL BUT is not specified, columns not listed are combined into one column partition. If ALL BUT is specified, columns not listed are each in their own single-column partition. In either case, these column partitions have system-determined COLUMN or ROW format and autocompression.

ADD 50

This specifies the number of row or column partitions that may be added using ALTER TABLE for the partitioning level (the value is increased if excess combined partitions are assigned to the level). For a column-partitioning level, the default is 10 unless excess combined partitions are assigned to the level. For a row-partitioning level, the default is 0



unless excess combined partitions are assigned to the level. See the Orange Book: Increased Partition Limit and other Partitioning Enhancements for more information on the ADD option.

RANGE_N(d2 BETWEEN 1 AND 4 EACH 1)

As currently, this specifies a level of row partitioning. A row-partitioning level can alternatively use a CASE_N function as the partitioning expression.

UNIQUE INDEX (p, c)

This defines a unique secondary index for the table. Defining secondary indexes is optional; however, since there is no primary index, uniqueness, if needed, must be specified using a unique secondary index. Nonunique secondary indexes and join indexes can also be defined on a CP table. Nonunique secondary indexes can be defined on a CP join index.

C.2 Column Partitioning

If a column/constraint grouping with COLUMN format is specified, the grouping defines a column partition and one or more column partition values are stored in a physical row (referred to as a container using COLUMN format). If a column or constraint grouping with ROW format is specified, the grouping defines a column partition and only one column partition value is stored in a physical row as a subrow (i.e., ROW format). If neither is specified for a column/constraint grouping, the grouping defines a column partition and the system determines whether COLUMN or ROW format is used for the column partition. A column partition value consists of the values of the columns in the column partition for a specific table row.

If a column/constraint grouping is specified with NO AUTO COMPRESS, autocompression for physical rows is not applied; however, any user-specified compression and, for column partitions with COLUMN format, row header compression is still applied for the column partition. If NO AUTO COMPRESS is not specified, autocompression for physical rows is applied.

If a COLUMN clause as a partitioning level of a PARTITION BY clause does not specify column grouping and a column/constraint definition is not in parentheses defining a group, it is treated as a group defining a single-column partition with autocompression and system-determined COLUMN or ROW format.

To specify the column partition format (COLUMN or ROW) or NO AUTO COMPRESS for a column partition consisting of a single column, the column or constraint definition must be in parentheses as a single-column group. Note that there can be one or more column/constraint definitions grouped into a column partition using parentheses. Alternatively, these options and grouping may be specified in a COLUMN clause in the PARTITION BY clause.

When the COLUMN or ROW format is system-determined, the system bases it choice on the size of a column partition value for the column partition and other factors such as whether a column partition value for the column partition has fixed or variable length and whether the column partition is a single-column or multicolumn partition. Generally, a narrow column partition is determined to have COLUMN format and a wide column partition is determined to have ROW format. The HELP COLUMN statement can be used or a data dictionary view queried to find out what format was chosen for a column partition (see Appendix G: Partitioning Meta Data). A user may explicitly specify the format if the user wants a specific format for a column partition.

Note that a column partition either has COLUMN or ROW format (it cannot have a mix of both formats). However, different column partitions of a CP table may have different formats. That



is, the column partitions of a CP table can have all COLUMN format (containers), all ROW format (subrows), or some COLUMN format and some ROW format.

Column grouping in a COLUMN clause of the PARTITION BY clause allows for more flexibility in specifying which columns go in which partitions. Column grouping in the column definition list of a table or the select expression list of a join index allows for a simpler (but less flexible) specification of column groupings.

A column with CHARACTER SET KANJI1 is not allowed when a table or join index is column partitioned. It is recommended to use CHARACTER SET UNICODE instead.

C.3 CREATE TABLE Statement

The CREATE TABLE statement can be used to create a multiset table with no primary index to have column partitioning and, optionally, row partitioning.

1. It cannot be a global temporary, volatile, temporal, queue, global temporary trace, or error table. It cannot have permanent journals.

2. The table may have an identity column, security constraints, and triggers. On-line archive is supported for a CP table.

3. A CREATE ERROR TABLE … FOR <table> statement creates an error table associated with the table. If the table is a CP table, the error table is created the same as for a non-CP table except that it is a NoPI table without partitioning.

4. If a table is partitioned (including column partitioned), its fallback rows are partitioned the same as its primary data rows.

5. A CREATE TABLE statement must not specify a replication group if the table being created is column partitioned.

6. A CREATE TABLE … AS statement may specify grouping in the SQL table element list for the target table following the rules above.

7. A CREATE TABLE … AS <srctable> … statement copies the indexes and partitioning for the source table to the new table if an index list is not specified. If the partitioning is copied and the source partitioning includes column partitioning, the column-partitioning definition is copied as part of the partitioning except, if grouping is specified in the SQL table element list of the target table, this specified grouping is used for the column partitioning of the new table instead of the grouping defined for the source table. If AND STATISTICS is specified, there are PARTITION statistics, and column partitioning is specified via the grouping in the select list, a PARTITION BY clause is specified, or indexes are specified, PARTITION statistics are not copied (however, the summary statistics are copied).

Example 1

CREATE TABLE Orders (o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_shippriority INTEGER, o_comment VARCHAR(79)) NO PRIMARY INDEX PARTITION BY COLUMN UNIQUE INDEX (o_orderkey);



This defines a table with no primary index and with column partitioning. Each column is in its own partition with system-determined COLUMN format and autocompression. A 2-byte partitioning is defined. Note that, by default, a table with no primary index is a multiset table; MULTISET may be explicitly specified but SET must not be specified.

Example 2

CREATE TABLE Orders (o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_shippriority INTEGER, ROW(o_comment VARCHAR(79)) NO AUTO COMPRESS) PARTITION BY COLUMN, UNIQUE INDEX (o_orderkey);

This defines a table with no primary index and with column partitioning. Each column is in its own partition with system-determined COLUMN format and autocompression except for o_comment which is in its own partition with user-specified ROW format and no autocompression. A 2-byte partitioning is defined.

Example 3

CREATE TABLE Orders (o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_shippriority INTEGER, o_comment VARCHAR(79)) PARTITION BY COLUMN ALL BUT (ROW(o_comment) NO AUTO COMPRESS), UNIQUE INDEX (o_orderkey);

This defines a table that is the same as in example 2 above.

Example 4

CREATE TABLE Orders (o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_shippriority INTEGER, o_comment VARCHAR(79)) PARTITION BY COLUMN ( o_orderkey, o_custkey, o_orderstatus, o_totalprice, o_orderdate, ROW o_comment NO AUTO COMPRESS), UNIQUE INDEX (o_orderkey);

This defines a table that is the same as in example 2 and 3 above. Note that columns not listed in the grouping clause are grouped together into a column partition by default but, since o_shippriority is the only one not listed in the COLUMN grouping clause, each column is in its own single-column partition.

Example 5

CREATE TABLE Orders (o_orderkey INTEGER NOT NULL, o_custkey INTEGER,



o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_shippriority INTEGER, o_comment VARCHAR(79)) PARTITION BY COLUMN ALL BUT (ROW(o_totalprice, o_comment)), UNIQUE INDEX (o_orderkey);

This defines a table with no primary index and with column partitioning. Each column is in its own partition with system-determined COLUMN format and autocompression except for o_totalprice and o_comment which are grouped together in a partition with user-specified ROW format and autocompression (however, note that autocompression is currently not applied to a column partition with ROW format). A 2-byte partitioning is defined. The maximum number of column partitions is 65534 and the maximum column partition number is 65535. An ADD clause with a value up to 65525 could be specified and the same table would be defined.

Example 6

CREATE TABLE Orders (o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, o_shippriority INTEGER, o_comment VARCHAR(79)) NO PRIMARY INDEX PARTITION BY COLUMN ADD 68000, UNIQUE INDEX (o_orderkey);

This defines a table with no primary index and with column partitioning. Each column is in its own partition and an 8-byte partitioning is defined. 7 partitions plus 2 internal partitions are defined. The maximum number of column partitions is 9,223,372,036,854,775,80628 and the maximum column partition number is 9,223,372,036,854,775,807.

Example 7

CT rx (ROW( c1 INT, c2 INT)) NO PRIMARY INDEX PARTITION BY COLUMN;

This defines a table with one column partition that has ROW format (subrows).

Example 8

CT cx (( c1 INT, c2 INT)) NO PRIMARY INDEX PARTITION BY COLUMN;

This defines a table with one multicolumn partition that has system-determined COLUMN format (containers). The column partition has two columns and is autocompressed.

Example 9

CREATE TABLE t9 (a INT, b INT, c INT) PARTITION BY ( COLUMN ADD 9994, RANGE_N(a BETWEEN 1 and 10000000 EACH 1), RANGE_N(b BETWEEN 1 AND 92233720 EACH 1) );

This defines a maximum of 9999 column partitions (3 user-specified, 2 for internal use, and 9994 for adding more partitions) and a maximum column partition number of 10000. The maximum combined partition number is 9,223,372,000,000,000,000 (10000 * 10000000 * 92233720) which does not exceed 9,223,372, 036,854,775,807. Though there are excess combined partitions, there are not enough to be able to add a partition to any of the partitioning

28 However, the limit on the number of columns in a table would be reached well before this limit.



levels so the default is ADD 0 for level 2 and 3. Therefore, the number of partitions at level 2 and level 3 cannot be increased by an ALTER TABLE statement.

Example 10

CREATE TABLE t10 (a int, b int, (c int, d int)) PARTITION BY (RANGE_N(a BETWEEN 1 AND 10 EACH 1), COLUMN);

Columns a and b are each in their own single-column partitions with system-determined COLUMN format and autocompression. Columns c and d are in a two-column partition with system-determined COLUMN and autocompression. This results in a table definition for t10 as follows:

CREATE MULTISET TABLE PLS.t10 ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( a INTEGER, b INTEGER, c INTEGER, d INTEGER) NO PRIMARY INDEX PARTITION BY ( RANGE_N(a BETWEEN 1 AND 10 EACH 1 ), COLUMN (a,b) ADD 10 );

Then table t10a is created:

CREATE TABLE t10a AS t10 WITH NO DATA;

This results in a table definition for t10a with the same PARTITION BY clause as for t10:

CREATE MULTISET TABLE PLS.t10a ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( a INTEGER, b INTEGER, c INTEGER, d INTEGER) NO PRIMARY INDEX PARTITION BY ( RANGE_N(a BETWEEN 1 AND 10 EACH 1 ), COLUMN (a,b) ADD 10 );

Then table t10b is created with grouping specified in the SQL table element list as follows:

CREATE TABLE t10b (a, (b, c), d) AS t10 WITH NO DATA;

This results in a table definition for t10b as follows with the same PARTITION BY clause as for t10 except the grouping is as specified in the SQL table element list of the CREATE TABLE statement above for t10b:

CREATE MULTISET TABLE PLS.t10b ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( a INTEGER, b INTEGER, c INTEGER, d INTEGER) NO PRIMARY INDEX PARTITION BY ( RANGE_N(a BETWEEN 1 AND 10 EACH 1 ), COLUMN (a,d) ADD 10);



Example 11

In the following example, as many leftover combined partitions as possible are assigned to the various levels as explained following the CREATE TABLE statement.

CREATE TABLE t11 (a INTEGER, b INTEGER, c INTEGER) PARTITION BY ( COLUMN, -- 3 specified definitions + 2 internal = 5 defined partitions RANGE_N(c BETWEEN 1 AND 10), -- 1 defined partition (note no EACH clause) RANGE_N(b BETWEEN 1 AND 1000 EACH 1) ADD 5 –- 1000 defined partitions );

Level 1 has 3 user-specified column partitions (since 3 columns and no grouping) plus 2 internal partitions for a total of 5 defined partitions. The ADD for level 1 defaults to 10. The maximum number of column partitions for level 1 is 5+10, that is, 15. The maximum column partition number for level 1 is 16 (one more than the maximum number of column partitions).

Level 2 has 1 defined partition (since there is no EACH clause). Level 2 has preliminary maximum partition number of 2 since there is no ADD clause and the maximum partition number for a partitioning level must be at least 2.

Level 3 has a 1000 defined partitions and a preliminary maximum partition number of 1005 (due to the ADD 5).

Since 16*2*1005 is 32160 which does not exceed 65335, this defines a 2-byte partitioning.

Level 2 gets any extra combined partitions that it can take first since it is the first row-partitioning level that doesn't have an explicit ADD clause. This extra allows an ADD 3 since floor(65535/((15+1)*1005)) - #defined for level 2 is equal to 3. This ADD 3 is used instead of the preliminary default of ADD 1. The maximum number of partitions and the maximum partition number for level 2 are both 4.

After giving extra combined partitions to level 2, there are not enough additional extra combined partitions to increase the ADD for level 1 without exceeding the limit on the maximum combined partition number. That is, floor(65535/(4*1005)) = 16.

Level 2 is skipped since it has already been assigned as many extra combined partitions as possible.

For level 3, the maximum partition number can be increased by 18 to 1023 with the excess combined partitions that could not be applied to level 2 or level 1: floor(65535/((15+1)*4)) = floor(65535/64) = 1023. So instead of 5 for the ADD, 23 can be used for the ADD. The maximum number of partitions and the maximum partition number for level 3 are therefore both adjusted to be 1023.

DefinedCombinedPartitions = product of defined partitions at each level = 5*1*1000 = 5000.

MaxCombinedPartitions = product of maximum partitions at each level = 15*4*1023 = 61380.

Maximum Combined Partition Number = product of maximum partition number at each level = 65472. Note that this is still less than 65535 but increasing the maximum partition number even by only 1 for any of the levels would cause this value to exceed 65535.

The following select can be used to obtain information about the partitioning for the table:

SELECT * FROM DBC.TableConstraints WHERE TVMId IN (SELECT TVMId FROM DBC.TVM WHERE TVMNameI = 't11' AND DataBaseId IN (SELECT DatabaseId FROM DBC.DBASE WHERE DatabaseNameI ='test')) ORDER BY TVMId;



The select result:

TVMId 000017070000 Name ? DBaseId 0000F303 TableCheck CHECK (/*03 02 01*/ PARTITION#L1 /*1 5+10*/ =1 AND RANGE_N(c BETWEEN 1 AND 10 ) /*2 1+3*/ IS NOT NULL AND RANGE_N(b BETWEEN 1 AND 1000 EACH 1 ) /*3 1000+23*/ IS NOT NULL ) CreateUID 00000100 CreateTimeStamp 2010-12-12 18:16:38 LastAccessTimeStamp ? AccessCount ? ConstraintType Q IndexNumber 1 ConstraintCollation U CollName ? CharSetID ? SessionMode ? VTCheckType ? TTCheckType ? ResolvedCurrent_Date ? ResolvedCurrent_TimeStamp ? DefinedCombinedPartitions 5000 MaxCombinedPartitions 61380 PartitioningLevels 3 ColumnPartitioningLevel 1

C.4 CREATE JOIN INDEX Statement

The CREATE JOIN INDEX statement can be used to create a CP join index by specifying a COLUMN clause in a PARTITION BY clause. A CP join index must be a single-table, nonaggregate, noncompressed, join index with no primary index, and no value-ordering. A CP join index may optionally be row partitioned. Also, it may be a sparse join index.

The system-derived column ROWID with a column name alias (AS clause or NAMED) must be specified in the select expression list. The alias for ROWID may be used in a column grouping clause specified in a COLUMN clause in the PARTITION BY clause.

Column grouping may be specified in the select list or in the COLUMN clause of a PARTITION BY clause. If column grouping clause is specified in the COLUMN clause, only the name of a column, an alias of an expression, or an alias for ROWID in the select expression list may be referenced in the column grouping.

A join index with or without partitioning is allowed on a table with column partitioning and, optionally, row partitioning.

If a join index is partitioned (including column partitioned), its fallback rows are partitioned the same as its primary data rows.

Note that a CP join index cannot be a unique join index since it cannot have a unique primary or unique secondary index. A NoPI join index must be column partitioned.

When there is a mixed workload where some queries perform better if data is not column partitioned and some where it is column partitioned, or a different column partitioning of the data performs better for different queries, or some queries perform better with the same column partitioning and it is a sparse join index, etc., join indexes allow creation of alternate physical layouts for the data with the optimizer automatically choosing whether to access the base table and/or one of its join indexes. See Chapter 10: Performance Considerations for more information.



Note that a hash index may not be specified on a CP table and a hash index may not be column partitioned; use a join index instead.

Assume the following create table has been done:

CREATE TABLE t1 (a INT, b int, c int, d int) PRIMARY INDEX (a);

Example 1

CREATE JOIN INDEX jt1_1 AS SELECT a, b, d, ROWID as rw FROM t1 PARTITION BY COLUMN;

Example 2

CREATE JOIN INDEX jt1_2 AS SELECT a, b, d FROM t1 PARTITION BY COLUMN;

9480 ROWID must be specified with an alias for a column-partitioned join index.

Example 3

CREATE JOIN INDEX jt1_3 AS SELECT ROWID rw, a, b FROM t1 PARTITION BY COLUMN (ROWID NO AUTO COMPRESS, a, b);

5464 Error in Join Index DDL, A name specified in column grouping was not in the select list.

ROWID is not allowed in the COLUMN clause of PARTITION BY. Instead, use the required alias specified for ROWID in the select list:

CREATE JOIN INDEX jt1_3 AS SELECT ROWID rw, a, b FROM t1 PARTITION BY COLUMN (rw NO AUTO COMPRESS, a, b);

Example 4

CREATE JOIN INDEX jt1_4 AS SELECT ROWID rw, a, (b, d) FROM t1 PARTITION BY RANGE_N(a BETWEEN 1 AND 100 EACH 1);

3732 The facility of no primary index with row partitioning but no column partitioning has not been implemented yet.

Example 5

CREATE JOIN INDEX jt1_5 AS SELECT ROWID rw, a, (b, d) FROM t1 PRIMARY INDEX(a), PARTITION BY RANGE_N(a BETWEEN 1 AND 100 EACH 1);

3706 Syntax error: Column grouping without specifying COLUMN partitioning.

C.5 Replication

Replication is not allowed for a CP table or join index.

A CP table must not be explicitly included in a replication group by one of the following statements:

ALTER REPLICATION GROUP CREATE REPLICATION GROUP CREATE TABLE CREATE TABLE … AS …

If there is an attempt to alter a table that is in a replication group to be column partitioned by an ALTER TABLE … MODIFY statement, an error occurs.

If a CP table or join index is otherwise qualified to be replicated by a replication rule set for a replication group introduced by CREATE/REPLACE REPLICATION RULESET statement, it is not included in the replication group (there is no warning or error in this case).



C.6 ALTER TABLE Statement

In addition to its other capabilities, the ALTER TABLE statement allows removing or adding a primary index for an empty table, adding columns into a column partition, adding column partitions to a CP table, dropping columns from column partitions, dropping column partitions, modifying the partitioning of a CP table, and revalidating a table or join index.

C.6.1 Adding Columns to a Table

To add a column to a column partition, use existing syntax extended with the INTO clause that specifies the name of a column in the column partition to which the new column is to be added. For example,

ALTER TABLE x ADD storeid INT INTO store_name;

To add multiple columns to a column partition, group the columns with parentheses and specify the INTO clause. For example,

ALTER TABLE x ADD (storeaddr INT, storecd INT) INTO store_name;

To add a single-column partition, use existing syntax to add the column. For example,

ALTER TABLE x ADD store_status INT;

You can group columns to add a multicolumn partition. For example,

ALTER TABLE x ADD (storezip INT, storemgr INT, storeregion INT);

The default for a new column partition is system-determined COLUMN/ROW format and autocompression. You may optionally specify COLUMN/ROW format or NO AUTO COMPRESS to override the defaults (see Chapter 11: Tuning Opportunities). For example,

ALTER TABLE x ADD ROW(store_location VARCHAR(100)) NO AUTO COMPRESS;

Note that you cannot exceed the table’s maximum number of column partitions or the maximum number of columns allowed for a table.

If columns are added to a CP table, CHARACTER SET KANJI1 is not allowed. It is recommended to use CHARACTER SET UNICODE instead.

See section 6.4 if you are considering updating the values of added columns.

If columns are added to a column partition and the column partition has a system-determined format, the column partition format (COLUMN or ROW) is redetermined by the system. If columns are added to a column partition and the column partition has user-specified COLUMN or ROW format, the format is unchanged for the altered column partition.

If column partitions are also dropped in the same ALTER TABLE statement, column partitions to be dropped are dropped before adding any column partitions in order to free up column partition numbers for any added column partitions.

Note that column partition numbers might not correspond to the order that the column partitions were defined.

Note that, with or without an INTO clause, an added column is added as the last column in regard to an * that is used to select all columns from the table as is done currently. That is, the column is assigned the next FieldId.

C.6.2 Dropping Columns from a Column-Partitioned Table

The existing syntax is used to drop a column.



For example,

ALTER TABLE x DROP storezip;

To drop a column partition, use existing syntax to drop each of the columns in the column partition. For example,

ALTER TABLE x DROP storemgr, DROP storeregion;

If column partitions are also added in the same ALTER TABLE statement, column partitions to be dropped are dropped before adding column partitions.

Note that all the existing columns can be dropped from a column partition and the partition is not dropped if new columns are added to the column partition in the same ALTER TABLE statement.

If columns are dropped from a column partition, there are other columns in the altered column partition, and the column partition has system-determined column partition format, the column partition format (COLUMN or ROW) is redetermined by the system. If the column partition has user-specified COLUMN or ROW format, the format is not changed.

C.6.3 RI Error Table

When a new referential constraint is added to a table, the system generates a referential integrity (RI) error table. For a child CP table, this RI error table is a NoPI table (with neither column nor row partitioning).

C.6.4 REVALIDATE

The REVALIDATE option regenerates the table headers for a table or join index that has partitioning. Also, the REVALIDATE option verifies and corrects the row partitioning of rows in a table if the WITH DELETE or WITH INSERT option is specified (note that a WITH option is not allowed for a join index). Also, some columns in the data dictionary are revalidated and corrected as needed for a table or join index (whether partitioned or not).

1. If the table or join index has a primary index, PRIMARY INDEX may be specified following REVALIDATE (but is not required to be specified).

2. If a WITH clause is specified for a table, the table must have row partitioning.

3. A WITH clause may not be specified for a join index.

4. PIColumnCount of the DBC.TVM row for the table or join index with a primary index is set to the number of primary index columns if it is currently 0.

5. The PartitioningLevels of the DBC.TVM row for the table or join index with (row and/or column) partitioning is set to the number of partitioning levels if it is currently 0.

6. The PartitioningColumn of DBC.TVFields row for a partitioning column of the table or join index is set to ‘Y’ if it is currently ‘N’.

7. The DefinedCombinedPartitions, MaxCombinedPartitions, and PartitioningLevels of the partitioning constraint row in DBC.TableConstraints for a partitioned table or join index are set to their appropriate values if they are zero.

After an upgrade, the data dictionary columns above are set to 0 for tables and join indexes created before these columns were added to the data dictionary. ALTER TABLE ... REVALIDATE can be submitted to correct the values of these columns for a table or join index.



C.6.5 MODIFY Primary Index and/or Partitioning

ALTER TABLE … MODIFY statement may be used to alter the primary index (including to specify NO PRIMARY INDEX) and/or the partitioning (including specifying NOT PARTITIONED). Other than for allowed forms of dropping and adding ranges of a RANGE_N partitioning, the table must be empty to do the modification.

C.6.6 ALTER TABLE ... TO CURRENT Statement

For an ALTER TABLE … TO CURRENT statement, new partitioning expressions with newly resolved current date and timestamp are defined.

Note that there is an impact to performance if the table or join index has column partitioning in addition to row partitioning since it can be more costly to move a row from one row partition to another.

C.6.7 ALTER TABLE Examples

Note that, in the following examples, a resulting definition is not necessarily what SHOW TABLE would display but it would create an equivalent table.

Example 1

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79) ) UNIQUE PRIMARY INDEX(o_orderkey);

INSERT INTO Orders (1, NULL, NULL, 12.99, CURRENT_TIMESTAMP, NULL);

Attempt to alter the table to not have a primary index and to add column and row partitioning:

ALTER TABLE Orders MODIFY NO PRIMARY INDEX PARTITION BY ( COLUMN ADD 5, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) );

5735 The primary index columns may not be altered for a nonempty table.

Example 2

The following deletes the rows in the table, alters the table to not have a primary index, alters the table to add column and row partitioning, and adds a secondary index:

DELETE Orders ALL; -- the table must be emptied in order to do the following ALTER

ALTER TABLE Orders MODIFY NO PRIMARY INDEX PARTITION BY ( COLUMN ADD 5, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) );

CREATE UNIQUE INDEX(o_orderkey) on Orders;

This results in a table definition as follows:

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL,



o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79) ) PARTITION BY ( COLUMN ADD 5, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Each column is in its own single-column partition with system-determined COLUMN format and autocompression.

Example 3

Either of the two following ALTER TABLE statements adds a single-column partition (with system-determined COLUMN format and autocompression) to the table:

ALTER TABLE Orders ADD (o_salesperson VARCHAR(5));

ALTER TABLE Orders ADD o_salesperson VARCHAR(5);

These both result in a table definition as follows (with o_salesperson set to NULL in each row of the table):

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79), o_salesperson VARCHAR(5) ) PARTITION BY ( COLUMN ADD 4, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Example 4

Either of the two following ALTER TABLE statements adds a column into the single-column partition for o_orderstatus and modifies the partition to be a multicolumn partition (with system-determined COLUMN format and autocompression):

ALTER TABLE Orders ADD o_ordersubstatus CHAR(1) CASESPECIFIC INTO o_orderstatus;

ALTER TABLE Orders ADD (o_ordersubstatus CHAR(1) CASESPECIFIC) INTO o_orderstatus;

These both result in a table definition as follows:

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79), o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC ) PARTITION BY ( COLUMN ALL BUT ((o_orderstatus, o_ordersubstatus)) ADD 4, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00'



AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Example 5

Any of the following ALTER TABLE statement can be used to add two columns into the single-column partition for o_comment and modify the partition to be a multicolumn partition (with system-determined COLUMN format and autocompression):

ALTER TABLE Orders ADD (o_comment_ext1 VARCHAR(79), o_comment_ext2 VARCHAR(79)) INTO o_comment;

ALTER TABLE Orders ADD o_comment_ext1 VARCHAR(79) INTO o_comment, ADD o_comment_ext2 VARCHAR(79) INTO o_comment;

ALTER TABLE Orders ADD (o_comment_ext1 VARCHAR(79)) INTO o_comment, ADD (o_comment_ext2 VARCHAR(79)) INTO o_comment;

ALTER TABLE Orders ADD (o_comment_ext1 VARCHAR(79)) INTO o_comment, ADD o_comment_ext2 VARCHAR(79) INTO o_comment;

ALTER TABLE Orders ADD o_comment_ext1 VARCHAR(79) INTO o_comment, ADD (o_comment_ext2 VARCHAR(79)) INTO o_comment;

These all result in a table definition as follows:

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79), o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC, o_comment_ext1 VARCHAR(79), o_comment_ext2 VARCHAR(79) ) PARTITION BY ( COLUMN ALL BUT ( (o_orderstatus, o_ordersubstatus), (o_comment, o_comment_ext1, o_comment_ext2) ) ADD 4, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Example 6

The following ALTER TABLE statement adds a two-column partition to the table (with user-specified ROW format and no autocompression):

ALTER TABLE Orders ADD ROW(o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200)) NO AUTO COMPRESS;

This results in a table definition as follows (the new columns are set to NULL in each row of the table):

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79), o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC,



o_comment_ext1 VARCHAR(79), o_comment_ext2 VARCHAR(79), o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200) ) PARTITION BY ( COLUMN ALL BUT ( (o_ o_orderstatus, o_ordersubstatus), (o_comment, o_comment_ext1, o_comment_ext2), ROW(o_ship_addr, o_bill_addr) NO AUTO COMPRESS) ) ADD 3, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Example 7

Either of the two following ALTER TABLE statements adds a two-column partition and a single-column partition to the table (with user-specified COLUMN format for the first, system-determined COLUMN format for the second, and autocompression for both):

ALTER TABLE Orders ADD COLUMN(o_alt_ship_addr VARCHAR(500), o_alt_bill_addr VARCHAR(200)), ADD (o_item_count INTEGER);

ALTER TABLE Orders ADD COLUMN(o_alt_ship_addr VARCHAR(500), o_alt_bill_addr VARCHAR(200)), ADD o_item_count INTEGER;

These both result in a table definition as follows:

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79), o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC, o_comment_ext1 VARCHAR(79), o_comment_ext2 VARCHAR(79), o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200), o_alt_ship_addr VARCHAR(500), o_alt_bill_addr VARCHAR(200), o_item_count INTEGER ) PARTITION BY ( COLUMN ALL BUT ( (o_orderstatus, o_ordersubstatus), (o_comment, o_comment_ext1, o_comment_ext2), ROW(o_ship_addr, o_bill_addr) NO AUTO COMPRESS, COLUMN(o_alt_ship_addr, o_alt_bill_addr) ) ADD 1, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX (o_orderkey);

Example 8

The following ALTER TABLE statement drops a column from a three-column partition and modifies the partition to be a two-column partition (with system-determined COLUMN format and autocompression):

ALTER TABLE Orders DROP o_comment_ext2;




CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_comment VARCHAR(79), o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC, o_comment_ext1 VARCHAR(79), o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200), o_alt_ship_addr VARCHAR(500), o_alt_bill_addr VARCHAR(200), o_item_count INTEGER ) PARTITION BY ( COLUMN ALL BUT ( (o_orderstatus, o_ordersubstatus), (o_comment, o_comment_ext1), ROW(o_ship_addr, o_bill_addr) NO AUTO COMPRESS, COLUMN(o_alt_ship_addr, o_alt_bill_addr) ) ADD 1, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Example 9

The following ALTER TABLE statement drops a column from a two-column partition and modifies the partition to be a single-column partition (with system-determined COLUMN format and autocompression):

ALTER TABLE Orders DROP o_comment;


CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC, o_comment_ext1 VARCHAR(79), o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200), o_alt_ship_addr VARCHAR(500), o_alt_bill_addr VARCHAR(200), o_item_count INTEGER ) PARTITION BY ( COLUMN ALL BUT ( (o_orderstatus, o_ordersubstatus), ROW(o_ship_addr, o_bill_addr) NO AUTO COMPRESS, COLUMN(o_alt_ship_addr, o_alt_bill_addr) ) ADD 1, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);



Example 10

The following ALTER TABLE statement drops a column from a partition and, since it is the last column in the partition, the partition is also dropped:

ALTER TABLE Orders DROP o_comment_ext1;


CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC, o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200), o_alt_ship_addr VARCHAR(500), o_alt_bill_addr VARCHAR(200), o_item_count INTEGER ) PARTITION BY ( COLUMN ALL BUT ( (o_orderstatus, o_ordersubstatus), ROW(o_ship_addr, o_bill_addr) NO AUTO COMPRESS, COLUMN(o_alt_ship_addr, o_alt_bill_addr) ) ADD 2, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);

Example 11

The following ALTER TABLE statement drops two columns from a partition and, since that would leave the partition with no columns, the partition is also dropped:

ALTER TABLE Orders DROP o_alt_ship_addr, DROP o_alt_bill_addr;


CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_orderstatus CHAR(1) CASESPECIFIC, o_totalprice DECIMAL(13,2) NOT NULL, o_ordertsz TIMESTAMP(6) WITH TIME ZONE NOT NULL, o_salesperson VARCHAR(5), o_ordersubstatus CHAR(1) CASESPECIFIC, o_ship_addr VARCHAR(500), o_bill_addr VARCHAR(200), o_item_count INTEGER ) PARTITION BY ( COLUMN ALL BUT ( (o_orderstatus, o_ordersubstatus), ROW(o_ship_addr, o_bill_addr) NO AUTO COMPRESS ) ADD 3, RANGE_N(o_ordertsz BETWEEN TIMESTAMP '2003-01-01 00:00:00.000000+00:00' AND TIMESTAMP '2009-12-31 23:59:59.999999+00:00' EACH INTERVAL '1' MONTH) ), UNIQUE INDEX(o_orderkey);



C.7 COLLECT/DROP/HELP STATISTICS Statements

These statements are applicable for a CP table or join index, optionally, with row partitioning.

In regard to the system-derived column PARTITION, the value for the column partition number is always one so collected statistics are for the number of logical rows in each combined row partition if there is at least one row-partitioning level for the table. If there is a column-partitioning level, requesting the collection of statistics for the system-derived column PARTITION causes the statistics to be collected for the system-derived column PARTITION#Ln where n corresponds to the column-partitioning level (see next paragraph); in this case, if there is no row partitioning, statistics for PARTITION are not collected but, if there is also row partitioning, statistics for PARTITION are collected.

Collecting statistics on a system-derived column PARTITION#Ln is allowed when n corresponds to a column-partitioning level of the table. This collects the compression ratio of each column partition instead of getting row counts.

Collecting statistics on a system-derived column PARTITION#Ln is not allowed when n corresponds to a row-partitioning level of the table.



Appendix D: Loading a Column-Partitioned Table

The expected method of populating a CP table is an INSERT-SELECT statement from one or more source tables. If the data is from an external source, FastLoad can be used to load the data to a staging table29 and then populate the CP table with an INSERT-SELECT statement. Alternatively, to populate a CP table from an external source, TPump array INSERT may be used to insert data into the CP table. However, TPump is not expected to be as efficient as a FastLoad and INSERT-SELECT statement.

Single-row INSERT statements are generally not recommended except possibly in very low volume; these statements can incur a large degradation in performance due to transforming the row to append a column partition value to each of the column partitions.

Inserting data into a CP table is expected to gain some efficiency over a PI table because the data is just appended to the end of column partitions or combined partitions and the source rows are not required to be in any particular order. However, this can be negatively offset for a CP table by the transformation from rows to columns. Bulk insert via an INSERT-SELECT statement or TPump array INSERT can minimize this impact for CP tables since they can apply the transformation of multiple rows to columns as a set instead of individually. TPump array insert can group as many data rows as allowed in an AMP step because the rows are not required to be sent to a specific AMP.

The transformation from rows to columns performs better if the number of column partitions (not including the delete column partition) does not exceed the number of available column partition contexts; otherwise, additional scans of the source data may be needed.

For INSERT-SELECT (without a HASH BY clause) into a NoPI (including a CP) target table, data from the source table or spool is not redistributed and is locally appended into the target NoPI table; this results in skew in the target table if the source is skewed (see section F.4.3) for more information about skew). The source can be skewed either after a series of joins or after applying single-table predicates. The HASH BY option can be used to redistribute the rows to a NoPI table (including a CP table) in order to avoid skewing by specifying RANDOM or by properly choosing expressions by which to hash (see section 6.1).

Fallback and index maintenance on a CP table is done the same way as that on a PI table.

Many considerations for inserting data into a NoPI table without column partitioning apply to a CP table since it also a NoPI table:

1. Without a primary index, rows are not hashed based on any column in a CP table. However, a hash bucket is still generated for each row in a CP table by selecting a hash bucket that an AMP owns from the NoPI hash map. This hash bucket is then used to generate a rowid.30 This strategy helps make fallback and index maintenance similar to the maintenance on a PI table.

2. For single-statement INSERT, multistatement INSERT, and array INSERT into a CP table, rows are sent to the AMPs through a random number generator. This generator is designed in such a way that for a new request, data is generally sent to a different AMP

29 Usually this would be a nonpartitioned NoPI table, though a PI or PPI may be used in some cases. 30 Each physical row stored in a table has a row header that includes a rowid that is a unique identifier of

that row in the table. For an explanation of rowids, see section 3.3 and section 3.4.



from the one that the previous request sent data. The idea is to try to balance out the data among the AMPs as much as possible without the use of a primary index.

3. For INSERT-SELECT into a CP target table, the SELECT part can be a simple SELECT (retrieving all data from the source table) or a complex SELECT (retrieving data from one or more source tables). Spooling the source table before inserting can be avoided for a simple SELECT in some cases.

4. Data can be sent to any AMP in whole blocks prior to inserting into a target CP table. There is no data redistribution for each individual row. This is beneficial for TPump Array INSERT.

5. The hash bucket for each row in a CP table is internally controlled and generated. Because of that, values from the rows are always appended at the end of the column partitions (if the table does not also have row partitioning) or combined partition (if the table also has row partitioning) and never inserted in the middle of a hash bucket. Sorting of the rows is therefore avoided if the CP table does not have row partitioning.

For bulk INSERT, there are two main performance advantages:

1. No sorting (if there is no row partitioning)

Note that, although TPump Array INSERT does not have an explicit sort phase of rows in the array that is being inserted into a PI table, there is an implicit in-memory sort of the data by the File System. This sort work is also avoided when data is appended to the end of the table for a NoPI or CP table that does not have row partitioning.

2. Data can be sent to and stored on any AMP

Another performance advantage is that there is no requirement that a NoPI row must be stored on any particular AMP.31 This is very useful for TPump Array INSERT because data sent in a buffer from TPump can all be combined into the same step going to the same AMP for insertion. On a PI table, data is generally split out into multiple steps targeting the AMP destination to which the rows are hashed. The performance impact increases as the size of a system, in term of the number of AMPs, increases. With fewer steps to process, CPU and I/O are reduced.

For single-row INSERT, the performance for a NoPI table compared to a PI table is not much different. Appending a row into a NoPI table is perhaps a little more efficient than inserting a row into a PI table in the middle of a hash. But the end-transaction processing including the flushing of the WAL log still remains the same. And for a CP table, there is the additional work to transform the row into the column partitions.

An INSERT-SELECT is handled in one of the following ways depending on the source, target, and number of available column partition contexts:

1. If the source is a NoPI table (without column partitioning) that is the same as a target row-partitioned CP table other than for the partitioning and the options HASH BY/LOCAL ORDER BY are not specified for the INSERT-SELECT, a number of data blocks are read at a time into a cache (which is sized based on the number of available column partition contexts). The rows in memory are sorted by the row partitioning of the target CP table. All the values of one column partition at a time are appended to their respective combined partitions. This is repeated until all the data blocks of the source NoPI table have been read. Note that this can work well when the cache is large

31 For a PI table, data is hashed by the PI value. The hash bucket portion of the generated hash value

determines on which AMP a row is stored.



enough (this avoids re-reading the source if there are insufficient available column partition contexts); however, if the cache is not large enough, performance can degrade sharply if not enough data is written to each column partition before having to write to another column partition. If this occurs, a user can specify HASH BY/LOCAL ORDER BY for the INSERT-SELECT (see section 6.1) to cause the NoPI table to be spooled and one of the following two methods to be used.

2. Otherwise, if there are sufficient available column partition contexts for the target CP table (see section 11.4), the source rows (after spooling as needed) are read a block-at-a-time, and then for each row in the block, and then for each column partition, a column partition value is built and appended to the last container (if not one, a new container is started) of the corresponding internal partition or, if the column partition has ROW format, the subrow is written out. A new container is started for the column partition when the current container is full or the combined row partition number, if any, changes. The buffer of containers is written when it is full.

3. Otherwise, if there are insufficient available column partition contexts, the source is read for one set of column partitions and then read again for another set of column partitions until all the column partition values have been appended. This eliminates multiple writes of the last container of internal partitions at the cost of multiple reads of the source but reading is much less costly than writing.

D.1 Load Utilities

The load and unload utilities other than FastLoad and MultiLoad are supported for a no primary index table with column partitioning and, optionally, row partitioning in the same manner and with the same usage rules as for a no primary index table and, in regard to row partitioning, a row-partitioned table.

FastLoad is not supported for a CP table since it would require a staging area to store the rows before splitting them up into column partitions.

MultiLoad is not supported for a CP table since it is also a NoPI table and a target table must have primary index in order to perform an IMPORT task. Note that MultiLoad does not use the primary index of the target table in order to perform a DELETE task32 but MultiLoad does not currently allow a CP or NoPI table as the target table in this case.

Note that FastLoad, MultiLoad, or TPump could be used to load into a staging table and then the data from the staging table can be moved into a CP table using an INSERT-SELECT statement.

D.2 TPump Array INSERT into a CP Table

TPump Array INSERT uses a specialized SQL Server-Client protocol that allows multiple data parcels to be passed in a single request. When the Server detects the presence of multiple data parcels in a request, it iterates the execution of the request for each data parcel by binding the request’s user data references on each iteration to the data record passed in the corresponding data parcel.

32 A full-table scan is used to search for the rows to be deleted.



There are two main performance benefits from using Array INSERT:

1. Request-Text Size

The request text in an iterated request such as an Array INSERT request specifies one single instance of an INSERT statement and therefore is independent of the number of times the INSERT operation is iterated. When a USING modifier is used, the USING clause only specifies the fields that are referenced by one single iteration. Therefore, the size of the request text for an iterated request can be greatly reduced when compared to an equivalent multistatement INSERT request. The smaller request text reduces the cost of building the request from the Client and sending it to the Server. The smaller request text also allows more data parcels per request and requires less cache space in the Server and helps reduce the cost of searching the cache.

2. Multiple-Row INSERT

For an Array INSERT request, multiple rows can be grouped together by AMP ownership into a single AMP step. The processing of the rows is optimized by performing the inserts in the same step instead of multiple steps. This helps reduce both CPU path (because less steps are generated and sent/received between the PE and the AMP) and I/O (because more rows can be inserted with one single I/O call). The performance impact is directly dependent on the number of rows that can be grouped together in the same AMP step. That depends on several factors:

a) The total number of data records in the request – a higher PACK factor leads to more rows in a request which leads to an increase in the average number of rows in the same step.

b) The system configuration, particularly the number of AMPs – smaller systems with fewer AMPs increase the average number of rows in the same step with the same PACK factor. As the number of AMPs increases, the average goes down.

c) The clustering of inserts with same NUPI values – Inserts into NUPI table with a lot of duplicate NUPI values makes the average go up. This is because rows with the same NUPI value all go to the same AMP.

Since rows in a CP or NoPI table are not hashed based on a primary index, they can go to any AMP as desired. To maximize this benefit, TPump Array INSERT on a CP or NoPI table always packs as many rows as there are in the request from the Client into the same AMP step. These rows are sent to the same AMP in one INSERT step. This is independent of the system configuration and the clustering of data. Note that the sending of the rows in a TPump job that are to be inserted into a CP or NoPI table uses a random number generator that determines the AMP destination for each INSERT step. The random number generator is designed to choose a different AMP from the one that the previous request sent data so that data is spread out across the AMPs to avoid skewing.

With a constant PACK factor, the number of rows that are packed into the same AMP step on a PI table depends on the number of AMPs in the system and the clustering of data. As the number of AMPs increases and the clustering of data decreases, the number of rows that are packed into the same AMP step decreases. For a UPI table, the clustering of data is generally minimal. For a NUPI table, the clustering of data can be high and in that case, the number of rows that are packed into the same AMP step is also high independent of the number of AMPs in the system. Therefore, Array INSERT has the most benefit when comparing performance on a CP or NoPI table against a UPI table on large systems with many AMPs. Array INSERT does



not have as much benefit when comparing performance on a CP or NoPI table against a NUPI table with high clustering of data.

For a CP table without row partitioning, any improvement in performance can be offset by the processing to split the rows to be inserted into column partition values and append those values to the ends of the corresponding column partitions.

For a CP table with row partitioning, the rows to be inserted must be sorted in memory and then the rows must be split into column partition values which are appended to the end of the internal combined partitions to which they belong.

The performance of transforming rows into columns and appending column partition values is affected by whether there are at least as many available column partition contexts as column partitions and the number of column partition values (the more the better) being inserted into a combined partition at one time. PPICacheThrP (see section 11.4) is used by the optimizer to determine the number of available column partition contexts that can be used at one time to append column partition values to their column partitions.

D.2.1 SERIALIZE

Setting the SERIALIZE option to ON in the BEGIN LOAD statement serves two purposes:

1. Order of Data Application – rows are applied in the order that they occur in the input data source. This is done by using the KEY option to specify the primary index of the table to force rows with the same primary index value to go into the same session.

2. Avoid Hash Lock Contention – by forcing rows with the same primary index value to go into the same session, hash lock contention among multiple sessions can be reduced.

The SERIALIZE option is mostly important to a NUPI table, especially with highly nonunique data. There is some additional CPU cost on the Client side when SERIALIZE is set to ON.

For a CP or NoPI table, the traditional hash lock contention no longer applies since there is no PI on the table. If the order of data application is not important, SERIALIZE should always be set to OFF.

D.2.2 TPump Sessions

A hash lock on a CP or NoPI table usually locks all the rows on an AMP (see section 5.5). Typically, multiple TPump sessions running on the same AMP on the same CP or NoPI table block each other. Therefore, it is recommended that the number of TPump sessions is not more than the number of AMPs in the system.



Appendix E: EXPLAIN Phrases and Examples

This appendix provides additional information on EXPLAIN terminology and examples.

E.1 EXPLAIN Phrases

The following EXPLAIN phrases are used when referencing a CP table or join index. Some of the descriptions for these phrases mention the delete partition; for more information on the delete partition, see section 6.3. See section 11.4 for information about PPICacheThrP.

a single column partition of

This phrase indicates that one column partition of the table or join index may need to be accessed. For this phrase to occur, the CP table or join index would not have row partitioning and would be accessed via rowids from an index or rowid spool. A “using rowid Spool” phrase may follow the table or join index name; in this case, one column partition is accessed when using the rowid spool while m2 in the “using rowid Spool” phrase is the number of column partitions accessed to build the rowid spool.

This phrase is used in steps such as RETRIEVE, DELETE, JOIN, MERGE, etc. that may read a column-partitioned source.

For a delete, storage for any deleted rows (other than for LOBs) is not immediately reclaimed. A subsequent fastpath delete of all the rows of the table or join index reclaims the storage of all deleted rows in the table (including previously deleted rows).

m1 column partitions of

This phrase indicates that up to m1 column partitions of the CP table or join index may need to be accessed. There are column partition contexts available for each of the m1 column partitions. For this phrase to occur, the CP table or join index would not have row partitioning. m1 is 2 or greater. One of the column partitions accessed may be the delete column partition. Not all of the m1 column partitions may actually need to be accessed if no rows qualify. A “using rowid Spool” phrase may follow the table or join index name; in this case, m1 is the number of column partitions accessed when using the rowid spool while m2 in the “using rowid Spool” phrase is the number of column partitions accessed to build the rowid spool.


For a delete, storage for any deleted rows (other than for LOBs) is not immediately reclaimed. A subsequent fastpath delete of all the rows of the table or join index reclaims the storage of all deleted rows in the table (including previously deleted rows).

m1 column partitions (c1 contexts) of

This phrase indicates that up to m1 column partitions of the CP table or join index may need to be accessed using c1 column partition contexts. For this phrase to occur, the CP table or join index would not have row partitioning. m1 is 2 or greater. c1 is between 2 and m1-1 and is equal to or one less than the number of available column partition contexts. One of the column partitions accessed may be the delete column partition. Not all of the m1 column partitions may actually need to be accessed if no rows qualify. A “using CP merge Spool” or “using covering CP merge Spool” phrase follows the table or join index name.



This phrase is used in steps such as RETRIEVE, DELETE, JOIN, MERGE, etc. that may read a column-partitioned source. To access the m1 column partitions, columns from the column partitions are merged up to c1 column partitions at a time into a CP merge spool until all the projected columns have been retrieved for the qualifying rows. Note that one or more of the merges may require accessing the delete column partition; for each such merge, the delete column partition is included in m1.

Note that performance can degrade if c1 is much less than m1 and the retrieve is not very selective. In this case, consider decreasing the number of column partitions that need to be accessed, combining column partitions so there are fewer column partitions, or increasing PPICacheThrP if there is enough available memory to do so.

For a delete, storage for any deleted rows (other than for LOBs) is not immediately reclaimed. A subsequent fastpath delete of all rows of the table or join index reclaims the storage of all deleted rows in the table (including previously deleted rows).

a single combined partition of

This phrase indicates that one column partition of one row partition of the table or join index may need to be accessed. For this phrase to occur, the CP table or join index would have both row and column partitioning; also, the CP table or join index would be accessed via rowids. A “using rowid Spool” phrase may follow the table or join index name; in this case, one column partition is accessed when using the rowid spool while m2 in the “using rowid Spool” phrase is the number of column partitions accessed to build the rowid spool.

This phrase is used in steps such as RETRIEVE, JOIN, MERGE, etc. that may read a column-partitioned source. This would not occur for a delete step since there would be at least one user-specified column partition and the delete column partition that would need to be accessed.

n1 combined partitions (one column partition) of

This phrase indicates that one column partition of multiple row partitions of the table or join index may need to be accessed. For this phrase to occur, the CP table or join index would have both row and column partitioning; also, the CP table or join index would be accessed via rowids. A “using rowid Spool” phrase may follow the table or join index name; in this case, one column partition is accessed when using the rowid spool while m2 in the “using rowid Spool” phrase is the number of column partitions accessed to build the rowid spool.

This phrase is used in steps such as RETRIEVE, JOIN, MERGE, etc. that may read a CP source. This would not occur for a delete step since there would be at least one user-specified column partition and the delete column partition that are being accessed.

n1 combined partitions (m1 column partitions) of

This phrase indicates that the rows and columns for up to n1 combined partitions of the table or join index may need to be accessed. For the column-partitioning level, m1 column partitions may need to be accessed. There are column partition contexts available for each of the m1 column partitions. For this phrase to occur, the CP table or join index would have both row and column partitioning. One of the column partitions accessed may be the delete column partition. Not all of the m1 column partitions may need to be actually accessed if no rows qualify. A “using rowid Spool” phrase may follow the table or join index name; in this case, m1 is the number of column partitions



accessed when using the rowid spool while m2 in the “using rowid Spool” phrase is the number of column partitions accessed to build the rowid spool.


For a delete, storage for any deleted rows (other than for LOBs) is not immediately reclaimed. A subsequent fastpath delete of all the rows in a row partition reclaims the storage of all deleted rows in that row partition (including previously deleted rows in that partition). A subsequent fastpath delete of all the rows of the table or join index reclaims the storage of all deleted rows in the table (including previously deleted rows).

n1 combined partitions (m1 column partitions and c1 contexts) of

This phrase indicates that the rows and columns for up to n1 combined partitions of the table or join index may need to be accessed. For the column-partitioning level, m1 column partitions may need to be accessed. For this phrase to occur, the CP table or join index would have both row and column partitioning. For the column-partitioning level, c1 column partition contexts are used to access up to m1 column partitions. n1 and m1 are 2 or greater. c1 is between 2 and m1-1 and is equal to or one less than the number of available column partition contexts. One of the column partitions accessed may be the delete column partition. Not all of the m1 column partitions may need to be actually accessed if no rows qualify. A “using CP merge Spool” or “using covering CP merge Spool” phrase follows the table or join index name.

This phrase is used in steps such as RETRIEVE, DELETE, JOIN, MERGE, etc. that may read a column-partitioned source. To access the m1 column partitions, columns from the column partitions are merged up to c1 column partitions at a time into a CP merge spool until all the projected columns have been retrieved for the qualifying rows. Note that one or more of the merges may require accessing the delete column partition; for each such merge, the delete column partition is included in m1.

Note that performance can degrade if c1 is much less than m1 and the retrieve is not very selective. In this case, consider decreasing the number of column partitions that need to be accessed, combining column partitions so there are fewer column partitions, or increasing PPICacheThrP if there is enough available memory to do so.

For a delete, storage for any deleted rows (other than for LOBs) is not immediately reclaimed. A subsequent fastpath delete of all the rows in a row partition reclaims the storage of all deleted rows in that row partition (including previously deleted rows in that partition). A subsequent fastpath delete of all the rows of the table or join index reclaims the storage of all deleted rows in the table (including previously deleted rows).

using CP merge Spool q (one subrow partition and Last Use) using CP merge Spool q (s2 subrow partitions and Last Use) using CP merge Spool q (s1 + s2 subrow partitions and Last Use)

These three phrases indicate that a CP merge spool is created and used to merge some column partitions from the table or join index. Then some additional column partitions from the table or join index and the resulting one subrow column partition or s2 subrow column partitions of this CP merge spool are read. s1 indicates the number of intermediate subrow column partitions needed for the merges (and s1 + s2 indicates the number of merges needed); if s1 is not included, no intermediate subrow column partitions are needed for the merges (and also there is one or s2 merges needed). The number of merges needed depends on the number of column partitions that need to be



accessed and the number of available column partition contexts as indicated in a preceding “from” phrase. This is the last usage of this CP merge spool so it can be deleted. q is the spool number for the CP merge spool.

using covering CP merge Spool q (s2 subrow partitions and Last Use) using covering CP merge Spool q (s1 + s2 subrow partitions and Last Use)

These two phrases indicate that a CP merge spool is created and used to merge column partitions from the table or join index and then the resulting s2 subrow column partitions are read from the CP merge spool. s1 indicates the number of intermediate subrow column partitions needed for the merges (and s1 + s2 indicates the number of merges needed); if s1 is not included, no intermediate subrow column partitions are needed for the merges (and also there are s2 merges needed). The number of merges needed depends on the number of column partitions that need to be accessed and the number of available column partition contexts as indicated in a preceding “from” phrase. This is the last usage of this CP merge spool so it can be deleted. q is the spool number for the CP merge spool.

using CP merge Spool q (one subrow partition and Last Use) and rowid Spool k (Last Use)

using CP merge Spool q (s2 subrow partitions and Last Use) and rowid Spool k (Last Use)

using CP merge Spool q (s1 + s2 subrow partitions and Last Use) and rowid Spool k (Last Use)

These three phrases indicate that a CP merge spool is created and used to merge column partitions from the table and then some other column partitions from the table or join index and the resulting one subrow column partition or s2 subrow column partitions from the CP merge spool are read driven by the rowid spool. s1 indicates the number of intermediate subrow column partitions needed for the merges (and s1 + s2 indicates the number of merges needed); if s1 is not included, no intermediate subrow column partitions are needed for the merges (and also there is one or s2 merges needed). This is the last usage of both this CP merge spool and rowid spool so they can be deleted. q is the spool number for the CP merge spool and k is the spool number for the rowid spool. If rowid Spool k is generated by this step instead of a previous step, a ‘built from’ phrase follows this ‘using’ phrase.

using covering CP merge Spool q (s2 subrow partitions and Last Use) and rowid Spool k (Last Use)

using covering CP merge Spool q (s1 + s2 subrow partitions and Last Use) and rowid Spool k (Last Use)

These two phrases indicate that a CP merge spool is created and used to merge column partitions from the table or join index and then the resulting s2 subrow column partitions are read from the CP merge spool driven by the rowid spool. s1 indicates the number of intermediate subrow column partitions needed for the merges (and s1 + s2 indicates the number of merges needed); if s1 is not included, no intermediate subrow column partitions are needed for the merges (and also there are s2 merges needed). The number of merges needed depends on the number of column partitions that need to be accessed and the number of available column partition contexts as indicated in a preceding “from” phrase. This is the last usage of both this CP merge spool and rowid spool so they can be deleted. q is the spool number for the CP merge spool and k is the spool number for the rowid spool. If rowid Spool k is generated by this step instead of a previous step, a ‘built from’ phrase follows this ‘using’ phrase.



using rowid Spool k (Last Use)

This phrase indicates that a rowid spool contains the rowids of rows in the table that qualify and then the CP table or join index is read driven by this rowid spool. This is the last usage of this rowid spool so it can be deleted. k is the spool number for the rowid spool. If rowid Spool k is generated by this step instead of a previous step, a ‘built from’ phrase follows this ‘using’ phrase.

This phrase is used in steps such as RETRIEVE, JOIN, MERGE, etc. that may read a column-partitioned source.

built from m2 column partitions

This phrase indicates that the preceding rowid Spool k is built containing the rowids of rows in the table that qualify and then the CP table or join index is read driven by this rowid spool.

Up to m2 column partitions of the CP table or join index may need to be accessed to evaluate predicates and build the rowid spool. There are column partition contexts available for each of the m2 column partitions. For this phrase to occur, the CP table or join index would not have row partitioning. m2 is 2 or greater. One of the column partitions accessed may be the delete column partition. Not all of the m2 column partitions may actually need to be accessed if no rows qualify.


built from m2 column partitions (c2 contexts)

This phrase indicates that the preceding rowid Spool k is built containing the rowids of rows in the table or join index that qualify and then the CP table or join index is read driven by this rowid spool.

Up to m2 column partitions of the CP table or join index may need to be accessed using c2 column partition contexts to evaluate predicates and build the rowid spool. For this phrase to occur, the CP table or join index would not have row partitioning. m2 is 2 or greater. c2 is between 2 and m2-1 and is equal to or one less than the number of available column partition contexts. One of the column partitions accessed may be the delete column partition. Not all of the m2 column partitions may actually need to be accessed if no rows qualify.


built from n2 combined partitions (m2 column partitions)

This phrase indicates that the preceding rowid Spool k is built containing the rowids of rows in the table or join index that qualify and then the CP table or join index is read driven by this rowid spool.

The rows and columns for up to n2 combined partitions of the table or join index (which has both row and column partitioning in this case) may need to be accessed to evaluate predicates and build the rowid spool. There are column partition contexts available for each of the m2 column partitions. m2 is 2 or greater. One of the column partitions accessed may be the delete column partition. Not all of the m2 column partitions may actually need to be accessed if no rows qualify.




built from n2 combined partitions (m2 column partitions and c2 contexts)

This phrase indicates that the preceding rowid Spool k is built containing the rowids of rows in the table that qualify and then the CP table is read driven by this rowid spool.

The rows and columns for up to n2 combined partitions of the table or join index (which has both row and column partitioning in this case) may need to be accessed to evaluate predicates and build the rowid spool. For the column-partitioning level, c2 column partition contexts are used to access up to m2 column partitions. n2 and m2 are 2 or greater. c2 is between 2 and m2-1 and is equal to or one less than the number of available column partition contexts. One of the column partitions that may be accessed is the delete column partition. Not all of the m2 column partitions may need to be actually accessed if no rows qualify.


of n3 combined partitions

This phrase indicates that all the rows and columns in each of n3 combined partitions of a table or join index (which has both row and column partitioning in this case) are to be completely deleted as a fastpath delete and storage for the deleted rows is recovered. n3 is greater than one.

INSERT into a single column partition of

This phrase indicates that one or more rows are being inserted into a table or join index with only one user-specified column partition.

INSERT into m3 column partitions of

This phrase indicates that that one or more rows are being inserted into a table or join index with m3 column partitions. m3 is 2 or greater.

MERGE into a single column partition of MERGE into m3 column partitions of

The first of these two phrases indicates that rows are being inserted (merged) into a table or join index with only one user-specified column partition. The second phrase indicates that rows are being inserted (merged) into a column-partitioned table or join index with m3 user-specified column partitions where m3 is 2 or greater.

One of two methods may be used to insert the source rows into the column-partitioned table or join index:

1. One scan of the source rows is done to insert the rows. The column values from each source row that is read are buffered with one 128KB output buffer for each target column partition (a buffer is written to its corresponding column partition when it is full or all the source rows have been read). Note that the source rows either do not need to be sorted for the MERGE step or have already been sorted by previous a step. The phrase with buffered output occurs in the MERGE step for this method.



2. A buffer33 is filled with source rows and sorted. Then the rows in the buffer are scanned once for each column partition in order to insert the column partition values for that column partition. This is repeated until all the source rows have been read. This avoids a step to spool and sort the rows. This can be efficient if enough data is inserted for each set of buffered source rows into each combined partition when there is row partitioning for the CP table and the source is NoPI table. If not, this can cause rereading and rewriting of data blocks. This inefficiency can occur when only a few column partition values are going into a combined partition at a time. The phrase with buffered input and sorted input occurs in the MERGE step for this method.

MERGE into m3 column partitions (c3 contexts) of

This phrase indicates that rows are being inserted (merged) into a table or join index with m3 user-specified column partitions. m3 is 2 or greater. c3 is between 2 and m3.

One of two methods may be used to insert the source rows into the column-partitioned table or join index:

1. CEILING(m3/c3) scans of the source rows are done to insert the rows. The column values from each source row that is read are buffered with one 128KB output buffer for each target column partition being processed by a pass (a buffer is written to its corresponding column partition when it is full or all the source rows have been read). Note that the source rows either do not need to be sorted for the MERGE step or have already been sorted by previous a step. Also, the source must be spooled if the source is a table or join index with an access lock. Even though multiple passes over the source rows are needed, this method is often less costly than method 2. The phrase with buffered output occurs in the MERGE step for this method.

2. A buffer34 is filled with source rows and sorted. Then the rows in the buffer are scanned once for each column partition in order to insert the column partition values for that column partition. This is repeated until all the source rows have been read. This avoids a step to spool and sort the rows. Also, this avoids multiple scans of the source rows. This can be efficient if enough data is inserted for each set of buffered source rows into each combined partition when there is row partitioning. If not, this can cause rereading and rewriting of data blocks. This inefficiency can occur when only a few column partition values are going into a combined partition at a time. The phrase with buffered input and sorted input occurs in the MERGE step for this method.

Note the number of available column partition contexts is based on the size of the AMP’s FSG cache (for reading) or the AMP’s available memory (for writing), the estimated block size, and PPICacheThrP.

33 The size in bytes of this buffer is equal to the number of available column partition contexts times 128K. 34 The size in bytes of this buffer is equal to the number of available column partition contexts (that is, c3)

times 128K.



E.2 EXPLAIN Examples

Assume tables are created as follows:

CREATE TABLE t1 ( a INT, b INT, c INT, d INT, e INT, f INT, g INT, h INT, i INT, j INT, k INT, l INT, m INT, n INT, o INT, p INT, q INT, r INT, s INT, t INT, u INT, v INT, w INT, x INT, y INT, z INT) PARTITION BY COLUMN; CREATE TABLE t2 AS t1 WITH NO DATA PRIMARY INDEX (a, b); CREATE TABLE t3 AS t1 WITH NO DATA; CREATE TABLE t4 as t1 WITH NO DATA NO PRIMARY INDEX PARTITION BY (COLUMN, RANGE_N(b BETWEEN 1 AND 10 EACH 1)); CREATE TABLE t5 ( a INT, b INT, c INT, d INT, e INT, f INT, g INT, h INT, i INT, j INT) PARTITION BY COLUMN; CREATE TABLE t6 AS t5 WITH NO DATA PRIMARY INDEX (a, b);

For these examples, assume the requests are submitted to a system where there are 20 available column partition contexts35 unless otherwise noted. Note that an EXPLAIN given for a query in an example may be one of a number of possible plans that could be generated depending on the actual data demographics.

Example 1

EXPLAIN SELECT * FROM /*CP table*/ t1; *** Help information returned. 13 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t1. 2) Next, we lock PLS.t1 for read. 3) We do an all-AMPs RETRIEVE step from 27 column partitions (20 contexts) of PLS.t1 using covering CP merge Spool 2 (2 subrow partitions and Last Use) by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (614 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

Note that the count of 27 for the number of column partitions includes the 26 user-specified column partitions of the CP table and the delete column partition. 27 exceeds the number of available column partition contexts. So 20 column partitions (including the delete column partition) of the CP table are merged into the first subrow column partition of the CP merge spool. The remaining 7 column partitions that need to be accessed from the CP table are merged into the second subrow column partition of the CP merge spool (for a total of 2 subrow column partitions). This reduces the number of column partitions to be accessed at one time

35 In an actual production system, the number of available column partition contexts is usually higher.



(which is limited by the number of available column partition contexts). The result is then retrieved from the two subrow column partitions of the CP merge spool.

Example 2

EXPLAIN SELECT * FROM /*CP/RP table*/ t4 WHERE b BETWEEN 4 AND 5; *** Help information returned. 14 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t4. 2) Next, we lock PLS.t1 for read. 3) We do an all-AMPs RETRIEVE step from 54 combined partitions (27 column partitions and 20 contexts) of PLS.t4 using covering CP merge Spool 2 (2 subrow partitions and Last Use) with a condition of ("(PLS.t4.b <= 5) AND (PLS.t4.b >= 4)") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (307 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

This is similar to example 1 except only two row partitions of the CP/RP table need to be read. Since there are 27 column partitions and 2 row partitions to be accessed, there are 54 combined partitions to be accessed (that is, 2 row partitions from each of the 27 column partitions).

Example 3

EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r, s, t, u FROM /*CP/RP table*/ t4 WHERE b BETWEEN 4 AND 5; *** Help information returned. 14 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t4. 2) Next, we lock PLS.t1 for read. 3) We do an all-AMPs RETRIEVE step from 42 combined partitions (21 column partitions and 20 contexts) of PLS.t4 using covering CP merge Spool 2 (2 subrow partitions and Last Use) with a condition of ("(PLS.t4.b <= 5) AND (PLS.t4.b >= 4)") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (307 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

This is similar to example 2 except only 20 user-specified column partitions plus the delete column partition of the CP/RP table need to be read. Since there are 21 column partitions and 2 row partitions to be accessed, there are 42 combined partitions to be accessed (that is, 2 row partitions from each of the 21 column partitions).



Example 4

EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r, s, t FROM /*CP/RP table*/ t4 WHERE b BETWEEN 4 AND 5; *** Help information returned. 13 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t4. 2) Next, we lock PLS.t1 for read. 3) We do an all-AMPs RETRIEVE step from 40 combined partitions (20 column partitions) of PLS.t4 with a condition of ("(PLS.t4.b <= 5) AND (PLS.t4.b >= 4)") into Spool 1(all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (307 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

This is similar to example 3 except only 19 user-specified column partitions plus the delete column partition of the CP/RP table are read. Since there are 20 column partitions (which can all be read using the 20 contexts) and 2 row partitions to be accessed, there are 40 combined partitions to be accessed (that is, 2 row partitions from each of the 20 column partitions).

Example 5

EXPLAIN SELECT a, q FROM /*CP*/ table*/ t1 WHERE a>1 AND b<=1000 AND c=10 AND d=11 AND e<40 AND f<>87 AND g=92 AND h>=101 AND i=3000 AND j>=5 AND k=12 AND l=0 AND m=0 AND n IS NOT NULL AND o=1 AND p=-1 AND q=1 AND r<10 AND s=9 AND t>33 AND u=0 AND v=0 AND w=0 AND x>101 AND y=0 AND z=0; *** Help information returned. 20 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t1. 2) Next, we lock PLS.t1 for read. 3) We do an all-AMPs RETRIEVE step from 2 column partitions of PLS.t1 using rowid Spool 2 (Last Use) built from 27 column partitions (20 contexts) with a condition of ("(PLS.t1.a > 1) AND (PLS.t1.b <= 1000) AND (PLS.t1.c = 10) AND (PLS.t1.d = 11) AND (PLS.t1.e < 40) AND (PLS.t1.f <> 87) AND (PLS.t1.g = 92) AND (PLS.t1.h >= 101) AND (PLS.t1.i = 3000) AND (PLS.t1.j >= 5) AND (PLS.t1.k = 12) AND (PLS.t1.l = 0) AND (PLS.t1.m = 0) AND (PLS.t1.n IS NOT NULL) AND (PLS.t1.o = 1) AND (PLS.t1.p = -1) AND (PLS.t1.q = 1) AND (PLS.t1.r < 10) AND (PLS.t1.s = 9) AND (PLS.t1.t > 33) AND (PLS.t1.u = 0) AND (PLS.t1.v = 0) AND (PLS.t1.w = 0) AND (PLS.t1.x > 101) AND (PLS.t1.y =

0) AND (PLS.t1.z = 0)") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (307 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.



20 contexts are used to evaluate 19 predicates plus check the delete column partition in order to generate a rowid spool for qualifying logical rows. This rowid spool is used to drive processing 8 column partitions for the remaining predicates in order to generate rowid Spool 2. Rowid Spool 2 is used to drive processing 2 column partitions to project the column values to build the rows for Spool 1.

Example 6

EXPLAIN SELECT a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u FROM /*CP*/ table*/ t1 WHERE a>1 AND b<=1000 AND c=10 AND d=11 AND e<40 AND f<>87 AND g=92 AND h>=101 AND i=3000 AND j>=5 AND k=12 AND l=0 AND m=0 AND n IS NOT NULL AND o=1 AND p=-1 AND q=1 AND r<10 AND s=9 AND t>33 AND u=0 AND v=0 AND w=0 AND x>101 AND y=0 AND z=0; *** Help information returned. 21 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t1. 2) Next, we lock PLS.t1 for read. 3) We do an all-AMPs RETRIEVE step from 21 column partitions (20 contexts) of PLS.t1 using covering CP merge Spool 3 (2 subrow partitions and Last Use) and rowid Spool 2 (Last Use) built from 27 column partitions (20 contexts) with a condition of ("(PLS.t1.a > 1) AND (PLS.t1.b <= 1000) AND (PLS.t1.c = 10) AND (PLS.t1.d = 11) AND (PLS.t1.e < 40) AND (PLS.t1.f <> 87) AND (PLS.t1.g = 92) AND (PLS.t1.h >= 101) AND (PLS.t1.i = 3000) AND (PLS.t1.j >= 5) AND (PLS.t1.k = 12) AND (PLS.t1.l = 0) AND (PLS.t1.m = 0) AND (PLS.t1.n IS NOT NULL) AND (PLS.t1.o = 1) AND (PLS.t1.p = -1) AND (PLS.t1.q = 1) AND (PLS.t1.r < 10) AND (PLS.t1.s = 9) AND (PLS.t1.t > 33) AND (PLS.t1.u = 0) AND (PLS.t1.v = 0) AND (PLS.t1.w = 0) AND (PLS.t1.x > 101) AND (PLS.t1.y = 0) AND (PLS.t1.z = 0)") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (307 bytes). The estimated time for this step is 0.03 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

20 contexts are used to evaluate 19 predicates plus check the delete column partition in order to generate a rowid spool for qualifying logical rows. This rowid spool is used to drive processing 8 column partitions for the remaining predicates in order to generate rowid Spool 2. Rowid Spool 2 is used to drive processing 20 column partitions to project column values to build the first subrow partition in CP merge Spool 3. Rowid Spool 2 is then reused to drive processing 1 column partition to project column values to build the second subrow partition in CP merge Spool 3. The retrieve can then process these 2 subrow partitions to build the rows for Spool 1.

Example 7

EXPLAIN SELECT a, c, d, e, f, g, h, i, j, k, l, m, n, o FROM /*CP/RP table*/ t4 WHERE b>5 AND p=-1 AND q=1 AND r<10 AND s=9 AND t>33 AND u=0 AND v=0 AND w=0 AND x>101 AND y=0 AND z=0; *** Help information returned. 17 rows. *** Total elapsed time was 1 second.



Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t4. 2) Next, we lock PLS.t4 for read. 3) We do an all-AMPs RETRIEVE step from 14 column partitions of PLS.t4 using rowid Spool 2 (Last Use) built from 60 combined partitions (12 column partitions) with a condition of ("(PLS.t4.p = -1) AND ((PLS.t4.q = 1) AND ((PLS.t4.s = 9) AND ((PLS.t4.u = 0) AND ((PLS.t4.v = 0) AND ((PLS.t4.w = 0) AND ((PLS.t4.y = 0) AND ((PLS.t4.z = 0) AND ((PLS.t4.b > 5) AND ((PLS.t4.r < 10) AND ((PLS.t4.t > 33) AND (PLS.t4.x > 101)))))))))))") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (175 bytes). The estimated time for this step is 0.01 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.11 seconds.

13 contexts are used to evaluate 12 predicates plus check the delete column partition in order to generate rowid Spool 2 for qualifying logical rows (only 5 out of the 10 row partitions need to be scanned). Rowid Spool 2 is used to drive processing 14 column partitions to build the rows for Spool 1.

Example 8

For this example, assume the request (same as in example 7) is submitted to a system where there are only 10 available column partition contexts.

EXPLAIN SELECT a, c, d, e, f, g, h, i, j, k, l, m, n, o FROM /*CP/RP table*/ t4 WHERE b>5 AND p=-1 AND q=1 AND r<10 AND s=9 AND t>33 AND u=0 AND v=0 AND w=0 AND x>101 AND y=0 AND z=0; *** Help information returned. 19 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t4. 2) Next, we lock PLS.t4 for read. 3) We do an all-AMPs RETRIEVE step from 14 column partitions (10 contexts) of PLS.t4 using covering CP merge Spool 3 (2 subrow partitions and Last Use) and rowid Spool 2 (Last Use) built from 60 combined partitions (12 column partitions and 10 contexts) with a condition of ("(PLS.t4.p = -1) AND ((PLS.t4.q = 1) AND ((PLS.t4.s = 9) AND ((PLS.t4.u = 0) AND ((PLS.t4.v = 0) AND ((PLS.t4.w = 0) AND ((PLS.t4.y = 0) AND ((PLS.t4.z = 0) AND ((PLS.t4.b > 5) AND ((PLS.t4.r < 10) AND ((PLS.t4.t > 33) AND (PLS.t4.x > 101 )))))))))))") into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row (175 bytes). The estimated time for this step is 0.01 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.11 seconds.

10 contexts are used to evaluate 9 predicates plus check the delete column partition in order to generate a rowid spool for qualifying logical rows (only 5 out of the 10 row partitions need to be scanned). This rowid spool is used to drive processing 3 column partitions for the remaining



predicates in order to generate rowid Spool 2. Rowid Spool 2 is used to drive processing 10 column partitions to project column values to build the first subrow partition in CP merge Spool 3. Rowid Spool 2 is then reused to drive processing 4 column partitions to project column values to build the second subrow partition in CP merge Spool 3. The retrieve can then process these 2 subrow partitions to build the rows for Spool 1.

Example 9

EXPLAIN INSERT INTO /*CP table*/ t5 SELECT * FROM /*PI table*/ t6; *** Help information returned. 12 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t6. 2) Next, we lock a distinct PLS."pseudo table" for write on a RowHash to prevent global deadlock for PLS.t5. 3) We lock PLS.t6 for read, and we lock PLS.t5 for write. 4) We do an all-AMPs MERGE into 10 column partitions of PLS.t5 from PLS.t6 with buffered output. The size is estimated with no confidence to be 2 rows. The estimated time for this step is 0.71 seconds. 5) We spoil the parser's dictionary cache for the table. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1.

The 10 column partitions are the user-specified column partitions. The delete column partition is not affected. The MERGE step performs one scan of table t6.

Example 10

EXPLAIN LOCKING t6 FOR ACCESS EXPLAIN INSERT INTO /*CP table*/ t5 SELECT * FROM /*PI table*/ t6; *** Help information returned. 12 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for write on a RowHash to prevent global deadlock for PLS.t5. 2) Next, we lock a distinct PLS."pseudo table" for access on a RowHash to prevent global deadlock for PLS.t6. 3) We lock PLS.t5 for write, and we lock PLS.t6 for access. 4) We do an all-AMPs MERGE into 10 column partitions of PLS.t5 from PLS.t6 with buffered output. The size is estimated with no confidence to be 2 rows. The estimated time for this step is 0.71 seconds. 5) We spoil the parser's dictionary cache for the table. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1.

The 10 column partitions are the user-specified column partitions. The delete column partition is not affected. The MERGE step performs one scan of table t6. The source table t6 does not need to be spooled even though it has an access lock.



Example 11

EXPLAIN INSERT INTO /*CP table*/ t1 SELECT * FROM /*PI table*/ t2; *** Help information returned. 13 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t2. 2) Next, we lock a distinct PLS."pseudo table" for write on a RowHash to prevent global deadlock for PLS.t1. 3) We lock PLS.t2 for read, and we lock PLS.t1 for write. 4) We do an all-AMPs MERGE into 26 column partitions (20 contexts) of PLS.t1 from PLS.t2 with buffered output. The size is estimated with no confidence to be 2 rows. The estimated time for this step is 0.71 seconds. 5) We spoil the parser's dictionary cache for the table. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1.

Since there are only 20 available column partition contexts but 26 column partitions, the MERGE step performs two scans of table t2; one to append to the first 20 column partitions and another to append to the other 6 column partitions.

Example 12

EXPLAIN LOCKING t2 FOR ACCESS INSERT INTO /*CP table*/ t1 SELECT * FROM /*PI table*/ t2; *** Help information returned. 19 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for write on a RowHash to prevent global deadlock for PLS.t1. 2) Next, we lock a distinct PLS."pseudo table" for access on a RowHash to prevent global deadlock for PLS.t2. 3) We lock PLS.t1 for write, and we lock PLS.t2 for access. 4) We do an all-AMPs RETRIEVE step from PLS.t2 by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is spooled locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (614 bytes). The estimated time for this step is 0.01 seconds. 5) We do an all-AMPs MERGE into 26 column partitions (20 contexts) of PLS.t1 from Spool 1 (Last Use) with buffered output. The size is estimated with low confidence to be 2 rows. The estimated time for this step is 0.71 seconds. 6) We spoil the parser's dictionary cache for the table. 7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1. The total estimated time is 0.72 seconds.

Note that, in this case, the source table must be spooled before inserting into the target CP table since there is an access lock on table t1 so t1 is only read once. Also, since there are only 20 available column partition contexts but 26 column partitions, the MERGE step performs two scans of Spool 1; one to append to the first 20 column partitions and another to append to the other 6 column partitions.



Example 13

EXPLAIN INSERT INTO /*CP table*/ t1 SELECT * FROM /*CP table*/ t3; *** Help information returned. 19 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for write on a RowHash to prevent global deadlock for PLS.t1. 2) Next, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t3. 3) We lock PLS.t1 for write, and we lock PLS.t3 for read. 4) We do an all-AMPs RETRIEVE step from 27 column partitions (20 contexts) of PLS.t1 using CP merge Spool 2 by way of an all-rows scan with no residual conditions into Spool 1 (all_amps), which is spooled locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 2 rows (614 bytes). The estimated time for this step is 0.01 seconds. 5) We do an all-AMPs MERGE into 26 column partitions (20 contexts) of PLS.t1 from Spool 1 (Last Use) with buffered output. The size is estimated with no confidence to be 2 rows. The estimated time for this step is 0.71 seconds. 5) We spoil the parser's dictionary cache for the table. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1.

Since there are only 20 available column partition contexts but 26 column partitions, the MERGE step performs two scans of Spool 1; one to append to the first 20 column partitions and another to append to the other 6 column partitions. The source table t3 is spooled locally into Spool 1 in order to construct the rows to be inserted into t1.

Example 14:

EXPLAIN INSERT INTO /*CP/RP table*/ t4 SELECT * FROM /*NoPI table*/ t7; *** Help information returned. 13 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we lock a distinct PLS."pseudo table" for read on a RowHash to prevent global deadlock for PLS.t7. 2) Next, we lock a distinct PLS."pseudo table" for write on a RowHash to prevent global deadlock for PLS.t4. 3) We lock PLS.t7 for read, and we lock PLS.t4 for write. 4) We do an all-AMPs MERGE into 26 column partitions (20 contexts) of PLS.t4 from PLS.t7 with buffered input and sorted input. The size is estimated with no confidence to be 2 rows. The estimated time for this step is 0.71 seconds. 5) We spoil the parser's dictionary cache for the table. 6) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> No rows are returned to the user as the result of statement 1.

Since the source is a NoPI table and the target is column-partitioned table with row partitioning, the source rows are buffered and sorted in memory (per the target’s row partitioning) to avoid using an intermediate spool.



Example 15:

For an array insert, multiple column partition contexts are not needed so the EXPLAIN only specifies the number of column partitions. One column partition context is used to append the values for one column partition and then reused to append the values for each of the other column partitions.

EXPLAIN USING ( a INT, b INT, c INT, d INT, e INT, f INT, g INT, h INT, i INT, j INT, k INT, l INT, m INT, n INT, o INT, p INT, q INT, r INT, s INT, t INT, u INT, v INT, w INT, x INT, y INT, z INT) INSERT t1 VALUES ( :a, :b, :c, :d, :e, :f, :g, :h, :i, :j, :k, :l, :m, :n, :o, :p, :q, :r, :s, :t, :u, :v, :w, :x, :y, :z); *** Help information returned. 4 rows. *** Total elapsed time was 1 second. Explanation --------------------------------------------------------------------------- 1) First, we do an INSERT into 26 column partitions of PLS.t1. The estimated time for this step is 1.72 seconds. -> No rows are returned to the user as the result of statement 1. The total estimated time is 1.72 seconds.



Appendix F: Miscellaneous Topics

F.1 Column-Partitioned Table as a Source Table

Typically, a CP table is not expected to be used as source table; however, there may be some situations where a CP table is the source for an insert.

Target PI Table

Operations such as INSERT-SELECT, MERGE, UPDATE-FROM, and DELETE-FROM for a target PI table can run slower when the source is a CP table compared to a source PI table with the same primary index as the target. This is because, with a source CP table, selected rows must be reconstructed from the referenced columns of the column partitions, redistributed, and sorted locally on the AMPs. Note that copying data from one PI table to another PI table with a different primary index or partitioning also requires redistribution and AMP local sorting.

Target NoPI Table

Operations such as INSERT-SELECT, MERGE, UPDATE-FROM, and DELETE-FROM for a target NoPI table can run slower when the source is a CP table compared to a source NoPI or PI table. This is because, with a source CP table, selected rows must be reconstructed from the referenced columns of the column partitions.

Target CP Table

Operations such as INSERT-SELECT, MERGE, UPDATE-FROM, and DELETE-FROM for a target CP table can run slower when the source is a CP table compared to a source NoPI or PI table. This is because, with a source CP table, selected rows may need to be reconstructed from the referenced columns of the column partitions and this may require spooling the source CP table to reconstruct rows and then reading the spooled rows and converting to the column partitioning of the target CP table.

F.2 Archive and Restore

Users are permitted to archive, restore, and copy a CP table or join index and databases containing such tables and join indexes. The number of rows output for an archive is the number of table rows (not the number of physical rows for the table).

Selected partition archive, restore, and copy of a CP table or join index are not allowed.

See section F.4.1 for a discussion on skew issues for a restore/copy to a CP table.

F.3 CheckTable

CheckTable at level three performs duplicate checks on a MULTISET table when there is a unique index on the table. This check can be very expensive and can take a long time to complete when there are many rows in the table having the same hash value as in the case of a very highly nonunique PI table. For each hash value, CheckTable scans through the entire hash, row by row, and does the duplicate check on the unique index. Since a CP or NoPI table generally has only one hash value, it can be very slow for CheckTable to run on a CP or NoPI table if the table has a unique secondary index. Users should be aware of this CheckTable performance issue.



F.4 Data Skewing

For CP and NoPI tables, data is randomly sent to the AMPs by single-row INSERT statements, array INSERT statements, and TPump. The randomization, or choosing the destination where to send data, generally is not a problem as far as data skewing is concerned – data balances out among the AMPs.

However, there are several operations on a CP or NoPI table that can lead to data skewing. These are discussed in the following subsections.

F.4.1 Restore/Copy

When the AMP configuration of the source system and the target system are the same in a restore or copy, there is no data redistribution involved so rows originally stored in an AMP are restored or copied to that same AMP. In that case, the data demographics stay the same between the source system and the target system.

For a restore or copy of a NoPI or CP table, the hash bucket in a row coming from the source system does not change when it is stored in the target system. In addition, the NoPI table only uses hash buckets that never move to a new AMP with a Restore/Copy (or for Reconfig) if the target system has more AMPs.36

Therefore, any additional AMPs on the target system do not have any data at all after a restore or copy. In addition, rows for AMPs that do not exist in the target system are moved to the AMP that owns that hash bucket for those rows per the normal hash map.37 In either case, this makes the NoPI table skewed and can affect performance. Also, this means more space38 could be required for a NoPI table than a traditional PI table on the target system.

To avoid this issue when the target system has a different number of AMPs than the source system, INSERT-SELECT the NoPI table into a PI table for archive (and do not archive the NoPI table) and after the restore, INSERT-SELECT from the PI table into the NoPI table. For the primary index of the PI table, consider a PI that provides good distribution and, if appropriate, improves the effectiveness of autocompression for the insert back into the NoPI table.

If the NoPI table is populated from other source tables, another option is to archive the source tables (and not the NoPI table) and, after restore, repopulate the NoPI table from the source tables using an INSERT-SELECT (with a HASH BY clause as needed to provide good distribution). If the NoPI table is on the archive (for example, the archive was made prior to knowing system configuration or hash would be different), do not restore the NoPI table and after the restore of the source tables, populate the NoPI.

36 A special hash map is used specifically for NoPI (including CP) tables. The NoPI hash map only

contains hash bucket numbers that never move to a new AMP with Reconfig or Restore/Copy when the target system has more AMPs (note that some hash bucket numbers may move if the target system has fewer AMPs). For a 20-bit hash bucket, there are 64 such hash bucket numbers and, for a 16-bit hash bucket, there are 4 such hash bucket numbers.

37 Note that this creates a situation where rows are on AMPs where they do not belong according to the NoPI hash map. If the system is subsequently reconfigured to have more AMPs, the NoPI table is marked RCOAborted.

38 When the target system or configuration has more AMPs than the source system or configuration, target space allocation for the table is greater than or equal to the source space allocation * number of target AMPs / number of source AMPs When the target has fewer AMPs than the source, the target space allocation for the table is greater than or equal to 2 (or more) * source space allocation * number of target AMPs / number of source AMPs.



If neither is done and the NoPI table is restored in a skewed state, INSERT-SELECT with a HASH BY clause from the skewed NoPI table into another NoPI table and drop the original NoPI table.

If the NoPI table is a CP table, consider use of a LOCAL ORDER BY clause with the INSERT-SELECT to improve the effectiveness of autocompression.

F.4.2 Reconfig

Reconfig has a similar issue as Restore/Copy. As discussed in the previous section for Restore/Copy, rows for a NoPI or CP table are not moved to new AMPs in the configuration (they stay on their original AMP) if more AMPs are added. Rows for AMPs that are dropped are moved to the AMP that owns that hash bucket for those rows per the normal hash map.39 As with Restore and Copy, this makes a NoPI table skewed and can affect performance. Also, more space38 is required for a NoPI table to do the Reconfig than a traditional PI table.

To avoid this issue when the number of AMPs is changed, INSERT-SELECT the NoPI table into a PI table before the Reconfig and delete all the rows in the NoPI table. After the Reconfig, INSERT-SELECT from the PI table into the NoPI table. For the primary index of the PI table, consider a PI that provides good distribution and, if appropriate, improves the effectiveness of autocompression for the insert back into the NoPI table.

If the NoPI table is populated from other source tables, another option is to delete all the rows in the NoPI table and, after the Reconfig, repopulate the NoPI table from the source tables using an INSERT-SELECT (with a HASH BY clause as needed to provide good distribution).

If neither is done and the NoPI is in a skewed state after the Reconfig, INSERT-SELECT with a HASH BY clause from the skewed NoPI table into another NoPI table and drop the original NoPI table.

If the NoPI table is a CP table, consider use of a LOCAL ORDER BY clause with the INSERT-SELECT to improve the effectiveness of autocompression.

F.4.3 INSERT-SELECT (CP/NoPI Target Table)

When the target table of an INSERT-SELECT is a CP or NoPI table, data coming from the source table, whether it is directly from the source table or from an intermediate spool, is locally inserted into the target CP or NoPI table. Performance wise, this is very efficient since it avoids a redistribution and sort. But if the source table or the resulting spool is skewed, the target CP/NoPI table can also be skewed; in this case, a HASH BY clause can be used to redistribute the data from the source before the local copy.

For the expressions to hash on, consider ones that provide good distribution and, if appropriate, improves the effectiveness of autocompression for the insert back into the CP or NoPI table. Alternatively, use a HASH BY RANDOM clause for a good distribution if there is not a clear choice for the expressions to hash on.

When inserting into a CP table, also consider use of an LOCAL ORDER BY clause with the INSERT-SELECT to improve the effectiveness of autocompression.

39 Note that this creates a situation where rows are on AMPs where they do not belong according to the

NoPI hash map. If the system is subsequently reconfigured to have fewer AMPs and then later reconfigured to have more AMPs, a NoPI table is marked RCOAborted.



F.4.4 Down-AMP Recovery

When there is a down-AMP for a NoPI or CP table in a cluster of more than 2 AMPs and rows are inserted into the table while the AMP is down, the table may be skewed after the AMP is back online and recovered. That is, the previously down AMP and another AMP (used as the fallback AMP for the down AMP) may have a fewer proportion of the inserted rows than the other AMPs in the cluster. There is no problem when there are 2 AMPs per cluster. For example,

1. If there is a down-AMP in a 3 AMP cluster and 300,000 rows are sent to the AMPs in the cluster, after recovery the down-AMP will have 50,000 rows from the insert (1/6th of the rows), one of the other AMPs (which is designated as the fallback AMP for the down-AMP) will have 100,000 rows from the insert (1/3rd of the rows), and the third AMP will have 150,000 rows from the insert (1/2 of the rows).

2. If there is a down-AMP in a 4 AMP cluster and 300,000 rows are sent to the AMPs in the cluster, after recovery the down-AMP will have 25,000 rows from the insert (1/12th of the rows), one of the other AMPs (which is designated as the fallback AMP for the down-AMP) will have 75,000 rows from the insert (1/4th of the rows), and other two AMPs will each have 100,000 rows from the insert (1/3rd of the rows each).

If the skew is significant after down-AMP recovery, INSERT-SELECT with a HASH BY RANDOM clause from the skewed NoPI table into another NoPI table and drop the original NoPI table. If the NoPI table is a CP table, consider use of a LOCAL ORDER BY clause with the INSERT-SELECT to improve the effectiveness of autocompression.

The following explains why this skew occurs for NoPI and CP tables (this is optional and somewhat technical reading):

1. For a PI table when there is a down AMP in a cluster, an inserted row intended to go to that down AMP is re-routed to one of the down AMP’s fallback AMPs in the cluster based on the fallback hash bucket map. When the down AMP comes back online, the fallback AMPs send these rows to the previously down AMP as part of the recovery process. This synchronizes both primary and fallback data on all of the AMPs before the table is opened up for more work and the rows are evenly distributed (assuming a good PI).

2. For a CP or NoPI table, data is sent to the AMPs for insertion based on the hash bucket of a randomly generated hash value. When there is a down AMP in a cluster with more than 2 AMPs, multiple fallback AMPs in the cluster can receive data on behalf of the down AMP based on how a hash bucket is mapped to an AMP by the fallback hash bucket map.

A receiving AMP could append these rows to the fallback subtable for the down AMP using the hash bucket of the previous fallback row appended with the uniqueness of the previous row plus one.40 Since there is typically one hash bucket on an AMP being used for all of the rows in a NoPI table or in a combined partition for a CP table (see section 3.3 and section 3.4), multiple AMPs would be assigning uniqueness to rows for the same internal partition number and hash bucket and could thereby append rows with the same internal partition, hash bucket, and uniqueness, albeit on different AMPs.

40 For the first row inserted for an internal partition number, the hash bucket is the first hash bucket from

the NoPI hash map for the AMP with a uniqueness of one; if when inserting a new row, the uniqueness would exceed the maximum uniqueness, the next hash bucket from the NoPI hash map is used starting again with a uniqueness of one.



However, when the down AMP is back online, the rows from the fallback AMPs would be sent to the previously down AMP as part of the recovery and that AMP could receive rows that have the same internal partition number, hash bucket, and uniqueness. This is problematic since the internal partition number, hash bucket, and uniqueness are supposed to uniquely identify just one row.

Therefore to be able to handle a CP or NoPI recovery, only one AMP (which is designated as the fallback AMP for the down AMP) puts the rows intended for the down AMP into a fallback data subtable – that data goes to the down AMP when it is back online. The other online AMPs can still receive rows on behalf of a down AMP but those AMPs append these rows to their primary data subtable where they remain even after the down AMP is back online. This means that when there are more than 2 AMPs in a cluster and there is a down AMP while rows are being inserted into a CP or NoPI table, data in that CP or NoPI table can be skewed when the down AMP is back online. The previously down AMP ends up with smaller percentage of the inserted rows than its fair share, its fallback AMP ends with its fair share of the rows, while the other AMPs end up with a larger share of the rows than their fair share (see the examples on the previous page).

See also the Orange Book: No Primary Index Table User’s Guide.



Appendix G: Partitioning Meta Data

This appendix discusses statements, data dictionary tables and views, etc. relevant to obtaining information about CP tables and join indexes.

For tables or join indexes created prior to Teradata 14.0, an ALTER TABLE … REVALIDATE statement must be submitted for the table or join index in order to update the data dictionary (see section C.6.4).

G.1 System-Derived Column PARTITION[#Ln]

System-derived column PARTITION provides the combined partition number for a table row. System-derived columns PARTITION#Ln, where n is between 1 and 62, inclusive, provide the partition number of a table row for the specified level.

The partition number for the column-partitioning level for a table row is always 1. That is, if the table has column partitioning and no row partitioning, PARTITION and PARTITION#L1 both return 1 and PARTITION#Ln, where n is between 2 and 62, inclusive, returns 0. If the table has both column partitioning and row partitioning and the column partitioning is at level m, PARTITION#Lm is 1 and, for the column level partitioning, 1 is used as the column partition number in calculating the combined partition number value for PARTITION.

G.2 HELP COLUMN Statement

The following four attributes of interest are returned by the HELP COLUMN statement:

Partitioning Column

N indicates an attribute for an expression or not a partitioning column.

Y indicates partitioning column in a partitioning expression.

Column Partition Number

0 if attribute for an expression or not column partitioned; otherwise, the column partition number of the column partition in which the column belongs. Note that columns of a table or join index with the same column partition number belong to the same column partition.

Column Partition Format

NA = not applicable (i.e., an attribute for an expression or not column partitioned)

CS = system-determined COLUMN format

CU = user-specified COLUMN format

RS = system-determined ROW format

RU = user-specified ROW format

Column Partition AC

NA = not applicable (i.e., attribute for an expression or not column partitioned)

NC = no auto compress

AC = auto compress



G.3 SHOW TABLE, JOIN INDEX, and DML Statements

For a partitioned table or join index, an ADD option is included for a partitioning level if the level has column partitioning or if the number of defined partitions for a row-partitioning level is less than the maximum number of partitions for the level and the level is not the first level that has row partitioning.

For a CP table or join index, grouping and options, if any, is included in the COLUMN clause – not in the column definition list or select expression list. The column grouping and options in the COLUMN clause are output to be as short as possible.

Example

BTEQ -- Enter your SQL request or BTEQ command: CREATE TABLE t1 (a INT, b INT, c INT, d INT, e INT, f INT, g INT, h INT, i INT, j INT, k INT, l INT, m INT, n INT, o INT, p INT, q INT, r INT, s INT, t INT, u INT, v INT, w INT, x INT, y INT, z INT) NO PRIMARY INDEX PARTITION BY COLUMN ALL BUT (a, b, (g, d), ROW(s, t, j), h NO AUTO COMPRESS, x);

*** Table has been created. *** Total elapsed time was 2 seconds.

BTEQ -- Enter your SQL request or BTEQ command: SHOW TABLE t1;

*** Text of DDL statement returned. *** Total elapsed time was 1 second.

--------------------------------------------------------------------------- CREATE MULTISET TABLE PLS.t1 ,NO FALLBACK , NO BEFORE JOURNAL, NO AFTER JOURNAL, CHECKSUM = DEFAULT, DEFAULT MERGEBLOCKRATIO ( a INTEGER, b INTEGER, c INTEGER, d INTEGER, e INTEGER, f INTEGER, g INTEGER, h INTEGER, i INTEGER, j INTEGER, k INTEGER, l INTEGER, m INTEGER, n INTEGER, o INTEGER, p INTEGER, q INTEGER, r INTEGER, s INTEGER, t INTEGER, u INTEGER, v INTEGER, w INTEGER, x INTEGER, y INTEGER, z INTEGER) NO PRIMARY INDEX PARTITION BY COLUMN ALL BUT ((d, g), h NO AUTO COMPRESS, ROW(j, s, t)) ADD 65509;



G.4 HELP INDEX Statement

A table or join index with column partitioning must not have a primary index and a HELP row would not be returned for a primary index for such a table or join index. Therefore, the HELP INDEX statement cannot be used to determine whether or not the table or join index is column partitioned.

Rows are returned for secondary indexes on a table or join index with column partitioning.

G.5 DBC.TVM System Table

The value stored in DBC.TVM.TableKind for a partitioned table is the current kind for permanent tables of ‘T’ and for a partitioned join index is the current kind for join indexes of ‘I’.

Two columns of interest are included in the DBC.TVM system table:

PIColumnCount

0 indicates no primary index; otherwise, this indicates the number of columns in the primary index. Note that, for a table with no primary index, TableKind is ‘O’ if the table is not partitioned and is ‘T’ if the table is partitioned. For a join index that does not have a primary index, TableKind is ‘I’.

PartitioningLevels

Indicates the number of partitioning levels for the table, join index, or primary index (a value between 0 and 62, inclusive). 0 indicates not partitioned.

G.6 DBC.TablesV[X] System View

These views include the two columns discussed for DBC.TVM in the previous section.

G.7 DBC.TVFields System Table

Four columns of interest are included in DBC.TVFields:

PartitioningColumn

N indicates not a partitioning column.

Y indicates partitioning column of a partitioning expression.

ColumnPartitionNumber

0 if not column partitioned; otherwise, the column partition number of the column partition in which the column belongs. Note that columns of a table or join index with the same column partition number belong to the same column partition.

ColumnPartitionFormat

NA = not applicable (i.e., not column partitioned)

CS = system-determined COLUMN format

CU = user-specified COLUMN format

RS = system-determined ROW format

RU = user-specified ROW format

ColumnPartitionAC

NA = not applicable (i.e., not column partitioned)

NC = no auto compress

AC = auto compress



G.8 DBC.ColumnsV[X] System View

These views include the four columns discussed for DBC.TVFields (see previous section).

G.9 DBC.TableConstraints System Table

This table contains one row for each table-level check constraint and each partitioning constraint (formerly referred to as an index constraint) in the system.

Seven columns are of interest in DBC.TableConstraints:

DefinedCombinedPartitions

Indicates the number of currently defined combined partitions. This is the product of the number of defined partitions for each level (the number of defined partitions for a column-partitioning level includes the 2 internal use partitions). This is 0 if ConstraintType is not ‘Q’. If the partitioning is altered when a table is nonempty, this value can change to be smaller or larger as long as it does not exceed MaxCombinedPartitions.

MaxCombinedPartitions

Indicates the maximum number of combined partitions allowed. This is the product of the maximum number of partitions for each level; the maximum number of partitions for a partitioning level is the currently defined number of partitions for this level and the number of partitions that could be added to the level (the number of defined partitions for a column-partitioning level includes the 2 internal use partitions). If ConstraintType is not ‘Q’, this is 0. This value is greater than or equal to DefinedCombinedPartitions and is 0 if DefinedCombinedPartitions is 0. This value cannot change for a nonempty table.

PartitioningLevels

Indicates the number of partitioning levels for the table, join index, primary index (a value between 1 and 62, inclusive) if ConstraintType is ‘Q’; otherwise, this is 0.

ColumnPartitioningLevel

Indicates the level number for the column-partitioning level (a value between 1 and 62, inclusive). This is 0 if ConstraintType is not ‘Q’ or there is no column partitioning.

lndexName IndexNumber

For a partitioning constraint of a column-partitioned table, the IndexName and IndexNumber columns are NULL since there is no associated index (a column-partitioned table does not have a primary index).

TableCheck

Prior to Teradata 14.0, a ConstraintType is ‘Q’ indicated an index constraint for table. In this case, the TableCheck column of this table contained the unresolved constraint text generated from a primary index’s partitioning expressions. Since the purpose of the index constraint is actually specific to partitioning and a table with column partitioning has no primary index, ConstraintType value of ‘Q’ is changed to mean partitioning constraint instead of index constraint.

For a partitioning constraint, the TableCheck column contains the unresolved constraint text generated from the one or more partitioning levels and DBC.TableConstraints contains one row for each table-level check constraint and each partitioning constraint.



A new form of a partitioning constraint text is used if any of the Teradata 14.0 capabilities (including 8-byte partitioning, column partitioning, and ADD option) are used or the Cost Profile constant PartitioningConstraintForm (see section H.2.1) is set to 1 when creating or altering a table; otherwise, one of the three old forms is used. The three old forms are only used for compatibility so there is not an unexpected increase in the size of the partitioning text causing the character limit on the size the text to be exceeded.

The new partitioning constraint text is as follows:

CHECK (/*nn bb cc*/ partitioning_constraint_1 [AND partitioning_constraint_2]... )

where:

1. nn is two digits indicating the number of partitioning levels. For 2-byte partitioning, nn may have a value between 01 and 15, inclusive. For 8-byte partitioning, nn may have a value between 01 and 62, inclusive.

2. bb is 02 for 2-byte partitioning and 08 for 8-byte partitioning.

3. cc is two digits indicating the level for the column partitioning. If there is no column partitioning, cc is 00; otherwise, cc is between 01 and nn, inclusive.

4. Each one of the partitioning constraints corresponds to a level of partitioning in the order defined for the table.

5. partitioning_constraint_i is partitioning_expression_i /*i d+a*/ IS NOT NULL if there is row-partitioning at level i where partitioning_expression_i is the partitioning expression at level i.

6. partitioning_constraint_i is PARTITION#Li /*i d+a*/ =1 if there is column partitioning at level i.

7. i is between 1 and nn, inclusive.

8. d is the number of currently defined partitions for the level. For a column-partitioned level, this includes the two internal column partitions.

9. a is the number of additional partitions that could be added (this may be 0) or X. X occurs for level 2 and higher if this new partitioning constraint form is only being used because the Cost Profile constant PartitioningConstraintForm is set to 1 to force use of the new constraint form in all cases. If the new constraint form would be used regardless of the setting of PartitioningConstraintForm or this is for level 1, this is a number.

10. Leading zeros are not included in i, d, and a.

For this example,

CREATE TABLE Orders ( o_orderkey INTEGER NOT NULL, o_custkey INTEGER, o_totalprice DECIMAL(13,2) NOT NULL, o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL, NO PRIMARY INDEX, UNIQUE INDEX (o_orderkey), PARTITION BY ( RANGE_N(o_custkey BETWEEN 0 AND 100000 EACH 1), COLUMN );



The ConstraintText for the above 8-byte partitioning would be the following:

CHECK (/*02 08 02*/ RANGE_N(o_custkey BETWEEN 0 AND 100000 EACH 1) /*1 100001+485440633518572409*/ IS NOT NULL AND PARTITION#L2 /*2 8+10*/ =1)

The following query could be used to retrieve the level for the column partitioning for each of the objects that have column partitioning in the system:

SELECT DBaseId, TVMId, ColumnPartitioningLevel FROM DBC.TableConstraints WHERE ConstraintType = 'Q' AND ColumnPartitioningLevel >= 1 ORDER BY 1, 2;

G.10 DBC.PartitioningConstraintsV[X] System Views

These system views provide information about partitioning constraints (see the previous section). The VX version limits access to only objects on which the user has appropriate privileges. Both views support UNICODE object names. The column definitions are the same as for the DBC.IndexConstraintsV[X] views with the following differences:

1. These views include the first four columns of interest discussed in the previous section for DBC.TableConstraints except the first three cannot be 0.

2. IndexName and IndexNumber are NULL if the partitioning is associated with a table or join index that does not have a primary index.

3. The ConstraintText column has the same values as described for the TableCheck column in the DBC.TableConstraints system table (see the previous section).

In the following, V can be replaced by VX if the user only wants limited results (or doesn’t have the privilege to use the V version).

To obtain a list of objects with partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV ORDER BY DatabaseName, TableName;

To obtain a list of objects with column partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE ColumnPartitioningLevel >= 1 ORDER BY DatabaseName, TableName;

To obtain a list of objects with 8-byte multilevel partitioning where one of the levels has column partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE MaxCombinedPartitions >= 65536 AND PartitioningLevels >= 2 AND ColumnPartitioningLevel >= 1 ORDER BY DatabaseName, TableName;

To obtain a list of objects with 2-byte single-level column partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE MaxCombinedPartitions <= 65535 AND PartitioningLevels = 1 AND ColumnPartitioningLevel = 1 ORDER BY DatabaseName, TableName;



To obtain a list of objects with 8-byte single-level column partitioning (note that an ADD option must have been used in order to have 8-byte partitioning), the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE MaxCombinedPartitions >= 65536 AND PartitioningLevels = 1 AND ColumnPartitioningLevel = 1 ORDER BY DatabaseName, TableName;

To obtain a list of objects with 2-byte single-level column partitioning or 2-byte multilevel partitioning where one of the levels has column partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE MaxCombinedPartitions <= 65535 AND ColumnPartitioningLevel >= 1 ORDER BY DatabaseName, TableName;

To obtain a list of objects with 2-byte multilevel partitioning where one of the levels has column partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE MaxCombinedPartitions <= 65535 AND PartitioningLevels >= 2 AND ColumnPartitioningLevel >= 1 ORDER BY DatabaseName, TableName;

To obtain a list of objects with 2-byte single-level partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE MaxCombinedPartitions <= 65535 AND PartitioningLevels = 1 ORDER BY DatabaseName, TableName;

To obtain a list of objects with multilevel partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE PartitioningLevels >= 2 ORDER BY DatabaseName, TableName;

To obtain a list of objects with single-level row partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE PartitioningLevels = 1 AND ColumnPartitioningLevel = 0 ORDER BY DatabaseName, TableName;

To obtain a list of objects with 8-byte partitioning that only would need to use 2-byte partitioning, the following query could be used:

SELECT DatabaseName, TableName (TITLE 'Table/Join Index Name') FROM DBC.PartitioningConstraintsV WHERE DefinedCombinedPartitions <= 65535 AND MaxCombinedPartitions >= 65536 ORDER BY DatabaseName, TableName;

G.11 DBC.DBQLStepTbl

Three columns of interest are included in DBC.DBQLStepTbl and populated for a relation that has partitioning.

NumCombinedPartitions

If there is static partition elimination for the step (for a query submitted to a Teradata Database 14.0 or later) or a source relation has column partitions, the number of combined partitions accessed (not eliminated). Otherwise, the column is null.



NumContexts

The number of contexts allocated, if any, to access partitions at the same time of a source or target relation that has partitioning. Otherwise, the column is null.

NumCPReferences

The number of column partitions referenced in a column-partitioned source or target. Otherwise, the column is null.

G.12 DBC.QryLogStepsV System View

Three new columns of interest are the same as discussed in the previous section for DBC.DBQLStepTbl system table.

G.13 Query Capture Database

Three columns of interest are included in the Relation table of a Query Capture Database and populated for a relation that has partitioning.

NumCombinedPartitions

If there is static partition elimination for the relation (for a query submitted to Teradata Database 14.0 or later) or the relation has column partitions, the number of combined partitions accessed (not eliminated). Otherwise, the column is null.

NumCPContexts

The number of contexts allocated, if any, to access partitions at the same time of a relation that has partitioning. Otherwise, the column is null.

NumCPReferences

The number of column partitions referenced in a column-partitioned relation. Otherwise, the column is null

G.14 XML Plan

The following three attributes are recorded in the XML plan produced by the BEGIN QUERY LOGGING … XMLPLAN option and the INSERT EXPLAIN and EXPLAIN IN XML option.

Relation/@NumCombinedPartitions

If there is static partition elimination for the relation (for a query submitted to a Teradata Database 14.0 or later) or the relation has column partitions, the number of combined partitions accessed (not eliminated). Otherwise, the attribute is not specified.

Relation/@NumContexts

The number of contexts allocated, if any, to access partitions at the same time of a relation that has partitioning. Otherwise, the attribute is not specified.

Relation/@NumCPReferences

The number of column partitions referenced in a column-partitioned relation. Otherwise, the attribute is not specified.



Appendix H: System Settings

This appendix discusses DBS Control fields and Costs Profile constants relevant to partitioning.

H.1 DBS Control Fields

H.1.1 PPICacheThrP

The PPICacheThrP field in the Performance group is used to specify the percentage of an AMP’s FSG cache memory that can be used for keeping a set of data blocks in FSG cache for operations accessing multiple partitions at the same time (including both row and column partitions). The number of data blocks that can be kept in this amount of memory defines how many file contexts (but at least 8 and no more than 256) can be opened at the same time to access combined partitions.

PPICacheThrP also specifies the percentage of an AMP’s available memory that can be used to buffer column partition values in order to append them to column partitions. The size of the memory that can be used (minus some overhead) divided by the size of a column partition context determines the number of available column partition contexts. If there are more column partitions in a target CP table than available column partition contexts, multiple passes over the source rows are required to process a set column partitions where the number of column partitions in each set is up to the number of available column partition contexts. Note that, in this case, there is only one file context open (but each column partition context allocates buffers in memory).

Note that operations that deal with multiple column partitions at the same time do not also deal with multiple row partitions at the same time (for example, a sliding-window merge join is not directly applicable to a CP table since it does not have a primary index).

The default is 10 (that is, 1%). This default is expected to be adequate in most cases and should not be changed without due consideration.

The DBS Control setting for PPICacheThrP is overridden by the Cost Profile constant PPICacheThrP (see section H.2.2) if that constant is set to a nonzero value in a currently applicable Cost Profile.

See also section 11.4.

H.1.2 PrimaryIndexDefault

The PrimaryIndexDefault field in the General group is used to specify the default when neither a PRIMARY INDEX nor NO PRIMARY INDEX clause is specified for a table or join index. Note that, for a CP table or join index, the default is NO PRIMARY INDEX regardless of the setting of this field.

H.2 Cost Profile Constants

The following Cost Profile constants can be changed by a user in a “V” (variable) Cost Profile. Note that these constants cannot be changed in an “F” (fixed) Cost Profile. Unlike DBS Control fields that apply system-wide, Cost Profile constants can be set to be applicable system-wide or applicable to specific users (by associating the user with a profile that specifies the cost profile to be used).



H.2.1 PartitioningConstraintForm

The new PartitioningConstraintForm constant in a Cost Profile is used to specify if the new partitioning constraint form is used in all cases or one of the three old forms is used for compatibility for a table or join index that does not use any of the TD 14.0 partitioning capabilities (including 8-byte partitioning, column partitioning, and the ADD option).

This constant determines the form of the partitioning constraint text. When this constant is set to 0 or not set, the partitioning constraint text for partitioning that does not use partitioning specification features added in Teradata 14.0 conforms to a pre-Teradata 14.0 partitioning constraint text form. When this constant is set to 1, all newly created or altered partitioning constraint text conforms to the new constraint form added in Teradata 14.0 (see section G.9). The default is not to be set.

Note that it is recommended that the value be changed to 1 so that the additional information in the Teradata 14.0 form of the partitioning constraint text is available for all partitioned tables and join indexes. However, this may require changes to scripts, applications, etc. that examine the partitioning constraint text. Note that these changes would be required in any case if the partitioning features added in Teradata 14.0 start being used.

H.2.2 PPICacheThrP

The Cost Profile constant PPICacheThrP is the same as the DBS Control performance group PPICacheThrP field (see section H.1.1) except that the Cost Profile PPICacheThrP constant is in units of 1% instead of 0.1% and has a FLOAT type so a fractional percentage can be specified. For example, a value of 1.5 indicates 1.5%.

A nonzero value specified for this constant in a currently applicable Cost Profile for a session overrides the DBS Control field setting. If not set or set to 0, the DBS Control Performance group field PPICacheThrP is used. The default is not to be set.

The default (indicating 1%) in DBS Control is expected to be adequate in most cases and the Cost Profile constant PPICacheThrP should not be set to a nonzero value without due consideration.

See also section 11.4.



Glossary

Words or phrases in italics in a definition of a term are defined in this glossary.

autocompression

The compression of data automatically by the system as rows are inserted into a column-partitioned table or join index. The compression techniques applied are system-determined on a per container or subrow basis. For some values, there is no applicable compression technique and the system determines not to compress the values for that container or subrow. The system decompresses any compressed values when they are retrieved as needed.

autocompression bits

A series of bits in a container where each of a set of bits in the series indicate whether a corresponding column partition value is compressed or not and information on how to decompress a compressed column partition value.

columnar The storage of data by columns instead of rows with optimizations for column-oriented workloads. This is achieved in Teradata with column partitioning and COLUMN format for a column partition.

columnar-store or columnar-storage

A component of a DBMS where columns can be stored. Teradata provides an integrated storage component that supports both storing regular rows (physical rows with ROW format, i.e., as a row-store) and storing column partitions (either in physical rows with ROW format, i.e., as a row-store or COLUMN format, i.e., as a columnar-store).

COLUMN format

The format of the physical row that is used as a container in a column partition. The format includes an indication of the autocompression used (if any), an optional value-list compression dictionary, a series of column partition values, and optional sets of autocompression bits (one set for each column partition value in the container). This format along with column partitioning provides a columnar-store (or column-store) in Teradata. See also ROW format.

column partition A column partition consists of one more columns of a table that are stored separately from other column partitions of the table.

column partition context

A column partition context is allocated in memory to keep information about a column partition when it is being processed. There is a column partition context for each of the column partitions being accessed at the same time. Since the needed memory for each column partition context can be fairly large, the number of column partitions (and, therefore, the amount of memory used for column partition contexts) is limited.

column partition number

A number assigned to a column partition that uniquely identifies the column partition.

column partition value

The values of the columns in a column partition projected from a table row.

If the column partition is specified explicitly or is system-determined to have COLUMN format, a column partition value is a just a column value if the column partition only has a single column or multiple column values if the column partition has more than one column. One or more column partition values for a column partition can be represented in a container.

If the column partition is specified explicitly or is system-determined to have ROW format, the column partition is represented using subrows and the column partition value is the set of column values in the subrow. There is one column partition value represented per subrow.

column partitioned indicates that a table or join index has column partitions, as in a table is column partitioned.



column-partitioned indicates that a table or join index has column partitions, as in column-partitioned table.

column partitioning A method for vertically partitioning sets of columns of a table. column-store or column-storage

See columnar-store.

column value The value of a column projected from a table row.

combined partitioning expression

An expression that combines the partitioning expressions of a PARTITION BY clause, if any, and a column partition number of 1, if column partitioned, into a single expression. If not partitioned, the expression is 0. The combined partitioning expression is such that ordering on the combined partitioning expression value for rows would have the same ordering for rows if they were ordered by the value of the first partitioning expression, the 2nd, etc.

combined partition number

The result of the combined partitioning expression for a specific set of values of the partitioning columns. If column partitioned, this is the combined partitioned number for the table row using one as the column partition number for a combined partition number of a table row; the combined partition number is adjusted to the partition number of a specific column partition when determining the combined partition number for that column partition. For a nonpartitioned table, the combined partition number is 0.

container

A physical row that has COLUMN format. A container contains a representation of a series of column partition values for a column partition that is user-specified or system-determined to have COLUMN format. A container may be a single-column container or multicolumn container. A series of containers (with increasing rowid) represent a table column or set of table columns. See also subrow and ROW format.

CP Column-partitioned or column partitioning depending on context.

internal partition number

A value calculated from the combined partition number of the combined partitioning expression that is used to number partitions internally and placed in a physical row’s rowid. This number may be the same as the combined partition number if no modification is needed. Modification is needed41 for a single-level partitioning expression that consists solely of a CASE_N or RANGE_N function; partitions for NO RANGE [OR UNKNOWN], NO CASE [OR UNKNOWN], and UNKNOWN options are placed at fixed internal partitions (internal partition numbers 2, 2, and 1, respectively) with partitions for ranges and conditions following (starting at internal partition number 3 initially). Modification may also occur for multilevel partitioning. For both single-level and multilevel partitioning, additional modification is used in order to retain existing internal partition numbers after an ALTER TABLE statement that drops or adds ranges or partitions. An internal partition number in a rowid referring to a table row indicates a column partition number of one by convention; this can be modified to indicate a specific column partition to access that column partition.

logical row A table row of a column-partitioned table or join index. A logical row is physically split across column partitions of the column-partitioned table or join index.

logical rowid

The rowid associated with a logical row. The column partition number represented in the internal partition number for a logical rowid is 1. The rowid of any specific column partition value of the logical row can be determined by setting its column partition number in the logical rowid.

41 This is to allow a subsequent ALTER TABLE of the partitioning to be more efficient by being able to

retain existing internal partition numbers.



multicolumn container

A container with a subset of two or more columns of a table row. This kind of container contains a series of column partition values. Each column partition value has a format similar to the ROW format used for a regular row or subrow but without a row header. A column partition value in this case does not correspond to a physical row but is a structure that can repeat within a container.

multicolumn partition A column partition with two or more columns of a table row.

multicolumn partition value

The values of columns in a multicolumn partition projected from a table row.

If the multicolumn partition is user-specified or system-determined to have COLUMN format, one or more multicolumn partition values can be represented in a container.

If the multicolumn partition is user-specified or system-determined to have ROW format, the column partition is represented using subrows and the multicolumn partition value is the set of column values in the subrow. There is one multicolumn partition value represented per subrow.

multilevel partitioned primary index

A primary index with multilevel partitioning. All the levels must specify row partitioning using a partitioning expression. For a primary index, column partitioning using the COLUMN clause may not be specified for a partitioning level in a PARTITION BY clause.

multilevel partitioning

A partitioning scheme where partitions at any one level are subpartitioned. Multiple partitioning-level specifications (either a partitioning expression for row partitioning or COLUMN for column partitioning) are used to define the partitioning. Only one level at most can be specified as COLUMN (note column partitioning is not supported if there is primary index).

nonpartitioned primary index (NPPI)

A primary index that is not partitioned (sometimes referred to as a traditional primary index). The hash of the primary index columns orders the rows on an AMP. An NPPI can be considered as a partitioned primary index with only one partition that has a combined partition number and internal partition number that are both zero.

NoPI A type of object (table or join index) that does not have a primary index. A column-partitioned table or join index is a NoPI object.

NPPI See nonpartitioned primary index.

partitioned primary index (PPI)

A type of primary index for a table or join index that additionally defines the row partitioning of the object's data rows on the AMPs based on one or more partitioning expressions referencing columns (referred to as the partitioning columns) that may or may not be part of the set of primary index columns. For a primary index, column partitioning using the COLUMN clause may not be specified for a partitioning level in a PARTITION BY clause. A PPI may be single-level (defined with a single partitioning expression) or multilevel (defined with multiple partitioning expressions). A combined partitioning expression is derived from the partitioning expressions. The hash of the primary index orders rows within a row partition of the combined partitioning expression. See also nonpartitioned primary index.

partitioning column

A column that is in the set of partitioning columns (that is, referenced in one or more of the partitioning expressions specified by an optional PARTITION BY clause for a table or join index. Note that a partitioning column may also be a primary index column if the table or join index has a primary index.

partitioning columns The set of columns referenced by one or more of the partitioning expressions for a table or join index with a PARTITION BY clause.

partitioning expression An expression in a PARTITION BY clause that is used to compute a row partition number for row.



physical row

A row as seen by the file system and identified by a rowid. A physical row consists of a row header followed by a sequence of bytes. The sequence of bytes in a physical row can have ROW format, COLUMN format, secondary index format, compressed join index format, table header format, etc. For this document, only physical rows with ROW format or COLUMN format are of interest. A physical row with ROW format can either be a regular row or a subrow. A physical row with COLUMN format is a container.

PI See primary index. PPI See partitioned primary index.

primary index (PI)

An index that determines the distribution of data rows to AMPs for a table, join index, or hash index. A PI does not have a structure (subtable) physically separate from the data rows. The distribution is based on a hashing algorithm. A table, join index, or hash index must have exactly one primary index. A primary index may be a partitioned primary index or nonpartitioned primary index.

primary-indexed (PI) An adjective that indicates an object has a primary index. If not preceded by partitioned (or abbreviated as PPI), this indicates an object with primary index but without partitioning.

regular row

A physical row representing an entire table row. Note that a regular row and a subrow both are physical rows that have ROW format. The only difference is that a subrow only includes the column values for the columns in the corresponding column partition for a column-partitioned table while a regular row includes all the column values of the corresponding table row for a table that is not column partitioned.

ROW format

A format for a physical row that consists of a row header, presence bits, and a fixed number of column values (i.e., the traditional Teradata row format). A regular row and a subrow both are physical rows that have ROW format. The only difference is that a subrow only includes the column values for the columns in the corresponding column partition while a regular row includes all the column values of the corresponding table row. See also COLUMN format.

row header A row header occurs at the beginning of each physical row and indicates the length, rowid, flags, and 1st present byte of the physical row. A row header is then followed by the data for the physical row.

row partition A row partition consists of zero or more rows of a table that have the same value for a partitioning expression defined for the table and that are stored separately from other row partitions of the table.

row-store or row-storage

A component of a DBMS where rows can be stored. Teradata provides an integrated storage component that supports both storing regular rows (physical rows with ROW format, i.e., as a row-store) and storing column partitions (either in physical rows with ROW format, i.e., as a row-store or COLUMN format, i.e., as a columnar-store).

single-column partition A column partition with only one column of a table row.

single-column partition value

The value of the column in a single-column partition projected from a table row.

If the single-column partition is user-specified or system-determined to have COLUMN format, one or more single-column partition values can be represented in a container.

If the single-column partition is user-specified or system-determined to have ROW format, the column partition is represented using subrows and the single-column partition value is the column value in the subrow. There is one single-column partition value represented per subrow.



single-level partitioned primary index

A primary index with single-level partitioning.

single-level partitioning

A partitioning scheme with one level of partitioning. A single partitioning expression (for row partitioning) or COLUMN (for column partitioning) is used to define the partitioning.

sparse join index Sparse join indexes include only a subset of a base table’s rows by using a WHERE clause in their definition to determine which base table rows are included in the join index and which are not.

subpartition A partition that is not at the first level for multilevel partitioning.

subrow

A physical row formed from a subset of the column values of a table row. The columns values are for a set of one or more columns corresponding to the set of columns defined for a column partition that is user-specified or system-determined to have ROW format. A subrow is a physical row with ROW format. A series of subrows (with increasing rowid) represent a table column or set of table columns (that is, a column partition). Note that a regular row and a subrow both are physical rows that have ROW format. The only difference is that a subrow only includes the column values for the columns in the corresponding column partition of a column-partitioned table or join index while a regular row includes all the column values of the corresponding table row of a table or join index that is not column partitioned.

system-determined

This is where the system, based on a number of factors, determines between multiple options which to use. The precise rules for that determination are not provided. For example, if neither COLUMN format nor ROW format is user-specified for a column partition, the system determines based on the width of the column partition and other factors whether COLUMN format or ROW format is used. Similarly for autocompression, the system determines the autocompression techniques to be used based on the data being autocompressed.

table column A column of values in a table. The number of values is variable but all columns of the table have the same number of values. This is a logical concept.

table row

A row of values for the columns of a table. The number of columns is fixed for the table. The number of rows is variable. This is a logical concept.

A table row is physically represented as a regular row for an object that is not column partitioned.

A table row is split across column partitions for a column-partitioned object. The value or values from the table row for a column partition are represented in a container if the column partition has COLUMN format or a subrow if the column partition has ROW format. Note that a container may have column partition values from multiple table rows. Note that a subrow only has column values from a single table row.

Teradata Columnar

Documents

Transcript of Teradata Columnar