Introduction to Teradata And How Teradata Works

  • Date posted: 24-Jan-2017
  • Category: Education

MicroStrategy

Introduction to Teradata

How Teradata Works

How Does Teradata Store Rows?

Teradata uses a hashing algorithm to distribute data evenly, in an effectively random pattern, across all AMPs.

  • The rows of every table are distributed among all AMPs, ideally evenly.
  • Each AMP is responsible for a subset of the rows of each table.
  • Evenly distributed tables result in evenly distributed workloads.
  • The data is not placed in any particular order.

The benefits of unordered data include:

  • No maintenance is needed to preserve order.
  • Placement is independent of any query being submitted.

The benefits of automatic data placement include:

  • Distribution is the same regardless of data volume.
  • Distribution is based on row content, not data demographics.
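
The idea of hash-based placement can be sketched in a few lines of Python. This is only a toy model: SHA-256 stands in for Teradata's internal hashing algorithm, and the four-AMP system, `amp_for` and `rows_per_amp` are hypothetical names chosen for illustration.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # illustrative AMP count, not a Teradata default


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map an index value to an AMP number (0-based).

    SHA-256 is only a stand-in; Teradata's real hashing algorithm and
    its hash-to-AMP mapping are internal to the database.
    """
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # 32-bit stand-in row hash
    return row_hash % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows each AMP would own for the given index values."""
    counts = Counter(amp_for(v, num_amps) for v in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


if __name__ == "__main__":
    # 100,000 distinct values end up split roughly evenly over the AMPs.
    print(rows_per_amp(range(1, 100_001)))
```

Because placement depends only on the value being hashed, the same value always lands on the same AMP, and no ordering has to be maintained as rows arrive.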

Primary Indexes

The Primary Index (PI) is the mechanism used to assign a row to an AMP.

  • A table must have a Primary Index.
  • The Primary Index definition cannot be changed.

UPI

  • If the chosen index column(s) are unique, the index is a UPI (Unique Primary Index).
  • A UPI results in even distribution of the table's rows across all AMPs.

NUPI

  • If the chosen index column(s) aren't unique, the index is a NUPI (Non-Unique Primary Index).
  • A NUPI distributes the table's rows only as evenly as the index values are unique: the more duplicate values, the less even the distribution.

UPIs guarantee even data distribution and eliminate duplicate row checking.

Why would you choose an index that is different from the Primary Key?

  • Join performance
  • Known access paths

Data Storage Based on Primary Index

  • The value of the Primary Index for a specific row determines its AMP assignment.
  • This is done using the hashing algorithm.

[Figure: the PE passes the PI value through the hashing algorithm, which selects one of the AMPs. The same path is used for row assignment and for row access.]

Accessing the row by its Primary Index value is:

  • Always a one-AMP operation
  • The most efficient way to access a row
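
Why Primary Index access touches a single AMP can be illustrated with a small sketch. The `ToyTable` class, the SHA-256 stand-in hash, the four-AMP count and the sample order numbers are all assumptions made for this example, not Teradata internals.

```python
import hashlib

NUM_AMPS = 4  # illustrative AMP count


def amp_for(pi_value):
    """Stand-in for Teradata's hashing step: PI value -> AMP number."""
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


class ToyTable:
    """A table whose rows live in per-AMP dictionaries keyed by PI value."""

    def __init__(self):
        self.amps = [dict() for _ in range(NUM_AMPS)]

    def insert(self, pi_value, row):
        # Row assignment: the PI value alone decides the owning AMP.
        self.amps[amp_for(pi_value)].setdefault(pi_value, []).append(row)

    def select_by_pi(self, pi_value):
        # Row access: hash the same value again and ask only that AMP.
        amp = amp_for(pi_value)
        return self.amps[amp].get(pi_value, []), [amp]


orders = ToyTable()
for order_number in (1001, 1002, 1003):  # hypothetical sample values
    orders.insert(order_number, {"Order_Number": order_number})

rows, amps_touched = orders.select_by_pi(1003)
print(rows, "read from", len(amps_touched), "AMP")
```

The lookup never consults the other AMPs, because the hash of the requested value already identifies the only AMP that could hold the row.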

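Row Distribution Using a UPI

  • The PK column(s) will often be used as a UPI.
  • PI values for Order_Number are known to be unique (it's a PK).
  • Teradata will distribute different index values evenly across AMPs.
  • The resulting row distribution among AMPs is uniform.

[Figure: the rows of the Order table spread across AMP 1, AMP 2, AMP 3 and AMP 4. The placement of individual rows on AMPs is not recoverable from the transcript.]

Order rows shown in the figure:

Order_Number  Customer_Number  Date  Status
7202          2                4/09  C
7402          3                4/16  C
7325          2                4/13  C
7225          2                4/15  C
7188          1                4/13  C
7384          1                4/12  C
7324          3                4/13  C
7103          1                4/10  C
7415          1                4/13  C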

Row Distribution Using a NUPI

  • Customer_Number may be the preferred access column for the Order table, thus a good index candidate.
  • Values for Customer_Number are non-unique, so the index is a NUPI.
  • Rows with the same PI value distribute to the same AMP, causing row distribution to be less uniform, or skewed.

[Figure: the rows of the Order table distributed across AMP 1 to AMP 4 by Customer_Number. Rows that share a Customer_Number value sit on the same AMP.]
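
The contrast between UPI and NUPI distribution can be sketched with synthetic data. Everything here is an assumption for illustration: the 10,000 generated orders, the three customer numbers, the four AMPs, and SHA-256 as a stand-in for Teradata's hashing algorithm.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # illustrative AMP count


def amp_for(pi_value):
    """Stand-in hash: index value -> AMP number."""
    digest = hashlib.sha256(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


def distribution(rows, pi_column):
    """Rows per AMP when the table is distributed on pi_column."""
    counts = Counter(amp_for(row[pi_column]) for row in rows)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


# Synthetic orders: 10,000 unique order numbers, only 3 distinct customers.
orders = [
    {"Order_Number": n, "Customer_Number": n % 3 + 1}
    for n in range(1, 10_001)
]

upi_counts = distribution(orders, "Order_Number")      # unique values
nupi_counts = distribution(orders, "Customer_Number")  # 3 distinct values

print("UPI  rows per AMP:", upi_counts)
print("NUPI rows per AMP:", nupi_counts)
```

With only three distinct Customer_Number values, at most three of the four AMPs can receive rows, so this is a deliberately extreme case of skew. A NUPI with many distinct values would spread far more evenly.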

Secondary Indexes

Three general ways to access a table:

  • Primary index access (one-AMP access)
  • Secondary index access (two- or all-AMP access)
  • Full table scan (all-AMP access)

A secondary index is an alternate path to the rows of a table. A table can have from 0 to 32 secondary indexes.

Secondary indexes:

  • Do not affect table distribution.
  • Add overhead, both in terms of disk space and maintenance.
  • May be added or dropped dynamically as needed.
  • Are chosen to improve table performance.

Unique Secondary Index (USI) Access

Create USI:

    CREATE UNIQUE INDEX (cust) ON customer;

Access via USI:

    SELECT *
    FROM customer
    WHERE cust = 56;

[Figure: USI access across AMP 1 to AMP 4, connected by the BYNET. Each AMP holds a USI subtable (columns: RowID, Cust, base-table RowID) and a slice of the base table (columns: RowID, Cust, Name, Phone, where Cust is the USI and Phone is the NUPI).]

Steps shown in the figure:

  • The PE passes the USI value 56, together with the Customer table ID 100, to the hashing algorithm, which produces row hash 602.
  • The BYNET routes the request (table ID 100, row hash 602, USI value 56) to the AMP whose USI subtable holds that entry.
  • The subtable entry supplies the RowID of the base-table row: table ID 100, row hash 778, uniqueness value 7.
  • The BYNET routes that RowID to the AMP that holds the base-table row.
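
The two-hop nature of USI access can be sketched as follows. This is a toy model under stated assumptions: the per-AMP dictionaries, the `insert` and `select_by_usi` helpers, the SHA-256 stand-in hash and the sample customer rows are all illustrative and do not reflect Teradata's actual subtable layout.

```python
import hashlib

NUM_AMPS = 4  # illustrative AMP count


def amp_for(value):
    """Stand-in hash: any index value -> AMP number."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# Per-AMP storage. Base rows are placed by the primary index (phone);
# USI subtable rows are placed by the hash of the USI value (cust).
base = [dict() for _ in range(NUM_AMPS)]
usi_subtable = [dict() for _ in range(NUM_AMPS)]


def insert(cust, name, phone):
    base_amp = amp_for(phone)
    base[base_amp].setdefault(phone, []).append(
        {"cust": cust, "name": name, "phone": phone})
    # The index row records where the base row lives.
    usi_subtable[amp_for(cust)][cust] = (base_amp, phone)


def select_by_usi(cust):
    touched = [amp_for(cust)]           # hop 1: AMP owning the index row
    pointer = usi_subtable[touched[0]].get(cust)
    if pointer is None:
        return None, touched
    base_amp, phone = pointer
    touched.append(base_amp)            # hop 2: AMP owning the base row
    row = next(r for r in base[base_amp][phone] if r["cust"] == cust)
    return row, touched


# Hypothetical sample rows.
insert(56, "Smith", "777-6666")
insert(77, "Marsh", "555-7777")
insert(51, "Peters", "888-2222")

row, touched = select_by_usi(56)
print(row, "found in", len(touched), "AMP steps")
```

Because the USI value is unique, the index row lives on exactly one AMP and points at exactly one base row, which is why the lookup never needs more than two AMP steps.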

Non-Unique Secondary Index (NUSI) Access

Create NUSI:

    CREATE INDEX (name) ON customer;

Access via NUSI:

    SELECT *
    FROM customer
    WHERE name = 'Adams';

[Figure: NUSI access across AMP 1 to AMP 4, connected by the BYNET. Each AMP holds a NUSI subtable (columns: RowID, Name, base-table RowIDs) and a slice of the base table (columns: RowID, Cust, Name, Phone, where Name is the NUSI and Phone is the NUPI).]

Steps shown in the figure:

  • The PE passes the NUSI value 'Adams', together with the Customer table ID 100, to the hashing algorithm, which produces row hash 567.
  • The BYNET sends the request (table ID 100, row hash 567, NUSI value 'Adams') to every AMP.
  • Each AMP looks up row hash 567 in its own NUSI subtable; an entry there lists the RowIDs of the matching base-table rows on that same AMP.
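
The all-AMP nature of NUSI access can be sketched in the same toy style. The per-AMP dictionaries, the `insert` and `select_by_nusi` helpers, the SHA-256 stand-in hash and the sample rows are assumptions for illustration only.

```python
import hashlib

NUM_AMPS = 4  # illustrative AMP count


def amp_for(value):
    """Stand-in hash: primary index value -> AMP number."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# Per-AMP storage. Each NUSI subtable indexes only the base rows
# stored on its own AMP.
base = [dict() for _ in range(NUM_AMPS)]
nusi_subtable = [dict() for _ in range(NUM_AMPS)]


def insert(cust, name, phone):
    amp = amp_for(phone)                 # base row placed by PI (phone)
    row_id = (amp, len(base[amp]))       # simple local row identifier
    base[amp][row_id] = {"cust": cust, "name": name, "phone": phone}
    nusi_subtable[amp].setdefault(name, []).append(row_id)


def select_by_nusi(name):
    rows, touched = [], []
    for amp in range(NUM_AMPS):          # every AMP checks its subtable
        touched.append(amp)
        for row_id in nusi_subtable[amp].get(name, []):
            rows.append(base[amp][row_id])
    return rows, touched


# Hypothetical sample rows; two customers share the name Adams.
insert(31, "Adams", "555-4444")
insert(72, "Adams", "444-6666")
insert(37, "Smith", "111-2222")

rows, touched = select_by_nusi("Adams")
print(len(rows), "rows found after asking", len(touched), "AMPs")
```

Since a non-unique value may match rows on any AMP, no single hash can identify where the matches live, so every AMP has to consult its local subtable.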

Comparison of Primary and Secondary Indexes

Primary Keys and Primary Indexes

Primary Key                                         | Primary Index
----------------------------------------------------+--------------------------------------------
Logical concept of data modeling                    | Physical mechanism for access and storage
Teradata doesn't need to recognize                  | Each table must have exactly one
No limit on number of columns                       | 16-column limit
Documented in data model (optional in CREATE TABLE) | Defined in CREATE TABLE statement
Must be unique                                      | May be unique or non-unique
Uniquely identifies each row                        | Used to place and locate each row on an AMP
Values should not change                            | Values may be changed (Del + Ins)
May not be NULL (requires a value)                  | May be NULL
Does not imply an access path                       | Defines most efficient access path
Chosen for logical correctness                      | Chosen for physical performance

Indexes are conceptually different from keys:

  • A PK is a relational modeling convention which uniquely identifies each row.
  • A PI is a Teradata convention which determines how the rows are stored and accessed.
  • A significant percentage of tables may use the same columns for both the PK and PI.
  • A well-designed database will use a PI that is different from the PK for some tables.

Learn Teradata Online

Contact: USA: +1 732 325 1626, India: +91 800 811 4040
Mail: info@bigclasses.com
http://bigclasses.com/teradata-online-training.html

Thank you

Watch Teradata DEMO Video On YouTube

www.youtube.com/user/bigclassescom