Download - Teradata Architecture

Tera-Tom Tera-Cram for Teradata Basics V12: Understanding is the Key!

by Tom Coffing Coffing Data Warehousing. (c) 2011. Copying Prohibited.

Reprinted for Mausam Upadhyay, Accenture

Reprinted with permission as a subscription benefit of Skillport, http://skillport.books24x7.com/

All rights reserved. Reproduction and/or distribution in whole or in part in electronic, paper or other forms without written permission is prohibited.

Chapter 1: The Teradata Architecture

The Teradata Architecture

“Let me once again explain the rules. Teradata rules!”

Tera-Tom Coffing

Hello friend! My name is Tom Coffing and I am going to guide you through the certification process. I have been writing about Teradata for over 15 years and I have developed a certification track that is second to none for you. The first goal is to get you through the Teradata V12 Basics test. To do so all you need to do is read this book. But I also have some wonderful surprises for you. The first surprise is that I have also developed easy to learn and fun to watch Videos. You will see links to the video in the book. If you are reading this book electronically you will be able to click on the link to see the videos of the subject you are currently learning. If you are reading this book from paper you can place the link in your web browser to see the video.

The second surprise I have is that you can take practice tests for the basics, but even better is that I have created a Video Game that will allow you to challenge yourself on what you have learned. The game blows up once you miss three questions and you must start over. Pass all three levels of the game and you know you are ready for the test. To play the Tera-Tom Certification game, all you have to do is download the Nexus software from our website at www.CoffingDW.com, connect to a Teradata system, click on the button that says DBA and start playing.

Let’s get started. Teradata relies on three architectural components that have set the rules for parallel processing. They are the Parsing Engine, which is also called the PE or the Optimizer; the Access Module Processors, which are referred to as the AMPs; and two BYNETs that carry the communication between the PEs and the AMPs.

The PE is the boss and tells the AMPs exactly what to do. The AMPs each have their own virtual disk, which no other AMP can read, and they merely read and write to their respective disks.

When a user logs on to Teradata, the logon is accepted or rejected by a Parsing Engine. That Parsing Engine will take care of the user for the entire session, which really means until the user logs off.

The Parsing Engine will accept each query from that user and come up with a plan for the AMPs to satisfy the request. The PE’s plan is passed to the AMPs via the BYNET. The AMPs will retrieve the data requested from their virtual disks and pass it back up the BYNET to the PE. The PE will then deliver the data to the user.

The Parsing Engine

“Fall seven times, stand up eight.”

--Japanese Proverb

The Parsing Engine never falls seven times, but it can handle 120 stand-up sessions.

The Parsing Engines are perfectly balanced, with each having the capability to handle up to 120 sessions at a time. This could be 120 distinct users or a single user utilizing the power of all 120 sessions for a single application. That is why there are multiple PEs in every Teradata system. Each PE has total command over every AMP.

Divided they stand (PEs) and United are the AMPs!

Each PE will take a user’s SQL and do three things:

First, the PE will check the user’s SQL syntax. If there is a syntax error the user will receive an error. For example, if the user wanted to use the KEY WORD SELECT and instead wrote SLLLECCCT, the PE would reject the SQL, but be kind enough to send the user a message to help them correct the error. That’s because the PEs are Stand-up guys!

Second, if the SQL passes the syntax check, the PE will check the user’s ACCESS RIGHTS to ensure the user has permission to access the data in that table. If not, the user receives the message ACCESS Denied!

Third, if the user passes the Security Check, the Parsing Engine will come up with a PLAN to satisfy the user’s request. The fastest plan is a Single-AMP retrieve. The second fastest plan is a Two-AMP retrieve. The next fastest plan is all AMPs reading only a portion of the table, and the slowest plan is the full table scan. That is where each AMP reads every row it contains for a table.
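If it helps to see those three checks in order, here is a minimal sketch in Python. It is only a toy model: the keyword list, the ACCESS_RIGHTS lookup, the function name parse_request and the usable_plans argument are all made up for illustration and are not how Teradata actually implements the PE.

```python
# Toy model of the three things a Parsing Engine does with a request.
# Every name below is hypothetical; this is not Teradata's parser.

KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}

# Hypothetical access-rights lookup: user -> tables that user may read.
ACCESS_RIGHTS = {"tom": {"employee_table"}}

# Plans ordered from fastest to slowest, as listed in the text.
PLANS = [
    "single-AMP retrieve",
    "two-AMP retrieve",
    "all-AMP partial read",
    "full table scan",
]


def parse_request(user, sql, table, usable_plans):
    """Return the chosen plan, or an error message for the user."""
    # 1. Syntax check: the first word must be a known keyword.
    words = sql.split()
    if not words or words[0].upper() not in KEYWORDS:
        return "Syntax error"

    # 2. Access rights check: the user must be allowed to read the table.
    if table not in ACCESS_RIGHTS.get(user, set()):
        return "Access denied"

    # 3. Plan: take the fastest plan that is usable for this request.
    for plan in PLANS:
        if plan in usable_plans:
            return plan
    return PLANS[-1]  # the full table scan always works


print(parse_request("tom", "SLLLECCCT * FROM employee_table",
                    "employee_table", {"full table scan"}))
```

The order matters in the sketch just as it does in the text: a request that fails the syntax check never reaches the access rights check, and no plan is built for a user who fails the security check.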



The AMPs

“Not all who wander are lost.”

– J. R. R. Tolkien

The AMPs are never lost because the PE always tells them what to do. One PE to rule them all? No! Each PE rules them all because the rows of every table are spread across all the AMPs. The AMPs organize every table in separate blocks just like you might organize your clothes in separate dresser drawers. Organizing their tables and the rows they contain is an obsession with the AMPs. They make organization a hobbit!

The PE passes the PLAN to the AMPs over the BYNET. The AMPs then retrieve the rows they own from their disks and pass them back to the PE over the BYNET.

When a table is first created each AMP creates a table header on its disk. Even though the table is empty, the AMPs at least know the table name, the columns in the table, and any indexes on the table.

When the table is loaded each AMP receives rows for that table that they and only they own. They carefully place the rows inside data blocks where they can easily be retrieved.

Now each AMP will own their own Table Header for the table and they will also own data blocks where they place the rows for that table. Now the AMP is truly Lord of the Disks!
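To picture what one AMP keeps for a table, here is a small Python sketch of a table header plus data blocks. The class name Amp, its methods, and the rows_per_block limit are assumptions made for the example; real data blocks are sized in bytes, not in a fixed number of rows.

```python
class Amp:
    """Toy AMP: one table header and a list of data blocks per table."""

    def __init__(self, amp_id, rows_per_block=2):
        self.amp_id = amp_id
        self.rows_per_block = rows_per_block  # hypothetical block capacity
        self.headers = {}      # table name -> header (columns, indexes)
        self.data_blocks = {}  # table name -> list of blocks (lists of rows)

    def create_table(self, name, columns, indexes):
        # The header exists as soon as the table is created, even when empty.
        self.headers[name] = {"columns": list(columns), "indexes": list(indexes)}
        self.data_blocks[name] = []

    def store_row(self, name, row):
        # Place the row in the last block, opening a new block when it is full.
        blocks = self.data_blocks[name]
        if not blocks or len(blocks[-1]) >= self.rows_per_block:
            blocks.append([])
        blocks[-1].append(row)

    def read_all(self, name):
        # Read every row this AMP owns for the table, block by block.
        return [row for block in self.data_blocks[name] for row in block]


amp = Amp(amp_id=0)
amp.create_table("employee_table", ["emp_no", "name"], ["emp_no"])
for emp_no in (1, 2, 3):
    amp.store_row("employee_table", (emp_no, "name%d" % emp_no))
print(len(amp.data_blocks["employee_table"]), "blocks on AMP", amp.amp_id)
```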



Born to be Parallel

“Only he who attempts the ridiculous may achieve the impossible.”

– Don Quixote

The concept of parallel processing back in 1979 was almost as outrageous as attempting to go to the moon, but Teradata attempted the ridiculous and the impossible was achieved. Teradata took every table and spread the rows across all the AMPs in the system and the birth of parallel processing happened.

You will never see a Teradata table that sits on only one AMP, because the parallel processing aspect would be lost. Every Teradata table spreads its rows across all the AMPs. Teradata was born to be parallel and the impossible was born.
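Here is a rough Python sketch of a table spreading its rows over the AMPs. This chapter does not explain how a row's owning AMP is chosen, so the sketch simply assumes a hash of a key column, with MD5 from Python's hashlib standing in for whatever the system really uses. The function names owning_amp and spread_rows are made up for the example.

```python
import hashlib


def owning_amp(key, amp_count):
    """Map a key value to an AMP number (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % amp_count


def spread_rows(rows, key_column, amp_count):
    """Return one list of rows per AMP; each row lands on exactly one AMP."""
    amps = [[] for _ in range(amp_count)]
    for row in rows:
        amps[owning_amp(row[key_column], amp_count)].append(row)
    return amps


rows = [{"emp_no": n} for n in range(1000)]
for amp_number, amp_rows in enumerate(spread_rows(rows, "emp_no", 4)):
    print("AMP", amp_number, "owns", len(amp_rows), "rows")
```

With a thousand rows and four AMPs, every AMP ends up owning a share of the table, which is the whole point: each AMP has roughly a quarter of the work to do when the table is read.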

The first picture on the opposite page never happens. The second picture below that is exactly the design behind Teradata.

Teradata NEVER lays out data like this!

Teradata lays out data like this!



Every table spreads its rows over the AMPs

The BYNET

“A Journey of a thousand miles begins with a single step.”

-Lao Tzu

The Parsing Engine passes a plan in Steps to the AMPs and the AMPs merely follow the plan. Those steps are passed over the BYNET. A Journey of a single step can be transferred to a thousand AMPs in a millisecond over the BYNET.

The BYNET is the communication network between AMPs and PE’s. The PE comes up with a PLAN and passes the plan to the AMPs in steps over the BYNET. This step and all the steps of the plan travel down the BYNET highway which guarantees delivery to each AMP.

The AMPs then retrieve the data requested by the PE and they deliver their portion of the answer set to the PE over the BYNET.

The BYNET provides the communications between AMPs and PEs – so no matter how large the data warehouse physically gets, the BYNET makes each AMP and PE think that they are right next to one another. The BYNET gets its name from the Banyan tree. The Banyan tree has the ability to continually plant new roots to grow forever. Likewise, the BYNET scales as the Teradata system grows in size. The BYNET is scalable.

There are always two BYNETs for redundancy and extra bandwidth. AMPs and PEs can use both BYNETs to send and retrieve data simultaneously. What a network! It is like having two phone lines to talk. Each AMP or PE can use one BYNET to send a message and simultaneously accept messages using the other BYNET. Both BYNETs can be used to send a message or to receive a message!

Below are the steps to completely satisfy a query.

• The PE checks the user’s SQL Syntax;

• The PE checks the user’s security rights;

• The PE comes up with a plan for the AMPs to follow;

• The PE passes the plan along to the AMPs over the BYNET;

• The AMPs follow the plan and retrieve the data requested;

• The AMPs pass the data to the PE over the BYNET; and

• The PE then passes the final data to the user.
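The last four of the steps above can be sketched in a few lines of Python. Treat this as an illustration only: a thread pool plays the part of the BYNET, a plan "step" is just a function that says whether a row qualifies, and the names amp_execute and run_query are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def amp_execute(step, amp_rows):
    """One AMP follows the step against only the rows it owns."""
    return [row for row in amp_rows if step(row)]


def run_query(step, amps):
    """PE side: send the step to every AMP, then merge the partial answers."""
    # The thread pool is a stand-in for the BYNET carrying the step to
    # all AMPs at once and carrying each AMP's portion back to the PE.
    with ThreadPoolExecutor(max_workers=len(amps)) as bynet:
        partials = list(bynet.map(lambda rows: amp_execute(step, rows), amps))
    # The PE assembles the final answer set for the user.
    return [row for part in partials for row in part]


# Four AMPs, each owning its own rows of the same table.
amps = [[1, 5], [2, 6], [3, 7], [4, 8]]
print(run_query(lambda row: row > 4, amps))  # prints [5, 6, 7, 8]
```

Notice that no AMP ever looks at another AMP's rows. Each one works only on what it owns, and the PE is the only place where the pieces come together.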



Watch the Tera-Tom Video on Architecture

You can watch the video by clicking on the link below or copying and pasting the link in your browser. This video will give you a chance to review this chapter’s material via Video. I think you will love this intro created by genius William Coffing, who is a writer, director and actor in New York.

http://www.CoffingDW.com/TBasicsV12/architecture1.wmv

Collecting Statistics



“If you are not true to your teeth they will be false to you.”

– Teradata Certified Dentist

I asked my dentist, “Do I have to floss all my teeth?” He said, “No, just the ones you want to keep.” Do you have to Collect Statistics on all your tables? Only the ones you want to query!

Whether the Parsing Engine (PE) is checking a user’s security rights or checking whether statistics were collected on a table, the PE will go to user DBC for the answers.

The PE uses statistics to help decide what plan to build so the AMPs can satisfy a user’s query. Before the PE can come up with a plan it wants to know if a table is large, medium, or small. It wants to know about certain columns or indexes. Does a particular column have a lot of duplicates or nulls, or are the values unique? Is a particular index unique or non-unique, and is it strongly or weakly selective? These questions are often answered by Collect Statistics.

What is Collect Statistics? When a table is created and loaded with data the DBA will run a COLLECT STATISTICS command on certain columns and indexes of that table. That will help the PE answer key questions that will give the PE a better understanding of the table in general.

If more data is loaded or deleted the DBA will then Recollect Statistics to ensure that the statistics reflect the true data inside the table.

It is not mandatory to collect statistics on a table, just as it is not mandatory that a person brushes their teeth or cleans their clothes. If statistics are not collected on a table, then the PE will perform a Random Sample and make an educated guess.
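To make the idea concrete, here is a small Python sketch of the kind of answers statistics give about one column, along with a sample-based guess for when nothing has been collected. The function names and the fields in the result are assumptions for illustration; they are not the layout Teradata stores in DBC, and the sampling here is a plain random sample of values rather than the PE's actual sampling method.

```python
import random


def collect_statistics(values):
    """Summarize one column: size, nulls, and how many duplicates it has."""
    non_null = [v for v in values if v is not None]
    distinct = len(set(non_null))
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_values": distinct,
        "is_unique": distinct == len(non_null),
        # Average number of rows sharing each non-null value.
        "avg_rows_per_value": len(non_null) / distinct if distinct else 0.0,
    }


def sample_statistics(values, sample_size, seed=0):
    """Educated guess: look at a random sample instead of every row."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return collect_statistics(sample)


dept_no = [10, 10, 20, None, 30, 30, 30]
print(collect_statistics(dept_no))
print(sample_statistics(dept_no, sample_size=3))
```

The full pass over the column gives exact answers, while the sample only sees a few rows and can easily miss nulls or understate duplicates. That is why collected statistics give the PE a better picture than a guess.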

I asked my DBA, “Do I have to Collect Statistics on all the columns and indexes?” The answer was, “No – Only on the important ones, but never the entire table.” I hear that is good advice, but I became concerned when I noticed he was missing teeth!

Parsing Engine uses Statistics for the Plan

“You cannot depend on your eyes when your imagination is out of focus.”

– Mark Twain

The Parsing Engine is the most mature Optimizer in the business. One of its secrets is that it uses the statistics collected on a table (by the DBA) to determine the best plan for the AMPs to follow. The PE uses the collected statistics to decide between using an index and performing a Full Table Scan. The PE also needs to know if a table is large or small, if it has a bunch of NULL values, and how many duplicate values exist on average. The next couple of pages will give you a better idea about the Collect Statistics process.
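Here is one way to picture that index-versus-scan decision in Python. It is a deliberately simple sketch: the function name choose_access_path and the 10 percent threshold are assumptions picked for the example, and the real Optimizer weighs far more than this.

```python
def choose_access_path(row_count, distinct_values, threshold=0.10):
    """Pick 'index' when one value is expected to match few of the rows."""
    if row_count == 0 or distinct_values == 0:
        return "full table scan"
    # With statistics, the expected rows per value is a simple average.
    estimated_rows = row_count / distinct_values
    fraction_of_table = estimated_rows / row_count
    # Hypothetical rule: an index only pays off when it is selective enough.
    if fraction_of_table <= threshold:
        return "index"
    return "full table scan"


# A nearly unique column is strongly selective, so the index wins.
print(choose_access_path(row_count=1_000_000, distinct_values=500_000))
# A column with only two values is weakly selective, so scan the table.
print(choose_access_path(row_count=1_000_000, distinct_values=2))
```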

Columns and Indexes to Collect Statistics On

“This is a test. It is only a test. Had it been an actual job, you would have received raises, promotions, and other signs of appreciation.”

– Anonymous

You certainly don’t collect statistics on every column and index in a table. These statistics are stored inside DBC and they take up Perm Space. You only want to collect on certain columns and indexes such as:

• All Non-Unique Indexes

• Columns frequently used in user queries in the WHERE Clause

• All Primary Indexes of small tables

• Columns used as Join Conditions



A Scalable Architecture

“No wonder nobody comes here – It’s too crowded”

Yogi Berra

Your data warehouse can get crowded, and when it does it can throw your Teradata system a knuckle ball, but Teradata has the ability to scale the wall and add AMPs and PEs so soon your users’ queries look like a fast ball. The system will give you a sign that things are slowing down and in turn you can get an upgrade, but they do cost money. There is no stealing a Base Table!

When it comes to scalability Teradata has put together a team of PEs and AMPs that are guaranteed to hit a home run, while their competition continues to strike out when trying to catch them in terms of scalability.

In Teradata land it never gets too crowded because Teradata can easily scale by adding additional AMPs and PEs. This is called Linear Scalability. That means if you double your AMPs you will double your speed. A 4-AMP system can double its speed by adding 4 more AMPs to become an 8-AMP system. This can theoretically go on forever.

Other vendor systems can double their size and double their speed for a while, but eventually they max out. Teradata has many customers who start with a small system configuration and grow each year. Some of the largest data warehouses in the world are Teradata systems that have proven their value each year and continually grow.

In the picture on the following page you can see we have a 4,000 AMP system. This system is literally 1,000 times more powerful than a 4-AMP system.
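The arithmetic behind Linear Scalability fits in a couple of lines of Python. This sketch assumes the ideal case described above, where speed grows in exact proportion to the number of AMPs; the function names and the 60-second example query are made up for illustration.

```python
def relative_speed(amp_count, baseline_amps=4):
    """How many times faster than the baseline system, assuming linear scaling."""
    return amp_count / baseline_amps


def elapsed_time(baseline_seconds, amp_count, baseline_amps=4):
    """Expected run time of the same query after growing the system."""
    return baseline_seconds * baseline_amps / amp_count


print(relative_speed(8))       # 2.0: doubling the AMPs doubles the speed
print(relative_speed(4000))    # 1000.0: the 4,000-AMP system vs. 4 AMPs
print(elapsed_time(60.0, 8))   # 30.0: a 60-second query on 8 AMPs
```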



VProcs

“The longer I live the more beautiful life becomes.”

-Frank Lloyd Wright

Virtual Processors are really AMPs and PEs, and your Teradata system becomes more beautiful the longer they live! When Tera-Tom was young the AMPs and PEs were actual hardware, but now they are processes that live in memory.

Teradata utilizes Parsing Engines (PEs) and Access Module Processors (AMPs), which it calls VProcs, short for virtual processors. Each AMP and PE lives inside the memory of a Node. There are anywhere between 25 and 35 VProcs inside each node.

Think of a Node as a giant Personal Computer, one that has 4 Intel Processors that work and act as if there were 8 Intel Processors. This node also has up to 16 GB of memory.

The VProcs get loaded inside the node’s memory, and then we connect this node via the BYNET with all the other nodes, and now we are part of the Teradata warehouse.



Nodes and MPP

“The surprising thing about young fools is how many survive to become old fools.”

-Doug Larson

Teradata places its VProcs (AMPs and PEs) inside the memory of each node. In other words, Teradata bought a PC laptop, put AMP and PE software processes in the memory, connected the node to a disk farm, jacked up the price to $1 million and called it a warehouse. The surprising thing about young Tewls is that they grow up to be rich Tewls.

Teradata has taken a simple PC, filled the memory with AMPs and PEs and calls it a node. Connect multiple nodes together with the BYNET and you have a Massively Parallel Processing or MPP system.

The great news is that no technology grows faster than PC technology. I recently bought the fastest PC on the planet and by the time I left the store it was obsolete! This fast-moving technology has allowed Teradata to improve its system speed each and every year.

I once heard a customer say that Teradata was a PC in a box. The answer is yes and it is actually a good thing!



Watch the Tera-Tom Video on How AMPs store and process rows in Parallel

You can watch the video by clicking on the link below or copying and pasting the link in your browser. This video will give you a chance to see exactly how AMPs logically create their tables and store the rows in an organized manner. I think you will love this video. I was teaching in Africa and captured some amazing animals while on Safari. You will see them in the intro, plus this is the first time you will hear about the Nexus...

http://www.CoffingDW.com/TBasicsV12/parallel1.wmv



Nexus Queries every System

Nexus has changed the data warehouse industry. SQL Assistant is designed to query Teradata, but Nexus is designed to query Teradata, DB2, Netezza, Oracle, Greenplum, SQL Server, and SQL Server Parallel Warehouse. Query every system simultaneously and build a data warehouse enterprise encompassing them all!


