Vertica-Database

35
Vertica Database A Brief Intro to Vertica Database – Navin Chakraborty

Transcript of Vertica-Database

Vertica Database

A Brief Intro to Vertica Database – Navin Chakraborty

ME

• Vertica Architect • Vertica Certified Professional• Exploring Database engines

Topics1. What is Vertica2. Vertica concepts a. Column storage b. Hybrid Storage c. Projections vs / Tables d. Types of Projections3. Vertica Objects a. Projections b. Views c. Tables d. UDF SQL functions e. Sequences4. Vertica Operations a. DML operations ( Simple SQL statements ) b. Bulk Data Loading c. Bulk Updating ( MERGE statement ) d. Exporting Data5. Comparison with Teradata

What is Vertica

• Vertica is a True Columnar, MPP, Analytic database• Vertica is just a DBMS like other DBMS( Oracle, Teradata,

MySQL)• Vertica is made for storing Structured and semistrucutred

data.• It supports Standard SQL like other Databases

Release and Versions

• The most latest release of Vertica( launched in December 2013) worldwide is Vertica 7.1 and is called Crane

• The version used in our working environments by the Client is Vertica 6.1.2 which is called Bulldozer( Funny but they seem to use the heavy lifting automotive names for every major release )

Vertica Concepts – Column Storage • Unlike other RDBMS, Vertica reads the columns from database objects called

projections. No resources are wasted by reading large numbers of unused columns. Every byte of data is used by the execution engine. For example, consider this simple two-table schema:

Suppose you want to run this query: SELECT A, C, N FROM Table1 JOIN Table2 ON H = J; • A row store must read 16 columns (A through H and J through Q) from physical

storage for each record in the result set. A column store with a query-specific projection reads only three columns: A, C, and N.

Vertica Concepts – Hybrid Storage • To support Data Manipulation Language (DML) commands (INSERT, UPDATE, and

DELETE) and bulk load operations (COPY), intermixed with queries in a typical data warehouse workload, Vertica implements the storage model shown in the illustration below. This model is the same on each Vertica node.

• WOS – Write optimised Store (memory)• ROS – Rad Optimised Store(Disk)• Tuple Mover – Operation to move data

Vertica Concepts –Projection vs Tables

• Projection are nothing but materialized views same like in Oracle or any other database. The only difference in Vertica is Data is stored in Projections and not tables.

• When it comes to understanding this term quickly, My tagline will be

“Projections are physical and Tables are logical in Vertica”

Continued..

• So all the data that you store in Tables are physically stored in Projection for those tables. and the projections are automatically created for you by vertica when you create your tables.

Vertica Concepts – Types of Projections

• Super Projections - A superprojection is a projection for a single table that contains all the columns in the table.

• PreJoin Projection - Projections can contain joins between tables that are connected by PK/FK constraints. These projections are called pre-join projections. Pre-join projections can have only inner joins between tables on their primary and foreign key columns.

Continued..

• Query specific Projections - You can optimize your queries by creating one or more projections that contain only the subset of table columns required to process the query. These projections are called query-specific projections.

Vertica Objects - Projections

• For each table in the database, Vertica requires a minimum of one projection, called a superprojection. A superprojection is a projection for a single table that contains all the columns in the table.

• To get your database up and running quickly, Vertica automatically creates a default superprojection for each table created through the CREATE TABLE and CREATE TEMPORARY TABLE statements.

Anatomy of Projection• When you need to create a projection manually for queries to optimize you can

use CREATE PROJECTI)ON statement• The CREATE PROJECTIONS statement

Vertica Objects - Views

• Views in Vertica are just like any other database views they store the query on base tables and when you select from views the query is executed on the base table and the data is retrieved.

• Syntax CREATE [ OR REPLACE ] VIEW viewname [ ( column-name [, ...] ) ] AS query ]

Continued..• Example CREATE VIEW myview AS SELECT SUM(annual_income), customer_state FROM public.customer_dimension WHERE customer_key IN (SELECT customer_key FROM store.store_sales_fact) GROUP BY customer_state ORDER BY customer_state ASC;

• Dropping A ViewDROP VIEW myview;

Vertica Object - Tables

• Tables are logical in Vertica. You can create a table and drop a table and alter a table , but it just changes the logical schema of the database and not the physical schema.

• A single table in vertica can have 1600 Columns

• The data is stored in the projections corresponding to a particular table and not in the table.

Continued..

• Syntax CREATE TABLE [ IF NOT EXISTS ] [[db-name.]schema.]table-name { ... ( column-definition (table) (page 779) [ , ... ] ) ... | [ column-name-list (create table) (page 780) ] AS [COPY] [ [ AT EPOCH LATEST ] ... | [ AT TIME 'timestamp' ] ] [ /*+ direct */ ] query ... | [ LIKE [[db-name.]schema.]existing-table [ INCLUDING PROJECTIONS | EXCLUDING PROJECTIONS ] ] } ... [ ORDER BY table-column [ , ... ] ] ... [ ENCODED BY column-definition [ , ... ] ... [ hash-segmentation-clause (page 750) ... | range-segmentation-clause (page 790) ... | UNSEGMENTED { NODE node | ALL NODES } ] ... [ KSAFE [k_num] ] ... [ PARTITION BY partition-clause ]

Continued..

ExampleCREATE TABLE Public.Employee_Dimension ( Employee_key integer PRIMARY KEY NOT NULL, Employee_gender varchar(8) ENCODING RLE, Employee_title varchar(8), Employee_first_name varchar(64), Employee_middle_initial varchar(8), Employee_last_name varchar(64), ) ORDER BY Employee_gender, Employee_last_name, Employee_first_name;

ExampleCREATE TABLE time( ..., date_col date NOT NULL, ...) PARTITION BY extract('year' FROM date_col);

Vertica Objects - SQL functions

• SQL function is Vertica is where you can embed small sql functions to use it in your SQL statements and it also facilitates code re-usability.

• Lets you store SQL expressions as functions in Vertica for use in queries. These functions are useful for executing complex queries or combining Vertica built-in functions. You simply call the function name you assigned.

• A SQL function can be used anywhere in a query where an ordinary SQL expression can be used, except in the table partition clause or the projection segmentation clause.

Conitnued..• Syntax CREATE [ OR REPLACE ] FUNCTION ... [[db-name.]schema.]function-name ( [ argname argtype [, ...] ] ) ... RETURN rettype ... AS ... BEGIN ...... RETURN expression; ... END;

• Example • This following statement creates a SQL function called myzeroifnull that accepts an

INTEGER argument and returns an INTEGER result.CREATE FUNCTION myzeroifnull(x INT) RETURN INT AS BEGIN RETURN (CASE WHEN (x IS NOT NULL) THEN x ELSE 0 END); END;

Conitnued..You can use the new SQL function (myzeroifnull) anywhere you use an ordinary SQL expression. For example, create a simple table: => CREATE TABLE tabwnulls(col1 INT); => INSERT INTO tabwnulls VALUES(1); => INSERT INTO tabwnulls VALUES(NULL); => INSERT INTO tabwnulls VALUES(0); => SELECT * FROM tabwnulls; a --- 1 0 (3 rows) Use the myzeroifnull function in a SELECT statement, where the function calls col1 from table tabwnulls: => SELECT myzeroifnull(col1) FROM tabwnulls; myzeroifnull -------------- 1 0 0(3 rows) Use the myzeroifnull function in the GROUP BY clause: => SELECT COUNT(*) FROM tabwnulls GROUP BY myzeroifnull(col1); count ------- 2 1 (2 rows) If you want to change a SQL function's body, use the CREATE OR REPLACE syntax. The following command modifies the CASE expression: => CREATE OR REPLACE FUNCTION zerowhennull(x INT) RETURN INT AS BEGIN RETURN (CASE WHEN (x IS NULL) THEN 0 ELSE x END); END;

Vertica Objects - Sequences• The sequence is a feature by some database products which just creates unique

values. It just increments a value and returns it. The special thing about it is: there is no transaction isolation, so several transactions can not get the same value, the incrementation is also not rolled back. Without a database sequence it is very hard to generate unique incrementing numbers.

Example • Create a simple sequence: CREATE SEQUENCE my_seq MAXVALUE 5000 START 150; CREATE SEQUENCE

Vertica Operation – DML operations

• SQL data manipulation language (DML) commands INSERT, UPDATE, and DELETE perform the same functions in Vertica as they do in row-oriented databases. These commands follow the SQL-92 transaction model and can be intermixed.

• DML statements acquire write locks to prevent other READ COMMITTED transactions from modifying the same data. SELECT statements do not acquire locks, which prevents read and write statements from conflicting.

• All the DML operations (Insert, Update, Delete) are similar to other row oriented databases.

Insert• Inserts values into all projections of a table. You must insert one complete tuple at a time.

By default, Insert first uses the WOS. When the WOS is full, the INSERT overflows to the ROS.

• Syntax INSERT [ /*+ direct */ ] [ /*+ label(label-name)*/ ] ... INTO [[db-name.]schema.]table ... [ ( column [, ...] ) ] ... { DEFAULT VALUES... | VALUES ( { expression | DEFAULT } [, ...] ) ... | SELECT...

Examples • => INSERT INTO t1 VALUES (101, 102, 103, 104); • => INSERT INTO customer VALUES (10, 'male', 'DPR', 'MA', 35); • => INSERT INTO retail.t1 (C0, C1) VALUES (1, 1001); • => INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07';

Update• Replaces the values of the specified columns in all rows for which a specific condition is true. All other

columns and rows in the table are unchanged. By default, UPDATE uses the WOS and if the WOS fills up, overflows to the ROS.

• SelfJoins are not permitted in Updates in Vertica• UPDATE inserts new records into the WOS and marks the old records for deletion. • You cannot UPDATE columns that have primary key or foreign key referential integrity constraints. Syntax UPDATE [ /*+ direct */ ] [ /*+ label(label-name)*/ ] ... [[db-name.]schema.]table-reference (on page 882) [AS] alias ... SET column = ... { expression | DEFAULT } [ , ... ] ... [ FROM from-list ] ... [ WHERE clauseExampleUPDATE Result_Table r SET address=n.new_address FROM New_Addresses n WHERE r.cust_id = n.new_cust_id;

Delete• Unlike most databases, the DELETE command in Vertica does not delete data; it marks records as

deleted.• Marks tuples as no longer valid in the current epoch, marking the records for deletion in the WOS, rather

than deleting data from disk storage. By default, delete uses the WOS and if the WOS fills up, overflows to the ROS. You cannot delete records from a projection.

• DELETE statements support subqueries and joins, which is useful for deleting values in a table based on values that are stored in other tables.

Syntax DELETE [ /*+ direct */ ] [ /*+ label(label-name)*/ ] ... FROM [[db-name.]schema.]table ... WHERE clauseExample• The following command truncates a temporary table called temp1: DELETE FROM temp1; • The following command deletes all records from base table T where C1 = C2 - C1. DELETE FROM T WHERE C1=C2-C1; • The following command deletes all records from the customer table in the retail schema where the state

attribute is in MA or NH: DELETE FROM retail.customer WHERE state IN ('MA', 'NH');

Vertica Operations – Bulk Data Loading

• The COPY command is designed for bulk load operations and can load data into the WOS or the ROS.

• In Vertica, the COPY statement is designed for bulk loading data into the database. COPY reads data from text files or data pipes and inserts it into WOS (memory) or directly into the ROS (disk). COPY automatically commits itself and any current transaction but is not atomic; some rows could be rejected. Note that COPY does not automatically commit when copying data into temporary tables.

• You can use the COPY statement's NO COMMIT option to prevent COPY from committing a transaction when it finishes copying data. You often want to use this option when sequentially running several COPY statements to ensure the data in the bulk load is either committed or rolled back at the same time. Also, combining multiple smaller data loads into a single transaction allows Vertica to more efficiently load the data.

Conitnued..Syntax COPY [ [db-name.]schema-name.]table ... [ ( { column-as-expression | column } ...... [ FILLER datatype ] ...... [ FORMAT 'format' ] ...... [ ENCLOSED BY 'char' ] ...... [ ESCAPE AS 'char' | NO ESCAPE ] ...... [ NULL [ AS ] 'string' ] ...... [ TRIM 'byte' ] ...... [ DELIMITER [ AS ] 'char' ] ... [, ... ] ) ] ... [ COLUMN OPTION ( column ...... [ FORMAT 'format' ] ...... [ ENCLOSED BY 'char' ] ...... [ ESCAPE AS 'char' | NO ESCAPE ] ...... [ NULL [ AS ] 'string' ] ...... [ DELIMITER [ AS ] 'char' ] ... [, ... ] ) ] FROM { STDIN ...... [ BZIP | GZIP | UNCOMPRESSED ] ...| 'pathToData' [ ON nodename | ON ANY NODE ] ...... [ BZIP | GZIP | UNCOMPRESSED ] [, ...] ...| LOCAL STDIN | 'pathToData' ...... [ BZIP | GZIP | UNCOMPRESSED ] [, ...] } ...[ NATIVE | NATIVE VARCHAR | FIXEDWIDTH COLSIZES (integer [, ....]) ] ...[ WITH ] ...[ WITH [ SOURCE source(arg='value')] [ FILTER filter(arg='value') ] [ PARSER parser(arg='value') ]] ...[ DELIMITER [ AS ] 'char' ] ...[ TRAILING NULLCOLS ] ...[ NULL [ AS ] 'string' ] ...[ ESCAPE AS 'char' | NO ESCAPE ] ...[ ENCLOSED BY 'char' ] ...[ RECORD TERMINATOR 'string' ] ...[ SKIP records ] ...[ SKIP BYTES integer ] ...[ TRIM 'byte' ] ...[ REJECTMAX integer ] ...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ] ...[ REJECTED DATA 'path' [ ON nodename ] [, ...] ] ...[ ENFORCELENGTH ] ...[ ABORT ON ERROR ] ...[ AUTO | DIRECT | TRICKLE ] ...[ STREAM NAME 'streamName'] ...[ NO COMMIT ]

Conitnued..• Examples The following examples load data with the COPY statement using the FORMAT, DELIMITER, NULL and ENCLOSED BY string options, as well as a DIRECT option. =>

COPY public.customer_dimension (customer_since FORMAT 'YYYY') FROM STDIN DELIMITER ',' NULL AS 'null' ENCLOSED BY '"'

=> COPY a FROM STDIN DELIMITER ',' NULL E'\\\N' DIRECT;

=> COPY store.store_dimension FROM :input_file DELIMITER '|' NULL '' RECORD TERMINATOR E'\f'

Vertica Operations - Merge• Lets you load a batch of new records while simultaneously updating existing records by

internally combining INSERT and UPDATE SQL operations in one statement. In a MERGE operation, Vertica replaces the values of the specified columns in all rows of the target table for which a specific condition is true. All other columns and rows in the table are unchanged. By default MERGE uses the WOS, and if the WOS fills up data overflows to the ROS.

• When you write a MERGE statement, you specify both a target and source table. You also provide a search condition (ON clause), which Vertica uses to evaluate each row in the source table in order to update or insert into the target table records from the source table.

• You can use the optional WHEN MATCHED or WHEN NOT MATCHED clauses to further refine results. For example, if you use:

• WHEN MATCHED THEN UPDATE: Vertica updates (replaces) the values of the specified columns in all rows when it finds more than one matching row in the target table for a row in the source table. All other columns and rows in the table are unchanged. If Vertica finds more than one matching row in the source table for a row in the target table, it returns a run-time error.

• WHEN NOT MATCHED THEN INSERT: Vertica inserts into the target table all rows from the source table that do not match any rows in the target table.

Continued..• Syntax MERGE [/*+ direct */] INTO [[db-name.]schema.]target-table-name [alias] ... USING [[db-name.]schema.]source-table-name [alias ] ON ( condition ) ... [ WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ... ] ] ... [ WHEN NOT MATCHED THEN INSERT ( column1 [, column2 ...]) VALUES ( value1 [, value2 ... ] ) ]

Vertica Operation -Exporting Data

• Vertica supports spooling of data as in Oracle where you can retrieve all the results of a select query into a file as specified.

• \o parameter start the spooling in Vertica after this option any thing you select in the database will be redirected onto the file and not on the standard output.

• Again \o will end spooling for that session

Syntax

\o /home/tmp/output.txtSelect * from store.store_sales_fact ;\o

Comparision with Teradata

• In Teradata there is a provision of Control Flow of statements ( IF, GOTO) , Vertica does not support these statements

• Update query in Vertica is different then in Teradata• There is no provision for Qualify function in Vertica

as in Teradata• Volatile tables in Teradata, Local Temporary tables in

Vertica.• Update queries cannot have a self join in Vertica as

in Teradata

Conitnued..

• There are no templates/utilites in vertica for loading,exporting data as in Teradata ( FastLoad,Fastexport), Vertica uses the vsql terminal for all its operations

Thankyou