Row and Column Security for Hive data with Big SQL

© 2015 IBM Corporation October 29, 2015 Access Hive Data FASTER and more SECURELY with Big SQL



Watch this on YouTube @ www.youtube.com/watch?v=SYQgzRGhqVU


SQL on Hadoop Matters for Big Data Analytics

For BI Tools like Cognos

Visualizations from Cognos 10.2.2


Hive is Really 3 Things…

Storage Format, Metastore, and Execution Engine

Applications

SQL Execution Engine: Hive (open source), running on MapReduce

Hive Metastore (open source)

Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…


Hive “Execution Engine”

SQL → Hive → Map → Reduce → Output

References the Hive Metastore to understand the data

Translates SQL to MapReduce
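As a rough illustration of what "translating SQL to MapReduce" means, the sketch below runs the equivalent of SELECT region, COUNT(*) ... GROUP BY region as separate map, shuffle, and reduce steps in plain Python. The table, the column name region, and the function names are hypothetical, and this is not how Hive itself is implemented. It only shows the shape of the computation.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Hive table with a "region" column.
rows = [{"region": "east"}, {"region": "west"}, {"region": "east"}]

def map_phase(rows):
    # Emit one (key, 1) pair per input row, as a mapper would.
    for row in rows:
        yield row["region"], 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values for each key to get the per-group row count.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

In a real cluster each phase runs in parallel across nodes and intermediate results are written out between stages.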


Big SQL preserves open source foundation

Leverages Hive metastore and storage formats.

No lock-in. Data is part of Hadoop, not Big SQL. Fall back to the open source Hive engine at any time.

Applications

SQL Execution Engines: IBM Big SQL (IBM), Hive (open source)

Hive Metastore (open source)

Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…


Ok… But…

WHY WOULD YOU WANT TO DO THAT?


Performance Test – TPC-DS Workload

20-Node (Physical) Cluster

TPC-DS stands for Transaction Processing Performance Council – Decision Support, an industry-standard benchmark workload for SQL

Hive 1.2.1 on IBM Open Platform V4.1, 20 Nodes

Big SQL V4.1 on IBM Open Platform V4.1, 20 Nodes

*Not an official TPC-DS Benchmark.


Big SQL V4.1 vs Hive @ 1TB TPC-DS


Performance Test Summary

Big SQL V4.1 vs. Hive 1.2.1 @ 1TB

In 99 of 99 queries, Big SQL was faster

On average, Big SQL was 21X faster

Excluding the top 5 and bottom 5 results, Big SQL was 19X faster
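The two averages above can be read as a plain mean and a trimmed mean of per-query speedups. The slides do not say exactly how the averages were computed, so the sketch below assumes speedup is Hive elapsed time divided by Big SQL elapsed time, averaged across queries. The function name and the timings are made up for illustration and are not benchmark results.

```python
def average_speedup(hive_secs, bigsql_secs, trim=0):
    """Mean of per-query speedups, dropping `trim` results from each end."""
    # Assumed definition: speedup = Hive elapsed time / Big SQL elapsed time.
    ratios = sorted(h / b for h, b in zip(hive_secs, bigsql_secs))
    if trim * 2 >= len(ratios):
        raise ValueError("trim removes every result")
    if trim:
        # Drop the `trim` smallest and `trim` largest speedups.
        ratios = ratios[trim:-trim]
    return sum(ratios) / len(ratios)

# Hypothetical elapsed times in seconds for four queries.
hive = [100.0, 200.0, 90.0, 400.0]
bigsql = [10.0, 10.0, 30.0, 4.0]

print(average_speedup(hive, bigsql))
print(average_speedup(hive, bigsql, trim=1))
```

With 99 queries, trim=5 would correspond to excluding the top 5 and bottom 5 results, which keeps a handful of extreme queries from dominating the average.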


Actually, we originally set out to run 10TB, but …

ONLY BIG SQL COULD RUN THE COMPLETE WORKLOAD


Performance Test Summary

Big SQL @ 10TB vs. Hive @ 1 TB

How does Big SQL fare running with 10X the data?

In 89 of 99 queries, Big SQL was still faster

On average, Big SQL was still 3.8X faster

Excluding the top 5 and bottom 5 results, Big SQL was still 3.2X faster


AND, we’re really good with lots of users….

Clear benefit on workload throughput with WLM enabled:


And, Big SQL makes SQL Access on Hadoop

MORE SECURE


Enhanced Security - Good to Know

Role Based Access Control

Row Level Security

Column Level Security

Separation of Duties

Security Administrator

Database Administrator

Workload Manager

Others..
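To make row-level and column-level security concrete, here is a conceptual sketch in plain Python. It is not Big SQL syntax, where such rules are declared in SQL and enforced by the engine. The table, the role names TELLER and MANAGER, and the helper functions are all hypothetical. The idea is that the same query returns different rows, and differently masked column values, depending on who runs it.

```python
# Hypothetical table contents, for illustration only.
ROWS = [
    {"branch": "A", "customer": "Ann", "ssn": "123-45-6789"},
    {"branch": "B", "customer": "Bob", "ssn": "987-65-4321"},
]

def row_visible(row, user):
    # Row-level rule: managers see every row, tellers only their own branch.
    return user["role"] == "MANAGER" or row["branch"] == user["branch"]

def mask_ssn(value, user):
    # Column-level rule: only managers see the full value.
    return value if user["role"] == "MANAGER" else "XXX-XX-" + value[-4:]

def query(user):
    # Apply the row filter first, then mask the sensitive column.
    return [
        {**row, "ssn": mask_ssn(row["ssn"], user)}
        for row in ROWS
        if row_visible(row, user)
    ]

teller = {"role": "TELLER", "branch": "A"}
manager = {"role": "MANAGER", "branch": None}

print(query(teller))
print(query(manager))
```

Because the rules are attached to the data rather than to each application, every tool that queries the table sees the same filtered and masked result.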


Recap - Big SQL preserves open source foundation

Applications

SQL Execution Engines: IBM Big SQL (IBM), Hive (open source)

Hive Metastore (open source)

Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, Others…

Big SQL Makes Hive

FASTER and more SECURE


Comparing Big SQL @ 10TB vs Hive @ 1TB


Big SQL 4.1 @10TB vs Hive @ 1TB TPC-DS
