Integration with Hadoop PolyBase in SQL...

Integration with Hadoop · PolyBase in SQL 2016 · Adastra · Pavel Stejskal, Consultant · Pavel.Stejskal@adastragrp.com · linkedin.com/in/pavelstejskal · 20.4.2016


Page 1: Integration with Hadoop PolyBase in SQL 2016download.microsoft.com/documents/cs-cz/enterprise/... · 20/4/2016  · ADASTRA CZECH REPUBLIC Adastra, s.r.o. Karolinská 654/2, 186 00


Integration with Hadoop

PolyBase in SQL 2016

Adastra

Pavel Stejskal, Consultant

Pavel.Stejskal@adastragrp.com

linkedin.com/in/pavelstejskal

20.4.2016


Integration with Hadoop using PolyBase

• Excel + Power BI add-ins: Query, Pivot, View, Map
• SharePoint: Power Pivot Gallery, Power View
• Excel: Data Mining
• Power BI Desktop
• Power BI Portal
• Azure ML
• Power BI Mobile App
• Analytics Platform System (APS)

PolyBase allows you to use Transact-SQL (T-SQL) statements to access data stored in Hadoop or Azure Blob Storage and query it in an ad-hoc fashion.


PolyBase


What is PolyBase and where does it belong?

[Diagram: PolyBase sits between the relational and non-relational sides. Labels: Standard BI tools, Integration.]

• Cloud solution: Hadoop cluster (Hortonworks / Cloudera), Azure Blob Storage
• On-premises solution: Hadoop cluster (Hortonworks / Cloudera)


How to start with PolyBase - requirements


• Hardware

– Server for SQL (SMP architecture)

– Hadoop cluster (MPP architecture)

– Fast network between SQL and Hadoop

• Software

– MS SQL 2016 – RDBMS

– Hadoop distribution (Hortonworks or Cloudera)

• In case of cloud solution

– Hadoop in cloud

– Azure blob storage
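Once the software requirements are in place, the instance can be checked and pointed at the Hadoop distribution. The following is a minimal sketch, not part of the original slides. The connectivity value 7 is an assumption for a Hortonworks cluster; the correct value depends on the distribution and version and should be taken from the PolyBase connectivity documentation.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Tell PolyBase which Hadoop distribution it will talk to.
-- ASSUMPTION: 7 is used here for a Hortonworks cluster; look up the value
-- that matches your own distribution and version before running this.
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;
RECONFIGURE;

-- The new setting takes effect only after SQL Server and the PolyBase
-- services have been restarted.
```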


PolyBase for SQL Server 2016 – How it works

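[Diagram: a T-SQL query enters MS SQL 2016, which contains the SQL Server engine, the PolyBase engine, and the PolyBase DMS*. A DB table and an external table are combined in a direct JOIN without ETL. Data transfer runs between MS SQL 2016 and the Hadoop cluster (NameNode and three DataNodes).]

* DMS = Data Movement Service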
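The "direct JOIN without ETL" idea can be sketched as an ordinary T-SQL join between a local table and an external table. This example assumes the ClickStream external table defined on the later slides; the local table dbo.BlockedAddresses and its columns are hypothetical and exist only for illustration.

```sql
-- Hypothetical local table, not part of the original deck:
--   dbo.BlockedAddresses (ip_address varchar(50), reason varchar(100))

-- ClickStream is an external table over files in HDFS. To the query
-- author it behaves like any other table, so no separate load step
-- is needed before joining it with local data.
SELECT c.url,
       c.event_date,
       b.reason
FROM ClickStream AS c
JOIN dbo.BlockedAddresses AS b
    ON b.ip_address = c.user_IP
WHERE c.event_date >= '2016-01-01';
```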


3 basic concepts for PolyBase objects

1. External data source

CREATE EXTERNAL DATA SOURCE HadoopHDP2 WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://10.xxx.xx.xxx:xxxx',
    RESOURCE_MANAGER_LOCATION = '10.xxx.xx.xxx:xxxx',
    CREDENTIAL = HadoopUser1 -- for Kerberos-secured Hadoop
);

2. External file format

CREATE EXTERNAL FILE FORMAT TextFileFormat WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|',
                    USE_TYPE_DEFAULT = TRUE)
);
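The CREDENTIAL = HadoopUser1 option in the external data source refers to a database scoped credential that has to exist before the data source is created. The slides do not show that step; a minimal sketch follows, in which the password, user name, and secret are placeholders to be replaced with real values.

```sql
-- A database master key is required before a database scoped
-- credential can be created. The password below is a placeholder.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';

-- Credential used by PolyBase to authenticate against a
-- Kerberos-secured Hadoop cluster. IDENTITY and SECRET are placeholders.
CREATE DATABASE SCOPED CREDENTIAL HadoopUser1
WITH IDENTITY = '<hadoop_user_name>',
     SECRET   = '<hadoop_password>';
```

For a cluster without Kerberos, the CREDENTIAL line can be left out of the data source definition.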


3 basic concepts for PolyBase objects (continued)

3. External table

CREATE EXTERNAL TABLE ClickStream (
    url varchar(50),
    event_date date,
    user_IP varchar(50)
)
WITH (
    LOCATION = '/webdata/employee.tbl', -- path in HDFS
    DATA_SOURCE = HadoopHDP2,
    FILE_FORMAT = TextFileFormat
);


External table

• Adding a shape to semi-structured data

1. File format – “|” as delimiter
2. Defined types of columns
3. Table for T-SQL query
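Once the file format and column types give the files a shape, the external table can be queried ad hoc like any other table. The query below is a sketch against the ClickStream definition from the earlier slide; the date range is arbitrary and chosen only for illustration.

```sql
-- Count page hits per day directly over the delimited files in HDFS.
SELECT event_date,
       COUNT(*) AS hits
FROM ClickStream
WHERE event_date BETWEEN '2016-01-01' AND '2016-03-31'
GROUP BY event_date
ORDER BY event_date;

-- Optional: appending OPTION (FORCE EXTERNALPUSHDOWN) to the query asks
-- PolyBase to push the computation down to the Hadoop cluster. It relies
-- on RESOURCE_MANAGER_LOCATION being set on the external data source.
```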


Demo


Sqoop vs. PolyBase

[Diagram: two panels, Sqoop and PolyBase. In each panel, SQL holds 2 TB and the Hadoop cluster holds 100 TB (data volume). Query labels: T-SQL query, T-SQL query, Hive SQL. One element is marked "???".]
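The slide itself does not spell out the comparison. As one possible illustration: where a local copy of a subset of the Hadoop data is wanted (the kind of transfer Sqoop is typically used for), PolyBase can materialize it with plain T-SQL. The target table name and the date filter below are hypothetical.

```sql
-- Copy a filtered slice of the external ClickStream data into a new
-- local table. dbo.ClickStream_Local is a hypothetical name; SELECT INTO
-- creates it, so it must not exist beforehand.
SELECT url,
       event_date,
       user_IP
INTO dbo.ClickStream_Local
FROM ClickStream
WHERE event_date >= '2016-04-01';
```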


ADASTRA CZECH REPUBLIC
Adastra, s.r.o.
Karolinská 654/2, 186 00 Praha 8
Tel.: +420 271 733 303
www.adastra.cz

ADASTRA GROUP North America
8500 Leslie St.
Markham, Ontario, L3T 7M8
Tel: +1 905 881 7946

Restrictions for public release and use: This document can comprise confidential information. As such it may not, without Adastra’s prior consent, be copied or transferred.

Important: All brands and names of products given in this documentation are or can be registered trademarks of their owners. © 2016 Adastra, all rights reserved.

Thank you!