AI Big Data Conference: Eugene Polonichko, Azure Data Lake

Posted on 21-Jan-2018

Azure Data Lake: What is it? Why is it? Where is it?

EUGENE POLONICHKO

DATA PLATFORM MVP

BI\DWH ARCHITECT

About me

Eugene Polonichko has over 7 years of experience with SQL Server. He has mainly focused on BI projects (SSAS, SSIS, Power BI, Cognos, Informatica PowerCenter, Pentaho, Tableau). Eugene is a passionate speaker and SQL community volunteer who presents regularly at PASS SQL Saturday events and local user groups around Ukraine and Europe. Eugene is a PASS Chapter Leader and holds Data Platform MVP status.

https://www.linkedin.com/in/eugenepolonichko/

https://twitter.com/EvgenPolonichko

Agenda

What is a Data Lake?

Architecture of Azure Data Lake

Azure Data Lake Store

Overview of Azure Data Lake Store

Compare

For big data processing

Azure Data Lake Analytics

U-SQL

Concepts

U-SQL Script Structure

Extractors

U-SQL Jobs

U-SQL catalog

Monitoring and performance of U-SQL jobs

Data Lake Analytics pricing

Data Lake

Architecture of Azure Data Lake

Azure Data Lake Store

Azure Data Lake Store is a hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.

The Azure Data Lake Store is an Apache Hadoop file system compatible with the Hadoop Distributed File System (HDFS).

It can be accessed from Hadoop (available with an HDInsight cluster) using the WebHDFS-compatible REST APIs.
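To make the WebHDFS-compatible access a little more concrete, here is a minimal Python sketch that only builds the REST URL for a store path. The account name `mydatalake` and the path `/logs/2018/01` are made up for illustration, and the host pattern `<account>.azuredatalakestore.net` is stated here as an assumption about the service endpoint rather than something shown on the slides. A real request would also need an Azure Active Directory bearer token, which this sketch leaves out.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str = "LISTSTATUS") -> str:
    """Build a WebHDFS-style REST URL for a path in a Data Lake Store account."""
    # Assumed host pattern for a Data Lake Store account (not taken from the slides).
    host = f"{account}.azuredatalakestore.net"
    # Percent-encode the path but keep the slashes that separate folders.
    clean_path = quote(path.strip("/"), safe="/")
    # The WebHDFS operation (LISTSTATUS, OPEN, ...) travels in the query string.
    query = urlencode({"op": op})
    return f"https://{host}/webhdfs/v1/{clean_path}?{query}"


# Hypothetical account and folder, used only to show the resulting URL shape.
print(webhdfs_url("mydatalake", "/logs/2018/01"))
# prints https://mydatalake.azuredatalakestore.net/webhdfs/v1/logs/2018/01?op=LISTSTATUS
```

Keeping URL construction separate from the HTTP call makes it easy to check the path encoding without touching the network.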

Use Cases

Store social media posts, log files, sensor data

Store corporate data such as relational databases (as flat files)

Data Lake Store vs Azure Storage

Purpose
Data Lake Store: Optimized storage for big data analytics workloads
Azure Storage: General-purpose object store for a wide variety of storage scenarios

Use cases
Data Lake Store: Batch, interactive, and streaming analytics, log files, etc.
Azure Storage: Any type of text or binary data, such as application back end,

Key concepts
Data Lake Store: Account contains folders, which in turn contain data stored as files
Azure Storage: Storage account has containers

Performance
Data Lake Store: Optimized performance for parallel analytics workloads. High throughput and IOPS.
Azure Storage: Not optimized for analytics workloads

Big Data requirements

Pricing

Transaction prices

Storage prices

DEMO

Azure Data Lake Analytics

Azure Data Lake Analytics is an on-demand analytics job service to simplify big data analytics. You can focus on writing, running, and managing jobs rather than on operating distributed infrastructure.

Dynamic scaling

Develop faster, debug, and optimize smarter using familiar tools

Affordable and cost effective

Works with all your Azure Data

U-SQL: simple and familiar, powerful, and extensible

U-SQL

T-SQL + C#

Concepts

Retrieve data from stored locations in rowset format

Transform the rowset(s)

U-SQL Script Structure

Script := Statement_List.

Statement_List := { [Statement] ';' }.

Statement := Use_Statement
| If_Else_Statement
| Declare_Variable_Statement
| Reference_Assembly_Statement
| Deploy_Resource_Statement
| DDL_Statement
| Query_Statement
| Procedure_Call
| Import_Package_Statement
| DML_Statement
| Output_Statement.
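As a rough illustration of the grammar above, the following Python sketch splits a tiny U-SQL script on `;` and maps each statement to one of the listed productions by its first token. The sample script, its file paths, and the keyword table are hypothetical and cover only a few productions. The splitter is deliberately naive: it would break on a `;` inside a string literal or inside an IF block, so treat it as a reading aid, not a parser.

```python
from typing import List

# Hypothetical three-statement script; the lone ';' is an empty statement,
# which the grammar allows because Statement is optional in { [Statement] ';' }.
SAMPLE_SCRIPT = """
DECLARE @input string = "/data/posts.csv";
@posts = EXTRACT id int, body string FROM @input USING Extractors.Csv();
;
OUTPUT @posts TO "/out/posts.tsv" USING Outputters.Tsv();
"""

# Leading keyword -> grammar production (a small illustrative subset only).
KEYWORD_TO_PRODUCTION = {
    "USE": "Use_Statement",
    "DECLARE": "Declare_Variable_Statement",
    "REFERENCE": "Reference_Assembly_Statement",
    "OUTPUT": "Output_Statement",
}


def split_statements(script: str) -> List[str]:
    """Split on ';' and drop empty statements."""
    # Naive: ignores ';' inside string literals and nested blocks.
    parts = (part.strip() for part in script.split(";"))
    return [part for part in parts if part]


def classify(statement: str) -> str:
    """Guess the grammar production from the first token of a statement."""
    first_word = statement.split()[0]
    if first_word.startswith("@"):
        # Assumption: "@name = ..." assigns a rowset, i.e. a query statement.
        return "Query_Statement"
    return KEYWORD_TO_PRODUCTION.get(first_word, "unclassified")


for stmt in split_statements(SAMPLE_SCRIPT):
    print(classify(stmt), "<-", stmt.split()[0])
```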

U-SQL Built-in Extractors:

Extractors.Text()

Extractors.Csv()

Extractors.Tsv()
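The relationship between the three built-in extractors can be sketched with a Python analogy: a general delimited-text reader, plus two thin wrappers that fix the delimiter to a comma or a tab. The function names below are hypothetical and this is not the U-SQL API. Unlike a U-SQL EXTRACT, which converts fields to the column types declared in the script, this sketch returns every field as a string.

```python
import csv
import io
from typing import List


def extract_text(data: str, delimiter: str = ",") -> List[List[str]]:
    """Parse delimited text into rows of string fields (rough analogue of Extractors.Text())."""
    return list(csv.reader(io.StringIO(data), delimiter=delimiter))


def extract_csv(data: str) -> List[List[str]]:
    """Comma-separated input (rough analogue of Extractors.Csv())."""
    return extract_text(data, delimiter=",")


def extract_tsv(data: str) -> List[List[str]]:
    """Tab-separated input (rough analogue of Extractors.Tsv())."""
    return extract_text(data, delimiter="\t")


print(extract_csv("1,hello\n2,world\n"))  # [['1', 'hello'], ['2', 'world']]
print(extract_tsv("1\thello\n"))          # [['1', 'hello']]
```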

Extractors

U-SQL Jobs

ADLAUs: Azure Data Lake Analytics Units

Parallelism N = N ADLAUs

1 ADLAU ~= a VM with 2 cores and 6 GB of memory
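The sizing rule above (parallelism N means N ADLAUs, each roughly a VM with 2 cores and 6 GB of memory) turns into simple arithmetic. The Python sketch below is only an approximation built from those two figures; the names `JobResources` and `resources_for_parallelism` are invented for the example.

```python
from dataclasses import dataclass

# Approximate per-unit figures quoted on the slide.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6


@dataclass(frozen=True)
class JobResources:
    adlaus: int
    cores: int
    memory_gb: int


def resources_for_parallelism(parallelism: int) -> JobResources:
    """Estimate the compute behind a job submitted with the given parallelism."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return JobResources(
        adlaus=parallelism,  # parallelism N = N ADLAUs
        cores=parallelism * CORES_PER_ADLAU,
        memory_gb=parallelism * MEMORY_GB_PER_ADLAU,
    )


print(resources_for_parallelism(10))
# prints JobResources(adlaus=10, cores=20, memory_gb=60)
```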

U-SQL Catalog

Database

Tables

Views

Procedures

DEMO

Monitoring

1. Azure Portal

2. Visual Studio

DEMO

Pricing

Links

http://www.sqlservercentral.com/stairway/142480/

https://azure.microsoft.com/en-us/solutions/data-lake/

Questions?

Thank you