USQL Trivadis Azure Data Lake Event

Data sources

Non-relational data

DESIGNED FOR THE QUESTIONS YOU KNOW!

The Data Lake Approach

Ingest all data regardless of requirements

Store all data in native format without schema definition

Do analysis: Hadoop, Spark, R, Azure Data Lake Analytics (ADLA)

Interactive queries

Batch queries

Machine Learning

Data warehouse

Real-time analytics

Devices

Microsoft’s Big Data Journey

We needed to better leverage data and analytics to do more experimentation

So, we built a Data Lake for Microsoft:

• A data lake for everyone to put their data

• Tools approachable by any developer

• Batch, Interactive, Streaming, ML

By the numbers

• Exabytes of data under management

• 100Ks of Physical Servers

• 100Ks of Batch Jobs, Millions of Interactive Queries

• Huge Streaming Pipelines

• 10K+ Developers running diverse workloads and scenarios

Figure: Data stored over time (2010, 2013, 2017). Labels shown: Windows, CRM/Dynamics, Xbox Live, Office365, Malware Protection, Microsoft Stores, Commerce Risk, Exchange, Yammer.

Culture Changes

Engineering: How is the system performing? What is the experience my customers are having? How does that correlate to other actions? Is my feature successful?

Marketing: What can we observe from our customers to increase revenues?

Management: How do I drive my business based on the data?

Field: Where are there new opportunities? How can I connect with my customers more deeply?

Support: How does this customer’s experience compare with others?

HDFS Compatible REST API

ADL Store

.NET, SQL, Python, R scaled out by U-SQL

ADL Analytics

Open Source Apache Hadoop ADL Client

Azure Databricks

HDInsight

• Performance at

• Optimized for analytics

• Multiple analytics engines

• Single repository sharing

ADL Store

Storage

• Architected and built for very high throughput at scale for Big Data workloads

• No limits to file size, account size or number of files

• Single-repository for sharing

• Cloud-scale distributed filesystem with file/folder ACLs and RBAC

• Encryption-at-rest by default with Azure Key Vault

• Authenticated access with Azure Active Directory integration

• Formal Certifications incl. ISO, SOC, PCI, HIPAA

ADL Store

Analytics

Storage

Cloudera CDH

Hortonworks HDP

Qubole QDS

• Open Source Apache® ADL client for commercial and custom Hadoop

• Cloud IaaS and Hybrid

Best of Databricks Best of Microsoft

Designed in collaboration with the founders of Apache Spark

One-click set up; streamlined workflows

Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)

Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)

AZURE DATABRICKS

A fast, easy, and collaborative Apache Spark based analytics platform

HDInsight

ADL Store

Hive

Analytics

Storage

• 63% lower TCO than on-premises*

• SLA-managed, monitored and supported by Microsoft

• Fully managed Hadoop, Spark

• Clusters deployed in minutes

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

ADL Store

.NET, SQL, Python, R scaled out by U-SQL

ADL Analytics

• Serverless. Pay per job. Starts in seconds. Scales instantly.

• Develop massively parallel programs with simplicity

• Federated query from multiple data sources

Scales out your custom code in .NET, Python, R over your Data Lake

Familiar syntax to millions of SQL & .NET developers

Unifies

• Declarative nature of SQL with the imperative power of your language of choice (e.g., C#, Python)

• Processing of structured, semi-structured and unstructured data

• Querying multiple Azure Data Sources (Federated Query)

U-SQL

A framework for Big Data

• SQL forms the declarative basis of the language:

• GROUP BY/Aggs

• Windowing Expressions

• PIVOT/UNPIVOT

• CROSS APPLY

• JOINs

• Etc.

• Uses .NET Types and C# Expression language

• Rich extensibility model that lets you scale out custom extension code written in .NET/C#, Python, R

• Operates on unstructured data (CSV, images, etc.)

• Operates on semi-structured data (XML, JSON, Avro)

• Operates on structured files (Parquet)

• Provides Metadata Catalog (DB, Schema):

• U-SQL Tables (for improved performance)

• U-SQL code objects (View, TVFs, Procs)

• Extension code objects (U-SQL Assemblies)

• Etc.

• Provides Federated Queries against “SQL in Azure”
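To make the "GROUP BY/Aggs" bullet concrete, here is a minimal local sketch in Python of what a per-region aggregate computes. The rows are hypothetical sample values, and the U-SQL statement in the comment is only an illustration of the kind of query the slide lists. It is not taken from the deck, and the Python code does not run on ADLA.

```python
from collections import defaultdict

# Hypothetical rows: (Region, Duration in seconds). Illustrative values only.
ROWS = [
    ("en-us", 73),
    ("en-gb", 614),
    ("en-us", 30),
    ("en-gb", 74),
    ("en-fr", 1220),
]


def total_duration_by_region(rows):
    """Sum Duration per Region, the result a declarative query such as
    SELECT Region, SUM(Duration) AS TotalDuration ... GROUP BY Region
    would describe (query shape assumed for illustration)."""
    totals = defaultdict(int)
    for region, duration in rows:
        totals[region] += duration
    # Sort by region so the result is deterministic.
    return dict(sorted(totals.items()))


TOTALS = total_duration_by_region(ROWS)
```

In the declarative form, the engine decides how to partition and parallelize the grouping. The loop above is the single-machine equivalent of the same result.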

Develop massively parallel programs with simplicity

A simple U-SQL script can scale from Gigabytes to Petabytes without learning complex big data programming techniques.

U-SQL automatically generates a scaled out and optimized execution plan to handle any amount of data.

Execution nodes are rapidly allocated to run the program.

Error handling, network issues, and runtime optimization are handled automatically.

// Read the tab-separated search log, applying the schema on read.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int,
            Urls string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();

// Write the rowset back out as a TSV file.
OUTPUT @searchlog
TO @"/Samples/Output/SearchLog_output.tsv"
USING Outputters.Tsv();
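For readers who want to reason about the script without an ADLA account, the following is a rough single-machine analogy in Python, assuming the same seven columns. It is a sketch of the schema-on-read idea only: the function names and sample rows are hypothetical, and nothing here reflects how the U-SQL runtime is implemented.

```python
import csv
import io
from datetime import datetime

# Column names and converters mirroring the EXTRACT schema on the slide.
SCHEMA = [
    ("UserId", int),
    ("Start", datetime.fromisoformat),
    ("Region", str),
    ("Query", str),
    ("Duration", int),
    ("Urls", str),
    ("ClickedUrls", str),
]


def extract_tsv(stream):
    """Yield one dict per TSV line, converting each field per SCHEMA."""
    for fields in csv.reader(stream, delimiter="\t"):
        yield {name: conv(value) for (name, conv), value in zip(SCHEMA, fields)}


def output_tsv(rows, stream):
    """Write rows back out as TSV in schema column order."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(
            [
                row[name].isoformat() if isinstance(row[name], datetime) else row[name]
                for name, _ in SCHEMA
            ]
        )


# Hypothetical sample data standing in for /Samples/Data/SearchLog.tsv.
SAMPLE = (
    "399266\t2012-02-15T11:53:16\ten-us\thow to make nachos\t73\turl1;url2\turl1\n"
    "382045\t2012-02-15T11:53:18\ten-gb\tbest ski resorts\t614\turl3;url4\turl4\n"
)

rows = list(extract_tsv(io.StringIO(SAMPLE)))
out = io.StringIO()
output_tsv(rows, out)
```

The point of the comparison is that the file itself carries no schema. Types are applied when the rows are read, which is what the `EXTRACT` clause declares.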

• Admin and Dev Tooling in:

• Azure Portal

• Visual Studio 2013 to 2017 (with local execution mode!)

• VS Code (cross platform)

• Azure Data Factory:

• Data movement

• Job submission and orchestration

• PowerShell and Cross-Platform CLI support

• SDKs for common languages:

• .NET

• Java

• Python

• Node.js

Automatic "in-lining": optimized out-of-the-box

Per job parallelization: visibility into execution

Heatmap to identify bottlenecks

https://github.com/Azure/usql/tree/master/Examples/ImageApp

https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-cognitive

Image tags shown: Parked, Outdoor, Racing

High-level Roadmap

• Worldwide Region Availability (currently US and EU)

• Interactive Access with T-SQL query

• Scale out your custom code in the language of choice (.NET, Java, Python, etc.)

• Process the data formats of your choice (incl. Parquet, ORC; larger string values)

• Continued ADF, AAS, ADC, SQL DW, EventHub, SSIS integration

• Administrative policies to control usage/cost for storage & compute

• Secure data sharing between common AAD and public read-only sharing, fine grained ACLing

• Intense focus on developer productivity for authoring, debugging, and optimization

• General customer feedback: http://aka.ms/adlfeedback

Resources

http://usql.io

http://blogs.msdn.microsoft.com/azuredatalake/

http://blogs.msdn.microsoft.com/mrys/

https://channel9.msdn.com/Search?term=U-SQL#ch9Search

http://aka.ms/usql_reference

https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-programmability-guide

https://docs.microsoft.com/en-us/azure/data-lake-analytics/

https://msdn.microsoft.com/en-us/magazine/mt614251

https://msdn.microsoft.com/magazine/mt790200

http://www.slideshare.net/MichaelRys

Getting Started with R in U-SQL

https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-u-sql-python-extensions

https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=AzureDataLake

http://stackoverflow.com/questions/tagged/u-sql

http://aka.ms/adlfeedback

Continue your education at Microsoft Virtual Academy online.
