Let’s Get Started… Stream Analytics, Azure Data Lake, Azure SQL Data Warehouse, Azure Data...

Let’s Get Started…

What is Cortana Analytics Suite

Cortana Analytics Suite is designed to deliver analytics as a service. It transforms data into intelligent action. Data can be collected from apps, sensors, and devices.

Cortana Analytics Suite is designed to take advantage of IoT.

Big Picture

A suite of products that allows you to predict outcomes, prescribe actions, and automate decisions

Cortana Analytics Suite

Cortana Analytics Suite comprises:

Cortana Personal Assistant,

Power BI,

Azure HDInsight,

Azure Machine Learning,

Azure Stream Analytics,

Azure Data Lake,

Azure SQL Data Warehouse,

Azure Data Factory,

Azure Data Catalog, and

Azure Event Hubs.

What … is a cloud-based data integration service. ADF ingests data from various data sources; prepares, validates, transforms, and analyzes the data on a job schedule; and then publishes ready-to-use data for consumption.

Why

• Cloud based managed service

• No hardware & software required

• Pay as you use

• HDInsight compatible

• Less administrative effort

How

1. Reads data from the source data store.

2. Performs serialization/deserialization, compression/decompression, column mapping, and type conversion. It does these operations based on the configurations of the input dataset, output dataset, and Copy Activity.

3. Writes data to the destination data store.
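
The read, transform, and write steps above can be sketched in plain Python. This is a toy illustration, not Data Factory code: the in-memory CSV source, the `column_mapping` and `type_conversions` dictionaries, and the `copy_rows` function are all hypothetical stand-ins for the input dataset, output dataset, and Copy Activity configuration.

```python
import csv
import io
import json


def copy_rows(source_text, column_mapping, type_conversions):
    """Toy copy: read CSV text, map columns, convert types, return JSON lines."""
    # Read and deserialize rows from the (hypothetical) source.
    rows = csv.DictReader(io.StringIO(source_text))
    out_lines = []
    for row in rows:
        # Apply column mapping and type conversion from the configuration.
        mapped = {}
        for src_col, dst_col in column_mapping.items():
            convert = type_conversions.get(dst_col, str)
            mapped[dst_col] = convert(row[src_col])
        # Serialize each row for the destination.
        out_lines.append(json.dumps(mapped, sort_keys=True))
    return out_lines


source = "id,temp\n1,21.5\n2,19.0\n"
result = copy_rows(
    source,
    column_mapping={"id": "device_id", "temp": "temperature"},
    type_conversions={"device_id": int, "temperature": float},
)
print("\n".join(result))
```

Keeping the mapping and conversions in configuration rather than in the loop mirrors the idea that the copy is driven by dataset and activity settings.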

Architecture

Components

1. Define Architecture: Set up objectives and flow

2. Create the Data Factory: Portal, PowerShell, VS

3. Create Linked Services: Connections to Data and Services

4. Create Datasets: Input and Output

5. Create Pipeline: Define Activities

6. Monitor and Manage: Portal or PowerShell, Alerts and Metrics

Linked services

Linked services define the information needed for Data Factory to connect to external resources. A linked service represents either:

a. data store

File system

On-premises SQL Server

Azure storage

Azure DocumentDB

Azure Data Lake Store

etc.

b. compute resource

HDInsight (own or on demand)

Azure Machine Learning Endpoint

Azure Batch

Azure SQL Database

Azure Data Lake Analytics
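
As a rough sketch of what a data store linked service might look like, the Python below builds a definition and prints it as JSON. The field names are modeled on the JSON shape Data Factory used around the time of this deck and should be checked against the current documentation; the name `StorageLinkedService` and the connection string placeholders are assumptions for illustration only.

```python
import json

# Hypothetical linked service for an Azure storage account.
# The account name and key are placeholders, not real values.
storage_linked_service = {
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": (
                "DefaultEndpointsProtocol=https;"
                "AccountName=<accountname>;AccountKey=<accountkey>"
            )
        },
    },
}

linked_service_json = json.dumps(storage_linked_service, indent=2)
print(linked_service_json)
```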

Data sets

Named references to data

Used for both input and output

Identifies structure

Files, tables, folders, documents

Internal or external
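
A data set definition might look roughly like the sketch below, again written as a Python dictionary and printed as JSON. The field names follow the JSON shape Data Factory used around the time of this deck and are assumptions to verify against the documentation; the data set name, folder path, column names, and the linked service name `StorageLinkedService` are hypothetical.

```python
import json

# Hypothetical input data set: hourly text files in a blob folder.
# "StorageLinkedService" is an assumed linked service name.
input_dataset = {
    "name": "SensorReadingsInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        # Identifies structure: column names and types.
        "structure": [
            {"name": "device_id", "type": "Int32"},
            {"name": "temperature", "type": "Double"},
        ],
        "typeProperties": {
            "folderPath": "readings/",
            "format": {"type": "TextFormat", "columnDelimiter": ","},
        },
        # External: the data is produced outside the data factory.
        "external": True,
        "availability": {"frequency": "Hour", "interval": 1},
    },
}

dataset_json = json.dumps(input_dataset, indent=2)
print(dataset_json)
```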

Activities

Define actions to perform on data

Zero or more input data sets

One or more output data sets

Unit of orchestration of a pipeline

Activities for

data movement

data transformation

data analysis

Use WindowStart and WindowEnd system variables to select relevant data using a tumbling window
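
The tumbling window idea can be illustrated in plain Python. This is a sketch of the concept only, not Data Factory syntax: `tumbling_windows` and `select_rows` are hypothetical helpers, and the `window_start` and `window_end` values play the role that WindowStart and WindowEnd play in an activity.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield consecutive, non-overlapping (window_start, window_end) pairs."""
    window_start = start
    while window_start < end:
        window_end = min(window_start + size, end)
        yield window_start, window_end
        window_start = window_end


def select_rows(rows, window_start, window_end):
    """Keep rows whose timestamp falls in [window_start, window_end)."""
    return [r for r in rows if window_start <= r["ts"] < window_end]


rows = [
    {"ts": datetime(2016, 1, 1, 0, 15), "value": 1},
    {"ts": datetime(2016, 1, 1, 0, 45), "value": 2},
    {"ts": datetime(2016, 1, 1, 1, 5), "value": 3},
]
windows = list(
    tumbling_windows(
        datetime(2016, 1, 1, 0), datetime(2016, 1, 1, 2), timedelta(hours=1)
    )
)
counts = [len(select_rows(rows, ws, we)) for ws, we in windows]
print(counts)
```

Because the windows are half-open and do not overlap, each row is selected by exactly one window.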

Pipelines

Logical grouping of activities

Provides a unit of work that performs a task

The active period can be set in the past to back fill data slices

Back filling can be performed in parallel
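
Back filling in parallel can be pictured with a small Python sketch. It assumes daily slices and uses a thread pool as a stand-in for the service's own scheduling; `daily_slices` and `process_slice` are hypothetical names, and the per-slice work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def daily_slices(active_start, active_end):
    """List the start of each daily slice in [active_start, active_end)."""
    slices = []
    current = active_start
    while current < active_end:
        slices.append(current)
        current += timedelta(days=1)
    return slices


def process_slice(slice_start):
    """Hypothetical per-slice work; here it only reports the slice date."""
    return slice_start.strftime("%Y-%m-%d") + ": Ready"


# An active period in the past yields slices that can be back filled.
slices = daily_slices(datetime(2016, 1, 1), datetime(2016, 1, 4))
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() preserves input order even though slices run concurrently.
    statuses = list(pool.map(process_slice, slices))
print(statuses)
```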

Data movement

Globally available service for data movement

Exactly one input and exactly one output

Support for securely moving data between on-premises and the cloud

Automatic type conversions from source to sink data types

File based copy supports binary, text and Avro formats, and allows for conversion between formats

Data Management Gateway supports multiple data sources but can be associated with only a single Azure Data Factory

Monitoring

Data slices may fail

Drill into errors, diagnose, fix, and rerun

Failed data slices can be rerun and all dependencies are managed by Azure Data Factory

Upstream slices that are Ready stay available

Downstream slices that are dependent stay Pending

Enable diagnostics to produce logs; diagnostics are disabled by default

Add Alerts for Failed or Successful Runs to receive email notification
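
The slice dependency behavior described above can be mimicked with a toy Python model. This is only an illustration of the idea, not how Azure Data Factory tracks state internally: the slice names, the `downstream_status` function, and the "Ready to run" label are hypothetical.

```python
def downstream_status(upstream_statuses):
    """A dependent slice can run only when every upstream slice is Ready."""
    if all(status == "Ready" for status in upstream_statuses.values()):
        return "Ready to run"
    return "Pending"


# Hypothetical upstream slices for one downstream slice.
upstream = {"copy_raw": "Ready", "clean_raw": "Failed"}
before = downstream_status(upstream)

# After the error is fixed, only the failed slice is rerun;
# the slice that was already Ready is left as it is.
upstream["clean_raw"] = "Ready"
after = downstream_status(upstream)
print(before, "->", after)
```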

Q & A

Thanks