Azure Data Factory presentation with links
Uploaded by: chris-testa-oneill
Category: Data & Analytics
![Page 1: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/1.jpg)
Cortana Intelligence Suite Workshop Class Notebook
Classified as Microsoft General
Key Concepts
This session is brought to you by Microsoft’s Analytics and Data Science Team.
![Page 2: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/2.jpg)
Agenda
1. Understand how Azure Data Factory (ADF) fits into the Cortana Intelligence Suite
2. Understand the ADF logical flow
3. Create an ADF instance
4. An example of the ADF process
5. Understand and create the ADF components
At the end of this module, you will:
1. Understand how Azure Data Factory (ADF) fits into the Cortana Intelligence Suite
2. Understand the ADF logical flow
3. Create an ADF instance
![Page 3: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/3.jpg)
4. See an example of the ADF process
5. Understand and create the ADF components
![Page 4: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/4.jpg)
This section of the course will cover:
• Cortana Intelligence in a sentence
• The team data science process
• The Cortana Intelligence platform
• Summarizing Cortana Intelligence
![Page 5: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/5.jpg)
Cortana Intelligence is a Platform and a Process to perform advanced analytics from start to finish
1. What you can do with CIS: https://www.microsoft.com/en-us/server-cloud/cortana-intelligence-suite/why-cortana-intelligence.aspx
2. More about the process: https://channel9.msdn.com/Blogs/Seth-Juarez/Understanding-Data-Science-for-building-Predictive-Analytics-Solutions-by-Francesca-Lazzeri
![Page 6: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/6.jpg)
The technologies available in Cortana Intelligence can be categorized into the following areas:
• Information management
• Big data stores
• Machine learning and analytics
• Intelligence
• Dashboards and visualization
Azure SQL Data Warehouse is categorized as a big data store. It differs from Data Lake in that it provides a relational big data store for structured data, but it can also interact with unstructured data.
![Page 7: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/7.jpg)
![Page 8: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/8.jpg)
Azure Data Factory
Creates, orchestrates, & automates the movement, transformation and/or analysis of data through the cloud
1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
2. Developer Reference: https://msdn.microsoft.com/en-us/library/azure/dn834987.aspx
![Page 9: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/9.jpg)
Azure Data Factory Logical Flow
1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
2. Quick Example: http://azure.microsoft.com/blog/2015/04/24/azure-data-factory-update-simplified-sample-deployment/
![Page 10: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/10.jpg)
![Page 11: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/11.jpg)
Create the Data Factory
• Azure Portal
• PowerShell
• Visual Studio
• ARM Templates
1. Setting Up: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
![Page 12: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/12.jpg)
Using the Portal
• Use in Non-MS Clients
• Use for Exploration
• Use when in demo/POC
1. Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
2. Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/
![Page 13: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/13.jpg)
Using PowerShell
• Use in MS Clients
• Use for Automation
• Use for quick set up and tear down
1. Learning Path: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/
2. Full Tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
![Page 14: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/14.jpg)
Using Visual Studio
• Use in mature dev environments
• Use when integrated into larger development process
1. Overview: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
2. Using the Portal: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline-using-editor/
![Page 15: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/15.jpg)
Azure Resource Manager Templates
• Use across multiple environments: Dev, Test, UAT and Production
• Works well where there are similar patterns
• ARM templates can be parameterized
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-how-to-use-resource-manager-templates
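As a rough illustration of the parameterization point above, the following Python sketch generates one ARM parameter file per environment. The parameter names (`dataFactoryName`, `storageAccountName`), their values, and the output file names are hypothetical and not taken from the workshop material; only the general shape of an ARM deployment parameter file is assumed.

```python
import json

# Schema URL used by ARM deployment parameter files.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/"
    "deploymentParameters.json#"
)

# Hypothetical per-environment values; every name here is illustrative only.
ENVIRONMENTS = {
    "dev": {"dataFactoryName": "adf-workshop-dev", "storageAccountName": "adfworkshopdev"},
    "test": {"dataFactoryName": "adf-workshop-test", "storageAccountName": "adfworkshoptest"},
    "uat": {"dataFactoryName": "adf-workshop-uat", "storageAccountName": "adfworkshopuat"},
    "prod": {"dataFactoryName": "adf-workshop-prod", "storageAccountName": "adfworkshopprod"},
}


def build_parameter_file(values):
    """Wrap plain name/value pairs in the ARM parameter file structure."""
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in values.items()},
    }


if __name__ == "__main__":
    # Print one parameter document per environment; the same template
    # would then be deployed with a different parameter file each time.
    for env, values in ENVIRONMENTS.items():
        print(f"# azuredeploy.parameters.{env}.json (hypothetical file name)")
        print(json.dumps(build_parameter_file(values), indent=2))
```

The template itself stays identical across Dev, Test, UAT and Production; only the parameter file changes.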
![Page 16: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/16.jpg)
Create an ADF Instance
1. Open the ADF Student Workbook file from your \Resources folder
2. Follow the steps for Lab 1 to set up the lab environment
3. Then follow the steps for Lab 2 to set up Azure Data Factory
4. Note: There’s a useful JSON prettifier here: http://www.jsoneditoronline.org/
![Page 17: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/17.jpg)
![Page 18: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/18.jpg)
ADF Process
1. Define Architecture: Set up objectives and flow
2. Create the Data Factory: Portal, PowerShell, VS
3. Create Linked Services: Connections to Data and Services
4. Create Datasets: Input and Output
5. Create Pipeline: Define Activities
6. Monitor and Manage: Portal or PowerShell, Alerts and Metrics
1. Full Tutorial: https://azure.microsoft.com/en-us/documentation/articles/data-factory-build-your-first-pipeline/
![Page 19: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/19.jpg)
Example - Churn
[Diagram: Azure Data Factory churn example with three stages: Ingest, Transform & Analyze, Publish. Labels: Data Sources (Call Log Files, Customer Table), Customer Call Details, Customers Likely to Churn, Customer Churn Table.]
1. Video of this process: https://azure.microsoft.com/en-us/documentation/videos/azure-data-factory-102-analyzing-complex-churn-models-with-azure-data-factory/
![Page 20: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/20.jpg)
![Page 21: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/21.jpg)
Azure Data Factory Components
1. ADF Components: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-introduction#relationship-between-data-factory-entities
![Page 22: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/22.jpg)
Linked Services
Compute resource
| Data transformation activity | Compute environment |
| --- | --- |
| Hive | HDInsight [Hadoop] |
| Pig | HDInsight [Hadoop] |
| MapReduce | HDInsight [Hadoop] |
| Hadoop Streaming | HDInsight [Hadoop] |
| Machine Learning activities: Batch Execution and Update Resource | Azure VM |
| Stored Procedure | Azure SQL, Azure SQL DW, or SQL Server |
| Data Lake Analytics U-SQL | Azure Data Lake Analytics |
| DotNet | HDInsight [Hadoop] or Azure Batch |
Data Sources
| Category | Data store | Supported as a source | Supported as a sink |
| --- | --- | --- | --- |
| Azure | Azure Blob storage | ✓ | ✓ |
| | Azure Data Lake Store | ✓ | ✓ |
| | Azure DocumentDB | ✓ | ✓ |
| | Azure SQL Database | ✓ | ✓ |
| | Azure SQL Data Warehouse | ✓ | ✓ |
| | Azure Search Index | | ✓ |
| | Azure Table storage | ✓ | ✓ |
| Databases | Amazon Redshift | ✓ | |
| | DB2 | ✓ | |
| | MySQL | ✓ | |
| | Oracle | ✓ | ✓ |
| | PostgreSQL | ✓ | |
| | SAP Business Warehouse | ✓ | |
| | SAP HANA | ✓ | |
| | SQL Server | ✓ | ✓ |
| | Sybase | ✓ | |
| | Teradata | ✓ | |
Other data sources are supported; see the link in the notes for full details.
AZURE SQL DATABASE EXAMPLE
{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:ctosqldb.database.windows.net,1433;Database=EquityDB;User ID=ctesta-oneill;Password=P@ssw0rd;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
        }
    }
}
AZURE BLOB STORE EXAMPLE
{
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=ctostorageaccount;AccountKey=087ubp097guh8*JON*&B*(97g9879"
        }
    }
}
1. Linked Services: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction#linked-services
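The two linked service definitions on this slide embed credentials directly in the JSON. As a minimal sketch of one alternative, the Python below builds the same two JSON shapes while reading the secrets from environment variables. The helper names, the environment variable names (`ADF_SQL_PASSWORD`, `ADF_STORAGE_KEY`) and the placeholder server, account and user values are assumptions made for illustration; they are not part of the workshop labs.

```python
import json
import os


def azure_sql_linked_service(name, server, database, user, password):
    """Build an AzureSqlDatabase linked service in the shape shown on the slide."""
    connection_string = (
        f"Server=tcp:{server},1433;Database={database};User ID={user};"
        f"Password={password};Trusted_Connection=False;Encrypt=True;"
        "Connection Timeout=30"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {"connectionString": connection_string},
        },
    }


def azure_storage_linked_service(name, account_name, account_key):
    """Build an AzureStorage linked service in the shape shown on the slide."""
    connection_string = (
        f"DefaultEndpointsProtocol=https;AccountName={account_name};"
        f"AccountKey={account_key}"
    )
    return {
        "name": name,
        "properties": {
            "type": "AzureStorage",
            "typeProperties": {"connectionString": connection_string},
        },
    }


if __name__ == "__main__":
    # Hypothetical environment variable names; placeholders are used when unset.
    sql_password = os.environ.get("ADF_SQL_PASSWORD", "<password>")
    storage_key = os.environ.get("ADF_STORAGE_KEY", "<account key>")

    sql = azure_sql_linked_service(
        "AzureSqlLinkedService", "<server>.database.windows.net",
        "EquityDB", "<user>", sql_password,
    )
    blob = azure_storage_linked_service(
        "StorageLinkedService", "<storage account>", storage_key,
    )
    print(json.dumps(sql, indent=4))
    print(json.dumps(blob, indent=4))
```

The generated JSON can be pasted into the editor or passed to whichever deployment method is in use, without the secret ever being saved in a shared file.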
![Page 23: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/23.jpg)
Datasets
{
    "name": "<name of dataset>",
    "properties": {
        "type": "<type of dataset: AzureBlob, AzureSql etc...>",
        "external": <boolean flag to indicate external data. only for input datasets>,
        "linkedServiceName": "<Name of the linked service that refers to a data store.>",
        "structure": [
            {
                "name": "<Name of the column>",
                "type": "<Name of the type>"
            }
        ],
        "typeProperties": {
            "<type specific property>": "<value>",
            "<type specific property 2>": "<value 2>"
        },
        "availability": {
            "frequency": "<Specifies the time unit for data slice production.>",
            "interval": "<Specifies the interval within the defined frequency.>"
        },
        "policy": {
        }
    }
}
Callouts on the slide: Dataset name, Properties, Type, External, LinkedServiceName, Structure (Name, Type), Availability, Policy
Linked services shown on the slide: AzureSqlLinkedService, StorageLinkedService
1. Datasets: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
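To make the dataset template more concrete, here is a small Python sketch that checks a dataset definition for the sections the template lists. Which properties are treated as required (`type`, `linkedServiceName`, `typeProperties`, `availability`) is an assumption made for this sketch, and the function name is hypothetical; the linked dataset documentation remains the authority.

```python
# Properties this sketch treats as required (an assumption, see lead-in).
REQUIRED_PROPERTIES = ("type", "linkedServiceName", "typeProperties", "availability")


def dataset_problems(dataset):
    """Return a list of human-readable problems found in a dataset definition."""
    problems = []
    if not dataset.get("name"):
        problems.append("missing name")

    properties = dataset.get("properties", {})
    for key in REQUIRED_PROPERTIES:
        if key not in properties:
            problems.append(f"missing properties.{key}")

    # Only inspect availability details when the section is present.
    availability = properties.get("availability")
    if availability is not None:
        for key in ("frequency", "interval"):
            if key not in availability:
                problems.append(f"missing properties.availability.{key}")

    # "external" is a boolean flag in the template.
    if "external" in properties and not isinstance(properties["external"], bool):
        problems.append("properties.external must be true or false")
    return problems


if __name__ == "__main__":
    example = {
        "name": "AzureBlobInput",
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": "StorageLinkedService",
            "typeProperties": {"folderPath": "datacontainer/inputdata"},
            "availability": {"frequency": "Month", "interval": 1},
            "external": True,
        },
    }
    print(dataset_problems(example))
```

A check like this is only a convenience before deployment; the service performs its own validation.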
![Page 24: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/24.jpg)
Time Slicing Data
Availability
"availability": {
    "frequency": "<Specifies the time unit for data slice production.>",
    "interval": "<Specifies the interval within the defined frequency.>"
},
Offset
"availability": {
    "frequency": "Day",
    "interval": 1,
    "offset": "06:00:00"
}
anchorDateTime
"availability": {
    "frequency": "Hour",
    "interval": 23,
    "anchorDateTime": "2007-04-19T08:00:00"
}
Style
"availability": {
    "frequency": "Day",
    "interval": 1,
    "offset": "06:00:00",
    "style": "EndOfInterval"
}
{
    "name": "AzureBlobOutput",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "folderPath": "datacontainer/partitioneddata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}
{
    "name": "AzureBlobInput",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "fileName": "input.log",
            "folderPath": "datacontainer/inputdata",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Month",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}
1. Time Slicing: https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets
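The effect of `frequency`, `interval` and `offset` on slice boundaries can be sketched in a few lines of Python. This is a simplified model, not the service's scheduling logic: it assumes slices are aligned to midnight of the start day plus the offset, that the offset is shorter than one slice, and it handles only the Minute, Hour and Day units. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Time units handled by this sketch; calendar units such as Month are left out.
UNITS = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
}


def slice_windows(start, end, frequency, interval, offset=timedelta(0)):
    """Return (slice_start, slice_end) pairs that fit entirely inside [start, end]."""
    step = UNITS[frequency] * interval
    midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)

    # First slice boundary: midnight plus the offset, moved forward to start.
    boundary = midnight + offset
    while boundary < start:
        boundary += step

    windows = []
    while boundary + step <= end:
        windows.append((boundary, boundary + step))
        boundary += step
    return windows


if __name__ == "__main__":
    # Daily slices shifted to begin at 06:00, as in the Offset example above.
    for begin, finish in slice_windows(
        datetime(2016, 4, 1), datetime(2016, 4, 4), "Day", 1, timedelta(hours=6)
    ):
        print(begin.isoformat(), "->", finish.isoformat())
```

With a three-day active period, the 06:00 offset yields two complete daily slices in this model, while the same period without an offset yields three.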
![Page 25: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/25.jpg)
Linked Services and Datasets
1. Open the ADF Student Workbook file from your \Resources folder
2. Follow the steps for Lab 1 to set up the lab environment
3. Then follow the steps for Lab 2 to set up Azure Data Factory
4. Note: There’s a useful JSON prettifier here: http://www.jsoneditoronline.org/
![Page 26: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/26.jpg)
Activities
Data transformation activities
| Data transformation activity | Compute environment |
| --- | --- |
| Hive | HDInsight [Hadoop] |
| Pig | HDInsight [Hadoop] |
| MapReduce | HDInsight [Hadoop] |
| Hadoop Streaming | HDInsight [Hadoop] |
| Machine Learning activities: Batch Execution and Update Resource | Azure VM |
| Stored Procedure | Azure SQL, Azure SQL DW, or SQL Server |
| Data Lake Analytics U-SQL | Azure Data Lake Analytics |
| DotNet | HDInsight [Hadoop] or Azure Batch |
Data movement activities
{
    "name": "MyFirstPipeline",
    "properties": {
        "description": "My first Azure Data Factory pipeline",
        "activities": [
            {
                "type": "HDInsightHive",
                "typeProperties": {
                    "scriptPath": "adfgetstarted/script/partitionweblogs.hql",
                    "scriptLinkedService": "StorageLinkedService",
                    "defines": {
                        "inputtable": "wasb://[email protected]/inputdata",
                        "partitionedtable": "wasb://[email protected]/partitioneddata"
                    }
                },
                "inputs": [
                    {
                        "name": "AzureBlobInput"
                    }
                ],
                "outputs": [
                    {
                        "name": "AzureBlobOutput"
                    }
                ],
                "policy": {
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Month",
                    "interval": 1
                },
                "name": "RunSampleHiveActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService"
            }
        ],
        "start": "2016-04-01T00:00:00Z",
        "end": "2016-04-02T00:00:00Z",
        "isPaused": false,
        "hubName": "ctogetstarteddf_hub",
        "pipelineMode": "Scheduled"
    }
}
1. What is an activity: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-create-pipelines#what-is-an-activity
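Each activity in the pipeline above names its input and output datasets, so a quick consistency check is possible before deployment. The Python sketch below, with a hypothetical function name and an abbreviated copy of the slide's pipeline, lists any dataset references that do not match a known dataset name. It is an illustration only and does not call any Azure API.

```python
def unknown_dataset_references(pipeline, known_datasets):
    """Return (activity, direction, dataset) triples for unresolved references."""
    unknown = []
    for activity in pipeline["properties"]["activities"]:
        for direction in ("inputs", "outputs"):
            for reference in activity.get(direction, []):
                if reference["name"] not in known_datasets:
                    unknown.append((activity["name"], direction, reference["name"]))
    return unknown


# Abbreviated copy of the pipeline shown on the slide.
PIPELINE = {
    "name": "MyFirstPipeline",
    "properties": {
        "activities": [
            {
                "type": "HDInsightHive",
                "inputs": [{"name": "AzureBlobInput"}],
                "outputs": [{"name": "AzureBlobOutput"}],
                "name": "RunSampleHiveActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService",
            }
        ],
    },
}

if __name__ == "__main__":
    # Both datasets are defined, so nothing is reported.
    print(unknown_dataset_references(PIPELINE, {"AzureBlobInput", "AzureBlobOutput"}))
    # With the output dataset missing, the dangling reference is reported.
    print(unknown_dataset_references(PIPELINE, {"AzureBlobInput"}))
```

The same idea extends to checking `linkedServiceName` values against the linked services that have been created.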
![Page 27: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/27.jpg)
Pipelines
A pipeline is a grouping of logically related activities.
A pipeline can be scheduled so that the activities within it are executed.
A pipeline can be managed and monitored.
1. Pipelines: https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-create-pipelines
![Page 28: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/28.jpg)
Activities and Pipelines
![Page 29: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/29.jpg)
![Page 30: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/30.jpg)
Summary
• ADF orchestrates other technologies to move, transform or analyze data
• A broad range of options is available to create an ADF instance
• Linked Services can point to data sources or compute resources
• Datasets can be structured or unstructured
• Activities can transform and analyze datasets
• Pipelines are used to group, schedule and monitor activities
In this session, you have learned:
• Scale-out distributed query engine
• De-coupled storage from compute
• Fully managed
• Completely elastic
• Platform as a Service (PaaS)
• Petabyte scale
• Leveraging cloud ecosystem
• Broad range of connectivity options
![Page 31: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/31.jpg)
Click on the graphics to explore more learning options from your Advanced Analytics and Data Science team, including:
• Online training
• Videos
• Instructor Led training
• Blogs
• Cortana Intelligence Gallery
![Page 32: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/32.jpg)
![Page 33: Azure Data Factory presentation with links](https://reader034.fdocuments.us/reader034/viewer/2022051706/58f9a8f6760da3da068b6989/html5/thumbnails/33.jpg)
Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
For more information, see Microsoft Copyright Permissions at http://www.microsoft.com/permission
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. The Microsoft company name and Microsoft products mentioned herein may be either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
This document reflects current views and assumptions as of the date of development and is subject to change. Actual and future results and trends may differ materially from any forward-looking statements. Microsoft assumes no responsibility for errors or omissions in the materials.
THIS DOCUMENT IS FOR INFORMATIONAL AND TRAINING PURPOSES ONLY AND IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.