Azure Offerings for€¦ · · 2016-10-192016-10-19 · • Connect to on-premises and cloud data...
Transcript of Azure Offerings for€¦ · · 2016-10-192016-10-19 · • Connect to on-premises and cloud data...
Agenda
1. Integrated Big data Platform - Cortana Intelligent Suite
2. Scalable Machine Learning - R & Spark on HDInsight
3. GPUs in Azure for Compute & Visualization
Stay ahead of the curve with Cortana Intelligence Suite
Business apps
Custom apps
Sensors and devices
People
Automated systems
Data Intelligence
Cortana Intelligence
Action
Apps
Transform data into intelligent action
Intelligence
Dashboards &
Visualizations
Information
Management
Big Data Stores Machine Learning
and Analytics
CortanaEvent HubsHDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Intelligence Action
People
Automated Systems
Apps
Web
Mobile
Bots
Bot
FrameworkSQL Data
WarehouseData Catalog
Data Lake
Analytics
Data Factory Machine
LearningData Lake Store
Cognitive
Services
Power BI
Data
Sources
Apps
Sensors
and
devices
Data
Information Management
Data
Sources
Apps
Sensors
and devices
Data
Information
Management
Event Hubs
Data Catalog
Data Factory
Compose and orchestrate data services at scale
INGEST
SQL
<>
SQL
DATA SOURCES
{ }
SQL
• Create, schedule, orchestrate, and manage data pipelines
• Visualize data lineage
• Connect to on-premises and cloud data sources
• Monitor data pipeline health
• Automate cloud resource management
• Move relational data for Hadoop processing
• Transform with Hive, Pig, or custom code
Information
Management
Event Hubs
Data Catalog
Data Factory
Get more value from your enterprise data assets
Information
Management
Event Hubs
Data Catalog
Data Factory
• Spend less time looking for data, and more time getting value from it
• Register enterprise data sources, discover data assets and unlock their potential, and capture tribal knowledge to make data understandable
• Bridge the gap between IT and the business, allowing everyone to contribute their insights, tags, and descriptions
• Intuitive search and filtering to understand the data sources and their purpose
• Let your data live where you want; connect using tools you choose
• Integrate into existing tools and processes with open REST APIs
Ingest events from websites, apps and devices at cloud scale
• Log millions of events per second in near real time
• Connect devices using flexible authorization and throttling
• Use time-based event buffering
• Get a managed service with elastic scale
• Get a managed service with elastic scale
• Reach a broad set of platforms using native client libraries
• Pluggable adapters for other cloud services
Azure
API
Management
Backend Services
Data
Information
Management
Event Hubs
Data Catalog
Data Factory
Data sources
Apps
Sensors and devices
Event Hubs
SQL Database Machine Learning
HDInsightStorage
Power BIStream Analytics
Big Data Stores
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and devices
Data
Information
Management
Event Hubs
Data Catalog
Data Factory
A hyper-scale repository for big data analytics workloads
• A Hadoop Distributed File System for the cloud
• No fixed limits on file size
• No fixed limits on account size
• Unstructured and structured data in their native format
• Massive throughput to increase analytic performance
• High durability, availability, and reliability
• Azure Active Directory access control
LOB
Applications
SocialDevices
Clickstream
Sensors
Video
Web
Relational
HDInsight
ADL Analytics
Machine Learning
Spark
R
ADL Store
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Elastic data warehouse as a service with enterprise-class features
• Petabyte scale with massively parallel processing
• Independent scaling of compute and storage—in seconds
• Transact-SQL queries across relational and non-relational data
• Full enterprise-class SQL Server experience
• Works seamlessly with Power BI, Machine Learning, HDInsight, and Data Factory
Power BI
App ServiceSQL Database
SQL Data Warehouse
Machine Learning
Hadoop
Intelligent App
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Machine Learning and Analytics
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and devices
Data Intelligence
Information
Management
Event Hubs
Data Catalog
Data Factory
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Lake
Analytics
Machine
Learning
Easily build, deploy, and share predictive analytics solutions
• Simple, scalable, cutting edge. A fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.
• Deploy in minutes. Azure Machine Learning means business. You can deploy your model into production as a web service that can be called from any device, anywhere and that can use any data source.
• Publish, share, monetize. Share your solution with the world in the Gallery or on the Azure Marketplace.
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Lake
Analytics
Machine
Learning
Big data analytics made easy
• Analyze data of any kind and size
• Develop faster, debug and optimize smarter
• Interactively explore patterns in your data
• No learning curve—use U-SQL, Spark, Hive, HBase and Storm
• Managed and supported with an enterprise-grade SLA
• Dynamically scales to match your business priorities
• Enterprise-grade security with Azure Active Directory
• Built on YARN, designed for the cloud
Data Lake Analytics
SQL DW SQL DB Storage BlobsData Lake Store SQL DB in a VM
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Lake
Analytics
Machine
Learning
Comprehensive set of managed Apache big data projects
• Scale to petabytes on demand
• Process unstructured and semi-structured data
• Develop in Java, .NET, and more
• Skip buying and maintaining hardware
• Deploy in Windows or Linux
• Spin up an Apache Hadoop cluster in minutes
• Visualize your Hadoop data in Excel
• Easily integrate on-premises Hadoop clusters
Core Engine
Batch
Map Reduce
Script
Pig
SQL
Hive
NoSQL
HBase
Streaming
Storm
In-Memory
Spark
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Lake
Analytics
Machine
Learning
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Lake
Analytics
Machine
Learning
Real-time stream processing in the cloud
• Perform real-time analytics for your Internet of Things solutions
• Stream millions of events per second
• Get mission-critical reliability and performance with predictable results
• Create real-time dashboards and alerts over data from devices and applications
• Correlate across multiple streams of data
• Use familiar SQL-based language for rapid development
Event Hubs
Blob Storage
Stream
Analytics
SQL Database
Event Hubs
Power BI
Blob Storage
Table Storage
Intelligence
Intelligence
Cortana
Bot
Framework
Cognitive
Services
Big Data Stores
SQL Data
Warehouse
Data Lake Store
Data
Sources
Apps
Sensors
and devices
Data
Information
Management
Event Hubs
Data Catalog
Data Factory
Machine Learning
and Analytics
HDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Lake
Analytics
Machine
Learning
Agenda
1. Integrated Big data Platform - Cortana Intelligent Suite
2. Scalable Machine Learning - R & Spark on HDInsight
3. GPUs in Azure for Compute & Visualization
Cortana Intelligent Suite
Intelligence
Dashboards &
Visualizations
Information
Management
Big Data Stores Machine Learning
and Analytics
CortanaEvent HubsHDInsight
(Hadoop and
Spark)
Stream
Analytics
Data Intelligence Action
People
Automated Systems
Apps
Web
Mobile
Bots
Bot
FrameworkSQL Data
WarehouseData Catalog
Data Lake
Analytics
Data Factory Machine
LearningData Lake Store
Cognitive
Services
Power BI
Data
Sources
Apps
Sensors
and
devices
Data
Infinite world of scalable machine learning
logistic regression, linear models,
basic statistics, hypothesis testing,
k-means, decision trees
page rank, collaborative filtering,
graph processing, SVD, PCA,
Bayesian models, …
deep learning over
various types of networks
Use cases of scalable machine learning
product recommendations
intelligent search
routing
robotics
ad placement
predictive maintenance
image, video recognition
sentiment analysis
text comprehension
natural language processing
robotics
bots
augmented reality
predictive maintenance
Retail Financial services Healthcare Manufacturing
loyalty programs
customer acquisition
pricing strategy
supply chain mgnt
customer churn
fraud detection
risk & compliance
cross-sell & upsell
personalization
bill collection
operational efficiency
patient demographics
pay for performance
demand forecasting
pricing strategy
supply chain
optimization
predictive maintenance
remote monitoring
What is R
What is
• The most popular statistical programming language
• A data visualization tool
• Open source
• 2.5+M users
• Taught in most universities
• Thriving user groups worldwide
• 8000+ contributed packages
• New and recent grad’s use it
Language
Platform
Community
Ecosystem• Rich application & platform integration
R Server and Spark resource sharing
YARN
Livy server
Thrift server
Jupyter notebooks
Default Queue
Thrift Queue
IntelliJ IDEA
BI Tools
Head node
Edge node
R server
DeployR
R Tools for VS
R Studio
Parallelized and Distributed Algorithms
Data import – Delimited, Fixed, SAS, SPSS,
OBDC
Variable creation & transformation
Recode variables
Factor variables
Missing value handling
Sort, Merge, Split
Aggregate by category (means, sums)
Chi Square Test
Kendall Rank Correlation
Fisher’s Exact Test
Student’s t-Test
ETL Statistical Tests
Min / Max, Mean, Median (approx.)
Quantiles (approx.)
Standard Deviation
Variance
Correlation
Covariance
Sum of Squares (cross product matrix for set
variables)
Pairwise Cross tabs
Risk Ratio & Odds Ratio
Cross-Tabulation of Data (standard tables & long
form)
Marginal Summaries of Cross Tabulations
Descriptive Statistics
Sum of Squares (cross product matrix for set
variables)
Multiple Linear Regression
Generalized Linear Models (GLM) exponential
family distributions: binomial, Gaussian, inverse
Gaussian, Poisson, Tweedie. Standard link
functions: cauchit, identity, log, logit, probit. User
defined distributions & link functions.
Covariance & Correlation Matrices
Logistic Regression
Predictions/scoring for models
Residuals for all models
Predictive Statistics K-Means
Clustering
Decision Trees
Decision Forests
Gradient Boosted Decision Trees
Naïve Bayes
Machine Learning
Simulation Simulation (e.g. Monte Carlo)
Parallel Random Number Generation
Custom Parallelization rxDataStep
rxExec
PEMA-R API
Variable Selection Stepwise Regression
0%
10%
20%
30%
40%
50%
60%
R SAS Python SQL Java
KNuggets poll (2014)
the poll
R and Python are two dominant languages
Scalable Machine Learning offerings in HDInsight
R language Python language Scala/Java
Server
Server
+ R Ecosystem + Python Ecosystem + Spark Ecosystem
+ Spark Ecosystem
Agenda
1. Integrated Big data Platform - Cortana Intelligent Suite
2. Scalable Machine Learning - R & Spark on HDInsight
3. GPUs in Azure for Compute & Visualization
GPU Virtualization Vision
• Deliver accelerated graphics & compute capabilities in Azure infrastructure
• High end performance
• Not “Swiss-army knife” offering
• Helps achieve true “HPC in the Cloud”
• Close partnership with NVIDIA
Media
• Stream high fidelity video games
• Encoding and transcoding
• Image processing
• Social media sentiment analysis
Rendering
• Visual Effects (VFX)
• Ray-Tracing rendering
• Advertising & Marketing
• CAD Applications in Architecture
• Simulations
GPU Virtualization Technology
• DDA (Discrete Device Assignment)
Entire device is mapped into the VM just as it would be running on bare metal
Allows for full access to capabilities of that device as well as allowing the device’s native driver to be used
• Introduced in Windows Server 2016 as part of Hyper-V
• Pass-through PCIe devices directly to a Guest VM
• Allows for close to bare-metal performance
Tesla K80 – “It’s fast…”
0x
5x
10x
15x
K80 CPU
Quantum Chemistry
Molecular Dynamics PhysicsBenchmarks
Rate of Improvement
72%
74%
84%
88%
93%
96%
65%
70%
75%
80%
85%
90%
95%
100%
2010 2011 2012 2013 2014 2015
GPU
65%
70%
75%
80%
85%
90%
95%
100%
11-2013 6-2014 12-2014 7-2015 1-2016
Acc
ura
cy
39%
45%
55%
62%
66%
72%75%
79%83%
86%
87.5%
30%
40%
50%
60%
70%
80%
90%
100%
Top Score
Collaboration with CNTK
• First class citizen
• Scalability – multi-GPU-multi-VM
• Performance
• Internal use-cases across various Microsoft properties and products
• (DSVM) Data Science VM by Azure Machine Learning
• N-Series and CNTK works really well together
CNTK Performance on DDA
2670
10560
18755
27575
35750
0
5000
10000
15000
20000
25000
30000
35000
40000
CPU 1 GPU 2 GPUs 3 GPUs 4 GPUs
Sam
ple
s p
er S
eco
nd
Resource
Avg. Samples/Sec Linear (Avg. Samples/Sec)
NV = Tesla M60 and supports OpenGL & DirectX
NC = Tesla K80 and supports CUDA & OpenCL
Sign up @ http://gpu.azure.com
Azure Batch “recipes” @ aka.as/tryazurehpc
Teradici trial @ http://teradici.com/4azuregpus
NVIDIA GRID @ http://nvidia.com/grid