Big Data & Data Warehousing



Search for big data & data warehouse on msdn.microsoft.com

© 2016 Microsoft Corporation. All rights reserved. Created by the Azure Poster Team.

* The above graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

R logo by Hadley Wickham and others at RStudio - https://www.r-project.org/logo/, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=35599651

Data, data, everywhere. Data is expanding ten-fold every five years with over 85% of the increase coming from new sources outside the traditional relational data warehouse. Data sources include mobile, social, videos, sensors, devices, RFID, web logs, advanced analytics, and click streams. Microsoft’s big data & data warehousing offerings are able to process and query data wherever it might live.
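As a rough worked example of the growth claim above: if data volume grows ten-fold every five years, and the growth is assumed to compound evenly from year to year (an assumption; the poster gives only the five-year figure), the implied yearly multiplier is 10^(1/5). The function name below is illustrative only.

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Return the constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


# Assumption: even, compounding growth across the five years.
factor = annual_growth_factor(10.0, 5.0)
print(f"annual factor: {factor:.3f}")               # about 1.585
print(f"annual growth: {(factor - 1) * 100:.1f}%")  # about 58.5%
```

Under that assumption, ten-fold growth over five years corresponds to data volume increasing by a little under 60% each year.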

Store any data of any size and speed
• Relational data and non-relational data
• Real-time data
• Data of any size from terabytes to petabytes
• Dynamically scales to match your business priorities

Process and query any data, anywhere
• Distributed query federates and joins heterogeneous data sources from on-premises and cloud, structured and unstructured
• Create managed data pipelines to orchestrate data transformation

On-premises and in the cloud
• On-premises options with enterprise software, reference architectures and appliances
• Cloud with virtual machines (IaaS) and managed services (PaaS)
• Hybrid deployments across on-premises and cloud

Leader

Gartner has named Microsoft a leader in vision and ability to execute in its 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics.

[Figure: Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics *, as of February 2016. Axes: completeness of vision, ability to execute. Quadrants: leaders, challengers, visionaries, niche players. Vendors shown: Amazon Web Services, Infobright, MarkLogic, 1010data, HPE, Oracle, Teradata, SAP, Microsoft, Exasol, MongoDB, Kognitio, Hitachi, MemSQL, Pivotal, Hortonworks, Actian, Transwarp, MapR Technologies, Cloudera, IBM.]

Choice of tools and strong ecosystem
• Managed and supported open source tools and Microsoft services for all functions
• Designed for seamless interaction between tools and services
• Strong partner ecosystem to integrate and extend the solution

[Diagram: end-to-end data flow. Stage labels: data sources (structured, semi-structured, unstructured); ingestion (Azure Event Hubs, Azure IoT Hub, Apache Kafka); stream processing (Azure Stream Analytics, Apache Spark Streaming, Apache Storm); landing/storage; analytics/processing; warehouse/publishing; business intelligence/consumption. Other labeled components: SQL Server 2016, Azure Blob, Azure Data Lake Store, Analytics Platform System (APS), Apache HBase, Power BI, applications. Flow labels: extract, transform, load; SQL Server Integration Services; direct loading. Legend: a symbol (not recoverable here) marks orchestration.]

Big data & data warehousing from beginning to end

Data exists in many forms, from traditional SQL stores to IoT devices and sensors. Load raw data into a landing stage, or load it directly into storage, then extract and load it into a warehouse. Transform the data before or after it reaches the data warehouse.

Use BI tools to query the isolated store. Or ingest streaming data and process it on the fly. Use big data analytics or machine learning to catch problems before they grow, or to gain insights and meaning.
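The choice between transforming data before or after it reaches the warehouse can be sketched in a few lines. This is a minimal illustration, not any Microsoft product's API: an in-memory SQLite database stands in for the warehouse, and the sample rows, table names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a landing stage: (device, text reading).
RAW_ROWS = [("sensor-1", " 21.5 "), ("sensor-2", "bad"), ("sensor-1", "22.0")]


def parse_reading(text):
    """Return the reading as a float, or None if the text is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def etl(rows):
    """Transform before loading: clean in Python, then load only valid rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (device TEXT, value REAL)")
    cleaned = [(device, parse_reading(text)) for device, text in rows]
    db.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [(device, value) for device, value in cleaned if value is not None],
    )
    return db


def elt(rows):
    """Load the raw text first, then transform inside the store with SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE landing (device TEXT, raw_value TEXT)")
    db.executemany("INSERT INTO landing VALUES (?, ?)", rows)
    # Crude numeric filter for the sketch: keep values that start with a digit.
    db.execute(
        """CREATE TABLE readings AS
           SELECT device, CAST(TRIM(raw_value) AS REAL) AS value
           FROM landing
           WHERE TRIM(raw_value) GLOB '[0-9]*'"""
    )
    return db


def averages(db):
    """Return {device: average reading} from the readings table."""
    return dict(db.execute("SELECT device, AVG(value) FROM readings GROUP BY device"))


print(averages(etl(RAW_ROWS)))
print(averages(elt(RAW_ROWS)))
```

Both paths end with the same cleaned table. The difference is where the cleaning runs: in the pipeline before loading, or in the store after the raw data has landed.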

[Diagram labels: SQL Server 2016 (SQL Server Analysis Services, SQL Server Reporting Services); SQL Server 2016; Analytics Platform System. Flow: move to storage, process, move to warehouse.]

AZURE SQL DATA WAREHOUSE
• Enterprise-class cloud data warehouse with T-SQL
• Dynamically scale and pause in seconds
• Queries integrate relational data with data in Azure Blob Storage
• Deploy in seconds
• Automatic backup to Azure Blob Storage

AZURE DATA FACTORY
• Compose and orchestrate data movement and processing at scale
• Visualize data orchestration
• Connect to on-premises and cloud data sources
• Data workflow scheduling

SQL SERVER INTEGRATION SERVICES
• Integrate & transform enterprise data
• Extract data from multiple sources & load into multiple destinations
• Create solutions without writing code
• Optionally code custom components for business needs

AZURE MACHINE LEARNING
• Uncover patterns hidden in data
• Apply statistical methods to solve any problem
• Get started in minutes with drag & drop UI
• Leverage familiar R and Python support

MICROSOFT R SERVER
• Discover valuable data insights
• Incorporate advanced analytics algorithms
• Flexible and agile with exceptional performance & enterprise support
• Use ScaleR to compute large data sets
• No memory constraints

AZURE STREAM ANALYTICS
• Real-time insights from devices and sensors
• Enable rapid development with SQL-based syntax
• Achieve mission-critical reliability and scale
• Integrate directly with Power BI to publish real-time data
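To make "real-time insights from devices and sensors" concrete, here is a minimal sketch of the kind of windowed aggregation a stream processor computes. It is plain Python, not Azure Stream Analytics syntax or its API; the event layout (timestamp in seconds, device, value) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def tumbling_window_averages(events, window_seconds):
    """Group (timestamp, device, value) events into fixed, non-overlapping
    windows and return {(window_start, device): average value}."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, device) -> [total, count]
    for timestamp, device, value in events:
        # Each event belongs to exactly one window, found by rounding its timestamp down.
        window_start = (timestamp // window_seconds) * window_seconds
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical sensor events: (seconds since start, device id, reading).
events = [(0, "a", 20.0), (4, "a", 22.0), (9, "b", 5.0), (11, "a", 30.0)]
result = tumbling_window_averages(events, window_seconds=10)
print(result)
```

A real stream processor emits each window's result as the window closes rather than after all events have arrived; the batch loop here only shows how events map to windows.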

POWER BI
• Bring together data from a variety of sources and services
• Transform and model your data
• Create linked, interactive visuals
• Share dashboards and reports
• Interact with your data anywhere on any device

SQL SERVER REPORTING SERVICES
• Create mobile, interactive, tabular, and graphical reports
• Show charts, maps, and KPIs
• Integrated with Visual Studio
• Programming features for automation

AZURE DATA LAKE STORE
• No limits to scale: architected for cloud scale and performance
• High frequency, low latency, real-time analytics
• HDFS for the cloud
• Optimized for massive throughput
• Stores data in native format
• Enterprise ready: secure, manageable & reliable

AZURE DATA LAKE ANALYTICS
• On-demand job service built on YARN
• Pay for what you use
• Use U-SQL: familiar, easily extensible
• Develop faster and debug smarter with Visual Studio tools
• Query any data store with federated query
• Enterprise ready: access control & auditing

MICROSOFT R SERVER ON AZURE HDINSIGHT
• Enterprise scale & performance: scales from workstations to large clusters; growing portfolio of parallelized algorithms; runs R functions in parallel
• Secure, scalable R deployment & operationalization
• Write once, deploy anywhere: Windows (in-database & standalone R Server), Linux (Red Hat and SUSE), Hadoop (HDInsight, Hortonworks, Cloudera, MapR)
• IDE for data scientists and developers (R Tools for Visual Studio)

APACHE SPARK ON AZURE HDINSIGHT
• Open source processing framework for data analytics
• Parallel data processing persists data in-memory, on disk
• Suited to ETL, batch, interactive queries
• Real-time processing for real-time scenarios

APACHE HADOOP ON AZURE HDINSIGHT

[Diagram labels, in source order: Batch, MapReduce; Script, Pig; SQL, Hive; NoSQL, HBase; Streaming, MapReduce.]

• Hadoop as a service on Azure: cost-effective, elastic & flexible
• Works on Azure Storage or Data Lake Store
• Customize clusters to run other Hadoop open-source projects
• Crunch all data: structured, semi-structured & unstructured
• Scale elastically on demand
• Develop in your favorite language
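For readers new to the MapReduce model that Hadoop's batch layer is built on, the sketch below runs a word count in a single process. It shows the map, shuffle, and reduce steps only; it is not Hadoop or HDInsight code, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data", "Big warehouse data"])
print(counts)
```

In a cluster, the map and reduce calls run in parallel on different nodes and the shuffle moves data between them; the logic per record is the same as in this sketch.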

ANALYTICS PLATFORM SYSTEM
• Scale-out, massively parallel processing system supporting integrated data warehouse scenarios for evolving needs
• Easy to deploy, ships to your datacenter with hardware and software pre-installed and configured
• Queries across relational and non-relational data by leveraging PolyBase
• Offers the lowest price per terabyte for large data warehouse workloads

[Diagram labels: PolyBase; Analytics Platform System; Select ...; result set; Cloudera on Linux; Hortonworks on Linux; Hortonworks on Windows Server; Azure SQL Data Warehouse.]

DOCUMENTDB
• Natively supports JSON and JavaScript
• Schema-agnostic documents, automatically indexed
• Supports SQL queries
• SDKs for JavaScript, Java, Node.js, Python, and .NET

AZURE SQL DATABASE
• Create pools of elastic databases to manage performance and cost
• Develop scalable SaaS applications
• Enterprise grade security
• Work within your preferred development environments

SQL SERVER 2016
• Manage data variety and volume across all data repositories
• Balance all system components with Fast Track reference architectures
• Optimized for OLTP, data warehouse and mixed workloads
• Full hybrid capability
• Gain real-time insights without impacting performance
• Elastic scale

POLYBASE
• Query Azure HDInsight, external Hadoop clusters, or Azure Blob Storage as external tables using T-SQL
• Import external data into SQL Server 2016
• Export cold data from SQL Server to Hadoop or Azure Blob Storage while keeping it queryable

[Diagram labels: SQL Server 2016, Transact-SQL, PolyBase; U-SQL query, Azure Data Lake Analytics.]

U-SQL
• Single query language for all data
• Optimized for big data
• Familiar syntax for SQL developers
• Unites declarative SQL with imperative C#
• Works across structured, semi-structured, and unstructured with federated query
• Easily scales across available nodes
• Designed for parallelized big data processing
• Dedicated tooling with Visual Studio for easy query creation and optimization
• Minimize data proliferation issues caused by multiple copies

[Diagram: Apache Spark on HDInsight. The Spark Core Engine runs on YARN, Mesos, or the standalone scheduler. Components: Spark SQL (interactive queries), Spark Streaming (stream processing), Spark MLlib (machine learning), GraphX (graph computation). Unifies: batch processing, real-time processing, stream processing, machine learning, interactive SQL.]

[Diagram labels: Azure Data Lake Store; YARN.]

[Diagram labels: Fast Track DW, Real-time Operational Analytics, Master Data Services, SQL Server R Services.]

[Diagram labels: Apache Hive, Apache Pig, Apache Spark SQL, U-SQL, R, Azure Machine Learning, Python, T-SQL, PolyBase.]

Microsoft’s big data and data warehousing solutions handle all types of data, end-to-end: streaming, collection, processing, storage, and analytics
