Microsoft Big Data Essentials Module 1 - Introduction to Big Data Saptak Sen, Microsoft Bill Ramos, Advaiya

Page 1:

Microsoft Big Data Essentials
Module 1 - Introduction to Big Data

Saptak Sen, Microsoft
Bill Ramos, Advaiya

Page 2:

Agenda

• Why Big Data?

• Big Data Lambda Architecture

• Getting started with Windows Azure HDInsight Service

Page 3:

The Business Imperative

• Human Fault Tolerance

• Minimize CapEx

• Low Learning Curve

• Hyper Scale on Demand

Page 4:

CAP Theorem

• C: Consistency

• P: Partition Tolerance

• A: Availability

Page 5:

Big Data Lambda Architecture

Page 6:

Big Data Lambda Architecture

• Batch layer
  • Stores master dataset
  • Computes arbitrary views

• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer

• Serving layer
  • Random access to batch views
  • Updated by batch layer

Diagram labels: Serving Layer, Speed Layer, Batch Layer

Page 7:

The Batch Layer

• Stores master dataset (in append mode)

• Unrestrained computation

• Horizontally scalable

• High latency

Diagram labels: Incoming data streams, Master dataset, Batch views
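The batch layer bullets can be illustrated with a minimal sketch. It is not taken from the deck: the `MasterDataset` class, the `compute_batch_view` function, and the page-view records are hypothetical names chosen for illustration. The sketch assumes an in-memory list stands in for the master dataset and that the batch view is recomputed from scratch on every run.

```python
class MasterDataset:
    """Append-only store: records are added, never updated or deleted."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def scan(self):
        # A batch job always reads the whole dataset.
        return iter(self._records)


def compute_batch_view(dataset):
    """Recompute page-view counts from the full dataset (simple, but high latency)."""
    view = {}
    for record in dataset.scan():
        url = record["url"]
        view[url] = view.get(url, 0) + 1
    return view


dataset = MasterDataset()
for url in ("/home", "/about", "/home"):
    dataset.append({"url": url})

batch_view = compute_batch_view(dataset)
print(batch_view)
```

Because the view is rebuilt from the raw records each time, a bug in the view logic can be fixed and the view regenerated without touching the stored data.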

Page 8:

The Speed Layer

• Stream processing of data

• Stores a limited window of data

• Dynamic computation

Diagram labels: Incoming data streams, Process stream, Increment views, Real-time increments, Real-time views
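A minimal sketch of the speed layer idea follows. It is an illustration rather than anything shown in the deck: the `SpeedLayer` class, its `window_seconds` parameter, and the sample events are hypothetical, and the code assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class SpeedLayer:
    """Keeps incremental counts for a limited, recent window of events."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = deque()          # (timestamp, key), oldest first
        self.realtime_view = Counter()  # incrementally maintained view

    def process(self, timestamp, key):
        """Increment the view for one incoming event, then expire old events."""
        self._events.append((timestamp, key))
        self.realtime_view[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window and undo their increments.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_key = self._events.popleft()
            self.realtime_view[old_key] -= 1
            if self.realtime_view[old_key] == 0:
                del self.realtime_view[old_key]


speed = SpeedLayer(window_seconds=60)
speed.process(0, "/home")
speed.process(30, "/home")
speed.process(70, "/about")
print(dict(speed.realtime_view))
```

Each event costs a constant amount of work, in contrast to a batch job that rescans the whole dataset.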

Page 9:

The Serving Layer

• Queries the batch and real-time views

• Merges the results

Diagram labels: Real-time views, Batch views, Querying and merging, Output
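Querying and merging can be sketched as below. This is a hedged illustration, not the deck's implementation: the `query` function and the sample views are hypothetical, and the sketch assumes the two views cover disjoint time ranges so that their counts can simply be added.

```python
def query(key, batch_view, realtime_view):
    """Merge the precomputed batch count with the recent real-time increment."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Hypothetical views: the batch view covers history, the real-time view
# covers only events that arrived after the last batch run.
batch_view = {"/home": 1200, "/about": 300}
realtime_view = {"/home": 7, "/pricing": 2}

results = {url: query(url, batch_view, realtime_view)
           for url in ("/home", "/about", "/pricing")}
print(results)
```

For aggregates that are not additive (for example distinct counts), the merge step would need a different combining function.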

Page 10:

Microsoft Lambda Architecture Support

Layers: Serving Layer, Speed Layer, Batch Layer

Windows Azure HDInsight

Azure Blob storage

MapReduce, Hive, Pig, Oozie, SSIS

Federations in Windows Azure SQL Database

Azure tables

Memcached/MongoDB

SQL Server database engine

SQL Server VM:

• Columnstore indexes

• Analysis Services

• StreamInsight

Azure Storage Explorer

Microsoft Excel

Power Query

PowerPivot

Power View

Power Map

Reporting Services

LINQ to Hive

Analysis Services

Page 11:

Layers: Serving Layer, Speed Layer, Batch Layer

Apache Hadoop

Yahoo!

SQL Server Analysis Services (SSAS)

Microsoft Excel and PowerPivot

Other BI Tools and Custom Applications

Diagram labels: Hadoop Data, Third Party Database, SQL Server Analysis Services (SSAS Cube), + Custom Applications, SQL Server Connector (Hadoop Hive ODBC), Staging Database, Microsoft Excel & PowerPivot for Excel

Page 12:

Layers: Serving Layer, Speed Layer, Batch Layer

Windows Azure HDInsight

Ferranti Computer Systems

Microsoft Dynamics AX

SQL Server Analysis Services

SQL Server Reporting Services

SQL Server (In-Memory OLTP)

Diagram labels: Data Feed from Smart Meters, Reactive Extensions (Rx), SQL Server Database (In-Memory OLTP), Windows Azure HDInsight, SQL Server Analysis Services, SQL Server Reporting Services, Microsoft Dynamics AX

Page 13:

Windows Azure Storage

Page 14:

Demo 1: Setting up the Windows Azure storage account

Diagram labels: Serving Layer, Speed Layer, Batch Layer, Windows Azure Blob storage, Azure Storage Explorer

Page 15:

Blob Storage Concepts

• Store large amounts of unstructured text or binary data with the fastest read performance

• Highly scalable, durable, and available file system

• Blobs can be exposed publicly over HTTP

• Securely lock down permissions to blobs

Hierarchy: Account → Container → Blob → Pages/Blocks

Example: account Contoso; containers Images (PIC01.JPG, PIC02.JPG) and Video (VID1.AVI); each blob is made up of blocks or pages

URL format: http://<account>.blob.core.windows.net/<container>/<blobname>
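The account/container/blob hierarchy maps directly onto the blob URL. The following is a small sketch of that mapping; the `blob_url` helper is a hypothetical name, and the lowercase account and container names are assumptions based on the Contoso example on the slide.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name, scheme="http"):
    """Build a blob URL of the form <scheme>://<account>.blob.core.windows.net/<container>/<blobname>."""
    # quote() percent-encodes characters such as spaces; "/" in blob names is kept.
    return (f"{scheme}://{account}.blob.core.windows.net/"
            f"{quote(container)}/{quote(blob_name)}")


url = blob_url("contoso", "images", "PIC01.JPG")
print(url)
```

A blob is only readable at this URL without credentials if its container has been configured for public access; otherwise the request must be authorized.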

Page 16:

Getting started with HDInsight Service

Page 17:

Demo 2: Setting up the Windows Azure HDInsight cluster

HDInsight Console: https://<ClusterName>.azurehdinsight.net/

Diagram labels: Serving Layer, Speed Layer, Batch Layer, Windows Azure HDInsight, Windows Azure Blob storage, HDInsight Console

Page 18:

Demo 3: Loading data into Windows Azure storage for use with HDInsight

HDInsight Console: https://<ClusterName>.azurehdinsight.net/

Diagram labels: Serving Layer, Speed Layer, Batch Layer, Windows Azure HDInsight, Windows Azure Blob storage, HDInsight Console, CSV files from local disk
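Demo 3 loads CSV files from local disk into Blob storage. The transcript does not record which tool the demo uses, so the sketch below is only one possible approach: it builds (but does not send) a Put Blob request against the Blob storage REST endpoint using the Python standard library. The `build_put_blob_request` helper, the account, container, and blob names, the CSV contents, and the SAS token placeholder are all hypothetical.

```python
import urllib.request


def build_put_blob_request(account, container, blob_name, data, sas_token):
    """Prepare an HTTP PUT that would upload `data` as a block blob."""
    url = (f"https://{account}.blob.core.windows.net/"
           f"{container}/{blob_name}?{sas_token}")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",  # required by the Put Blob operation
            "Content-Type": "text/csv",
        },
    )


# Hypothetical CSV payload; in practice this would be read from a local file.
csv_bytes = b"meter_id,reading\n1,42\n"
request = build_put_blob_request(
    "contoso", "data", "input/readings.csv", csv_bytes, "sv=PLACEHOLDER"
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would
# perform the upload and needs a valid SAS token in place of the placeholder.
```

Once uploaded, the file sits in the storage account that the HDInsight cluster is linked to, which is what makes it available to jobs on the cluster.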

Page 19:

Easy Access to Data, Big & Small

Page 20:

Easy Access to Data, Big & Small

• Simplify access to public & corporate data

• Easily preview, shape, & format your data

• Combine and refine data across multiple sources

• Gain insight across relational, unstructured, & semi-structured data

• Common management of structured & unstructured data

• Query across relational DB & Hadoop with a single T-SQL query

Products shown: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with Polybase

Page 22:

Questions?
