Microsoft Big Data Essentials Module 1 - Introduction to Big Data

Saptak Sen, Microsoft; Bill Ramos, Advaiya



Page 2

Agenda

• Why Big Data?

• Big Data Lambda Architecture

• Getting started with Windows Azure HDInsight Service

Page 3

The Business Imperative

• Human Fault Tolerance

• Minimize CapEx

• Low Learning Curve

• Hyper Scale on Demand

Page 4

CAP Theorem

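• Consistency (C)

• Partition tolerance (P)

• Availability (A)

When a network partition occurs, a distributed data store must choose between consistency (every read sees the latest write) and availability (every request receives a response); it cannot guarantee all three properties at once.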
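The CAP theorem says that during a network partition a distributed store must give up either consistency or availability. To make that trade-off concrete, the toy sketch below models two replicas of a key-value store. It is an illustration only, not how any particular database works: the class name `TwoReplicaStore`, its `mode` flag, and the `partitioned` switch are hypothetical. In CP mode a write that cannot reach both replicas is refused; in AP mode it is accepted locally and the replicas diverge.

```python
class TwoReplicaStore:
    """Hypothetical two-replica key-value store illustrating the CAP trade-off."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # one dict per replica
        self.partitioned = False   # True means the replicas cannot reach each other

    def write(self, replica: int, key: str, value) -> bool:
        if not self.partitioned:
            # Healthy network: apply the write to both replicas.
            for data in self.replicas:
                data[key] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse a write that cannot be replicated.
            return False
        # Availability first: accept the write on the reachable replica only.
        self.replicas[replica][key] = value
        return True

    def read(self, replica: int, key: str):
        return self.replicas[replica].get(key)

    def consistent(self) -> bool:
        return self.replicas[0] == self.replicas[1]


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write(0, "x", 1)
    store.partitioned = True
    accepted = store.write(0, "x", 2)
    print(mode, "accepted:", accepted, "consistent:", store.consistent())
```

The CP store stays consistent but rejects the write; the AP store accepts it, and the two replicas now return different values for `x`.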

Page 5

Big Data Lambda Architecture

Page 6

Big Data Lambda Architecture

• Batch layer: stores master dataset; computes arbitrary views

• Speed layer: fast, incremental algorithms; batch layer eventually overrides speed layer

• Serving layer: random access to batch views; updated by batch layer
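The rule that the batch layer eventually overrides the speed layer can be sketched as follows. This is a minimal illustration under assumed conventions: each event carries a timestamp, and each completed batch run reports the latest timestamp it covered (the hypothetical `batch_cutoff`). The class name `RealtimeView` is also illustrative.

```python
class RealtimeView:
    """Speed-layer view holding only events the batch layer has not yet processed."""

    def __init__(self):
        self.increments = []  # (timestamp, key) pairs

    def add(self, timestamp: int, key: str) -> None:
        self.increments.append((timestamp, key))

    def expire_through(self, batch_cutoff: int) -> None:
        # A finished batch run has absorbed every event up to batch_cutoff,
        # so the speed layer's copies of those events are no longer needed.
        self.increments = [(t, k) for t, k in self.increments if t > batch_cutoff]

    def counts(self) -> dict:
        result = {}
        for _, key in self.increments:
            result[key] = result.get(key, 0) + 1
        return result


rt = RealtimeView()
for t, key in [(1, "a"), (2, "b"), (3, "a"), (4, "a")]:
    rt.add(t, key)
rt.expire_through(2)   # the batch run covered timestamps 1 and 2
print(rt.counts())     # {'a': 2}
```

Because the speed layer only ever holds the gap since the last batch run, any approximation or error it introduces is discarded once the batch view catches up.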

Page 7

The Batch Layer

• Stores master dataset (in append mode)

• Unrestrained computation

• Horizontally scalable

• High latency

Diagram: Incoming data streams → Master dataset → Batch views
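A minimal in-memory sketch of the two batch-layer properties on this slide: the master dataset only grows by appending raw events, and a batch view is an arbitrary function recomputed from scratch over the whole dataset. `MasterDataset`, `compute_batch_view`, and the page-view event shape are hypothetical stand-ins; at scale, the dataset would sit in distributed storage and the recomputation would run as parallel jobs.

```python
from collections import Counter


class MasterDataset:
    """Append-only store of immutable raw events (hypothetical in-memory stand-in)."""

    def __init__(self):
        self._events = []

    def append(self, event: dict) -> None:
        # New facts are only appended; existing records are never modified.
        self._events.append(dict(event))

    def __iter__(self):
        return iter(self._events)


def compute_batch_view(master: MasterDataset) -> dict:
    """Recompute page-view counts from scratch over the entire master dataset."""
    return dict(Counter(event["url"] for event in master))


master = MasterDataset()
for url in ["/home", "/about", "/home"]:
    master.append({"url": url})
print(compute_batch_view(master))  # {'/home': 2, '/about': 1}
```

Recomputing from the full dataset is what makes the batch layer high-latency, but it also means a bug in a view can be fixed by rerunning the computation over the unchanged raw data.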

Page 8

The Speed Layer

• Stream processing of data

• Stores a limited window of data

• Dynamic computation

Diagram: Incoming data streams → Process stream → Increment views (real-time increments) → Real-time views
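The speed layer's incremental processing over a limited window of data can be illustrated with a sliding-window counter: each event updates the counts directly instead of triggering a recomputation, and events older than the window are evicted. This is a sketch under the assumption that events arrive in timestamp order; the class name `SlidingWindowCounter` and the 10-unit window are illustrative.

```python
from collections import deque


class SlidingWindowCounter:
    """Incrementally maintained per-key counts over the last `window` time units."""

    def __init__(self, window: int):
        self.window = window
        self.events = deque()   # (timestamp, key) in arrival order
        self.counts = {}

    def add(self, timestamp: int, key: str) -> None:
        # Incremental update: amortized constant work per event, no full recompute.
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._evict(timestamp)

    def _evict(self, now: int) -> None:
        # Drop events that have fallen out of the window and decrement their counts.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]


counter = SlidingWindowCounter(window=10)
counter.add(1, "/home")
counter.add(5, "/about")
counter.add(12, "/home")   # the event at t=1 is now outside the window
print(counter.counts)      # {'/home': 1, '/about': 1}
```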

Page 9

The Serving Layer

• Queries the batch and real-time views

• Merges the results

Diagram: Batch views + Real-time views → Querying and merging → Output
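For a simple metric such as page-view counts, querying and merging amounts to adding the real-time increments to the precomputed batch result. The sketch below assumes that the real-time view only holds events the batch view has not yet absorbed, so summing never double-counts; other metrics (for example, unique visitors) would need a different merge function. The name `merge_views` is hypothetical.

```python
def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Combine batch and real-time counts into a single query answer."""
    merged = dict(batch_view)
    for key, count in realtime_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged


batch_view = {"/home": 120, "/about": 30}
realtime_view = {"/home": 3, "/contact": 1}
print(merge_views(batch_view, realtime_view))
# {'/home': 123, '/about': 30, '/contact': 1}
```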

Page 10

Microsoft Lambda Architecture Support

Layers shown: Serving Layer, Speed Layer, Batch Layer. Products shown, in the slide's three groups:

• Windows Azure HDInsight, Azure Blob storage, MapReduce, Hive, Pig, Oozie, SSIS

• Federations in Windows Azure SQL Database, Azure tables, Memcached/MongoDB, SQL Server database engine, SQL Server VM (columnstore indexes, Analysis Services, StreamInsight)

• Azure Storage Explorer, Microsoft Excel, Power Query, PowerPivot, Power View, Power Map, Reporting Services, LINQ to Hive, Analysis Services

Page 11

Yahoo!

Diagram (Serving Layer, Speed Layer, Batch Layer): Apache Hadoop (Hadoop data); SQL Server Connector (Hadoop Hive ODBC); staging database; third-party database; SQL Server Analysis Services (SSAS cube) + custom applications; Microsoft Excel & PowerPivot for Excel; other BI tools and custom applications

Page 12

Ferranti Computer Systems

Diagram (Serving Layer, Speed Layer, Batch Layer): data feed from smart meters; Reactive Extensions (Rx); SQL Server database (In-Memory OLTP); Windows Azure HDInsight; SQL Server Analysis Services; SQL Server Reporting Services; Microsoft Dynamics AX

Page 13

Windows Azure Storage

Page 14

Demo 1: Setting up the Windows Azure storage account

Diagram (Serving Layer, Speed Layer, Batch Layer): Windows Azure Blob storage; Azure Storage Explorer
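When creating the storage account in this demo, the account name becomes part of the service endpoint (for example `<account>.blob.core.windows.net`), which is why Azure restricts it. The sketch below is a local pre-check assuming the documented rule of 3 to 24 characters drawn only from lowercase letters and digits; global uniqueness can only be checked by Azure itself, and the helper name is hypothetical.

```python
import re

# Assumed naming rule for Azure storage accounts: 3-24 lowercase letters or digits.
_ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the local naming rule (not uniqueness)."""
    return _ACCOUNT_NAME_RE.fullmatch(name) is not None


for candidate in ["contosodata", "Contoso", "ab", "contoso-data"]:
    print(candidate, is_valid_storage_account_name(candidate))
```

Only `contosodata` passes; the others fail on uppercase letters, length, and the hyphen respectively.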

Page 15

Blob Storage Concepts

• Store large amounts of unstructured text or binary data with the fastest read performance

• Highly scalable, durable, and available file system

• Blobs can be exposed publicly over HTTP

• Securely lock down permissions to blobs

Diagram: Account → Container → Blob → Blocks/Pages. In the example, the account Contoso has a container Images (PIC01.JPG, PIC02.JPG) and a container Video (VID1.AVI); each blob is made up of blocks or pages.

Blob URL format: http://<account>.blob.core.windows.net/<container>/<blobname>
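The blob URL pattern on this slide can be assembled programmatically. A minimal sketch: `blob_url` is a hypothetical helper, blob names are percent-encoded while `/` is kept so names like `2013/VID 1.AVI` still read as virtual folders, and the scheme defaults to `http` to match the slide (Blob storage also serves the same URLs over `https`). Because account and container names must be lowercase, the slide's Contoso account and Images and Video containers appear as `contoso`, `images`, and `video`.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str, scheme: str = "http") -> str:
    """Build <scheme>://<account>.blob.core.windows.net/<container>/<blobname>."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    # Percent-encode unsafe characters (such as spaces) but keep "/" separators.
    encoded_name = quote(blob_name, safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{encoded_name}"


print(blob_url("contoso", "images", "PIC01.JPG"))
# http://contoso.blob.core.windows.net/images/PIC01.JPG
print(blob_url("contoso", "video", "2013/VID 1.AVI", scheme="https"))
# https://contoso.blob.core.windows.net/video/2013/VID%201.AVI
```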

Page 16

Getting started with HDInsight Service

Page 17

Demo 2: Setting up the Windows Azure HDInsight cluster

Diagram (Serving Layer, Speed Layer, Batch Layer): Windows Azure HDInsight; Windows Azure Blob storage; HDInsight Console at https://<ClusterName>.azurehdinsight.net/
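The HDInsight Console address follows the `https://<ClusterName>.azurehdinsight.net/` pattern shown here. The hedged sketch below builds, but does not send, a request for it, assuming the cluster gateway accepts HTTP basic authentication with the cluster login chosen during setup, as current HDInsight gateways do. `hdinsight_console_request`, `mycluster`, and the credentials are placeholders.

```python
import base64
from urllib.request import Request


def hdinsight_console_request(cluster_name: str, user: str, password: str) -> Request:
    """Build an HTTPS request for the cluster console with a basic-auth header."""
    url = f"https://{cluster_name}.azurehdinsight.net/"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


req = hdinsight_console_request("mycluster", "admin", "placeholder-password")
print(req.full_url)  # https://mycluster.azurehdinsight.net/
```

Passing the request to `urllib.request.urlopen` would actually contact the cluster; it is left out so the sketch runs without a live cluster or real credentials.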

Page 18

Demo 3: Loading data into Windows Azure storage for use with HDInsight

Diagram (Serving Layer, Speed Layer, Batch Layer): CSV files from local disk; Windows Azure Blob storage; Windows Azure HDInsight; HDInsight Console at https://<ClusterName>.azurehdinsight.net/
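Once the CSV files are in Blob storage, jobs on the HDInsight cluster address them with `wasb://` URIs of the form `wasb://<container>@<account>.blob.core.windows.net/<path>`. The sketch below only plans the load: it picks the CSV files from a list of local paths and computes, for each, the blob name and the URI a Hive or Pig job could read. The upload itself would be done with a tool such as Azure Storage Explorer. The function name, the `data` prefix, and the account and container names are hypothetical.

```python
from pathlib import PurePath, PurePosixPath


def plan_csv_upload(local_paths, account: str, container: str, prefix: str = "data"):
    """Return (blob_name, wasb_uri) pairs for each CSV file in local_paths."""
    plan = []
    for local in local_paths:
        file_name = PurePath(local).name
        if not file_name.lower().endswith(".csv"):
            continue  # only CSV files are loaded in this demo
        # Blob names always use "/" separators, regardless of the local OS.
        blob_name = str(PurePosixPath(prefix) / file_name)
        uri = f"wasb://{container}@{account}.blob.core.windows.net/{blob_name}"
        plan.append((blob_name, uri))
    return plan


for blob_name, uri in plan_csv_upload(["/demo/sales.csv", "/demo/readme.txt"], "contosodata", "hdi"):
    print(blob_name, uri)
```

On the machine holding the files, the input could come from `pathlib.Path(...).glob("*.csv")` instead of a hard-coded list.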

Page 19

Easy Access to Data, Big & Small

Page 20

Easy Access to Data, Big & Small

• Simplify access to public & corporate data

• Easily preview, shape, & format your data

• Combine and refine data across multiple sources

• Gain insight across relational, unstructured, & semi-structured data

• Common management of structured & unstructured data

• Query across relational DB & Hadoop with a single T-SQL query

Products: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with PolyBase

Page 22

Questions?
