Transform + analyze Visualize + decide Capture + manage Data

Posted on 19-Dec-2015


Spark the future.

May 4 – 8, 2015, Chicago, IL

Big Data for the SQL Ninja

Scott Klein, Senior Technical Evangelist

BRK2550

A little about Scott…

Why are you here?
• You want to advance your career
• You want and/or need to learn about big data technologies
• Where is the role of the DBA going?

The Microsoft data platform capabilities

• Visualize + decide: Mobile, Reports, Natural Language, Dashboards, Applications
• Transform + analyze: Orchestration, Prediction, Query, Information management, Search, Streaming, Complex Event Processing
• Capture + manage: Relational, Non-relational, Internal & external data


What is Big Data? It’s all about the V’s…
• Volume
• Variety
• Velocity

Sizes
• Kilo - 1,000
• Mega - 1,000,000
• Giga - 1,000,000,000
• Tera - 1,000,000,000,000
• Peta - 1,000,000,000,000,000
• Exa - 1,000,000,000,000,000,000
• Zetta - 1,000,000,000,000,000,000,000
• Yotta - 1,000,000,000,000,000,000,000,000
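Each prefix above is 1,000 times the previous one. As a minimal sketch, assuming decimal (SI) prefixes rather than the binary 1,024-based ones some tools use, a raw byte count can be scaled to the largest fitting prefix. The function name `human_readable` is illustrative only.

```python
# Minimal sketch: scale a byte count using the decimal prefixes listed above,
# where each step up is a factor of 1,000 (not 1,024).
PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def human_readable(num_bytes: int) -> str:
    """Return num_bytes expressed with the largest prefix that keeps the value >= 1."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    index = 0
    # Divide by 1,000 until the value is below 1,000 or we run out of prefixes.
    while value >= 1000 and index < len(PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {PREFIXES[index]}bytes"


if __name__ == "__main__":
    print(human_readable(20 * 10**12))   # 20 terabytes
    print(human_readable(500 * 10**12))  # 500 terabytes
    print(human_readable(3 * 10**18))    # 3 exabytes
```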

Some interesting facts
• 72 hours of video are uploaded to YouTube every minute (1 terabyte every 4 minutes)
• 500 terabytes of new data per day are ingested into Facebook databases
• Sensors on a Boeing jet engine create 20 terabytes of data every hour
• The proposed Square Kilometre Array telescope will generate “a few Exabytes of data per day” (single beam)
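The rates above use different time units. A small worked conversion, using only the figures quoted on the slide, puts the three finite rates on a common terabytes-per-day basis so they can be compared directly (the dictionary keys are just labels).

```python
# Worked conversion of the slide's figures to terabytes per day.
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24

rates_tb_per_day = {
    # 1 terabyte every 4 minutes
    "YouTube uploads": MINUTES_PER_DAY / 4,
    # 500 terabytes per day, already in the target unit
    "Facebook ingest": 500,
    # 20 terabytes per hour from one jet engine's sensors
    "Boeing jet engine": 20 * HOURS_PER_DAY,
}

if __name__ == "__main__":
    # Print smallest to largest for easy comparison.
    for source, tb in sorted(rates_tb_per_day.items(), key=lambda kv: kv[1]):
        print(f"{source:20} {tb:>6.0f} TB/day")
```

On this basis YouTube uploads come to 360 TB/day and a single engine to 480 TB/day, both in the same range as Facebook's 500 TB/day.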

Hadoop Ecosystem
• Distributed Storage (HDFS)
• Query (Hive)
• Distributed Processing (MapReduce)
• Scripting (Pig)
• NoSQL Database (HBase)
• Metadata (HCatalog)
• Data Integration (ODBC / Sqoop / REST)
• Relational (SQL Server)
• Machine Learning (Mahout)
• Graph (Pegasus)
• Stats processing (RHadoop)
• Event Pipeline (Flume)
• Active Directory (Security)
• Monitoring & Deployment (System Center)
• C#, F#, .NET
• PowerShell
• Pipeline / workflow (Oozie)
• Azure Storage Vault (ASV)
• APS | PolyBase
• Business Intelligence (Excel, Power View, SSAS)
• World's Data (Azure Data Marketplace)
• Event Driven Processing

Legend (colors in the original diagram): Red = Core Hadoop, Blue = Data processing, Purple = Microsoft integration points and value adds, Orange = Data Movement, Green = Packages
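At the core of the stack are HDFS and MapReduce; higher-level tools such as Hive and Pig have traditionally compiled their jobs down to MapReduce. As a conceptual sketch only (plain single-process Python, not Hadoop's Java API), the classic word count shows the three phases Hadoop distributes across a cluster: map, shuffle, and reduce. All function names are illustrative.

```python
# Conceptual, single-process sketch of the MapReduce phases.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as Hadoop does between map and reduce.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for a single key into one result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data for the SQL ninja", "big data is not scary"]))
```

On a real cluster the map calls run in parallel next to the HDFS blocks holding the input, and the shuffle moves data over the network; this sketch keeps only the logic.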

The Hadoop Ecosystem: ETL Tools, BI Reporting, RDBMS

HDInsight

• HDInsight is a Hadoop-based service that brings a 100% Apache Hadoop solution to the Microsoft Azure platform
• Based on the Hortonworks Data Platform (HDP)
• Scalable, on-demand service

RDBMS vs. Hadoop

              RDBMS                      Hadoop
Data size     Gigabytes (Terabytes)      Petabytes (Exabytes)
Access        Interactive and Batch      Batch
Updates       Read / Write many times    Write once, Read many times
Structure     Static Schema              Dynamic Schema
Integrity     High (ACID)                Low
Scaling       Nonlinear                  Linear
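The Static Schema vs. Dynamic Schema row is often described as schema-on-write vs. schema-on-read. A minimal sketch of the read side, assuming comma-separated records and hypothetical field names: the stored text is never changed, and each reader applies its own schema only when it parses the data.

```python
# Sketch of schema-on-read: raw records stay as-is; types are applied at read time.
# The field names and sample data are hypothetical.
import csv
import io
from typing import Callable, Dict, List

RAW_LOG = "2015-05-04,Chicago,42\n2015-05-05,Chicago,17\n"


def read_with_schema(raw: str, schema: Dict[str, Callable[[str], object]]) -> List[Dict[str, object]]:
    # Parse each stored line and cast its fields with the schema supplied by the reader.
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema.items(), fields)})
    return rows


# Two readers impose different schemas on the same stored bytes.
as_events = read_with_schema(RAW_LOG, {"day": str, "city": str, "visits": int})
as_text = read_with_schema(RAW_LOG, {"day": str, "city": str, "visits": str})

if __name__ == "__main__":
    print(as_events[0])
    print(as_text[0])
```

An RDBMS would instead reject a row that did not match the table definition at insert time, which is part of why the table above rates its integrity as high.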

Storage: two choices
• Azure Storage (Blob)
• File System
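Jobs address the two storage choices through different URI schemes. The helpers below are hypothetical and only build strings: the WASB format `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` is the one used for Azure Blob storage from Hadoop, while the HDFS NameNode port 8020 is an assumed common default, not a value from the talk. Account, container, and host names are placeholders.

```python
# Hypothetical helpers that build the two kinds of input/output paths.
def wasb_uri(account: str, container: str, path: str, secure: bool = True) -> str:
    # Azure Blob storage: wasb[s]://<container>@<account>.blob.core.windows.net/<path>
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def hdfs_uri(namenode: str, path: str, port: int = 8020) -> str:
    # Cluster-local HDFS: hdfs://<namenode>:<port>/<path> (port 8020 is an assumed default)
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(wasb_uri("myaccount", "data", "/logs/2015/"))
    print(hdfs_uri("namenode", "/tmp/staging"))
```

A commonly cited reason to prefer blob storage on HDInsight is that data there outlives the cluster, so a cluster can be deleted and recreated without reloading its inputs.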

Demo

Now What?
Working with your HDInsight cluster: running jobs, importing/exporting data, viewing and consuming data…
• .NET
• Java
• Hive
• Sqoop
• Pig
• Storm / Stream Analytics
• Excel
• Etc.

Hive

What is Hive?
• A data warehouse infrastructure built on top of Hadoop for data summarization, query, and analysis
• Provides a SQL-like language called HiveQL to query data
• Integrates Hadoop with BI and visualization tools

http://hive.apache.org
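For a SQL ninja, the usual first step in Hive is to lay a table definition over files that already sit in storage. A hedged sketch that generates such a HiveQL `CREATE EXTERNAL TABLE` statement; the table, columns, and location are hypothetical, while the `ROW FORMAT DELIMITED`, `STORED AS TEXTFILE`, and `LOCATION` clauses are standard HiveQL.

```python
# Hedged sketch: build HiveQL DDL for an external table over delimited text files.
# Table, column, and location names are hypothetical.
from typing import List, Tuple


def external_table_ddl(table: str, columns: List[Tuple[str, str]], location: str,
                       delimiter: str = "\\t") -> str:
    # The default delimiter is the two characters backslash + t, which Hive reads as a tab.
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    print(external_table_ddl(
        "web_logs",
        [("log_date", "STRING"), ("city", "STRING"), ("visits", "INT")],
        "wasbs://data@myaccount.blob.core.windows.net/logs/",
    ))
```

Once the table exists, familiar SQL applies, for example `SELECT city, SUM(visits) FROM web_logs GROUP BY city;`. Because the table is EXTERNAL, dropping it removes only Hive's metadata and leaves the files in place.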

Demo

Sqoop

What is Sqoop?
• A command-line application for transferring bulk data between Hadoop and relational databases

http://sqoop.apache.org
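A typical Sqoop job copies one relational table into Hadoop storage. The sketch below only assembles, and does not run, a Sqoop 1 `import` command for a SQL Server source. The server, database, user, password file, table, and target directory are placeholders; `--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, and `-m` are standard Sqoop import options, and the sketch assumes the SQL Server JDBC driver is available to Sqoop on the default port 1433.

```python
# Hedged sketch: build (not execute) a Sqoop import command line.
# All connection values are placeholders.
import shlex
from typing import List


def sqoop_import_args(server: str, database: str, user: str, password_file: str,
                      table: str, target_dir: str, mappers: int = 1) -> List[str]:
    jdbc_url = f"jdbc:sqlserver://{server}:1433;databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(mappers),  # number of parallel map tasks doing the copy
    ]


if __name__ == "__main__":
    args = sqoop_import_args("sqlsrv.example.com", "Sales", "hadoop_reader",
                             "/user/hadoop/.sqlpass", "Orders", "/data/orders")
    # shlex.join quotes the JDBC URL, whose ';' would otherwise end the shell command.
    print(shlex.join(args))
```

Moving data the other way uses `sqoop export` with `--export-dir` pointing at the Hadoop files.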

Demo

Storm

What is Storm?
• Apache Storm is a distributed, fault-tolerant, open-source real-time event processing solution for large, fast streams of data
• HDInsight provides a fully managed Apache Storm service on Azure

http://storm.apache.org/
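A Storm topology wires "spouts" (event sources) to "bolts" (processing steps). As a conceptual sketch in plain Python, not the Storm API, the pipeline below counts events per key over fixed 10-second (tumbling) windows. It assumes events arrive in timestamp order, and the sensor names are hypothetical.

```python
# Conceptual spout -> bolt pipeline with tumbling-window counts (not the Storm API).
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, sensor id)


def sensor_spout() -> Iterator[Event]:
    # Stand-in for an unbounded source; a real spout would read from a queue.
    yield from [(0, "engine-1"), (3, "engine-2"), (7, "engine-1"),
                (12, "engine-1"), (14, "engine-2")]


def windowed_count_bolt(events: Iterable[Event],
                        window_seconds: int = 10) -> Iterator[Tuple[int, Dict[str, int]]]:
    # Emit (window start, counts) whenever an event moves past the current window.
    current_window = None
    counts: Counter = Counter()
    for ts, key in events:
        window = ts - ts % window_seconds
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last, possibly partial, window


if __name__ == "__main__":
    for start, window_counts in windowed_count_bolt(sensor_spout()):
        print(start, window_counts)
```

In Storm itself, each bolt instance runs in parallel across the cluster and the framework replays tuples on failure; this sketch shows only the per-window logic.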

Demo

HBase

NoSQL?

“No” SQL = Not Only SQL

What is HBase?
• Open-source, distributed, non-relational database
• Column-oriented key-value store built to run on top of Hadoop HDFS

http://hbase.apache.org
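For readers coming from tables and indexes, a toy in-memory model of HBase's data layout may help. It assumes the standard HBase model: rows kept sorted by row key, cells addressed as `family:qualifier`, and each cell holding timestamped versions. `ToyTable` and its methods are illustrative only and are not an HBase client API.

```python
# Toy in-memory model of HBase's layout; not an HBase client API.
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class ToyTable:
    def __init__(self) -> None:
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}
        self._sorted_keys: List[str] = []

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)  # keep row keys ordered
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(reverse=True)  # newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        # Return the newest version of one cell, or None if it does not exist.
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> Iterator[str]:
        # Yield row keys in [start, stop), walking the sorted keys like a range scan.
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        yield from self._sorted_keys[lo:hi]


if __name__ == "__main__":
    t = ToyTable()
    t.put("user#001", "info:name", "Bob", 1)
    t.put("user#001", "info:name", "Robert", 2)
    print(t.get("user#001", "info:name"))
```

Because lookups and scans go through the row key, HBase schema design mostly comes down to choosing that key well, rather than to normalization.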

Demo

Summary
• Big data isn’t scary
• You can use technologies and languages you are already familiar with
• The role of the DBA

HDInsight – Call to Action
Key Sessions at Ignite
• BRK3555: Real-Time Analytics at Scale for Internet of Things
• BRK2550: Big Data for the SQL Ninja
• BRK2576: Planning your Big Data Architecture on Azure
• BRK3556: Optimizing Hadoop using Microsoft Azure HDInsight
• BRK3559: Build Hybrid Big Data Pipelines with Azure Data Factory and Azure HDInsight

Sign up for the HDInsight Free Trial: http://azure.com/hdinsight

Sign up for the Azure Data Lake Preview: http://azure.com/datalake

Ignite Azure Challenge Sweepstakes

Attend Azure sessions and activities, track your progress online, win raffle tickets for great prizes!

Aka.ms/MyAzureChallenge

Enter this session code online: “TZDL”

NO PURCHASE NECESSARY. Open only to event attendees. Winners must be present to win. Game ends May 9th, 2015. For Official Rules, see The Cloud and Enterprise Lounge or myignite.com/challenge

Questions?


Visit MyIgnite at http://myignite.microsoft.com or download and use the Ignite Mobile App.

Please evaluate this session. Your feedback is important to us!

© 2015 Microsoft Corporation. All rights reserved.