Spark the future.
May 4 – 8, 2015, Chicago, IL
Big Data for the SQL Ninja
Scott Klein, Senior Technical Evangelist
BRK2550
A little about Scott…
Why are you here?
You want to advance your career
You want and/or need to learn about big data technologies
Where is the role of the DBA going?
The Microsoft data platform capabilities
Transform + analyze
Visualize + decide
Capture + manage
Data
Visualize + decide
Mobile, Reports, Natural Language, Dashboards, Applications
Complex Event Processing
Transform + analyze
Orchestration, Prediction, Query, Information management
Search, Streaming
Capture + manage
Relational, Internal & external
Non-relational
What is Big Data?
It’s all about the V’s…
Volume …
Variety …
Velocity …
Sizes
Kilo - 1,000
Mega - 1,000,000
Giga - 1,000,000,000
Tera - 1,000,000,000,000
Peta - 1,000,000,000,000,000
Exa - 1,000,000,000,000,000,000
Zetta - 1,000,000,000,000,000,000,000
Yotta - 1,000,000,000,000,000,000,000,000
Some interesting facts
72 hours of video are uploaded per minute on YouTube (1 terabyte every 4 minutes)
500 terabytes of new data per day are ingested in Facebook databases
Sensors from a Boeing jet engine create 20 terabytes of data every hour
The proposed Square Kilometer Array telescope will generate “a few Exabytes of data per day” (single beam)
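To put the figures above on a common scale, the arithmetic can be sketched in Python. This is only an illustration: it uses the decimal prefixes from the Sizes slide and the rates quoted above, the helper names are made up for this example, and it assumes each rate is sustained for a full 24 hours (which a jet engine, for one, would not do).

```python
# Decimal prefixes from the "Sizes" slide (powers of 1,000).
PREFIXES = ["", "Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]

TERA = 1000 ** 4


def humanize(n_bytes):
    """Return (value, prefix) using the largest prefix that keeps value >= 1."""
    value = float(n_bytes)
    index = 0
    while value >= 1000 and index < len(PREFIXES) - 1:
        value /= 1000
        index += 1
    return value, PREFIXES[index]


def per_day(n_bytes, every_minutes):
    """Scale 'n_bytes every N minutes' up to one day (assumes a constant rate)."""
    minutes_per_day = 24 * 60
    return n_bytes * minutes_per_day // every_minutes


youtube_per_day = per_day(1 * TERA, every_minutes=4)    # 1 terabyte every 4 minutes
engine_per_day = per_day(20 * TERA, every_minutes=60)   # 20 terabytes every hour
facebook_per_day = 500 * TERA                            # quoted directly per day

for name, total in [
    ("YouTube", youtube_per_day),
    ("Jet engine", engine_per_day),
    ("Facebook", facebook_per_day),
]:
    value, prefix = humanize(total)
    print(f"{name}: {value:.0f} {prefix}bytes/day")
```

Under that constant-rate assumption, all three sources land in the hundreds of terabytes per day, well below the "few Exabytes" quoted for the telescope.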
Hadoop Ecosystem
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (MapReduce)
Scripting (Pig)
NoSQL Database (HBase)
Metadata (HCatalog)
Data Integration (ODBC / SQOOP / REST)
Relational (SQL Server)
Machine Learning (Mahout)
Graph (Pegasus)
Stats processing (RHadoop)
Event Pipeline (Flume)
Active Directory (Security)
Monitoring & Deployment (System Center)
C#, F#, .NET
PowerShell
Pipeline / workflow (Oozie)
Azure Storage Vault (ASV)
APS | PolyBase
Business Intelligence (Excel, Power View, SSAS)
World's Data (Azure Data Marketplace)
Event Driven Processing
Legend
Red = Core Hadoop
Blue = Data processing
Purple = Microsoft integration points and value adds
Orange = Data Movement
Green = Packages
The Hadoop Ecosystem
ETL Tools, BI Reporting, RDBMS
HDInsight
HDInsight
• HDInsight is a Hadoop-based service that brings a 100% Apache Hadoop solution to the Microsoft Azure platform
• Based on the Hortonworks Data Platform (HDP)
• Scalable, on-demand service
RDBMS vs. Hadoop
             RDBMS                      Hadoop
Data size    Gigabytes (Terabytes)      Petabytes (Exabytes)
Access       Interactive and batch      Batch
Updates      Read / write many times    Write once, read many times
Structure    Static schema              Dynamic schema
Integrity    High (ACID)                Low
Scaling      Nonlinear                  Linear
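One common reading of the Structure and Updates rows is "schema on read": raw data is written once as-is, and each reader applies its own schema when it reads. The Python sketch below illustrates that reading only; the sample log lines, the column names and the `read_with_schema` helper are all invented for this example.

```python
import csv
import io

# Hypothetical raw log lines, written once and never updated.
RAW = (
    "2015-05-04,sensor-1,71.3\n"
    "2015-05-04,sensor-2,68.9\n"
    "2015-05-05,sensor-1,72.8\n"
)


def read_with_schema(raw_text, schema):
    """Apply (name, type) pairs to each raw row at read time."""
    for row in csv.reader(io.StringIO(raw_text)):
        # zip stops at the shorter input, so a shorter schema ignores extra columns.
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


# Two readers interpret the same stored text differently.
full_schema = [("day", str), ("device", str), ("reading", float)]
coarse_schema = [("day", str), ("device", str)]

readings = list(read_with_schema(RAW, full_schema))
devices = sorted({row["device"] for row in read_with_schema(RAW, coarse_schema)})

print(readings[0])
print(devices)
```

A relational table, by contrast, would reject a row that does not match its declared columns at write time.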
Storage
Two choices
Azure Storage (Blob)
File System
Demo
Now What?
Working with your HDInsight cluster – running jobs, import/export data, viewing and consuming data…
• .NET
• Java
• Hive
• Sqoop
• Pig
• Storm / Stream Analytics
• Excel
• Etc.
Hive
What is Hive?
• A data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis
• Provides a SQL-like language called HiveQL to query data
• Integration between Hadoop and BI and visualization tools
http://hive.apache.org
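HiveQL reads like SQL, but on the classic MapReduce execution engine a query such as a GROUP BY is carried out as map, shuffle and reduce steps. The single-machine Python sketch below shows that idea only. The table name `weblogs`, the column names and the sample rows are invented for illustration, and this is not how Hive is implemented internally.

```python
from collections import defaultdict

# Hypothetical HiveQL that the sketch mirrors (table and columns are invented).
HIVEQL = "SELECT country, COUNT(*) AS hits FROM weblogs GROUP BY country"

# Hypothetical input rows, standing in for files in distributed storage.
ROWS = [
    {"country": "US", "url": "/home"},
    {"country": "DE", "url": "/home"},
    {"country": "US", "url": "/cart"},
    {"country": "IN", "url": "/home"},
    {"country": "IN", "url": "/search"},
    {"country": "US", "url": "/search"},
]


def map_phase(rows):
    """Emit a (key, 1) pair per row; the key is the GROUP BY column."""
    for row in rows:
        yield row["country"], 1


def shuffle_phase(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group to a single value; here COUNT(*) becomes a sum of ones."""
    return {key: sum(values) for key, values in groups.items()}


hits = reduce_phase(shuffle_phase(map_phase(ROWS)))
print(HIVEQL)
print(hits)
```

The point for a SQL practitioner is that only the one-line query has to be written; the three phases are generated by the engine.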
Demo
Sqoop
What is Sqoop?
Command-line interface application to transfer bulk data between Hadoop and relational databases
http://sqoop.apache.org
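As a rough sketch of what such a transfer looks like, the Python below assembles a `sqoop import` command line for pulling one SQL Server table into Hadoop. The server, database, user, table and target directory are placeholders invented for this example, and the function only builds the argument list; it does not run Sqoop.

```python
import shlex


def sqoop_import_args(server, database, user, table, target_dir, mappers=4):
    """Build the argument list for a 'sqoop import' of one relational table."""
    # SQL Server JDBC connection string; port 1433 is assumed here.
    jdbc_url = f"jdbc:sqlserver://{server}:1433;databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",                           # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,     # destination directory in Hadoop storage
        "--num-mappers", str(mappers),  # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
args = sqoop_import_args(
    server="myserver.example.com",
    database="Sales",
    user="etl_user",
    table="Orders",
    target_dir="/data/orders",
)
print(shlex.join(args))
```

Going the other way, from Hadoop back into a relational table, uses `sqoop export` with a similar set of connection arguments.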
Demo
Storm
What is Storm?
• Apache Storm is a distributed, fault-tolerant, open-source real-time event processing solution for large, fast streams of data
• HDInsight provides a fully managed Apache Storm on Azure
http://storm.apache.org/
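Storm processes events one at a time as they arrive, passing them from sources (which Storm calls spouts) through processing steps (bolts). The Python generators below are a single-process sketch of that flow only, with no distribution or fault tolerance; the function names, device names and threshold are invented for this example and are not Storm's API.

```python
from collections import Counter


def sensor_spout(events):
    """Emit one event at a time, like a source feeding a stream."""
    for event in events:
        yield event


def threshold_bolt(stream, limit):
    """Pass through only readings above the limit."""
    for device, reading in stream:
        if reading > limit:
            yield device, reading


def count_bolt(stream):
    """Keep a running count per device and emit it after every event."""
    counts = Counter()
    for device, _reading in stream:
        counts[device] += 1
        yield device, counts[device]


# Hypothetical (device, reading) events.
EVENTS = [
    ("engine-1", 98.0),
    ("engine-2", 71.5),
    ("engine-1", 101.2),
    ("engine-2", 99.9),
]

alerts = list(count_bolt(threshold_bolt(sensor_spout(EVENTS), limit=95.0)))
print(alerts)
```

Each result is available as soon as its event has passed through the chain, which is the contrast with the batch-oriented jobs earlier in the session.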
Demo
HBase
NoSQL?
“No” SQL = Not Only SQL
What is HBase?
• Open-source, distributed, non-relational database
• Column-oriented, key-value store built to run on top of Hadoop HDFS
http://hbase.apache.org
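For readers coming from relational tables, the "column-oriented, key-value" model can be pictured as a sorted map from row key to a set of `family:qualifier` columns. The in-memory Python class below is a toy sketch of that data model only; `TinyTable`, the row keys and the column names are invented for this example, and it leaves out timestamps, versions, regions and everything else that makes HBase distributed.

```python
class TinyTable:
    """Toy model: row key -> {"family:qualifier": value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        """Set one cell; rows do not need to share the same columns."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Return all cells for one row key (empty dict if the row is absent)."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, prefix=""):
        """Yield rows in row-key order, optionally limited to a key prefix."""
        for row_key in sorted(self._rows):
            if row_key.startswith(prefix):
                yield row_key, dict(self._rows[row_key])


table = TinyTable()
table.put("user#1001", "info:name", "Ada")
table.put("user#1001", "info:city", "Chicago")
table.put("user#1002", "info:name", "Grace")
table.put("user#1002", "stats:logins", 7)
table.put("order#5001", "info:total", 42.5)

print(table.get("user#1001"))
print([key for key, _cells in table.scan("user#")])
```

Lookups go through the row key, so designing that key takes the place of much of the indexing work a DBA would do on a relational table.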
Demo
Microsoft Big Data Solutions
Power View, Excel with PowerPivot, Embedded BI, Predictive Analytics
Apps, LOB, CRM, ERP
Connectors
Microsoft EDW
SSAS SSRS
Devices, Crawlers, Sensors, Bots
Hadoop On Windows Server
Hadoop On Azure
Summary
Big data isn’t scary
You can use technologies and languages you are already familiar with
The role of the DBA
HDInsight – Call to Action
Key Sessions at Ignite
BRK3555 - Real-Time Analytics at Scale for Internet of Things
BRK2550 - Big Data for the SQL Ninja
BRK2576 - Planning your Big Data Architecture on Azure
BRK3556 - Optimizing Hadoop using Microsoft Azure HDInsight
BRK3559 - Build Hybrid Big Data Pipelines with Azure Data Factory and Azure HDInsight
Sign up for HDInsight Free Trial: http://azure.com/hdinsight
Sign up for Azure Data Lake Preview: http://azure.com/datalake
Ignite Azure Challenge Sweepstakes
Attend Azure sessions and activities, track your progress online, win raffle tickets for great prizes!
Aka.ms/MyAzureChallenge
Enter this session code online: “TZDL”
NO PURCHASE NECESSARY. Open only to event attendees. Winners must be present to win. Game ends May 9th, 2015. For Official Rules, see The Cloud and Enterprise Lounge or myignite.com/challenge
Questions?
Visit Myignite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.
Please evaluate this session
Your feedback is important to us!
© 2015 Microsoft Corporation. All rights reserved.