asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.
![Page 1: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/1.jpg)
![Page 2: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/2.jpg)
Data Management in Microsoft HDInsight: How to Move and Store Your Data
Andrew Conrad, Development Lead, HDInsight Team
DBI-B334
![Page 3: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/3.jpg)
Agenda
• What is HDInsight
  • Hadoop, OSS and HDInsight
  • HDInsight Architecture
• Working with Data in HDInsight
  • Where & how to store data for easy “big data” processing
• Consuming Result Sets from HDInsight Queries/Jobs
  • How to move result sets into familiar tools/solutions (Excel, RDBMS, etc)
• Questions
![Page 4: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/4.jpg)
What is Big Data?
![Page 5: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/5.jpg)
Big Data according to Wikipedia
Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
![Page 6: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/6.jpg)
Why Big Data?
• Hardware and storage economics
• User expectations
• Multiple sources
• Large data volumes
• Multiple data types
• Real-time data creation
![Page 7: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/7.jpg)
What is HDInsight?
![Page 8: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/8.jpg)
What is Hadoop
• Collection of open source projects in Apache for storing/processing big data (large, un/semi-structured data)
• Evolved over past 7+ years to power some of the largest data-driven sites/products
• The foundation/“kernel” of HDInsight
![Page 9: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/9.jpg)
What is HDInsight
• Enterprise grade big data platform
• Built on Hadoop in partnership with Hortonworks
• Currently available as a preview service on Windows Azure
• Let’s take a closer look…
![Page 10: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/10.jpg)
Windows Azure HDInsight Service
• Gateway (REST APIs)
  • Data upload/download
  • Job submission (Hive query, etc)
• Hadoop
  • Query & Metadata: Hive, Pig, Map Reduce, HCatalog
  • Data Movement: Sqoop
  • Workflow: Oozie
  • Monitoring: Ambari
• Hadoop Filesystem Interface
  • HDFS
  • Windows Azure Blob Storage
![Page 11: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/11.jpg)
Windows Azure HDInsight Service
• Gateway (REST APIs)
  • Job submission (Hive query, etc)
  • Cluster Dashboard UI
• Hadoop Cluster
  • Head Node
  • Compute Node (x4)
• Windows Azure Blob Storage
![Page 12: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/12.jpg)
Storing Data in HDInsight
![Page 13: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/13.jpg)
DEMO: Creating a Hadoop Cluster, Exploring the Filesystem
![Page 14: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/14.jpg)
Storing Data for use with HDInsight Service
• WHERE: All persistent data is stored in Windows Azure Blob Storage
  • Provides sharable, persistent, highly-scalable storage with Geo DR
  • HDInsight has been optimized for fast access from its compute nodes to blob storage in the same Azure region (east, west, etc)
• WHAT: The file format used in blob storage is up to you, but using a format with existing serializer/deserializers (aka SerDe) is often a good choice (e.g. comma-delimited, Avro, JSON, etc)
• WHY: By separating HDInsight compute nodes from persistent storage you can:
  • Pay only for what you need: drop your HDInsight cluster whenever you don’t have work to do
  • Let multiple clusters access the same data, but isolate the compute resources by org/job/team/etc.
• HOW:
  • All data access in Hadoop goes through a pluggable file system interface
  • In on-prem Hadoop installations, this interface is implemented by Hadoop Distributed File System (HDFS)
  • In Azure, HDInsight clusters use this mechanism to be wired to blob storage accounts by default
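The "pluggable file system interface" idea in the HOW bullets can be illustrated with a small sketch. This is not Hadoop's actual API: the `FileSystem` class, the `REGISTRY` dictionary, and the in-memory backends are hypothetical names chosen only to show how a URI scheme can select a storage backend while the calling code stays the same.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical stand-in for a pluggable file system interface."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the contents stored at the given path."""


class InMemoryFileSystem(FileSystem):
    """Toy backend; a real one would talk to HDFS or to blob storage."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]


# Maps a URI scheme to the backend that serves it (illustrative only).
REGISTRY = {}


def register(scheme: str, fs: FileSystem) -> None:
    REGISTRY[scheme] = fs


def open_uri(uri: str) -> bytes:
    """Pick the backend by scheme, then read the path from it."""
    parsed = urlparse(uri)
    return REGISTRY[parsed.scheme].read(parsed.path)


# Same calling code, two different backends.
register("hdfs", InMemoryFileSystem({"/data/a.txt": b"from hdfs"}))
register("asv", InMemoryFileSystem({"/data/a.txt": b"from blob storage"}))

print(open_uri("hdfs://namenode/data/a.txt"))
print(open_uri("asv://container@account.blob.core.windows.net/data/a.txt"))
```

The point of the design is that jobs address data by URI and never depend on which backend is behind the scheme, which is what allows compute nodes and persistent storage to be separated.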
![Page 15: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/15.jpg)
Using Blob Storage From HDInsight
• An HDInsight cluster is bound to one “default” blob storage account & container at cluster create time
• Using the “default” container requires no special addressing to access (“/” == root folder, etc)
• To access additional blob storage accounts or containers:
  asv[s]://<container>@<account>.blob.core.windows.net/<path>
• Storage accounts other than the default need to be registered in site-config.xml:
  <property>
    <name>fs.azure.account.key.accountname</name>
    <value>enterthekeyvaluehere</value>
  </property>
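As a rough illustration of the addressing scheme and the configuration entry on this slide, the sketch below builds an `asv` or `asvs` URI and the matching `<property>` element. The helper names `asv_uri` and `account_key_property` are hypothetical. Two assumptions are made: that the `s` in `asv[s]` selects a secure variant of the scheme, and that `accountname` in the property name is a placeholder for the storage account name. The exact property name a given HDInsight version expects should be checked against its documentation.

```python
import xml.etree.ElementTree as ET


def asv_uri(container: str, account: str, path: str, secure: bool = False) -> str:
    """Build asv[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "asvs" if secure else "asv"  # assumption: "asvs" is the secure variant
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_property(account: str, key: str) -> str:
    """Render the <property> element that registers a storage account key."""
    prop = ET.Element("property")
    # Assumption: the slide's "accountname" stands for the account's name.
    ET.SubElement(prop, "name").text = f"fs.azure.account.key.{account}"
    ET.SubElement(prop, "value").text = key
    return ET.tostring(prop, encoding="unicode")


# Hypothetical container, account, and key values.
print(asv_uri("logs", "myaccount", "/2013/06/a.txt"))
print(asv_uri("logs", "myaccount", "2013/06/a.txt", secure=True))
print(account_key_property("myaccount", "enterthekeyvaluehere"))
```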
![Page 16: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/16.jpg)
Transferring Data in HDInsight
![Page 17: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/17.jpg)
Uploading Data to Blob Storage
• For prototyping / samples: #put
• For production data, interact directly with blob storage APIs:
  • AzCopy Command Line
  • CopyBlob REST API
  • Third party upload/download tools:
![Page 18: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/18.jpg)
AzCopy Example
File System:
  C:\blobs\a.txt
  C:\blobs\b.txt
  C:\blobs\dir1\c.txt
  C:\blobs\dir1\dir2\d.txt
Command Line:
  AzCopy c:\blobs https://<account>.blob.core.windows.net/mycontainer/ /destkey:<key> /S
Blob Storage:
  Container      Blob Name
  mycontainer    a.txt
  mycontainer    b.txt
  mycontainer    dir1\c.txt
  mycontainer    dir1\dir2\d.txt
HDInsight will treat dir1\dir2\d.txt as a file in a 2-level dir structure
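The mapping from local files to blob names in the example above can be sketched as follows. This does not call AzCopy or blob storage; it only reproduces the naming rule the slide shows, where a blob name is the file path relative to the copied folder and its separators are read as directory levels. The function names `blob_names` and `directory_depth` are hypothetical.

```python
from pathlib import PureWindowsPath


def blob_names(source_dir: str, files: list[str]) -> list[str]:
    """Blob name = file path relative to the folder passed to the copy."""
    root = PureWindowsPath(source_dir)
    return [str(PureWindowsPath(f).relative_to(root)) for f in files]


def directory_depth(blob_name: str) -> int:
    """Number of directory levels implied by the separators in a blob name."""
    return len(PureWindowsPath(blob_name).parts) - 1


local_files = [
    r"C:\blobs\a.txt",
    r"C:\blobs\b.txt",
    r"C:\blobs\dir1\c.txt",
    r"C:\blobs\dir1\dir2\d.txt",
]

for name in blob_names(r"C:\blobs", local_files):
    print("mycontainer", name, directory_depth(name))
```

Blob storage itself keeps a flat list of names inside a container; the directory structure is an interpretation of the separators in each name.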
![Page 19: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/19.jpg)
DEMO: Copy blob, Query with Hive
![Page 20: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/20.jpg)
Sharing and Consuming results
![Page 21: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/21.jpg)
Consuming HDInsight Result Sets

| Target Destination | Tool / Library | Requires Active HDInsight Cluster |
| --- | --- | --- |
| SQL Server, Azure SQL DB | Sqoop (Hadoop ecosystem project) | Yes |
| Excel | Codename “Data Explorer” | No |
| Another Blob Storage Account | Azure Blob Storage REST APIs (Copy Blob, etc) | No |
| SQL Server Analysis Services | Hive ODBC Driver | Yes |
| Existing BI Apps | Hive ODBC Driver (assumes app supports ODBC connections to data sources) | Yes |
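For the Sqoop row, a minimal sketch of assembling a `sqoop export` command line is shown below. The flags used (`--connect`, `--username`, `-P`, `--table`, `--export-dir`, `--input-fields-terminated-by`) are standard Sqoop export arguments, but the server, database, table, and directory values are hypothetical, and the helper `sqoop_export_args` is not part of any SDK. The command would be run on the cluster, which is why this path needs an active HDInsight cluster.

```python
def sqoop_export_args(jdbc_url, username, table, export_dir, field_sep=","):
    """Build the argument list for exporting a result set to a SQL table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,        # JDBC connection string of the target database
        "--username", username,
        "-P",                         # prompt for the password instead of embedding it
        "--table", table,             # destination table, which must already exist
        "--export-dir", export_dir,   # folder holding the job's output files
        "--input-fields-terminated-by", field_sep,
    ]


# Hypothetical connection details and paths.
args = sqoop_export_args(
    jdbc_url="jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=results",
    username="etl_user",
    table="page_hits",
    export_dir="/output/page_hits",
)
print(" ".join(args))
```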
![Page 22: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/22.jpg)
DEMO: Consume Result Sets – SQL DB
![Page 23: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/23.jpg)
DEMO: Consume Result Sets – Excel & “Data Explorer”
![Page 24: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/24.jpg)
ETL Big Data style
![Page 25: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/25.jpg)
The HDInsight ETL pipeline
• Extract data from multiple sources into Azure Blob Storage
• Transform data using Hive, Pig, and Map/Reduce
• Load data into applications for analysis and visualization
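The three stages can be sketched end to end with local stand-ins. This is only an illustration of the shape of the pipeline: a dictionary plays the role of blob storage, a Python aggregation plays the role of a Hive or Pig job, and an in-memory SQLite database plays the role of the target application. All names and sample records are hypothetical.

```python
import sqlite3
from collections import Counter


def extract(sources):
    """Stage raw lines from several sources under one namespace (blob storage stand-in)."""
    return {f"raw/{name}.csv": lines for name, lines in sources.items()}


def transform(staged):
    """Count records per page, similar in spirit to a GROUP BY in Hive."""
    counts = Counter()
    for lines in staged.values():
        for line in lines:
            page = line.split(",")[0]
            counts[page] += 1
    return sorted(counts.items())


def load(rows):
    """Write the result set into a relational table for analysis tools."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_hits (page TEXT PRIMARY KEY, hits INTEGER)")
    conn.executemany("INSERT INTO page_hits VALUES (?, ?)", rows)
    conn.commit()
    return conn


sources = {
    "web1": ["home,10.0.0.1", "cart,10.0.0.2"],
    "web2": ["home,10.0.0.3"],
}
conn = load(transform(extract(sources)))
print(conn.execute("SELECT page, hits FROM page_hits ORDER BY page").fetchall())
# prints [('cart', 1), ('home', 2)]
```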
![Page 26: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/26.jpg)
Programming HDInsight
• Existing Ecosystem: Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…
• .NET: C#, F# Map/Reduce, LINQ to Hive, .NET management clients
• JavaScript: JavaScript Map/Reduce, Browser hosted console, Node.js management clients
• DevOps / IT Pros: PowerShell, Cross Platform CLI tools, SSIS Custom tasks
![Page 27: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/27.jpg)
DEMO: HDInsight Big Data pipeline using SSIS
![Page 28: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/28.jpg)
SDK
• Sources
  • http://hadoopsdk.codeplex.com
  • http://www.github.com/windowsazure
• NuGet packages
  • Microsoft.Hadoop.MapReduce
  • Microsoft.Hadoop.Hive
  • Microsoft.Hadoop.WebHDFS => WebClient
• NPM packages
  • Azure
  • Azure-cli
![Page 29: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/29.jpg)
Summary
• HDInsight is an enterprise grade Hadoop-based big data storage/processing platform
• Azure Blob Storage + HDInsight == simple big data storage and processing in the cloud, available to try today
• Consuming results from HDInsight in familiar tools and apps (Excel, etc) is simple with Data Explorer, Azure Blob APIs, Sqoop, ODBC, etc.
![Page 30: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/30.jpg)
Questions?
![Page 31: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/31.jpg)
Resources
• Sessions on Demand: http://channel9.msdn.com/Events/TechEdEurope
• Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
• TechNet: Resources for IT Professionals, http://microsoft.com/technet
• msdn: Resources for Developers, http://microsoft.com/msdn
![Page 32: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/32.jpg)
Evaluate this session
Scan this QR code to evaluate this session.
![Page 33: asv[s]:// @.blob.core.windows.net/ fs.azure.account.key.accountname enterthekeyvaluehere.](https://reader035.fdocuments.us/reader035/viewer/2022062500/56649e8f5503460f94b9405a/html5/thumbnails/33.jpg)
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.