HdInsight essentials Hadoop on Microsoft Platform
![Page 1: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/1.jpg)
HDInsight Essentials ISBN : 1849695369 / ISBN 13 : 9781849695367
Rajesh Nadipalli 05/01/2014
![Page 2: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/2.jpg)
Goals of this Book
• Focus on Microsoft's new Hadoop distribution
• Serve as a quick reference
• Provide an overview of Hadoop
• Address both cloud and on-premise setup for HDInsight
• Highlight HDInsight differentiators
• Provide practical, real-world examples
![Page 3: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/3.jpg)
Book Table of Contents
• Chapter 1: HDInsight in a Heartbeat
• Chapter 2: Deployment HDInsight on premise
• Chapter 3: HDInsight Azure cloud service
• Chapter 4: Administer your cluster
• Chapter 5: Ingest data to your cluster
• Chapter 6: Transform data in your cluster
• Chapter 7: Analyze & Report data from cluster
• Chapter 8: Project Planning & Architectural Considerations
![Page 4: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/4.jpg)
CHAPTER 1 HIGHLIGHTS: HDINSIGHT IN A HEARTBEAT
![Page 5: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/5.jpg)
Big Data Problem Characteristics
![Page 6: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/6.jpg)
Hadoop Overview
Self Healing Distributed Storage
Fault Tolerant Distributed Computing
+ Abstraction for Parallel Processing
CORE HADOOP COMPONENTS
• HDFS: Distributed storage that is replicated, self-healing, and scalable
• MapReduce: Parallel processing that works on local data for efficiency
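To make "replicated, self-healing" concrete, here is a toy Python model. It is not HDFS code: the class name `ToyCluster`, the node names, and the replication factor of 3 are illustrative assumptions. Each block is placed on distinct nodes, and when a node is lost the missing replicas are re-created on the surviving nodes.

```python
class ToyCluster:
    """Toy model of replicated block storage (illustrative only, not HDFS)."""

    def __init__(self, nodes, replication=3):
        self.nodes = set(nodes)
        self.replication = replication
        self.placement = {}  # block id -> set of nodes holding a replica

    def _pick_nodes(self, exclude, count):
        # Prefer the least-loaded live nodes that do not already hold the block.
        load = {n: 0 for n in self.nodes}
        for holders in self.placement.values():
            for n in holders:
                if n in load:
                    load[n] += 1
        candidates = sorted(
            (n for n in self.nodes if n not in exclude),
            key=lambda n: (load[n], n),
        )
        return candidates[:count]

    def write_block(self, block_id):
        # Store `replication` copies of the block on distinct nodes.
        self.placement[block_id] = set(self._pick_nodes(set(), self.replication))

    def fail_node(self, node):
        # Drop the node, then re-replicate every block that lost a copy.
        self.nodes.discard(node)
        for holders in self.placement.values():
            holders.discard(node)
            missing = self.replication - len(holders)
            if missing > 0:
                holders.update(self._pick_nodes(holders, missing))


cluster = ToyCluster(["dn1", "dn2", "dn3", "dn4"], replication=3)
for block in ["blk_1", "blk_2"]:
    cluster.write_block(block)
cluster.fail_node("dn1")
```

After `fail_node("dn1")`, every block is back to three replicas on the remaining nodes, which is the behavior the "self-healing" label refers to.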
![Page 7: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/7.jpg)
Hadoop Nodes Layout
• Master Node: NameNode, JobTracker, Secondary NameNode
• Slave Nodes (three shown): each runs a TaskTracker and a DataNode
• MapReduce Layer: JobTracker, TaskTrackers
• Distributed File System Layer: NameNode, Secondary NameNode, DataNodes
![Page 8: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/8.jpg)
Hadoop Ecosystem
• Data Sources: RDBMS databases; audio, images; log files; sensors, RFID; social media, feeds
• Flume, Sqoop
• Hadoop Data Store: HDFS, HBase (NoSQL DB)
• Data Processing: MapReduce
• Data Access: Hive, Pig, Mahout (machine learning)
• Excel
• Business Data Feeds
• ZooKeeper (Distributed Process Management)
• HCatalog (Metadata on Pig, Hive, MapReduce)
• Oozie (Workflow, Scheduler)
• Infrastructure, Operations (Monitoring, Configuration)
![Page 9: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/9.jpg)
End to End Solution on Hadoop
• Collect & Import to HDFS
• Process (MapReduce)
• Analyze (BI Tools)
• Report & Publish
![Page 10: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/10.jpg)
Popular Hadoop Distributions
• Amazon Elastic MapReduce (cloud, http://aws.amazon.com/elasticmapreduce/)
• Cloudera (http://www.cloudera.com/content/cloudera/en/home.html)
• EMC Pivotal HD (http://gopivotal.com/)
• Hortonworks HDP (http://hortonworks.com/)
• MapR (http://mapr.com/)
• Microsoft HDInsight (cloud, http://www.windowsazure.com/)
![Page 11: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/11.jpg)
HDInsight Differentiators
• Enterprise-ready Hadoop backed by Microsoft
• Analytics using Excel
• Integration with Active Directory
• Integration with .NET and JavaScript
• Connectors to RDBMS
• Scale using the cloud offering: the Azure HDInsight service lets customers scale quickly and provides a seamless interface between HDFS and Azure Storage Vault (ASV)
• JavaScript Console
![Page 12: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/12.jpg)
WordCount in HDInsight
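The slide itself carries only the title, so the following is a minimal sketch of the WordCount pattern in plain Python rather than the HDInsight sample. The function names and the sample input are illustrative assumptions; the three steps mirror what MapReduce does: map each line to (word, 1) pairs, group the pairs by word, then reduce each group to a total.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


# Hypothetical two-line input, for illustration only.
sample = ["Hadoop stores data", "Hadoop processes data"]
counts = word_count(sample)
```

On a real cluster the mappers run in parallel next to the data blocks and the framework performs the shuffle; here everything runs in one process so the flow is easy to follow.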
![Page 13: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/13.jpg)
CHAPTER 2 HIGHLIGHTS: HDINSIGHT INSTALL ON PREMISE
![Page 14: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/14.jpg)
HDInsight Distribution
Apache Hadoop
• Open source software
• Community development
Hortonworks Data Platform
• Enterprise Hadoop Platform (HDP)
• Leaders in Hadoop
• Code committers to Hadoop
Microsoft HDInsight
• Built on top of HDP
• Integration with ASV, Excel, Power View, SQL Server, Active Directory
![Page 15: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/15.jpg)
Physical Install Options
• Single node for development/test
• Multi node for production
Diagram labels: NN, SNN, JT (NameNode, Secondary NameNode, JobTracker); DN / TT (DataNode / TaskTracker)
![Page 16: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/16.jpg)
Multi Node Install Steps
• Prerequisites
• Networking setup
• Remote scripting
• Firewall setup
• Software install (each node)
• Hadoop configuration
• Verification
![Page 17: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/17.jpg)
CHAPTER 3 HIGHLIGHTS: HDINSIGHT AZURE SERVICE
![Page 18: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/18.jpg)
Azure Cloud Service
Create Storage
Create HDInsight cluster
![Page 19: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/19.jpg)
CHAPTER 4 HIGHLIGHTS: ADMINISTER YOUR CLUSTER
![Page 20: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/20.jpg)
HDInsight Cluster Management
![Page 21: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/21.jpg)
HDInsight Dashboard
![Page 22: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/22.jpg)
HDInsight Dashboard
![Page 23: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/23.jpg)
NameNode Status
![Page 24: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/24.jpg)
JobTracker Status
![Page 25: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/25.jpg)
CHAPTER 5 HIGHLIGHTS: INGEST DATA INTO YOUR CLUSTER
![Page 26: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/26.jpg)
Loading Data into your Cluster
You have the following options:
• Loading data using Hadoop commands
• Loading data using Azure Storage Vault
• Loading data using Interactive JavaScript
• Shipping data to your cluster
• Loading data from RDBMS via Sqoop
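As a rough illustration of two of these options, the sketch below only builds the command lines for a Hadoop file copy and a Sqoop table import; it does not run them. The helper names, file paths, table name, and JDBC connection string are hypothetical. The commands and flags used (`hadoop fs -put`, `sqoop import` with `--connect`, `--table`, `--target-dir`, `--num-mappers`) are standard, but check them against the versions installed on your cluster.

```python
import shlex


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for copying a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def sqoop_import_command(jdbc_url, table, target_dir, mappers=1):
    """Build the argument list for importing one RDBMS table with Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


# Hypothetical paths and connection string, for illustration only.
put_cmd = hdfs_put_command("sales.csv", "/data/raw/")
sqoop_cmd = sqoop_import_command(
    "jdbc:sqlserver://dbhost:1433;databaseName=sales", "orders", "/data/orders"
)

# Print shell-safe versions of the commands for review before running them.
print(shlex.join(put_cmd))
print(shlex.join(sqoop_cmd))
```

Keeping the commands as argument lists makes it straightforward to hand them to `subprocess.run` later without shell-quoting problems.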
![Page 27: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/27.jpg)
Loading via Azure Storage Explorer
![Page 28: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/28.jpg)
CHAPTER 6 HIGHLIGHTS: TRANSFORM YOUR DATA
![Page 29: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/29.jpg)
Transforming Data
You have the following options:
• MapReduce
• Hive
• Pig
• Others
![Page 30: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/30.jpg)
Processing Data in Cluster
Map for Jan2012
Map for Feb2012
Map for Apr2013
…
One Reducer
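The layout above (one map task per monthly input, all feeding a single reducer) can be sketched in plain Python. The slide does not say what is being computed, so summing an amount per month is an assumption, as are the function names and the sample values.

```python
from collections import defaultdict


def map_month(month, records):
    """Mapper for one month's input: emit a (month, amount) pair per record."""
    for amount in records:
        yield month, amount


def single_reducer(pairs):
    """One reducer receives every mapper's output and totals it per month."""
    totals = defaultdict(float)
    for month, amount in pairs:
        totals[month] += amount
    return dict(totals)


# Hypothetical monthly inputs; on a cluster each would be a separate map task.
inputs = {
    "Jan2012": [10.0, 5.0],
    "Feb2012": [7.5],
    "Apr2013": [1.0, 2.0, 3.0],
}
mapped = [pair for month, recs in inputs.items() for pair in map_month(month, recs)]
totals = single_reducer(mapped)
```

A single reducer gives one consolidated output, but it is also a serialization point: all map output has to pass through it, which is worth keeping in mind as the number of monthly inputs grows.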
![Page 31: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/31.jpg)
Hive
Diagram components: Command Line, Web GUI, JDBC/ODBC, Thrift Server, Driver (Parser, Planner, Executor), Metastore, MapReduce, HDFS
![Page 32: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/32.jpg)
Hive or Pig?
Raw Data in HDFS
• Distributed storage
• Reliable
Data Processing via Pig
• Pipelines
• Iterative processing
• Research
Data Warehouse via Hive
• BI tools
• Analysis
Diagram labels: HDFS, Data Warehouse
![Page 33: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/33.jpg)
CHAPTER 7 HIGHLIGHTS: ANALYZE & REPORT
![Page 34: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/34.jpg)
Analyze using Excel
![Page 35: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/35.jpg)
Analyze using Excel
![Page 36: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/36.jpg)
CHAPTER 8: PROJECT PLANNING & ARCHITECTURAL CONSIDERATIONS
![Page 37: HdInsight essentials Hadoop on Microsoft Platform](https://reader030.fdocuments.us/reader030/viewer/2022013111/54820d74b4af9f640d8b4699/html5/thumbnails/37.jpg)
Executive & Stakeholder Buy-in
Discovery & Analysis
Design
Implementation
User Acceptance
Production Operations
Feedback, New Requirements