HDInsight Essentials Quick View
Uploaded by rajesh-nadipalli
Transcript of HDInsight Essentials Quick View
Goals of this Book
• Focus on Microsoft's new Hadoop distribution
• Serve as a quick reference
• Provide an overview of Hadoop
• Address both cloud and on-premise setup for HDInsight
• Highlight HDInsight differentiators
• Provide practical, real-world examples
Book Table of Contents
• Chapter 1: HDInsight in a Heartbeat
• Chapter 2: Deploying HDInsight on premise
• Chapter 3: HDInsight Azure cloud service
• Chapter 4: Administer your cluster
• Chapter 5: Ingest data to your cluster
• Chapter 6: Transform data in your cluster
• Chapter 7: Analyze & Report data from cluster
• Chapter 8: Project Planning & Architectural Considerations
Hadoop Overview
• Self-healing distributed storage
• Fault-tolerant distributed computing
• Abstraction for parallel processing
Core Hadoop Components
• HDFS: distributed storage that is replicated, self-healing, and scalable
• MapReduce: parallel processing that works on local data for efficiency
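The MapReduce model named above can be illustrated with a minimal, single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are hypothetical, and each string in `blocks` only stands in for a block of data that a cluster would process on the node where it is stored.

```python
from collections import defaultdict


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key."""
    return key, sum(values)


def word_count(blocks):
    """Run map, shuffle, and reduce over a list of text blocks."""
    mapped = []
    for block in blocks:
        # On a real cluster each block would be mapped in parallel,
        # on a node that holds a local copy of it.
        mapped.extend(map_phase(block))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    sample_blocks = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample_blocks))
```

The sketch runs the phases sequentially; the point is only the shape of the computation, in which the map and reduce steps are independent per block and per key and can therefore be distributed.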
Hadoop Nodes Layout
[Figure: the master node runs the NameNode (distributed file system layer) and the JobTracker (MapReduce layer); a Secondary NameNode is also shown. Each slave node runs a TaskTracker (MapReduce layer) and a DataNode (distributed file system layer).]
Hadoop Ecosystem
[Figure: components of the Hadoop ecosystem]
• Data sources: RDBMS databases; audio and images; log files; sensors and RFID; social media and feeds
• Data movement: Flume, Sqoop
• Hadoop data store: HDFS, HBase (NoSQL DB)
• Data processing: MapReduce
• Data access: Hive, Pig, Mahout (machine learning)
• Consumers: Excel, business data feeds
• ZooKeeper: distributed process management
• HCatalog: metadata on Pig, Hive, MapReduce
• Oozie: workflow, scheduler
• Infrastructure and operations: monitoring, configuration
End to End Solution on Hadoop
• Collect & import to HDFS
• Process (MapReduce)
• Analyze (BI tools)
• Report & publish
Popular Hadoop Distributions
• Amazon Elastic MapReduce (cloud, http://aws.amazon.com/elasticmapreduce/)
• Cloudera (http://www.cloudera.com/content/cloudera/en/home.html)
• EMC Pivotal HD (http://gopivotal.com/)
• Hortonworks HDP (http://hortonworks.com/)
• MapR (http://mapr.com/)
• Microsoft HDInsight (cloud, http://www.windowsazure.com/)
HDInsight Differentiators
• Enterprise-ready Hadoop backed by Microsoft
• Analytics using Excel
• Integration with Active Directory
• Integration with .NET and JavaScript
• Connectors to RDBMS
• Scale using the cloud offering: the Azure HDInsight service lets customers scale quickly and provides a seamless interface between HDFS and Azure Storage Vault
• JavaScript console
HDInsight Distribution
[Figure: HDInsight is layered on two other distributions]
• Apache Hadoop: open source software, community development
• Hortonworks Data Platform (HDP): enterprise Hadoop platform, leaders in Hadoop, code committers to Hadoop
• Microsoft HDInsight: built on top of HDP; integration with ASV, Excel, Power View, SQL Server, Active Directory
Multi Node Install Steps
• Prerequisites
• Networking setup
• Remote scripting
• Firewall setup
• Software install (each node)
• Hadoop configuration
• Verification
Loading Data into your Cluster
You have the following options:
• Loading data using Hadoop commands
• Loading data using Azure Storage Vault
• Loading data using interactive JavaScript
• Shipping data to your cluster
• Loading data from an RDBMS via Sqoop
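Two of the options above, Hadoop commands and Sqoop, are driven from the command line. The sketch below only builds the argument lists for `hadoop fs -put` and `sqoop import`; it does not run them. The helper names, file paths, JDBC URL, table name, and user name are all hypothetical placeholders, and the JDBC URL format in particular is an assumption that depends on your database driver.

```python
import shlex


def hdfs_put_command(local_path, hdfs_dir):
    """Build a `hadoop fs -put` command that copies a local file into HDFS."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def sqoop_import_command(jdbc_url, table, target_dir, username=None):
    """Build a `sqoop import` command that copies one RDBMS table into HDFS."""
    command = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]
    if username:
        command += ["--username", username]
    return command


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    commands = [
        hdfs_put_command("weblogs.txt", "/data/raw"),
        sqoop_import_command(
            "jdbc:sqlserver://dbhost:1433;databaseName=sales",
            "orders",
            "/data/orders",
            username="loader",
        ),
    ]
    for command in commands:
        # shlex.join (Python 3.8+) quotes arguments for safe display in a shell.
        print(shlex.join(command))
```

On a machine where the Hadoop and Sqoop clients are installed, each list could be passed to `subprocess.run(command, check=True)`. Building the list first keeps the arguments inspectable before anything touches the cluster.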
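Hive
[Figure: Hive architecture. Client interfaces: command line, web GUI, and JDBC/ODBC with the Thrift server. Core components: driver (parser, planner, executor) and metastore. Queries execute as MapReduce jobs over data in HDFS.]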
Hive or Pig?
• Raw data in HDFS: distributed storage, reliable
• Data processing via Pig: pipelines, iterative processing, research
• Data warehouse (HDFS)
• Data warehouse via Hive: BI tools, analysis
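One way to read this contrast: Pig suits step-by-step data pipelines, while Hive suits declarative, SQL-style analysis over a warehouse. The sketch below shows the same aggregation in both styles. It uses neither Pig nor Hive: plain Python stands in for a Pig-like data flow, the standard-library `sqlite3` module stands in for a Hive-like query, and the `ROWS` sample data and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (channel, amount) records.
ROWS = [("web", 120), ("web", 80), ("mobile", 50), ("mobile", 0)]


def pipeline_style(rows):
    """Pig-like data flow: filter, group, and aggregate as explicit steps."""
    kept = [row for row in rows if row[1] > 0]           # filter step
    grouped = defaultdict(list)
    for channel, amount in kept:                          # group step
        grouped[channel].append(amount)
    return {channel: sum(amounts) for channel, amounts in grouped.items()}


def query_style(rows):
    """Hive-like declarative query; sqlite3 is only a local stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (channel TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    result = dict(conn.execute(
        "SELECT channel, SUM(amount) FROM sales "
        "WHERE amount > 0 GROUP BY channel"
    ))
    conn.close()
    return result


if __name__ == "__main__":
    print(pipeline_style(ROWS))
    print(query_style(ROWS))
```

Both functions return the same totals. The pipeline version exposes each intermediate step, which helps when exploring or iterating on raw data; the query version states only the desired result, which is the form BI tools typically generate.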