The Inside Scoop on Hadoop
Orion Gebremedhin
National Solutions Director – BI & Big Data, Neudesic LLC.
VTSP – Microsoft Corp.
@OrionGM
© Copyright 2015, Neudesic. All rights reserved.
Topics Covered
• Understanding Hadoop
• Big Data Solution Deployment Models
• Architecting the Modern Data Warehouse
• Summary
Understanding Hadoop
Big Data = Hadoop?
* A Modern Data Architecture with Apache Hadoop, Hortonworks Inc. 2014
The Fundamentals of Hadoop
Hadoop evolved directly from the commodity scientific supercomputing clusters developed in the 1990s. It consists of two core components:
• MapReduce, a parallel execution framework
• Hadoop Distributed File System (HDFS)
Latest Developments
HDFS
• Very high fault tolerance
• Files cannot be updated in place, but corrections can be appended
• File blocks are replicated multiple times
Three types of nodes:
• Name Node (directory)
• Backup Node (checkpoint)
• Data Node (actual data)
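To make the block replication idea concrete, here is a minimal Python sketch, not HDFS code. The function names, the tiny block size, the node names, and the round-robin placement are all assumptions made for illustration; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, data_nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Build a toy name-node directory: block index -> data nodes holding a copy.

    Placement is simple round-robin (an assumption, not the real HDFS policy).
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor cannot exceed the number of data nodes")
    return {
        block_id: [data_nodes[(block_id + r) % len(data_nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(directory: dict[int, list[str]], failed_node: str) -> bool:
    """True if every block still has at least one copy on a surviving node."""
    return all(any(node != failed_node for node in nodes) for nodes in directory.values())


# Hypothetical 1000-byte file, 256-byte blocks, four data nodes, three copies per block.
blocks = split_into_blocks(b"x" * 1000, block_size=256)
directory = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3)
print(directory)
print(readable_after_failure(directory, "dn2"))
```

Because every block lives on three different nodes in this sketch, losing any single data node still leaves every block readable, which is the fault-tolerance property the slide refers to.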
MapReduce
• A programming framework comprising a library and a runtime, much like .NET
• Map function: takes a task and breaks it down into small tasks
• Reduce function: combines the partial answers into the combined list
• Master (Job Tracker): where you submit a query. It manages the Task Trackers, which do the actual Map or Reduce tasks.
• Workers (Task Trackers): do the work. Just as each node in the cluster has a Data Node, it also has a Task Tracker.
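The Map and Reduce roles described above can be sketched in plain Python with the classic word-count example. This is a single-process illustration only, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample input are assumptions.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: break one chunk of input into small (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the intermediate pairs by key so each key reaches one reducer."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the partial answers for one key."""
    return key, sum(values)


def run_job(chunks: list[str]) -> dict[str, int]:
    """Stand-in for the master: map every chunk, shuffle, then reduce."""
    mapped: list[tuple[str, int]] = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different worker
        mapped.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = run_job(["big data big cluster", "data node data"])
print(counts)
```

On a real cluster the map calls would run on different workers close to the data; here they run in a loop so the shape of the computation is easy to follow.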
Basics of MapReduce
One counter: 400 bills at 1 bill/sec = 400 seconds
Basics of MapReduce
Two counters working in parallel, 200 bills each at 1 bill/sec = 200 seconds each
Total = 200 seconds
Basics of MapReduce
Four counters working in parallel, 100 bills each at 1 bill/sec = 100 seconds each
Total = 100 seconds
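The bill-counting arithmetic can be written as a one-line formula: the job takes as long as the busiest counter. The sketch below is an idealized model that assumes an even split and ignores the time needed to hand out the bills and add up the partial totals; the function name `counting_time` is hypothetical.

```python
import math


def counting_time(total_bills: int, workers: int, bills_per_second: float = 1.0) -> float:
    """Seconds needed when the bills are split evenly and counted in parallel."""
    largest_share = math.ceil(total_bills / workers)  # the busiest worker sets the pace
    return largest_share / bills_per_second


for workers in (1, 2, 4):
    print(workers, "worker(s):", counting_time(400, workers), "seconds")
```

With 400 bills at 1 bill/sec this reproduces the 400, 200, and 100 second figures from the slides.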
Basics of MapReduce
[Figure: Query → Name Node/Job Tracker → Data Nodes/Task Trackers → Result]
HDFS and MapReduce
The main node runs the Job Tracker and the Name Node, which controls the files. Each node runs two processes: a Task Tracker and a Data Node.
Hive and Pig
MapReduce
• Java
• Requires writing many lines of code
Pig
• Mostly used by Yahoo
• Heavily used for data processing
• Shares some constructs with SQL, e.g. filtering, selecting, grouping, and ordering, but its syntax is very different from SQL
• Is more verbose
• Needs a lot of training for users with limited procedural programming background
• Gives you more control over the flow of data
Hive
• Mostly used by Facebook for analytic purposes
• Used for analytics
• Relatively easier for developers with SQL experience
• Less control over optimization of data flows compared to Pig
• Not as efficient as MapReduce
• Higher productivity for data scientists and developers
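As a rough analogy for the contrast above (this is neither Pig Latin nor HiveQL), the same question can be answered as an explicit step-by-step dataflow or as one declarative SQL statement. The sketch uses Python's standard `sqlite3` module as a stand-in SQL engine; the table name `hits`, its columns, and the sample rows are all made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical web-log rows: (visitor, page, HTTP status)
rows = [
    ("alice", "/home", 200),
    ("bob", "/cart", 500),
    ("alice", "/cart", 200),
    ("carol", "/home", 200),
    ("bob", "/home", 200),
]

# Dataflow style (Pig-like): every step is written out and can be tuned separately.
ok = [r for r in rows if r[2] == 200]            # filter
by_visitor: dict[str, int] = defaultdict(int)
for visitor, _page, _status in ok:               # group and count
    by_visitor[visitor] += 1
stepwise = sorted(by_visitor.items(), key=lambda kv: (-kv[1], kv[0]))  # order

# Declarative style (Hive-like): say what you want and let the engine plan it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (visitor TEXT, page TEXT, status INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)
declarative = conn.execute(
    "SELECT visitor, COUNT(*) AS n FROM hits "
    "WHERE status = 200 GROUP BY visitor ORDER BY n DESC, visitor ASC"
).fetchall()
conn.close()

print(stepwise)
print(declarative)
```

Both versions filter, group, and order; the first exposes each intermediate step, while the second is shorter and more familiar to anyone with SQL experience, which mirrors the trade-off listed on the slide.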
HDFS
* A Modern Data Architecture with Apache Hadoop, Hortonworks Inc. 2014
Big Data Solution Deployment Models
Major Players in Big Data
• Hortonworks
• Cloudera
• MapR
• Pentaho
• Amazon (AWS)
• …
Apache Foundation
Hortonworks
• Formed in June 2011 as an independent company, funded with $23 million from Yahoo! and Benchmark Capital
• Named after Horton the Elephant from Horton Hears a Who!
• Employs contributors to the Apache Hadoop project
• Partnered with Microsoft in October 2011: Azure and Windows Server
• Cloudera was founded in October 2008 … and started the effort to be Microsoft Azure Certified in October 2014.
HDP User Interface
Deployment Models
• On-premises deployment
  • Microsoft Analytics Platform System (APS)
  • Oracle Big Data Appliance
  • Hortonworks Data Platform (HDP)
  • Cloudera's CDH
  • Pivotal Data Computing Appliance (DCA)
• Big Data as a service
  • HDInsight
  • Cloudera on AWS
  • Amazon Redshift
  • Amazon Elastic MapReduce
HDInsight: Hadoop As A Cloud Service
HDFS
* A Modern Data Architecture with Apache Hadoop, Hortonworks Inc. 2014
HDInsight Versions
Architecting the Modern Data Warehouse
The ETL Automation Model
* A Modern Data Architecture with Apache Hadoop, Hortonworks Inc. 2014
The ETL Automation Model
BI-Integration Model
Hybrid BI-Integration Model
Diagram labels, grouped by layer:
Data Sources
• Mainframes and proprietary data sources
• Flat file sources
• SQL Server
• Oracle, DB2, etc.
Staging
• HDFS (Hadoop)
• Data Integration tool
• Enterprise Staging Area
Enterprise Data Models
• EDW (Enterprise Data Warehouse)
Reporting & Analytics
• Charts and Dashboards
• Reports
• Self-Service BI (Excel Pivot Reports, etc.)
Hybrid BI-Integration Model
Diagram labels:
• www.neuBikes.com (Source: weblogs)
• Databases: SQL Server, Oracle, etc.
• Third Party applications
• Flat Files
• Upload
• Container: ssugstaging
• Compute: HDInsight (Cluster: ssughadoop, Container: ssughadoop, Nodes: 1)
• Container: ssugedw
• Extract to BLOB
• Integration (SSIS, Data Model)
• On-Prem Data Warehouse (SQL, SSAS-Tabular, SSAS OLAP)
• EDW_neuBikes
• Power BI Data Source: EDW
• Power BI Data Source: Azure Blob Store
Hybrid BI-Integration Model
Summary
• Understand your data growth to determine when to “Scale-Out”.
• Determine the right tool for the workload you have.
• Choose the right deployment model for your Big Data solution.
• Hybridize, do not start from scratch!
Questions and Discussion
Questions?