Transcript of: 2014 Feb 24 Big Data Congress, Hadoop Session 1 – Hadoop 101
HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX
Adam Muise – Solution Architect, Hortonworks
Who are we?
Who is Hortonworks?
We do Hadoop
The leaders of Hadoop’s development
Community driven, Enterprise Focused
Drive Innovation in the Platform – We lead the roadmap
100% Open Source – Democratized Access to Data
We do Hadoop successfully.
Support
Professional Services
Training
Enter the Hadoop.
http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
Hadoop was created because traditional technologies never cut it
for the Internet properties like Google, Yahoo, Facebook, Twitter,
and LinkedIn
Traditional architecture didn't scale enough…
[Diagram: three siloed application stacks, each with its own app servers, databases, and SAN]
Databases can become bloated and useless
Traditional architectures cost too much at that volume…
$/TB
$pecial Hardware
$upercomputing
So what is the answer?
If you could design a system that would handle this, what would it
look like?
It would probably need a highly resilient, self-healing, cost-efficient,
distributed file system…
[Diagram: a grid of storage nodes spread across the cluster]
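To make that concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API. It assumes a reachable HDFS (for example the Sandbox introduced later) and the Hadoop client libraries on the classpath; the file path is made up for illustration and is not from the talk.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsHelloWorld {
        public static void main(String[] args) throws Exception {
            // Reads fs.defaultFS from core-site.xml; assumes it points at a running HDFS.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical path, used only for this example.
            Path file = new Path("/user/sandbox/hello.txt");

            // HDFS splits files into blocks and replicates each block across
            // several DataNodes, which is where the resilience comes from.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("Hello, Hadoop");
            }

            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
        }
    }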
It would probably need a completely parallel processing framework that
took tasks to the data…
[Diagram: the same grid of storage nodes, now with processing co-located on every node]
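A minimal sketch of "taking tasks to the data": the canonical word-count job written against the Hadoop MapReduce API. The input and output paths are assumed to be passed on the command line; the class names are illustrative, not from the talk.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map tasks run on the nodes that already hold the data blocks,
        // emitting (word, 1) pairs.
        public static class TokenizerMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce tasks sum the counts for each word after the shuffle.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

YARN schedules these map and reduce tasks across the cluster, preferring nodes that already hold the relevant HDFS blocks, so the data largely stays put while the code moves.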
It would probably run on commodity hardware, virtualized machines, and
common OS platforms
[Diagram: the same storage-plus-processing grid, running on commodity hardware]
It would probably be open source so innovation could happen as quickly
as possible
It would need a critical mass of users
Apache Hadoop
[Diagram: ecosystem components – HDFS, YARN, MapReduce, Tez, Hive, Pig, HCatalog, HBase, Storm, Sqoop, Flume, Falcon, Knox, Ambari]
Hortonworks Data Platform
[Diagram: the same components – HDFS, YARN, MapReduce, Tez, Hive, Pig, HCatalog, HBase, Storm, Sqoop, Flume, Falcon, Knox, Ambari – packaged as HDP]
We are going to learn how to work with Hadoop in less than an hour.
To do this, we need to install Hadoop right?
Nope.
Enter the
Sandbox.
The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single
virtual node.
[Diagram: all of the cluster's storage and processing roles collapsed into a single Linux VM]
Getting started with the Sandbox VM:
- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the Sandbox VM and find the IP displayed
- Go to http://172.16.130.131
- Register
- Click on 'Start Tutorials'
- On the left-hand nav, click on 'HCatalog, Basic Pig & Hive Commands'
In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
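As a taste of the Hive step, here is a rough sketch that issues HiveQL from Java over JDBC rather than through the Sandbox web UI the tutorial uses. It assumes HiveServer2 is reachable on the Sandbox at its default port 10000; the host IP, credentials, table name, columns, and HDFS location are placeholders for illustration.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuerySketch {
        public static void main(String[] args) throws Exception {
            // Hive JDBC driver; requires hive-jdbc and its dependencies on the classpath.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Replace the host with the IP your Sandbox VM displays;
            // the user and password depend on your Sandbox configuration.
            String url = "jdbc:hive2://172.16.130.131:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "hive", "");
                 Statement stmt = conn.createStatement()) {

                // Hypothetical external table over CSV files already landed in HDFS.
                stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS batting ("
                        + " player_id STRING, year_id INT, runs INT)"
                        + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                        + " LOCATION '/user/sandbox/batting'");

                // Plain SQL; Hive compiles it into distributed jobs on the cluster.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT year_id, MAX(runs) AS max_runs FROM batting GROUP BY year_id")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt("year_id") + "\t" + rs.getInt("max_runs"));
                    }
                }
            }
        }
    }

The same CREATE TABLE and SELECT statements can be typed directly into the Sandbox's web UI, which is what the tutorial walks through.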
Try the other tutorials.
Hadoop is the new Modern Data Architecture for the Enterprise
There is NO second place
Hortonworks …the Bull Elephant of Hadoop Innovation