2014 feb 24_big_datacongress_hadoopsession1_hadoop101

HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX

Adam Muise – Solution Architect, Hortonworks

Who are we?

Who is Hortonworks?

We do Hadoop

The leaders of Hadoop’s development

Community driven, Enterprise Focused

Drive Innovation in the platform – We lead the roadmap

100% Open Source – Democratized Access to Data

We do Hadoop successfully.

Support

Professional Services

Training

Enter the Hadoop.

http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/


Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn

Traditional architecture didn’t scale enough…

[Diagram: repeated silos of App servers and DB servers, each backed by its own SAN]

Databases can become bloated and useless

Traditional architectures cost too much at that volume…

$/TB

$pecial Hardware

$upercomputing

So what is the answer?

If you could design a system that would handle this, what would it look like?

It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…

[Diagram: a row of storage nodes]
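
To make "resilient" and "self-healing" concrete, here is a toy sketch in Python of the general idea: keep several copies of every block on different nodes, and re-copy blocks when a node dies. It is not HDFS code. The node names, block names, and the `place_blocks` and `heal` helpers are all hypothetical, and the replication factor of 3 is an assumption chosen to match the usual HDFS default.

```python
import random

REPLICATION = 3  # assumed replication factor (the usual HDFS default)

def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in blocks}

def heal(placement, live_nodes, replication=REPLICATION, seed=0):
    """Drop replicas held by dead nodes, then re-copy onto live nodes."""
    rng = random.Random(seed)
    healed = {}
    for block, holders in placement.items():
        alive = holders & set(live_nodes)
        if not alive:
            raise RuntimeError(f"block {block} lost all replicas")
        # Live nodes that do not yet hold this block can take a new copy.
        candidates = [n for n in live_nodes if n not in alive]
        missing = min(replication - len(alive), len(candidates))
        alive |= set(rng.sample(candidates, missing))
        healed[block] = alive
    return healed

# Hypothetical five-node cluster storing three blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)

# Simulate node2 failing, then restore full replication on the survivors.
survivors = [n for n in nodes if n != "node2"]
healed = heal(placement, survivors)
for block, holders in sorted(healed.items()):
    print(block, sorted(holders))
```

Because every block lives on more than one cheap machine, losing a node costs nothing but some background copying, which is what lets such a system run on inexpensive hardware.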



It would probably need a completely parallel processing framework that took tasks to the data…



[Diagram: storage nodes with processing running alongside the data]
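
The "take tasks to the data" idea can be sketched as a tiny word count in plain Python, written in the map/shuffle/reduce style. This is only an illustration of the pattern, not the Hadoop MapReduce API: the `node_data` dictionary stands in for data already stored on three hypothetical nodes, and `map_task`, `shuffle`, and `reduce_task` are made-up helper names.

```python
from collections import defaultdict

# Hypothetical cluster: each "node" already stores one slice of the data.
node_data = {
    "node1": ["hadoop stores data", "hadoop processes data"],
    "node2": ["data lives on many nodes"],
    "node3": ["tasks move to the data"],
}

def map_task(lines):
    """Runs next to the data: emit (word, 1) for every word on this node."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(mapped_outputs):
    """Group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_task(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

# One map task per node; only the small (word, 1) pairs leave the node.
mapped = [map_task(lines) for lines in node_data.values()]
counts = dict(reduce_task(k, v) for k, v in shuffle(mapped).items())
print(counts["data"])  # prints 4
```

The design point is that each map task only ever reads the lines stored on its own node, so adding nodes adds both storage and processing capacity at the same time.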


It would probably run on commodity hardware, virtualized machines, and common OS platforms






It would probably be open source so innovation could happen as quickly as possible

It would need a critical mass of users

Apache Hadoop

Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN, Knox, Tez

Hortonworks Data Platform

Flume, Ambari, HBase, Falcon, MapReduce, HDFS, Sqoop, HCatalog, Pig, Hive, Storm, YARN, Knox, Tez

We are going to learn how to work with Hadoop in less than an hour.

To do this, we need to install Hadoop, right?

Enter the Sandbox.

The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single virtual node.






[Diagram: Processing and Storage running together inside a single Linux VM]

Getting started with Sandbox VM:
- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the sandbox VM
- Find the IP displayed
- Go to that address in a browser (in this session it was http://172.16.130.131)
- Register
- Click on ‘Start Tutorials’
- On the left-hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’

In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
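
The Sandbox tutorial walks through these steps in a web interface. For readers who prefer a terminal, the sketch below shows roughly what the first and third steps look like from the command line. It is a dry run in Python that only builds and prints the commands; nothing is executed. The file name, HDFS directory, and table name are hypothetical, the `tutorial_commands` helper is made up for illustration, and the HCatalog and Pig steps are left out.

```python
import shlex

def tutorial_commands(local_file, hdfs_dir, table):
    """Return the argument lists for each step; nothing is run here."""
    return [
        # Land a file in HDFS: create the target directory, then upload.
        ["hadoop", "fs", "-mkdir", "-p", hdfs_dir],
        ["hadoop", "fs", "-put", local_file, hdfs_dir],
        # Use SQL with Hive: run one query string and exit.
        ["hive", "-e", f"SELECT * FROM {table} LIMIT 10;"],
    ]

# Hypothetical inputs, for illustration only.
for argv in tutorial_commands("data.csv", "/user/hue/demo", "demo_table"):
    print(shlex.join(argv))
```

On a machine with the Hadoop and Hive clients installed, each printed line could be pasted into a shell, or each argument list passed to `subprocess.run`, assuming the table already exists.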

Try the other tutorials.

Hadoop is the new Modern Data Architecture for the Enterprise

© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION

There is NO second place

Hortonworks …the Bull Elephant of Hadoop Innovation
