Hadoop on azure_july_2012

Hadoop on Azure BigData on the Azure platform @LynnLangit

description

60 minute webcast for DevelopMentor - Hadoop on Azure

Transcript of Hadoop on azure_july_2012

Page 1: Hadoop on azure_july_2012

Hadoop on Azure
BigData on the Azure platform
@LynnLangit

Page 3: Hadoop on azure_july_2012

Hadoop = BigData?

• HUGE Hype factor in 2011 / 2012

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers

Page 4: Hadoop on azure_july_2012

Oracle Loader for Hadoop

SQL Server Connector for Hadoop

Page 5: Hadoop on azure_july_2012

Flavors of NoSQL

Page 6: Hadoop on azure_july_2012

Column Database

Wide, sparse column sets

Page 7: Hadoop on azure_july_2012

RDBMS vs. Hadoop

                      Traditional RDBMS         Hadoop
Data Size             Gigabytes (Terabytes)     Petabytes (Exabytes)
Access                Interactive and Batch     Batch – NOT Interactive
Updates               Read / Write many times   Write once, Read many times
Structure             Static Schema             Dynamic Schema
Integrity             High (ACID)               Low
Scaling               Nonlinear                 Linear
Query Response Time   Can be near immediate     Has latency (due to batch processing)
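
The "Dynamic Schema" entry for Hadoop is commonly read as schema-on-read: raw data is stored as-is and a structure is applied only when it is queried. A minimal Python sketch of that idea follows. The sample records, the field names, and the helper `read_with_schema` are all hypothetical and are not taken from the webcast.

```python
import json

# Hypothetical raw records, stored as-is; nothing is validated at write time.
RAW_EVENTS = [
    '{"user": "ann", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',                              # no "ms" field
    '{"user": "ann", "action": "click", "ms": 80, "page": "/home"}',  # extra field
]

def read_with_schema(raw_lines, fields):
    """Apply a schema at read time: keep the requested fields, default missing ones to None."""
    for line in raw_lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}

# The "schema" is chosen by the reader, not fixed when the data was written.
rows = list(read_with_schema(RAW_EVENTS, ["user", "ms"]))
```

A different query could pass a different field list over the same stored lines, which is the contrast with a static RDBMS schema that the comparison points at.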

Page 8: Hadoop on azure_july_2012

What about the cloud?

Page 9: Hadoop on azure_july_2012

The reality…two pivots

Storage Methods
• SQL (RDBMS)
• Hadoop

Storage Locations
• On premises
• Cloud-hosted

Page 10: Hadoop on azure_july_2012

Demo - Setting up Your Cluster

Page 11: Hadoop on azure_july_2012

Cluster Allocation Process

Page 12: Hadoop on azure_july_2012

Working with Hadoop on Azure

Tools / Languages
• MapReduce
  – Map (query/format)
  – Reduce (aggregate)
  – plug-in for Eclipse (Java)
  – JavaScript
  – C# Streaming
• Pig (ETL -- Java)
• Hive (HQL Query)
• HBase tables
• Others
  – Mahout (analyze)
  – R (analyze)
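
The Map (query/format) and Reduce (aggregate) roles listed above can be sketched in a few lines of plain Python. This is an in-memory illustration only, not Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for the example, and the `shuffle` step stands in for the grouping that the framework performs between the two phases.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data on Azure", "Big clusters big jobs"]))
```

On a real cluster the map calls run in parallel on the nodes holding the data, and only the grouped intermediate pairs travel to the reducers.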

Page 13: Hadoop on azure_july_2012

Tasks – DBA vs. Hadoop on Azure

RDBMS                           Hadoop on Azure
Import Data                     Upload Data using FTP or import via Sqoop
Setup Security                  Setup Security
Scale Compute (up or out)       Add child nodes to the cluster
Perform a Backup                Monitor and replace failed nodes
Restore a Database              n/a
Clean up data via ETL           Execute a PIG job
Create an Index – query tune    Write a HIVE query (HQL)
Join Tables Together            Run MapReduce
n/a                             Monitor and manage running MapReduce jobs
Schedule a Job                  Schedule a (Cron) Job
Run Database Maintenance        Monitor space and resources used
Send an Email from SQL Server   Set up resource threshold alerts
Manage License costs            Manage usage time charges

Page 14: Hadoop on azure_july_2012

Demo - Basic Administration

Open Ports

Page 15: Hadoop on azure_july_2012

Demo - Basic Administration

Connect via RDP

Page 16: Hadoop on azure_july_2012

NameNode Utility – Top Level

Page 17: Hadoop on azure_july_2012

NameNode Utility – Drill Down

Page 18: Hadoop on azure_july_2012

Demo - Basic Administration

Configure connections to remote storage

Page 19: Hadoop on azure_july_2012

Configuring Upload from AWS S3

Page 20: Hadoop on azure_july_2012

Configuring Upload from Azure

Page 21: Hadoop on azure_july_2012

Using the Azure Storage Viewer

Page 22: Hadoop on azure_july_2012

Configuring Upload from DataMarket

Page 23: Hadoop on azure_july_2012

Asking Questions = MapReduce

Page 24: Hadoop on azure_july_2012

Samples

Page 25: Hadoop on azure_july_2012

Demo - MapReduce using Java

• WordCount example using AWS S3 data

Page 26: Hadoop on azure_july_2012

Demo - MapReduce using C# Streaming

• WordCount example
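
Streaming jobs exchange plain text: the mapper reads input lines and writes tab-separated key/value records, the framework sorts those records by key, and the reducer reads the sorted stream. The sketch below shows that contract in Python rather than C#. It is not the demo code from the webcast, and the names `mapper` and `reducer` are illustrative.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(sorted_records):
    """Sum counts per word; the input must already be sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

if __name__ == "__main__":
    # Local stand-in for the cluster: map, sort (the shuffle), then reduce.
    sample = ["Hadoop on Azure", "Hadoop streaming on Windows"]
    for record in reducer(sorted(mapper(sample))):
        print(record)
```

In an actual streaming job the two functions would live in separate executables reading standard input and writing standard output, with the cluster performing the sort between them.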

Page 27: Hadoop on azure_july_2012

Demo - MapReduce using JavaScript

• WordCount example

Page 28: Hadoop on azure_july_2012

Demo - Using HIVE

• WordCount example
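
A Hive word count is usually written as a split, an explode, and a GROUP BY. The sketch below keeps one plausible form of that query as a string and mimics its logic locally in Python. The table name `docs`, the column name `line`, and the helper `hive_like_word_count` are assumptions for illustration; the query actually used in the demo is not shown in the slides.

```python
from collections import Counter

# Hypothetical HiveQL for a table `docs` with a single string column `line`.
# It is shown for reference only and is not executed here.
HQL_WORD_COUNT = """
SELECT word, COUNT(*) AS cnt
FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w
GROUP BY word
"""

def hive_like_word_count(rows):
    """Mimic the query locally: split each line, explode into words, group and count."""
    exploded = [word for line in rows for word in line.split(" ")]
    return Counter(exploded)

if __name__ == "__main__":
    print(hive_like_word_count(["a b", "b c"]))
```

Hive compiles a query of this shape into MapReduce jobs, so the result matches the hand-written WordCount examples without any Map or Reduce code.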

Page 29: Hadoop on azure_july_2012

Demo - Using HIVE

Page 30: Hadoop on azure_july_2012

Monitoring Job Results
• In the portal
  – Main Console
    • Job icon (button) status summary
    • Job History
  – Interactive Console
    • JS quick feedback
    • JS detailed feedback (log)
• Using RDP
  – Map/Reduce tool

Page 31: Hadoop on azure_july_2012

Demo – Monitoring Job Status

Page 32: Hadoop on azure_july_2012

Download – ODBC for HIVE

• Includes add-in for Excel

Page 33: Hadoop on azure_july_2012

Demo - Hadoop Connector to Excel

Page 34: Hadoop on azure_july_2012

Connecting to PowerPivot

• Create an ODBC connection to HIVE
• Connect to ‘other data source’ in PowerPivot

Page 35: Hadoop on azure_july_2012

Real-World – Hadoop and…

Facebook runs on Hadoop & MySQL

Twitter runs on Hadoop (ran on FlockDB/graph)

Yahoo runs on Hadoop

LinkedIn runs on Hadoop & Voldemort

Klout runs Hadoop (on Azure) & HBase (Hive) & SQL Server SSAS BISM cubes

Page 36: Hadoop on azure_july_2012

Hadoop To-Do List

BigData = Hadoop
• Use Hadoop when business needs dictate

Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• dev, test, training environments

Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel

Page 38: Hadoop on azure_july_2012

The Changing Data Landscape

Hadoop

RDBMS

Other Services

Page 39: Hadoop on azure_july_2012

TeachingKidsProgramming.org

Do a Recipe
Teach a Kid (Ages 10++)
SmallBasic or Java
Free Courseware (recipes)

Page 40: Hadoop on azure_july_2012

Toward Data Craftsmanship…

Follow me @LynnLangit

RSS my blog www.LynnLangit.com

Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions