Iasi code camp 20 april 2013 mihai nadas hadoop azure
![Page 1: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/1.jpg)
Big Data on Azure
What do I need to know as a developer to make it worthwhile?
Mihai Nadăș
Chief Technology Officer, Yonder
Most Valuable Professional, Microsoft
![Page 2: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/2.jpg)
About myself
@mihainadas
blog.mihainadas.com
![Page 3: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/3.jpg)
Agenda
Why Big Data?
Understanding the Basics
Microsoft and Hadoop
![Page 4: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/4.jpg)
Two Big Data examples
1. Google Flu Trends
2. Farecast
![Page 5: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/5.jpg)
Why Big Data?
![Page 6: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/6.jpg)
![Page 7: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/7.jpg)
![Page 8: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/8.jpg)
![Page 9: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/9.jpg)
![Page 10: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/10.jpg)
Gartner’s Hype Cycle on Big Data
![Page 11: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/11.jpg)
Key Technologies
• Accessible storage (non-relational) in cloud: Amazon S3, Azure Blob & Table storage, Google Cloud Storage
• In-memory databases & grids: MemSQL, XAP (GigaSpaces), SAP HANA
• Parallel processing frameworks: Hadoop
• Online analytics frameworks: Google BigQuery, Hive
• Data stream processing: Twitter Storm
• Complex event processing: Oracle CEP Server, Microsoft StreamInsight
• Sentiment analysis: Radian6
![Page 12: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/12.jpg)
It’s BIG
![Page 13: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/13.jpg)
Example Scenario
![Page 14: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/14.jpg)
OPERATIONAL DATA
Traditional E-Commerce Data Flow
NEW USER REGISTRY
NEW PURCHASE
NEW PRODUCT
Excess Data
Logs
ETL Some Data
Data Warehouse
![Page 15: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/15.jpg)
OPERATIONAL DATA
New E-Commerce Big Data Flow
Raw Data: “Store it All” Cluster
NEW USER REGISTRY
NEW PURCHASE
NEW PRODUCT
Data Warehouse
Logs
Logs
How much do views for certain products increase when our TV ads run?
![Page 16: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/16.jpg)
Viktor Mayer-Schönberger, Professor at Oxford
Kenneth Cukier, Editor, The Economist
![Page 17: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/17.jpg)
Big Data Principles
1. More: store over trash
2. Messy: quantity over quality
3. Correlation: what over why
![Page 18: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/18.jpg)
Understanding the Basics
Move the Compute to the Data
![Page 19: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/19.jpg)
Characteristics of Big Data
![Page 20: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/20.jpg)
MapReduce
![Page 21: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/21.jpg)
Think of the following problem...
![Page 22: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/22.jpg)
What if we parallelize?
![Page 23: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/23.jpg)
What if we parallelize?
![Page 24: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/24.jpg)
Welcome, MapReduce
Map Reduce
![Page 25: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/25.jpg)
So How Does It Work?
![Page 26: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/26.jpg)
MapReduce – Workflow
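The workflow can be illustrated with a minimal in-memory sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of input splits."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two hypothetical input splits, made up for this sketch.
    print(word_count(["big data on azure", "big data with hadoop"]))
```

On a cluster, each call to `map_phase` would run on the node that holds its split, and only the grouped intermediate pairs would travel over the network to the reducers.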
![Page 27: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/27.jpg)
Hadoop
![Page 28: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/28.jpg)
The Hadoop Ecosystem
ETL Tools, BI Reporting, RDBMS
Reference: Tom White’s Hadoop: The Definitive Guide
![Page 29: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/29.jpg)
Traditional RDBMS vs. MapReduce
| | Traditional RDBMS | MapReduce |
|---|---|---|
| Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes) |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| DBA Ratio | 1:40 | 1:3000 |
Reference: Tom White’s Hadoop: The Definitive Guide
![Page 30: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/30.jpg)
Microsoft and Hadoop
![Page 31: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/31.jpg)
Microsoft Big Data Solution
Power View, Excel with PowerPivot, Embedded BI, Predictive Analytics
Apps, LOB, CRM, ERP
Microsoft EDW
SSAS, SSRS
Devices, Crawlers, Sensors, Bots
Hadoop On Windows Server
Hadoop On Windows Azure
![Page 32: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/32.jpg)
Deploying and Interacting With a Hadoop Cluster on Azure
Step-by-step walkthrough
![Page 33: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/33.jpg)
Objectives
1. Run a basic Java MapReduce program using a Hadoop jar file
2. Import data from the Windows Azure Marketplace into a Hadoop on Azure cluster using the Interactive Hive Console
![Page 34: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/34.jpg)
Prerequisites
1. Access to a Hadoop on Azure account
2. Request an invitation to the Preview Feature
![Page 35: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/35.jpg)
Creating a new HDInsight Cluster (I)
![Page 36: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/36.jpg)
Creating a new HDInsight Cluster (II)
![Page 37: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/37.jpg)
Cluster Management Interface
![Page 38: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/38.jpg)
Hadoop Sample Gallery
![Page 39: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/39.jpg)
Objective #1: Basic MapReduce Task
• We will use the Pi Estimator sample job
• Distributed Pi Estimator with 16 maps, each computing 10 million samples
![Page 40: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/40.jpg)
Pi Estimator
2r
r=1
• Uses the Monte Carlo Simulation method to compute π
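The idea behind the estimate can be written out as a short derivation. It assumes, based on the labels 2r and r = 1, that the figure shows a circle of radius r inscribed in a square of side 2r; the symbols N_inside and N_total are introduced here for illustration. Points are scattered uniformly over the square, and the fraction that lands inside the circle approximates the ratio of the two areas:

```latex
\[
\frac{A_{\text{circle}}}{A_{\text{square}}}
  = \frac{\pi r^{2}}{(2r)^{2}}
  = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{inside}}}{N_{\text{total}}}
\]
```

The radius cancels, so choosing r = 1 only simplifies the inside-the-circle test to x² + y² ≤ 1.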
![Page 41: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/41.jpg)
Pi Estimator
• Uses the Monte Carlo Simulation method to compute π
![Page 42: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/42.jpg)
Pi Estimator: Running the Job
![Page 43: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/43.jpg)
Pi Estimator: And the result is...
• 160,000,000 random points
• 16 mappers
• 10,000,000 samples / map
• Computed in 65.108 seconds
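The split into mappers and a single reducer can be sketched in plain Python. This is a single-process illustration under stated assumptions, not the bundled Hadoop sample: the names `map_task`, `reduce_task` and `estimate_pi` are hypothetical, points are drawn with Python's pseudo-random generator (the Hadoop job may generate its points differently), and the default sample count is scaled down so the sketch finishes quickly.

```python
import random


def map_task(seed, samples):
    """One mapper: count random points in the square [-1, 1] x [-1, 1] that fall inside the unit circle."""
    rng = random.Random(seed)  # separate seed per mapper so each draws its own points
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(partials):
    """Reducer: sum the per-mapper counts and turn the ratio into an estimate of pi."""
    total_inside = sum(inside for inside, _ in partials)
    total_samples = sum(samples for _, samples in partials)
    return 4.0 * total_inside / total_samples


def estimate_pi(num_maps=16, samples_per_map=10_000):
    """Run every map task in turn (a cluster would run them in parallel), then reduce."""
    partials = [map_task(seed, samples_per_map) for seed in range(num_maps)]
    return reduce_task(partials)


if __name__ == "__main__":
    print(estimate_pi())
```

Each mapper returns only two integers, so the work scales with the number of mappers while almost no data has to move to the reducer.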
![Page 44: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/44.jpg)
Objective #2: Import data from the Windows Azure Marketplace into a Hadoop on Azure cluster
• Windows Azure Marketplace is a cloud one-stop-shop for premium data and applications
• We will see how to explore the “2006 – 2008 Crime in the US” dataset on Hadoop using Hive
![Page 45: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/45.jpg)
Windows Azure Marketplace
![Page 46: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/46.jpg)
Apache Hive
• Data Warehouse infrastructure built on top of Hadoop
• Provides data summarization, query and analysis
• Initially developed by Facebook, now an Apache project
![Page 47: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/47.jpg)
Apache Hive: Features
• Analysis of large datasets stored in Hadoop-compatible file-systems
• Provides a SQL-like language called HiveQL while maintaining full support for map/reduce
• By default, stores its metadata in an embedded Apache Derby database
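To make the link between HiveQL and map/reduce concrete, the sketch below mimics how a simple aggregate query could be expressed as a map step and a reduce step. It is a simplified illustration, not Hive's actual query planner. The query in the comment, the table name `crime`, the columns `state` and `incidents`, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical HiveQL query that this sketch mimics (table and columns are made up):
#   SELECT state, SUM(incidents) FROM crime GROUP BY state;


def map_row(row):
    """Map: emit the GROUP BY column as the key and the aggregated column as the value."""
    return row["state"], row["incidents"]


def reduce_group(key, values):
    """Reduce: apply the aggregate function (SUM) to all values that share a key."""
    return key, sum(values)


def group_by_sum(rows):
    """Emulate the map, shuffle and reduce stages for the query above."""
    groups = defaultdict(list)
    for row in rows:
        key, value = map_row(row)
        groups[key].append(value)  # shuffle: collect values per key
    return dict(reduce_group(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from any real dataset.
    sample_rows = [
        {"state": "A", "incidents": 10},
        {"state": "B", "incidents": 5},
        {"state": "A", "incidents": 7},
    ]
    print(group_by_sum(sample_rows))
```

The GROUP BY column becomes the map output key and the aggregate function becomes the reducer, which is why queries of this shape fit the MapReduce model well.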
![Page 48: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/48.jpg)
Importing data to Hadoop on Azure
![Page 49: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/49.jpg)
Importing data to Hadoop on Azure
![Page 50: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/50.jpg)
Querying huge datasets using Hive
![Page 51: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/51.jpg)
Querying huge datasets using Hive
![Page 52: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/52.jpg)
Querying huge datasets using Hive
![Page 53: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/53.jpg)
Hadoop on Windows
Insights to all users by activating new types of data
Integrate with Microsoft Business Intelligence
Choice of deployment on Windows Server + Windows Azure
Integrate with Windows Components (AD, System Center)
Easy installation and configuration of Hadoop on Windows
Simplified programming with .NET & JavaScript integration
Integrate with SQL Server Data Warehousing
Differentiation
![Page 54: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/54.jpg)
Microsoft Big Data Roadmap
To accelerate the delivery of Microsoft’s Hadoop-based solution for Windows Server and service for Windows Azure, Microsoft is announcing a partnership with Hortonworks
Microsoft is committed to broadening accessibility and usage of Hadoop to end users, developers and IT professionals in organizations of all sizes
Microsoft is announcing an end-to-end roadmap for Big Data that embraces Apache Hadoop™ by distributing enterprise-class Hadoop-based solutions on both Windows Server and Windows Azure
Microsoft is extending its leadership in business intelligence and data warehousing to provide insights to all users by activating new types of data of any size
![Page 55: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/55.jpg)
Things to do
1. Get a trial of Windows Azure
2. Subscribe to the Preview Program of Hadoop on Azure
3. Write your first Map/Reduce job
4. Give a talk in autumn at CodeCamp on your experience with Big Data
![Page 56: Iasi code camp 20 april 2013 mihai nadas hadoop azure](https://reader036.fdocuments.us/reader036/viewer/2022062707/5585a773d8b42a6c1a8b4bdc/html5/thumbnails/56.jpg)
Thank you