Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data...

27
Getting Started & Successful with Big Data © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 @Pentaho #BigDataWebSeries

Transcript of Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data...

Page 1: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Getting Started & Successful with Big Data

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

@Pentaho #BigDataWebSeries

Page 2: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Your Hosts Today

Paul Brook Cloud EMEA Program Manager

Dell

2 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Davy Nys VP EMEA & APAC

Pentaho

Chuck Yarbrough Technical Solutions Marketing Pentaho

Page 3: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Pentaho Webinar Series

3 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Sign-up at: pentaho.com

Page 4: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Goals for Today

4 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

To Understand:

•  How to get a Hadoop cluster up and running

•  Where Hadoop and other pieces fit into the architecture

•  How you can easily get data in & out Hadoop

•  How to leverage Hadoop with Pentaho

•  Initial Best Practices

Page 5: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Complete Analytics and Visual Data Management

Hadoop NoSQL Databases

Data Discovery &

Visualization

Enterprise &

Ad Hoc Reporting

Predictive Analytics &

Machine Learning

Data Ingestion, Manipulation &

Integration

Analytic Databases

© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 5

Page 6: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Data Warehouse Optimization

Data Sources Big Data Architecture

Data Warehouse (Master & Transactional Data)

ERP

CRM

CDR

Analytic Data Mart(s)

Analytic Data Mart(s)

Analytic Data Mart(s)

Logs Logs

Other Data

Raw Data

Parsed Data

Analytic Datasets

Master Data

Tape Archive

Page 7: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Steps To Start with Hadoop

7 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Hadoop Installation

•  Install locally – as Pseudo-Distributed mode

•  Leverage tools like Dell Crowbar •  Cloud sandbox

•  Easy download & installation •  Start on desktop

Pentaho Installation

1

2 •  Extract or access data from source

systems •  Load it (in its raw form) into Hadoop •  Tokenize & parse as required •  Transform & enrich •  Load into destination

Start Loading 3

Page 8: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Data Architecture and Integration Challenges

Data Sources Big Data Architecture

Data Warehouse (Master & Transactional Data)

ERP

CRM

CDR

Analytic Data Mart(s)

Analytic Data Mart(s)

Analytic Data Mart(s)

Logs Logs

Other Data

Raw Data

Parsed Data

Analytic Datasets

Master Data

NOSQL

Page 9: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Data Architecture and Integration Challenges

Data Sources Big Data Architecture

Data Warehouse (Master & Transactional Data)

ERP

CRM

CDR

Analytic Data Mart(s)

Analytic Data Mart(s)

Analytic Data Mart(s)

Logs Logs

Other Data

Raw Data

Parsed Data

Analytic Datasets

Master Data

NOSQL

Extract Transform

Load

Orchestration & Integration

MR

Page 10: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Data Architecture and Integration Challenges

Data Sources Big Data Architecture

Data Warehouse (Master & Transactional Data)

ERP

CRM

CDR

Analytic Data Mart(s)

Analytic Data Mart(s)

Analytic Data Mart(s)

Logs Logs

Other Data

Raw Data

Parsed Data

Analytic Datasets

Master Data

NOSQL

Extract Transform

Load

Orchestration & Integration

MR

Page 11: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Solution Architecture & Demo

11 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Fast and easy way to deploy Hadoop clusters with Dell

1

Page 12: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Fast and easy way to deploy Hadoop clusters with Dell

Page 13: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Well we are ready, but how will the Hardware Team know how to size and design the Hadoop cluster……..?

I don’t know….and it may take a long time to build the Hadoop cluster

Time is a critical factor, we need to get this project moving

Page 14: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Reduce time to Cluster Sizing, Design & Deployment

Faster time to productive operations

Optimize and adapt for your needs

Deliver the best return on investment

Reduce risk & increase flexibility with Dell

Dell.com/Crowbar

Page 15: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Dell | Hadoop Solution “Dell … was one of the first of the hardware vendors to grasp the fact that cloud is about provisioning services, not about the hardware.”

Maxwell Cooter, Cloud Pro

Excels at supporting complex big data analyses across large collections of structured and unstructured data

•  Hadoop handles a variety of workloads, including search, log processing, data warehousing, recommendation systems and video/image analysis

•  Work on the most modern scale-out architectures using a clean-sheet design data framework

•  Without vendor lock-in

Apache Hadoop software

Crowbar software framework with a Hadoop barclamp

PowerEdge C8000 Series, C6220, R720, R720XD

Force10 or PowerConnect switches

Reference Architecture

Deployment Guide

Joint Service and Support

Proven solutions

Proven components

Partner Ecosystem

Page 16: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Crowbar •  Accelerates multi-

node deployments •  Simplifies

maintenance •  Streamlines

ongoing updates

Built with DevOps •  Provides an operational model for managing big data clusters

and cloud

Field-proven technologies •  Build on locally deployed Chef Server •  Raw servers to full cluster in <2 hours •  Hardened with more than a year of deployments Apache 2 open source •  Multi-apps (Hadoop & OpenStack) •  Multi-OS (Ubuntu, RHEL, CentOS, SUSE) NOT limited to Dell hardware

Crowbar Software Framework A modular, open source framework

Page 17: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Deploy a Hadoop cluster in ~2 hours

Reduce software licensing fees

100%

1. Dell case study: 7-Eleven; 2, Salesforce.com

Use Crowbar to: •  Automate the deployment and

configuration of a Hadoop cluster •  Quickly provision bare-metal servers

from box to cluster with minimal intervention

•  Maintain, upgrade and evolve your Hadoop cluster over time

•  Leverage an open source framework backed by a growing global developer ecosystem

Reduce development time

4-6 mo.

Crowbar software

framework

Evolve to meet your needs over time with built in DevOps

Page 18: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Crowbar dashboard provides visibility

Page 19: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Leverage developer expertise worldwide

Download the open source software: https://github.com/dellcloudedge/crowbar

Participate in an active community http://lists.us.dell.com/mailman/listinfo/crowbar

Get resources on the Wiki: https://github.com/dellcloudedge/crowbar/wiki

Visit Dell.com/Crowbar, [email protected]

Page 20: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Global Marketing

Dell.com/Crowbar

Page 21: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

pentaho.com/download •  Install on a local desktop – no need for a

cluster •  “Managed Code” no additional installations •  Pentaho will write to the Hadoop Distributed

Cache for execution

Pentaho Installation

21 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Scheduling Integration Manipulation Orchestration

2

Page 22: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Start Loading

Loading into HDFS & HIVE –  Hadoop Copy Files –  Specify source files / destination

22 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

3

Loading into HBASE –  Zookeeper host & port –  Specify HBASE Mapping

Page 23: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Solution Architecture & Demo

23 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Demo

Page 24: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Maximize Performance

24 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

As much as 15x faster than hand-written code.

Parallel execution as MapReduce in the Hadoop cluster.

Page 25: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Additional Best Practices

25 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Leverage Hadoop

•  Don’t do database lookups inside a Mapper/Reducer – bring the data set into HDFS

•  Don’t transfer data between two clustering technologies – network overload

•  Start with a small data set and validate logic & performance outside the cluster

•  Gradually increase volumes and fine tune the application, cluster, data stores & network

Don’t Boil the Ocean

•  Leverage the various technologies available •  A combination of easy to use tools, powerful

scripting and custom coding provides the best mix

It’s AND…AND

Page 26: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

Solution Architecture & Demo

26 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Q & A

Page 27: Getting Started & Successful with Big Data - Pentahoevents.pentaho.com/rs/pentaho/images/Big Data Webinar 2... · 2020-02-25 · Global Marketing Dell | Hadoop Solution “Dell …

27 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555

Contact Us or Sign-up at: pentaho.com