Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop

Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload Reference Architecture (Dell, Cloudera, Syncsort, Intel)

Transcript of Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop

Page 1: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload Reference Architecture

Dell | Cloudera | Syncsort | Intel

Page 2: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Panel moderator: Armando Acosta, Dell

Armando Acosta
• Subject Matter Expert for Dell Big Data Solutions
• Product Manager for the Dell Hadoop Solutions
• Works with customers to transform IT into better business outcomes
• Seventeen years in technology

Page 3: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Panel introductions

• Sean Anderson, Cloudera
• Brandon Draeger, Intel
• Mark Muncy, Syncsort

Page 4: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Organizations actively using data grow 50% faster

The number of organizations that understand the benefits of big data grew slightly: 39% (2014) to 42% (2015).

Page 5: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

There are challenges that must be addressed

• Older technology can’t keep up: The ability to scale to support all data and unpredictable workloads means effective data management and data integration are key priorities
• Data silos hinder decision-making: Need to analyze all data, regardless of type or where it resides – and apply to use cases
• Determining the value: IT/business alignment on strategic business objectives and use cases is critical to achieving ROI from all data

Page 6: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Address data challenges holistically, yet modularly

Page 7: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

The basics of big data and analytics

Where data originates
• Databases
• Social media
• Sensor data (IoT)
• Devices
• LOB applications
• Cloud
• External sources

How data is moved and prepared for analysis
• Data integration, aggregation and transformation

Where data is analyzed
• Analytical engine
• Business intelligence
• In-memory computing
• Enterprise data warehouse

Page 8: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Sean Anderson, Cloudera
Product Marketing - IT Solutions at Cloudera

Sean is a tenured infrastructure scaling and cloud strategy consultant with a strong focus on strategic partnerships and innovative hybrid technology. He has been a part of integral shifts in technology including the rise of cloud computing, open source standardization, and big data. Sean quickly became a go-to resource and speaker for data-specific workloads focusing on technologies like Hadoop, MongoDB, Redis, ElasticSearch, SQL, and Data Warehousing. At Rackspace Hosting, Sean helped build and launch open-source cloud platforms around Hadoop, MongoDB, and Redis. Sean is currently marketing director for IT Solutions at Cloudera, the pioneers of Apache Hadoop.

Page 9: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Inefficient data workloads cost customers money

• Frequent ETL breakdowns
• Long reporting wait times
• Ad hoc access pressure on EDW
• Extreme query complexity

Page 10: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure

A new kind of data platform.
• One place for unlimited data
• Unified data access

Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise

Page 11: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Cloudera Navigator Optimizer: Unlock Your Best Hadoop Strategy, Instantly

Active Data Optimization for Hadoop to save you time and money

• Instant workload insights

• Intelligent optimization guidance

• Reduce Hadoop workload development effort

Page 12: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Intel

Brandon Draeger, Director of Marketing and Business Development for Big Data Solutions

Brandon is a Director of Marketing and Business Development for Big Data Solutions at Intel and manages the GTM relationship for Intel and Cloudera and their shared partner ecosystem. Brandon has over 15 years of experience in a variety of enterprise technology disciplines and has held roles in engineering, product management, and strategy at Dell, Symantec, and Dorado Software.

Page 13: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Customers Are Struggling: Traditional Tools Aren’t Working

• 80%: Data integration and transformation workloads consume as much as 80% of EDW capacity
• 70%: Of all Data Warehouses are performance and capacity constrained
• #1 Challenge: Organizations cite TCO as biggest obstacle to data integration tools

Source for all three figures: Gartner, “The State of Data Warehousing in 2014,” June 19, 2014

Page 14: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

#1 Use Case for Hadoop: Data Warehouse Optimization - ETL Offload

Customer Challenge: Processing and storing ever-increasing data volumes with traditional enterprise data warehouses and related data integration technology, and their legacy pricing models, is taxing stagnant IT budgets

Practitioners who have shifted one or more workloads from legacy data warehouses or mainframes to Hadoop

The most popular workloads being shifted are large-scale data transformations

61%

Customers have implemented Hadoop

Source: Syncsort Customer Survey 2014

Page 15: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Data challenges for operational efficiency

Operational efficiency
• Connect: Unify all data from disparate tables/sources to reduce existing system load and data transformation costs
• Analyze: Deliver streamlined business reporting even with existing analytical tools
• Act: Utilize better, faster reporting for improved data-driven decision making

Key use cases
• Data warehouse acceleration
• Log aggregation
• Data pipeline modernization

Page 16: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Syncsort

Mark Muncy, Technical Product Marketing Manager – Big Data, Syncsort

Mark Muncy leads Technical Product Marketing for Syncsort’s Big Data portfolio, working with technical and client-facing teams to deliver high-value solutions to the most data-intensive companies in the world. Mark brings to his current role over a decade of hands-on experience in data architecture and ETL development in the gaming, data services, and financial services industries.

Page 17: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Too Many Workloads in the EDW: Modernize the Data Pipeline with Hadoop

Traditional Data Pipeline
• Disparate Data Sources
• Data Staging Tool: Extract & Load Data; Clean & Parse Data
• Enterprise data warehouse + ETL: Data Transformation Jobs, Business Reporting, Query
• Perf / Capacity

The Results
• Longer data transformation job times
• Not meeting SLAs for business reporting
• Slow Ad Hoc Query
• Too costly to scale

Modern Data Pipeline
• Disparate Data Sources
• Hadoop + ETL: Data Transformation Jobs (Clean, Parse, Transform)
• Enterprise data warehouse: Business Reporting, Query
• Perf / Capacity

The Results
• Reduced data transformation job times
• Improved SLAs for business reporting
• Fast Ad Hoc Query
• Scales Economically

Page 18: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Syncsort DMX-h: A Complete Solution for Hadoop

Connect Transform Optimize

• Smarter Architecture – Engine runs natively within MapReduce and Spark

• Smarter Connectivity – Connect streaming and batch data sources across the organization, including mainframe, NoSQL and everything in between.

• Smarter Development – GUI for developing & maintaining Hadoop data pipeline

• Smarter Productivity – Use-case Accelerators to fast-track development

• Enterprise Grade Solution – Integrated support for Cloudera Navigator, Sentry, Kerberos and LDAP

Design Once, Deploy Anywhere
• Free users from underlying complexities of Hadoop
• Intelligent Execution dynamically optimizes the job for any platform on premise or in the cloud
• Future-proof your applications!

Page 19: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Operational efficiency architecture

Source
• Operational data sources
• Enterprise data warehouse
• Relational management database
• Data mart

1. Connect
• Extract, transform, and load
• Sort
• Aggregate
• Group
• Parse
• Clean
• Translate

2. Analyze
• Enterprise data warehouse
• Relational management database
• Data mart
• Business reporting and query

3. Act
• Price optimization
• Improved forecasting
• Uptime optimization
• Accelerated response
• Faster reporting
• Improved service levels

Foundation: Management, Services, Security, Dell Financial Services, Infrastructure

Dell | Cloudera | Syncsort | Intel

Microsoft APS, SAP HANA
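
The Connect steps named on this slide (parse, clean, group, aggregate, sort) can be illustrated with a minimal Python sketch. This is not how Syncsort DMX-h or Hadoop implements them; the sample rows, field layout, and function names below are hypothetical and chosen only to show the order of operations.

```python
from collections import defaultdict

# Hypothetical raw rows standing in for data pulled from operational sources.
RAW_ROWS = [
    "2016-01-03, widgets , 19.99",
    "2016-01-03,gadgets,5.00",
    "bad row with no delimiters",
    "2016-01-04,widgets,20.01",
]


def parse(line):
    """Split a comma-delimited line into fields; return None if malformed."""
    parts = line.split(",")
    return parts if len(parts) == 3 else None


def clean(fields):
    """Trim whitespace and convert the amount field to a float."""
    day, product, amount = (f.strip() for f in fields)
    return day, product, float(amount)


def run_pipeline(lines):
    """Parse, clean, group by product, aggregate totals, then sort."""
    totals = defaultdict(float)
    for line in lines:
        fields = parse(line)
        if fields is None:
            continue  # drop rows that fail parsing
        _, product, amount = clean(fields)
        totals[product] += amount  # group by product and aggregate
    # Sort by product name so the output order is deterministic.
    return sorted((p, round(t, 2)) for p, t in totals.items())


result = run_pipeline(RAW_ROWS)
print(result)
```

In an ETL offload, work of this shape moves out of the enterprise data warehouse and onto the Hadoop cluster, so the warehouse receives rows that are already cleaned and aggregated.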

Page 20: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Save time and cost on Hadoop ETL jobs.

Redeploying talent / reducing staff costs: An entry-level employee using the Dell | Cloudera | Syncsort solution for Hadoop could save 76.3% over three years compared to a senior engineer using a DIY, open source approach.

Total administrative costs over three years to design 4 ETL jobs per month:
• Expert Cost (contractor): $559,298
• Expert Cost (employee): $279,149
• Beginner Cost: $132,326
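
The 76.3% figure can be checked against the three-year cost totals on this slide. The sketch below assumes the contractor cost is $559,298 and that savings are measured as the relative difference from the expert baseline; the function and variable names are illustrative only.

```python
def percent_savings(baseline_cost, alternative_cost):
    """Return the percentage saved by the alternative relative to the baseline."""
    return round((baseline_cost - alternative_cost) / baseline_cost * 100, 1)


# Three-year totals as read from the slide (US dollars).
EXPERT_CONTRACTOR = 559_298  # assumed reading of the contractor figure
EXPERT_EMPLOYEE = 279_149
BEGINNER = 132_326

savings_vs_contractor = percent_savings(EXPERT_CONTRACTOR, BEGINNER)
savings_vs_employee = percent_savings(EXPERT_EMPLOYEE, BEGINNER)

print(savings_vs_contractor)  # 76.3
print(savings_vs_employee)    # 52.6
```

Under these assumptions the headline 76.3% matches the comparison against the contractor cost, while the comparison against a salaried expert works out to about 52.6%.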

Page 21: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Complete Hadoop jobs faster

Entry Level vs. Senior Engineer: Time to complete ETL jobs, comparing experienced engineers (green) to new hires (blue)

Jobs compared
• Data validation and pre-processing
• Fact dimension load with type 2 SCD
• Vendor mainframe file integration

Chart values
• 30 min, 11 sec vs. 36 min, 39 sec (17.6% less time)
• 4 min, 48 sec vs. 5 min, 51 sec (17.9% less time)
• 6 min, 15 sec vs. 15 min, 45 sec (60.3% less time)
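
The "less time" percentages on this slide follow from the paired durations. The sketch below is a worked check, assuming each percentage is the relative reduction of the shorter time against the longer one; which job each pair belongs to is not recoverable from the transcript, so the pairs are left unlabeled.

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) duration to total seconds."""
    return minutes * 60 + seconds


def percent_less_time(faster, slower):
    """Return how much less time the faster run took, as a percentage."""
    return round((slower - faster) / slower * 100, 1)


# (faster, slower) durations as (minutes, seconds), read from the slide.
PAIRS = [
    ((30, 11), (36, 39)),
    ((4, 48), (5, 51)),
    ((6, 15), (15, 45)),
]

reductions = [
    percent_less_time(to_seconds(*fast), to_seconds(*slow))
    for fast, slow in PAIRS
]
print(reductions)  # [17.6, 17.9, 60.3]
```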

Page 22: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Reclaim days of valuable time

Save 53.7% in time: Using the Dell | Cloudera | Syncsort solution for Hadoop, the entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time

Jobs compared
• Fact dimension load with type 2 SCD
• Data validation and pre-processing
• Vendor mainframe file integration

Chart (segments: Load, Validate, Int.): 8.3 Days vs. 3.8 Days

Page 23: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Panel Q&A

Page 24: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Listen to this Webcast On-Demand, Including Panel & Participant Q&A

http://bit.ly/1Rtk2OE

Page 25: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

For additional information: Dell.com/Hadoop [email protected]

Page 26: Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop

Thank you.