Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop

Dell | Cloudera | Syncsort Data Warehouse Optimization – ETL Offload Reference Architecture

Dell | Cloudera | Syncsort | Intel

Panel moderator Armando Acosta, Dell

Armando Acosta
• Subject Matter Expert for Dell Big Data Solutions
• Product Manager for the Dell Hadoop Solutions
• Works with customers to transform IT into better business outcomes
• Seventeen years in technology

Sean Anderson, Cloudera

Brandon Draeger, Intel

Mark Muncy, Syncsort

Panel introductions

Organizations actively using data grow 50% faster

The number of organizations that understand the benefits of big data grew slightly: 39% (2014) to 42% (2015).

Older technology can’t keep up: The ability to scale to support all data and unpredictable workloads means effective data management and data integration are key priorities.

Data silos hinder decision-making: Need to analyze all data, regardless of type or where it resides, and apply it to use cases.

Determining the value: IT/business alignment on strategic business objectives and use cases is critical to achieving ROI from all data.

There are challenges that must be addressed

Address data challenges holistically, yet modularly


The basics of big data and analytics

Where data originates
• Databases
• Social media
• Sensor data (IoT)
• Devices
• LOB applications
• Cloud
• External sources

How data is moved and prepared for analysis
• Data integration, aggregation and transformation

Where data is analyzed
• Analytical engine
• Business intelligence
• In-memory computing
• Enterprise data warehouse

Sean Anderson, Product Marketing - IT Solutions at Cloudera

Sean is a tenured infrastructure scaling and cloud strategy consultant with a strong focus on strategic partnerships and innovative hybrid technology. He has been a part of integral shifts in technology including the rise of cloud computing, open source standardization, and big data. Sean quickly became a go-to resource and speaker for data-specific workloads, focusing on technologies like Hadoop, MongoDB, Redis, ElasticSearch, SQL, and Data Warehousing. At Rackspace Hosting, Sean helped build and launch open-source cloud platforms around Hadoop, MongoDB, and Redis. Sean is currently marketing director for IT Solutions at Cloudera, the pioneers of Apache Hadoop.

Inefficient data workloads cost customers money

• Frequent ETL breakdowns
• Long reporting wait times
• Ad hoc access pressure on EDW
• Extreme query complexity

Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure

A new kind of data platform.
• One place for unlimited data
• Unified data access

Cloudera makes it:
• Fast for business
• Easy to manage
• Secure without compromise

Cloudera Navigator Optimizer: Unlock Your Best Hadoop Strategy, Instantly

Active Data Optimization for Hadoop to save you time and money

• Instant workload insights

• Intelligent optimization guidance

• Reduce Hadoop workload development effort

Intel

Brandon Draeger, Director of Marketing and Business Development for Big Data Solutions, Intel

Brandon is a Director of Marketing and Business Development for Big Data Solutions at Intel and manages the GTM relationship for Intel and Cloudera and their shared partner ecosystem. Brandon has over 15 years of experience in a variety of enterprise technology disciplines and has held roles in engineering, product management, and strategy at Dell, Symantec, and Dorado Software.

Customers Are Struggling: Traditional Tools Aren’t Working

• 80%: Data integration and transformation workloads consume as much as 80% of EDW capacity
• 70%: Of all data warehouses are performance and capacity constrained
• #1 Challenge: Organizations cite TCO as the biggest obstacle to data integration tools

Gartner, “The State of Data Warehousing in 2014,” June 19, 2014



#1 Use Case for Hadoop: Data Warehouse Optimization - ETL Offload

Customer Challenge: Processing and storing ever-increasing data volumes with traditional enterprise data warehouses and related data integration technology, and their legacy pricing models, is taxing stagnant IT budgets.

• Practitioners who have shifted one or more workloads from legacy data warehouses or mainframes to Hadoop
• The most popular workloads being shifted are large-scale data transformations
• 61%: Customers have implemented Hadoop

Syncsort Customer Survey 2014


Operational efficiency

Connect: Unify all data from disparate tables/sources to reduce existing system load and data transformation costs

Analyze: Deliver streamlined business reporting even with existing analytical tools

Act: Utilize better, faster reporting for improved data-driven decision making

Key use cases

• Data warehouse acceleration

• Log aggregation

• Data pipeline modernization

Data challenges for operational efficiency

Syncsort

Mark Muncy, Technical Product Marketing Manager – Big Data, Syncsort

Mark Muncy leads Technical Product Marketing for Syncsort’s Big Data portfolio, working with technical and client-facing teams to deliver high-value solutions to the most data-intensive companies in the world. Mark brings to his current role over a decade of hands-on experience in data architecture and ETL development in the gaming, data services, and financial services industries.

Too Many Workloads in the EDW: Modernize the Data Pipeline with Hadoop

Traditional Data Pipeline
• Disparate Data Sources
• Data Staging Tool: Extract & Load Data, Clean & Parse Data
• Enterprise data warehouse + ETL: Data Transformation Jobs
• Business Reporting
(diagram labels: Query, Perf, Capacity)

The Results
• Longer data transformation job times
• Not meeting SLAs for business reporting
• Slow Ad Hoc Query
• Too costly to scale

Modern Data Pipeline
• Disparate Data Sources
• Hadoop + ETL: Data Transformation Jobs (Clean, Parse, Transform)
• Enterprise data warehouse
• Business Reporting
(diagram labels: Query, Perf, Capacity)

The Results
• Reduced data transformation job times
• Improved SLAs for business reporting
• Fast Ad Hoc Query
• Scales Economically
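As a rough illustration of the clean, parse, and transform steps named in the pipeline above, here is a minimal single-machine sketch in Python. It is not Hadoop or DMX-h code. The record layout (date, store, amount), the sample rows, and the function names are all hypothetical and chosen only to show the kind of work that gets offloaded from the EDW.

```python
from collections import defaultdict

# Hypothetical raw input: date,store,amount (not taken from the slides).
RAW_LINES = [
    "  2015-01-03,store-1,19.99 ",
    "2015-01-03,store-2,5.00",
    "",                                  # blank line, dropped by clean()
    "2015-01-04,store-1,not-a-number",   # bad amount, dropped by parse()
    "2015-01-04,store-1,10.01",
]

def clean(lines):
    """Strip surrounding whitespace and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]

def parse(lines):
    """Split each line into a typed record; skip rows with a non-numeric amount."""
    records = []
    for line in lines:
        date, store, amount = line.split(",")
        try:
            records.append({"date": date, "store": store, "amount": float(amount)})
        except ValueError:
            continue
    return records

def transform(records):
    """Group by store and sum the amounts, rounded to cents."""
    totals = defaultdict(float)
    for record in records:
        totals[record["store"]] += record["amount"]
    return {store: round(total, 2) for store, total in sorted(totals.items())}

if __name__ == "__main__":
    print(transform(parse(clean(RAW_LINES))))
```

In the modern pipeline these steps would run as distributed jobs on Hadoop rather than inside the warehouse; the sketch only shows the shape of the work.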

Syncsort DMX-h: A Complete Solution for Hadoop

Connect, Transform, Optimize

• Smarter Architecture – Engine runs natively within MapReduce and Spark

• Smarter Connectivity – Connect streaming and batch data sources across the organization, including mainframe, NoSQL and everything in between.

• Smarter Development – GUI for developing & maintaining Hadoop data pipeline

• Smarter Productivity – Use-case Accelerators to fast-track development

• Enterprise Grade Solution – Integrated support for Cloudera Navigator, Sentry, Kerberos and LDAP

Design Once, Deploy Anywhere
• Free users from underlying complexities of Hadoop
• Intelligent Execution dynamically optimizes the job for any platform on premise or in the cloud
• Future-proof your applications!


Operational efficiency architecture

Source
• Operational data sources
• Enterprise data warehouse
• Relational management database
• Data mart

1. Connect
• Extract, translate, and load
• Sort, Aggregate, Group, Parse, Clean, Translate

2. Analyze
• Enterprise data warehouse
• Relational management database
• Data mart
• Business reporting and query

3. Act
• Price optimization
• Improved forecasting
• Uptime optimization
• Accelerated response
• Faster reporting
• Improved service levels

Platform: Dell | Cloudera | Syncsort | Intel

Microsoft APS, SAP HANA

Management, Services, Security, Dell Financial Services, Infrastructure

Redeploying talent / reducing staff costs: An entry-level employee using the Dell | Cloudera | Syncsort solution for Hadoop could save 76.3% over three years compared to a senior engineer using a DIY, open source approach.

Save time and cost on Hadoop ETL jobs.

Total administrative costs over three years to design 4 ETL jobs per month:
• Expert Cost (contractor): $559,298
• Expert Cost (employee): $279,149
• Beginner Cost: $132,326
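The savings percentage can be checked against the three cost figures above. The short sketch below assumes the contractor figure reads $559,298 and that "savings" means the relative difference between the beginner cost and an expert cost. Under those assumptions, the 76.3% figure quoted earlier matches the contractor comparison. The function and variable names are illustrative only.

```python
# Three-year administrative costs quoted on the slide, in USD.
# The contractor value is read as 559,298 (an assumption about the slide's punctuation).
COSTS = {
    "expert_contractor": 559_298,
    "expert_employee": 279_149,
    "beginner": 132_326,
}

def percent_saved(baseline: float, actual: float) -> float:
    """Return how much lower `actual` is than `baseline`, in percent."""
    return (baseline - actual) / baseline * 100

if __name__ == "__main__":
    for label in ("expert_contractor", "expert_employee"):
        saving = percent_saved(COSTS[label], COSTS["beginner"])
        print(f"beginner vs {label}: {saving:.1f}% lower cost")
```

Run as a script, this prints roughly 76.3% for the contractor comparison and roughly 52.6% for the employee comparison.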

Entry Level vs. Senior Engineer: Time to complete ETL jobs, comparing experienced engineers (green) to new hires (blue)

Complete Hadoop jobs faster

Jobs compared:
• Data validation and pre-processing
• Fact dimension load with type 2 SCD
• Vendor mainframe file integration

Measured times:
• 30 min, 11 sec vs. 36 min, 39 sec (17.6% less time)
• 4 min, 48 sec vs. 5 min, 51 sec (17.9% less time)
• 6 min, 15 sec vs. 15 min, 45 sec (60.3% less time)
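The "less time" percentages can be reproduced from the run times on this slide. The sketch below pairs the six times so that each pair yields one of the quoted percentages. That pairing is inferred from the arithmetic, not stated on the slide, and the slide text does not say which job each pair belongs to. Function names are illustrative.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a (minutes, seconds) duration to seconds."""
    return minutes * 60 + seconds

def percent_less_time(faster: int, slower: int) -> float:
    """Return how much shorter `faster` is than `slower`, in percent."""
    return (slower - faster) / slower * 100

# (faster run, slower run) as (minutes, seconds); pairing inferred from the percentages.
PAIRS = [
    ((30, 11), (36, 39)),
    ((4, 48), (5, 51)),
    ((6, 15), (15, 45)),
]

if __name__ == "__main__":
    for faster, slower in PAIRS:
        pct = percent_less_time(to_seconds(*faster), to_seconds(*slower))
        print(f"{faster[0]} min, {faster[1]} sec vs. "
              f"{slower[0]} min, {slower[1]} sec: {pct:.1f}% less time")
```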

Save 53.7% in time: Using the Dell | Cloudera | Syncsort solution for Hadoop, the entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time.

Reclaim days of valuable time

Jobs compared:
• Fact dimension load with type 2 SCD (Load)
• Data validation and pre-processing (Validate)
• Vendor mainframe file integration (Int.)

Total time: 8.3 Days vs. 3.8 Days

Panel Q&A

Listen to this Webcast On-Demand, Including Panel & Participant Q&A

http://bit.ly/1Rtk2OE



For additional information: Dell.com/Hadoop [email protected]

Thank you.
