The Value of the Modern Data Architecture with Apache Hadoop and Teradata

© Hortonworks Inc. 2013 The Value of a Modern Data Architecture with Apache Hadoop and Teradata Page 1

description

This webinar discusses why Apache Hadoop is most typically the technology underpinning "Big Data", how it fits in a modern data architecture, and the current landscape of databases and data warehouses that are already in use.

Transcript of The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Page 1: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

The Value of a Modern Data Architecture with Apache Hadoop and Teradata

Page 2: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A

Page 3: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Existing Data Architecture

Applications: Custom Analytic App, Packaged Analytic App
Data Systems: RDBMS, EDW, Discovery Platform
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP)

Page 4: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Big Data Explosion

Big Data Market Trends & Projections

• 20%: amount by which organizations leveraging modern information management systems outperform peers by 2015
• 1 Zettabyte (ZB) = 1 billion TBs
• 15x: growth rate of machine-generated data by 2020
• The US has 1/3 of the world’s data
• Big Data is 1 of 5 US GDP game changers: $325 billion incremental annual GDP from big data analytics in retail and manufacturing by 2020

Page 5: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Traditional Data Architecture Pressured

Applications: Business Analytics, Custom Applications, Packaged Applications
Data Systems: RDBMS, EDW, Discovery Platform
Data Sources: OLTP, POS Systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)

• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020

Source: IDC
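To put the IDC figures on this slide in perspective, the short Python sketch below works out the growth they imply. It uses only the numbers quoted in the deck (2.8 ZB in 2012, 40 ZB by 2020, and the 1 ZB = 1 billion TB conversion from the previous slide). Treating the two figures as endpoints of steady compound growth is an assumption made here for illustration, not something the slides state.

```python
# Figures quoted on the slides (attributed there to IDC).
ZB_2012 = 2.8      # zettabytes of data in 2012
ZB_2020 = 40.0     # projected zettabytes by 2020
TB_PER_ZB = 1e9    # slide's conversion: 1 ZB = 1 billion TB

years = 2020 - 2012

# Overall growth multiple, and the compound annual growth rate it implies
# if growth were steady from year to year (an illustrative assumption).
growth_multiple = ZB_2020 / ZB_2012
cagr = growth_multiple ** (1 / years) - 1

print(f"2012 volume: {ZB_2012 * TB_PER_ZB:.2e} TB")
print(f"Growth 2012-2020: {growth_multiple:.1f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

Under that assumption the projection corresponds to roughly a 14x increase over eight years, or a little under 40% growth per year.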

Page 6: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Modern Data Architecture Enabled

Applications: Business Analytics, Custom Applications, Packaged Applications
Dev & Data Tools: Build & Test
Operational Tools: Manage & Monitor
Data Systems: RDBMS, EDW, Discovery Platform
Data Sources: OLTP, POS Systems; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)

Page 7: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A

Page 8: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

What Data is Being Stored in Hadoop?

1. Social: Understand how your customers feel about your brand and products, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geolocation: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (text, video, pictures, etc.): Understand patterns in text across millions of unstructured work products: web pages, emails, video, pictures and documents

Page 9: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Modern Data Architecture Applied

Infrastructure - Data Lake

• Store all data and build/enable applications on shared “data lake”
• As orgs mature they move to this as a goal for Hadoop
• Delivers broad value across the enterprise

Applications: Custom Analytic App, Packaged Analytic App
Data Systems: Shared Data Lake (Modern Data Architecture), RDBMS, EDW, Discovery Platform
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)

Page 10: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Drivers for Hadoop Adoption

• Driving Efficiency (Modern Data Architecture): Hadoop has a central role in next-generation data architectures while integrating with existing data systems
• Driving Opportunity (Business Applications): Use Hadoop to extract insights that enable new customer value and competitive edge

Big Data Sets
• Existing: Traditional, Server log, Clickstream
• Emerging: Sentiment/Social, Machine/Sensor, Geo-locations

Page 11: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Requirements for Hadoop Adoption

3 Requirements for Hadoop’s Role in the Modern Data Architecture

• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics

Page 12: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Interoperating With Your Tools

Applications: Microsoft Applications
Dev & Data Tools
Operational Tools: Viewpoint
Data Systems
Data Sources: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (sentiment, click, geo, sensor, …)

Page 13: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Today’s Topics

• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• EDW’s role in the MDA
• Q&A

Page 14: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

14 2/28/14 Teradata Confidential

Shift from a Single Platform to an Ecosystem

“Big Data requirements are solved by a range of platforms including analytical databases, discovery platforms, and NoSQL solutions beyond Hadoop.”

“We will abandon the old models based on the desire to implement for high-value analytic applications.”

"Logical" Data Warehouse

Source: “Big Data Comes of Age”. EMA and 9sight Consulting. Nov 2012.

Page 15: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

UNIFIED DATA ARCHITECTURE

USERS: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
ANALYTIC TOOLS: Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing
ACCESS, MOVE, MANAGE
Platforms: DATA PLATFORM, DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE
SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social

Page 16: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

UNIFIED DATA ARCHITECTURE

USERS: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
ANALYTIC TOOLS: Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing
ACCESS, MOVE, MANAGE
INTEGRATED DATA WAREHOUSE: Business Intelligence, Predictive Analytics, Operational Intelligence
DISCOVERY PLATFORM: Data Discovery; Path, graph, time-series analysis; Pattern Detection
DATA PLATFORM: Fast Loading, Filtering and Processing, Online Archival
SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social

Page 17: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

TERADATA UNIFIED DATA ARCHITECTURE

USERS: Marketing, Executives, Operational Systems, Frontline Workers, Customers, Partners, Engineers, Data Scientists, Business Analysts
ANALYTIC TOOLS: Math and Stats, Data Mining, Business Intelligence, Applications, Languages, Marketing
ACCESS, MOVE, MANAGE
Platforms: DATA PLATFORM, DISCOVERY PLATFORM, INTEGRATED DATA WAREHOUSE
SOURCES: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social

Page 18: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Teradata Appliance for Hadoop
Value-Added Software Bringing Hadoop to Enterprise

• Access: SQL-H™, Teradata Studio
• Management: Viewpoint, TVI
• Administration: Hadoop Builder, Intelligent start/stop, DataNode swap, deferred drive replace
• High Availability: NameNode HA, Master Machine Failover

Refining, Metadata, Entity Resolution

Security & Data Access

HCatalog, Kerberos

Page 19: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Modern Data Architecture Details

SOURCE DATA: Sensor Log Data; Customer/Inventory Data; Clickstream Data; Flat Files; Sentiment Analysis Data
Source interfaces: DB, File, JMS, REST, HTTP, Streaming
EXTRACT / LOAD: SQOOP, FLUME, Web HDFS, NFS
ETL
HDFS, YARN, MAPREDUCE
KNOX, AMBARI
REFINE: HIVE, PIG, CUSTOM
STRUCTURING: HCATALOG (metadata services)
INTERACTIVE: Teradata SQL-H
EXPORT: SQOOP / HIVE
LOAD: TDCH
Analytical Platforms: Teradata IDW, Aster Discovery Platform
Query/Visualization/Reporting/Analytical Tools and Apps: JDBC/ODBC Compliant Tool
Viewpoint: Alerts, Services, System Health, Node Health, Space Usage, Capacity Heatmap, Metrics Analysis
TVI: Proactive system monitoring tied to Teradata customer support

Page 20: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Teradata Vital Infrastructure (TVI)

PROACTIVE RELIABILITY, AVAILABILITY, AND MANAGEABILITY

Server Management VMS: 1U server virtualizes system and cabinet management software
• Cabinet Management Interface Controller (CMIC)
• Service Work Station (SWS)
• Automatically installed on base/first cabinet

VMS allows full rack solutions without additional cabinet for traditional SWS

Eliminates need for expansion racks, reducing customers’ floor space and energy costs

Supports Teradata hardware and Hadoop software

TVI Support for Hadoop

62–70% of Incidents Discovered through TVI

Page 21: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Standard SQL Access to Hadoop Data

Give business users on-the-fly access to data in Hadoop

• Trusted: Use existing tools/skills and enable self-service BI with granular security
• Standard: 100% ANSI SQL access to Hadoop data
• Fast: Queries run on Teradata or Aster, data accessed from Hadoop
• Efficient: Intelligent data access leveraging the Hadoop HCatalog

Diagram: Teradata SQL-H and Aster SQL-H reach the Hadoop layer (HDFS, Pig, Hive, Hadoop MR) through HCatalog; the flows are labeled Data and Data Filtering.
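The "Efficient" point on this slide (data access that leverages HCatalog, with a "Data Filtering" flow in the diagram) can be illustrated with a small Python sketch. This is not SQL-H syntax or any Teradata API. It is a hypothetical in-memory model, with made-up table contents and names (`TABLE`, `CATALOG`, `read_filtered`), showing the general idea of using table metadata to read only the partitions and columns a query needs.

```python
# Hypothetical in-memory stand-in for a partitioned Hadoop table.
# Keys are partition values (here, a date string); values are rows.
TABLE = {
    "2014-02-26": [{"user": "a", "page": "Home", "ms": 120}],
    "2014-02-27": [{"user": "b", "page": "Search", "ms": 340}],
    "2014-02-28": [{"user": "a", "page": "Pay", "ms": 95},
                   {"user": "c", "page": "Home", "ms": 210}],
}

# Hypothetical catalog entry: the kind of metadata (columns, partition key,
# partition list) a metadata service can supply without reading the data.
CATALOG = {
    "columns": ["user", "page", "ms"],
    "partition_key": "dt",
    "partitions": sorted(TABLE),
}


def read_filtered(columns, partition_predicate):
    """Read only the partitions and columns a query needs."""
    unknown = set(columns) - set(CATALOG["columns"])
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    # Partition pruning: decide from metadata alone which partitions to open.
    wanted = [p for p in CATALOG["partitions"] if partition_predicate(p)]
    rows = []
    for part in wanted:
        for row in TABLE[part]:
            # Column projection: pass along only the requested fields.
            rows.append({c: row[c] for c in columns})
    return wanted, rows


parts, rows = read_filtered(["user", "page"], lambda dt: dt >= "2014-02-28")
print(parts)
print(rows)
```

In this toy model only one of the three partitions is opened and the `ms` column is never transferred, which is the sort of saving the slide's "intelligent data access" claim appears to refer to.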

Page 22: The Value of the Modern Data Architecture with Apache Hadoop and Teradata


Teradata Unified Data Architecture™ Partners Support Many Layers

Page 23: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Teradata Aster Discovery Portfolio: Accelerate Time to Insights
Some of the 80+ out-of-the-box analytical apps

• PATH ANALYSIS: Discover Patterns in Rows of Sequential Data
• TEXT ANALYSIS: Derive Patterns and Extract Features in Textual Data
• STATISTICAL ANALYSIS: High-Performance Processing of Common Statistical Calculations
• SEGMENTATION: Discover Natural Groupings of Data Points
• MARKETING ANALYTICS: Analyze Customer Interactions to Optimize Marketing Decisions
• DATA TRANSFORMATION: Transform Data for More Advanced Analysis
• GRAPH ANALYSIS: Graph analytics processing and visualization
• SQL-MAPREDUCE VISUALIZATION: Graphing and visualization tools linked to key functions of the MapReduce analytics library

Page 24: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

More Accurate Customer Churn Prevention

• Hadoop captures, stores and transforms social, images and call records
• Aster does path and pattern analysis

Data Sources
• Multi-Structured Raw Data: Call Center Voice Records, eMail, Clickstream Data, Social Feeds
• Traditional Data Flow: Call Data, Check Data (via ETL Tools)

Capture, Retain and Refine Layer: Hadoop
Aster Discovery Platform
Teradata Integrated DW
Data flows: Sentiment Scores, Dimensional Data, Analytic Results
Analysis + Marketing Automation (Customer Retention Campaign)

Page 25: The Value of the Modern Data Architecture with Apache Hadoop and Teradata


MPP RDBMS + Hadoop Customer Successes

Page 26: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Key Considerations For EDW and Hadoop

MPP RDBMS | Hadoop
Stable Schema | Evolving Schema
Leverages Structured Data | Structure Agnostic
ANSI SQL | Flexible Programming
Iterative Analysis | Batch Analysis
Fine Grain Security | N/A
Cleansed Data | Raw Data
Seeks | Scans
Updates/Deletes | Ingest
Service Level Agreements | Flexibility
Core Data | All Data
Complex Joins | Complex Processing
Efficient Use of CPU/IO | Low Cost of Storage

Page 27: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Complete Consulting and Training

Services and Areas of Focus

• Teradata Analytic Architecture Services: Services to scope, design, build, operate and maintain an optimal UDA approach for Teradata, Aster, and Hadoop
• Teradata DI Optimization: Assess structured/non-structured data, discuss data loading techniques, determine best platform, optimize load scripts/processes
• Teradata Big Analytics: Assess data value/cost of capture, identify source of “exhaust” data, create conceptual architecture, refine and enrich the data, implement initial analytics in Aster or best-fit tool
• Teradata Workshop for Hadoop: Introduction workshop (across all of UDA)
• Teradata Data Staging for Hadoop: Load data into landing area; set up data exploration/refining area; scope architecture and analytics; set up Hadoop repository; load sample data
• Teradata Platform for Hadoop: Installation guidance and mentoring for Hadoop platform, D-I-Y after installation
• Teradata Managed Services for Hadoop: Operations, management, administration, backup, security, process control for Hadoop
• Teradata Training Courses for Hadoop: Two comprehensive, multi-day training offerings: 1) Administration of Apache Hadoop and 2) Developing Solutions Using Apache Hadoop

Page 28: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Discovering Deep Insights in Retail
Transforming Web Walks into DNA Sequences

Situation
Large retailer with 700M visits/year; 2M customers/day look at 1M products online

Problem
Increase ability of web content owners to self-serve insights

Solution
Treat web walks like DNA sequences of simple patterns.

Impact
• Data: loaded logs into Hortonworks
  • Loaded 2 months of raw data in 1 hour, vs. 1 day on old system
  • Can load a day’s log data in 60 sec
• Sessionize: Creates sequence for visit, e.g., boils 20 customer clicks down to 1 line:
  • <Home - Search - Look at Product - Add to Basket - Pay - Exit>
• Analyze: Business analysts can now do path analysis
• Act:
  • Segmentations by behavior can increase conversion rates by 5-10%.
  • Web design changes can drive another 10-20% more visitors into the sales funnel
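The "Sessionize" step on this slide, which boils a visitor's clicks down to a single path line, can be sketched in a few lines of Python. This is an illustrative sketch only, not the retailer's or Teradata Aster's implementation. The click tuple layout, the 30-minute inactivity gap (`SESSION_GAP`), the function name `sessionize`, and the sample data are all assumptions introduced here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical inactivity gap that ends a visit; not taken from the slides.
SESSION_GAP = timedelta(minutes=30)


def sessionize(clicks):
    """Collapse (visitor_id, timestamp, page) clicks into one path per visit."""
    by_visitor = defaultdict(list)
    for visitor_id, ts, page in clicks:
        by_visitor[visitor_id].append((ts, page))

    paths = []
    for visitor_id, events in by_visitor.items():
        events.sort()  # order each visitor's clicks by time
        current = [events[0][1]]
        last_ts = events[0][0]
        for ts, page in events[1:]:
            if ts - last_ts > SESSION_GAP:
                # Long pause: close the current visit and start a new one.
                paths.append((visitor_id, " - ".join(current)))
                current = []
            current.append(page)
            last_ts = ts
        paths.append((visitor_id, " - ".join(current)))
    return paths


# Made-up sample clicks for two visitors.
t0 = datetime(2014, 2, 28, 9, 0)
clicks = [
    ("v1", t0, "Home"),
    ("v1", t0 + timedelta(minutes=1), "Search"),
    ("v1", t0 + timedelta(minutes=2), "Look at Product"),
    ("v1", t0 + timedelta(minutes=4), "Add to Basket"),
    ("v1", t0 + timedelta(minutes=6), "Pay"),
    ("v1", t0 + timedelta(minutes=7), "Exit"),
    ("v2", t0, "Home"),
    ("v2", t0 + timedelta(hours=2), "Search"),
]

for visitor, path in sessionize(clicks):
    print(visitor, "<" + path + ">")
```

Once each visit is a single string, path analysis reduces to counting and comparing sequences, which is what makes the "DNA sequence" framing on the slide workable for business analysts.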

Page 29: The Value of the Modern Data Architecture with Apache Hadoop and Teradata

Demo