KB Ramesh - TB2957 - Real-time, big data analytics

© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. TB2957 Big data Analytics

description

HP Expert: KB Ramesh, presentation deck from HP Discover 2012 Las Vegas, “Real-time, big data analytics”

Transcript of KB Ramesh - TB2957 - Real-time, big data analytics

Page 1: KB Ramesh - TB2957 - Real-time, big data analytics

TB2957 Big data Analytics

Page 2: KB Ramesh - TB2957 - Real-time, big data analytics

Transforming business intelligence – real time analytics

Big data analytics

KB Ramesh - Director WW Storage Consulting
June 2012

Page 3: KB Ramesh - TB2957 - Real-time, big data analytics

Advanced analytics building blocks

Agenda

1. Big data – introduction
2. Big data Analytics – whole new approach
3. Big data – Challenges in harnessing all the data
4. Using Next Gen Analytics Architecture
5. Big Data Analytics – New Applications and Business Models
6. HP solution
7. HP follow-on services

Page 4: KB Ramesh - TB2957 - Real-time, big data analytics

Big data

From threat to opportunity

Page 5: KB Ramesh - TB2957 - Real-time, big data analytics

Some facts about big data

• Big data is NOT a problem but an opportunity

• Big data isn’t just big; it also spans diverse data types and data streaming in real time

• Big data analytics is the application of advanced analytic techniques to very big data sets, such as
− Sentiment analysis, geo-location, behavioral, social graph, and rich media social data

• Value = better understanding of
− customer likes and dislikes
− more effective risk management
− leveraging social media within IT as a foundation for problem resolution & requirements definition

From problem to opportunity

[Figure: data size in exabytes over time]

Page 6: KB Ramesh - TB2957 - Real-time, big data analytics

What is it anyway? And how can it be used to benefit the organization?

Multi-structured data

• It’s often a mix of structured, semi-structured, and unstructured data, plus gradations among these

• Unstructured data is processed behind the scenes and subsequently converted to structured data

• Value is in identifying patterns to make intelligent decisions

• Value is in influencing decisions once the behavior patterns can be seen

[Figure: Levels of reporting and analysis]

Decision levels: Strategic, Tactical, Operational

Techniques shown: Neural networks; Data mining; Pure data extraction/ad hoc; OLAP (slice and dice); Parameterized reports; Canned reports

Other labels: Realm of analytical modeling; Structured; Unstructured

Page 7: KB Ramesh - TB2957 - Real-time, big data analytics

Advanced analytics required in addition to the traditional processing of the

Background: evolution of advanced analytics

Evolving current state of DW/BI

• Latency, compression and speed
• Requires human intervention
• Coverage is important rather than rigor
• Amount of data can be terabytes to petabytes
• Improves system performance by scale-out
• Statistical data creation, retrieval, and data mining

Traditional DW/BI

• Can be fully automated
• Rigor is required
• Restricted on types of data
• Transaction management (OLTP)
• Volumes of data (gigabytes to terabytes)

Converged big data future

• New understanding of all multi-structured data
• Real-time advanced analytics
• Superior speed with low latency
• Process information in-memory, in-time, in-place

[Figure labels: Traditional DW/BI; In-database analytics; Advanced analytics – NLP and artificial intelligence; Unstructured data batch processing - Hadoop; Advanced analytics; Information, Applications, Infrastructure; Converged Infrastructure; IDOL 10; Hadoop]

Page 8: KB Ramesh - TB2957 - Real-time, big data analytics

Big data analytics – the need for a new approach

Taking unstructured data into account

Challenges                                                     Traditional approach   New approach
Scalability                                                    No                     Yes
Ingest high volumes of data (all available data)               No                     Yes
Sampling of data                                               Yes                    No
Variety of data (structured, semi-structured, unstructured)    No                     Yes
Simultaneous data and query processing                         No                     Yes
Faster access to all relevant information                      No                     Yes
Analyze data at high rates (GB/sec)                            No                     Yes
Accuracy in analytical models                                  No                     Yes

[Figure: The questions that are answered. Axes: Competitive advantage, Degree of intelligence]

Std reports: What happened?
Ad hoc reports: How many, how often, where?
Query drilldown: Do you have an opportunity or a problem?
Alerts: What actions are needed?
Statistical analysis: Why is this happening?
Forecasting: What if these trends continue?
Predictive analysis: What will happen next?
Optimization: What’s the best that can happen?

Page 9: KB Ramesh - TB2957 - Real-time, big data analytics

Big data analytics

The need for a whole new approach

Page 10: KB Ramesh - TB2957 - Real-time, big data analytics

Challenges in harnessing ALL the data

• In advanced analytics, data must pass through several stages on its way from unstructured to structured before end users can reap the benefits
• Creation - storing the data, and how to optimize and compress it at the data creation stage
• Ingestion - transformations and integrations play a major role; new tools and techniques are needed to process the data
• Analysis - the data may hold hidden trends and traits that are immensely useful; statistical data mining, machine learning and NLP
• Visualization - new modes of data delivery are available; visualization for various channels, such as graphical vs. tabular

Visualization
• Channels
• In-memory support
• Standardization
• Dashboards

Analysis
• Tools and technologies
• Enterprise search
• Sentiment analysis

Ingestion
• Integrations
• Tools and technologies

Creation
• Storage
• Elasticity
• Compression
• Data backup and recovery strategies

Page 11: KB Ramesh - TB2957 - Real-time, big data analytics

Creation

Storage and management

Page 12: KB Ramesh - TB2957 - Real-time, big data analytics

Creation - big data storage considerations

• Organizations need to reduce the amount of data stored and exploit new storage technologies that improve performance and utilization

• Three important directions:
− Reducing data storage requirements using data compression and new physical storage structures such as columnar storage
− Improving input/output (I/O) performance using solid-state drives (SSDs)
− Increasing storage utilization by using tiered storage: data stored on different types of devices based on usage

Archiving

Replication and snapshots

Storage tiering and hybrid storage with SSD, SAS and SATA
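
The tiering idea on this slide (data placed on different device types based on usage) can be sketched as a simple placement rule. This is a minimal illustration only: the `DataSet` type, the `choose_tier` function, and the access-frequency thresholds are hypothetical and do not describe the policy of any HP storage product.

```python
from dataclasses import dataclass

# Tiers named on the slide, fastest first.
TIERS = ("SSD", "SAS", "SATA")


@dataclass
class DataSet:
    name: str
    reads_per_day: int  # observed access frequency (assumed metric)


def choose_tier(ds: DataSet, hot: int = 1000, warm: int = 10) -> str:
    """Place frequently read data on faster media (thresholds are assumptions)."""
    if ds.reads_per_day >= hot:
        return "SSD"
    if ds.reads_per_day >= warm:
        return "SAS"
    return "SATA"


def plan(datasets):
    """Return a {tier: [dataset names]} placement plan."""
    placement = {tier: [] for tier in TIERS}
    for ds in datasets:
        placement[choose_tier(ds)].append(ds.name)
    return placement


example = plan([
    DataSet("orders_current_month", 50_000),
    DataSet("orders_last_year", 40),
    DataSet("orders_archive_2005", 0),
])
```

In practice the thresholds would come from measured workload characteristics rather than fixed constants.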

Page 13: KB Ramesh - TB2957 - Real-time, big data analytics

Ingesting, analyzing and visualizing

Consuming, processing and publishing the data

Page 14: KB Ramesh - TB2957 - Real-time, big data analytics

Ingesting – unstructured data

The challenge is simplified by a solution that:

1. Is built on a scale-out architecture

2. Can handle petabytes of data and more

3. Can handle data from numerous sources, such as social media, audio, video

4. Can process the data in batch and/or real time

5. Can provide faster access to relevant information

6. Can improve accuracy of analytical models

7. Has low latency

Page 15: KB Ramesh - TB2957 - Real-time, big data analytics

With Hadoop platform

Using next gen analytics architecture

• Next-generation BI architecture is more analytical, highly scalable

• Gives power users greater options to access and mix corporate data

• Brings unstructured and semi-structured data fully into the mix using Hadoop and non-relational databases

[Figure: next-generation analytics architecture. Labels: Operational systems (structured data); Machine data; Semi-structured data; Unstructured data; External data; Operational data store; Data warehouse; Subject areas; Hadoop cluster; In-database analytics; Statistical analytics tools (R and CEP); Reports, dashboards; Power user; Ad hoc user; Alerts; Extract, transform, load (batch; near real time)]

Adapted with permission from Wayne Eckerson, Founder, BI Leadership Forum, www.bileadership.com.

Page 16: KB Ramesh - TB2957 - Real-time, big data analytics

What is in the Hadoop platform?

• Able to handle enormous volumes and variety of data at greater velocity

• More likely to be used than traditional data management systems to:
− Identify patterns
− Archive the data
− Parse logs
− Transform data
− Perform types of analytics that couldn’t be done on large volumes of data before capturing all source data (pre-process)
− Keep more historical data (post-process)

Role of Hadoop in big data analytics
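
The log-parsing use listed above follows the map, shuffle, and reduce pattern that Hadoop applies across a cluster. The sketch below imitates those three phases inside a single Python process; it does not use any Hadoop API, and the log line format (`<host> <path> <status>`) and function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_line(line):
    """Map phase: emit (status_code, 1) for each well-formed log line."""
    parts = line.split()
    # Assumed format: "<host> <path> <status>"; malformed lines are skipped.
    if len(parts) == 3 and parts[2].isdigit():
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


log_lines = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /cart 200",
    "corrupted line",
]
status_counts = reduce_counts(
    shuffle(chain.from_iterable(map(map_line, log_lines)))
)
```

On a real cluster the map and reduce functions would run on many nodes against data stored in HDFS; the division of work is the same.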

Page 17: KB Ramesh - TB2957 - Real-time, big data analytics

Hadoop ecosystem map

Hadoop platform landscape

Page 18: KB Ramesh - TB2957 - Real-time, big data analytics

Open, modular, resilient, high performance, extreme scale-out architecture

Why Hadoop on HP Converged Infrastructure?

World’s Best IT Consulting Experts

• Worldwide Center of Excellence for Hadoop in collaboration with HP Labs
• Global Solution Center for Proofs of Concept
• Workload analysis & characterization expertise
• Consulting for roadmap, sizing & configuration, and implementation

World’s Best Track Record

• HP manages more than 3 million square feet of data center space
• Some of the largest Hadoop clusters in the world run on HP
• Proven success with HP Insight CMU, Vertica and Autonomy

World’s Most Self Sufficient Servers

• 150 design innovations and over 900 patents for HP ProLiant Gen 8 servers
• 6x performance increase and up to 93% less down time for updates
• 66% faster time to problem resolution

World’s Strongest Partner Ecosystem

• AllianceONE - 180,000 channel partners worldwide
• Development and marketing agreements with SAP and Microsoft on converged systems
• Partnerships with the top 3 Hadoop distribution vendors

[Figure label: Hadoop on HP Converged Infrastructure]

Page 19: KB Ramesh - TB2957 - Real-time, big data analytics

Sizing and configuring storage for your workload and scalability needs requires collaboration

Hadoop challenges & best practices

• Using SAN or NAS for Hadoop deployments needs to be evaluated on a case-by-case basis. SAN or NAS can perform well in certain scenarios, but not in all of them.

• When Hadoop is deployed on SAN or NAS devices, network communication overhead can cause performance bottlenecks, especially on larger clusters.

• Hadoop deployments with built-in HA (HDFS) demand three times the storage that is normally required. It is good practice to account for this requirement when planning storage.
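
The threefold storage requirement above can be turned into a rough capacity estimate. The following is a planning sketch, not a sizing tool: the replication factor of 3 matches the HDFS default, while the `temp_fraction` reserve for intermediate job output and the 24 TB per node figure are assumptions chosen only for illustration.

```python
import math


def raw_storage_tb(data_tb, replication=3, temp_fraction=0.25):
    """Raw disk needed for `data_tb` of user data.

    replication: copies HDFS keeps of each block (3 is the HDFS default).
    temp_fraction: share of raw disk reserved for intermediate job output
                   (an assumed planning figure, not from the slides).
    """
    if not 0 <= temp_fraction < 1:
        raise ValueError("temp_fraction must be in [0, 1)")
    return data_tb * replication / (1 - temp_fraction)


def nodes_needed(data_tb, disk_per_node_tb, **kwargs):
    """Number of data nodes, rounding up to whole servers."""
    return math.ceil(raw_storage_tb(data_tb, **kwargs) / disk_per_node_tb)


raw = raw_storage_tb(100)       # 100 TB of user data
nodes = nodes_needed(100, 24)   # hypothetical 24 TB of disk per node
```

With these assumptions, 100 TB of user data needs 400 TB of raw disk, or 17 nodes at 24 TB each. Compression and data growth are left out and would change the result.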

Page 20: KB Ramesh - TB2957 - Real-time, big data analytics

Limitations of Hadoop

• Hadoop is a framework, not a solution
• Hive and Pig are good, but do not overcome architectural limitations
• Deployment is easy, fast and free, but very costly to maintain and develop
• Great for data pipelining and summarization, horrible for ad hoc analysis
• Performance is great, except when it’s not required

Source – Joe Brighton blog http://www.quantivo.com/blog/top-5-reasons-not-use-hadoop-analytics

Page 21: KB Ramesh - TB2957 - Real-time, big data analytics

Analyzing and visualizing

Specific use cases

• Optimizing advertising campaigns
• Identifying and addressing patterns
• Uncovering trends and issues that impact business performance
• Maximizing influence of user-generated content
• Analyzing interactions and transactions
• Addressing marketing challenges
− Profiling
− Clustering
− Sentiment analysis
− Conceptual search

Real-time, contextual understanding of structured and unstructured data

Page 22: KB Ramesh - TB2957 - Real-time, big data analytics

Where latency and compression matter

Integrates data analytics into data warehousing functionality, enhancing data warehouse performance with

• Parallel computing
• Shared-nothing architectures
• Data compression
• Columnar database architecture

Accelerates data analysis

• Relevant for applications requiring high throughput
• Eliminates the overhead of moving large data sets from the enterprise data warehouse to a separate analytic software application

In-database analytics - a platform for structured data

Vertica Analytics Platform – Monetizing Big Data

Make smarter decisions in real time

Predict trends & patterns with accuracy

Deliver greater insight with the right context

Improve competitive differentiation

Drive faster innovation

Optimize operations
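
Columnar storage and data compression, both named on this slide, work well together because values within a single column tend to repeat. The sketch below pivots rows into columns and applies run-length encoding, one common compression scheme for columnar data. It is an illustration under assumptions (all rows share the same keys; the function names are invented here) and is not a description of how the Vertica Analytics Platform is implemented.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into {column: [values]} (column-oriented layout)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values):
    """Run-length encode a column as [(value, run_length), ...]."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand run-length pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


rows = [
    {"region": "EMEA", "sales": 10},
    {"region": "EMEA", "sales": 12},
    {"region": "EMEA", "sales": 7},
    {"region": "APJ", "sales": 9},
]
columns = to_columns(rows)
encoded_region = rle_encode(columns["region"])
```

Here the four-value `region` column collapses to two runs, and a query that reads only `sales` never touches the `region` data at all.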

Page 23: KB Ramesh - TB2957 - Real-time, big data analytics

In-database analytics best practices

1. Enterprise Data Warehouse scaling through parallelism
2. Accelerate EDW with appliances
3. Optimize batch performance by distributing storage
4. Retune and rebalance workloads (auto tuning)
5. Scale out through shared-nothing, massively-parallel processing (MPP)
6. Push query processing to grid-enabled intelligent storage layers
7. Apply efficient compression in storage layer
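
The shared-nothing MPP practice in the list above can be sketched as three steps: partition rows so that each node owns a disjoint slice, aggregate locally on each node, then merge the partial results. In this illustration threads stand in for nodes, and the function names and sample data are assumptions; a real MPP database distributes both storage and execution across separate servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib


def partition(rows, n_nodes):
    """Hash-partition (key, amount) rows so each node owns a disjoint slice."""
    shards = [[] for _ in range(n_nodes)]
    for key, amount in rows:
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        shards[zlib.crc32(key.encode()) % n_nodes].append((key, amount))
    return shards


def local_sum(shard):
    """Work done independently on one node: aggregate only local rows."""
    totals = Counter()
    for key, amount in shard:
        totals[key] += amount
    return totals


def parallel_sum(rows, n_nodes=4):
    """Scatter rows, aggregate per node in parallel, then merge partial results."""
    shards = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts per key
    return dict(merged)


sales = [("EMEA", 10), ("APJ", 9), ("EMEA", 12), ("AMS", 5), ("APJ", 1)]
totals = parallel_sum(sales)
```

Because rows with the same key always land on the same shard, no node needs another node's data until the final merge, which is what lets the approach scale out.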

Page 24: KB Ramesh - TB2957 - Real-time, big data analytics

Vertica Analytic Platform

Key features:

• Real-time query & loading
• Advanced in-database analytics
• Columnar storage & execution
• Aggressive data compression
• Scale-out MPP architecture
• High availability
• Native BI, ETL, & Hadoop/MapReduce integration

Extract value from data at speed and scale

Page 25: KB Ramesh - TB2957 - Real-time, big data analytics

Recap

In-database analytics vs. Hadoop platform

Page 26: KB Ramesh - TB2957 - Real-time, big data analytics

To Hadoop or not to Hadoop?

Given the advantages and limitations of both architectures:

• Hadoop Common is a batch-oriented platform, so it is seldom used in rich media analytics
• Hadoop is a platform, so each tool that works with Hadoop Common needs to be evaluated, designed and developed
• Hadoop is open source, but it takes an investment of time to develop solutions that can answer business questions
• Real-time analytics with Hadoop MapReduce is difficult, though not impossible
• It is questionable whether Hadoop can perform advanced analytics faster
• In-database analytics cannot handle unstructured data and needs to be integrated into a Hadoop architecture
• There is no “one size fits all” solution

Hybrid architectures are needed to get the best of both worlds

Page 27: KB Ramesh - TB2957 - Real-time, big data analytics

Complement traditional BI with advanced analytics

Combine unstructured data and structured data

• Time series analysis and continuous aggregation
• Real-time embedded analytics
• Faster access to data with low latency
• Large-scale graph & network analysis – social network environments demonstrate the utility of managing connectivity
• Column-oriented approach
• Eliminate need for multiple indexes, views and aggregations
• Integrate data analytics into data warehousing functionality
• Eliminate overhead of moving large data sets from the enterprise data warehouse to a separate analytic software application
• Provide significant performance benefits

Structured data

HDFS & Map/Reduce process

In-database analytical process

Advanced analytics
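
Continuous aggregation, the first item on this slide, means keeping summaries up to date as each event arrives instead of rescanning stored history. The sketch below keeps a running count and sum per fixed time window. The class name, the 60-second window, and the latency sample values are assumptions made for illustration.

```python
from collections import defaultdict


class ContinuousAggregate:
    """Maintain per-window count/sum as events arrive, without rescanning history."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def add(self, timestamp, value):
        # Assign the event to a fixed (tumbling) window by its start time.
        window_start = int(timestamp // self.window_seconds) * self.window_seconds
        self._count[window_start] += 1
        self._total[window_start] += value

    def averages(self):
        """Return {window_start: mean value}, ordered by time."""
        return {w: self._total[w] / self._count[w] for w in sorted(self._count)}


agg = ContinuousAggregate(window_seconds=60)
for ts, latency_ms in [(0, 100.0), (30, 300.0), (61, 50.0), (119, 150.0)]:
    agg.add(ts, latency_ms)
window_means = agg.averages()
```

Each `add` call does constant work, so the summary stays current no matter how much history has accumulated.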

Page 28: KB Ramesh - TB2957 - Real-time, big data analytics

Big data analytics

Real-life examples

E-commerce company: monitors server & application health and performance by gaining real-time visibility into tens of TBs of unstructured, time-sensitive machine data, online bookings, deal analysis and coupon use.
• Avoid website outages
• Optimize Web application

Wireless carrier: loads 10 TB of CDR data into its system every day.
• Make data accessible to BI tools to enable the creation of dashboards for executives to analyze customer behavior

New applications and business models

Healthcare outcomes analysis

Fraud detection

Pricing optimization

Social network analysis

Traffic flow optimization

Monitoring

Customer behavior analysis

Life science research

Web application optimization

Legal discovery

Weather forecasting

Infrastructure optimization

[Figure labels: Activity, Industry, Process]

Page 29: KB Ramesh - TB2957 - Real-time, big data analytics

HP solution

Integrating all the pieces

Page 30: KB Ramesh - TB2957 - Real-time, big data analytics

Big data solutions from HP

Online storage
• HP large-scale configuration
• X9000 and 3PAR Utility Storage
• IBRIX file system

HP ProLiant DL-3xx Gen8 class of servers
Increased performance, energy efficient, optimized for Hadoop implementations

Data warehouse – HP Vertica Analytics
50-1000 times the query speed of a conventional SQL DB

Search and analysis – HP Autonomy
Supports 1000+ content repositories and analysis/search for 400 file formats

HP Technology Service Consulting
• Experience
• Results

[Figure labels: Human information (semi & unstructured); Extreme information (structured); Block Storage; File Storage; Online Storage/Tiering; Snapshot/mirroring; Search/advanced analytics; Data warehouse]

• Process real time
• Low latency

Human information (semi and unstructured)

• High throughput
• Faster access to relevant data

Page 31: KB Ramesh - TB2957 - Real-time, big data analytics

Helping you make big data work for your organization

Offering
• 3-day workshop
• Enterprise Search – Focuses on many enterprise search systems
• Implementation and integration of Hadoop distributions
• Advanced analytics/exploratory analytics with Hadoop, Vertica and Autonomy
• Big data protection – securing, archiving and protecting data with use cases

Problems solved
• Impacts of rapid data growth
• Impact of advanced analytics and exploratory analytics on the business
• Significance of backup and recovery, data security and compliance for the business
• Harnesses data as a rich repository of information

Benefits
• Understand the big data landscape, its challenges, benefits and critical success factors
• Define strategy, create a roadmap
• Assess how and when to use Hadoop
• Integrate structured and unstructured data collections
• Determine when and how big data needs to be protected, archived, and secured

HP Big Data Strategy Workshop

[Figure labels: Variety, Value, Velocity, Volume; Big data technologies]

NEW

Page 32: KB Ramesh - TB2957 - Real-time, big data analytics

HP Roadmap Service for Hadoop™

PLAN for success

Offering
• Effective planning and implementation of a Hadoop strategy & deployment
• Methodical approach to roadmap building
• Executable roadmap with recommended investments, timeline, & risk mitigations

Problems it solves
• Builds a strategy to head in the right direction & avoids fixing false starts
• Creates a shared vision
• Builds understanding of sources & sensitivity of data
• Identifies organizational inhibitors
• Addresses risk & mitigation
• Develops a roadmap for successful planning, deployment, & support of a Hadoop platform

Benefits
• Reduces time, cost & risk of successfully deploying Hadoop
• Leverages proven success managing extremely large HPC & Hadoop clusters
• Creates synergies with HP Vertica’s analytic database & HP Autonomy’s meaning-based computing platform

NEW

[Figure labels: Public Cloud, Private Cloud, Traditional, Managed Cloud]

Page 33: KB Ramesh - TB2957 - Real-time, big data analytics

TS Consulting big data follow-on service offerings

Phases: Initiate, Plan, Develop, Manage

Activities: Analyze/explore; Architect & validate; Detailed design; Implement/develop; Project management; Operation/improve; Archive/protect

Big Data Discovery Workshop

Big data explore / design / architect

Data profiling / data tiering

Big Data Integration Service

Big Data IT Assurance Service

Big data monitoring, maintaining and operations support of big data software

Data archiving

Big Data Analytics Implementation Service

Page 34: KB Ramesh - TB2957 - Real-time, big data analytics

The ideal platform for social graphing and analytics

Products in this solution: IDOL + Vertica + Hadoop

[Figure labels: Mobile, Explore, OEM; Human, Semi Structured, Structured, Extreme; Social Connectors; Executive Dashboard]

Page 35: KB Ramesh - TB2957 - Real-time, big data analytics

Q&A

Page 36: KB Ramesh - TB2957 - Real-time, big data analytics

Find out more

Attend these sessions

• TB3011: Designing a storage cloud -- 6/6, 4:00PM

• TB2957: Big Data Analytics, 06/05/2012, 2.45 PM

• BB3053: New HP Data Migration Service, Tuesday 06/05/2012, 11.15 PM

Visit these demos

• KM: HP storage services transformational journey – Converged Infrastructure Pavilion / Management

• KL: HP Storage Efficiency Analysis – Converged Infrastructure Pavilion

After the event

• Contact your sales rep

Your feedback is important to us. Please take a few minutes to complete the session survey.

Page 37: KB Ramesh - TB2957 - Real-time, big data analytics

Thank you