© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
TB2957 Big data Analytics
Transforming business intelligence – real time analytics
Big data analytics
KB Ramesh - Director WW Storage Consulting, June 2012
Advanced analytics building blocks
Agenda
1. Big data – introduction
2. Big data Analytics – a whole new approach
3. Big data – Challenges in harnessing all the data
4. Using Next Gen Analytics Architecture
5. Big Data Analytics – New Applications and Business Models
6. HP solution
7. HP follow-on services
Big data
From threat to opportunity
Some facts about big data
• Big data is NOT a problem but an opportunity
• Big data isn’t just big; it also spans diverse data types and real-time streaming data
• Big data analytics is the application of advanced analytic techniques, such as sentiment analysis, to very large data sets, such as geo-location, behavioral, social graph and rich media social data
• Value comes from:
− Better understanding of customer likes and dislikes
− More effective risk management
− Leveraging social media within IT as a foundation for problem resolution & requirements definition
From problem to opportunity
[Chart: data size in exabytes over time]
What is it anyway, and how can it benefit the organization?
Multi-structured data
• It’s often a mix of structured, semi-structured, and unstructured data, plus gradations among these
• Unstructured data works behind the scenes and is subsequently converted to structured data
• Value is in identifying patterns to make intelligent decisions
• Value is in influencing decisions once behavior patterns become visible
[Figure: Levels of reporting and analysis. From operational through tactical to strategic use, and from structured to unstructured data: canned reports, parameterized reports, OLAP (slice and dice), pure data extraction/ad hoc, data mining and neural networks, with the upper levels forming the realm of analytical modeling.]
Advanced analytics required in addition to the traditional processing of the
Background: evolution of advanced analytics
Evolving Current State of DW/BI
• Latency, compression and speed
• Requires human intervention
• Coverage matters more than rigor
• Data volumes range from terabytes to petabytes
• Improves system performance through scale-out
• Statistical data creation, retrieval and data mining
Traditional DW/BI
• Can be fully automated
• Rigor is required
• Restricted in the types of data it handles
• Transaction management (OLTP)
• Data volumes from gigabytes to terabytes
Converged big data Future
• New understanding of all multi-structured data
• Real-time advanced analytics
• Superior speed with low latency
• Process information in-memory, in-time, in-place
[Figure: Evolution from traditional DW/BI to advanced analytics. Building blocks: traditional DW/BI, in-database analytics, unstructured data batch processing (Hadoop) and advanced analytics (NLP and artificial intelligence), spanning information, application and infrastructure layers on HP Converged Infrastructure, IDOL 10 and Hadoop.]
Big Data analytics – the need for a new approach
Taking unstructured data into account
Challenges | Traditional approach | New approach
Scalability | No | Yes
Ingest high volumes of data (all available data) | No | Yes
Sampling of data | Yes | No
Variety of data (structured, semi-structured, unstructured) | No | Yes
Simultaneous data and query processing | No | Yes
Faster access to all relevant information | No | Yes
Analyze data at high rates (GB/sec) | No | Yes
Accuracy in analytical models | No | Yes
The questions that are answered (competitive advantage rises with the degree of intelligence):
• Standard reports: What happened?
• Ad hoc reports: How many, how often, where?
• Query/drilldown: Do you have an opportunity or a problem?
• Alerts: What actions are needed?
• Statistical analysis: Why is this happening?
• Forecasting: What if these trends continue?
• Predictive analysis: What will happen next?
• Optimization: What’s the best that can happen?
Big data analytics
The need for a whole new approach
Challenges in harnessing ALL the data
• In advanced analytics, data moving from unstructured to structured form must pass through several stages before end users can reap the benefits:
− Creation: storing the data, and optimizing and compressing it at the creation stage
− Ingestion: transformations and integrations play a major role; new tools and techniques are needed to process the data
− Analysis: the data may hold hidden trends and traits that are immensely useful; techniques include statistical data mining, machine learning and NLP
− Visualization: new modes of data delivery are available, with visualization for various channels, such as graphical vs. tabular
Focus areas by stage
• Creation: storage, elasticity, compression, data backup and recovery strategies
• Ingestion: integrations, tools and technologies
• Analysis: tools and technologies, enterprise search, sentiment analysis
• Visualization: channels, in-memory support, standardization, dashboards
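The four stages can be pictured as a chain of functions, each consuming the previous stage's output. The sketch below is a hypothetical, single-machine illustration: the function names, the `Record` type and the "word count per source" analysis are assumptions chosen only to make each stage visible, not part of any HP product.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    source: str  # where the text came from (hypothetical channel name)
    text: str    # the original unstructured text
    words: int   # a structured attribute derived during ingestion


def create(raw_lines):
    """Creation: keep non-empty raw lines (an in-memory list stands in for storage)."""
    return [line.strip() for line in raw_lines if line.strip()]


def ingest(stored, source):
    """Ingestion: transform free text into structured records."""
    return [Record(source, text, len(re.findall(r"\w+", text))) for text in stored]


def analyze(records):
    """Analysis: a deliberately trivial aggregate, total words per source."""
    totals = {}
    for r in records:
        totals[r.source] = totals.get(r.source, 0) + r.words
    return totals


def visualize(totals):
    """Visualization: render the aggregate as a plain-text (tabular) view."""
    width = max((len(k) for k in totals), default=0)
    return "\n".join(f"{k.ljust(width)} | {v}" for k, v in sorted(totals.items()))


raw = ["Great service today", "", "Delivery was late again"]
print(visualize(analyze(ingest(create(raw), "twitter"))))
```

In a real deployment each stage would be a separate system (storage tier, ETL or Hadoop jobs, analytic engine, dashboard); keeping them as separate functions mirrors that separation.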
Creation
Storage and management
Creation - big data storage considerations
• Organizations need to reduce the amount of data stored and exploit new storage technologies that improve performance and utilization
• Three important directions:
− Reducing data storage requirements using data compression and new physical storage structures such as columnar storage
− Improving input/output (I/O) performance using solid-state drives (SSDs)
− Increasing storage utilization with tiered storage: data stored on different types of devices based on usage
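To illustrate why columnar storage and compression go together: storing each column contiguously places repeated values next to each other, so a simple scheme such as run-length encoding (RLE) can collapse them. The sketch below assumes a small hypothetical table; RLE is only one of many encodings, and no specific product's format is implied.

```python
from itertools import groupby

# Hypothetical sample table, stored row by row: (date, country, amount).
rows = [
    ("2012-06-05", "US", 10),
    ("2012-06-05", "US", 12),
    ("2012-06-05", "DE", 7),
    ("2012-06-06", "DE", 9),
]


def to_columns(rows):
    """Pivot row-oriented tuples into one list per column (columnar layout)."""
    return [list(col) for col in zip(*rows)]


def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)
encoded = [rle_encode(col) for col in columns]
for col in encoded:
    print(col)
```

The date and country columns shrink to two runs each, while the amount column gains nothing. RLE pays off on sorted or low-cardinality columns, which are common in a column store.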
Archiving
Replication and snapshots
Storage tiering and hybrid storage with SSD, SAS and SATA
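Tiering comes down to a placement policy that maps each data set's usage to a device class. The sketch below is a minimal, hypothetical policy. The access-frequency thresholds are illustrative assumptions rather than HP defaults, and real tiering engines usually move data at block or page granularity instead of per data set.

```python
# Hypothetical thresholds, in accesses per day (assumptions, not vendor defaults).
HOT_THRESHOLD = 100
WARM_THRESHOLD = 10


def choose_tier(accesses_per_day: int) -> str:
    """Hot data -> SSD, warm data -> SAS, cold data -> SATA."""
    if accesses_per_day >= HOT_THRESHOLD:
        return "SSD"
    if accesses_per_day >= WARM_THRESHOLD:
        return "SAS"
    return "SATA"


def place(datasets: dict) -> dict:
    """Map each data set name to a tier based on its access frequency."""
    return {name: choose_tier(freq) for name, freq in datasets.items()}


print(place({"orders_today": 500, "orders_q1": 25, "logs_2010": 0}))
```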
Ingesting, analyzing and visualizing
Consuming, processing and publishing the data
Ingesting – unstructured data
The challenge is simplified by a solution that:
1. Is built on a scale-out architecture
2. Can handle petabytes of data and more
3. Can handle data from numerous sources, such as social media, audio, video
4. Can process the data in batch and/or real time
5. Can provide faster access to relevant information
6. Can improve accuracy of analytical models
7. Has low latency
With Hadoop platform
Using next gen analytics architecture
• Next-generation BI architecture is more analytical and highly scalable
• Gives power users greater options to access and mix corporate data
• Brings unstructured and semi-structured data fully into the mix using Hadoop and non-relational databases
[Figure: Next-generation analytics architecture. Data sources: operational systems (structured data), machine data, semi-structured data, unstructured data and external data. Extract, transform, load in batch or near real time. Platforms: Hadoop cluster, operational data store, and data warehouse with subject areas and in-database analytics. Consumers: reports and dashboards, alerts, ad hoc users, and power users with statistical analytics tools (R and CEP).]
Adapted with permission from Wayne Eckerson, Founder, BI Leadership Forum, www.bileadership.com.
What is in the Hadoop platform?
• Able to handle enormous volumes and varieties of data at greater velocity
• More likely than traditional data management systems to be used to:
− Identify patterns
− Archive the data
− Parse logs
− Transform data
− Perform types of analytics that couldn’t be done on large volumes of data before
− Capture all source data (pre-process)
− Keep more historical data (post-process)
Role of Hadoop in big data analytics
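Log parsing and pattern counting are the classic MapReduce jobs. The sketch below simulates the map, shuffle and reduce phases in a single Python process so the data flow is visible. It is not Hadoop's API (Hadoop jobs are typically written against its Java `Mapper`/`Reducer` classes), and the log format is a hypothetical assumption. On a cluster, map and reduce tasks run in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log format: "<date> <level> <message>".
logs = [
    "2012-06-05 ERROR disk full",
    "2012-06-05 INFO job started",
    "2012-06-06 ERROR timeout",
    "2012-06-06 ERROR disk full",
]


def map_phase(line):
    """Map: emit (log_level, 1) for one log line."""
    parts = line.split(maxsplit=2)
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def run(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(run(logs))
```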
Hadoop ecosystem map
Hadoop platform landscape
Open, modular, resilient, high performance, extreme scale-out architecture
Why Hadoop on HP Converged Infrastructure?
World’s Best IT Consulting Experts
• Worldwide Center of Excellence for Hadoop in collaboration with HP Labs
• Global Solution Center for Proofs of Concept
• Workload analysis & characterization expertise
• Consulting for roadmap, sizing & configuration, and implementation
World’s Most Self-Sufficient Servers
• 150 design innovations and over 900 patents for HP ProLiant Gen8 servers
• 6x performance increase and up to 93% less downtime for updates
• 66% faster time to problem resolution
World’s Strongest Partner Ecosystem
• AllianceONE - 180,000 channel partners worldwide
• Development and marketing agreements with SAP and Microsoft on converged systems
• Partnerships with the top 3 Hadoop distribution vendors
World’s Best Track Record
• HP manages more than 3 million square feet of data center space
• Some of the largest Hadoop clusters in the world run on HP
• Proven success with HP Insight CMU, Vertica and Autonomy
Hadoop on HP Converged Infrastructure
Sizing and storage configuration for your workload and scalability require collaboration
Hadoop challenges & best practices
• Whether to use SAN or NAS for a Hadoop deployment needs to be evaluated case by case. SAN or NAS can perform well in certain scenarios, but not in all of them.
• When Hadoop is deployed on SAN or NAS devices, network communication overhead can cause performance bottlenecks, especially on larger clusters.
• HDFS provides built-in high availability by replicating each data block, by default three times, so it demands three times the storage that would otherwise be required. Account for this when planning storage.
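The storage multiplier can be worked out before buying hardware. The sketch below assumes the HDFS default replication factor of 3 (the `dfs.replication` setting). It also adds a hypothetical headroom fraction for intermediate and temporary data; the 25% figure and the per-node disk size are illustrative assumptions, not sizing guidance.

```python
import math


def raw_capacity_tb(logical_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk needed to hold `logical_tb` of data in HDFS.

    replication: HDFS replication factor (3 is the HDFS default).
    headroom: fraction of raw disk reserved for intermediate/temporary
              data; 0.25 is an illustrative assumption, not a rule.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1 - headroom)


def nodes_needed(logical_tb: float, disk_per_node_tb: float, **kwargs) -> int:
    """Round the raw requirement up to whole data nodes."""
    return math.ceil(raw_capacity_tb(logical_tb, **kwargs) / disk_per_node_tb)


# 100 TB of data -> 300 TB after 3x replication -> 400 TB with 25% headroom.
print(raw_capacity_tb(100), nodes_needed(100, disk_per_node_tb=24))
```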
Limitations of Hadoop
• Hadoop is a framework, not a solution
• Hive and Pig are good, but do not overcome architectural limitations
• Deployment is easy, fast and free, but very costly to maintain and develop
• Great for data pipelining and summarization, horrible for ad hoc analysis
• Performance is great, except when it’s not
Source – Joe Brighton blog http://www.quantivo.com/blog/top-5-reasons-not-use-hadoop-analytics
Analyzing and visualizing
Specific use cases
• Optimizing advertising campaigns
• Identifying and addressing patterns
• Uncovering trends and issues that impact business performance
• Maximizing the influence of user-generated content
• Analyzing interactions and transactions
• Addressing marketing challenges
− Profiling
− Clustering
− Sentiment analysis
− Conceptual search
Real-time, contextual understanding of structured and unstructured data
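Sentiment analysis appears throughout this deck, so a minimal lexicon-based scorer shows the basic idea: sum word polarities and flip the sign after a negator. The tiny lexicon and negator list below are hypothetical. Production systems (including meaning-based engines such as Autonomy IDOL) use much richer linguistic models, and this sketch does not reflect how any of them work internally.

```python
import re

# Tiny illustrative lexicon and negator list (assumptions, not a real resource).
LEXICON = {"great": 1, "love": 1, "fast": 1, "late": -1, "broken": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}


def sentiment(text: str) -> int:
    """Sum lexicon scores, flipping the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(text: str) -> str:
    """Map the numeric score to a positive/negative/neutral label."""
    s = sentiment(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


for post in ["Love the new phone, shipping was fast", "The service was not great"]:
    print(label(post), "|", post)
```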
Where latency and compression matter
Integrates data analytics into data warehousing functionality, enhancing data warehouse performance with:
• Parallel computing
• Shared-nothing architectures
• Data compression
• Columnar database architecture
Accelerates data analysis:
• Relevant for applications requiring high throughput
• Eliminates the overhead of moving large data sets from the enterprise data warehouse to a separate analytic software application
In-database analytics - a platform for structured data
Vertica Analytics Platform – Monetizing Big Data
Make smarter decisions in real time
Predict trends & patterns with accuracy
Deliver greater insight with the right context
Improve competitive differentiation
Drive faster innovation
Optimize operations
In-database analytics best practices
1. Enterprise Data Warehouse scaling through parallelism
2. Accelerate EDW with appliances
3. Optimize batch performance by distributing storage
4. Retune and rebalance workloads (auto tuning)
5. Scale out through shared-nothing, massively parallel processing (MPP)
6. Push query processing to grid-enabled intelligent storage layers
7. Apply efficient compression in storage layer
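Shared-nothing MPP scale-out rests on two steps: hash-partition rows so each node owns a disjoint slice, then let each node aggregate locally before a coordinator merges the partial results. The sketch below simulates this in one process. The three-node cluster is a hypothetical assumption, and the partial aggregations run sequentially here, whereas real MPP nodes run them in parallel on their own CPU, memory and disk.

```python
from collections import Counter
from zlib import crc32

NODES = 3  # hypothetical cluster size


def node_for(key: str, nodes: int = NODES) -> int:
    """Hash-partition by key; crc32 is stable across runs, unlike built-in hash() on str."""
    return crc32(key.encode()) % nodes


def partition(rows, nodes=NODES):
    """Distribute (key, value) rows so each node owns a disjoint subset of keys."""
    parts = [[] for _ in range(nodes)]
    for key, value in rows:
        parts[node_for(key, nodes)].append((key, value))
    return parts


def local_aggregate(part):
    """Each node sums only its own rows; nothing is shared between nodes."""
    acc = Counter()
    for key, value in part:
        acc[key] += value
    return acc


def mpp_sum(rows, nodes=NODES):
    """Coordinator merges the per-node partial results."""
    result = Counter()
    for partial in map(local_aggregate, partition(rows, nodes)):
        result.update(partial)
    return dict(result)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(mpp_sum(rows))
```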
Vertica Analytic Platform
Key features:
• Real-time query & loading
• Advanced in-database analytics
• Columnar storage & execution
• Aggressive data compression
• Scale-out MPP architecture
• High availability
• Native BI, ETL, & Hadoop/MapReduce integration
Extract value from data at speed and scale
Recap
In-database analytics vs. Hadoop platform
To Hadoop or not to Hadoop?
Given the advantages and limitations of both architectures:
• Because Hadoop Common is a batch-oriented platform, it is seldom used for rich media analytics.
• Because Hadoop is a platform, each tool that works with Hadoop Common needs to be evaluated, designed and developed.
• Hadoop is open source, but organizations still need to invest time to develop solutions that can answer business questions.
• Real-time analytics is difficult, though not impossible, with Hadoop MapReduce.
• It is questionable whether advanced analytics can be performed faster with Hadoop.
• In-database analytics cannot handle unstructured data and needs to be integrated into a Hadoop architecture.
• There is no “one size fits all” solution.
Hybrid architectures are needed to get the best of both worlds.
Complement traditional BI with advanced analytics
Combine unstructured data and structured data
• Time series analysis and continuous aggregation
• Real-time embedded analytics
• Faster access to data with low latency
• Large-scale graph & network analysis: social network environments demonstrate the utility of managing connectivity
• Column-oriented approach
• Eliminate the need for multiple indexes, views and aggregations
• Integrate data analytics into data warehousing functionality
• Eliminate the overhead of moving large data sets from the enterprise data warehouse to a separate analytic software application
• Provide significant performance benefits
[Figure: hybrid architecture combining structured data, HDFS & MapReduce processing, the in-database analytical process and advanced analytics]
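Continuous aggregation, the first item in the list above, keeps a running result up to date as each event arrives instead of recomputing over the full history. The sketch below is a minimal sliding-window sum over the last `window` seconds. It assumes events arrive in timestamp order, which real streaming engines cannot always rely on.

```python
from collections import deque


class SlidingWindowSum:
    """Continuously maintained sum and average over the last `window` seconds.

    Events are (timestamp_seconds, value); this simplified sketch assumes
    timestamps arrive in non-decreasing order.
    """

    def __init__(self, window: float):
        self.window = window
        self.events = deque()
        self.total = 0.0

    def add(self, ts: float, value: float) -> float:
        """Add one event, evict expired ones, and return the current window sum."""
        self.events.append((ts, value))
        self.total += value
        # Keep only events inside the window (ts - window, ts].
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.total -= old
        return self.total

    def average(self) -> float:
        return self.total / len(self.events) if self.events else 0.0


w = SlidingWindowSum(window=60)
for ts, value in [(0, 5), (30, 10), (61, 1)]:
    print(ts, w.add(ts, value))
```

Each event is added and evicted once, so the per-event cost stays constant however long the stream runs. That property is what makes continuous aggregation practical for real-time analytics.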
Big data analytics
Real-life examples
E-commerce company: monitors server and application health and performance by gaining real-time visibility into tens of TBs of unstructured, time-sensitive machine data, online bookings, deal analysis and coupon use.
• Avoid website outages
• Optimize the Web application
Wireless carrier: loads 10TB of CDR (call detail record) data into its system every day.
• Make the data accessible to BI tools so executives can create dashboards to analyze customer behavior
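To put the carrier's 10TB/day into infrastructure terms, it helps to convert a daily volume into a sustained ingest rate and a yearly total. The arithmetic below uses decimal units (1 TB = 10^6 MB, 1 PB = 1000 TB) and gives averages only. Peak rates will be higher, and replication or indexing overhead is not included.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_rate_mb_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s, using decimal units (1 TB = 10**6 MB)."""
    return tb_per_day * 10**6 / SECONDS_PER_DAY


def yearly_volume_pb(tb_per_day: float, days: int = 365) -> float:
    """Raw volume accumulated over `days`, in PB (1 PB = 1000 TB)."""
    return tb_per_day * days / 1000


print(f"{sustained_rate_mb_s(10):.1f} MB/s, {yearly_volume_pb(10)} PB/year")
```

At about 116 MB/s around the clock, and roughly 3.65 PB per year before replication, this workload clearly calls for the scale-out ingestion and storage discussed earlier.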
New applications and business models
Healthcare outcomes analysis
Fraud detection
Pricing optimization
Social network analysis
Traffic flow optimization
Monitoring
Customer behavior analysis
Life science research
Web application optimization
Legal discovery
Weather forecasting
Infrastructure optimization
Legend: Activity, Industry, Process
HP solution
Integrating all the pieces
Big data solutions from HP
Online storage
• HP large-scale configuration
• X9000 and 3PAR Utility Storage
• IBRIX file system
HP ProLiant DL-3xx Gen8 class of servers: increased performance, energy efficient, optimized for Hadoop implementations
Data warehouse – HP Vertica Analytics: 50-1000 times the query speed of a conventional SQL database
Search and analysis – HP Autonomy: supports 1000+ content repositories, with analysis and search for 400 file formats
HP Technology Service Consulting
• Experience
• Results
[Figure: HP big data solution stack spanning human information (semi-structured and unstructured) and extreme information (structured). Components: block storage, file storage, online storage/tiering, snapshot/mirroring, search/advanced analytics and data warehouse. Characteristics: real-time processing, low latency, high throughput and faster access to relevant data.]
Helping you make big data work for your organization
Offering
• 3-day workshop
• Enterprise search: focuses on many enterprise search systems
• Implementation and integration of Hadoop distributions
• Advanced analytics/exploratory analytics with Hadoop, Vertica and Autonomy
• Big data protection: securing, archiving and protecting data, with use cases
Problems solved
• Impacts of rapid data growth
• Impact of advanced analytics and exploratory analytics on the business
• Significance of backup and recovery, data security and compliance to the business
• Harnessing data as a rich repository of information
Benefits
• Understand the big data landscape, its challenges, benefits and critical success factors
• Define a strategy and create a roadmap
• Assess how and when to use Hadoop
• Integrate structured and unstructured data collections
• Determine when and how big data needs to be protected, archived and secured
HP Big Data Strategy Workshop
[Figure: big data technologies and the four Vs: volume, variety, velocity, value]
NEW
HP Roadmap Service for Hadoop™
PLAN for success
Offering
• Effective planning and implementation of a Hadoop strategy & deployment
• Methodical approach to roadmap building
• Executable roadmap with recommended investments, timeline & risk mitigations
Problems it solves
• Builds a strategy to head in the right direction & avoid fixing false starts
• Creates a shared vision
• Builds understanding of sources & sensitivity of data
• Identifies organizational inhibitors
• Addresses risk & mitigation
• Develops a roadmap for successful planning, deployment & support of a Hadoop platform
Benefits
• Reduces the time, cost & risk of successfully deploying Hadoop
• Leverages proven success managing extremely large HPC & Hadoop clusters
• Creates synergies with HP Vertica’s analytic database & HP Autonomy’s meaning-based computing platform
NEW
[Figure: deployment models: traditional, private cloud, managed cloud, public cloud]
TS Consulting big data follow-on service offerings
[Figure: service lifecycle with phases Initiate, Plan, Develop and Manage, covering analyze/explore, architect & validate, detailed design, implement/develop, operation/improve, archive/protect, and project management]
Big Data Discovery Workshop
Big data explore / design / architect
Data profiling / data tiering
Big Data Integration Service
Big Data IT Assurance Service
Big data monitoring, maintaining and operations support of big data software
Data archiving
Big Data Analytics Implementation Service
The ideal platform for social graphing and analytics
Products in this solution: IDOL + Vertica + Hadoop
Mobile, Explore, OEM
Human, semi-structured, structured and extreme information
Social Connectors
Executive Dashboard
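Social graphing usually starts with simple connectivity measures. The sketch below builds an undirected graph from hypothetical friend/follow pairs and ranks users by degree centrality (number of connections divided by the maximum possible, n - 1) to surface likely influencers. Degree centrality is just one basic measure, chosen for illustration. It is not a description of how IDOL, Vertica or Hadoop compute social graph analytics.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (user, user) friend/follow pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def top_influencers(edges, k=2):
    """Highest-centrality users; ties broken alphabetically for a stable order."""
    scores = degree_centrality(build_graph(edges))
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical sample social graph.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("eve", "dan")]
print(top_influencers(edges))
```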
Q&A
Find out more
Attend these sessions
• TB3011: Designing a storage cloud, 6/6, 4:00PM
• TB2957: Big Data Analytics, 06/05/2012, 2.45 PM
• BB3053: New HP Data Migration Service, Tuesday 06/05/2012, 11.15 PM
Visit these demos
• KM: HP storage services transformational journey – Converged Infrastructure Pavilion / Management
• KL: HP Storage Efficiency Analysis – Converged Infrastructure Pavilion
After the event
• Contact your sales rep
Your feedback is important to us. Please take a few minutes to complete the session survey.
Thank you