Vertica HI-IS presentation
bigdata-event.com/.../Vertica_Hi-IS_presentration.pdf
Preparing your organization to derive insight from the Internet of Things
Steve Sarsfield, @SteveSarsfield
Vertica - Hewlett Packard Enterprise
March 2017
Driving customer demand for a smarter and more personalized product experience
Predictive maintenance
Product recommendations
Electronic health records
Fraud detection
Customer support
Challenges
Handling more data
Skills to leverage new tools
License costs
Time to deliver analytics
Tuning costs
The future belongs to those who analyze without limits
With analytics free from closed infrastructure and narrow deployment options
Traditional data warehouse lock-in
Cloud analytics deployment lock-in
Hadoop and open source
HPE Vertica Enterprise
– Columnar storage and advanced compression
– Maximum performance and scalability
HPE Vertica
All built on the same trusted and proven HPE Vertica Core SQL Engine
Core HPE Vertica SQL Engine
• Advanced Analytics
• Open ANSI SQL Standards ++
• R, Python, Java, Spark, Scala
• In-database machine learning
HPE Vertica for SQL on Hadoop
– Native support for ORC and Parquet
– Support for industry-leading distributions
– No helper node or single point of failure
HPE Vertica In the Cloud
– Get up and running quickly in the cloud
– Flexible, enterprise-class cloud deployment options
The appeal of Vertica
Requirement / Proof
Extreme Optimization
• Columnar design for high performance analytics
• Aggressive compression
• Scalable to petabyte scale
Total Cost of Ownership
• Simple and predictable pricing
• No penalty for additional hardware or connected users
Ready for your Enterprise
• SQL compliant with 100% of the TPC-DS benchmark queries
• Secure and ACID compliant
• No single point of failure
Open and Compatible
• Open platform – Standards compliant SQL, Python, Java
• Working with open source community on Spark, Hadoop, Kafka, etc.
Bridging the gap between high cost legacy EDWs and Hadoop data lakes
Legacy Enterprise Data Warehouse (EDW)
– Declining performance at scale
– Built on aging technology
– Expensive, with proprietary hardware
– Limited deployment options
Data Lakes
– Low-cost storage of Big Data
– Some analytics capabilities
– Holding area for certain data
Complexity – Example: Analytics Ready for Internet of Things
Goal
• Deliver analysis of critical data at the source of the data and provide faster time to insight
Event Series JOINs
– Correlate events across streams when the times do not line up
Event Windows
– Break a sequence into subsequences based on certain events or changes
Pattern Matching
– Find matching subsequences of events, compare the frequency of event patterns
Live Aggregate Projections
– Speed up queries that rely on resource-intensive aggregate functions like SUM, MIN/MAX, COUNT and Top-K
SQL-99
– Fully ANSI SQL compliant
R, Python and Custom Analytics
– Access rich custom and predictive analytics in your favorite languages and tools, including R, Python, and custom functions
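To make the event-series ideas on this slide concrete, here is a minimal plain-Python sketch of three of them: joining two streams whose timestamps do not line up, splitting a sequence into windows when a value changes, and counting how often an event pattern occurs. It only illustrates the concepts and is not Vertica's implementation or SQL syntax. The function names (`event_series_join`, `conditional_change_windows`, `count_pattern`) and the sample data are hypothetical, and the join shown is a simple left join that carries the previous value forward.

```python
from bisect import bisect_right


def event_series_join(left, right):
    """Attach to each (ts, value) in `left` the most recent value from
    `right` at or before ts (None if there is none).
    Both inputs are lists of (ts, value) sorted by ts."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(right_ts, ts)  # count of right events with time <= ts
        previous = right[i - 1][1] if i > 0 else None
        joined.append((ts, value, previous))
    return joined


def conditional_change_windows(values):
    """Give each value a window id; a new window starts whenever the
    value differs from the one before it."""
    window_ids = []
    current = -1
    previous = object()  # sentinel that is unequal to every real value
    for v in values:
        if v != previous:
            current += 1
            previous = v
        window_ids.append(current)
    return window_ids


def count_pattern(events, pattern):
    """Count contiguous occurrences of `pattern` inside `events`."""
    n = len(pattern)
    if n == 0:
        return 0
    return sum(1 for i in range(len(events) - n + 1)
               if events[i:i + n] == pattern)


# Hypothetical sensor and valve streams with misaligned timestamps.
temps = [(1, 20.5), (4, 21.0), (9, 22.3)]
valve = [(0, "open"), (5, "closed")]
print(event_series_join(temps, valve))
# [(1, 20.5, 'open'), (4, 21.0, 'open'), (9, 22.3, 'closed')]

print(conditional_change_windows(["idle", "idle", "run", "run", "idle"]))
# [0, 0, 1, 1, 2]

clicks = ["view", "cart", "buy", "view", "cart", "view", "cart", "buy"]
print(count_pattern(clicks, ["view", "cart", "buy"]))
# 2
```

In a database these operations run as set-based queries over sorted, partitioned data. The loops above only show what each operation computes.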
How to fill analysis gaps
Customer Segmentation
Channel & Location Analysis
Net Profit
Revenue
Geospatial Data Types
In-place JOINs
Time series gap analysis
Event window functions
Sessionization
Statistical functions
With some solutions, you may be required to fill the gaps with
– Data munging
– Custom code
– Moving and copying big data
– Using generic data types
– Spinning together two or more open source projects
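Two of the capabilities listed on this slide, sessionization and time series gap analysis, are the kind of logic that otherwise ends up as custom code. A minimal sketch of what that code might look like follows. It assumes integer timestamps and readings sorted by time. The function names (`sessionize`, `fill_gaps`), the timeout, and the sample values are hypothetical and are not taken from the presentation.

```python
def sessionize(timestamps, timeout):
    """Return one session id per timestamp. A new session starts when
    the gap to the previous timestamp is greater than `timeout`."""
    session_ids = []
    session = 0
    for i, ts in enumerate(timestamps):
        if i > 0 and ts - timestamps[i - 1] > timeout:
            session += 1
        session_ids.append(session)
    return session_ids


def fill_gaps(readings, step):
    """Return one (ts, value) per `step` between the first and last
    reading, carrying the most recent value forward into empty slots.
    `readings` is a list of (ts, value) sorted by ts."""
    if not readings:
        return []
    filled = []
    idx = 0
    t = readings[0][0]
    last_ts = readings[-1][0]
    while t <= last_ts:
        # Advance to the latest reading taken at or before time t.
        while idx + 1 < len(readings) and readings[idx + 1][0] <= t:
            idx += 1
        filled.append((t, readings[idx][1]))
        t += step
    return filled


# Hypothetical click timestamps (seconds) with a 30-second timeout.
print(sessionize([0, 10, 25, 100, 110], timeout=30))
# [0, 0, 0, 1, 1]

# Hypothetical sensor readings with missing samples at t=1 and t=2.
print(fill_gaps([(0, 10), (3, 13), (4, 14)], step=1))
# [(0, 10), (1, 10), (2, 10), (3, 13), (4, 14)]
```

The slide's point is that a platform with these functions built in spares teams from writing and maintaining helpers like these.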
Perhaps the ultimate architecture is all-inclusive
Apache Spark, Hadoop and Kafka
HPE Vertica
Optimal Use Case
– Deep Analysis
– Massive scale
– Many concurrent users
Kafka
Features:
– Share data between applications that support Kafka
– Data streaming into Vertica
Hadoop
Optimal Use Case
– Data lake
– Warm, cold storage
– Data discovery
– ETL
Features:
– Analyze-in-place without data movement via native ORC and Parquet readers
– Any Hadoop
– Run ON the Hadoop cluster or ON Vertica cluster
Spark
Optimal Use Case
– Small, fast running queries
– ETL and complex event processing
– Operational analytics
Features:
– Vertica performs optimized data load from Spark
– Spark runs queries on Vertica data
“The choice was simple: the change to Vertica was much more cost effective than scaling their current Oracle system, while offering a much improved performance to execute very complex analytics use cases”
ROI: 351%
Payback: 4 months
Average annual benefit: $3,014,583
Suunto – Internet of Things (IoT)
Suunto enriches extreme-sport experience with IoT wearable analytics on Vertica
“Vertica helps provide analytics so that athletes can train better and achieve more.”
Checklist for preparing for IoT
Open up your systems (not just open source)
Skills to leverage new tools
Solutions that scale
Reconsider expensive legacy
Consider differing analytical workloads
Think outside the box - New York Genome
GCAT
Compare to Reference Gene
Gene Sequencing Data
Arthritis, Alzheimer's, Parkinson's, Asthma, Diabetes, Autism, Cancer
• Develop algorithms to find molecular cause of diseases
• Deal with errors in DNA sequencing
• Share results to community of scientists
• Compare tumor DNA to patient’s blood to find variant
• Precision medicine
• Suggest drugs to interfere with mutation
• Specific cancer drugs
3 Billion letters
150 GB per person stored raw data
450 GB with analytics
Cancer genome – just under a TB
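To get a feel for how the per-person figures on this slide add up, here is a small back-of-the-envelope sketch. The 150 GB and 450 GB values come from the slide. The cohort size of 1,000 people, the decimal conversion (1 TB = 1000 GB), and the function name `cohort_storage_tb` are assumptions made for illustration only.

```python
RAW_GB_PER_PERSON = 150       # from the slide: stored raw data
ANALYZED_GB_PER_PERSON = 450  # from the slide: with analytics


def cohort_storage_tb(people, gb_per_person):
    """Total storage in TB for a cohort, assuming 1 TB = 1000 GB."""
    return people * gb_per_person / 1000


# Hypothetical cohort of 1,000 people.
print(cohort_storage_tb(1000, RAW_GB_PER_PERSON))       # 150.0
print(cohort_storage_tb(1000, ANALYZED_GB_PER_PERSON))  # 450.0
```

Under these assumptions, a cohort of 1,000 people would need roughly 150 TB for raw data and 450 TB once analytics output is included.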
Thank you
[email protected]
Community Edition: my.vertica.com