Vertica HI-IS presentation · bigdata-event.com/.../Vertica_Hi-IS_presentration.pdf

Preparing your organization to derive insight from the internet of things

Steve Sarsfield (@SteveSarsfield)

Vertica - Hewlett Packard Enterprise

March 2017


Page 2:

Driving customer demand for a smarter and more personalized product experience

Predictive maintenance

Product recommendations

Electronic health records

Fraud detection

Customer support

Page 3:

Challenges

Handling more data

Skills to leverage new tools

License costs

Time to deliver analytics

Tuning Costs

Page 4:

The future belongs to those who analyze without limits

With analytics free from closed infrastructure and narrow deployment options

Traditional data warehouse lock-in

Cloud analytics deployment lock-in

Hadoop and open source

Page 5:

HPE Vertica Enterprise

– Columnar storage and advanced compression

– Maximum performance and scalability

HPE Vertica

All built on the same trusted and proven HPE Vertica Core SQL Engine

Core HPE Vertica SQL Engine

• Advanced Analytics

• Open ANSI SQL Standards ++

• R, Python, Java, Spark, Scala

• In-database machine learning

HPE Vertica for SQL on Hadoop

– Native support for ORC and Parquet

– Support for industry-leading distributions

– No helper node or single point of failure

HPE Vertica In the Cloud

– Get up and running quickly in the cloud

– Flexible, enterprise-class cloud deployment options

Page 6:

The appeal of Vertica

Requirement / Proof

Extreme Optimization
• Columnar design for high performance analytics
• Aggressive compression
• Scalable to petabyte scale

Total Cost of Ownership
• Simple and predictable pricing
• No penalty for additional hardware or connected users

Ready for your Enterprise
• SQL compliant to 100% of the TPC-DS benchmark queries
• Secure and ACID compliant
• No single point of failure

Open and Compatible
• Open platform – Standards compliant SQL, Python, Java
• Working with open source community on Spark, Hadoop, Kafka, etc.


Page 7:

Bridging the gap between high cost legacy EDWs and Hadoop data lakes

Legacy Enterprise Data Warehouse (EDW)

– Declining performance at scale
– Built on aging technology
– Expensive w/ proprietary hardware
– Limited deployment options

Data Lakes

– Low-cost storage of Big Data
– Some analytics capabilities
– Holding area for certain data

Page 8:

Complexity – Example: Analytics Ready for Internet of Things

Goal
• Deliver analysis of critical data at the source of the data and provide faster time to insight

Pattern Matching
Find matching subsequences of events, compare the frequency of event patterns

Live Aggregate Projections
Speed up queries that rely on resource-intensive aggregate functions like SUM, MIN/MAX, COUNT and Top-K

Event Series JOINS
Correlate events across streams when the times do not line up

Event Windows
Break a sequence into subsequences based on certain events or changes

SQL-99
Fully ANSI SQL compliant

R, Python and Custom Analytics
Access rich custom and predictive analytics in your favorite languages and tools, including R, Python, and custom functions.
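To make "correlate events across streams when the times do not line up" concrete, here is a minimal pure-Python sketch of that kind of join: each event in one stream is paired with the most recent event at or before it in the other stream. It only illustrates the idea and is not how Vertica implements event series joins. The function name `asof_join` and the two sensor streams are hypothetical.

```python
from bisect import bisect_right

def asof_join(left, right):
    """For each (ts, value) in left, attach the most recent right value
    whose timestamp is <= ts (None if there is none).
    Both inputs must be lists of (timestamp, value) sorted by timestamp."""
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_right(right_ts, ts)  # count of right events at or before ts
        prev = right[i - 1][1] if i > 0 else None
        joined.append((ts, value, prev))
    return joined

# Hypothetical sensor streams, timestamps in seconds, sampled at different times.
temperature = [(1, 20.5), (4, 21.0), (9, 22.3)]
vibration = [(0, 0.10), (3, 0.12), (5, 0.40), (8, 0.38)]

for row in asof_join(temperature, vibration):
    print(row)
# (1, 20.5, 0.1)
# (4, 21.0, 0.12)
# (9, 22.3, 0.38)
```

Carrying the previous value forward is one possible interpolation choice; a real system might also support other strategies.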

Page 9:

How to fill analysis gaps

Customer Segmentation

Channel & Location Analysis

Net Profit

Revenue

Geospatial Data Types

In-place JOINs

Time series gap analysis

Event window functions

Sessionization

Statistical functions

With some solutions, you may be required to fill the gaps with:

• Data munging
• Custom code
• Moving and copying big data
• Using generic data types
• Spinning together two or more open source projects
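As an illustration of two items on this slide, event window functions and sessionization, the following sketch shows the kind of custom code one might otherwise have to write by hand. It is an assumption-laden example and not a Vertica API: the function names `change_windows` and `sessionize`, the 300-second timeout, and the sample data are all hypothetical.

```python
def change_windows(values):
    """Assign a window id that increments whenever the value differs
    from the previous row (the first row starts window 0)."""
    ids = []
    window = 0
    for i, v in enumerate(values):
        if i > 0 and v != values[i - 1]:
            window += 1
        ids.append(window)
    return ids

def sessionize(timestamps, timeout):
    """Assign a session id that increments whenever the gap to the
    previous event exceeds `timeout`. Timestamps must be sorted."""
    ids = []
    session = 0
    for i, ts in enumerate(timestamps):
        if i > 0 and ts - timestamps[i - 1] > timeout:
            session += 1
        ids.append(session)
    return ids

# Hypothetical device states and click timestamps (seconds).
states = ["idle", "idle", "run", "run", "fault", "run"]
clicks = [0, 10, 25, 400, 410, 2000]

print(change_windows(states))   # [0, 0, 1, 1, 2, 3]
print(sessionize(clicks, 300))  # [0, 0, 0, 1, 1, 2]
```

Both functions break one ordered sequence into subsequences. The first splits on a change in value, the second on a gap in time.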

Page 10:

Perhaps the ultimate architecture is all-inclusive

Apache Spark, Hadoop and Kafka

HPE Vertica

Optimal Use Case
– Deep Analysis
– Massive scale
– Many concurrent users

Spark

Optimal Use Case
– Small, fast running queries
– ETL and complex event processing
– Operational analytics

Features:
– Vertica performs optimized data load from Spark
– Spark runs queries on Vertica data

Hadoop

Optimal Use Case
– Data lake
– Warm, cold storage
– Data discovery
– ETL

Features:
– Analyze-in-place without data movement via native ORC and Parquet readers
– Any Hadoop
– Run ON the Hadoop cluster or ON Vertica cluster

Kafka

Features:
– Share data between applications that support Kafka
– Data streaming into Vertica

Page 11:

“The choice was simple: the change to Vertica was much more cost effective than scaling their current Oracle system, while offering a much improved performance to execute very complex analytics use cases”

ROI: 351%

Payback: 4 months

Average annual benefit: $3,014,583

Page 12:

Suunto – Internet of Things (IoT)

Suunto enriches extreme-sport experience with IoT wearable analytics on Vertica

“Vertica helps provide analytics so that athletes can train better and achieve more.”

Page 13:

Checklist for preparing for IoT

Open up your systems (not just open source)

Skills to leverage new tools

Solutions that scale

Reconsider expensive legacy

Consider differing analytical workloads

Page 14:

Think outside the box - New York Genome

GCAT

Compare to Reference Gene

Gene Sequencing Data

Arthritis, Alzheimer's, Parkinson's, Asthma, Diabetes, Autism, Cancer

• Develop algorithms to find molecular cause of diseases

• Deal with errors in DNA sequencing

• Share results to community of scientists

• Compare tumor DNA to patient’s blood to find variant

• Precision medicine

• Suggest drugs to interfere with mutation

• Specific cancer drugs

3 Billion letters

150 GB of raw data stored per person

450 GB with analytics

Cancer genome – just under a TB
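To put the per-person figures above in perspective, this small sketch scales them to a cohort. The 150 GB and 450 GB values come from the slide. The cohort size of 10,000 people and the helper name `cohort_storage_tb` are assumptions made purely for illustration.

```python
RAW_GB_PER_PERSON = 150        # from the slide: raw data stored per person
ANALYZED_GB_PER_PERSON = 450   # from the slide: per person with analytics

def cohort_storage_tb(people, gb_per_person):
    """Total storage in terabytes, using 1 TB = 1000 GB."""
    return people * gb_per_person / 1000

COHORT = 10_000  # hypothetical cohort size, not from the slide

print(cohort_storage_tb(COHORT, RAW_GB_PER_PERSON))       # 1500.0
print(cohort_storage_tb(COHORT, ANALYZED_GB_PER_PERSON))  # 4500.0
print(ANALYZED_GB_PER_PERSON / RAW_GB_PER_PERSON)         # 3.0
```

Under these assumptions, analytics triples the footprint, and a 10,000-person cohort would need about 4.5 PB.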

Page 15:

Thank you

Community Edition: my.vertica.com