Post on 16-Mar-2018
Extending Open Source Big Data to the Enterprise with SAP HANA Vora
Daniel Rutschmann, Sr. Director, Database & Data Management GTM
December 8, 2016
© 2016 SAP SE or an SAP affiliate company. All rights reserved. 2
What has SAP got to do with Big Data?
• SAP is the world’s largest provider of Enterprise Application Software
• 82,400 employees in 130+ countries
• 335,000 customers in 190 countries
• 87% of the Forbes Global 2000 companies
• 74% of the world’s transaction revenue touches an SAP system
• Dealing with complex data processing problems is our daily business.
• Founding sponsor of the UC Berkeley AMPLab
Challenges With Getting Actionable Insights
• Different data formats need different computation tools
• Business analysts struggle with highly technical tools
• Lack of unified environment for production deployment
SAP HANA Vora 1.3
SAP HANA Vora is an enterprise-ready, easy-to-use in-memory distributed computing solution to help organizations uncover actionable insights from big data.
Builds upon Apache Spark
Seamless Integration with SAP HANA
Runs on Hadoop
* in beta
Distributed Computing for the Digital Enterprise
[Diagram: SAP HANA Vora on top of Hadoop and Spark, with a Distributed Transaction Log, a Disk-to-Memory Accelerator, a Data Modeler, and OLAP, Time Series, Graph, and Doc Store engines; open consumption by Data Science, Predictive, Business Intelligence, and Visualization apps]
Insights from one single solution
Enterprise-ready
Easier to use
Insights from One Single Solution
In-memory distributed computing engines
Sophisticated analytics for relational, time series, graph and JSON data
High performance even when dataset sizes exceed memory capacity
[Diagram: SAP HANA Vora on top of Hadoop and Spark, with a Distributed Transaction Log, a Disk-to-Memory Accelerator, a Data Modeler, and Relational, Time Series, Graph, and Doc Store engines; open consumption by Data Science, Predictive, Business Intelligence, and Visualization apps]
Time Series Data Analysis across big data
[Figure: temperature (°C) time series for Halifax and Waterloo]
Efficiently analyze time series data in distributed environments
• Interactive access to standard time series analysis functions using the well-known SQL language
• Efficient compression allowing analysis of more data using less memory
• Build time series models visually using Vora Data Modeler
Trend | Cyclical | Seasonal | Random | Exception
Vora Time Series Functions
Sequence of data points recorded over time
• Can be equidistant or non-equidistant
• Detect and correct errors / anomalies
• Granularization
• Standard aggregation
• Analysis
Specify a series clause during table creation
• Define the period column (timestamp)
• Provide a compression definition (optional)
• Define start/end of series (optional)
• Define the series increment (optional)
Column Functions
• Trend
• Stddev
• Median
• Linear_Approx
• Const_Approx
• Cubic_Spline_Approx
• Polynomial_Approx
Table Functions
• Auto_Corr
• Cross_Corr
• Histogram
• DFT
• Granulize
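To make two of the listed operations concrete, here is a minimal plain-Python sketch of what granularization and autocorrelation compute. It is not Vora SQL and does not use any Vora API; the function names `granulize` and `auto_corr`, the bucket-averaging choice, and the sample readings are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def granulize(points, bucket):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, value in points:
        # Floor each timestamp to the start of its bucket.
        index = (ts - epoch) // bucket
        buckets[epoch + index * bucket].append(value)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def auto_corr(values, lag):
    """Sample autocorrelation of an equidistant series at the given lag."""
    n = len(values)
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0 or not 0 <= lag < n:
        raise ValueError("need a non-constant series and 0 <= lag < len(values)")
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag))
    return num / denom


# Hypothetical half-hourly temperature readings in °C, invented for illustration.
start = datetime(2016, 12, 8, 0, 0)
readings = [(start + timedelta(minutes=30 * i), t)
            for i, t in enumerate([-4.0, -6.0, -5.0, -7.0, -3.0, -1.0])]

hourly = granulize(readings, timedelta(hours=1))
hourly_values = [v for _, v in hourly]
```

Here `hourly_values` holds one averaged value per hour, and `auto_corr(hourly_values, lag)` measures how strongly the coarsened series resembles itself when shifted by `lag` steps.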
Enterprise Ready
Production-ready, integrated solution
Metadata persistence
Out-of-the-box business functions including hierarchy processing and currency conversion
Seamless integration with SAP HANA
[Diagram: SAP HANA Vora on top of Hadoop and Spark, with a Distributed Transaction Log, a Disk-to-Memory Accelerator, a Data Modeler, and Relational, Time Series, Graph, and Doc Store engines; consumed by Data Science, Predictive, Business Intelligence, and Visualization apps and other apps; the SAP HANA platform with its in-memory store is optional]
SAP HANA and SAP HANA Vora Integration
Gain business coherence with business data and big data
[Diagram: other apps use SQL against the SAP HANA in-memory platform (in-memory store); HANA Smart Data Access connects through the Spark Controller to a Hadoop cluster running YARN and HDFS, where each node holds files and runs Vora with Spark; Vora enhances the Spark data-source API]
Utilities Customer: Data Tiering to Petabyte Scale
[Diagram: in SAP HANA, the Data Lifecycle Manager handles data movement between the hot store (column table) and the warm store (extended table), and on to a Hadoop cluster running YARN and HDFS, where each node holds files and runs Vora with Spark]
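The tiering idea can be sketched as a simple age-based placement rule. This is only an illustration of the concept, not how the Data Lifecycle Manager is configured: the 90-day and 365-day thresholds, the tier names, and the sample meter readings are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical age thresholds; a real policy would be defined per deployment.
HOT_MAX_AGE = timedelta(days=90)
WARM_MAX_AGE = timedelta(days=365)


def choose_tier(record_date, today):
    """Return the storage tier for a record based on its age."""
    age = today - record_date
    if age <= HOT_MAX_AGE:
        return "hot"      # SAP HANA column table
    if age <= WARM_MAX_AGE:
        return "warm"     # SAP HANA extended table
    return "hadoop"       # files in HDFS, queried through Vora on Spark


def plan_movement(records, today):
    """Group record ids by the tier each record should live in."""
    plan = {"hot": [], "warm": [], "hadoop": []}
    for record_id, record_date in records:
        plan[choose_tier(record_date, today)].append(record_id)
    return plan


today = date(2016, 12, 8)
meter_readings = [
    ("r1", date(2016, 11, 30)),
    ("r2", date(2016, 3, 1)),
    ("r3", date(2014, 6, 15)),
]
plan = plan_movement(meter_readings, today)
```

With these assumed thresholds, recent readings stay in memory while older ones move to cheaper storage, which is what lets the total volume grow toward petabyte scale.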
Fashion Retail Use Case: Social Media Analysis
Check out the Vora Test Drive at http://testdrive.saphanavora.com
Easier to Use
Intuitive web interface with drag-and-drop for creating data models
One SQL entry point to interact with specialized computing engines
Connect familiar analytics tools and web notebooks
[Diagram: SAP HANA Vora on top of Hadoop and Spark, with a Distributed Transaction Log, a Disk-to-Memory Accelerator, a Data Modeler, and Relational, Time Series, Graph, and Doc Store engines; consumed by Data Science, Predictive, Business Intelligence, and Visualization apps]
Data Modeler for creating business scenarios
Creating business scenario views:
• Data Browser for viewing and exporting data
• SQL Editor for writing and running SQL scripts
• Modeler to visually create data models with intuitive web interface
The Lambda Architecture
www.ymc.ch/en/lambda-architecture-part-1
Batch Layer
• High latency, high throughput
• Compute official result
Speed Layer
• Low latency
• Compute approximate update to last known result
Serving Layer
• Real-time
• Merge batch/speed results
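The three layers above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a production design: the event-count workload, the names `run_batch`, `SpeedLayer`, and `serve`, and the sensor ids are hypothetical, and the speed layer here keeps exact counts where a real one might approximate.

```python
from collections import Counter


class SpeedLayer:
    """Low-latency incremental counts for events the batch run has not seen."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, key):
        self.counts[key] += 1

    def reset(self):
        # Called once a fresh batch view covers these events.
        self.counts.clear()


def run_batch(master_log):
    """Batch layer: recompute the official counts from the full event log."""
    return Counter(master_log)


def serve(batch_view, speed_layer, key):
    """Serving layer: merge the batch view with the speed layer's update."""
    return batch_view[key] + speed_layer.counts[key]


master_log = ["sensor-a", "sensor-b", "sensor-a"]
batch = run_batch(master_log)

speed = SpeedLayer()
for key in ["sensor-a", "sensor-c"]:   # events arriving after the batch run
    master_log.append(key)             # the log is append-only
    speed.on_event(key)

merged_a = serve(batch, speed, "sensor-a")
merged_c = serve(batch, speed, "sensor-c")
```

When the next batch run finishes, the speed layer is reset and the new batch view becomes the official result, so any error in the interim update is eventually overwritten.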
Lambda Architecture: Customer Example
[Diagram labels: MQ, Kafka, LTE; Validate & Aggregate Messages; Reporting with standard BI Tools; Low Latency, High Throughput; Modern Developer Tools; Record/Replay; Spatial and Predictive Libraries; SAP HANA; SAP HANA Smart Data Streaming; All Thing History; High Speed Analytics; Immutable Copy]
Observations
• Intelligent distribution eliminates replication of data for analytics
• Each component provides fit-for-purpose analytics
• Each component scales independently for the use case at hand
• Reduced TCO
• Increased analytical agility
• Brings the code to the data
• Supports additional usage models
[Diagram labels, continued: Alerts; Things; YARN; HDFS files; Vora and Spark on each node]