Hardware & Software Acquisition Costs · 2017-05-23 · •Scalable: Performance dynamically scales...
www.pervasivedatarush.com
Pervasive DataRush™
Parallel Data Analysis with KNIME
SCALE YOUR BIG DATA APPLICATION ACROSS MANY CORES AND MULTIPLE NODES
Company Overview
Global Software Company
• Tens of thousands of users across the globe
• Americas, EMEA, Asia
• ~230 employees worldwide
Strong Financials
• $46 million revenue (Trailing 12-month)
• 40 consecutive quarters of profitability
• $36 million in the bank
• NASDAQ:PVSW since 1997
Leader in Data Innovation
• Cloud-Based and On-Premises Data Integration
• Data Management
• Web-based Business-to-Business Data Interchange
• Highly Parallel Data-Intensive and Analytic Applications
The Challenge of Big Data
[Chart: complexity versus data size, GB to PB]
HPC
• climate modeling
• seismic analysis
• fluid dynamics
Internet scale
• web indexing
• web search
Enterprise data
• custom solutions
• data quality
• data analytics
Need to deal with increased data and complexity
Pervasive DataRush™
• Scalable: Performance dynamically scales with increased core counts and increased nodes.
• High Throughput: Fast, deep analysis of large data sets with no limit on input data size.
• Cost Efficient: Maximum performance from commodity multicore servers, SMP systems and clusters.
• Easy to Implement: No complex parallel processing issues; visual and API level interfaces.
• Extensible: Extensible platform so you remain in control of development.
… a parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications
DataRush Apps Scale Up and Out
[Diagram: Analytics and Big Data Application running on Multicore, SMP, Cluster, and Hadoop Cluster]
Pervasive DataRush Architecture
[Architecture diagram; labels recovered from the slide]
• PDR Modules: DR DataMatcher, DR Recommender, DR Profiler, User-defined Modules
• Data preparation, Data analytics
• DR Core Data Prep Lib, DR Core Analytics Lib
• Dynamic Processing Graph
• DataRush SDK
• User-defined Libraries
• PDR Parallel Dataflow Engine
• JVM: Java, Python, JRuby, Scala…
• … KNIME
• Large volumes of data
• High Performance Data-intensive Application
• Quality data
• Actionable analytics
DataRush & KNIME Integration
• Desktop plug-in for DataRush usage
– Nodes for data preparation and manipulation
– Base set of parallelized data mining functionality
– Highly efficient & parallelized data staging
– Parallel execution extension
• SDK plug-in for DataRush node development
– Create your own DataRush based nodes
– Access to full DataRush APIs
– Wizard for creating DataRush based nodes
Normal Execution
[Workflow animation across several slides: successive frames carry the label "DataRush Engine spawns"; the final frame is titled "Normal Execution - Complete"]
Parallel DataRush Executor
• Capabilities
– Supports parallel execution of DataRush based nodes without intermediate staging
– Automatically splits workflows into executable graphs at staging boundaries
– Executes non-DataRush nodes including meta-nodes, for loops and branches
– Usable within desktop, command line and server environments
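The splitting capability listed above can be illustrated with a small sketch. This is not the DataRush or KNIME implementation: the `Node` class, the `is_datarush` flag, and `split_at_staging_boundaries` are hypothetical names, the workflow is simplified to a linear chain, and it assumes a staging boundary falls wherever a DataRush based node sits next to a non-DataRush node.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a workflow node (not a KNIME or DataRush class)."""
    name: str
    is_datarush: bool


def split_at_staging_boundaries(workflow):
    """Group a linear workflow into runs of consecutive nodes of the same kind.

    Assumption: each run of DataRush based nodes forms one graph that could
    execute without intermediate staging; other runs execute as regular nodes.
    """
    segments = []
    for is_dr, run in groupby(workflow, key=lambda n: n.is_datarush):
        kind = "datarush-graph" if is_dr else "regular"
        segments.append((kind, [n.name for n in run]))
    return segments


# Illustrative workflow; node names are made up for the example.
workflow = [
    Node("Read", True),
    Node("Parse", True),
    Node("Loop Start", False),
    Node("Replace", True),
    Node("Aggregate", True),
    Node("Loop End", False),
]
segments = split_at_staging_boundaries(workflow)
for kind, names in segments:
    print(kind, names)
```

A real workflow is a graph with branches rather than a chain, so an actual splitter would need to walk connections instead of a flat list.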
Parallel Execution
[Workflow animation across several slides: frames carry the label "DataRush Engine"; the final frame is titled "Parallel Execution - Complete"]
Parallel Execution – Details
[Diagram: four instances each of Parse, Replace, Aggregate, and Format, followed by Write]
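The Parse, Replace, Aggregate, Format, and Write stages named on this slide can be sketched as a partitioned pipeline. This is only an illustration in plain Python, not DataRush code: the function names, the "key,value" record layout, and the choice to route records to streams by key (so each stream's aggregate is complete) are all assumptions. Threads are used to show the structure; they are not meant as a performance claim.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def parse(lines):
    # Split each "key,value" line into a (key, int) pair.
    return [(k, int(v)) for k, v in (line.split(",") for line in lines)]


def replace(records):
    # Normalize keys by trimming whitespace and lower-casing.
    return [(k.strip().lower(), v) for k, v in records]


def aggregate(records):
    # Sum the values per key within this stream.
    totals = Counter()
    for k, v in records:
        totals[k] += v
    return totals


def format_rows(totals):
    # Render each total as a "key=value" row.
    return [f"{k}={v}" for k, v in sorted(totals.items())]


def run_stream(lines):
    # One stream runs the four stages back to back, with no staging between them.
    return format_rows(aggregate(replace(parse(lines))))


def stream_for(line, streams):
    # Assumption: route by normalized key so a key never spans two streams.
    key = line.split(",", 1)[0].strip().lower()
    return zlib.crc32(key.encode("utf-8")) % streams


def run_pipeline(lines, streams=4):
    partitions = [[] for _ in range(streams)]
    for line in lines:
        partitions[stream_for(line, streams)].append(line)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = list(pool.map(run_stream, partitions))
    # Single Write step: collect the rows from every stream.
    return [row for rows in results for row in rows]


data = ["A,1", "b,2", "a,3", "B ,5", "c,10"]
rows = run_pipeline(data)
print(sorted(rows))
```

Routing by key is one design choice; an alternative is to aggregate partial totals per stream and merge them before formatting.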
Demo
Vision for Levels of Usage
• Level 0
– No code changes on your part
– Install DataRush plug-in and most nodes will see a performance benefit
• Level 1
– Some code changes required
– Utilize DataRush to access parallelized data staging capability, bypassing the BDT API
• Level 2
– Utilize DataRush SDK to build nodes using the full parallelized flow capability of DataRush
– Available today
Demo
DataRush Benefits
• High Throughput
– Process data quickly and efficiently
– Accomplish complex processing in a single pass
• Scalable
– Takes advantage of multicore processors
– Runs faster as more cores are added
– Scales with the amount of data
• Easy to use and extend
– Dataflow abstraction hides parallelism details
– SDK to ease development
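The idea of doing several processing steps in a single pass behind a dataflow-style abstraction can be sketched with chained Python generators. This is a generic illustration, not the DataRush API; the stage names and the "name,amount" record layout are assumptions made for the example.

```python
def read_records(lines):
    # Stage 1: turn "name,amount" lines into (name, float) pairs, one at a time.
    for line in lines:
        name, amount = line.split(",")
        yield name.strip(), float(amount)


def keep_positive(records):
    # Stage 2: filter out non-positive amounts as records stream through.
    for name, amount in records:
        if amount > 0:
            yield name, amount


def summarize(records):
    # Stage 3: compute count, total, and largest record in the same pass.
    count, total, largest = 0, 0.0, None
    for name, amount in records:
        count += 1
        total += amount
        if largest is None or amount > largest[1]:
            largest = (name, amount)
    return {"count": count, "total": total, "largest": largest}


# The input is an iterator, so the data can only be read once.
lines = iter(["a,10", "b,-3", "c,25.5", "d,4.5"])
summary = summarize(keep_positive(read_records(lines)))
print(summary)
```

Each stage only describes what happens to a record; how the stages are scheduled is left to whatever runs the chain, which is the part a parallel engine would take over.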
Summary
• Scale performance on commodity multicore systems
– Massive performance exists on a single server
– Core counts growing with Moore’s Law
• Scale up and scale out
– Economical, environmental, and manageable
• Scale to big data
– Handle diverse, complex, massive data sets
• Scale development
– Easy for existing team to implement parallel applications
– Extensible platform keeps you in control
Simplify how you develop Big Data applications
Questions?