ignite data science toolkit - GridGain Systems · 2018. 9. 30. · Distributed SQL JDBC ODBC SQL...
© 2017 GridGain Systems, Inc.
In-Memory Hammer for Your Data Science Toolkit Apache Ignite
Akmal Chaudhri, Technology Evangelist, GridGain
Agenda
• Apache Ignite Overview
• Use Cases
• Data Science Toolkit Box
  • Data Grid
  • Durable Memory
  • Distributed SQL
  • Compute Grid
  • Machine Learning Grid (Beta)
• Q&A
Apache Ignite In-Memory Computing Platform
IoT, Financial Services, Pharma & Healthcare, E-Commerce, Travel & Logistics, Telco
Applications
Key/Value, SQL, Transactions, Compute, Services, ML, Streaming
Memory-Centric Storage
Ignite Native Persistence (Flash, SSD, Intel 3D XPoint)
Third-Party Persistence (RDBMS, HDFS, NoSQL)
Apache Ignite Use Cases
• FinTech
• Financial Services
• Software
• Logistics & Travel
• E-commerce
• Telco
• IoT
• Pharma & Healthcare
• Adtech
e-Therapeutics - Drug Discovery and Network Biology
e-Therapeutics provides a computer-based drug discovery platform and a specialized approach to network biology.
Problem
• Analyzing a network of proteins that influence a disease, and discovering drugs, could take weeks
• Existing algorithms could not be parallelized
Apache Ignite Solution
• 80x speed increase over the non-parallelized environment
• Analysis projects complete in hours or minutes
• Computational resources available for abandoned research projects
e-Therapeutics Platform
• Cache & Compute API
• 100x Cluster Nodes, 5x Physical Nodes
• Server Nodes and Client Nodes
Data Grid
• Distributed Key-Value Store
• Distributed partitioned hash map
• Dynamic Scaling
• ACID Transactions
• JCache & SQL
• 3rd party storage caching (RDBMS, NoSQL, HDFS)
[Diagram: three Server Nodes, each backed by Durable Memory and exposing JCache, Transactions, Compute and SQL, in front of RDBMS, NoSQL and HDFS stores]
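The "distributed partitioned hash map" and "dynamic scaling" points can be illustrated with a small, single-process sketch. It is not Ignite code: the class `PartitionedMap`, the node names, and the partition count of 1024 are assumptions made for illustration, and rendezvous hashing is used here as one common way to assign partitions to nodes.

```python
import hashlib

PARTITIONS = 1024  # assumed partition count for this sketch


def stable_hash(value):
    """Deterministic hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def partition_of(key):
    """Map a key to one of a fixed number of partitions."""
    return stable_hash(key) % PARTITIONS


def owner_of(partition, nodes):
    """Rendezvous hashing: the node with the highest score owns the partition."""
    return max(nodes, key=lambda node: stable_hash(f"{node}:{partition}"))


class PartitionedMap:
    """Toy key-value store whose entries are spread across named nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.storage = {node: {} for node in self.nodes}

    def put(self, key, value):
        self.storage[owner_of(partition_of(key), self.nodes)][key] = value

    def get(self, key):
        return self.storage[owner_of(partition_of(key), self.nodes)].get(key)

    def add_node(self, node):
        """Dynamic scaling: move only the entries whose partition the new node now owns."""
        old_nodes = list(self.nodes)
        self.nodes.append(node)
        self.storage[node] = {}
        moved = 0
        for old in old_nodes:
            for key in list(self.storage[old]):
                if owner_of(partition_of(key), self.nodes) == node:
                    self.storage[node][key] = self.storage[old].pop(key)
                    moved += 1
        return moved
```

Calling `add_node` returns the number of entries that were rebalanced; with this scheme an entry either stays where it was or moves to the new node, so the rest of the map is untouched.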
Durable Memory
• Off-heap
• Removes noticeable GC pauses
• Automatic Defragmentation
• Stores Superset of Data
• Predictable memory consumption
• Fully Transactional (Write-Ahead Log)
[Diagram: Ignite Cluster of three Server Nodes, each with Durable Memory]
Ignite Native Persistence
1. Update (RAM)
2. Persist (Write-Ahead Log)
3. Ack
4. Checkpointing (Partition File 1 to Partition File N)
[Diagram: the four steps inside a single Server Node]
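The four steps above can be sketched as a toy model. This is a simplified illustration, not Ignite's implementation: the class `PersistentNode`, its methods, and the way keys map to partition files are all assumptions, and the "files" are plain in-memory dictionaries.

```python
class PersistentNode:
    """Toy model of the update path: RAM update, WAL append, ack, checkpoint."""

    def __init__(self, partitions=4):
        self.ram = {}            # in-memory data
        self.dirty = set()       # keys changed since the last checkpoint
        self.wal = []            # append-only write-ahead log
        self.partition_files = [{} for _ in range(partitions)]

    def _partition(self, key):
        # Hypothetical key-to-partition mapping, deterministic across runs.
        return sum(str(key).encode()) % len(self.partition_files)

    def update(self, key, value):
        self.ram[key] = value               # 1. Update
        self.dirty.add(key)
        self.wal.append((key, value))       # 2. Persist
        return "ack"                        # 3. Ack

    def checkpoint(self):
        # 4. Checkpointing: copy changed entries from RAM to partition files.
        for key in self.dirty:
            self.partition_files[self._partition(key)][key] = self.ram[key]
        self.dirty.clear()
        self.wal.clear()  # simplification: the log is dropped once its changes are in the files

    def crash(self):
        """Simulate a restart: everything held only in RAM is lost."""
        self.ram = {}
        self.dirty = set()

    def recover(self):
        """Rebuild RAM from the partition files, then replay the WAL on top."""
        for partition_file in self.partition_files:
            self.ram.update(partition_file)
        for key, value in self.wal:
            self.ram[key] = value
            self.dirty.add(key)
```

The point of the sketch is the ordering: an update is acknowledged once it is in the log, so a crash between checkpoints loses nothing, because recovery replays the log over the last checkpointed state.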
Distributed SQL
• JDBC, ODBC, SQL API
• Java, .NET, C++, BI Tools
• SELECT, UPDATE, INSERT, MERGE, DELETE, CREATE and ALTER
• DDL, DML Support
• Cross-platform Compatibility
• Indexes in RAM or Disk
• Dynamic Scaling
[Diagram: Apache Ignite Cluster of three Server Nodes, each with Durable Memory]
Collocated Joins
1. Initial Query
2. Query execution over local data
3. Reduce multiple results in one
Non-Collocated Joins
1. Initial Query
2. Query execution (local + remote data)
3. Potential data movement
4. Reduce multiple results in one
[Diagrams: a Client Node querying two Server Nodes with on-disk data; the numbered arrows match the steps above]
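The difference between the two join modes can be sketched with plain Python data structures. Everything here is hypothetical (the tables, the two-node layout, the `node_for` placement rule); it only shows why placing related rows on the same node avoids the data movement step.

```python
NODES = 2

# Hypothetical tables: cities keyed by id, persons as (id, name, city_id).
CITIES = {1: "London", 2: "Paris", 3: "Berlin"}
PERSONS = [(10, "Ann", 1), (11, "Bob", 2), (12, "Eve", 3), (13, "Dan", 2)]


def node_for(key):
    """Assumed placement rule for this sketch: key modulo node count."""
    return key % NODES


def distribute(affinity):
    """Place cities by id and persons by the chosen affinity key."""
    nodes = [{"cities": {}, "persons": []} for _ in range(NODES)]
    for city_id, city in CITIES.items():
        nodes[node_for(city_id)]["cities"][city_id] = city
    for person in PERSONS:
        nodes[node_for(affinity(person))]["persons"].append(person)
    return nodes


def join(nodes):
    """Join persons to cities on every node, then reduce. Returns (rows, remote lookups)."""
    rows, remote_lookups = [], 0
    for node in nodes:                                   # query execution on each node
        for _, name, city_id in node["persons"]:
            city = node["cities"].get(city_id)
            if city is None:                             # row lives elsewhere: data movement
                city = nodes[node_for(city_id)]["cities"][city_id]
                remote_lookups += 1
            rows.append((name, city))
    return sorted(rows), remote_lookups                  # reduce multiple results in one


collocated = distribute(affinity=lambda person: person[2])      # persons stored with their city
non_collocated = distribute(affinity=lambda person: person[0])  # persons stored by their own id
```

Both layouts produce the same joined rows, but `join(collocated)` needs no remote lookups, while `join(non_collocated)` has to reach across nodes for rows stored elsewhere.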
Compute Grid
• Automatic Failover
• Load Balancing
• Zero Deployment
C = Compute, R = Result
C = C1 + C2
R = R1 + R2, in T/2 time
[Diagram: Ignite Cluster of two nodes with Durable Memory; one runs C1 and returns R1, the other runs C2 and returns R2]
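The split C = C1 + C2 and the merge R = R1 + R2 can be sketched as follows. Worker threads stand in for cluster nodes here, and the function names and the sum-of-squares workload are assumptions; the sketch shows the structure of the computation, not the T/2 speed-up, which depends on real nodes running in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def compute(chunk):
    """C1 or C2: a partial computation over one node's share of the data."""
    return sum(x * x for x in chunk)


def run_on_cluster(data, nodes=2):
    """Split the job across `nodes` workers and combine the partial results."""
    chunks = [data[i::nodes] for i in range(nodes)]      # C = C1 + C2
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_results = list(pool.map(compute, chunks))  # R1, R2
    return sum(partial_results)                          # R = R1 + R2
```

For example, `run_on_cluster(list(range(10)))` gives the same answer as computing the sum of squares in one place, because the partial sums add up to the whole.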
Client-Server Processing
1. Initial Request
2. Fetch data from remote nodes
3. Process entire data-set
Co-located Processing
1. Initial Request
2. Co-located processing with data
3. Reduce multiple results in one
[Diagrams: a Client Node and two Server Nodes with on-disk data (Data 1, Data 2); the numbered arrows match the steps above]
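A rough way to see the trade-off is to count how many values cross the network in each mode. The two-node `cluster` dictionary and the "find the maximum" task below are hypothetical, chosen only to keep the sketch short.

```python
# Hypothetical cluster: each node holds its own slice of the data.
cluster = {
    "node-1": list(range(0, 1000)),
    "node-2": list(range(1000, 2000)),
}


def client_server_max(nodes):
    """Fetch every value to the client, then process the entire data set there."""
    dataset, values_moved = [], 0
    for data in nodes.values():          # fetch data from remote nodes
        dataset.extend(data)
        values_moved += len(data)
    return max(dataset), values_moved    # process entire data set on the client


def colocated_max(nodes):
    """Run the computation where the data lives and ship back only the partial results."""
    partials = [max(data) for data in nodes.values()]   # co-located processing with data
    return max(partials), len(partials)                 # reduce multiple results in one
```

Both functions return the same maximum, but the client-server version moves every value to the client while the co-located version moves one partial result per node.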
Genetic Algorithms Grid
• Biological Evolution Simulation
• Chromosome and Genes Cluster
• Collocated Computation
F = Fitness Calculation, C = Crossover, M = Mutation
F = F1 + F2
C = C1 + C2
M = M1 + M2
[Diagram: Ignite Cluster of two nodes with Durable Memory; one runs F1, C1, M1 and the other F2, C2, M2]
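The three operations F, C and M can be illustrated with a minimal, single-process genetic algorithm. This is a generic sketch, not Ignite's Genetic Algorithms Grid API: the bit-string chromosomes, the "count the ones" fitness function, and all parameter values are assumptions. In a cluster, each node would apply the same steps to its own share of the population.

```python
import random

GENES = 20  # genes per chromosome (assumed for this sketch)


def fitness(chromosome):
    """F: score a chromosome; here simply the number of 1-genes."""
    return sum(chromosome)


def crossover(a, b, rng):
    """C: combine two parents at a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.05):
    """M: flip each gene with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def evolve(generations=60, population_size=30, seed=42):
    """Evolve a population and return its fittest chromosome."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENES)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]     # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.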
Machine Learning Grid
• Distributed Algorithms: K-Means, Regressions, Decision Trees, Random Forest
• Distributed Core Algebra
• Dense and Sparse Algebra
• Large scale parallelization
• Multi-Language Support: R, C++, Python, Java, Scala, REST
• No ETL
[Diagram: three Server Nodes, each with Durable Memory]
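As an illustration of how an algorithm such as K-Means can run over partitioned data without moving it, here is a minimal one-dimensional sketch. It is not the Ignite ML API: the function names and the list-of-lists "partitions" are assumptions. Each partition reports only per-centroid sums and counts, and those small summaries are merged to update the centroids.

```python
def closest(point, centroids):
    """Index of the centroid nearest to a 1-D point."""
    return min(range(len(centroids)), key=lambda i: (point - centroids[i]) ** 2)


def local_stats(partition, centroids):
    """Runs next to the data: per-centroid [sum, count] for one partition."""
    stats = [[0.0, 0] for _ in centroids]
    for point in partition:
        i = closest(point, centroids)
        stats[i][0] += point
        stats[i][1] += 1
    return stats


def kmeans(partitions, centroids, iterations=10):
    """Update centroids from merged partition summaries; raw points never move."""
    for _ in range(iterations):
        merged = [[0.0, 0] for _ in centroids]
        for partition in partitions:                     # one call per node
            for i, (total, count) in enumerate(local_stats(partition, centroids)):
                merged[i][0] += total
                merged[i][1] += count
        # A centroid with no assigned points keeps its previous position.
        centroids = [total / count if count else old
                     for (total, count), old in zip(merged, centroids)]
    return centroids
```

With two partitions `[[1.0, 2.0, 10.0], [3.0, 11.0, 12.0]]` and starting centroids `[0.0, 5.0]`, the centroids settle on the means of the two obvious clusters.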
Any Questions?
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
#apacheignite #denismagda