Operational Analytics - on-demand.gputechconf.com · End-to-End Accelerated GPU Data Science...

100x Operational Analytics The RAPIDS SQL Engine


Page 1:

100x Operational Analytics
The RAPIDS SQL Engine

Page 2:

SQL in Python on GPUs

gdf = bc.sql('select count(*) from table').get()

@blazingsql

Page 3:

conda install


launch a notebook

run queries

Page 4:

Faster

Cheaper

Easier

Page 5:

End-to-End Accelerated GPU Data Science
Introducing the Open-Source RAPIDS Library Suite

Data Preparation
• cuDF, cuIO (DataFrame)

Model Training
• cuML (Machine Learning)
• cuGraph (Graph Analytics)
• PyTorch, Chainer, MxNet (Deep Learning)

Visualization
• cuXfilter <> pyViz (Visualization)

Dask

GPU Memory

Page 6:

End-to-End Accelerated GPU Data Science
Introducing the Open-Source RAPIDS Library Suite

Data Preparation
• cuDF, cuIO (DataFrame)
• BlazingSQL (SQL Engine)

Model Training
• cuML (Machine Learning)
• cuGraph (Graph Analytics)
• PyTorch, Chainer, MxNet (Deep Learning)

Visualization
• cuXfilter <> pyViz (Visualization)

Dask

GPU Memory

Page 7:

Storage Plugins

Data Lake

Supported:
• AWS S3
• Google Cloud Storage
• HDFS

Coming Soon:
• Azure Blob

File Readers (cuIO):
• CSV
• JSON
• Apache Parquet
• Apache ORC

GPU Memory

Page 8:

YOUR DATA: CSV, GDF, Pandas, Parquet, JSON

ETL / Feature Engineering

MACHINE LEARNING

BlazingSQL >> cuDF > XGBoost

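from blazingsql import BlazingContext
import cudf

# Create the BlazingSQL context that runs queries on the GPU
bc = BlazingContext()
# Register the S3 bucket 'bsql' under the prefix 'bsql'
bc.s3('bsql', bucket_name='bsql', access_key_id='<access_key>', secret_key='<secret_key>')
# Create a table from the files under s3://bsql/orders/
bc.create_table('orders', 's3://bsql/orders/')
# Run the query on the GPU and fetch the result
gdf = bc.sql('select * from orders').get()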
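The pipeline on this slide (BlazingSQL >> cuDF > XGBoost) hands the query result to XGBoost while the data stays in GPU memory. The following is a minimal sketch of that handoff, not code from the talk. It assumes a CUDA environment with xgboost installed and a cuDF DataFrame such as the gdf produced above. The column name "label", both function names, and the training parameters are hypothetical.

```python
def feature_columns(columns, label):
    """Return every column name except the label (pure Python, no GPU needed)."""
    return [c for c in columns if c != label]


def train_on_gdf(gdf, label="label", rounds=10):
    """Train an XGBoost model directly on a cuDF DataFrame.

    `gdf` is assumed to be a cuDF DataFrame, for example a BlazingSQL query
    result. `label` is a hypothetical column name. The import is deferred so
    this file can be loaded on machines without a GPU stack.
    """
    import xgboost as xgb  # deferred: only needed when actually training

    features = feature_columns(list(gdf.columns), label)
    # DMatrix accepts cuDF input, so the data does not have to leave the GPU.
    dtrain = xgb.DMatrix(gdf[features], label=gdf[label])
    # Assumed settings for illustration; tune them for a real workload.
    params = {"tree_method": "gpu_hist", "objective": "reg:squarederror"}
    return xgb.train(params, dtrain, num_boost_round=rounds)
```

With a GPU available, usage would look like model = train_on_gdf(gdf, label="label"), where the label column has to exist in the orders table.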


Page 9:

Charts on this slide:
• Netflow Demo Timings (BlazingSQL >> cuDF > Graphistry), time in seconds
• XGBoost Demo Timings (BlazingSQL >> cuDF > XGBoost), time in seconds
• Configurations compared: 15.6GB on 1 x T4 GPU and 15.6GB on 4 nodes
• Data labels: 84.40, 0.87, $0.90, $0.04

Cost to run the ETL workloads on Google Cloud Platform

Page 10:

GCP: 5 x n1-standard-4 (Tesla T4 GPU) w/ Local NVME

• TPC-H SF100 Query Times - NVME Storage

Page 11:

GCP: 5 x n1-standard-4 (Tesla T4 GPU)

• TPC-H SF100 Query Times - GCS Storage

Page 12:

GCP: 15 x n1-standard-4 (Tesla T4 GPU)

• TPC-H SF300 Query Times - GCS Storage

Page 13:


• TPC-H SF100 vs SF300 - GCS Storage
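One way to read the SF100 vs SF300 comparison: a TPC-H scale factor of N corresponds to roughly N GB of raw data, and the GCS runs above used 5 nodes for SF100 and 15 nodes for SF300. Under that reading, data per node is held constant while the cluster grows. The arithmetic is sketched below; the helper name is an illustration and the figures are approximate.

```python
def gb_per_node(scale_factor, nodes):
    """Approximate raw data per node, taking TPC-H scale factor N as about N GB."""
    return scale_factor / nodes


# Cluster sizes taken from the GCS benchmark slides: 5 nodes for SF100, 15 for SF300.
runs = {"SF100": (100, 5), "SF300": (300, 15)}
per_node = {name: gb_per_node(sf, n) for name, (sf, n) in runs.items()}

for name, gb in per_node.items():
    print("{}: about {:.0f} GB per node".format(name, gb))
```

Both configurations come out at about 20 GB per node, so similar query times across the two runs would indicate that the engine scales out roughly linearly.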

Page 14:


Demos

Page 15:

Scale out with RAPIDS

Axes: Scale Up / Accelerate (vertical), Scale out / Parallelize (horizontal)

PyData
NumPy, Pandas, Scikit-Learn, Numba and many more
Single CPU core
In-memory data

RAPIDS and Others
Accelerated on single GPU
• NumPy -> CuPy/PyTorch/..
• Pandas -> cuDF
• Scikit-Learn -> cuML
• Numba -> Numba

Dask
Multi-core and Distributed PyData
• NumPy -> Dask Array
• Pandas -> Dask DataFrame
• Scikit-Learn -> Dask-ML
• … -> Dask Futures

RAPIDS
BlazingSQL + Dask + OpenUCX
Multi-GPU
On single Node (DGX)
Or across a cluster
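The "Pandas -> cuDF" mapping on this slide works because cuDF mirrors much of the pandas API. A common way to exploit that is to prefer the GPU library when it loads and to fall back to pandas otherwise. The sketch below illustrates that pattern only; the function name and the fallback order are assumptions, not part of RAPIDS.

```python
import importlib


def pick_backend(candidates=("cudf", "pandas")):
    """Return the first importable module from `candidates`, or None.

    Listing cudf first prefers the GPU DataFrame library and falls back
    to pandas on machines without a working RAPIDS install.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except Exception:  # missing package, or a GPU stack that fails to load
            continue
    return None


if __name__ == "__main__":
    xdf = pick_backend()
    if xdf is None:
        print("neither cudf nor pandas is installed")
    else:
        # DataFrame, groupby and sum exist under the same names in both libraries.
        df = xdf.DataFrame({"key": [1, 1, 2], "value": [10, 20, 30]})
        print(xdf.__name__)
        print(df.groupby("key").sum())
```

The two libraries are not identical in every method, so code that relies on less common pandas features should be checked against the cuDF documentation.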

Page 16:

GET STARTED NOW
It’s easy to get started with BlazingSQL + RAPIDS.ai

CONDA (GET STARTED)
BlazingSQL can be installed with conda (miniconda, or the full Anaconda distribution) from the blazingsql channel.
https://anaconda.org/blazingsql

DOCKER HUB (TRY NOW)
To run BlazingSQL on your own infrastructure, you can use our container on Docker Hub.
https://hub.docker.com/u/blazingdb

GITHUB (INSTALL)
BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open-source, licensed under Apache 2.0!
https://github.com/BlazingDB/