Advantages of a Bare-Metal Cloud For GPU...

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | Advantages of a Bare-Metal Cloud For GPU Workloads Karan Batta Product Management Oracle Cloud Infrastructure


Advantages of a Bare-Metal Cloud For GPU Workloads

Karan Batta, Product Management, Oracle Cloud Infrastructure

A long time ago in a galaxy far, far away...

Our new digs…

Our Journey…

• Bare-Metal Cloud announced at Oracle Open World 2016

• Oracle Cloud Infrastructure rebranding and launch, October 2017

• Generally available GPU instances with P100 at Open World 2017

• Oracle Cloud Infrastructure’s first Super Computing in 2017

And Today…

• Volta Generally Available Today!

• NGC for OCI

• NVIDIA GRID on OCI

Our Architecture

REGION

• US-Phoenix: AD-1, AD-2, AD-3

• US-Ashburn: AD-1, AD-2, AD-3

• EU-London: AD-1, AD-2, AD-3

• EU-Frankfurt: AD-1, AD-2, AD-3

Our Architecture

DATACENTERS (AD-1, AD-2, AD-3)

• Multiple fault domains

• Completely independent datacenters

• Predictable low-latency, high-speed, encrypted interconnect between ADs

• Enables zero-data-loss architectures (e.g. Oracle MAA) and high-availability scale-out architectures (e.g. Cassandra)
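One way to picture the scale-out point is to spread the nodes of a cluster evenly across the three availability domains, so that losing any single AD leaves most of the cluster running. The sketch below is illustrative only: `place_nodes`, `survivors`, and the node names are hypothetical helpers, not part of any OCI or Cassandra API.

```python
from collections import Counter
from itertools import cycle

# Availability domain names as shown on the slide.
ADS = ["AD-1", "AD-2", "AD-3"]

def place_nodes(node_names, ads=ADS):
    """Assign nodes to availability domains round-robin (hypothetical helper)."""
    return {node: ad for node, ad in zip(node_names, cycle(ads))}

def survivors(placement, failed_ad):
    """Return the nodes that remain if one availability domain is lost."""
    return [node for node, ad in placement.items() if ad != failed_ad]

# Illustrative node names; a real cluster would use its own inventory.
nodes = [f"cassandra-{i}" for i in range(6)]
placement = place_nodes(nodes)

print(Counter(placement.values()))   # how many nodes landed in each AD
print(survivors(placement, "AD-2"))  # nodes still up if AD-2 fails
```

With six nodes and three ADs, each AD holds two nodes, so a single-AD failure leaves four of six nodes available.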

Our Architecture

PHYSICAL NETWORK

• Non-oversubscribed network: flat, fast, predictable

• Very high scale: ~1 million network ports in an AD

• Predictable low-latency, high-speed interconnect between hosts in an AD

• < 100µs expected one-way latency, 2 x 25Gb/s bandwidth
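As a rough worked example of what the quoted figures imply, the sketch below estimates an ideal host-to-host transfer time inside an AD. It assumes both 25Gb/s interfaces are used at full line rate and that the 100µs one-way latency is paid once; real transfers will be slower because of protocol overhead. `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes, link_gbps=25.0, links=2, one_way_latency_s=100e-6):
    """Ideal transfer time: one-way latency plus serialization at aggregate line rate.

    Assumes the slide's figures (2 x 25Gb/s, < 100 microseconds one way) and
    ignores protocol overhead, congestion, and storage speed.
    """
    rate_bits_per_s = link_gbps * 1e9 * links
    return one_way_latency_s + (size_bytes * 8) / rate_bits_per_s

one_terabyte = 10**12  # decimal terabyte
print(f"1 TB:  {transfer_seconds(one_terabyte):.4f} s")
print(f"4 KiB: {transfer_seconds(4096) * 1e6:.2f} microseconds")
```

Under these assumptions a 1 TB dataset takes about 160 seconds, while a small 4 KiB message is dominated by latency rather than bandwidth.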

Our Architecture

VIRTUAL NETWORK

• Highly configurable private overlay networks: moves management and IO out of the hypervisor and enables lower overhead and bare metal instances

Our Architecture

COMPUTE, STORAGE, DATABASE…

• Bare-Metal

• NVMe Storage

• VMs

• Exadata

• Load Balancer

Layers, top to bottom: COMPUTE, STORAGE, DATABASE… / VIRTUAL NETWORK / PHYSICAL NETWORK / DATACENTERS (AD-1, AD-2, AD-3) / REGION

OCI HPC Capabilities

Bare Metal Standard

• 52 Cores, 768 GB RAM

• Up to 512 TB Block Storage

• 2x 25GbE Network Interfaces

Bare Metal DenseIO

• 52 Cores, 768 GB RAM

• 51.2 TB of local NVMe SSD

• 2x 25GbE Network Interfaces

Bare Metal Pascal GPU

• 28 Cores, 192 GB RAM

• 2x Tesla P100 GPUs

• Up to 512 TB Block Storage

• 2x 25GbE Network Interfaces

• Pre-Configured Images

Block Storage

• 50 GB-2 TB volumes

• Up to 25K IOPS per volume

• 400K IOPS per host

File Storage Service

• Managed distributed file service

• NFSv3 mount point

• Pay for what you use

Bare Metal Volta GPU (Available Today)

• 52 Cores, 768 GB RAM

• 8x Tesla V100 GPUs

• NVLINK Interconnect

• Up to 512 TB Block Storage

• 2x 25GbE Network Interfaces

• Pre-Configured Images

GPU Visualization (Preview Today)

• 12 Cores, 256 GB RAM

• VM or BM GPU Instances

• NVIDIA Quadro Enabled

• Teradici & Citrix Support
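The Block Storage figures invite a quick sizing calculation: how many volumes does it take to reach a target IOPS, given the per-volume and per-host limits on the slide? The sketch below assumes IOPS add up linearly across attached volumes, which is a simplification; `volumes_to_reach` is a hypothetical helper.

```python
import math

VOLUME_IOPS = 25_000   # "Up to 25K IOPS per volume"
HOST_IOPS = 400_000    # "400K IOPS per host"

def volumes_to_reach(target_iops, per_volume=VOLUME_IOPS, host_cap=HOST_IOPS):
    """Volumes needed for a target IOPS, capped at the per-host limit.

    Assumes IOPS scale linearly with the number of attached volumes.
    """
    effective_target = min(target_iops, host_cap)
    return math.ceil(effective_target / per_volume)

for target in (100_000, 400_000, 1_000_000):
    print(f"{target:>9} IOPS -> {volumes_to_reach(target)} volumes")
```

Under this assumption, 16 volumes are enough to reach the 400K per-host ceiling, and attaching more would not raise IOPS further.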

Tesla Volta on OCI

• Generally Available today in US-Ashburn Region

• Bare-Metal Instance with 8x Tesla V100 (SXM2) GPUs all interconnected with NVLINK

• Virtual Machines with 1, 2 or 4 GPUs in an instance coming over the next few weeks

• Uses HGX-1 Open Compute Platform Design as a reference architecture

Instance Offerings

Shape      Cores  Memory  GPUs     Network    Storage                Cost
BM.GPU3.8  52     768 GB  8x V100  2x 25Gbps  Up to 512 TB of Block  $2.25/GPU/hr
VM.GPU3.4  24     360 GB  4x V100  1x 25Gbps  Up to 512 TB of Block  $2.25/GPU/hr
VM.GPU3.2  12     180 GB  2x V100  800 Mbps   Up to 512 TB of Block  $2.25/GPU/hr
VM.GPU3.1  6      90 GB   1x V100  400 Mbps   Up to 512 TB of Block  $2.25/GPU/hr
BM.GPU2.2  28     192 GB  2x P100  2x 25Gbps  Up to 512 TB of Block  $1.25/GPU/hr
VM.GPU2.1  12     104 GB  1x P100  25Gbps     Up to 512 TB of Block  $1.25/GPU/hr
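Because pricing is quoted per GPU per hour, the hourly cost of a shape is its GPU count times the per-GPU rate. The sketch below works that out from the GPU counts and prices listed above; `hourly_cost` and `job_cost` are hypothetical helpers, and the figures ignore storage, network, and any discounts.

```python
# (GPU count, price in USD per GPU per hour), taken from the table above.
SHAPES = {
    "BM.GPU3.8": (8, 2.25),
    "VM.GPU3.4": (4, 2.25),
    "VM.GPU3.2": (2, 2.25),
    "VM.GPU3.1": (1, 2.25),
    "BM.GPU2.2": (2, 1.25),
    "VM.GPU2.1": (1, 1.25),
}

def hourly_cost(shape):
    """Instance cost per hour: GPU count times the per-GPU hourly price."""
    gpus, price_per_gpu_hour = SHAPES[shape]
    return gpus * price_per_gpu_hour

def job_cost(shape, hours):
    """GPU cost of running one instance of `shape` for `hours` hours."""
    return hourly_cost(shape) * hours

for shape in SHAPES:
    print(f"{shape}: ${hourly_cost(shape):.2f}/hr")
```

For example, a BM.GPU3.8 instance comes to $18.00 per hour, so a 10-hour training run on one such instance would cost $180.00 in GPU charges.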

NGC Deployment

• Limited Availability

• Pascal & Volta Instance Shapes supported

• Deploy Deep Learning Frameworks, HPC Applications and Visualization Applications seamlessly

• Pre-Configured NGC Image available to deploy on OCI

• Flexibility to run dev/test on Virtual Machine GPU Instances and run production workloads on Bare-Metal Instances

https://cloud.oracle.com/iaas/gpu

NVIDIA GRID on OCI

• Limited Availability

• vDWS (Virtual Datacenter Workstation) on Pascal or Volta based GPU Instances

• Use Windows Server 2012/2016 or any Linux distribution

• Citrix HDX 3D Pro supported

• Teradici Cloud Access Software Supported

Bare-Metal Matters for HPC

Workload time reduced by hours!

[Chart: OCI Bare-Metal Performance vs Other Public Cloud Provider VMs. Series: BM, VM. Benchmarks: LS-DYNA, ANSYS Fluent, MILC, WRF, HPL, Stream. Data labels: 0.86, 0.9, 0.88, 0.93, 0.89, 0.82. Y-axis: 0 to 1.2.]

Oracle Open World

Mike Turner, Presenter [ Product Lead, EPIC ], 2nd October 2017

HPC on Oracle Cloud

Big Data Analytics

Oracle Cloud Infrastructure

• <1% of unstructured data is analyzed or used at all

• <50% of structured data is actively used in making decisions

• 80% of analysts’ time is spent discovering and preparing data

• >70% of employees have access to data they should NOT

Big Data Challenges

[Chart: Data Volume Growth. Curves: cost of compute; data volume / cheap storage. X-axis: time, 1950s to 2010s. Regions: No Machine Learning; Machine/Deep Learning.]

On Premises Big Data Analytics Challenges

• Scale infrastructure as demand grows

• Pay only for what you use

• Get access to the latest hardware on demand (high memory, GPUs, etc.)

• Avoid large CapEx spend

Our Strategy for Big Data Analytics on the Cloud

High Performance/Low Cost Infrastructure Offering

Big Data/Analytics/Data Management ISV Ecosystem

Big Data/Analytics Native Cloud Services offering

OCI Storage Options for Big Data Applications

HDFS over Block Storage

• Guaranteed performance with SLA

• Enterprise-grade features (clones, etc.)

HDFS over Object Store

• Low-cost long-term storage

• Scale compute independent of storage

• Ease of data sharing

HDFS over Local NVMe Storage

• Lowest cost offering

• Colocation of compute & storage

• Highest performance with guaranteed SLA

Data Lake in the Cloud

• With the cloud, we can separate compute and storage

− Easy to share data between different clusters / applications

− On-demand cluster deployment for different workloads and tenants

[Diagram: a shared Object Store data lake serving separate compute (C) clusters: a data exploration Spark cluster, a Hive workload, and a streaming workload.]

The Industry’s First End-to-End SLA

PERFORMANCE Covered No coverage No coverage

MANAGEABILITY Covered No coverage No coverage

AVAILABILITY Covered Covered Covered

OCI Differentiators for Big Data Applications on IaaS

• Lower TCO

• Bare Metal Servers

• Fault Tolerant Architectures

• Superior Storage Offering

• No Vendor Lock-in options

• Pay only for what you use

Building a Big Data Application

Data Integration/Streaming

Data Lake

Data Analytics/Filtering

Processed Data Storage/Sharing

Data Visualization/Dashboards

AI/Machine Learning

Data Management/Big Data ISV Solution Partner Ecosystem

Oracle Cloud Infrastructure

Data Management / Big Data DevOps

Optimized & Certified Terraform Template solutions for every partner

Come Partner with us!

Qubole on Oracle Cloud Infrastructure


Qubole is a Turnkey Big Data Service on Oracle Cloud Infrastructure

Simple

• A complete data platform solution

• No need to manage infrastructure

• Self-service data access across the enterprise

Agile and Fast

• Spark and Hadoop clusters in minutes

• Builds on Oracle Cloud Infrastructure performance advantages

• Get business insights faster

Cost

• Stand up your Spark or Hadoop infrastructure at a fraction of the cost

• Reduce operation and management cost

Cloudera Enterprise on Oracle Cloud

The modern platform for machine learning and analytics optimized for the cloud

[Diagram: Cloudera Enterprise platform layers.

EXTENSIBLE SERVICES

CORE SERVICES: Data Engineering, Operational Database, Analytic Database, Data Science

SHARED DATA EXPERIENCE: Data Catalog, Ingest & Replication, Security, Governance, Workload Management

SHARED STORAGE: S3 | ADLS | HDFS | KUDU]

Cloudera Data Science Workbench on Oracle Cloud

Customer: Oracle Data Cloud on OCI

• Cloud-based data management platform and 3rd party marketplace for top marketers

• Platform used by top retailers, banks, and tech companies

• Moved significant infrastructure to Oracle Cloud Infrastructure (OCI)

• 30 billion API calls per day

• 9 billion global profiles

• 7.5 trillion data points collected monthly

• 50,000 categories

• Pipeline aggregates & summarizes logs: 300TB read, 150TB write per day
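To put the daily figures in per-second terms, the sketch below converts them to average rates. It assumes load is spread evenly over 24 hours and that TB means decimal terabytes (10^12 bytes); real traffic will have peaks well above these averages. `per_second` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_second(per_day):
    """Average per-second rate for a per-day total, assuming uniform load."""
    return per_day / SECONDS_PER_DAY

api_calls_per_s = per_second(30e9)          # 30 billion API calls per day
read_gb_per_s = per_second(300e12) / 1e9    # 300 TB read per day, in GB/s
write_gb_per_s = per_second(150e12) / 1e9   # 150 TB written per day, in GB/s

print(f"API calls: {api_calls_per_s:,.0f} per second")
print(f"Log reads: {read_gb_per_s:.2f} GB/s")
print(f"Log writes: {write_gb_per_s:.2f} GB/s")
```

On these assumptions the platform averages roughly 347,000 API calls per second, and the log pipeline averages about 3.5 GB/s of reads and 1.7 GB/s of writes.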

Sample MapReduce Demo
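The demo itself is not captured in the transcript. As a stand-in rather than a reconstruction of what was shown, here is a minimal single-process sketch of the map, shuffle, and reduce phases using the classic word-count example. All function names are illustrative; a real job would run on a framework such as Hadoop or Spark.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["bare metal cloud", "bare metal GPU"])
print(counts)
```

In a distributed setting the map and reduce functions stay the same shape, while the framework handles partitioning the input and moving the grouped pairs between machines.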

Recap

• NVIDIA Tesla Volta GPUs Available Today in US Regions in Bare-Metal Shapes.

• Limited Availability of NVIDIA GPU CLOUD for easy deployment of ML & HPC Applications

• Limited Availability of NVIDIA GRID vDWS on OCI GPU Instances with CITRIX & Teradici

QUESTIONS?

[email protected] @Karan_Batta

[email protected] @_cloudguy

https://cloud.oracle.com/iaas/gpu

100+ hours of GPU for free!