Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Transcript of Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Page 1: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Andrei Yurkevich, Altoros
Indrajit Poddar, IBM
Sep 23, 2016

Page 2: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Please Note:

• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion.

• Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract.

• The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


Page 3: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

About Indrajit (a.k.a. I.P.)
Expertise:

• Accelerated Cloud Data Services, Machine Learning and Deep Learning

• Apache Spark, TensorFlow… with GPUs

• Distributed Computing (scale out and up)

• Cloud Foundry, Spectrum Conductor, Mesos, Kubernetes, Docker, OpenStack, WebSphere

• Cloud computing on High Performance Systems

• OpenPOWER, IBM POWER

Indrajit Poddar, Senior Technical Staff Member, Master Inventor, IBM
[email protected], @ipoddar

Page 4: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs



Page 5: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Sunnyvale, CA (HQ)

Page 6: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Page 7: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


We will talk about


- Current state of Deep Learning

Page 8: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


We will talk about


- Current state of Deep Learning

- Deep Learning for cancer diagnosis (Digital Pathology)

Page 9: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


We will talk about


- Current state of Deep Learning

- Deep Learning for cancer diagnosis (Digital Pathology)

- TensorFlow framework for Deep Learning

Page 10: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


We will talk about


- Current state of Deep Learning

- Deep Learning for cancer diagnosis (Digital Pathology)

- TensorFlow framework for Deep Learning

- Distributing TensorFlow with Docker

Page 11: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


We will talk about


- Current state of Deep Learning

- Deep Learning for cancer diagnosis (Digital Pathology)

- TensorFlow framework for Deep Learning

- Distributing TensorFlow with Docker

- Faster training with TensorFlow on OpenPOWER and GPUs

Page 12: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


We will talk about


- Current state of Deep Learning

- Deep Learning for cancer diagnosis (Digital Pathology)

- TensorFlow framework for Deep Learning

- Distributing TensorFlow with Docker

- Faster training with TensorFlow on OpenPOWER and GPUs

- Infrastructure for TensorFlow as a Service

Page 13: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


A picture is worth a thousand words…


http://www.wordclouds.com/

Page 14: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

What is Deep Learning?
Machine Learning in layers and hierarchies
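To make “layers and hierarchies” concrete, here is a minimal sketch (not from the slides; layer sizes and names are illustrative) of stacking layers in TensorFlow, where each layer learns features built on the previous layer’s output:

import tensorflow as tf

# Input: e.g. flattened 28x28 images (sizes are illustrative)
x = tf.placeholder(tf.float32, [None, 784])

# Layer 1 learns low-level features from the raw pixels
w1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.zeros([256]))
h1 = tf.nn.relu(tf.matmul(x, w1) + b1)

# Layer 2 learns higher-level features from layer 1's output
w2 = tf.Variable(tf.random_normal([256, 10]))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h1, w2) + b2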

Page 15: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Face classification example

Page 16: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Medical Data Analysis Example: Image classification
Comparing classification by humans and by machines

Detected by a Doctor visually

Caught by a Trained model

Page 17: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Time Scale: Before, Digital Pathology, Deep Learning

Timeline (1980, 1990, 1997, 2005): video cameras; progress in functional telemedicine; robotic microscopy; first fully functional WSI scanner. In parallel on the deep learning side: introduction of ANNs; the backpropagation algorithm (Yann LeCun et al.); “Deep Learning” for speech recognition.

Page 18: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Machines are now learning the way we learn


From "Texture of the Nervous System of Man and the Vertebrates" by Santiago Ramón y Cajal.

Artificial Neural Networks

Page 19: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Deep Learning is improving in accuracy


Page 20: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Time Scale: Advances in Deep Learning

Timeline (2005, 2010, 2015, 2016): Whole Slide Image (WSI) scanner; GPUs; CPUs with 12 cores per socket and 8 threads per core.

Page 21: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Open Source Deep Learning Libraries

IBM Machine Learning and Deep Learning distribution for Ubuntu on OpenPOWER: http://openpowerfoundation.org/blogs/openpower-deep-learning-distribution/ (does not include TensorFlow and DL4J in the current release)

Page 22: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Why TensorFlow?
• Authored by Google
• Open source
• TensorFlow has a Python API
• Use Jupyter notebooks and examples to learn
• Distributed training

https://www.tensorflow.org/
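As a quick illustration of the Python API, a minimal sketch (not from the slides): build a graph, then run it in a session.

import tensorflow as tf

a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b                    # builds a graph node; nothing runs yet

with tf.Session() as sess:   # the session executes the graph
    print(sess.run(c))       # prints 6.0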

Page 23: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Why distribute in clusters and why use GPUs?

• Input data sets are becoming larger

• High resolution images

• Video feeds

• Large number of training features

• Training times are very long (hours, days and weeks)

• Moore’s law is dying

• CPUs are not getting any faster

• Even the largest machine has limited capacity


Page 24: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Distributed Deep Learning using TensorFlow
• TensorFlow (version 0.8.0 and later) can distribute compute-intensive tasks on multiple nodes
• Parameter Server for storing parameters (the weight matrix)
• Computations are performed in clients (workers)
• Once computed, gradients are sent to the Parameter Server to update the stored parameters

[Diagram: SuperVessel private network with nodes #1 through #10, each running Worker tasks, plus a Parameter Server task]

# define Parameter Server jobs:
with tf.device('/job:ps/task:%d' % taskID):
    ...

# define Worker jobs:
with tf.device('/job:worker/task:%d' % taskID):
    ...
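A slightly fuller sketch of how such a cluster can be wired up with the TensorFlow distributed runtime (host names, ports, and tensor shapes below are hypothetical, not from the slides):

import tensorflow as tf

# Describe the cluster: one parameter server task and several worker tasks
cluster = tf.train.ClusterSpec({
    "ps": ["node1:2222"],
    "worker": ["node1:2223", "node2:2222", "node10:2222"],
})

# Each process starts a server for its own job name and task index
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Parameters (the weight matrix) live on the parameter server task
with tf.device("/job:ps/task:0"):
    weights = tf.Variable(tf.zeros([784, 10]))

# Compute-intensive ops run in a worker task
with tf.device("/job:worker/task:0"):
    x = tf.placeholder(tf.float32, [None, 784])
    logits = tf.matmul(x, weights)

with tf.Session(server.target) as sess:
    sess.run(tf.initialize_all_variables())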

Page 25: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

TensorFlow cluster

Page 26: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

The Problem: automated detection of metastases in whole-slide images of lymph node sections (source: Camelyon16).
The Solution: train a Deep Learning model and classify the whole-slide histology image at “Level 0”.

Medical Data Analysis Example

Page 27: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Questions to address

- How long does it take to train a model?

- How will performance scale with the cluster size?

- How will scaling the cluster out affect accuracy?

Page 28: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Deep Learning in a TensorFlow cluster
Goal: improve the training time for Camelyon16 without losing accuracy significantly.

100K images, ~2 GB; 4 training epochs (~5.5k iterations at batch size 72); VGG model.
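The iteration count follows directly from the dataset size, batch size, and number of epochs (a quick check, assuming one full pass over the 100K images per epoch):

images = 100000
batch_size = 72
epochs = 4
iterations = epochs * images // batch_size
print(iterations)   # 5555, i.e. roughly 5.5k iterations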

Page 29: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Medical Data Analysis Example: applying Deep Learning
Goal: improve the training time for Camelyon16 without losing accuracy significantly.

100K images, ~2 GB; 4 training epochs (~5.5k iterations at batch size 72); VGG model.

Page 30: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Medical Data Analysis Example: applying Deep Learning
Accuracy metrics: ROC

100K images, ~2 GB; 4 training epochs (~5.5k iterations at batch size 72); VGG model.
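For reference, the ROC curve and its area can be computed from per-tile ground-truth labels and predicted tumor probabilities; a minimal sketch with scikit-learn (the arrays below are placeholders, not results from this work):

import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 1, 0])                 # ground-truth tile labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2])   # predicted tumor probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC =", auc(fpr, tpr))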

Page 31: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Infrastructure Components for Deep Learning as a Service


Page 32: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Deep Learning Cluster as a Service


Page 33: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs

Example Dockerfile to create Deep Learning images

FROM ppc64le/ubuntu:14.04
MAINTAINER Mike Hollinger <[email protected]>

#bring in some base utils
RUN apt-get -y update && apt-get -y install software-properties-common wget build-essential bash-completion #enable apt-add-repository and wget for the next line and for the cuda installer to work correctly
RUN apt-get -y install dictionaries-common #inexplicably, this needs to be first before vnc-related things will install successfully

#install VNC and VNC-related items
RUN apt-get -y install x11vnc xfce4 xvfb xfce4-artwork xubuntu-icon-theme

#install advanced toolchain and Linux SDK
RUN wget ftp://public.dhe.ibm.com/software/server/iplsdk/v1.9.0/packages/deb/repo/dists/trusty/B346CA20.gpg.key -O /tmp/B346CA20.gpg.key
RUN wget ftp://ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu/dists/precise/6976a827.gpg.key -O /tmp/6976a827.gpg.key
RUN wget http://public.dhe.ibm.com/software/server/POWER/Linux/xl-compiler/eval/ppc64le/ubuntu/public.gpg -O /tmp/xl_public.gpg
RUN apt-key add /tmp/B346CA20.gpg.key
RUN apt-key add /tmp/6976a827.gpg.key
RUN apt-key add /tmp/xl_public.gpg
RUN add-apt-repository "deb ftp://ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at9.0"
RUN apt-get -y update
RUN apt-get -y install advance-toolchain-at9.0-runtime \
    advance-toolchain-at9.0-devel \
    advance-toolchain-at9.0-perf \
    advance-toolchain-at9.0-mcore-libs

#install XL C/C++ Community Edition, auto-accepting the license (from Ke Wen Lin)
RUN apt-get -y install xlc.13.1.4 xlc-license-community.13.1.4
RUN mkdir -p /opt/ibm/xlC/13.1.4/lap/license/ && chmod a+rx /opt/ibm/xlC/13.1.4/lap/license
RUN echo "Status=9" >/opt/ibm/xlC/13.1.4/lap/license/status.dat
RUN /opt/ibm/xlC/13.1.4/bin/xlc_configure
RUN apt-get -y install ibm-sdk-lop

#bring in the ibm mldl PPA
RUN apt-add-repository -y ppa:ibmpackages/mldl

#bring local cuda repo with GPU driver 352.39 and CUDA 7.5
RUN wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb && \
    dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb && apt-get update && \
    apt-get install -y --no-install-recommends --force-yes cuda gpu-deployment-kit && \
    ln -s /usr/lib/nvidia-352/libnvidia-ml.so /usr/lib/libnvidia-ml.so && \
    rm cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb

#bring in and install cudnn
COPY rootfs/cudnn-7.0-linux-ppc64le-v3.0-prod.tgz /tmp/cudnn-7.0-linux-ppc64le-v3.0-prod.tgz
RUN tar --no-same-owner -xvf /tmp/cudnn-7.0-linux-ppc64le-v3.0-prod.tgz -C /usr/local #copy then untar to handle ownership problems vs "add"

#install the MLDL frameworks
RUN apt-get update && apt-get -y install torch caffe theano

Slide callouts: “install GPU drivers and libraries” and “install deep learning software”.

Page 34: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Cluster components to manage compute resources


Docker containers and images


Page 35: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


SuperVessel: an OpenStack- and Docker-based research cloud


https://ny1.ptopenlab.com/bigdata_cluster/

Page 36: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Mesos with Marathon with Docker and GPUs

Page 37: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


OpenPOWER: GPU support



Credit: Kevin Klaues, Mesosphere

IBM Spectrum Conductor includes enhanced support for fine-grained GPU and CPU scheduling with Apache Spark and Docker.

Mesos supports GPUs.
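A hedged sketch of what submitting a GPU-enabled application to Marathon (Page 36) can look like through its REST API; the Marathon URL, image name, and resource values are hypothetical, and the "gpus" field with the MESOS container type requires a GPU-enabled stack (roughly Mesos 1.0+ with Marathon 1.3+):

import requests

app = {
    "id": "/tensorflow-worker",
    "cpus": 4,
    "mem": 16384,
    "gpus": 1,               # GPU resource request (version-dependent feature)
    "instances": 1,
    "container": {
        "type": "MESOS",     # GPU scheduling uses the Mesos (universal) containerizer
        "docker": {"image": "mldl-ppc64le:latest"},
    },
    "cmd": "python /workspace/train.py",
}

resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app)
print(resp.status_code, resp.text)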

Page 38: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


POWER8 Core: backbone of a big data computing system

• Enhanced Micro-Architecture
• Increased Execution Bandwidth
• SMT 8
• Transactional Memory
• Vector/Scalar Unit
• High-performance Integer & FP Vector Processor
• Optimized for Data Rich Applications

[Core diagram: VSU, FXU, IFU, DFU, ISU, LSU, and PC units]

Page 39: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Combined I/O Bandwidth = 7.6Tb/s

[Diagram: POWER8 processors with memory buffers attached over DMI links, PCIe I/O, on-node SMP links, and node-to-node links]

Putting it all together with the memory links, on- and off-node SMP links as well as PCIe, at 7.6Tb/s of chip I/O bandwidth

Page 40: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


New OpenPOWER Systems with NVLink


S822LC-hpc (“Minsky”): 2 POWER8 CPUs with 4 NVIDIA® Tesla® P100 GPUs. The GPUs are hooked directly to the CPUs using NVIDIA’s NVLink high-speed interconnect.
http://www-03.ibm.com/systems/power/hardware/s822lc-hpc/index.html

Page 41: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


OpenPOWER: Open Hardware for High Performance


Page 42: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Machine Learning and Deep Learning analytics on OpenPOWER
No code changes needed!!


ATLAS (Automatically Tuned Linear Algebra Software)

Page 43: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Challenges and what’s next


● Infrastructure issues:

○ Advanced resource scheduling with Platform Conductor and Kubernetes or Mesos

○ More GPUs per system (up to 4-16 cards) for better power efficiency and density

● TensorFlow issues:

○ Resolve problems with TF-Slim and model convergence

○ Integrate HDFS or another Distributed FS with TensorFlow

○ Try synchronous training and compare results with asynchronous training (see the sketch after this list)

● Improve model training for better accuracy:

○ Train on a 300K samples dataset

○ Increase the number of training iterations to 30 epochs

○ Two iterations of updating false-positive samples in the dataset

○ Use another model (change from VGG16 to Inception-v3)
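For the synchronous-training item above, a minimal sketch (the optimizer choice, learning rate, and replica counts are placeholders) of how tf.train.SyncReplicasOptimizer wraps a base optimizer so that gradients from all workers are aggregated before each parameter update, instead of being applied asynchronously:

import tensorflow as tf

base_opt = tf.train.GradientDescentOptimizer(learning_rate=0.01)

# Synchronous replica training: wait for gradients from the workers,
# aggregate them, then apply a single update to the shared parameters
sync_opt = tf.train.SyncReplicasOptimizer(
    base_opt,
    replicas_to_aggregate=10,
    total_num_replicas=10)

# In the training script this would replace the asynchronous optimizer:
# train_op = sync_opt.minimize(loss, global_step=global_step)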

Page 44: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


More related sessions at Edge
• Expo Center Demo
• Tue, Sept 20, 1:00-2:00PM, RM 312: Docker on IBM Power Systems: Build, Ship and Run

•Tue, Sept 20, 1:00-2:00PM, RM 313: Docker Containers for High Performance Computing

•Tue, Sept 20, 1:00-2:00PM, RM 317C: Lab: FPGA Virtualization and Operations Environment for Accelerator Application Development on Cloud

•Tue, Sept 20, 2:15-3:15PM, RM 320: Bringing the Deep Learning Revolution into the Enterprise

•Tue, Sept 20, 5:00-6:00PM, RM 308, Thu, Sep 22, 09:45 AM - 10:45 AM : Enabling Cognitive Workloads on the Cloud: GPU Enablement with Mesos, Docker and Marathon on POWER

•Wed, Sept 21, 9:45-10:45AM, RM 317 C: Lab: Fast, Scalable, Easy Machine Learning in the Cloud with OpenPOWER, GPUs and Docker

Page 45: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Notices and Disclaimers

Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.

IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

Page 46: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Notices and Disclaimers (cont’d)


Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.

Page 47: Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs


Thank You