Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Towards Better DL Frameworks

Yangqing Jia, Research Lead on AI Platforms, Facebook

Source: XKCD, [Girshick et al. CVPR 2014]

• Researchers: "I will need to reproduce the ResNet paper."

• Companies: "I need to apply DL to drive cars."

The Needs: Two sides of the same coin

• A grad-student-driven project

• Started by doing one job really well: image classification

• Adopted by industry participants

• Popular deep learning framework run by a non-profit.

Yet very minimal (10k LOC)

Democratizing Deep Learning w/ Caffe: Getting AlexNet running in 10 mins

http://caffe.berkeleyvision.org/

What makes a better DL library?

???

"MAPS"!!!

"MAPS" - Scalability

Scalability: Run fast, run far

“How do I train on multiple GPUs and machines?”

- Probably the most frequent question we got from Caffe users


[Figure: sequence of diagram frames showing the steps L1 L2 L3, L3b L2b L1b, then U3 U2 U1 together with R3 R2 R1]

The Return of MPI: "I'm your father", said Allreduce.

Allreduce

• Tree based - O(M log N)

• Ring based - O(M)

• etc.
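To make the ring-based variant concrete, here is a minimal single-process sketch that simulates it with plain Python lists. It assumes N workers that each hold a buffer of M numbers; the function name `ring_allreduce` and the chunking scheme are illustrative choices, not code from the talk or from any MPI or NCCL API. Each worker sends 2(N-1) chunks of roughly M/N elements, so per-worker traffic stays close to 2M no matter how many workers join the ring.

```python
def ring_allreduce(buffers):
    """Simulate a sum-allreduce over a ring of workers (illustrative sketch).

    buffers: list of equal-length lists, one per simulated worker.
    Returns a new list of buffers, each holding the element-wise sum.
    """
    n = len(buffers)
    m = len(buffers[0])
    # Split the M elements into n contiguous chunks (some may be empty).
    bounds = [(k * m) // n for k in range(n + 1)]
    data = [list(b) for b in buffers]

    def chunk(k):
        k %= n
        return range(bounds[k], bounds[k + 1])

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the
    # fully summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing data so all sends in a step happen "at once".
        sends = []
        for i in range(n):
            idx = chunk(i - step)
            sends.append((idx, [data[i][j] for j in idx]))
        for i in range(n):
            idx, vals = sends[i]
            dst = (i + 1) % n
            for j, v in zip(idx, vals):
                data[dst][j] += v

    # Phase 2, all-gather: circulate the finished chunks around the ring.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            idx = chunk(i + 1 - step)
            sends.append((idx, [data[i][j] for j in idx]))
        for i in range(n):
            idx, vals = sends[i]
            dst = (i + 1) % n
            for j, v in zip(idx, vals):
                data[dst][j] = v

    return data
```

For example, calling it on three workers holding `[1, 2, 3, 4]`, `[10, 20, 30, 40]` and `[100, 200, 300, 400]` leaves every worker with the same summed buffer. A real implementation would overlap these sends with computation and run them over an actual transport.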

Scalability: Sitting on top of giants

... and many more

"MAPS" - Portability

Portable System: Cloud, Mobile, IoT, Cars, Drones, Coffee makers

AI Math and Algorithms

Deployment Platforms

Portable System: Cloud, Mobile, IoT, Cars, Drones, Coffee makers

Model

auto predictor = caffe2::Predictor(model_file)

public class Predictor implements Caffe2ModelInterface;

Portable System Challenges

Still, a lot of thought needed

• Limited computation

• Battery life is a thing

• Our models may be luxurious

• Ecosystem less developed

"MAPS" - Augmented Comp Patterns

Augmented Comp Patterns: Forget about float dense math, the world is bigger

• Quantized Computation

• Sparse Math Libraries

• Model Compression

• Rethinking Existing Operations

Quantized Computation: Forget about float, the world is bigger

Bit layouts:

• float: 8 exponent bits, 23 mantissa bits

• fp16: 5 exponent bits, 10 mantissa bits

• fixed16: 16 bits

• fixed8: 8 bits
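The 8/23 and 5/10 splits above are the exponent and mantissa widths of float and fp16 (each format also has one sign bit). As a small sketch, the fields can be pulled apart with Python's standard `struct` module; the helper name `float_fields` is made up for this illustration.

```python
import struct


def float_fields(x, fmt, exp_bits, man_bits):
    """Return (sign, exponent, mantissa) bit fields of x.

    fmt is a struct format: "<f" for 32-bit float, "<e" for 16-bit fp16.
    exp_bits and man_bits are the field widths (8/23 for float, 5/10 for fp16).
    """
    int_fmt = {"<f": "<I", "<e": "<H"}[fmt]  # same-size unsigned integer
    (bits,) = struct.unpack(int_fmt, struct.pack(fmt, x))
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa


# 1.0 has a zero mantissa and an exponent equal to the format's bias.
print(float_fields(1.0, "<f", 8, 23))   # (0, 127, 0)
print(float_fields(1.0, "<e", 5, 10))   # (0, 15, 0)
print(float_fields(-2.5, "<e", 5, 10))  # (1, 16, 256)
```

With only 10 mantissa bits, fp16 keeps roughly three decimal digits, which is the precision traded away for the smaller format.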

Quantized Computation: Forget about float, the world is bigger

Cost per operation:

Operation      Cost
float add      0.9
fp16 add       0.4
fixed16 add    0.05
fixed8 add     0.03
float mul      4.0
fp16 mul       1.0
fixed8 mul     0.2

Why?

Source: Nvidia https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
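One common way to move a model from float to an 8-bit fixed representation is symmetric linear quantization: pick a scale so the largest magnitude maps to 127, round, and multiply back when a float is needed. The sketch below assumes that scheme; the talk does not say which scheme it has in mind, and the function names `quantize` and `dequantize` are illustrative.

```python
def quantize(values, num_bits=8):
    """Symmetric linear quantization of floats to signed integers (sketch)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    # Avoid a zero scale when every value is 0.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest step and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Each restored value is within half a quantization step of the original.
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(worst <= scale / 2 + 1e-12)
```

The round-trip error is bounded by half a step, so the open question becomes whether the model tolerates that error, which is where the training and tuning challenges come in.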

Rethinking Existing Operations: ResNeXt is coming to town

[Figure: AlexNet Group Conv, with two gconv branches, next to ResNeXt, with many parallel groups (g g g ...)]
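A grouped convolution splits the input and output channels into g independent groups, so each output channel only sees the input channels of its own group. The sketch below is an illustration under simplifying assumptions (a 1x1 kernel applied at a single pixel, pure Python lists); the function names and the 256-channel example sizes are made up for the example, not taken from AlexNet or ResNeXt.

```python
def grouped_conv_params(c_in, c_out, kernel, groups):
    """Weight count of a grouped convolution (bias terms ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * kernel * kernel


def grouped_pointwise_conv(x, weights, groups):
    """Apply a 1x1 grouped convolution to one pixel.

    x: list of c_in channel values.
    weights: one matrix per group, each with c_out/groups rows
             of c_in/groups weights.
    """
    in_per_group = len(x) // groups
    out = []
    for g in range(groups):
        x_g = x[g * in_per_group:(g + 1) * in_per_group]
        for row in weights[g]:
            out.append(sum(w * v for w, v in zip(row, x_g)))
    return out


# Dense 3x3 conv vs. the same layer split into 32 groups.
print(grouped_conv_params(256, 256, 3, 1))   # 589824
print(grouped_conv_params(256, 256, 3, 32))  # 18432

# Two groups: channels [1, 2] and [3, 4] are transformed independently.
print(grouped_pointwise_conv(
    [1, 2, 3, 4],
    [[[1, 0], [0, 1]], [[2, 0], [0, 2]]],
    groups=2,
))  # [1, 2, 6, 8]
```

Splitting into g groups divides the weight count by g, which is why the operation needs kernels tuned for many small matrix multiplies rather than one large one.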

Augmented Math Challenges: Forget about float, the world is bigger

• Solutions

- Eigen fp16

- CuDNN

- NNPack

- gemmlowp

• Challenges

- Seamless conversion?

- Model training?

- Performance tuning?

- ...

"MAPS" - Modularity

A Repeated Pattern

Many key components in deep learning are reusable across frameworks.

In 2013 it used to be...

Caffe Torch Theano ...

Unix Philosophy?

• Applications: Caffe, Torch, TF, MXNet, etc...

• Core Math: Eigen, CuDNN, NNPack, THNN, MKL

• Comms: NCCL, MPI, ZeroMQ, Redis, ...

• Low Level: CUDA, OpenGL, OpenCL, Vulkan, ...

• Compilers

• DataBases: LevelDB, RocksDB, Hadoop, Amazon S3, your old disk

or, "UnFramework"

MAPS for a good framework

• Modular Designs

• Augmented Mathematics

• Portable System

• Scalability

• Interface to Existing Toolkits

• Efficient Mobile Runtimes

• Tuned Collective Primitives

• Optimized Math Libraries

+ Flexible Framework Design

No Silver Bullet?

There is no silver bullet

• Industry: Stability, Scale & speed, Data Integration, Relatively Fixed

• Research: Flexible, Fast Iteration, Debuggable, Relatively bare-bone

Caffe, Torch, Theano, TensorFlow, D4J, etc.


“In open source, we feel strongly that to really do something well, you have to get a lot of people involved.”

— Linus Torvalds

Thank you!

Towards Better Deep Learning Frameworks

Yangqing Jia, Research Lead on AI Platforms, Facebook
