Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Towards Better DL Frameworks Yangqing Jia Research Lead on AI Platforms, Facebook

Transcript of Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Page 1: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Towards Better DL Frameworks

Yangqing Jia, Research Lead on AI Platforms, Facebook

Page 2: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Source: XKCD, [Girshick et al. CVPR 2014]

Page 3: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

The Needs: Two sides of the same coin

• Researchers: "I will need to reproduce the ResNet paper."

• Companies: "I need to apply DL to drive cars."

Page 4: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Democratizing Deep Learning w/ Caffe: Getting AlexNet running in 10 mins

• A grad student driven project
• Started by doing one job really well: image classification
• Adopted by industry participants
• Popular deep learning framework run by a non-profit

Yet very minimal (10k LOC)

Page 5: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

http://caffe.berkeleyvision.org/

Page 6: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

What makes a better DL library?

???

Page 7: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

"MAPS"! !!

Page 8: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

"MAPS" - Scalability

Page 9: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Scalability: Run fast, run far

“How do I train on multiple GPUs and machines?”

- Probably the most common question we got from Caffe users

Page 10: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Scalability: Run fast, run far

L1 L2 L3 L3b L2b L1b U3 U2 U1

Page 11: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Scalability: Run fast, run far

L1 L2 L3 L3b L2b L1b U3 U2 U1 R3 R2 R1

Page 12: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Scalability: Run fast, run far

L1 L2 L3 L3b L2b L1b U3 U2 U1 R3 R2 R1

L1 L2 L3 L3b L2b L1b U3 U2 U1 R3 R2 R1

Page 13: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Scalability: Run fast, run far

L1 L2 L3 L3b L2b L1b

U3 U2 U1 R3 R2 R1

L1 L2 L3 L3b L2b L1b

U3 U2 U1 R3 R2 R1

Page 14: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

The Return of MPI: "I'm your father", said Allreduce.

Allreduce
• Tree based - O(M log N)
• Ring based - O(M)
• etc.
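To make the ring-based O(M) figure concrete, here is a minimal single-process sketch of a ring allreduce (sum) over N simulated workers. It is an illustration only, not the implementation used by MPI, NCCL, or any framework in the talk. The function name `ring_allreduce`, the list-of-lists "workers", and the requirement that M be divisible by N are assumptions made for the example.

```python
from typing import List, Tuple


def ring_allreduce(worker_grads: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Simulate a ring allreduce (sum) over N workers in one process.

    Returns the per-worker result buffers and the number of elements
    each worker sent. Hypothetical helper, for illustration only.
    """
    n = len(worker_grads)
    m = len(worker_grads[0])
    assert all(len(g) == m for g in worker_grads), "buffers must have equal length"
    assert m % n == 0, "this sketch assumes M is divisible by N"
    chunk = m // n
    data = [list(g) for g in worker_grads]  # copy so inputs are not mutated
    sent = [0] * n

    def span(c: int) -> range:
        # Element indices covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    def exchange(step: int, offset: int, accumulate: bool) -> None:
        # Snapshot every payload first so all sends in a step are simultaneous.
        sends = []
        for rank in range(n):
            c = (rank + offset - step) % n
            sends.append((rank, c, [data[rank][i] for i in span(c)]))
        for rank, c, payload in sends:
            dst = (rank + 1) % n  # each worker only talks to its right neighbour
            sent[rank] += len(payload)
            for i, v in zip(span(c), payload):
                data[dst][i] = data[dst][i] + v if accumulate else v

    # Phase 1 (reduce-scatter): after N-1 steps worker r owns the full sum
    # of chunk (r + 1) % N.
    for step in range(n - 1):
        exchange(step, offset=0, accumulate=True)

    # Phase 2 (allgather): circulate the finished chunks for N-1 more steps.
    for step in range(n - 1):
        exchange(step, offset=1, accumulate=False)

    return data, sent


if __name__ == "__main__":
    grads = [[float(r * 10 + i) for i in range(8)] for r in range(4)]
    result, sent = ring_allreduce(grads)
    print(result[0])
    print(sent)
```

Each worker sends 2(N-1) chunks of M/N elements, so the per-worker traffic is 2(N-1)M/N, which stays below 2M however many workers join the ring. That is the sense in which the ring variant is O(M).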

Page 15: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Scalability: Sitting on top of giants

... and many more

Page 16: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

"MAPS" - Portability

Page 17: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Portable System: Cloud, Mobile, IoT, Cars, Drones, Coffee makers

AI Math and Algorithms

Deployment Platforms

Page 18: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks
Page 19: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Portable System: Cloud, Mobile, IoT, Cars, Drones, Coffee makers

Model

C++: auto predictor = caffe2::Predictor(model_file);

Java: public class Predictor implements Caffe2ModelInterface { ... }

Page 20: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Portable System Challenges: Still, a lot of thought needed

• Limited computation
• Battery life is a thing
• Our models may be luxurious
• Ecosystem less developed

Page 21: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks
Page 22: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

"MAPS" - Augmented Comp Patterns

Page 23: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Augmented Comp Patterns: Forget about float dense math, the world is bigger

• Quantized Computation
• Sparse Math Libraries
• Model Compression
• Rethinking Existing Operations

Page 24: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

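Quantized Computation: Forget about float, the world is bigger

[Bit-layout diagram]
float: 8 exponent bits, 23 mantissa bits (plus 1 sign bit)
fp16: 5 exponent bits, 10 mantissa bits (plus 1 sign bit)
fixed16: 16 bits
fixed8: 8 bits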
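The 8/23 and 5/10 splits for float and fp16 can be checked directly. The sketch below uses only Python's standard `struct` module to pull the sign, exponent, and mantissa fields out of the IEEE 754 single- and half-precision encodings. The helper names `float32_fields` and `float16_fields` are made up for this example.

```python
import struct
from typing import Tuple


def float32_fields(x: float) -> Tuple[int, int, int]:
    """Return (sign, exponent, mantissa) of x encoded as IEEE 754 float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa


def float16_fields(x: float) -> Tuple[int, int, int]:
    """Return (sign, exponent, mantissa) of x encoded as IEEE 754 float16."""
    (bits,) = struct.unpack(">H", struct.pack(">e", x))
    sign = bits >> 15                # 1 bit
    exponent = (bits >> 10) & 0x1F   # 5 bits, biased by 15
    mantissa = bits & 0x3FF          # 10 bits
    return sign, exponent, mantissa


if __name__ == "__main__":
    print(float32_fields(-2.5))
    print(float16_fields(-2.5))
```

With only 10 mantissa bits, fp16 carries roughly three decimal digits of precision, which is part of why the later slides list seamless conversion and model training as open challenges.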

Page 25: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Quantized Computation: Forget about float, the world is bigger

float add     0.9
fp16 add      0.4
fixed16 add   0.05
fixed8 add    0.03

float mul     4.0
fp16 mul      1.0
fixed8 mul    0.2
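As a rough illustration of what 8-bit fixed-point arithmetic involves, the sketch below quantizes two float vectors to signed 8-bit integers, accumulates their dot product in integers, and rescales once at the end. It assumes symmetric per-vector scaling, a deliberate simplification rather than the scheme used by any particular library. All function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] with one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


def quantized_dot(a: List[float], b: List[float]) -> float:
    """Dot product computed on 8-bit codes with an integer accumulator."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
    return acc * scale_a * scale_b            # single float rescale at the end


if __name__ == "__main__":
    a = [0.5, -1.0, 0.25]
    b = [1.0, 2.0, -4.0]
    exact = sum(x * y for x, y in zip(a, b))
    print(exact, quantized_dot(a, b))
```

The inner loop touches only small integers. The floating-point work is confined to computing the scales and the final rescale, at the price of a small rounding error per element.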

Page 26: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Why?

Source: Nvidia https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/

Page 27: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Rethinking Existing Operations: ResNeXt is coming to town

[Figure: group convolution in AlexNet (two "gconv" blocks) compared with ResNeXt (many groups, "g g g ...")]
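Group convolution matters to a framework partly because it changes how much weight and compute a layer needs. The sketch below counts the weights of a 2D convolution with and without groups, using the standard formula (C_in / groups) × C_out × k × k. The function name and the channel sizes in the example are illustrative choices, not figures from the talk.

```python
def conv_weight_count(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    """Number of weights (bias excluded) in a k x k convolution split into groups.

    Each output channel only sees the c_in / groups input channels of its own group.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


if __name__ == "__main__":
    dense = conv_weight_count(128, 128, 3)               # ordinary convolution
    grouped = conv_weight_count(128, 128, 3, groups=32)  # many small groups
    print(dense, grouped, dense // grouped)
```

With the channel counts held fixed, the weight count shrinks by a factor equal to the number of groups. A kernel tuned for a single large dense convolution does not automatically run well on many small ones, which is one reason existing operations need rethinking.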

Page 28: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Augmented Math Challenges: Forget about float, the world is bigger

• Solutions
  • Eigen fp16
  • CuDNN
  • NNPack
  • gemmlowp

• Challenges
  • Seamless conversion?
  • Model training?
  • Performance tuning?
  • ...

Page 29: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

"MAPS" - Modularity

Page 30: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

A Repeated Pattern

Many key components in deep learning are reusable across frameworks.

Page 31: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

In 2013 it used to be...

Caffe Torch Theano ...

Page 32: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Unix Philosophy? or, "UnFramework"

• Applications: Caffe, Torch, TF, MXNet, etc...
• Core Math: Eigen, CuDNN, NNPack, THNN, MKL
• Comms: NCCL, MPI, ZeroMQ, Redis, ...
• Low Level: CUDA, OpenGL, OpenCL, Vulkan, ...
• Compilers
• Databases: LevelDB, RocksDB, Hadoop, Amazon S3, your old disk

Page 33: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

MAPS for a good framework

• Modular Designs
• Augmented Mathematics
• Portable System
• Scalability

Interface to Existing Toolkits, Efficient Mobile Runtimes, Tuned Collective Primitives, Optimized Math Libraries

+ Flexible Framework Design

Page 34: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

No Silver Bullet?

Page 35: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

There is no silver bullet

Industry: Stability, Scale & speed, Data Integration, Relatively Fixed

Research: Flexible, Fast Iteration, Debuggable, Relatively bare-bone

Caffe, Torch, Theano, TensorFlow, D4J, etc.

Page 36: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

There is no silver bullet

Industry: Stability, Scale & speed, Data Integration, Relatively Fixed

Research: Flexible, Fast Iteration, Debuggable, Relatively bare-bone

Caffe, Torch

Page 37: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

“In open source, we feel strongly that to really do something well, you have to get a lot of people involved.”

— Linus Torvalds

Page 38: Yangqing Jia at AI Frontiers: Towards Better DL Frameworks

Thank you!

Towards Better Deep Learning Frameworks
Yangqing Jia, Research Lead on AI Platforms, Facebook