MCDNN An Approximation-Based Execution Framework for Deep ...

MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints

Haichen Shen (haichen@cs.washington.edu) with Seungyeop Han, Matthai Philipose, Sharad Agarwal, Alec Wolman, Arvind Krishnamurthy

University of Washington, Microsoft Research

Wearable computing

→ more data

When computer vision meets wearable

Consumer Manufacturing Public Safety

“That drink will get you to 2800 calories for today”

“I last saw your keys in the store room”

“Remind Tom of the party”

“You’re on page 263 of this book”

Deep learning makes vision work

Recognition task             face     scene*     object*
Accuracy                     97%      88%        92%
Compute/frame (FLOPs)        1.00G    30.9G      39.3G
Compute @ 1-30 fps (FLOPS)   1-30G    30-900G    40G-1.2T

* top-5 accuracy is shown in the table

Do we have enough resources to run deep learning?

But...

Resource usage for continuous vision

Imager:     Omnivision OV2740, 90 mW
Processor:  Tegra K1 GPU, 290 GOPS @ 10 W = 34 pJ/OP
Radio:      Qualcomm SD810 LTE, >800 mW; Atheros 802.11 a/g, 15 Mbps @ 700 mW = 47 nJ/b
Amazon EC2: CPU c4.large, 2x400 GFLOPS, $0.1/h; GPU g2.2xlarge, 2.3 TFLOPS, $0.65/h

Huge gap between workload and budget

                 Device power                       Cloud cost
Budget           30% of 10 Wh for 10 h = 300 mW     $10/person/year
Workload         Deep learning: 300 GFLOPS @ 30 GFLOPs/frame, 10 fps
Compute power    9 GFLOPS                           3.5 GFLOPS (GPU) / 8 GFLOPS (CPU)
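The device-side figures can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 34 pJ/OP efficiency quoted for the Tegra K1 applies to the whole 300 mW budget; the function names are illustrative and not part of MCDNN. The cloud-side figures depend on pricing assumptions the slides do not spell out, so they are not reproduced here.

```python
def power_budget_watts(battery_wh, share, hours):
    """Average power available when `share` of the battery must last `hours`."""
    return battery_wh * share / hours


def sustainable_ops_per_sec(power_w, joules_per_op):
    """Operations per second that a given power budget can sustain."""
    return power_w / joules_per_op


def workload_flops(flops_per_frame, fps):
    """Compute rate needed to process every frame of the stream."""
    return flops_per_frame * fps


budget_w = power_budget_watts(battery_wh=10, share=0.30, hours=10)
device_rate = sustainable_ops_per_sec(budget_w, joules_per_op=34e-12)
required = workload_flops(flops_per_frame=30e9, fps=10)

print(f"power budget: {budget_w * 1000:.0f} mW")
print(f"sustainable:  {device_rate / 1e9:.1f} GOPS")
print(f"required:     {required / 1e9:.0f} GFLOPS")
print(f"gap:          {required / device_rate:.0f}x")
```

Under these assumptions the device sustains a little under 9 GOPS against a 300 GFLOPS workload, which is the gap the slide refers to.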

Neural network

[Figure: network diagram. Inputs: imgR, imgG, imgB. Output: classes + scores. Layer legend: (c) convolution, (f) fully connected, (m) max pool, (r) ReLU, (s) softmax]

Neural network ≈ matrix multiplications

Low rank approximation (Y. Kim, et al. 2016)

Matrix sparsification (S. Han, et al. 2015)


Architectural changes (J. Ba, et al. 2014)
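Because a layer is essentially a matrix multiplication, the first two approximations can be illustrated on a single weight matrix. The sketch below is not the method of the cited papers: it only counts multiply-adds for a generic rank-k factorization (an m x n matrix replaced by an m x k and a k x n factor) and applies plain magnitude pruning to a toy weight list. The function names and the numbers are illustrative assumptions.

```python
def dense_cost(m, n):
    """Multiply-adds for one matrix-vector product with an m x n matrix."""
    return m * n


def low_rank_cost(m, n, k):
    """Multiply-adds when the matrix is replaced by (m x k) times (k x n)."""
    return k * (m + n)


def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights (ties may keep extra)."""
    keep = min(len(weights), round(len(weights) * keep_fraction))
    if keep <= 0:
        return [0.0 for _ in weights]
    threshold = sorted((abs(w) for w in weights), reverse=True)[keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


# Hypothetical fully connected layer: 4096 x 4096, approximated at rank 256.
print(dense_cost(4096, 4096), "->", low_rank_cost(4096, 4096, 256))

# Hypothetical weights: keep the half with the largest magnitude.
print(prune_by_magnitude([0.9, -0.05, 0.3, 0.01, -0.6, 0.02], 0.5))
```

Both transformations shrink memory and compute at some cost in accuracy, which is the trade-off the following slides characterize.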

Managing the approx. / resource trade-off

‣ Detailed characterization of the approximation / resource trade-off for many optimizations

‣ Two new optimizations for streaming, multi-application settings

‣ New scheduling problem, Approximate Model Scheduling, with a heuristic solution

Outline

Memory / accuracy trade-off

[Figure: plot with an accuracy (%) axis ranging from 30 to 100]

Substantially reduce memory use with gradual accuracy loss

Energy / accuracy trade-off

[Figure: plot with an accuracy (%) axis ranging from 45 to 85; series: Obj (Exec), Scene (Exec), Face (Exec)]

Wi-Fi transmit cost (0.5 J)

LTE transmit cost (0.9 J)

Compute energy budget (2.3 J)

Always execute locally

Can execute locally under the energy budget

Exceeds the energy budget when executed locally

energy budget = total energy / total time (10 h) / requests per second (1 req/s)

Nvidia Jetson TK1
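The three regions of the plot can be read as a simple per-model decision rule. The sketch below is one interpretation, not the authors' code: it assumes a model should always run locally when that costs less energy than transmitting the input, and that the per-request budget follows the formula on the slide. The 82,800 J total is back-solved from the 2.3 J budget and the per-model energies are hypothetical.

```python
def energy_budget_per_request(total_energy_j, total_hours, requests_per_sec):
    """energy budget = total energy / total time / requests per second."""
    return total_energy_j / (total_hours * 3600) / requests_per_sec


def placement(local_energy_j, transmit_energy_j, budget_j):
    """Classify a model by where its local execution energy falls."""
    if local_energy_j <= transmit_energy_j:
        return "always execute locally"
    if local_energy_j <= budget_j:
        return "can execute locally under the energy budget"
    return "exceeds the energy budget when executed locally"


# Assumed total: 82,800 J over 10 h at 1 req/s gives the slide's 2.3 J budget.
budget = energy_budget_per_request(82_800, total_hours=10, requests_per_sec=1)

LTE_TRANSMIT_J = 0.9  # from the slide
for local_j in (0.3, 1.5, 4.0):  # hypothetical per-request model energies
    print(f"{local_j} J: {placement(local_j, LTE_TRANSMIT_J, budget)}")
```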

Outline

‣ Two new optimizations for streaming, multi-application settings
  ‣ Specialization
  ‣ Model sharing

Exploiting stream locality by specialization

• A standard deep neural network recognizes 4000 people
• Most videos are dominated by fewer than 10 faces over a span of minutes

Timeline

Produce more compact models for skewed classes

Specialization runtime

[Figure: without specialization: input → full model. With specialization: input (with skewed distribution) → compact model → check for “other” → full model]

Better resource/accuracy trade-off
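The specialization runtime can be sketched as a two-stage cascade: the compact model answers for the frequent classes and reports “other” for everything else, in which case the full model is consulted. The code below is a toy illustration under that reading; the lookup tables stand in for real networks, and all names (`FULL_LABELS`, `FREQUENT`, `classify`) are hypothetical.

```python
OTHER = "other"

# Toy stand-ins for trained networks: a lookup from frame to identity.
FULL_LABELS = {"frame-a": "alice", "frame-b": "bob", "frame-c": "carol"}
FREQUENT = {"alice", "bob"}  # classes that dominate the current stream


def full_model(frame):
    """Expensive model that knows every class."""
    return FULL_LABELS[frame]


def compact_model(frame):
    """Cheap model that knows only the frequent classes plus 'other'."""
    label = FULL_LABELS[frame]
    return label if label in FREQUENT else OTHER


def classify(frame):
    """Run the compact model first; fall back to the full model on 'other'."""
    label = compact_model(frame)
    if label == OTHER:
        return full_model(frame), "full"
    return label, "compact"


for frame in ("frame-a", "frame-b", "frame-c"):
    print(frame, classify(frame))
```

When the stream is skewed, most inputs stop at the compact model, so the full model's cost is paid only for the rare classes.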

Outline

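Approximate model scheduling

[Figure: a mobile device and a model pool; models are annotated with memory, energy, and accuracy]

Budgets: Energy 14, Memory 10, Cost 14

Model pool:
Task 1: model 1 (90%), model 2 (80%); figure values: 8, 2, 2
Task 2: model 3 (80%), model 4 (70%); figure values: 9, 3, 3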

Goal: maximize the overall accuracy

[Figure: the same mobile device, model pool, and budgets (Energy 14, Memory 10, Cost 14), with a timeline of requests 1 to 5]

Requests:
1. task 1 → cloud, model 1
2. task 2 → device, model 4
3. task 1 → cloud, model 1
4. task 1 → cloud, model 2
5. task 2 → device, model 4

Packing problem

[Figure: the same mobile device, model pool, and budgets (Energy 14, Memory 10, Cost 14), with a sixth request arriving]

Requests:
1. task 1 → cloud, model 1
2. task 2 → device, model 4
3. task 1 → cloud, model 1
4. task 1 → cloud, model 2
5. task 2 → device, model 4
6. task 1

Paging problem

• Packing problem: pick versions that satisfy energy/cost budgets
  $\sum_{t} e_i x_{it} \le E, \quad \sum_{t} c_i x'_{it} \le C \quad (x_{it}, x'_{it} \in \{0,1\},\ x_{it} \cdot x'_{it} = 0)$

• Paging problem: pick versions that fit in memory
  $\forall\, 1 \le t \le T, \quad \sum_{i} s_i x_{it} \le S$

• Goal: maximize the accuracy
  $\max_{x} \sum_{t} \sum_{i} a_i (x_{it} + x'_{it})$

No known optimal online algorithms
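To make the formulation concrete, the sketch below solves a tiny offline instance by exhaustive search, which is only possible because the whole request sequence is known in advance. The accuracies and the budgets (14, 14, 10) come from the slides; the per-model energy, cloud cost, and size values are hypothetical, since the slides do not state them unambiguously. The sketch also assumes every request must be served exactly once, on the device or in the cloud.

```python
from itertools import product

# name -> (task, accuracy %, device energy e_i, cloud cost c_i, memory size s_i)
# Accuracies come from the slides; the three cost columns are HYPOTHETICAL.
MODELS = {
    "model 1": ("task 1", 90, 8, 6, 8),
    "model 2": ("task 1", 80, 2, 2, 2),
    "model 3": ("task 2", 80, 9, 6, 9),
    "model 4": ("task 2", 70, 3, 3, 3),
}


def best_offline_schedule(requests, E, C, S):
    """Exhaustively search (model, location) choices for a known request list."""
    choices = [
        [(name, where)
         for name, spec in MODELS.items() if spec[0] == task
         for where in ("device", "cloud")]
        for task in requests
    ]
    best_plan, best_accuracy = None, -1
    for plan in product(*choices):
        on_device = [MODELS[name] for name, where in plan if where == "device"]
        in_cloud = [MODELS[name] for name, where in plan if where == "cloud"]
        energy_ok = sum(m[2] for m in on_device) <= E   # packing: energy
        cost_ok = sum(m[3] for m in in_cloud) <= C      # packing: cloud cost
        memory_ok = all(m[4] <= S for m in on_device)   # paging, per request
        if energy_ok and cost_ok and memory_ok:
            accuracy = sum(MODELS[name][1] for name, _ in plan)
            if accuracy > best_accuracy:
                best_plan, best_accuracy = plan, accuracy
    return best_plan, best_accuracy


requests = ["task 1", "task 2", "task 1", "task 1", "task 2"]
plan, accuracy = best_offline_schedule(requests, E=14, C=14, S=10)
for number, (name, where) in enumerate(plan, start=1):
    print(f"{number}. {requests[number - 1]} -> {where}, {name}")
print("summed accuracy (%):", accuracy)
```

The search space grows exponentially with the number of requests, and a real scheduler does not see future requests, which is why the talk turns to a heuristic.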

Heuristic scheduler

• Estimate future resource use and compute the budget for each request

• Account for paging cost to reduce oscillations

• Use increasingly more accurate versions of more heavily used models
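The first two bullets can be sketched as follows. This is an illustrative guess at the shape of such a heuristic, not MCDNN's actual scheduler: the per-request budget is the remaining energy divided by the expected number of remaining requests, and a version that is not already resident in memory is charged an extra paging cost, which discourages switching back and forth. All names and numbers are hypothetical, and the third bullet (upgrading heavily used models) is not modeled.

```python
def per_request_budget(remaining_energy_j, remaining_seconds, est_requests_per_sec):
    """Spread the remaining energy over the requests expected to arrive."""
    expected_requests = max(1.0, remaining_seconds * est_requests_per_sec)
    return remaining_energy_j / expected_requests


def pick_version(versions, budget_j, resident, paging_cost_j):
    """Pick the most accurate version whose energy plus paging cost fits.

    versions: list of (name, accuracy, energy_j) tuples.
    resident: set of version names already loaded in memory.
    Returns (name, accuracy, charged_energy_j), or None if nothing fits.
    """
    best = None
    for name, accuracy, energy_j in versions:
        charged = energy_j + (0.0 if name in resident else paging_cost_j)
        if charged <= budget_j and (best is None or accuracy > best[1]):
            best = (name, accuracy, charged)
    return best


versions = [("big", 0.90, 3.0), ("small", 0.80, 1.0)]  # hypothetical catalog
budget = per_request_budget(7200.0, remaining_seconds=3600, est_requests_per_sec=1.0)
print(budget)                                           # energy per request
print(pick_version(versions, 3.2, {"small"}, 0.5))      # paging cost rules out "big"
print(pick_version(versions, 3.6, {"small"}, 0.5))      # enough budget to page in "big"
```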

Trace-driven evaluation

Run out of battery

Lose connectivity

MCDNN framework

[Figure: system architecture, split between cloud and device]

Development time: a compiler takes the input type, model schema, and training/validation data and produces a trained model catalog

Runtime: cloud runtime (scheduler, data router, profiler) and device runtime (scheduler, data router); each maps input to classes

[Figure: the same architecture with a specialization-time stage between development time and runtime: a specializer takes stats and produces specialized models]

Conclusion

• MCDNN makes efficient trade-offs between resource use and accuracy
• Formulate the approximate model scheduling problem and devise a heuristic algorithm
• Design a generic approximation-based execution framework for continuous mobile vision

Thank you! Questions?

Backup Slides

Cloud cost / accuracy trade-off

[Figure: plot with an accuracy (%) axis ranging from 45 to 85; series: Obj (Cloud GPU), Obj (Cloud CPU), Scene (Cloud GPU), Scene (Cloud CPU), Face (Cloud GPU), Face (Cloud CPU)]

cloud GPU latency budget (9.7 ms) @ $10/yr, 1 request/s @ AWS g2.2xlarge

cloud GPU latency budget (582 ms) @ $10/yr, 1 request/min @ AWS g2.2xlarge

cloud CPU latency budget (7645 ms) @ $10/yr, 1 request/min @ AWS c4.large

latency budget = cost budget / cost per hour / #requests

Model sharing

[Figure: (a) input → intermediate values → face ID, race, age, gender, …; (b) input → router, with a model-fragment cache]
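The idea behind model sharing can be sketched as a shared lower stage whose intermediate values are computed once per input and reused by several task-specific heads. The code below is a toy illustration only: the trunk and heads are simple stand-in functions, the class and head names are hypothetical, and the cache is unbounded rather than dynamically sized.

```python
class SharedTrunk:
    """Runs the shared lower layers once per frame and caches the result."""

    def __init__(self, trunk_fn):
        self.trunk_fn = trunk_fn
        self.cache = {}   # frame_id -> intermediate values
        self.runs = 0     # how many times the trunk actually executed

    def features(self, frame_id, frame):
        if frame_id not in self.cache:
            self.cache[frame_id] = self.trunk_fn(frame)
            self.runs += 1
        return self.cache[frame_id]


def toy_trunk(pixels):
    """Stand-in for shared layers: returns (mean, max) of the input."""
    return (sum(pixels) / len(pixels), max(pixels))


# Stand-ins for task-specific upper layers that consume the shared values.
HEADS = {
    "face ID": lambda f: "class 1" if f[0] > 0.5 else "class 0",
    "age": lambda f: "class 1" if f[1] > 0.8 else "class 0",
    "gender": lambda f: "class 1" if f[0] > 0.4 else "class 0",
}

trunk = SharedTrunk(toy_trunk)
frame = [0.2, 0.9, 0.7]
results = {name: head(trunk.features("frame-1", frame)) for name, head in HEADS.items()}
print(results)
print("trunk executions:", trunk.runs)
```

Three heads produce three answers while the shared stage runs only once for the frame.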

Dynamically-sized caching scheme
