Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine...
Transcript of Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine...
![Page 1: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/1.jpg)
Mobile and Embedded Machine Learning
If you have the power to make someone happy, do it.
The world needs more of that
![Page 2: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/2.jpg)
Overview
Objective To understand the opportunities to apply machine learning
and deep learning techniques for mobile applications
Content Machine learning for mobile and IoT applications Intro to convolutional neural network DeepMon: Mobile gpu-based deep learning framework for
continuous vision applications
After this module, you should be able to Understand the basics of machine learning and deep
learning techniques for mobile and IoT applications
![Page 3: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/3.jpg)
Mobile and IoT Sensing
Continuous Pipelined Execution
Raw sensor
data readings
Summarized
features
Inferred
Contexts
Sensing Devices
…
Feature extraction/ Preprocessing
Classification/Inference
GMM
CNN/RNN
HMM
Sensing Applications
Activity
Location
Group
Stress
Sensing
kNN
… …
MFCC/SIFT
FFT
Norm
Sound/Video
Heart rate
Accel/Gyro
WiFi Signal
……
Continuous sensing and analytics of user activities, location, emotions, and surroundings with mobile/IoT/wearable devices
![Page 4: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/4.jpg)
Revisit Activity Recognition
![Page 5: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/5.jpg)
Simple Heuristic
• If STDEV(y-axis samples) < CThreshold1
• If AVG(y-axis samples) > CThreshold2
• output standing
• Else • output sitting
• Else• If FFT(y-axis samples) < CThreshold3
• output walking
• Else• output jogging
![Page 6: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/6.jpg)
Are We Good?
• How do we determine good features and good thresholds?
• How do we know STDEV is better than MAX?• How do we know AVG is better than Median?• How do we know the right values for Cthreshold ?
• What if a user puts her phone in her bag, not in her front pocket?
• The Y-axis of the phone is not anymore the major axis of movement.
• How do we solve these problems? A better heuristic?
![Page 7: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/7.jpg)
Decision Tree
• A simple but effective ML classifier.
• This tree can be built by the C4.5 algorithm.
• Given sufficient training data, the algorithm can automatically determine the important features and their thresholds.
![Page 8: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/8.jpg)
Other ML Techniques
• Naïve Bayes classifier
• Decision tree
• Random forest
• Support vector machine
• kNN algorithm
• Linear regression
• …
![Page 9: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/9.jpg)
ML Techniques Flow
![Page 10: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/10.jpg)
ML Techniques: Limitations
• Linear regression?• Why is it linear?
• Bayesian? • What is the prior?
• SVM?• What are the features?
• Decision tree?• What are the nodes/variables?
• KNN?• Cluster on what features?
These methods do not suit
well with very complex models.
![Page 11: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/11.jpg)
Deep Learning
![Page 12: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/12.jpg)
Deep Learning for Activity Recognition
• Example of applying a convolutional neural network
![Page 13: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/13.jpg)
Deep Learning Applications
Self-Driving Face Recognition
Play GoSpeech Recognition
![Page 14: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/14.jpg)
Deep Learning for Speech Recognition
Time
Fre
quency
Spectrogram
CNN
Image
The filters move in the
frequency direction.
![Page 15: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/15.jpg)
• Deep learning: the more data, the higher accuracy
Machine Learning vs. Deep Learning
![Page 16: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/16.jpg)
Introduction to Convolutional
Neural Network (CNN)
![Page 17: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/17.jpg)
Convolutional Neural Network
• CNN is a feed-forward network that can extract topological properties from an image.
• Like almost every other neural networks they are trained with a version of the back-propagation algorithm.
• Convolutional Neural Networks are designed to recognize visual patterns directly from pixel images with minimal preprocessing.
• They can recognize patterns with extreme variability (such as handwritten characters).
![Page 18: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/18.jpg)
Feed-Forward Networks
Information flow is unidirectional
Data is presented to Input layer
Passed on to Hidden Layer
Passed on to Output layer
Information is distributed
Information processing is parallel
Internal representation
(interpretation) of data
![Page 19: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/19.jpg)
Identifying a Bird in an Image
• Let’s assume “beak” is unique to birds.
• “beak” exists in a small sub-region of an image.
“beak” detector
![Page 20: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/20.jpg)
“Beak” in Different Parts of Images
“upper-left
beak” detector
“middle beak”
detector
![Page 21: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/21.jpg)
Convolutional Layer
A filter
A Convolutional Neural Network (CNN) is a neural network
with “convolutional layers”, which has a number of filters that
does convolutional operation.
Beak detector
![Page 22: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/22.jpg)
Convolution
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
-1 1 -1
-1 1 -1
-1 1 -1
Filter 2
……
These are the network
parameters to be learned.
Each filter detects
a small pattern (3 x 3).
![Page 23: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/23.jpg)
Convolution
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
3 -1
stride=1
Dot
product
![Page 24: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/24.jpg)
Convolution
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
3 -3
If stride=2
![Page 25: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/25.jpg)
Convolution
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
stride=1
![Page 26: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/26.jpg)
Convolution
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
-1 1 -1
-1 1 -1
-1 1 -1
Filter 2
-1 -1 -1 -1
-1 -1 -2 1
-1 -1 -2 1
-1 0 -4 3
Repeat this for each filterstride=1
Two 4 x 4 images
Forming 2 x 4 x 4 matrix
FeatureMap
![Page 27: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/27.jpg)
Color Image: RGB 3 Channels
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
1 -1 -1
-1 1 -1
-1 -1 1Filter 1
-1 1 -1
-1 1 -1
-1 1 -1Filter 2
1 -1 -1
-1 1 -1
-1 -1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 1 -1
-1 1 -1
-1 1 -1
-1 1 -1
-1 1 -1
-1 1 -1Color image
![Page 28: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/28.jpg)
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
imageconvolution
-1 1 -1
-1 1 -1
-1 1 -1
1 -1 -1
-1 1 -1
-1 -1 1
1x
2x
……
36x
…
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
How to Form a Feed Forward Network
Fully-
connected
![Page 29: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/29.jpg)
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
1 -1 -1
-1 1 -1
-1 -1 1
Filter 11
2
3
…
8
9…
13
14
15
…
Only connect to
9 inputs, not
fully connected
4
10
16
1
0
0
0
0
1
0
0
0
0
1
1
3
fewer parameters!
![Page 30: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/30.jpg)
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
1:
2:
3:
…
7:
8:
9:…
13:
14:
15:
…
4:
10:
16:
1
0
0
0
0
1
0
0
0
0
1
1
3
-1
Shared weights
6 x 6 image
Fewer parameters
Even fewer parameters
![Page 31: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/31.jpg)
The Whole CNN
Fully Connected Feedforward network
cat dog ……Convolution
Max Pooling
Convolution
Max Pooling
Flattened
Can repeat
many times
![Page 32: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/32.jpg)
Max Pooling
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
-1 1 -1
-1 1 -1
-1 1 -1
Filter 2
-1 -1 -1 -1
-1 -1 -2 1
-1 -1 -2 1
-1 0 -4 3
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
![Page 33: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/33.jpg)
Why Pooling
• Subsampling pixels will not change the object
Subsampling
bird
bird
We can subsample the pixels to make image smaller
fewer parameters to characterize the image
![Page 34: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/34.jpg)
Max Pooling
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
3 0
13
-1 1
30
2 x 2 image
Each filter
is a channel
New image
but smaller
Conv
MaxPooling
![Page 35: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/35.jpg)
The Whole CNN
Convolution
Max Pooling
Convolution
Max Pooling
Can repeat
many timesA new image
The number of channels is
the number of filters
Smaller than the original
image
3 0
13
-1 1
30
![Page 36: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/36.jpg)
The whole CNN
Fully Connected Layer
cat dog ……Convolution
Max Pooling
Convolution
Max Pooling
Flattened
A new image
A new image
![Page 37: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/37.jpg)
Fully Connected Layer
3 0
13
-1 1
30 Flattened
3
0
1
3
-1
1
0
3
Fully-connected Feedforward network
cat dog ……
Conceptually, this can be understood as the voting process
to see which input values contribute more to the output.
![Page 38: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/38.jpg)
Tools and APIs
• Tensorflow (https://www.tensorflow.org/)• Tensorflow light for Mobile and IoT
• PyTorch (https://pytorch.org)
• Caffe2 (https://caffe2.ai)
• Keras (https://keras.io/)
![Page 39: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/39.jpg)
Only modified the network structure and input format (vector -> 3-D tensor)CNN in Keras
Convolution
Max Pooling
Convolution
Max Pooling
input
1 -1 -1
-1 1 -1
-1 -1 1
-1 1 -1
-1 1 -1
-1 1 -1
There are 25
3x3 filters.…
…
Input_shape = ( 28 , 28 , 1)
1: black/white, 3: RGB28 x 28 pixels
3 -1
-3 1
3
![Page 40: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/40.jpg)
Only modified the network structure and input format (vector -> 3-D array)CNN in Keras
Convolution
Max Pooling
Convolution
Max Pooling
Input
1 x 28 x 28
25 x 26 x 26
25 x 13 x 13
50 x 11 x 11
50 x 5 x 5
How many parameters for
each filter?
How many parameters
for each filter?
9
225=25x9
![Page 41: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/41.jpg)
Only modified the network structure and input format (vector -> 3-D array)CNN in Keras
Convolution
Max Pooling
Convolution
Max Pooling
Input
1 x 28 x 28
25 x 26 x 26
25 x 13 x 13
50 x 11 x 11
50 x 5 x 5Flattened
1250
Fully connected feedforward network
Output
![Page 42: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/42.jpg)
AlphaGo
NeuralNetwork
(19 x 19
positions)
Next move
19 x 19 matrix
Black: 1
white: -1
none: 0
Fully-connected feedforward network can be used
But CNN performs much better
![Page 43: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/43.jpg)
AlphaGo’s policy network
Note: AlphaGo does not use Max Pooling.
The following is quotation from their Nature article:
![Page 44: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/44.jpg)
CNN in speech recognition
Time
Fre
quency
Spectrogram
CNN
Image
The filters move in the
frequency direction.
![Page 45: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/45.jpg)
CNN in text classification
Source of image: http://citeseerx.ist.psu.ed
u/viewdoc/download?doi=10.1.1.703.6858
&rep=rep1&type=pdf
?
![Page 46: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/46.jpg)
DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
ACM MobiSys 2017
![Page 47: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/47.jpg)
Continuous Vision Applications47
![Page 48: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/48.jpg)
Conventional Processing Flow
48
Jenny
Privacy &
Network
Problems
Capture frames
Process frames
(with DNN models)
Capture frames
& Process
(with DNN models
such as YoLo)
Research Question: Can we support fully-disconnected
DNN-based inference purely on the mobile device?
![Page 49: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/49.jpg)
DeepMon: Mobile Deep Learning System
• Supports low-latency execution of CNNs on commoditymobile devices using mobile GPUs
• OpenCL/Vulkan enabled devices
• Supports multiple GPU architectures & mobile OSs• Mali, Adreno, PowerVR (to be supported)
• Supports existing trained models• Multiple frameworks (Caffe, Matconvnet, Yolo, Darknet)
• Available today- https://github.com/JC1DA/deepmon
49
![Page 50: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/50.jpg)
Challenge 1: Mobile GPU is Weak!
50
Nvidia GTX 980
Adreno 330(Samsung Note
4)
Ratio (Nvidia / Adreno)
Number of ALUs (CUDA cores)
2048 128 16x
Memory Bandwidth(GB/s)
22412.8
(Shared)17.5x
Peak Performance (Gflops)
4600 166.5 27.6x
CNN Execution Time VGG-16 (ms)
238.59 (Caffe)
6315 ~26.5x
DNNs can well run on desktop GPUs, but…
Too high latency to
support continues vision.
![Page 51: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/51.jpg)
Desktop GPU Mobile GPU
Challenge 2: Architecture is Different!
• Existing GPU-based optimizations won’t simply work!
51
CPU CPUGPU GPU
http://cdn.wccftech.com/wp-content/uploads/2014/03/NVIDIA-Maxwell-Unified-Virtual-Memory.jpg
![Page 52: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/52.jpg)
Identifying Latency Bottleneck
• Convolutional Neural Network
• Device: Samsung Galaxy Note 4
• Implementation: naïve CPU
52
Model Conv. (ms) FC. (ms) Pooling (ms) Total (ms)
VGG-F 8072 1079 26 9177
VGG-M 19521 2122 156 21800
VGG-16 213371 2408 882 21662
~90% time consumed by Convolutional Layers
![Page 53: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/53.jpg)
Problem 1: High Memory Use
• Fast convolution operations on desktop GPU• Use optimized general matrix multiplication (GEMM)
• Build up a new matrix by unfolding input matrix
53
https://i.stack.imgur.com/GvsBA.jpg
0
0
0
0
1
1
0
1
2
…
…
…
…
…
…
…
…
…
No good GEMM on
mobile GPUs
Increased memory use
(one value in input matrix
is copied 9 times (3x3))
Matrix building overhead
(~150ms per layer with
Caffe on Samsung S7)
![Page 54: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/54.jpg)
Solution 1: mGPU-Aware Optimizations
• Do convolution operations directly on input• No matrix building overhead
• Less memory consumption
• Do mGPU-aware Optimizations• Leverage local memory (high performance cache inside GPU) to
reduce memory reading• Store reusable convolutional kernels inside the local memory
• It will be shared across multiple threads
• Layout the input data to enable fast vector addition/ multiplication on mGPUs
• The data in vectors need to be consecutively stored in the memory.
• Use half floating point (32 bits 16 bits)
54
![Page 55: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/55.jpg)
Impact of Memory Optimizationmeasured on Samsung s7 (Mali T770)
55
9120
2976
1344
0
2000
4000
6000
8000
10000
Caffe DeepMon (LM+LR) DeepMon(LM+LR+HF)
Late
ncy
(m
s)
VGG-16
3.06x
2.21x
6.78x
2.6%
accuracy loss
0%
accuracy loss
LM: Local Memory – LR: Layout Redesign – HF: Half Floating point
![Page 56: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/56.jpg)
Problem 2: Redundant Computation56
- Background in continuous video frames tends to be static.
- Independent processing of each frame is redundant.
Similar regions
a
Key idea: Can we reuse the intermediate results of previous
convolutional layer computation for similar regions?
![Page 57: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/57.jpg)
Solution 2: Convolutional Caching
57
Histogram-
Based
Sub-Image
Comparison
Reusable?
Reuse
cached
Conv. Op.
resultsyes
Perform
Conv. Op.
no
Cache
Manager
- 16 bins color histogram
- Chi square distance
- If distance < 0.005, block
is reusable
- Light-weight & accurate
comparisons required!
- SIFT-based approach did
not work.
![Page 58: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/58.jpg)
Solution 3: Decomposition
• To decompose large convolutional layer into a sequence of several smaller ones so computation cost can be reduced
• Tucker-2 decomposition• Decompose a convolutional layer into 3 parts
• 2 with filter size of (1x1)• 1st layer acts as dimension reduction -> reduce computational cost• 2nd layer acts as dimension restoration -> guarantee output size equal
to output size of original convolutation layer
• 1 with original filter size • have lower number of input/output channels -> reduce
computational cost
![Page 59: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/59.jpg)
Tucker-2 Decomposition
• N: number of filters
• C: number of input channels
• D: filter size (D ≥ 3)
• Input: (HxWxC)
Conv-k𝑁𝑥𝐶𝑥𝐷𝑥𝐷
Conv-k-1𝑁1𝑥𝐶𝑥1𝑥1
Conv-k-2𝑁2𝑥𝑁1𝑥𝐷𝑥𝐷
Conv-k-3𝑁𝑥𝑁2𝑥1𝑥1
Total computation:𝐻𝑥𝑊𝑥𝑁𝑥𝐶𝑥𝐷𝑥𝐷
𝐻𝑥𝑊𝑥𝑁1𝑥𝐶 𝐻𝑥𝑊𝑥𝑁2𝑥𝑁1𝑥𝐷𝑥𝐷 𝐻𝑥𝑊𝑥𝑁𝑥𝑁2
Speedup: 𝑁𝐶𝐷2
𝑁1𝐶+𝑁1𝑁2𝐷2+𝑁𝑁2
![Page 60: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/60.jpg)
DeepMon Performance: Latency60
9120
2976
1344912 644
0
3934
20981773
1006
0
2000
4000
6000
8000
10000
Caffe DeepMon (MO) DeepMon(MO+HF)
DeepMon(MO+HF+CC)
DeepMon (All)
Late
ncy
(m
s)
VGG-16 Yolo
X
6.78X
10X14.16X
MO: Memory Opt. HF: Half Floating point CC: Convolutional Caching
Dataset: UCF-101 (13K+ short video clips)
![Page 61: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/61.jpg)
DeepMon Performance: Accuracy61
89.9
63.4
83.94
58.14
0
20
40
60
80
100
0
20
40
60
80
100
VGG-16 YOLO
mea
n A
vera
ge P
reci
sio
n
(%)
Top
-5 R
eco
gnit
ion
A
ccu
racy
(%
)
Caffe DeepMon
<6% loss
All Optimization techniques are applied for DeepMon.
Dataset: (1) ILSVRC2012 for VGG-VeryDeep-16,
(2) the Pascal VOC 2007 for YOLO
![Page 62: Mobile and Embedded Deep Learningocw.snu.ac.kr/sites/default/files/NOTE/Week 11- Mobile Machine Lea… · DeepMon: Mobile gpu-based deep learning framework for continuous vision applications](https://reader036.fdocuments.us/reader036/viewer/2022062604/5fba3e28c52f0a6e352fbeb8/html5/thumbnails/62.jpg)
Conclusion
• DeepMon is an easy to use framework• Supports existing deep learning models
• Supports commodity mobile devices & OS’s
• Supports various optimizations to reduce latency• Memory loading optimizations
• Convolutional caching
• Decomposition
• Achieve speedup of 14x over Caffe with minimal accuracy loss (<6%)
• DeepMon implementation in OpenCL/Vulkanhttps://github.com/JC1DA/deepmon
62