"Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and...
![Page 1: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/1.jpg)
Copyright © 2015 Advanced Micro Devices 1
Harris Gasparakis, Ph.D.
12 May 2015
Understanding Adaptive Machine Learning Vision
Algorithms and Implementing them on GPUs and
Heterogeneous Platforms
![Page 2: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/2.jpg)
• Machine Learning (ML)
• Constrained optimization problems
• Heterogeneous computing
• OpenCL 2.0, HSA
• Synthesis
• OpenCL programming tips for ML
• Conclusions
Agenda
![Page 3: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/3.jpg)
Can you find an algorithm to describe an object, and detect it?
Why Machine Learning (ML)?
![Page 4: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/4.jpg)
Can you find an algorithm to describe an object, and detect it?
Sometimes Not Needed…
![Page 5: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/5.jpg)
Can you find an algorithm to describe an object, and detect it?
Most Often Indispensable!
Vidit Jain and Erik Learned-Miller.
FDDB: A Benchmark for Face Detection in Unconstrained Settings.
![Page 6: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/6.jpg)
• Learn from examples!
• Model the universe using functions with (possibly many) parameters “w” that you learn from training data
• “x” is a (multi-dimensional) function of the image data
  • Pixel patches
  • A priori features
  • Features in a learned dictionary (basis)
    • PCA
    • Sparse coding/LASSO/LARS
    • DNN
• “y” is our value judgment on the data
  • Object category
  • Object identity, etc.
Formalism
![Page 7: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/7.jpg)
Tune parameters “w” to best explain the “N” observations $(y_n, x_n)$
Machine learning typically involves constrained functional minimization
• Bias/variance
• Overcompleteness/sparsity
• How much learning is too much?
• N = ? |w| = ?
• Graphical models/subspace updates
Formalism
$$E(w) = \sum_{n=1}^{N} D(y_n, x_n; w) + \lambda C(w) + \cdots$$
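To make the objective on this slide concrete, here is a minimal C++ sketch that evaluates E(w) for one particular choice of terms. The choices are assumptions made only for illustration and do not come from the slides: the data term D is the squared error of a linear model, the constraint term C is the squared L2 norm of w, and the function name `objective` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evaluates E(w) = sum_n D(y_n, x_n; w) + lambda * C(w).
// Assumed for illustration: D is the squared error of a linear model,
// and C is the squared L2 norm of w.
double objective(const std::vector<std::vector<double>>& x,
                 const std::vector<double>& y,
                 const std::vector<double>& w,
                 double lambda) {
    assert(x.size() == y.size());
    double data_term = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        assert(x[n].size() == w.size());
        double prediction = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) {
            prediction += w[i] * x[n][i];
        }
        const double residual = y[n] - prediction;
        data_term += residual * residual;  // D(y_n, x_n; w)
    }
    double penalty = 0.0;
    for (double wi : w) {
        penalty += wi * wi;                // C(w)
    }
    return data_term + lambda * penalty;
}
```

Raising `lambda` favors small parameters over a tight fit to the N observations, which is one way the bias/variance trade-off listed above shows up in practice.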
![Page 8: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/8.jpg)
It is a Jungle of Minima!
Start with initial guess:
$w_0$
Iteratively improve it:
$w_t = w_{t-1} + \delta w_t$
Local minima, with
Basins of attraction
![Page 9: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/9.jpg)
• Second order methods (H: Hessian, g: gradient):
$\delta w_t = -H^{-1}(w_{t-1})\, g(w_{t-1})$
• First order methods:
$\delta w_t = -\kappa\, g(w_{t-1})$
• Tweaks:
Line minimization, momentum, heat, homotopy, multiresolution
• Modern first order methods (AdaGrad, AdaDelta, etc.):
$\delta w_t = -H^{-1}(g_{1:t})\, g(w_{t-1})$
History
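The difference between a plain first-order step and an adaptive one can be sketched in a few lines of C++. This is an illustrative sketch, not the presenter's code: `gradient_step` applies a fixed step size kappa, while `adagrad_step` follows the usual diagonal AdaGrad rule, scaling each coordinate by the square root of its accumulated squared gradients. The function names, the `Vec` alias, and the `eps` stabilizer are assumptions introduced here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Plain first-order step: w <- w - kappa * g.
void gradient_step(Vec& w, const Vec& g, double kappa) {
    assert(w.size() == g.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] -= kappa * g[i];
    }
}

// Diagonal AdaGrad step: the per-coordinate step size shrinks as squared
// gradients accumulate, so the gradient history g_1..g_t acts like a
// cheap diagonal curvature estimate.
void adagrad_step(Vec& w, const Vec& g, Vec& accum, double kappa,
                  double eps = 1e-8) {
    assert(w.size() == g.size() && w.size() == accum.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        accum[i] += g[i] * g[i];
        w[i] -= kappa * g[i] / (std::sqrt(accum[i]) + eps);
    }
}
```

The caller owns `accum` and must zero it before the first step; keeping it outside the function makes it straightforward to run many independent configurations side by side.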
![Page 10: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/10.jpg)
• Start from a pool of multiple initial conditions and multiple update rules (“configurations”)
• Explore them simultaneously (GPU threads)
• On each update step, reason about the progress of each (CPU threads)
• Eliminate configurations:
  • Dead ends
  • Duplicates in the same basin of attraction
• Replace them with other random configurations
• Give preference to configurations that progress the most
What if?
![Page 11: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/11.jpg)
Let’s Explore the Jungle!
![Page 12: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/12.jpg)
Let’s Explore the Jungle!
![Page 13: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/13.jpg)
Let’s Explore the Jungle!
![Page 14: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/14.jpg)
Let’s Explore the Jungle!
![Page 15: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/15.jpg)
Some Dead Ends…
![Page 16: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/16.jpg)
Reinitialize them!
![Page 17: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/17.jpg)
Continue Exploring…
![Page 18: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/18.jpg)
One Visitor Per Attractor is Enough...
![Page 19: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/19.jpg)
• CPU as adaptive GPU supervisor
• GPU computes an ensemble of updates
• CPU reasons about the ensemble of updates
• Coalesce if in the same basin of attraction
• Prune or “kick” if trapped in local minimum
• Test and rank according to generalization error
• Is it practical?
The Master Adaptive Strategy
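The supervise-and-prune loop can be sketched on a toy problem. The C++ below is a single-threaded stand-in with several assumptions that are not from the slides: the energy is the one-dimensional double well E(w) = (w² − 1)², with minima at w = −1 and w = +1; `update_all` plays the role of the GPU phase; `coalesce` plays the role of the CPU phase, using "closer than `tol`" as a crude proxy for "in the same basin of attraction". Random re-initialization and ranking by generalization error are left out to keep the sketch deterministic.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Gradient of the toy double-well energy E(w) = (w^2 - 1)^2.
double energy_gradient(double w) {
    return 4.0 * w * (w * w - 1.0);
}

// Stand-in for the GPU phase: advance every configuration by one
// first-order step.
void update_all(std::vector<double>& pool, double kappa) {
    for (double& w : pool) {
        w -= kappa * energy_gradient(w);
    }
}

// Stand-in for the CPU phase: keep one visitor per attractor.
// Proximity is only a proxy for sharing a basin of attraction.
void coalesce(std::vector<double>& pool, double tol) {
    std::sort(pool.begin(), pool.end());
    std::vector<double> kept;
    for (double w : pool) {
        if (kept.empty() || w - kept.back() >= tol) {
            kept.push_back(w);
        }
    }
    pool.swap(kept);
}

// Alternate the two phases and return the surviving configurations.
std::vector<double> explore(std::vector<double> pool, int steps,
                            double kappa, double tol) {
    for (int t = 0; t < steps; ++t) {
        update_all(pool, kappa);
        coalesce(pool, tol);
    }
    return pool;
}
```

Starting from five initial guesses spread over both wells, the pool shrinks to one survivor per minimum, so later iterations spend no work on redundant configurations.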
![Page 20: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/20.jpg)
Know Thy (HSA) Hardware!
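[Diagram: a CPU and an HSA iGPU sit on the same physical memory and share a unified (bidirectionally coherent, pageable) virtual memory, labeled hUMA. Each side has its own scheduler and L2 cache. Each CC on the CPU side has an L1 cache; each CC on the iGPU side has an L1 cache and LDS.]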
Heterogeneous System Architecture (exposed via OpenCL 2.0)
![Page 21: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/21.jpg)
[Diagram: a CPU and an HSA iGPU over shared physical memory and unified (bidirectionally coherent, pageable) virtual memory, each with its own scheduler, L2 cache, and CCs with L1 caches. This view highlights hQ and the schedulers.]
Dynamic parallelism, context switching, preemption, concurrent execution
Know Thy (HSA) Hardware!
![Page 22: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/22.jpg)
• Scope
  • Thread, workgroup, device, all HSA devices
• Semantics
  • Acquire (writes made by other threads within the scope become visible to the current thread)
  • Release (writes made by the current thread become visible to other threads within the scope)
C++11 Atomics/OpenCL 2.0 Atomics
![Page 23: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/23.jpg)
• Initialize pool of w_t as fine-grained SVM with atomics enabled:
clSVMAlloc(…, CL_MEM_READ_WRITE |
CL_MEM_SVM_FINE_GRAIN_BUFFER |
CL_MEM_SVM_ATOMICS, …);
• CPU waits for GPU to finish an iteration:
done = std::atomic_load_explicit(..,
std::memory_order_acquire);
• GPU kernel “signals” when done with an iteration:
atomic_store_explicit((global atomic_int *)(…), …,
memory_order_release,
memory_scope_all_svm_devices);
C++11 Atomics/OpenCL 2.0 Atomics
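The release/acquire handshake can be tried out on the CPU alone. The sketch below is a stand-in under stated assumptions, not OpenCL host code: a `std::thread` plays the role of the GPU kernel, an ordinary `std::vector` plays the role of the SVM pool, and a `std::atomic<int>` plays the role of the SVM atomic flag. Standard C++ has no counterpart to the OpenCL memory scope argument, and the name `run_handshake` is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Returns the sum the supervisor reads after the worker signals that
// one "iteration" is finished.
int run_handshake() {
    std::vector<int> pool(4, 0);   // stand-in for the shared pool of w_t
    std::atomic<int> done{0};      // stand-in for the SVM atomic flag

    std::thread worker([&pool, &done] {   // stand-in for the GPU kernel
        for (int& v : pool) {
            v = 1;                 // plain, non-atomic writes
        }
        // Release: everything written above is published with the flag.
        done.store(1, std::memory_order_release);
    });

    // Acquire: once the flag reads 1, the writes to pool are visible.
    while (done.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    int sum = 0;
    for (int v : pool) {
        sum += v;
    }
    worker.join();
    return sum;
}
```

Only the flag is atomic; the bulk data relies on the ordering the release/acquire pair establishes, which is what lets the supervisor read the pool without further synchronization.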
![Page 24: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/24.jpg)
• The optimal partitioning of problem to threads may be non-obvious
• Depends a lot on cache line size
• Do not incur memory latency multiple times; align threads with cache lines.
OpenCL Tips
![Page 25: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/25.jpg)
• X_{0,0}, X_{0,1}, …, X_{0,15}, …, X_{0,127}
• X_{1,0}, X_{1,1}, …, X_{1,15}, …, X_{1,127}
• X_{2,0}, X_{2,1}, …, X_{2,15}, …, X_{2,127}
• X_{N-1,0}, X_{N-1,1}, …, X_{N-1,15}, …, X_{N-1,127}
K=2 Means, N=10000, in F=128 dims
• M_{0,0}, M_{0,1}, …, M_{0,15}, …, M_{0,127}
• M_{1,0}, M_{1,1}, …, M_{1,15}, …, M_{1,127}
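For reference, the assignment step of k-means over this layout can be written as the following CPU-side C++ sketch. It assumes row-major storage, so the F features of one point (or one mean) are contiguous in memory; the function name and signature are hypothetical, and how best to split the inner loop across GPU threads depends on the cache line size, which the sketch does not try to model.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assigns each point to its nearest mean (squared Euclidean distance).
// Layout assumption: x[n * F + f] is feature f of point n, and
// m[k * F + f] is feature f of mean k.
std::vector<int> assign_to_nearest(const std::vector<float>& x,
                                   const std::vector<float>& m,
                                   std::size_t F) {
    assert(F > 0 && x.size() % F == 0 && m.size() % F == 0);
    const std::size_t N = x.size() / F;
    const std::size_t K = m.size() / F;
    std::vector<int> label(N, 0);
    for (std::size_t n = 0; n < N; ++n) {
        float best = std::numeric_limits<float>::max();
        for (std::size_t k = 0; k < K; ++k) {
            float dist = 0.0f;
            // Contiguous walk over one row of x and one row of m.
            for (std::size_t f = 0; f < F; ++f) {
                const float diff = x[n * F + f] - m[k * F + f];
                dist += diff * diff;
            }
            if (dist < best) {
                best = dist;
                label[n] = static_cast<int>(k);
            }
        }
    }
    return label;
}
```

With F = 128 single-precision features, each point occupies 512 contiguous bytes, so one point spans several cache lines; that is why the mapping of threads to features, rather than only to points, matters for memory latency.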
![Page 26: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/26.jpg)
![Page 27: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/27.jpg)
• The optimal partitioning of problem to threads may be non-obvious
• Depends a lot on cache line size
• Depends a lot on L2 size (and for virtual memory, on page size)
• Don’t jump around virtual pages
• Ensure you stay within L2
Know Thy Hardware!
![Page 28: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/28.jpg)
[Diagram: Input → Kernel 1 → Kernel 2 → Output, with each stage reading from and writing to device/main memory, addressed through virtual memory.]
Programmer’s View
![Page 29: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/29.jpg)
[Diagram: Input → Kernel 1 → Kernel 2 → Output, with an L2 cache shown between each stage and device/main memory.]
Ideal Physical View
![Page 30: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/30.jpg)
[Diagram: Input → Kernel 1 → Kernel 2 → Output, showing the L2 caches that sit between the kernels and device/main memory.]
Be Mindful of your L2
![Page 31: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/31.jpg)
• Consumer/producer paradigm…
• GPU: number crunching producer
• CPU: supervises GPU to global convergence
• Mediated via C++11 platform atomics
• Very easy to transition to OpenCL right NOW!
• Replace all malloc code with: clSVMAlloc and clEnqueueSVMMap (if needed)
• That’s it! No need to change any CPU code, and you can start writing kernels!
Conclusions
![Page 32: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/32.jpg)
• Ready for prime time in real time!
• Detection
• Recognition
• Tracking
• Real-time learning
Conclusions
![Page 33: "Understanding Adaptive Machine Learning Vision Algorithms and Implementing Them on GPUs and Heterogeneous Platforms," a Presentation from AMD](https://reader033.fdocuments.us/reader033/viewer/2022042702/55d035e0bb61ebc6768b466f/html5/thumbnails/33.jpg)
The information presented in this document is for informational purposes only and may contain technical inaccuracies,
omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not
limited to product and roadmap changes, component and motherboard version changes, new model and/or product
releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the
like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right
to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify
any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO
RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN
NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES
ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY
OF SUCH DAMAGES.
ATTRIBUTION
© 2015 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks
of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes
only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used by permission by
Khronos.
Disclaimer & Attribution