CVML software development tools
Transcript of CVML software development tools
N. Tsapanos, P. Bassia, P. Giannakeris,
Prof. Ioannis Pitas
Aristotle University of Thessaloniki
www.aiia.csd.auth.gr
Version 2.5
CVML software development
tools
1
Distributed computing (under development)
• ROS
• Libraries
• OpenCV
• BLAS, cuBLAS
• MKL DNN, cuDNN
• DNN Frameworks
• Neon, Tensorflow, Pytortch, Keras, MXNet
• Distributed/cloud computing
• MapReduce programming model
• Apache Spark
• Collaborative SW Development
• GitHub/Bitbucket.
Robotic Operating System (ROS)
ROS is used in developing robotic applications• ROS architecture:
• ROS master, ROS nodes.
• Node communications:
• Topics
• Publisher and Subscriber nodes.
• Functionalities:• Node graph visualization • Logging/plotting
• Checks and diagnostics.
ROS architecture
(Adapted from Willow Garage’s “What is ROS?” Presentation)
Master
Publisher
Publisher
Subscriber
Subscriber
/topic
(DNS-like)
ROS architecture
• A Node is a single ROS-enabled program: – Most communication happens between nodes. – Nodes can run on many different devices.
• One Master per system.
ROS Master • Enables nodes to locate one another • Handles Parameter Server
camera interface
image processing
motion logic
robot planning
robot interface
ROS Communications
msg … msg … msg
Publisher Node
Publishing msg data On channel /topic
Publisher Node
Advertises /topic is available
with type msg
Subscriber Node
Subscribed to /topicwith type msg
ROS MASTER
Subscriber Node
Listening for /topicwith type msg
/topic
Topics are for Streaming Data.
msg … msg … msg
Libraries vs Frameworks
7https://www.programcreek.com/
•When you use a library, you are in charge of the flow of the
application. You are choosing when and where to call the library.
•When you use a framework, the framework is in charge of the flow.
It provides some places for you to plug in your code, but it calls the
code you plugged in as needed.
OpenCVOpen Computer vision library:• Image processing.• Video analysis.
• Deep NN, Machine Learning• Object detection.
• Computer vision• Camera calibration, 3D world modeling.
Basic Linear Algebra
Subprograms (BLAS) Library
Basic Linear Algebra Subprograms (BLAS) is a software
library of high performance Linear Algebra routines:
• BLAS has three routine sets (“levels“).
• They correspond to both the chronological order of
definition and publication, as well as the degree of algorithm
complexity.
9
BLAS
BLAS Level 1:
• It contains 𝑂(𝑛) routines for vector operations:
• Dot products, vector norms for vector dimensionality 𝑛;
• A generalized vector addition of the form:
𝐲 ← 𝑎𝐱 + 𝐲.• Several other operations.
10
BLAS
BLAS Level 2:
• It contains 𝑂(𝑛2) matrix-vector operations for vector
dimensionality 𝑛, matrix dimensionality 𝑛 × 𝑛.
• A generalized matrix to vector multiplication:
𝐲 ← 𝛼𝐀𝐱 + 𝛽𝐲.• System solver for system of linear equation:
𝐓𝐱 = 𝐲,
• 𝐓: triangular matrix.
11
BLAS
BLAS Level 3:
• It contains 𝑂(𝑛3) matrix to matrix operations for 𝑛 ×𝑛 matrices:
• General matrix multiplication of the form:
𝐂 ← a𝐀𝐁 + β𝐂,• 𝐀, 𝐁 can optionally be transposed or Hermitian-conjugated inside
the routine;
• All three matrices 𝐀,𝐁, 𝐂 may be strided.
• Routines for solving:
𝐁 ← a𝐓−1𝐁,
• 𝐓 is a triangular matrix.12
BLAS• BLAS is extremely fast.
• There are several implementations.
• Some of them are hardware specific.
• Most of them are heavily optimized:
• Taking advantage of data locality in memory to reduce CPU cache
misses.
• Loop unrolling (probably best handled by the compiler).
• Using multiple threads to execute on every CPU core.
• Employing the Single Input Multiple Data (SIMD) CPU operation
capabilities (Streaming SIMD Extensions).
13
cuBLAS
• NVIDIA cuBLAS library is a fast GPU-accelerated
implementation of the standard basic linear algebra
subroutines (BLAS).
• Speed up is achieved by deploying compute-intensive
operations to a single GPU or scale up and distribute work
across multi-GPU configurations efficiently.
• Data needs to be copied to the GPU memory prior to
calling cuBLAS functions. Results need to be copied
backwards to the CPU memory.
cuBLAS
• It has support for multiple GPUs, concurrent kernels and streams.
• The optimization includes:
• Software pipelining.
• Use of vector memory operations that keeps data coalesced.
• Hide any kind of memory path latencies, with instruction and thread
level parallelism (ILP, TLP).
• Take advantage of each different GPU memory (shared, constant,
texture).
• Loop unrolling.
• cuBLAS 9.2 is up to 35 times faster than BLAS.
15
KL DNN
• Intel Math Kernel Library – Deep Neural Network is an
open source, performance-enhancing library for
accelerating deep learning frameworks on Intel Architecture.
• Optimizations are abstracted and integrated directly into
PyTorch, MXNet frameworks.
• Makes use of SIMD instructions available on Intel
processors
• Engine choice available (CPU/GPU).
16
cuDNN
• The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-
accelerated library of primitives for deep neural networks.
• Standard routines:
• Forward and backward convolution.
• Pooling.
• Normalization layers.
• Activation layers.
• cuDNN accelerates widely used deep learning frameworks:
• Caffe, Keras, MATLAB, TensorFlow, Torch.
17https://developer.nvidia.com/cudnn
DNN Frameworks
Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs
DNN Frameworks• All 5 frameworks work with cuDNN as backend.
• cuDNN unfortunately is not open source.
• cuDNN supports FFT and Winograd convolutions.
Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs
DNN Frameworks
Neon
• Intel Neon is a modern deep learning framework created by
Nervana Systems.
• Implemented in Python, while Nervana Caffe framework is
written in C and C++.
• Impressively fast compared to other frameworks.
• Image processing oriented (not general purpose enough).
20
DNN FrameworksNeon:
• Written in Python and C;
• It does not support Windows;
• Uses MKL for CPU (highly optimized by Intel);
• Supports CUDA for GPU;
• Known mostly to be the first to implement Winograd
convolutions faster than others;
• The library is open source and it is an good starting point for
those interested to study the code.
DNN FrameworksTensorFlow (TF)
• For Python (but also APIs in C++, C#, JavaScript, Java).
• It operates with static computation graphs.
• Recommended for production (cloud, mobile).
• Supported by Google.
• Scalable.
Pytorch
• Main competitor of TF.
• Dynamically updated graph (advantage over TF).
• Suited for research, small projects and prototyping.
• Supports distributed learning.22
DNN FrameworksKeras
• High-level API for TF.
• Suitable for quick testing, prototyping.
• Very easy to use, good for beginners.
• Many plug-and-play architectures available.
MXNet
• Highly scalable.
• Very effective parallelization on multiple GPUs and machines.
• Supports many languages (C ++, Python, R, Julia, JavaScript,
Scala, Go, Perl).
• Supported by Apache.23
DNN Frameworks
Recommended for beginners:
Recommended for research:
Recommended for production
(cloud or mobile):
24
Distributed/cloud
computing
• An alternative to expensive Graphics cards and
supercomputers.
• It can consist of any combination of PCs of
various processing power.
• Extremely flexible and scalable.
• Designed to handle Big Data.
MapReduce programming
model
▪ This programming model was developed to implement
the Map and Reduce commands, which can, in turn, be
used to write distributed programs with ease.
▪ The Map command applies a function on every
element of the distributed dataset. This can be used to
filter, sort, or transform the data.
▪ The Reduce command collects the distributed
dataset, or the results of a previous Map or Reduce
command, in a summary operation, such as addition.
Apache Spark• Open software for cluster
computing.
• Developed by Apache.
• Distributed big data computing.
• It is based on MapReduce.
• Up to 100 times faster than
Hadoop MapReduce.
• Machine Learning, Graph
Processing, SQL, Streaming
modules.
Apache Spark
▪ The Apache Spark framework is an implementation of
the MapReduce programming model.
▪ It provides the Map and Reduce command, among
others, in a fault-tolerant environment.
▪ Its advantages over other alternatives, like Hadoop,
include the ability to cache data in memory, in order to
minimize disk access.
▪ The data are stored in data collections named Resilient
Distributed Datasets (RDDs).
Apache Spark
• Supported languages include Java, Scala and Python.
• Spark also includes specialized APIs, like GraphX for
graph processing and SparkML for Machine Learning.
• It is possible to use native libraries to process data
(avoid OpenJDK).
Apache Spark
▪ In Spark, a master node coordinates several worker
nodes.
▪ We used VirtualBox VMs running Ubuntu to set up our
computing cluster.
Apache Spark Mllib
• Machine Learning Library:
• Clustering
• Classification
• Regression
• Data modeling
• Dimensionality reduction
• Available in Pyspark (Spark
wrapper for Python).
other partners offering
Collaborative SW Development
32
• Modern on-line collaborative development platforms provide a common
project repository hosted on-cloud.
• Several features are typically available:
• Distributed version control.
• Source code management.
• Access control.
• Bug tracking.
• Task management.
• Web interface.
Collaborative SW Development
33
• Several alternatives exist.
• Most popular front-end services:
• GitHub.
• Bitbucket.
• Most popular underlying open-source version control systems:
• Git.
• Mercurial.
34
References[BRA2008] Bradski G, Kaehler A. Learning OpenCV: Computer vision with the
OpenCV library. " O'Reilly Media, Inc."; 2008 Sep 24.
[KIR2016] Kirk DB, Wen-Mei WH. Programming massively parallel processors: a
hands-on approach. Morgan kaufmann; 2016.
[ABA2016] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M,
Ghemawat S, Irving G, Isard M, Kudlur M. Tensorflow: A system for large-scale
machine learning. In12th {USENIX} Symposium on Operating Systems Design and
Implementation ({OSDI} 16) 2016 (pp. 265-283).
[DEA2010] Dean J, Ghemawat S. MapReduce: a flexible data processing tool.
Communications of the ACM. 2010 Jan 1;53(1):72-7.
[ZAH2026] Zaharia M, Xin RS, Wendell P, Das T, Armbrust M, Dave A, Meng X,
Rosen J, Venkataraman S, Franklin MJ, Ghodsi A. Apache spark: a unified engine
for big data processing. Communications of the ACM. 2016 Oct 28;59(11):56-65.
[KET2017] Ketkar N. Introduction to pytorch. InDeep learning with python 2017 (pp.
195-208). Apress.