Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... ·...

32
Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ´ ee MAGI Calcul scientifique VILLETANEUSE, 2017 July 5, 2017 Foutse Yuehgoh (AIMS) July 5, 2017 1 / 32

Transcript of Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... ·...

Page 1: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Mean-shift Clustering forHeterogeneous Architectures

Foutse Yuehgoh

Journee MAGI Calcul scientifiqueVILLETANEUSE, 2017

July 5, 2017

Foutse Yuehgoh (AIMS) July 5, 2017 1 / 32

Page 2: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

OUTLINE

Introduction

1 Generality

2 Heterogeneous Architecture

3 Our Approach

Conclusion

Foutse Yuehgoh (AIMS) July 5, 2017 2 / 32

Page 3: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Introduction

Context

Heterogeneous computing system.

Parallel programing paradigm of Machine learning (ML).

Need of specific construction languages to configure and reconfigurehardware according to the machine learning needs.

Foutse Yuehgoh (AIMS) July 5, 2017 3 / 32

Page 4: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Introduction

Problematic

Slow hardware designs.

Slow testing and evaluation time.

Slow and expensive fabrication.

Design cost dominate.

Foutse Yuehgoh (AIMS) July 5, 2017 4 / 32

Page 5: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Introduction

Objective

Performance optimizations for heterogeneous machine learningcomputing.

Foutse Yuehgoh (AIMS) July 5, 2017 5 / 32

Page 6: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Generality

Machine Learning

Building hardware or software that can achieve tasks such as:

Extracting features from data in order to explore or solve predictiveproblems,

Automatically learn to recognize complex patterns and makeintelligent decisions based on insight generated by learning fromexamples.

Foutse Yuehgoh (AIMS) July 5, 2017 6 / 32

Page 7: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Generality

Continues increase in data size

Consequences

Driven by the need for more explanatory power, tremendous pressure hasbeen on ML methods to scale beyond a single machine, due to both spaceand time bottlenecks.

Foutse Yuehgoh (AIMS) July 5, 2017 7 / 32

Page 8: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Generality

Complexity of algorithms

Their learning time grows as the size and complexity of the trainingdataset increases.

Fortunately, they have special properties which, if properly harnessed by awell-designed system, can yield a 10× or even 100× speed boost.

Foutse Yuehgoh (AIMS) July 5, 2017 8 / 32

Page 9: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Generality

Importance

Efficient computational methods and algorithms

On very large data sets,

in parallel to complete the machine learning tasks in reasonable time.

Processing vast amounts of data simultaneously, is important in neuralnetworks and machine learning algorithms, just as it is the case for thehuman brain.

Foutse Yuehgoh (AIMS) July 5, 2017 9 / 32

Page 10: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Generality

Review of results

Speeding up multidimensional clustering in statistical data analysis hasbeen the focal point of numerous papers that have employed differenttechniques to accelerate the clustering algorithms.

Example: Pipelined Euclidean distance hardware diagram

Foutse Yuehgoh (AIMS) July 5, 2017 10 / 32

Page 11: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Generality

Constructing Hardware in a Scala Embedded Language (Chisel)

Chisel eases the building of simple, reliable, and efficient software toharness the parallelism inherent in FPGA technology.

Foutse Yuehgoh (AIMS) July 5, 2017 11 / 32

Page 12: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Heterogeneous Architecture

Definition

A heterogeneous architecture is a novel architecture that incorporatesboth general-purpose and specialized cores on the same chip.

Takes care of generic control and computation;

Accelerates frequently used or heavy weight applications.

Challenge

Foutse Yuehgoh (AIMS) July 5, 2017 12 / 32

Page 13: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Heterogeneous Architecture

Example

(a) CBEA: Traditional CPU core and eight SIMD accelerator cores.

(b) GPU: 30 highly multi-threaded SIMD accelerator cores incombination with a standard multicore CPU.

(c) FPGA: An array of logic blocks in combination with a standardmulti-core CPU.

Foutse Yuehgoh (AIMS) July 5, 2017 13 / 32

Page 14: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Heterogeneous Architecture

Serial and Parallel Computing

Parallel computing enables us to solve problems that benefits from orneed faster solutions and require large amount of memory.

Foutse Yuehgoh (AIMS) July 5, 2017 14 / 32

Page 15: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Heterogeneous Architecture

Hardware difficulties

Foutse Yuehgoh (AIMS) July 5, 2017 15 / 32

Page 16: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Our Approach

Strategy

Direct implementation of the algorithm on FPGA development board.

How do we split and partition the ML program over a heterogeneousmachines, and which best language do we use?

Foutse Yuehgoh (AIMS) July 5, 2017 16 / 32

Page 17: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Our Approach

MEAN-SHIFT ALGORITHM

Goal: Produce clusters on input data.

Fixes a window around each data point;

Computes the mean of the data within the window;

Shifts the window to the mean and repeat until we have convergence.

Key Property

An exceptionally appealing and non-parametric clustering techniques thatdeal without prior notion on the number of clusters.

Foutse Yuehgoh (AIMS) July 5, 2017 17 / 32

Page 18: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Our Approach

Foutse Yuehgoh (AIMS) July 5, 2017 18 / 32

Page 19: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Field Programmable Gate Array (FPGA)

Altera Cyclone IV E FPGA DE2-115 Development kid

Foutse Yuehgoh (AIMS) July 5, 2017 19 / 32

Page 20: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Field Programmable Gate Array (FPGA) :

Key Properties

Foutse Yuehgoh (AIMS) July 5, 2017 20 / 32

Page 21: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Why FPGA?

Advantageous properties

Foutse Yuehgoh (AIMS) July 5, 2017 21 / 32

Page 22: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Important properties of Chisel

Foutse Yuehgoh (AIMS) July 5, 2017 22 / 32

Page 23: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Why Chisel?

Maximum of a vector

Foutse Yuehgoh (AIMS) July 5, 2017 23 / 32

Page 24: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Why Chisel?

Manual Code parallelization

Foutse Yuehgoh (AIMS) July 5, 2017 24 / 32

Page 25: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Why Chisel?

Chisel Code parallelization

Foutse Yuehgoh (AIMS) July 5, 2017 25 / 32

Page 26: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Running the Chisel Simulation

Generated Verilog

Foutse Yuehgoh (AIMS) July 5, 2017 26 / 32

Page 27: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Challenges encountered

Errors in Chisel

Type mismatches, syntax, (Scala compiler errors).

Errors found during low level transforms and verilog emission (Firrtlerrors).

Illegal chisel but legal scala (Chisel checks).

Underlying implementation crashed(Java Stack Trace).

Foutse Yuehgoh (AIMS) July 5, 2017 27 / 32

Page 28: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Advantages with Chisel for hardware implementation [1]

Important advantages of Chisel

Designer productivity with 3-stage 32-bit RISC processor reviles a 3×code reduction with Chisel compared to hand written Verilog.

Quality of results:

Foutse Yuehgoh (AIMS) July 5, 2017 28 / 32

Page 29: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Advantages with Chisel for hardware implementation[1]

Important advantages of Chisel

Simulator Speed with a (88, 291, 350 cycles total) Processor:

Foutse Yuehgoh (AIMS) July 5, 2017 29 / 32

Page 30: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Conclusion

Chisel breaks the barrier between hardware and software thus is agood tool for hardware design.

Rather than slowing-down data sender (during fast arrival of data inchunks), an FPGA allows you to add more processing ”blocks” toaccelerate the processing.

An FPGA, for example, can process 10 data streams in parallel (i.econcurrently), whereas in software each stream would have to bebuffered and processed one at a time.

Hence this approach is worth giving it a try!!!

Foutse Yuehgoh (AIMS) July 5, 2017 30 / 32

Page 31: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Some References

Bachan, John Constructing Hardware in a Scale Embedded Language,institution: Lawrence Berkeley National Laboratory, (2014)

Bailey, Donald G and Johnston, Christopher T Algorithmtransformation for FPGA implementation, booktitle: ElectronicDesign, Test and Application, 2010. DELTA’10. Fifth IEEEInternational Symposium on,pages:77–81, organization:IEEE(2010).

Han, Song and Kang,and others ESE: Efficient Speech RecognitionEngine with Sparse LSTM on FPGA, organization:ACM,(2017).

Liu, Shaoshan and RO, WON W and Liu, Chen and Cristobal-Salas,Alfredo and Cerin Christophe and Han, Jian-Jun and Gaudiot,Jean-Luc,INTRODUCING THE EXTREMELY HETEROGENEOUSARCHITECTURE,publisher: World Scientific(2012)

Foutse Yuehgoh (AIMS) July 5, 2017 31 / 32

Page 32: Mean-shift Clustering for Heterogeneous Architecturescerin/VILLETANEUSE2017/Foutse_project... · Mean-shift Clustering for Heterogeneous Architectures Foutse Yuehgoh Journ ee MAGI

Mean Shift, Heterogeneous Architecture (FPGA), Chisel

Foutse Yuehgoh (AIMS) July 5, 2017 32 / 32