A Random Forest using a Multi-valued Decision Diagram on an FPGA

Post on 21-Jan-2018

1Hiroki Nakahara, 1Akira Jinguji, 1Shimpei Sato, 2Tsutomu Sasao

1Tokyo Institute of Technology, JP, 2Meiji University, JP

May 22nd, 2017 @ ISMVL 2017

Outline
• Background
• Random forest (RF)
• Multi-valued decision diagram (MDD)
• RF using MDDs
• Experimental results
• Conclusion

Machine Learning

Much computation power, and big data
(Left): “Single-Threaded Integer Performance,” 2016
(Right): Nakahara, “Trend of Search Engine on modern Internet,” 2014

Machine Learning Algorithms

M. Warrick, “How to get started with machine learning,” PyCon 2014

Introduction
• Random Forest (RF)
  • Ensemble learning method
  • Consists of multiple decision trees (DTs)
  • Applications: segmentation, human pose detection
  • It is based on binary DTs (BDTs)
    • A node is evaluated by an if-then-else statement
    • The same variable may appear several times
• Multi-valued decision diagram (MDD)
  • Each variable appears only once on a path

Introduction (Cont'd)
• Target platform
  • CPU: Too slow
  • GPU: Not suitable for the RF → slow, and consumes much power
  • FPGA: Faster, low power, but long turnaround time (TAT)
• High-level synthesis (HLS) for the RF using MDDs on an FPGA
  • Low power, high performance, short design time

Random Forest

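Classification by a Binary Decision Tree (BDT)
• Partition of the feature map

[Figure: the feature space, with X1 and X2 each ranging from 0.00 to 1.00, partitioned into regions labeled C1 and C2 by thresholds at 0.09, 0.63, and 0.71 on X1 and at 0.29 and 0.53 on X2, together with the corresponding BDT: decision nodes X2<0.53?, X2<0.29?, X1<0.09?, X1<0.63?, and X1<0.71?, Y/N branches, and leaves C1 and C2.]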
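The if-then-else evaluation of a BDT can be sketched as follows. The thresholds are the ones shown on the slide; the branch layout and the class assigned to each leaf are assumptions made for illustration, because the figure layout cannot be recovered from the transcript.

```python
def classify_bdt(x1: float, x2: float) -> str:
    """Classify one sample with a hand-written binary decision tree.

    Thresholds come from the slide; the tree shape and the leaf
    labels (C1/C2) are illustrative assumptions.
    """
    if x2 < 0.53:
        if x2 < 0.29:
            return "C1"
        # 0.29 <= x2 < 0.53: X1 is tested next
        if x1 < 0.63:
            return "C2"
        return "C1"
    # x2 >= 0.53: X1 is tested, possibly twice on the same path
    if x1 < 0.09:
        return "C1"
    if x1 < 0.71:
        return "C2"
    return "C1"


print(classify_bdt(0.50, 0.10))  # only X2 is tested
print(classify_bdt(0.50, 0.40))  # X2 twice, then X1
print(classify_bdt(0.80, 0.90))  # X2 once, then X1 twice
```

Note that one path tests X2 twice and another tests X1 twice, which is the repetition the later slides address.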

Training of a BDT
• It is built from randomized samples
• Recursively partition the dataset to maximize the entropy reduction (information gain) → The same variable may appear more than once on a path

[Figure: the feature-space partition and the BDT from the previous slide, repeated.]
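Split selection during BDT training is commonly scored by entropy reduction (information gain). The sketch below shows that scoring for a single feature; the helper names and the toy data are hypothetical, and the slides do not state which exact criterion the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `value < threshold`."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Pick the candidate threshold with the largest information gain."""
    return max(sorted(set(values)),
               key=lambda t: information_gain(values, labels, t))


# Toy data (hypothetical): one feature, two classes.
x2 = [0.1, 0.2, 0.6, 0.8]
y = ["C1", "C1", "C2", "C2"]
print(best_threshold(x2, y))
```

Because each recursive step picks the best variable for its own subset, nothing prevents the same variable from being chosen again further down a path.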

Random Forest (RF)
• Ensemble learning
• Classification and regression
• Consists of multiple BDTs

[Figure: Random Forest. The input is fed to Tree 1, Tree 2, ..., Tree n; the trees output classes (C1, C2, C1 in the example) and a voter selects the final class (C1). Inset: Tree 1 as a BDT with decision nodes X1<0.53?, X3<0.71?, X2<0.63?, X2<0.63?, and X3<0.72?, and leaves C1, C2, and C3.]
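The tree-plus-voter structure can be sketched as a majority vote over per-tree predictions. The three one-node trees below are hypothetical stand-ins, not the trees in the figure.

```python
from collections import Counter


def vote(predictions):
    """Return the class predicted by the largest number of trees.

    Ties go to the class that was encountered first.
    """
    return Counter(predictions).most_common(1)[0][0]


def random_forest_predict(trees, sample):
    """Evaluate every tree on the sample and take a majority vote."""
    return vote([tree(sample) for tree in trees])


# Hypothetical one-node trees, each a callable from sample to class.
trees = [
    lambda s: "C1" if s["X1"] < 0.53 else "C2",
    lambda s: "C2" if s["X2"] < 0.63 else "C1",
    lambda s: "C1" if s["X3"] < 0.71 else "C3",
]

sample = {"X1": 0.2, "X2": 0.5, "X3": 0.3}
print(random_forest_predict(trees, sample))  # trees say C1, C2, C1
```

The trees are independent of one another, so in hardware they can be evaluated in parallel before the voter.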

Applications
• Key point matching [Lepetit et al., 2006]
• Object detector [Shotton et al., 2008][Gall et al., 2011]
• Handwritten character recognition [Amit&Geman, 1997]
• Visual word clustering [Moosmann et al., 2006]
• Pose recognition [Yamashita et al., 2010]
• Human detector [Mitsui et al., 2011][Dahang et al., 2012]
• Human pose estimation [Shotton 2011]

Known Problem
• Build BDTs from randomized samples
  • The same variable may appear on a path
• Tend to be slow, even if we use GPUs

[Figure: a BDT with decision nodes X2<0.53?, X2<0.29?, X2<0.09?, X1<0.63?, and X1<0.71?, in which X2 is tested more than once on a path. Each node is evaluated as:]

if X2 < 0.09 then
    output C1;
else
    goto Child_node;

Multi-valued Decision Diagram

Binary Decision Diagram (BDD)
• Recursively apply Shannon expansion to a given logic function
  • Non-terminal node: If-then-else statement
  • Terminal node: Holds the function value (0 or 1)

[Figure: a BDD over the variables x1 to x6 with terminal nodes 0 and 1; a non-terminal node and a terminal node are labeled.]
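As a small illustration of the node semantics, the sketch below evaluates a hand-built BDD for the example function f = (x1 AND x2) OR x3, which is not from the slides. Shannon expansion on x1 gives f = (NOT x1)·f0 + x1·f1 with f0 = x3 and f1 = x2 OR x3, and each non-terminal node is one such if-then-else. The dictionary encoding is an assumption for illustration.

```python
from itertools import product

# Each non-terminal node is (variable, low_child, high_child);
# the strings "0" and "1" are the terminal nodes.
BDD = {
    "n1": ("x1", "n3", "n2"),  # x1 = 0 -> f0 = x3, x1 = 1 -> f1 = x2 OR x3
    "n2": ("x2", "n3", "1"),
    "n3": ("x3", "0", "1"),
}


def eval_bdd(bdd, root, assignment):
    """Follow one path from the root to a terminal.

    Returns (function value, number of non-terminal nodes visited).
    """
    node, steps = root, 0
    while node not in ("0", "1"):
        var, low, high = bdd[node]
        node = high if assignment[var] else low  # if-then-else at the node
        steps += 1
    return int(node), steps


# Check the diagram against the function on all 8 input combinations.
for a, b, c in product((0, 1), repeat=3):
    value, _ = eval_bdd(BDD, "n1", {"x1": a, "x2": b, "x3": c})
    assert value == int((a and b) or c)
```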

Measurement of BDD

Memory size: # of nodes × size of a node
Worst-case performance: LPL (Longest Path Length)
→ Dedicated fully pipelined hardware

[Figure: the BDD over x1 to x6 with terminal nodes 0 and 1, repeated.]
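Both measures can be computed directly from a node table. The sketch below does so for a small hypothetical BDD; counting only non-terminal nodes and the 32 bits per node are assumptions, since the slides do not give a node format.

```python
from functools import lru_cache

# Hypothetical BDD: node -> (variable, low_child, high_child).
BDD = {
    "n1": ("x1", "n3", "n2"),
    "n2": ("x2", "n3", "1"),
    "n3": ("x3", "0", "1"),
}
TERMINALS = ("0", "1")


def longest_path_length(bdd, root):
    """Number of non-terminal nodes on the longest root-to-terminal path."""
    @lru_cache(maxsize=None)
    def depth(node):
        if node in TERMINALS:
            return 0
        _, low, high = bdd[node]
        return 1 + max(depth(low), depth(high))

    return depth(root)


def memory_bits(bdd, bits_per_node):
    """Memory size = number of nodes x size of a node."""
    return len(bdd) * bits_per_node


print(longest_path_length(BDD, "n1"))      # worst-case number of node evaluations
print(memory_bits(BDD, bits_per_node=32))  # 32 bits per node is an assumed figure
```

In a fully pipelined realization the LPL bounds the number of pipeline stages a sample passes through.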

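Multi-Valued Decision Diagram (MDD)

• MDD(k): 2^k outgoing edges
• Evaluates k variables at a time

[Figure: a BDD over x1 to x6 next to the equivalent MDD(2), whose super variables are X1 = {x1,x2}, X2 = {x3,x4}, and X3 = {x5,x6}; both have terminal nodes 0 and 1.]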
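A minimal sketch of the MDD(k) idea: k binary inputs are packed into one 2^k-valued super variable, and each node selects one of its 2^k children in a single step. The example function, f = (x1 AND x2) OR (x3 AND x4), and the table encoding are assumptions for illustration.

```python
from itertools import product


def group_bits(bits, k):
    """Pack consecutive groups of k binary inputs into 2**k-valued digits."""
    if len(bits) % k != 0:
        raise ValueError("number of inputs must be a multiple of k")
    digits = []
    for i in range(0, len(bits), k):
        value = 0
        for b in bits[i:i + k]:
            value = (value << 1) | b
        digits.append(value)
    return digits


# MDD(2): node -> (index of the super variable, list of 2**2 children).
MDD2 = {
    "A": (0, ["B", "B", "B", "1"]),  # X1 = {x1, x2}
    "B": (1, ["0", "0", "0", "1"]),  # X2 = {x3, x4}
}


def eval_mdd(mdd, root, digits):
    """Follow one path; every node consumes one super variable."""
    node = root
    while node not in ("0", "1"):
        level, children = mdd[node]
        node = children[digits[level]]
    return int(node)


print(group_bits([1, 0, 1, 1, 0, 1], 2))  # six inputs become three digits

# Check the MDD(2) against the function on all 16 input combinations.
for bits in product((0, 1), repeat=4):
    expected = int((bits[0] and bits[1]) or (bits[2] and bits[3]))
    assert eval_mdd(MDD2, "A", group_bits(list(bits), 2)) == expected
```

With k = 2 a path visits at most half as many nodes as in the BDD, at the cost of four outgoing edges per node.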

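Comparison of the BDT with the MDD

[Figure: the BDT (decision nodes X2<0.53?, X2<0.29?, X1<0.09?, X1<0.63?, X1<0.71?) next to the equivalent MDD, in which a single X2 node with edges <0.29, <0.53, and <1.00 leads to X1 nodes with edges such as <0.63, <0.71, and <1.00, ending in the terminals C1 and C2.]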
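For threshold tests, an MDD node can select among intervals of one variable in a single step, so each variable is tested once per path. The sketch below uses the thresholds from the slide; the class assigned to each interval is an assumption made for illustration, as is the table encoding.

```python
import bisect

X1_CUTS = [0.09, 0.63, 0.71]  # four intervals on X1
X2_CUTS = [0.29, 0.53]        # three intervals on X2

# node -> (variable, cut points, one child per interval).
# Children that are not keys of the table are terminal classes.
MDD = {
    "root": ("x2", X2_CUTS, ["C1", "mid", "top"]),
    "mid": ("x1", X1_CUTS, ["C2", "C2", "C1", "C1"]),
    "top": ("x1", X1_CUTS, ["C1", "C2", "C2", "C1"]),
}


def classify_mdd(sample, mdd=MDD, root="root"):
    """Return (class, number of nodes visited) for one sample."""
    node, visited = root, 0
    while node in mdd:
        var, cuts, children = mdd[node]
        # bisect_right counts the cut points <= value, which is the
        # index of the half-open interval [cut[i-1], cut[i]) holding it.
        node = children[bisect.bisect_right(cuts, sample[var])]
        visited += 1
    return node, visited


print(classify_mdd({"x1": 0.50, "x2": 0.10}))
print(classify_mdd({"x1": 0.50, "x2": 0.40}))
print(classify_mdd({"x1": 0.80, "x2": 0.90}))
```

No path visits more than two nodes here, one per variable, whereas a BDT over the same thresholds may compare the same variable several times.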

# of Nodes

[Figure: the partitioned feature space (X1 with ticks at 0.09, 0.63, 0.71, and 1.00; X2 with ticks at 0.00, 0.29, 0.53, and 1.00; regions labeled C1 and C2) shown once for the BDT and once for the MDD.]

Complexities of the BDT and the MDD

         # Nodes       LPL
BDT      O(Σ|Xi|)      O(Σ|Xi|)
MDD      O(|Xi|^k)     O(n)

The RF prefers shallow decision trees to avoid overfitting

Random Forest using MDDs on an FPGA

FPGA (Field Programmable Gate Array)
• Reconfigurable architecture
  • Look-up Table (LUT)
  • Configurable channel
• Advantages
  • Faster than CPU
  • Dissipates lower power than GPU
  • Shorter design time than ASIC

Fully Pipelined Circuit

[Figure: the input X is fed to Tree 1, Tree 2, ..., Tree b; their class outputs (C1, C2, C1 in the example) go to a voter, which outputs C1.]

MUX-based Realization

System Design Tool

1. Behavior design + pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by FPGA CAD tool
5. Middleware generation
↓ Automatically done

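Proposed Tool Flow

[Figure: tool flow. The training dataset and the hyperparameters (found by grid search) are given to scikit-learn, which produces a random forest. RF2AOC converts it into host code and kernel code. gcc compiles the host code into a binary for the host PC, and aoc (Intel SDK for OpenCL) compiles the kernel code into an aocx file for the FPGA board.]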
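The conversion step from a trained tree to C-style source for an HLS compiler can be sketched as below. This is not the authors' RF2AOC tool, whose internals the slides do not show; the tuple encoding of a tree, the function names, and the emitted code shape are all hypothetical.

```python
def emit_tree(node, indent=1):
    """Return C-style if-then-else source lines for one decision tree.

    A node is either a class index (int leaf) or a tuple
    (feature_index, threshold, subtree_if_true, subtree_if_false).
    """
    pad = "    " * indent
    if isinstance(node, int):
        return [f"{pad}return {node};"]
    feature, threshold, yes, no = node
    lines = [f"{pad}if (x[{feature}] < {threshold}f) {{"]
    lines += emit_tree(yes, indent + 1)
    lines.append(f"{pad}}} else {{")
    lines += emit_tree(no, indent + 1)
    lines.append(f"{pad}}}")
    return lines


def emit_function(name, tree):
    """Wrap the generated statements in a C function definition."""
    body = "\n".join(emit_tree(tree))
    return f"int {name}(const float *x) {{\n{body}\n}}"


# Hypothetical tree: feature 1 is tested twice on the left path.
tree = (1, 0.53, (1, 0.29, 0, 1), 0)
source = emit_function("tree_0", tree)
print(source)
```

A generator for the MDD-based realization would emit one table lookup per variable instead of nested comparisons; that variant is not shown here.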

Experimental Results

Comparison of the MDD-based RF with the BDT-based RF

Name                   BDT Path len.  BDT #Nodes  Max.   MDD Path len.  MDD #Nodes
                       (Perform.)     (Mem.)      Path   (Perform.)     (Mem.)
Dermatology            720            676         15     322            118336
Contraceptive Method   600            1055        9      198            7360
Glass Identification   952            1260        10     268            17204
Hayes-Roth             480            577         5      73             448
Hepatitis              720            1040        15     357            145664
Ionosphere             1196           1077        20     381            671744
Iris                   1056           777         4      199            517

Dataset: UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/datasets.html

Comparison of Platforms
• Implemented the RF on the following devices
  • CPU: Intel Core i7 650
  • GPU: NVIDIA GeForce GTX Titan
  • FPGA: Terasic DE5-NET
• Measured dynamic power, including the host PC
• Test bench: 10,000 random vectors
• Execution time includes the communication time between the host PC and the devices

[Figure: the GPU and the FPGA.]

Comparison of Platforms

                       GPU@86W            CPU@13W            FPGA@15W
                       GeForce Titan      Xeon (R) E5607     Stratix V A7
Name                   LPS      LPS/W     LPS      LPS/W     LPS       LPS/W
Dermatology            336.2    3.9       211.6    16.3      3221.2    214.7
Contraceptive Method   521.9    6.1       286.4    22.0      10924.3   728.3
Glass Identification   726.7    8.5       587.5    45.2      6442.3    429.5
Hayes-Roth             1512.9   17.6      1165.5   89.7      12884.6   859.0
Hepatitis              739.1    8.6       662.7    51.0      8209.9    547.3
Ionosphere             821.0    9.5       595.9    45.8      9663.5    644.2
Iris                   446.6    5.2       436.7    33.6      4831.7    322.1

LPS: #Looks Per Second
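The LPS/W columns follow from dividing each LPS figure by the platform power given in the header. The sketch below reproduces that arithmetic for three of the rows and also computes per-dataset FPGA speedups; the helper names are hypothetical, and the slides do not state how the summary speedups in the conclusion were aggregated.

```python
# Platform power in watts, from the table header.
POWER_W = {"GPU": 86.0, "CPU": 13.0, "FPGA": 15.0}

# LPS figures copied from three rows of the table.
LPS = {
    "Dermatology": {"GPU": 336.2, "CPU": 211.6, "FPGA": 3221.2},
    "Hayes-Roth": {"GPU": 1512.9, "CPU": 1165.5, "FPGA": 12884.6},
    "Iris": {"GPU": 446.6, "CPU": 436.7, "FPGA": 4831.7},
}


def lps_per_watt(lps, platform):
    """Power efficiency: LPS divided by the platform power."""
    return round(lps / POWER_W[platform], 1)


def speedup(row, baseline):
    """How many times faster the FPGA is than the baseline platform."""
    return row["FPGA"] / row[baseline]


for name, row in LPS.items():
    efficiency = {p: lps_per_watt(v, p) for p, v in row.items()}
    print(name, efficiency,
          f"vs GPU: {speedup(row, 'GPU'):.1f}x",
          f"vs CPU: {speedup(row, 'CPU'):.1f}x")
```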

Conclusion
• Proposed the RF using MDDs
  • Reduced the path length
  • Increased the column multiplicity
    • # of nodes: O(|X|^k)
    • A shallow decision diagram is recommended to avoid overfitting
• Developed the high-level synthesis design flow toward the FPGA realization
  • 10.7x faster than the GPU
  • 14.0x faster than the CPU