Fully Binary Neural Network Model and Optimized Hardware Architectures for Associative Memories

PHILIPPE COUSSY, CYRILLE CHAVET, HUGUES NONO WOUAFO, and LAURA CONDE-CANENCIA

Presented by: Stefany Escobedo, Joshua Kallus, and Alyssa Scheske

March 26, 2020


Introduction

● The goal is to develop neural-network-based associative memories that store information and retrieve it in a manner similar to the human brain

○ Robust against input noise

○ Constant retrieval time independent of the number of stored associations

Outline: Introduction, GBNN Model Overview, Proposed Optimizations, Experiments, Conclusion


GBNN Model

● Abstract neural network model

● Based on sparse clustered networks used to design associative memories


GBNN Model

● N binary neurons

● C equally-partitioned clusters

● L = N/C neurons per cluster

● Each cluster is associated, through one of its neurons, with a portion of an input message

● Message m of K bits

● X = K/C = log_2(L): length of each cluster's submessage

● Clique: set of activated neurons that are all connected to each other
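The relations between N, C, L, K, and X above can be checked with a short sketch. The function names, the example sizes (N = 64, C = 4), and the choice that cluster 0 takes the most significant bits are assumptions made for illustration; the slides do not specify a bit ordering.

```python
import math


def gbnn_parameters(n_neurons: int, n_clusters: int) -> dict:
    """Derive L, X, and K from N and C using the relations on the slide."""
    if n_neurons % n_clusters != 0:
        raise ValueError("N must be divisible by C (equally partitioned clusters)")
    neurons_per_cluster = n_neurons // n_clusters          # L = N/C
    bits_per_cluster = int(math.log2(neurons_per_cluster))  # X = log_2(L)
    if 2 ** bits_per_cluster != neurons_per_cluster:
        raise ValueError("L must be a power of two so that X = log_2(L) is an integer")
    return {
        "N": n_neurons,
        "C": n_clusters,
        "L": neurons_per_cluster,
        "X": bits_per_cluster,
        "K": bits_per_cluster * n_clusters,                 # K = X * C
    }


def decode_message(message: int, n_clusters: int, bits_per_cluster: int) -> list[int]:
    """Split a K-bit message into C submessages of X bits each.

    Each submessage is the index of the one neuron activated in its cluster.
    Assumption: cluster 0 receives the most significant X bits.
    """
    mask = (1 << bits_per_cluster) - 1
    return [
        (message >> (bits_per_cluster * (n_clusters - 1 - c))) & mask
        for c in range(n_clusters)
    ]


params = gbnn_parameters(n_neurons=64, n_clusters=4)
clique = decode_message(0b0001_1111_0000_1010, params["C"], params["X"])
print(params)
print(clique)
```

The list returned by `decode_message` names one neuron per cluster; those neurons together form the clique for that message.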


GBNN Model

● Learns by memorizing that the neurons constituting the input message are connected to each other and form a clique

● Retrieves by detecting which neuron is the most “stimulated”

○ Scoring step using Eq. 1

○ Winner Takes All (WTA) step using Eq. 2

[Eq. 1 (scoring) and Eq. 2 (WTA) appear as images on the original slide]
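A minimal sketch of the learn and retrieve steps, under stated assumptions: the exact forms of Eq. 1 and Eq. 2 are not recoverable from the slides, so the score used here (number of connected active distant neurons, plus 1 if the neuron is already active) and the per-cluster maximum used for WTA are assumed stand-ins, not the authors' equations. The names `learn`, `retrieve`, and the network size are hypothetical.

```python
from itertools import combinations

C, L = 4, 16  # hypothetical network size: 4 clusters of 16 neurons


def learn(edges: set, clique: list[int]) -> None:
    """Store a message by connecting every pair of its neurons (a clique)."""
    neurons = list(enumerate(clique))  # (cluster, neuron) pairs, one per cluster
    for a, b in combinations(neurons, 2):
        edges.add((a, b))
        edges.add((b, a))  # both directions are stored explicitly here


def retrieve(edges: set, partial: list, iterations: int = 4) -> list[set]:
    """Recover a stored message from a partial one (None marks an erased cluster)."""
    active = [set() if n is None else {n} for n in partial]
    for _ in range(iterations):
        new_active = []
        for c in range(C):
            # Scoring (assumed form): count the active distant neurons connected
            # to neuron (c, n), plus 1 if the neuron itself is already active.
            scores = []
            for n in range(L):
                score = 1 if n in active[c] else 0
                score += sum(
                    ((d, m), (c, n)) in edges
                    for d in range(C) if d != c
                    for m in active[d]
                )
                scores.append(score)
            # WTA (assumed form): keep the neurons that reach the cluster maximum.
            best = max(scores)
            new_active.append({n for n, s in enumerate(scores) if s == best and best > 0})
        active = new_active  # all clusters update from the previous iteration's values
    return active


edges = set()
learn(edges, [1, 15, 0, 10])
learn(edges, [3, 7, 0, 2])
recovered = retrieve(edges, [1, 15, None, None])
print(recovered)
```

The two erased clusters are filled in from the stored clique even though both messages share neuron 0 of cluster 2, because the competing neurons receive a lower score.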


GBNN Model: HW Architecture

● Fully parallel HW implementation

● Modules

○ Decoding

○ Learning (memory)

○ Computing

● Crossbar network dedicated to exchanging neuron values between clusters


GBNN Model: HW Architecture

● Learning process

○ Cluster receives a K-bit binary word

○ Decoding module splits the word into C subwords (one per cluster)

○ The cluster's own subword determines which local neuron must be activated

■ The remaining subwords determine which distant neurons must be connected to the locally activated neuron

○ Memory is updated with the selected weights to store the clique

● Retrieval process

○ Scoring step is processed

○ WTA step elects a neuron or group of neurons

○ Local neuron values are updated with the new information

○ The updated values are broadcast to all distant neurons


GBNN Model: Discussion

● Advantage

○ Strongly enhances the performance of associative memories compared to Hopfield networks

● Disadvantage

○ Complex hardware architectures whose area and timing performance do not scale well

● Further optimizations

○ Transform into a fully binary model to simplify scoring and remove WTA (area reduction)

○ Memorize only half of the synaptic weights to reduce the number of storage elements and the cost of learning logic

○ Serialize communications (area reduction)

● Overall goal

○ Simplify the processing performed by the neurons

○ Optimize the hardware implementation

○ Keep the functionality and performance of the original model


Proposed Simplified Neural Network

The optimizations proposed include the following:

● Fully binary semantics vs arithmetical-integer semantics

● Reduced memory complexity

● Serialized communications


Fully Binary Semantics

Replacing all arithmetical-integer computations with logical equations makes it possible to remove the Winner Takes All (WTA) step while achieving the same performance as the enhanced GBNN model


Unanimous Vote

● A neuron n_{i,j} is active in a given cluster if, in each other active cluster, at least one active neuron (a distant active neuron) indicates that it is connected to n_{i,j}

● This changes how the decoding module works and enables removal of the WTA step. Values of neurons can be calculated with only logical equations now.
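The unanimous-vote rule can be written as a purely logical expression: an AND over the other active clusters of an OR over their active neurons. The sketch below is one reading of the rule stated above, not the authors' circuit; `store`, `unanimous_step`, and the example sizes are hypothetical names and values. One edge case is an assumption of this sketch: if no other cluster is active, the AND is taken over nothing and every neuron passes the vote.

```python
from itertools import combinations


def store(edges: set, clique: list[int]) -> None:
    """Record an undirected connection between every pair of neurons in the clique."""
    nodes = list(enumerate(clique))  # (cluster, neuron) pairs
    for a, b in combinations(nodes, 2):
        edges.add(frozenset((a, b)))


def unanimous_step(edges: set, active: list[set], neurons_per_cluster: int) -> list[set]:
    """One retrieval step using only AND/OR logic (no scores, no WTA)."""
    n_clusters = len(active)
    new_active = []
    for c in range(n_clusters):
        winners = set()
        for n in range(neurons_per_cluster):
            # AND over every other active cluster d ...
            vote = all(
                # ... of OR over the active neurons m of that cluster.
                any(frozenset(((d, m), (c, n))) in edges for m in active[d])
                for d in range(n_clusters)
                if d != c and active[d]
            )
            if vote:
                winners.add(n)
        new_active.append(winners)
    return new_active


edges = set()
store(edges, [1, 15, 0, 10])
store(edges, [3, 7, 0, 2])
# Clusters 2 and 3 are erased (no active neuron).
result = unanimous_step(edges, [{1}, {15}, set(), set()], neurons_per_cluster=16)
print(result)
```

Because a single dissenting cluster is enough to switch a neuron off, no integer comparison between scores is needed to pick a winner.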


Reduced Memory

The stored synaptic weights represent connections between a neuron and neurons in distant clusters. The original GBNN model computes and stores redundant information, which can be removed to save space at no performance cost
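A small sketch of why half of the weights suffice, assuming the redundancy comes from symmetry: a connection between neuron a and neuron b is the same fact as a connection between b and a, so only the pairs whose first cluster index is lower need to be stored (a triangular layout). The class and function names are hypothetical, and the counts assume one bit per pair of neurons in different clusters.

```python
def full_weight_count(n_clusters: int, neurons_per_cluster: int) -> int:
    """Bits needed when every cluster stores its links to every other cluster."""
    return n_clusters * (n_clusters - 1) * neurons_per_cluster ** 2


def triangular_weight_count(n_clusters: int, neurons_per_cluster: int) -> int:
    """Bits needed when each unordered cluster pair is stored only once."""
    return n_clusters * (n_clusters - 1) // 2 * neurons_per_cluster ** 2


class TriangularWeights:
    """Symmetric binary weights stored once per unordered pair of neurons."""

    def __init__(self) -> None:
        self._bits = set()

    @staticmethod
    def _key(a: tuple, b: tuple) -> tuple:
        # Order the pair by cluster index so (a, b) and (b, a) share one entry.
        return (a, b) if a[0] < b[0] else (b, a)

    def connect(self, a: tuple, b: tuple) -> None:
        self._bits.add(self._key(a, b))

    def connected(self, a: tuple, b: tuple) -> bool:
        return self._key(a, b) in self._bits


weights = TriangularWeights()
weights.connect((2, 0), (0, 1))  # neurons are (cluster, index) pairs
print(weights.connected((0, 1), (2, 0)), weights.connected((2, 0), (0, 1)))
print(full_weight_count(4, 16), triangular_weight_count(4, 16))
```

Lookups in either direction resolve to the same stored bit, which is why retrieval behaves identically with half the storage in this sketch.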


Serialized Communications

In the fully parallel GBNN design, a very large number of wires and a large amount of logic are needed to connect every node to every other node. Serializing data transfers offers several benefits:

● Improved clock frequencies

● Significantly reduced area

● Lower power consumption


Serialization Implementation

Cluster Based:

● Clusters take turns broadcasting the values of all their neurons

● Takes C (number of clusters) cycles to complete

Neuron Based:

● All clusters concurrently broadcast the value of one of their neurons

● Takes L = N/C cycles to complete
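The two schedules can be compared with a behavioral sketch. It only models which neuron values are on the shared wires in each cycle; the function names and the example sizes are assumptions, and no timing or wiring detail from the paper is implied.

```python
def cluster_based_schedule(values: list[list[int]]):
    """Each cycle, one cluster broadcasts the values of all its neurons (C cycles)."""
    for c, cluster in enumerate(values):
        yield {(c, n): v for n, v in enumerate(cluster)}


def neuron_based_schedule(values: list[list[int]]):
    """Each cycle, every cluster broadcasts the value of one neuron (L cycles)."""
    neurons_per_cluster = len(values[0])
    for n in range(neurons_per_cluster):
        yield {(c, n): cluster[n] for c, cluster in enumerate(values)}


C, L = 4, 16  # hypothetical sizes
# values[c][n] is the binary value of neuron n in cluster c (arbitrary test pattern).
values = [[(c + n) % 2 for n in range(L)] for c in range(C)]

cluster_cycles = list(cluster_based_schedule(values))
neuron_cycles = list(neuron_based_schedule(values))
print(len(cluster_cycles), "cycles, each carrying", len(cluster_cycles[0]), "values")
print(len(neuron_cycles), "cycles, each carrying", len(neuron_cycles[0]), "values")
```

Both schedules deliver the same N values in total; they trade the number of cycles (C versus L) against the number of values sent per cycle (L versus C).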


Serialization: Hardware Implementation

● Steering logic for synaptic weights has large overhead in multiplexers

● Area cost of this design is high


Serialization: Hardware Implementation

● Flip-flop ring buffer logic

● Requires only one MUX instead of L-1 2:1 MUXes

● Can be used with either neuron-based or cluster-based serialization
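A behavioral sketch of the idea, under assumptions: selecting one of L stored weights by index corresponds to an L-to-1 selection (a tree of L-1 2:1 MUXes), whereas a ring of flip-flops that rotates by one position per cycle presents each weight in turn at a fixed tap. The circuit itself is not shown in the transcript, so the rotation direction and the function names are hypothetical, and the placement of the single remaining MUX is not modeled.

```python
from collections import deque


def mux_select_stream(weights: list[int]):
    """Baseline: at cycle t, an index-driven selection picks weights[t]."""
    for t in range(len(weights)):
        yield weights[t]


def ring_buffer_stream(weights: list[int]):
    """Ring of flip-flops: read a fixed position, then rotate by one each cycle."""
    ring = deque(weights)
    for _ in range(len(weights)):
        yield ring[0]    # fixed tap, no per-cycle selection among L positions
        ring.rotate(-1)  # every stored bit moves one position toward the tap


stored = [1, 0, 0, 1, 1, 0, 1, 0]  # arbitrary example weights
from_mux = list(mux_select_stream(stored))
from_ring = list(ring_buffer_stream(stored))
print(from_mux == from_ring)
```

After L rotations the ring is back in its starting arrangement, so the same sequence can be replayed on the next retrieval iteration without reloading.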


Experiments

● Performance Analysis

● Complexity Analysis

● Hardware Synthesis Analysis

○ FPGA Target - Stratix IV FPGA Platform

○ ASIC Target - Altera HardCopy Platform


Architecture                                  Label
Original GBNN model                           V0
Fully binary model                            V1.0
Binary + triangular synaptic weight model     V1.1
Binary + cluster-based serialization          V1.2
Binary + neuron-based serialization           V1.3


Experiments

The proposed architecture's performance matches that of the original GBNN architecture: the curves are superimposed


Experiments

Controller resources decompose into decoding, memorizing, and computing tasks.

The fully binary model (V1.0) reduces the total area by 50% relative to V0.

V1.1 reduces architecture complexity to about ⅓ of V0 (70% area reduction).

V1.2 and V1.3 reduce architecture complexity to about ⅙ of V0 (83% area reduction).


Experiments - Area

Largest improvement: from the original V0 architecture to the triangular synaptic weight matrix architecture (V1.1), 50% for all configurations.


Largest improvement: from the original V0 architecture to V1.2/V1.3, 87% area savings.


Experiments

Average look-up table (LUT) area reductions range from 62% for V1.0 up to 86% for V1.2.

The larger the network, the more impactful the reductions!


Experiments - Clock Frequencies


Conclusion & Comments

● Full binary computation strongly reduces the cost of the computation module

● Memory reduction limits the cost of both the memory and the decoding modules

● Serialization optimizes the computation and the decoding modules

● Future work

○ Further optimize architectures for timing performance (not just area)
