
Fully Binary Neural Network Model and Optimized Hardware Architectures for Associative Memories
PHILIPPE COUSSY, CYRILLE CHAVET, HUGUES NONO WOUAFO, and LAURA CONDE-CANENCIA

Presented by: Stefany Escobedo, Joshua Kallus, and Alyssa Scheske
March 26, 2020

Introduction

● The goal is to develop associative memories based on neural networks that can store information and retrieve it in a manner similar to the human brain:
○ Robust against input noise
○ Constant retrieval time, independent of the number of stored associations


GBNN Model

● Abstract neural network model

● Based on sparse clustered networks used to design associative memories


GBNN Model

● N binary neurons

● C equally-partitioned clusters

● L = N/C neurons per cluster

● Each cluster is associated, through one of its neurons, with a portion of an input message
● Message m of K bits
● X = K/C = log_2(L): length in bits of each cluster sub-message (see the sketch after this list)
● Clique: set of activated neurons that are connected to each other
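To make these parameters concrete, here is a minimal Python sketch of how a K-bit message splits into C sub-messages of X = K/C = log_2(L) bits, each selecting one neuron in its cluster. The parameter values and the function name are illustrative assumptions, not taken from the paper.

```python
import math

# Illustrative parameters (assumed for this example, not from the paper).
N, C = 2048, 8
L = N // C                      # neurons per cluster
X = int(math.log2(L))           # bits per cluster sub-message
K = C * X                       # total message length in bits

def split_message(m_bits):
    """Split a K-bit message string into C sub-messages; each sub-message
    is the index of the neuron to activate in its cluster."""
    assert len(m_bits) == K
    return [int(m_bits[c * X:(c + 1) * X], 2) for c in range(C)]

# Example: a 64-bit message selects one of the L = 256 neurons in each of
# the C = 8 clusters; the activated neurons together form a clique.
message = format(0x0123456789ABCDEF, f"0{K}b")
print(split_message(message))   # 8 neuron indices, one per cluster
```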


GBNN Model

● Learns by memorizing that the set of neurons that constitute the input message are connected to each other and form a clique
● Retrieves by detecting which neuron is the most "stimulated":
○ Scoring step using Eq. 1
○ Winner Takes All (WTA) step using Eq. 2 (see the retrieval sketch below)
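As a rough illustration of the two retrieval steps just mentioned (Eq. 1 scoring, Eq. 2 WTA), the following Python sketch performs one retrieval iteration over a binary weight array. The array layout and the exact score formula follow the standard GBNN formulation as commonly described; they are assumptions here, not code from the paper.

```python
import numpy as np

def retrieve_step(w, v):
    """One GBNN retrieval iteration (assumed formulation).
    w: (C, L, C, L) binary weights, w[c1, j1, c2, j2] == 1 iff neurons
       (c1, j1) and (c2, j2) are connected; v: (C, L) binary neuron values."""
    C, L = v.shape
    # Scoring step (Eq. 1): each neuron counts the clusters containing at
    # least one active neuron it is connected to.
    scores = np.zeros((C, L), dtype=int)
    for c in range(C):
        for j in range(L):
            contrib = w[:, :, c, j] & v                 # active connected neurons
            scores[c, j] = contrib.any(axis=1).sum()    # one point per cluster
    # Winner-Takes-All step (Eq. 2): in each cluster, keep only the neurons
    # with the maximum score.
    v_next = np.zeros_like(v)
    for c in range(C):
        v_next[c] = (scores[c] == scores[c].max()).astype(v.dtype)
    return v_next
```

In the fully binary model introduced later, this integer scoring and the per-cluster maximum comparison are replaced by purely logical operations.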


GBNN Model: HW Architecture

● Fully parallel HW implementation

● Modules:
○ Decoding
○ Learning (memory)
○ Computing
● Crossbar network dedicated to the exchange of neuron values between clusters


GBNN Model: HW Architecture

● Learning process:
○ Cluster receives a K-bit binary word
○ Decoding module splits the word into C subwords (one per cluster)
○ The local subword determines which neuron must be activated
■ The remaining subwords determine which neurons must be connected to the locally activated neuron
○ Memory is updated with the selected weights to store the clique (see the learning sketch after this list)
● Retrieval process:
○ Scoring step is processed
○ WTA step elects a neuron or group of neurons
○ Local neuron values are updated with the new information
○ The information is broadcast to all distant neurons
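A minimal sketch of the learning flow described above, assuming the same (C, L, C, L) weight layout as in the earlier retrieval sketch; the function name and data structures are illustrative, not the paper's hardware description.

```python
import numpy as np

def learn(w, active):
    """Store a clique: 'active' holds one neuron index per cluster (the
    decoded sub-messages); every pair of neurons from distinct clusters is
    connected by setting the corresponding binary synaptic weights."""
    C = len(active)
    for c1 in range(C):
        for c2 in range(C):
            if c1 != c2:
                w[c1, active[c1], c2, active[c2]] = 1
    return w

# Example: learn one 64-bit message in a toy network with C = 8, L = 256.
w = np.zeros((8, 256, 8, 256), dtype=np.uint8)
learn(w, [1, 35, 69, 103, 137, 171, 205, 239])
```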


GBNN Model: Discussion

● Advantage:
○ Strongly enhances the performance of associative memories compared to Hopfield networks
● Disadvantage:
○ Complex hardware architectures whose area and timing performance do not scale well
● Further optimizations:
○ Transformation into a fully binary model to simplify scoring and remove the WTA step (area reduction)
○ Memorize half of the synaptic weights to reduce the number of storage elements and the cost of the learning logic
○ Serialize communications (area reduction)
● Overall goal:
○ Ease the processing realized by the neurons
○ Optimize the hardware implementation
○ Keep the functionality and performance of the original model


Proposed Simplified Neural Network

The optimizations proposed include the following:
● Fully binary semantics vs. arithmetical-integer semantics
● Reduced memory complexity
● Serialized communications


Fully Binary Semantics

Replacing all arithmetical-integer computations with logical equations allows the winner-takes-all step to be removed while achieving the same performance as the enhanced GBNN model.


Unanimous Vote

● A neuron n_{i,j} is active in a given cluster if at least one active neuron in each other active cluster (the distant active neurons) indicates that it is connected with neuron n_{i,j}
● This changes how the decoding module works and enables removal of the WTA step: neuron values can now be computed with logical equations only (see the sketch below)
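Here is a small Python sketch of the unanimous-vote rule, using the same assumed (C, L, C, L) weight layout as before; note that no integer scores and no WTA comparison are needed, only AND/OR logic.

```python
import numpy as np

def binary_retrieve_step(w, v):
    """Unanimous vote (assumed formulation): neuron (c, j) becomes active
    only if every *other* active cluster contains at least one active
    neuron connected to it."""
    C, L = v.shape
    cluster_active = v.any(axis=1)                       # clusters that vote
    v_next = np.zeros_like(v)
    for c in range(C):
        for j in range(L):
            votes = (w[:, :, c, j] & v).any(axis=1)      # one vote per cluster
            others = [c2 for c2 in range(C) if c2 != c and cluster_active[c2]]
            v_next[c, j] = int(all(votes[c2] for c2 in others))
    return v_next
```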


Reduced Memory

Synaptic weights are stored to represent the connections between neurons in distant clusters. The original GBNN model computes and stores redundant information (each connection is stored in both directions), which can be optimized out to save space at no performance cost, as sketched below.
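A short sketch of the redundancy being removed, under the assumption (consistent with the triangular synaptic-weight variant V1.1 below) that connections are symmetric, so each weight only needs to be stored once.

```python
# Connections are symmetric: weight(a, b) == weight(b, a), so only the pair
# with the smaller index first is stored, roughly halving the storage elements.

def key(a, b):
    """Canonical key for an unordered pair of global neuron indices."""
    return (a, b) if a < b else (b, a)

weights = set()                       # stored synaptic weights ("upper triangle")

def connect(a, b):
    weights.add(key(a, b))

def connected(a, b):
    return key(a, b) in weights

connect(3, 42)
assert connected(42, 3)               # the reverse direction comes for free
```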


Serialized Communications

In the fully parallel GBNN design, a very large amount of wiring and logic is needed to connect every node to every other node. Serializing data transfers offers several benefits:

● Improve clock frequencies
● Reduce area significantly
● Lower power consumption


Serialization Implementation

Cluster Based:

● Clusters take turns broadcasting the values of all their neurons
● Takes C (number of clusters) cycles to complete

Neuron Based:

● Clusters concurrently broadcast the values of one of their neurons
● Takes L = N/C cycles to complete (see the comparison below)
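A quick back-of-the-envelope comparison of the two schemes' cycle counts, with parameter values assumed purely for illustration.

```python
# Illustrative parameters (assumed, not from the paper).
N, C = 2048, 8
L = N // C

cluster_based_cycles = C   # each cycle, one cluster broadcasts all L of its neuron values
neuron_based_cycles = L    # each cycle, every cluster broadcasts one neuron value

print(cluster_based_cycles, neuron_based_cycles)   # 8 vs. 256 cycles per exchange
```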

Serialization: Hardware Implementation

● Steering logic for the synaptic weights incurs a large multiplexer overhead

● Area cost of this design is high

Serialization: Hardware Implementation

● Flip-flop ring buffer logic
● Requires only one MUX instead of L-1 2:1 MUXes
● Can be used with either neuron-based or cluster-based serialization (see the sketch below)
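A behavioral Python sketch (illustrative only, not the paper's RTL) of why a flip-flop ring buffer needs so little steering logic: instead of selecting among L registers with a wide multiplexer, the registers are rotated by one position each cycle and the same fixed position is always read.

```python
from collections import deque

L = 4
weight_words = [0b1010, 0b0110, 0b0001, 0b1111]   # toy per-neuron weight words

# Mux-based selection: combinational choice among L entries each cycle.
mux_reads = [weight_words[t % L] for t in range(L)]

# Ring-buffer selection: shift the registers by one each cycle and always
# read the head; a single mux only chooses between "hold" and "shift".
ring = deque(weight_words)
ring_reads = []
for _ in range(L):
    ring_reads.append(ring[0])
    ring.rotate(-1)

assert mux_reads == ring_reads   # both schemes emit the same weight sequence
```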

Experiments

● Performance Analysis

● Complexity Analysis

● Hardware Synthesis Analysis
○ FPGA target: Stratix IV FPGA platform
○ ASIC target: Altera HardCopy platform


Architecture                                Label
Original GBNN model                         V0
Fully binary model                          V1.0
Binary + triangular synaptic weight model   V1.1
Binary + cluster-based serialization        V1.2
Binary + neuron-based serialization         V1.3

Experiments

The proposed architectures' retrieval performance matches that of the original GBNN architecture (the curves are superimposed).


Experiments

Controller resources decompose into decoding, memorizing, and computing tasks.

The fully binary model (V1.0) reduces the total area by 50% relative to V0.

V1.1 reduces architecture complexity to ⅓ of V0 (a 70% area reduction).

V1.2 and V1.3 reduce architecture complexity to ⅙ of V0 (an 83% area reduction).



Experiments - Area

The largest improvement is from the original V0 architecture to the triangular synaptic weight matrix (V1.1): a 50% area reduction for all configurations.


The largest improvement is from the original V0 architecture to V1.2/V1.3: 87% area savings.

Experiments

Look-up table (LUT) average area reductions range from 62% for V1.0 up to 86% for V1.2.

The larger the network, the more impactful the reductions!


Experiments - Clock Frequencies


Conclusion & Comments

● Full binary computation strongly reduces the cost of the computation module

● Memory reduction limits the cost of both the memory and the decoding modules

● Serialization optimizes the computation and the decoding modules

● Future work:
○ Further optimize the architectures for timing performance (not just area)
