An Analytical Method to Determine Minimum Per-Layer...


Transcript of An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks

Page 1:

An Analytical Method to Determine Minimum Per-Layer Precision of Deep

Neural Networks

Department of Electrical and Computer Engineering

University of Illinois at Urbana-Champaign

Charbel Sakr, Naresh Shanbhag

Code: https://github.com/charbel-sakr/Precision-Analysis-of-Neural-Networks

Also found on sakr2.web.engr.Illinois.edu

Page 2:

Machine Learning ASICs

How are they choosing these precisions? Why is it working? Can it be determined analytically?

• Eyeriss [Sze’16, ISSCC]: AlexNet accelerator, 16b fixed-point
• PuDianNao [Chen’15, ASPLOS]: ML accelerator, 16b fixed-point
• TPU [Google’17, ISCA]: TensorFlow accelerator, 8b fixed-point

Page 3:

Current Approaches

• Stochastic rounding during training [Gupta, ICML’15 – Hubara, NIPS’16]
→ Difficulty of training in a discrete space
• Trial-and-error approach [Sung, SiPS’14]
→ Exhaustive search is expensive
• SQNR-based precision allocation [Lin, ICML’16]
→ Lack of precision/accuracy understanding

[Figure: fixed-point quantization — no accuracy vs. precision understanding]

Page 4:

International Conference on Machine Learning (ICML) - 2017

Page 5:

Precision in Neural Networks

[Sakr et al., ICML’17]

Classification Output Quantization

The classification outputs of the floating-point network and the fixed-point network are compared.

𝑝𝑚: “mismatch probability” — the probability that the fixed-point network’s predicted label differs from the floating-point network’s
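Written out, with $\hat{Y}_{fl}$ and $\hat{Y}_{fx}$ denoting the labels predicted by the floating-point and fixed-point networks (notation assumed here, following Sakr et al., ICML’17), the mismatch probability is:

```latex
p_m = \Pr\left( \hat{Y}_{fx} \neq \hat{Y}_{fl} \right)
```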

Page 6:

Second Order Bound on 𝒑𝒎

[Sakr et al., ICML’17]

• Input/Weight precision trade-off:

→ Optimal precision allocation by balancing the sum

• Data dependence (compute once and reuse)

→ Derivatives obtained in last step of backprop

→ Only one forward-backward pass needed

[Equation labels from slide: activation quantization noise gain; weight quantization noise gain]
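The bound itself is not legible in this transcript; as a sketch of its structure (the symbols $\Delta_A$, $\Delta_W$ for the activation and weight quantization step sizes and $E_A$, $E_W$ for the two noise gains are assumed here), the second-order bound of Sakr et al. (ICML’17) takes the form

```latex
p_m \;\le\; \frac{\Delta_A^2}{24}\, E_A \;+\; \frac{\Delta_W^2}{24}\, E_W
```

Balancing the two summands against each other is what yields the input/weight precision trade-off mentioned above.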

Page 7:

Proof Sketch [Sakr et al., ICML’17]

• For one input, when do we have a mismatch?
→ The FL network predicts label “𝑗”
→ But the FX network predicts label “𝑖” where 𝑖 ≠ 𝑗 (output re-ordering due to quantization)
→ This happens with some probability, computed from the known quantization noise statistics (symmetry of quantization noise) and their variance
→ Applying Chebyshev’s inequality and the law of total probability (LTP) yields the result
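The equations on this slide are not legible in the transcript; the Chebyshev step can be sketched as follows (notation assumed). If $e_i$ denotes the zero-mean quantization perturbation of the $i$-th output $z_i$, a label flip from $j$ to $i$ requires the perturbation to overcome the floating-point margin, and Chebyshev’s inequality bounds that event:

```latex
\Pr\big(\hat{Y}_{fx} = i \mid \hat{Y}_{fl} = j\big)
\;\le\; \Pr\big(e_i - e_j \ge z_j - z_i\big)
\;\le\; \frac{\operatorname{Var}(e_i - e_j)}{(z_j - z_i)^2}
```

Summing over $i \neq j$ and averaging over inputs via the law of total probability gives the bound on $p_m$.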

Page 8:

Tighter Bound on 𝒑𝒎

𝑀: number of classes
𝑆: signal-to-quantization-noise ratio
𝑃1 & 𝑃2: correction factors

• Mismatch probability decreases double-exponentially with precision
→ Theoretically stronger than Theorem 1
→ Unfortunately, less practical to evaluate

[Sakr et al., ICML’17]

Page 9:

Per-layer Precision Assignment

• Per-layer precision allows for more aggressive complexity reductions
• Search space is huge: a 2L-dimensional grid (one activation precision and one weight precision per layer)
• Example: 5-layer network, precisions considered up to 16 bits → 16^10 ≈ 10^12 (one trillion) design points
• We need an analytical way to reduce the search space → maybe using the analysis of Sakr et al.

[Figure: network from Layer 1 (input features) through Layer L, showing per-layer activations and weights]

Key idea: equalization of reflected quantization noise variances
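The scale of that grid is easy to check with a little arithmetic (a sketch, not from the slides):

```python
# Per-layer precision search: one activation precision and one weight
# precision per layer -> a 2L-dimensional grid of candidate assignments.
L = 5                      # layers in the example network
max_bits = 16              # precisions considered: 1..16 bits

grid_points = max_bits ** (2 * L)
print(grid_points)         # 16**10 = 1_099_511_627_776 (~10^12)

# The equalization method collapses this to a 1-D sweep over a single
# reference precision, i.e. only max_bits candidate operating points:
sweep_points = max_bits
print(grid_points // sweep_points)  # 2**36 = 68_719_476_736x fewer points
```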

Page 10:

Fine-grained Precision Analysis

[Sakr et al., ICML’17]

[Sakr et al., ICASSP’18]

per-layer quantization noise gains
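The fine-grained bound is not legible in this transcript; following the structure of the ICML’17 bound, the per-layer version of Sakr et al. (ICASSP’18) takes the form below, where $\Delta_{A_\ell}$, $\Delta_{W_\ell}$ are the per-layer step sizes and $E_{A_\ell}$, $E_{W_\ell}$ the per-layer quantization noise gains (notation assumed here):

```latex
p_m \;\le\; \sum_{\ell=1}^{L} \frac{\Delta_{A_\ell}^2}{24}\, E_{A_\ell}
     \;+\; \sum_{\ell=1}^{L} \frac{\Delta_{W_\ell}^2}{24}\, E_{W_\ell}
```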

Page 11:

Per-layer Precision Assignment Method

• Key idea: equalization of quantization noise variances
• Search space is reduced to a one-dimensional axis of reference (minimum) precision
• Step 1: compute the least quantization noise gain
• Step 2: select each layer’s precision so as to equalize all quantization noise variances
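The two steps above can be sketched in code. This assumes a quantization step $\Delta = 2^{-(B-1)}$, so each extra bit divides $\Delta^2$ by 4; the function name and the sample noise gains are hypothetical, not from the paper:

```python
import math

def assign_precisions(noise_gains, b_min):
    """Equalization rule (sketch): give the least-sensitive layer
    (smallest quantization noise gain E) the reference precision b_min,
    and add bits to the other layers until all reflected noise variances
    Delta_l**2 * E_l are (approximately) equal:
        B_l = b_min + ceil(0.5 * log2(E_l / E_min))
    """
    e_min = min(noise_gains)                    # Step 1: least noise gain
    return [b_min + math.ceil(0.5 * math.log2(e / e_min))
            for e in noise_gains]               # Step 2: equalize variances

# Hypothetical per-layer noise gains (larger = more sensitive layer):
gains = [64.0, 16.0, 4.0, 1.0]
print(assign_precisions(gains, b_min=4))        # [7, 6, 5, 4]
```

Note how precision falls with depth when the noise gains do, matching the profile observed on the CIFAR-10 slide.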

Page 12:

Comparison with Related Works

• Simplified but meaningful model of complexity
→ Computational cost: total number of full adders (FAs) used, assuming folded MACs with bit growth allowed
→ Number of MACs is equal to the number of dot products computed
→ Number of FAs per MAC:
→ Representational cost: total number of bits needed to represent weights and activations — a high-level measure of area and communication cost (data movement)
• Other works considered
→ Stochastic quantization (SQ) [Gupta’15, ICML]
784 – 1000 – 1000 – 10 (MNIST)
64C5 – MP2 – 64C5 – MP2 – 64FC – 10 (CIFAR-10)
→ BinaryNet (BN) [Hubara’16, NIPS]
784 – 2048 – 2048 – 2048 – 10 (MNIST) & VGG (CIFAR-10)

Page 13:

Precision Profiles – CIFAR-10

VGG-11 ConvNet on CIFAR-10: 32C3-32C3-MP2-64C3-64C3-MP2-128C3-128C3-256FC-256FC-10
Precision chosen such that 𝑝𝑚 ≤ 1%

• Weight precision requirements are greater [Sakr et al., ICML’17]
• Precision decreases with depth – early layers are more sensitive to perturbations [Raghu et al., ICML’17]

Page 14:

Precision – Accuracy Trade-offs

• Precision reduction is greatest when using the proposed fine-grained precision assignment

Page 15:

Precision – Accuracy – Complexity Trade-offs

• Best complexity reduction is obtained with the proposed precision assignment
• Complexity is even much smaller than a BinaryNet’s at the same accuracy, because the BinaryNet requires a much more complex network

Page 16:

Conclusion & Future Work

• Presented an analytical method to determine per-layer precision of neural networks
• Method based on equalization of reflected quantization noise variances
• Per-layer precision assignment reveals interesting properties of neural networks
• Method leads to significant overall complexity reduction, e.g., reduction of minimum precision to 2 bits and lower implementation cost than BinaryNets
• Future work:
→ Trade-offs between precision vs. depth vs. width
→ Trade-offs between precision and pruning

Page 17:

Thank you!

Acknowledgment: This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.