First Place Memocode'14 Design Contest Entry
A High Performance Systolic Architecture for k-NN Classification
Kevin Townsend, Philip Jones, Joseph Zambreno
Reconfigurable Computing Laboratory, Iowa State University
MEMOCODE’14
Townsend (RCL@ISU) k-NN on FPGA MEMOCODE’14 1 / 11
Outline
1 The Competition
2 Our Approach
3 Hardware Design: Platform, Systolic Array, Processing Element, Dot Product, Sort
4 Results
The Competition
Problem Statement
k Nearest Neighbors (k-NN)
32-dimensional space, i.e. vectors of 32 elements
1,000 (M) test vectors
10,000,000 (N) training vectors
Values are 12 bits
Mahalanobis distance $\sqrt{(x - y)^{t} S^{-1} (x - y)}$ vs. Euclidean distance $\sqrt{(x - y)^{t} (x - y)}$, where x is a training vector and y is a testing vector.
Better results for some problems
1024 multiplications vs. 32
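As a point of reference for the problem statement, here is a minimal brute-force sketch in plain Python. It uses the squared forms of the two distances (dropping the square root does not change which neighbors are nearest). The function names, the tiny 2-dimensional data and the example matrix `s_inv` are illustrative assumptions, not part of the contest material.

```python
def euclidean_sq(x, y):
    """Squared Euclidean distance: one multiplication per dimension."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def mahalanobis_sq(x, y, s_inv):
    """Squared Mahalanobis distance (x - y)^t S^-1 (x - y).

    The matrix-vector product costs d * d multiplications,
    which is 1024 when d = 32.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    s_inv_d = [sum(row[j] * d[j] for j in range(len(d))) for row in s_inv]
    return sum(di * vi for di, vi in zip(d, s_inv_d))


def knn(train, test_vec, k, s_inv):
    """Indices of the k training vectors closest to test_vec."""
    scored = sorted(
        (mahalanobis_sq(x, test_vec, s_inv), i) for i, x in enumerate(train)
    )
    return [i for _, i in scored[:k]]


# Hypothetical toy data: S^-1 weights the second dimension four times as much.
s_inv = [[1, 0], [0, 4]]
train = [[0, 0], [3, 0], [0, 2], [5, 5]]
nearest = knn(train, [0, 0], k=2, s_inv=s_inv)
```

With this toy `s_inv`, the training vector `[0, 2]` is closer than `[3, 0]` under the Euclidean distance but farther under the Mahalanobis distance, which is the kind of difference the slide alludes to.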
Our Approach
Optimizations
We choose a brute-force solution: all 10,000,000,000 (M × N) products are computed.
The squared distance $(x - y)^{t} S^{-1} (x - y)$ is used instead of the distance because $\sqrt{\cdot}$ is an increasing function, so the ordering of the neighbors is unchanged.
Rewriting it as $(x - y)^{t} (S^{-1}x - S^{-1}y)$ reduces the computation from 1024 multiplications to 32 multiplications.
$S^{-1}x$ and $S^{-1}y$ can be calculated ahead of time (only 10,001,000 matrix-vector multiplications).
This results in approximately 1.3 trillion integer operations required.
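The rewriting step can be checked numerically. The sketch below, in plain Python, compares the direct form with the factored form on a small example; the helper names and the 2 × 2 matrix are assumptions chosen for illustration only.

```python
def mat_vec(m, v):
    """Matrix-vector product, d * d multiplications."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def direct(x, y, s_inv):
    """(x - y)^t S^-1 (x - y), with the matrix product done per pair."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return sum(di * wi for di, wi in zip(d, mat_vec(s_inv, d)))


def factored(x, y, sx, sy):
    """(x - y)^t (S^-1 x - S^-1 y), with sx = S^-1 x and sy = S^-1 y
    supplied by the caller: only d multiplications per pair."""
    return sum(
        (xi - yi) * (sxi - syi) for xi, yi, sxi, syi in zip(x, y, sx, sy)
    )


# Hypothetical example values.
s_inv = [[2, 1], [1, 3]]
x, y = [4, 1], [1, 5]

# Done once per vector, ahead of time: N + M matrix-vector products in total.
sx, sy = mat_vec(s_inv, x), mat_vec(s_inv, y)
precomputed_products = 10_000_000 + 1_000

same = direct(x, y, s_inv) == factored(x, y, sx, sy)
```

The two forms agree because the matrix-vector product is linear, so $S^{-1}(x - y) = S^{-1}x - S^{-1}y$; the expensive part then moves out of the M × N inner loop.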
Our Approach
High level approach
[Figure: data flow between Host and Coprocessor. Labels: trainA, trainB, testA, testB, Mahalanobis Product, k-NN, ret; buffer sizes 0.6GB, 1.3GB, 64KB, 128KB, 256KB; timing markers "start time" and "end time".]
Hardware Design Platform
The Convey Platform
[Figure: the Convey platform, drawn as an array of kNN PE blocks attached to Memory Controllers 1 through 8.]
Design a k-NN processing element (PE) with one floating point multiply-accumulator (MAC).
Duplicate the PE block as many times as possible.
Give each PE access to memory.
Hardware Design Systolic Array
Systolic Arrays
[Figure: k-NN PEs chained one after another, with streams labeled testA, testB, trainA, trainB and ret.]
Solves routing problem
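To illustrate why a chain eases routing, the following is a small software model of a systolic chain in plain Python: each PE is wired only to its neighbor, and every item reaches PE p exactly p cycles after it reaches PE 0. The function name and the one-register-per-PE timing are assumptions made for this sketch; the slides do not give the actual interconnect details.

```python
def run_chain(stream, num_pes):
    """Shift `stream` through a chain of PEs, one hop per cycle.

    Returns, for each PE, the list of (cycle, item) pairs it observed.
    """
    regs = [None] * num_pes              # one pipeline register per PE
    seen = [[] for _ in range(num_pes)]
    # Feed the stream, then enough empty cycles to drain the chain.
    inputs = list(stream) + [None] * num_pes
    for cycle, item in enumerate(inputs):
        # Each PE takes its left neighbor's previous value; PE 0 takes the input.
        regs = [item] + regs[:-1]
        for pe, value in enumerate(regs):
            if value is not None:
                seen[pe].append((cycle, value))
    return seen


observed = run_chain(["t0", "t1", "t2"], num_pes=4)
```

Because every connection is local, adding more PEs lengthens the chain without adding long wires or a wide fan-out from the memory interface.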
Hardware Design Processing Element
Single Processing Element
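Ports: Data in and Data out (bus width marked as 192), Opcode in, Index in, Opcode out, Index out
Buffer: ≈1536 registers
TrainBuffer: ≈1536 registers, ≈768 LUTs
TestCache: 660 registers, 560 LUTs
Product: 8704 registers, 6806 LUTs, 20 DSPs
Sort: 316 registers, 388 LUTs, 7 BlockRAMs
[Figure: block diagram of the kNN PE containing the blocks listed above.]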
Hardware Design Dot Product
Dot Product Pipeline
31, 12-bit subtracters
31, 24-bit subtracters
32, 13x25-bit multipliers
31, 45-bit adder tree
≈ 128 integer operators
150 MHz, 128 processing elements
2.4 trillion operations per second (150 MHz × 128 processing elements × ≈128 operators)
[Figure: pipeline taking testA, testB, trainA and trainB through two Vector Subtracters, a Vector Multiplier and an Adder Tree to produce product.]
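The stages of the pipeline on this slide can be mimicked in software. The sketch below is plain Python and models only the data flow, not the bit widths or the clocked pipelining. It assumes that the A inputs are the original vectors and the B inputs are the vectors premultiplied by $S^{-1}$, which the slides suggest but do not state outright; the function names are hypothetical.

```python
def adder_tree(values):
    """Sum pairwise, level by level; n inputs need n - 1 adders."""
    adders = 0
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        adders += len(paired)
        if len(values) % 2:              # odd element passes to the next level
            paired.append(values[-1])
        values = paired
    return values[0], adders


def dot_product_stage(test_a, test_b, train_a, train_b):
    """One pass through subtract, subtract, multiply, adder tree."""
    diff_a = [ta - xa for ta, xa in zip(test_a, train_a)]    # first vector subtracter
    diff_b = [tb - xb for tb, xb in zip(test_b, train_b)]    # second vector subtracter
    products = [da * db for da, db in zip(diff_a, diff_b)]   # vector multiplier
    return adder_tree(products)                              # (sum, adders used)


# Hypothetical 32-element inputs.
total, adders_used = dot_product_stage(
    list(range(32)), [2] * 32, [0] * 32, [1] * 32
)
```

For 32 elements the tree uses 31 adders, matching the adder-tree figure on the slide; together with the subtracters and multipliers this gives the roughly 128 operators per processing element.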
Hardware Design Sort
Sort
[Figure, animated over eight frames: a Counter and the incoming product feed a Bouncer (entries B0 to B3, with B1 = 100), followed by an Inserter and a RAM holding the sorted values V0 to V3 = 7, 19, 42, 68, with an output labeled out. A new product, 13, passes the Bouncer and is inserted in order, giving 7, 13, 19, 42; the displaced value 68 is pushed out, and in the last frame the Bouncer entry reads B1 = 68.]
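One software reading of the Sort animation is a bounded sorted list guarded by a threshold. The sketch below, in plain Python, follows the frames: a product is rejected ("bounced") if it is not below the threshold, otherwise it is inserted in order, and the value pushed off the end becomes the new threshold. The class and method names are assumptions, and the hardware surely differs in detail.

```python
import bisect


class TopKSorter:
    """Keeps the k smallest products seen so far, in ascending order."""

    def __init__(self, k, initial_threshold):
        self.k = k
        self.threshold = initial_threshold   # the Bouncer entry
        self.values = []                     # the sorted RAM contents

    def offer(self, product):
        """Return True if the product passed the Bouncer."""
        if product >= self.threshold:
            return False                     # bounced, list untouched
        bisect.insort(self.values, product)  # the Inserter
        if len(self.values) > self.k:
            # The largest value falls off the end and tightens the threshold.
            self.threshold = self.values.pop()
        return True


# Replay the values shown in the frames.
sorter = TopKSorter(k=4, initial_threshold=100)
for value in (7, 19, 42, 68):
    sorter.offer(value)
passed = sorter.offer(13)
```

The early rejection is what keeps the sorter cheap: once the list is full, most products fail the comparison and never reach the insertion logic.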
Results
Results
1.3 trillion integer operations / 2.4 trillion integer operations per second = 0.54 seconds.
Actual runtime is 0.54 seconds.
Paper at: http://www.rcl.ece.iastate.edu/sites/default/files/papers/TowJon14A.pdf
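The runtime estimate can be reproduced from numbers given earlier in the slides. The following Python sketch assumes 128 integer operations per product (the slides say approximately 128); the constant names are chosen here for illustration.

```python
M = 1_000                    # test vectors
N = 10_000_000               # training vectors
OPS_PER_PRODUCT = 128        # assumed: the "≈ 128 integer operators"
CLOCK_HZ = 150_000_000       # 150 MHz
NUM_PES = 128                # processing elements

total_ops = M * N * OPS_PER_PRODUCT                     # about 1.3 trillion
ops_per_second = CLOCK_HZ * NUM_PES * OPS_PER_PRODUCT   # about 2.4 trillion
estimated_seconds = total_ops / ops_per_second          # about 0.52
```

With unrounded inputs the estimate comes to roughly 0.52 seconds; dividing the rounded figures 1.3 by 2.4 gives the 0.54 seconds quoted on the slide.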