Post on 19-Dec-2015
Squaring Function
Zehavit Trachtenberg
Ido Dinerman
Barak Cohen
Squaring Function
The squaring function is used in many applications, such as the Viterbi algorithm (error-correcting codes), the VQ algorithm (image data compression, speech and writing recognition), and Euclidean squared-distance estimation.
A fast implementation is needed for real-time (RT) purposes. Two implementations are explored in this work:
1. Digital implementation: the compensating algorithm by Ming-Hwa Sheu and Su-Hon Lin.
2. Analog implementation.
Project Goals
• Function implementation: analog (SPICE)
• Function implementation: digital (VHDL)
• Implementation of the function for practical use (Pythagoras' theorem)
DIGITAL
Digital Implementation
The digital implementation is based on the approximate squaring function.
Input: n-bit binary data
$A = \sum_{i=0}^{n-1} 2^i a_i$
Output: 2n-bit binary number
$R = \sum_{m=0}^{2n-1} 2^m r^n_m \approx A^2$
Algorithm
The output expression of the exact squaring function (for a 4-bit number) is:
$A^2 = (a_3 a_2 a_1 a_0)^2 = 2^6(a_3 + a_3 a_2) + 2^5 a_3 a_1 + 2^4(a_2 + a_3 a_0 + a_2 a_1) + 2^3 a_2 a_0 + 2^2(a_1 a_0 + a_1) + 2^0 a_0$
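The expansion above can be checked exhaustively for all sixteen 4-bit inputs. The following is a minimal sketch; the helper names `bits4` and `square_expanded` are illustrative and do not come from the slides.

```python
def bits4(a):
    """Return (a3, a2, a1, a0), the bits of a 4-bit integer, MSB first."""
    return (a >> 3) & 1, (a >> 2) & 1, (a >> 1) & 1, a & 1


def square_expanded(a):
    """Evaluate the term-by-term expansion of A^2 for a 4-bit A."""
    a3, a2, a1, a0 = bits4(a)
    return (2**6 * (a3 + a3 * a2) + 2**5 * a3 * a1
            + 2**4 * (a2 + a3 * a0 + a2 * a1) + 2**3 * a2 * a0
            + 2**2 * (a1 * a0 + a1) + 2**0 * a0)


# Inputs for which the expansion disagrees with ordinary squaring.
mismatches = [a for a in range(16) if square_expanded(a) != a * a]
print(mismatches)  # []
```

The pure terms carry weight 2^(2i) and each composite term a_i a_j carries weight 2^(i+j+1), which is where the grouping by powers of two comes from.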
Algorithm cont.
Step 1: The approximate result R is set equal to the pure terms:
$R = 2^6 a_3 + 2^4 a_2 + 2^2 a_1 + 2^0 a_0$
Step 2: Select the closest composite terms for compensation ($2^6 a_3 a_2$, $2^4 a_2 a_1$, $2^2 a_1 a_0$):
$R = 2^6(a_3 + a_3 a_2) + 2^4(a_2 + a_2 a_1) + 2^2(a_1 + a_1 a_0) + 2^0 a_0$
$\quad = 2^7 a_3 a_2 + 2^6 a_3 \bar{a}_2 + 2^5 a_2 a_1 + 2^4 a_2 \bar{a}_1 + 2^3 a_1 a_0 + 2^2 a_1 \bar{a}_0 + 2^0 a_0$
(An overbar denotes the complemented bit.)
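The second line of step 2 rests on a one-bit identity, spelled out here as a worked step. The overbar for the complemented bit is assumed notation, since the extracted text carries no visible complement marks.

```latex
% For bits a, b in {0,1}:  a + ab = a(1 + b)
%   b = 1:  a + ab = 2a = 2ab
%   b = 0:  a + ab = a  = a\bar{b}
\[
  a + ab = 2ab + a\bar{b}
  \quad\Longrightarrow\quad
  2^{6}(a_3 + a_3 a_2) = 2^{7} a_3 a_2 + 2^{6} a_3 \bar{a}_2 .
\]
```

The same identity applied to the pairs $(a_2, a_1)$ and $(a_1, a_0)$ gives the remaining terms of the line.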
Algorithm cont.
Step 3: Choose the second-closest composite terms for compensation ($2^5 a_3 a_1$, $2^3 a_2 a_0$) and proceed as in step 2. The result is:
$R = 2^7 a_3 a_2 + 2^6 a_3(\bar{a}_2 \cup a_1) + 2^5 (a_3 \oplus a_2) a_1 + 2^4 a_2(\bar{a}_1 \cup a_0) + 2^3 (a_2 \oplus a_1) a_0 + 2^2 a_1 \bar{a}_0 + 2^0 a_0$
(Here $\cup$ is logical OR, $\oplus$ is exclusive OR, and an overbar is the complement.)
Algorithm cont.
Step 4: The approximation. Add the remaining term ($2^4 a_3 a_0$) to the sum with the OR operator:
$R = 2^7 a_3 a_2 + 2^6 a_3(\bar{a}_2 \cup a_1) + 2^5 (a_3 \oplus a_2) a_1 + 2^4 [a_2(\bar{a}_1 \cup a_0) \cup a_3 a_0] + 2^3 (a_2 \oplus a_1) a_0 + 2^2 a_1 \bar{a}_0 + 2^0 a_0$
Algorithm result
$r^4_7 = a_3 a_2$
$r^4_6 = a_3 \bar{a}_2 \cup a_3 a_1$
$r^4_5 = a_3 \bar{a}_2 a_1 \cup \bar{a}_3 a_2 a_1$
$r^4_4 = a_2 \bar{a}_1 \cup a_2 a_0 \cup a_3 a_0$
$r^4_3 = a_2 \bar{a}_1 a_0 \cup \bar{a}_2 a_1 a_0$
$r^4_2 = a_1 \bar{a}_0$
$r^4_1 = 0$
$r^4_0 = a_0$
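The eight output-bit equations can be evaluated directly in software. This is a minimal sketch, assuming the complemented literals that the extracted slide text appears to have lost; the function name `approx_square4` is illustrative, not from the slides.

```python
def approx_square4(a):
    """Approximate square of a 4-bit integer from the per-bit equations."""
    a3, a2, a1, a0 = (a >> 3) & 1, (a >> 2) & 1, (a >> 1) & 1, a & 1
    n3, n2, n1, n0 = 1 - a3, 1 - a2, 1 - a1, 1 - a0  # complemented bits (assumed)
    r = [0] * 8
    r[7] = a3 & a2
    r[6] = (a3 & n2) | (a3 & a1)
    r[5] = (a3 & n2 & a1) | (n3 & a2 & a1)
    r[4] = (a2 & n1) | (a2 & a0) | (a3 & a0)
    r[3] = (a2 & n1 & a0) | (n2 & a1 & a0)
    r[2] = a1 & n0
    r[1] = 0
    r[0] = a0
    # Weight each output bit by its power of two.
    return sum(bit << m for m, bit in enumerate(r))


# Inputs whose approximate square differs from the exact one.
wrong = {a: approx_square4(a) for a in range(16) if approx_square4(a) != a * a}
print(wrong)  # {13: 153, 15: 209}
```

Only AND, OR and inverted inputs are needed, with no adders or carry chain. Under this reading, the only inexact inputs are 13 and 15, the same two cases reported in the VHDL simulation slides.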
Algorithm cont.
By induction (with $i = 2n$, the number of output bits):
$r^n_{i-1} = a_{n-1} a_{n-2}$
$r^n_{i-2} = a_{n-1} \bar{a}_{n-2} \cup a_{n-1} a_{n-3}$
$r^n_{i-3} = \bar{a}_{n-1}\, r^{n-1}_{i-3} \cup a_{n-1} \bar{a}_{n-2} a_{n-3}$
$r^n_{i-4} = r^{n-1}_{i-4} \cup a_{n-1} a_{n-4}$
$\vdots$
$r^n_{i-n} = r^{n-1}_{i-n} \cup a_{n-1} a_0$
$r^n_{i-n-1} = r^{n-1}_{i-n-1}$
$\vdots$
$r^n_0 = r^{n-1}_0$
Algorithm error
Error = $(A^2 - R)/A^2 \times 100\%$
The error increases with the length of the number: it reaches 9.47% for 4 bits and 18.19% for 10 bits.
Average error: 1.04% for 4 bits and 4.21% for 10 bits.
[Figure: error plot (vertical axis 0 to 100, horizontal axis 1 to 31); series labelled "4 bit" and "16 bit"]
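The 4-bit error figures can be reproduced from the error definition. This is a sketch under two assumptions: the per-bit equations use the complemented literals that the extraction seems to have dropped, and the average is taken over all 16 inputs, with A = 0 counted as zero error.

```python
def approx_square4(a):
    """4-bit approximate square; bits listed from r0 (LSB) up to r7 (MSB)."""
    a3, a2, a1, a0 = (a >> 3) & 1, (a >> 2) & 1, (a >> 1) & 1, a & 1
    n3, n2, n1, n0 = 1 - a3, 1 - a2, 1 - a1, 1 - a0  # complemented bits (assumed)
    bits = [
        a0,
        0,
        a1 & n0,
        (a2 & n1 & a0) | (n2 & a1 & a0),
        (a2 & n1) | (a2 & a0) | (a3 & a0),
        (a3 & n2 & a1) | (n3 & a2 & a1),
        (a3 & n2) | (a3 & a1),
        a3 & a2,
    ]
    return sum(bit << m for m, bit in enumerate(bits))


def error_percent(a):
    """Relative error (A^2 - R) / A^2 * 100; defined as 0 for A = 0."""
    exact = a * a
    return 0.0 if exact == 0 else (exact - approx_square4(a)) / exact * 100


errors = [error_percent(a) for a in range(16)]
print(round(max(errors), 2), round(sum(errors) / len(errors), 2))  # 9.47 1.04
```

The largest error occurs at A = 13, where the approximation gives 153 instead of 169.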
Implementation
• VHDL simulation of the function.
• Implementation at the transistor level using CMOS transistors.
• Place and route for the circuit.
• Size, power and speed analysis.
SPICE Implementation
SPICE Simulation
Simulation Results
[Figure: simulation waveforms; INPUT and OUTPUT traces, each labelled from LSB to MSB]
Propagation Delay
Propagation delay:
• NAND2: Tpd = 5.1 ns
• NAND3: Tpd = 6.2 ns
• NOR2: Tpd = 4.4 ns
• Buffer: Tpd = 6.6 ns
The critical path (the one for R2 or R7) consists of a NOR2 gate and a buffer, thus Tpd = 4.4 ns + 6.6 ns = 11 ns.
Layout (Logic)
Layout (chip)
[Figure: chip layout with pads Vcc, Gnd, in1 to in4, and out1 to out8]
LVS Result
VHDL Implementation
The digital model was implemented using a VHDL structural architecture similar to the SPICE implementation.
Propagation delay times were calculated using simulations of the scmos library logic gates.
VHDL simulation results
VHDL simulation results cont.
The errors in the algorithm occurred at a = 13 and a = 15.
Expected results:
• for a = 13: r = 169, simulation result = 153
• for a = 15: r = 225, simulation result = 209
Error Correction
Different methods for producing an error-free digital squaring function:
• Implementing the shown algorithm without using approximation.
• Correcting the 2 erroneous outputs using an Error Correction Unit.
Straightforward Calculation
$r^4_7 = a_3 a_2 \oplus [a_3(\bar{a}_2 \cup a_1)]\{[(a_3 \oplus a_2)a_1][a_2(\bar{a}_1 \cup a_0)a_3 a_0]\}$
$r^4_6 = [a_3(\bar{a}_2 \cup a_1)] \oplus \{[(a_3 \oplus a_2)a_1][a_2(\bar{a}_1 \cup a_0)a_3 a_0]\}$
$r^4_5 = [(a_3 \oplus a_2)a_1] \oplus [a_2(\bar{a}_1 \cup a_0)a_3 a_0]$
$r^4_4 = a_2(\bar{a}_1 \cup a_0) \oplus a_3 a_0$
$r^4_3 = a_1 a_0 \oplus a_2 a_0$
$r^4_2 = a_1 \bar{a}_0$
$r^4_1 = 0$
$r^4_0 = a_0$
(Here $\oplus$ is exclusive OR, $\cup$ is logical OR, and an overbar is the complement. Each bit from $r^4_4$ upward is the corresponding compensated term combined with the carry from the bit below.)
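One way to read the straightforward equations is as the compensated terms chained through half adders, so that each carry ripples to the next bit. The sketch below follows that reading. The placement of the complements and the choice between OR and exclusive OR are assumptions reconstructed from the arithmetic, and the names `exact_square4`, `t4` to `t6` and `c4` to `c6` are illustrative only.

```python
def exact_square4(a):
    """Exact square of a 4-bit integer from bit-level equations."""
    a3, a2, a1, a0 = (a >> 3) & 1, (a >> 2) & 1, (a >> 1) & 1, a & 1
    n2, n1, n0 = 1 - a2, 1 - a1, 1 - a0  # complemented bits (assumed)

    t6 = a3 & (n2 | a1)      # compensated term of weight 2^6
    t5 = (a3 ^ a2) & a1      # compensated term of weight 2^5
    t4 = a2 & (n1 | a0)      # compensated term of weight 2^4

    c4 = t4 & a3 & a0        # carry out of bit 4
    c5 = t5 & c4             # carry out of bit 5
    c6 = t6 & c5             # carry out of bit 6

    r = [
        a0,                      # r0
        0,                       # r1
        a1 & n0,                 # r2
        (a1 & a0) ^ (a2 & a0),   # r3
        t4 ^ (a3 & a0),          # r4
        t5 ^ c4,                 # r5
        t6 ^ c5,                 # r6
        (a3 & a2) ^ c6,          # r7
    ]
    return sum(bit << m for m, bit in enumerate(r))


all_exact = all(exact_square4(a) == a * a for a in range(16))
print(all_exact)  # True
```

The carry `c4` is nonzero only when a3 = a2 = a0 = 1, that is for inputs 13 and 15, which are the two inputs the approximate version gets wrong.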
Straightforward Algorithm cont.
Estimated number of transistors for each output bit:
• $r^4_7$ = 52
• $r^4_6$ = 44
• $r^4_5$ = 32
• $r^4_4$ = 18
• $r^4_3$ = 6
• $r^4_2$ = 4
• $r^4_1$ = 0
• $r^4_0$ = 0
Input inverters = 8
Buffers ~36
Total number of transistors ~200
Important: the calculated number does not include the transistors in the buffers.
Straightforward Algorithm cont.
Calculating the propagation delay:
• Number of levels in the longest path (R7) = 5
• Pd for a NAND2 gate = 5.1 ns
• Pd for an input inverter = 3 ns
• Total Pd (worst case) = 5 × 5.1 ns + 3 ns = 28.5 ns
Error Correction Unit
Designing an error correction unit for squaring a 4-bit number.
The following implementation deals with 2 errors:
• Err1: the approximation gives 13² = 153; the correct result is 13² = 169
• Err2: the approximation gives 15² = 209; the correct result is 15² = 225
Error Correction Unit
[Figure: block diagram of the unit. INPUT: approximated result; OUTPUT: correct result]
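Behaviourally, such a unit only has to recognise the two erroneous output patterns and replace them. The sketch below models that idea in software. It is not the gate-level design from the slides, and the names `CORRECTIONS` and `correct` are hypothetical.

```python
# Approximated outputs for a = 0..15: exact everywhere except the two
# erroneous cases reported in the slides (13 -> 153, 15 -> 209).
approximated = [a * a for a in range(16)]
approximated[13] = 153
approximated[15] = 209

# Map each erroneous output pattern to the correct square.
CORRECTIONS = {153: 169, 209: 225}


def correct(r):
    """Pass the approximated result through, fixing the two known errors."""
    return CORRECTIONS.get(r, r)


corrected = [correct(r) for r in approximated]
print(corrected == [a * a for a in range(16)])  # True
```

Matching on the output alone is safe here because neither 153 nor 209 is a perfect square, so no correct result is ever altered.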
Simulation Configuration
Error Correction Output
[Figure: simulation waveforms; INPUT and OUTPUT traces labelled from LSB to MSB, showing the correct output]
Error Correction: Pros & Cons
Pros:
• It's correct!
Cons:
• Area usage
• Resources and cost: 120 transistors in the correction unit.
• Propagation: requires synchronization in order to avoid hazards (at the squaring function output), with a considerable increase in propagation delay.
• Not a generic solution
Comparing Error Correction Methods

| | Compensated implementation | Error Correction Unit | Straightforward implementation |
|---|---|---|---|
| # of transistors | 104 | 104 + 120 | ~200 |
| Pd (worst case) | 13.2 ns | Extra synch. unit required | 28.5 ns |
| Power | 3.02×10⁻¹⁶·f | 3.02×10⁻¹⁶·f + 3.98×10⁻¹⁶·f | ~6×10⁻¹⁶·f |
| Area | 1.72×10⁻⁸ m² | 1.72×10⁻⁸ m² + 1.7×10⁻⁸ m² | ~2.45×10⁻⁸ m² |
ANALOG
Analog implementation
Consider the following arrangement of CMOS transistors, with M1 and M2 in saturation.
The equation of a transistor in saturation is:
$I_d = K(V_{gs} - V_t)^2$
In our circuit:
$I_1 = K(V_a - V_t)^2$
$I_2 = K(V_b - V_t)^2$
$V_b = V_2 - V_a$
Analog implementation cont.
Combining the three equations, we obtain the following.
Difference of the output currents:
$I_1 - I_2 = K(V_2 - 2V_t)(V_a - V_b)$
Sum of the output currents:
$I_1 + I_2 = \tfrac{1}{2}K(V_2 - 2V_t)^2 + \dfrac{(I_1 - I_2)^2}{2K(V_2 - 2V_t)^2}$
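The intermediate algebra is not shown on the slide. A worked derivation, using only the three circuit equations and $V_a + V_b = V_2$, might read as follows.

```latex
\begin{align*}
I_1 - I_2 &= K\left[(V_a - V_t)^2 - (V_b - V_t)^2\right]
           = K (V_a + V_b - 2V_t)(V_a - V_b)
           = K (V_2 - 2V_t)(V_a - V_b), \\
I_1 + I_2 &= K\left[(V_a - V_t)^2 + (V_b - V_t)^2\right]
           = \tfrac{1}{2}K\left[(V_a + V_b - 2V_t)^2 + (V_a - V_b)^2\right] \\
          &= \tfrac{1}{2}K (V_2 - 2V_t)^2 + \frac{(I_1 - I_2)^2}{2K (V_2 - 2V_t)^2}.
\end{align*}
```

The second line uses $x^2 + y^2 = \tfrac{1}{2}\left[(x+y)^2 + (x-y)^2\right]$, and the last step substitutes $V_a - V_b = (I_1 - I_2)/\left[K(V_2 - 2V_t)\right]$ from the difference equation.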
Analog implementation cont.
In order to provide a stable $V_2$ voltage source we use a current-controlled circuit:
$I_0 = \tfrac{1}{4}K(V_2 - 2V_t)^2$
Analog implementation cont.
By connecting the drain and the source of M1 we get our new circuit. Our previous equations still hold. Considering $I_{in}$ as an input, we get:
$I_{in} = I_1 - I_2$
Analog implementation cont.
We copy $I_1$ using a current mirror, hence we get:
$I_{out} = I_1 + I_2$
Now we substitute $I_{in}$ and $I_{out}$ into our previous result:
$I_1 + I_2 = \tfrac{1}{2}K(V_2 - 2V_t)^2 + \dfrac{(I_1 - I_2)^2}{2K(V_2 - 2V_t)^2}$
We get:
$I_{out} = \tfrac{1}{2}K(V_2 - 2V_t)^2 + \dfrac{I_{in}^2}{2K(V_2 - 2V_t)^2}$
Analog implementation cont.
Remember that $V_2$ is controlled by the control current $I_0 = \tfrac{1}{4}K(V_2 - 2V_t)^2$.
Substituting this into the previous expression, we finally get:
$I_{out} = 2I_0 + \dfrac{I_{in}^2}{8I_0}$
We can eliminate the offset current $2I_0$ by subtracting it from the output. We do so by copying $I_0$ twice and subtracting the copies from $I_{out}$. We finally get:
$I_{out} = \dfrac{I_{in}^2}{8I_0}$
In order to keep all the devices in the circuit in the ON state we have to maintain:
$|I_{in}| < 4I_0$
[Figure: the final squaring circuit]
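The closed-form result can be cross-checked numerically against the square-law equations it was derived from. The sketch below is an ideal model only, with no channel-length modulation or mismatch. The values of `k` and `vt` are arbitrary assumptions, and only the 175 µA control current comes from the slides.

```python
import math


def squarer_output(i_in, i0, k=100e-6, vt=0.7):
    """Output current I1 + I2 - 2*I0 computed from the transistor equations.

    k (A/V^2) and vt (V) are hypothetical device parameters.
    """
    v_ov = math.sqrt(4 * i0 / k)   # V2 - 2*Vt, from I0 = K*(V2 - 2*Vt)^2 / 4
    v2 = 2 * vt + v_ov
    diff = i_in / (k * v_ov)       # Va - Vb, from I1 - I2 = K*(V2 - 2*Vt)*(Va - Vb)
    va = (v2 + diff) / 2           # uses Va + Vb = V2
    vb = (v2 - diff) / 2
    if min(va, vb) <= vt:
        raise ValueError("a transistor is off: need |Iin| < 4*I0")
    i1 = k * (va - vt) ** 2
    i2 = k * (vb - vt) ** 2
    return i1 + i2 - 2 * i0        # subtract the 2*I0 offset


i0 = 175e-6
for i_in in (-600e-6, -100e-6, 0.0, 350e-6, 600e-6):
    model = squarer_output(i_in, i0)
    closed_form = i_in ** 2 / (8 * i0)
    assert math.isclose(model, closed_form, rel_tol=1e-9, abs_tol=1e-15)
```

The `ValueError` branch corresponds to the ON-state condition: one gate-source voltage reaches the threshold exactly when $|I_{in}| = 4I_0$, independently of the chosen `k` and `vt`.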
Simulation results
DC analysis: sweep of the input current from $-4I_0$ to $4I_0$.
A control current of 175 µA gives the best results:
[Figure: expected versus actual output]
Maximum absolute error of 8 µA
Approximate error in percentage: ~0.75%
Simulation results
Square of a sine function, bandwidth (BW) of 10 MHz:
[Figure: expected versus actual output at 10 MHz, 100 MHz, 1 GHz and 10 GHz]
For each frequency the input is sin(ωt). The expected output, sin²(ωt), is presented together with the output of the circuit.
Analog summary
• Area: 8 transistors (very small)
• Bandwidth: 10 MHz
• Input current range: -700 µA to 700 µA
• Absolute error: 8 µA (the accuracy error is more significant than environmental errors such as noise)
• Error in percentage: ~0.75%
• Hence the device can handle a range of 2 × (700/8) = 175 (roughly 180) values.
• Constant power dissipation (can be reduced when the device is not in use by adding more hardware)
Analog Square Root
The equation of a transistor in saturation is:
$I_d = K(V_{gs} - V_t)^2$
By solving for $V_{gs}$ we get:
$V_{gs} = \sqrt{I_d / K} + V_t$
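The inversion can be illustrated with a round trip through both forms of the square-law equation. This is a minimal sketch; the function names and the values of `k` and `vt` are assumptions chosen for illustration.

```python
import math


def drain_current(v_gs, k, vt):
    """Saturation current Id = K * (Vgs - Vt)^2, valid for Vgs >= Vt."""
    return k * (v_gs - vt) ** 2


def gate_source_voltage(i_d, k, vt):
    """Inverse relation Vgs = sqrt(Id / K) + Vt."""
    return math.sqrt(i_d / k) + vt


k, vt = 100e-6, 0.7                # hypothetical device parameters
v_gs = 1.5
i_d = drain_current(v_gs, k, vt)
recovered = gate_source_voltage(i_d, k, vt)
assert math.isclose(recovered, v_gs, rel_tol=1e-12)
```

Forcing a current through the transistor therefore produces an overdrive voltage $V_{gs} - V_t$ proportional to the square root of that current.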
Analog Square Root
[Figure: expected versus actual output]
C = sqrt(A² + B²)
We combine all the results so far in order to implement Pythagoras' theorem and find a Euclidean distance.
Results
[Figure: expected versus actual output; traces labelled Input, A² + B² and sqrt(A² + B²)]
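The signal chain can be modelled behaviourally: two ideal squarers, a current sum, and the square-root relation of a saturated transistor. The sketch below rests on several assumptions that the slides do not specify. The blocks are ideal, `k` is an arbitrary device constant, and the final rescaling back to current units is added purely so the result can be compared with sqrt(A² + B²).

```python
import math


def ideal_squarer(i_in, i0):
    """Ideal squarer from the slides: Iout = Iin^2 / (8*I0) for |Iin| < 4*I0."""
    if abs(i_in) >= 4 * i0:
        raise ValueError("input outside |Iin| < 4*I0")
    return i_in ** 2 / (8 * i0)


def ideal_sqrt_stage(i_d, k):
    """Overdrive voltage Vgs - Vt = sqrt(Id / K) of a saturated transistor."""
    return math.sqrt(i_d / k)


def pythagoras(i_a, i_b, i0=175e-6, k=100e-6):
    """Behavioural model of C = sqrt(A^2 + B^2) with currents as signals."""
    i_sum = ideal_squarer(i_a, i0) + ideal_squarer(i_b, i0)  # (A^2 + B^2) / (8*I0)
    v_ov = ideal_sqrt_stage(i_sum, k)                        # sqrt((A^2 + B^2) / (8*I0*K))
    return v_ov * math.sqrt(8 * i0 * k)                      # assumed rescaling to amperes


c = pythagoras(300e-6, 400e-6)
assert math.isclose(c, 500e-6, rel_tol=1e-9)
```

With a 3-4-5 triangle scaled to 300 µA and 400 µA, the model returns 500 µA to within floating-point precision. A real circuit would add the squarer's absolute error before the square-root stage.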
Pythagoras, 2nd try
Results
[Figure: expected versus actual output; traces labelled Input, A² + B² and sqrt(A² + B²)]
Bibliography
(1) Ming-Hwa Sheu and Su-Hon Lin, "Fast Compensative Design Approach for the Approximate Squaring Function", IEEE Journal of Solid-State Circuits, Vol. 37, No. 1, Jan. 2002.
(2) Klaas Bult and Hans Wallinga, "A Class of Analog CMOS Circuits Based on the Square-Law Characteristic of an MOS Transistor in Saturation", IEEE Journal of Solid-State Circuits, Vol. SC-22, No. 3, June 1987.
THE END