Design and Implementation of Hybrid CORDIC Algorithm Based ...

Research ArticleDesign and Implementation of Hybrid CORDIC AlgorithmBased on Phase Rotation Estimation for NCO

Chaozhu Zhang Jinan Han and Ke Li

College of Information and Communication Engineering Harbin Engineering University Harbin 150001 China

Correspondence should be addressed to Jinan Han hanjinanbuaa163com

Received 17 January 2014 Revised 20 June 2014 Accepted 23 June 2014 Published 7 July 2014

Academic Editor Brajesh K Kaushik

Copyright copy 2014 Chaozhu Zhang et alThis is an open access article distributed under theCreative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

The numerical controlled oscillator has wide application in radar digital receiver and software radio system Firstly this paperintroduces the traditional CORDIC algorithmThen in order to improve computing speed and save resources this paper proposesa kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO)Throughestimating the direction of part phase rotation the algorithm reduces part phase rotation and add-subtract unit so that it decreasesdelay Furthermore the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsimsoftware Finally simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of easeof computation resource utilization and computing speeddelay while maintaining the precision It is suitable for high speed andprecision digital modulation and demodulation

1 Introduction

Numerical controlled oscillator (NCO) is an important partof digital downconversion It is widely used in radar wirelesstransceiver system and software radio system [1ndash3]Themainfunction of NCO is to produce two path sine and cosine datasamples with variable frequency discrete time and mutuallyorthogonal It has an advantage of high frequency precisionand fast response

The traditional implement method of NCO is lookuptable and polynomial expansion method Data accuracy oflookup table method depends on the size of the lookuptable ROM The size of the memory and the precision ofphase accuracy are exponential relationship which enlargesthe resource consumption and reduces the processing speedof the system In [4] it solves this problem by using storecontent mapping technology of odd-even symmetry to opti-mize the storage unit and reduce the storage resources to125 However under the request of high precision it stillconsumes a lot of resources Polynomial expansion methodis a real-time computing method which needs multiplierresources and has certain restrictions on the complexity andspeed of the hardware It is too hard for the two methods totrade off speed accuracy and resource Coordinate rotation

digital compute algorithm (CORDIC) is proposed to solvethe problem CORDIC algorithm uses a basic algorithm toreplace the complex algorithm CORDIC algorithm is easyto hardware implementation It does not require hardwaremultiplier and all operations are only shift accumulationwhich meets the hardware requirements of modular andregularization algorithm requirements

Along with proposing high speed broadband receiver thedata accuracy and processing speed have a higher requestUnder the background traditional CORDIC algorithm hassome inherent drawbacks such as limited coverage angleand too much pipeline series which increase resource con-sumption and limit data processing speed Aiming at theseshortcomings this paper puts forward an efficient pipelinearchitecture CORDIC algorithm for NCO design

2 Traditional CORDIC Algorithm

Volder CORDIC algorithm was proposed in 1959 and in1971 Walther unified the form of the algorithm Meyer-baserealized the algorithm [5 6] using FPGA implementationfor the first time CORDIC algorithm has been applied inmany fields such as direct digital frequency synthesizer fast

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 897381 8 pageshttpdxdoiorg1011552014897381

2 The Scientific World Journal

(xi yi)

(xj yj)

xo

y

120579

Figure 1 CORDIC vector rotation diagram

Fourier transform discrete cosine transform digital modula-tiondemodulator and stream processors [7ndash10] Accordingto certain phase starting point (119909

119894 119910119894) rotates continuously

and approaches the final point gradually Rotation vectordiagram is shown in Figure 1

In Figure 1 it is easy to get

[119909119895

119910119895

] = [cos 120579 minus sin 120579

sin 120579 cos 120579 ] [119909119894

119910119894

] (1)

From the start to the end position spinning process canbe done by several steps and each step only rotates a certainphase

[119909119899+1

119910119899+1

] = [cos 120579119899

minus sin 120579119899

sin 120579119899

cos 120579119899

] [119909119899

119910119899

] (2)

After extracting cos 120579119899 formula (2) can be expressed as

follows

[119909119899+1

119910119899+1

] = cos 120579119899[

1 minus tan 120579119899

tan 120579119899

1] [

119909119899

119910119899

] (3)

In order to simplify the hardware implementation everyoperation sets each rotation phase to 120579

119899= arctan(2minus119899) The

total rotation phase is 120579 = sum119878119899120579119899 So tan 120579

119899= 1198781198992minus119899 Formula

(3) can be expressed as follows

[119909119899+1

119910119899+1

] = cos 120579119899[

1 minus1198781198992minus119899

1198781198992minus119899 1

] [119909119899

119910119899

] (4)

From formula (4) in addition to the cos 120579119899coefficient the

operation is simple shift and addition

In the final result cos 120579119899can be eliminated bymultiplying

a known constant For example 119875 the number of iterationsis 16 and |120579| le 1205874 119870 can be expressed as follows

119870 =

16

prod119899=1

cos 120579119899=

16

prod119899=1

cos (arctan (2minus119899

))

=

16

prod119899=1

(1 minus 2minus2119894

)minus12

(5)

In the phase rotation process approximative rotationaliterative formula is

[119909119899+1

119910119899+1

] = [1 minus119878

1198992minus119899

1198781198992minus119899 1

] [119909119899

119910119899

] (6)

Parameter 119911 is used to judge when the iteration is over119911119899+1

= 119911119899minus 120579119899 1199110

= 120579 When 119911119899

lt 120579 119878119899

= minus1 When119911119899

ge 120579 119878119899

= +1 If the initial value is (119909119894 119910119894) = (119909

0 1199100) =

(119870 0) (119909119899 119910119899) of 119875th iteration will converge to (cos 120579 sin 120579)

The phase convergence satisfies the CORDIC convergencetheorem [6] The constant scaling factor 119870 is fixed and canbe precomputed as long as the precision 119873 is determinedAfter analysis of traditional CORDIC algorithm calculationaccuracy the iteration number and phase precision areexpressed as follows

119875 ge minuslog2[tan (Δ120579min)] (7)

where Δ120579min = 21205872119873 and the input phase data width is 119873

3 Hybrid CORDIC Algorithm Based onPhase Rotation Estimation

Common operation structures are iteration pipeline anddifferential CORDIC algorithm Iterative structure occupiesless hardware resources but the processing data efficiency islow Although the pipeline structure occupies more hardwareresources it can improve the throughput Based on the tworealization structures implementation schemes have parallelpipelines hybrid rotation CORDIC angle encoding methodand so forth [11 12] The work in [13] puts forward the wayof prediction rotation direction The algorithm applied inerror analysis and elimination has the advantages of fastspeed But it does not optimize hardware structure Usingthe structure of the parallel hybrid CORDIC algorithmthe prediction scheme of [14] is more regular and simplercompared to previous approaches which can reduce thenumber of iterations by more than 50 percent Howeverthe judgment of rotation direction is not optimized whichincreases latency time and resources so that it affects thethroughput The work in [15] puts forward a modifiedhybrid CORDIC algorithm and improved the precision ofoutput data but the method is more complex Trading offthe disadvantages of the above methods and advantages ofpipeline structure and iterative structure this paper simplifiesthe CORDIC algorithm further By using the arctangentfunction property it reduces the rotating judgment and add-subtract unit operation

The Scientific World Journal 3

In this paper attention is focused mainly on techniquesthat reduce the number of iterations while keeping thelow latency The hybrid CORDIC algorithm based on phaserotation estimation is presented in this section which can beaddressed by digit-on-line pipelined CORDIC circuits andrepetitive multiple accumulations architecture

31 Rotation Phase Estimation Assuming that the inputphase length of CORDIC algorithm is 119873 and pipeline seriesis 119875 rotation phase 120579 can be represented as follows

120579 =

119875

sum119899=1

119878119899120579119899=

119875

sum119899=1

119878119899arctan (2

minus119899) (8)

where 119878119899= plusmn1 It is noted that the initial value of 119899 is 1 and the

reason is that we restrict the rotation angle within the range|120579| le 1205874 in the application example of NCO

With the increase of rotational coefficient 119899 arctan(2minus119899)gets close to 2minus119899 When 119899 ge 1 2minus119899 gt arctan(2minus119899) Error is 120576

119899=

2minus119899 minus arctan(2minus119899) Arctangent function is developed throughthe tailor equation

120576119899= 2minus119899

minus [2minus119899

minus1

3(2minus119899

)3

+1

5(2minus119899

)5

minus sdot sdot sdot ]

=1

3(2minus119899

)3

minus1

5(2minus119899

)5

+ sdot sdot sdot

(9)

where 120576119899

lt (13)2minus3119899 The minimum phase value is 2minus119873In the process of phase rotation when the error estimate is(13)2

minus3119899 le 2minus119873 error generated by estimated value 2minus119899 canbe ignored The range of 119899 is

119899 ge119873 minus log

23

3 (10)

When 119899 ge 119898 = lceil(119873 minus log23)3rceil arctan(2minus119899) asymp 2minus119899

Through (7) pipeline series of CORDIC algorithm is 119875 ge

119873 minus 2 The less the pipeline series are the faster the speedis When 119875 = 119873 minus 2 we define the hybrid radix set

120579 =

119873minus2

sum119899=1

119878119899120579119899=

119898

sum119899=1

119878119899arctan (2

minus119899) +

119873minus2

sum119899=119898+1

1198781198992minus119899

(11)

After iterating119898+1 times the sum of residual rotation phaseis sum120579

119899 as shown in formula (12)

sum120579119899=

119873minus2

sum119899=119898+1

1198781198992minus119899

= 119878119898+1

2minus119898minus1

+ 119878119898+2

2minus119898minus2

sdot sdot sdot + 119878119873minus2

2minus119873+2

lt 2minus119898

(12)

The actual residual phase is 119911119898+1

According to thetraditional CORDIC algorithm theory 119911

119898+1asymp sum120579

119899 So

119911119898+1

lt 2minus119898 When the (119898 + 1)th rotation begins the newrotation phase 120579

119898+1is 120601119898+1

where the absolute value of 119911119898+1

is 120601119898+1

Thus tan 120579119898+1

= 119878119898+1

120579119898+1

When 119911119898+1

lt 0 119878119898+1

=

minus1 When 119911119898+1

ge 0 119878119898+1

= +1 Taking it into formula (3)

[119909119898+2

119910119898+2

] = cos120601119898+1

[1 minus119878

119898+1120579119898+1

119878119898+1

120579119898+1

1] [

119909119898+1

119910119898+1

] (13)

After the rotation the residual phase is 0 It shows that119909119898+2

and 119910119898+2

are the output of cosine data and sine data

32 Rotation Function Optimization and Error Analysis Inorder to obtain cosine data from the new pipeline processwe put forward unidirectional rotation method to reduce thecomparator and choose addition or subtractor 120579

119898+1should

be expressed firstly 119873-bits input phase needs to iterate 119898

times The results can be expressed as 119882 The residual phaseat this time is 119911

119898+1 120579119898+1

is expressed as follows

119879 =

119873minus2

sum119894=1

1198601198942119894= 119860119873minus2

2119873minus2

+ 119860119873minus3

2119873minus3

+ sdot sdot sdot + 119860121 (14)

where 119860119894= 1 or 0 When 119882[119873 minus 2] = 0 119860

119894= 119882[119894] When

119882[119873 minus 2] = 1 119860119894is that 119882[119894] flips every bit and adds 1

120579119898+1

=2120587119879

2119873=

2119873 (15)

From formula (15) is unknown In the hardwareimplementation needs to be expressed as follows

=

119873

sum119894=1

1198611198942119894= 2120587119879

=

119873

sum119894=1

1198601198942119894(22+ 2 + 2

minus2+ 2minus5

+ 2minus9

)

(16)

where 119861119894= 1 or 0 Taking all figures of into (15)

120579119898+1

=

119873minus119898minus1

sum119894=1

1198611198942119894minus119873

= 119861119873minus119898minus1

2minus119898minus1

+ sdot sdot sdot + 119861222minus119873

+ 119861121minus119873

(17)

Uniting formulas (13) and (17)

[119909119898+2

119910119898+2

] =

[[[[[

[

119909119898+1

minus 119878119898+1

sdot


sum119894=1

1198611198942119894minus119873 sdot 119910

119898+1

119878119898+1

sdot


sum119894=1

1198611198942119894minus119873 sdot 119909

119898+1+ 119910119898+1

]]]]]

]

(18)

where 119878119898+1

= plusmn1 and 119861119894= 1 or 0

CORDIC algorithm of efficient pipeline usesarctan 2minus1 arctan 2minus119898minus1 120579

119898+1 instead of the traditional

rotation phase arctan 2minus1 arctan 2minus119898minus1 arctan 2minus119875The last set of rotation phase can be expressed as binaryRotation direction is obtained directly from the last setOne more shift and add operation reduces 119873 minus 119898 minus 3

rotation times Under the premise of ensuring phase anddata accuracy it reduces the resource consumption andimproves the operation speed Finally with fewer lines seriesconstant coefficient is as follows

=

119898+1

prod119899=1


)) (19)


ROM

PhaseadderROM

Phaseadder

16

16

Sin output

Cos output

Efficient pipeline architecture CORDIC

algorithm

One estimate rotation Output

Exchange

16bits

16bits

16bits

16bits

16bitsFew

Phase map(0sim2120587) ge (0sim1205874)

7 timestraditional

phase rotation

Antidataand

add one

Non-exchange

Figure 2 High speed and precision NCO structure

At this time set the initial value (1199090 1199100) to ( 0)

According to the above process (119909119898+2

119910119898+2

) converges to(cos 120579 sin 120579)

The cosine error of this algorithm can be divided intothree parts

(1) the quantization errors are caused due to the limitedword length

(2) limited phase word length leads to approximationerror

(3) the phase estimation gives rise to the rotation estima-tion error

Quantization error is in an inverse ratio to word lengthand output word length is set by pipeline seriesThemore thepipeline series are the lower the quantization error is But theincrease of pipeline series will lead to resources consumptionSo according to the data figure it is necessary to tradeoff pipeline series and quantization error Considering thehardware consumption computing speed and precision [7]proposes the optimization method of data bits and pipelineseries According to [16] the quantization error consists oftwo parts the quantization error produced before and thistime It can be expressed as follows

1003816100381610038161003816E1198991003816100381610038161003816 le

10038161003816100381610038161003816100381610038161003816100381610038161003816

1003816100381610038161003816e1198991003816100381610038161003816 +

119899minus1

sum119894=0

(

119899minus1

prod119895=119894

119878119895)

1003816100381610038161003816e1198941003816100381610038161003816

10038161003816100381610038161003816100381610038161003816100381610038161003816

(20)

where E119899is the sum of quantization error and e

119899is the 119899th

phase rotation quantization error with 119878119895= plusmn1

When the output data is 119873 and e119894= [119890119909119894

119890119910119894]119879 |119890119909119894| le 120576

|119890119910119894| le 120576 120576 = 2minus119873minus1 |e

119894| can be expressed as follows

1003816100381610038161003816e1198941003816100381610038161003816 = radic1198902

119909119894+ 1198902119910119894

le radic2 times 2minus119873minus1

(21)

According to formula (7) when phase length is 119873 phaseresolution is 120593 = 21205872119873 Approximation error produced bylimited phase word length can be expressed as follows

|119860| =

10038161003816100381610038161003816119881 minus 1198811015840

10038161003816100381610038161003816100381610038161003816100381611988110158401003816100381610038161003816

le 2 sin(Δ120579

2) le Δ120579 le 120593 (22)

where 119881 is the actual value and 1198811015840 is the error value Δ120579 is

the difference between real phase and approximate phase Inthe final rotating phase estimate rotating phase arctan(2minus119899) isinstead of 2minus119899 Arc value of 2120587 can be replaced only by binaryvalues similarly In formula (16) the generated error can beexpressed as follows

|119861| =

119873minus3

sum119894=119898+2

(2minus119894

minus arctan (2minus119894)) + 2120587 minus 62832

asymp

119873minus3

sum119894=119898+2

(2minus119894

minus arctan (2minus119894)) + 2 times 10

minus5

(23)

4 The FPGA Design andImplementation of NCO

41 High Speed and Precision NCO Structure This paperadopts efficient pipelining structure CORDIC algorithm forhigh speed and high precision NCO Its structure is shownin Figure 2 We take 16-bit phase control words as anexample Firstly input is a 16-bit phase control word and16-bit frequency control word Secondly through the phaseaccumulator and phase adder the output is 16-bit phase valuePhase map generates the 0 1205874 phaseThirdly the shift-addefficient pipelining structure processes phase data Finallyaccording to the previous mapping relation 16-bit sine andcosine data can be generated

The range 120579 of rotation angle value is minus44855∘ 44855∘and approximates to minus1205874 1205874 It does not meet the 0 2120587

scope of phase Before 16-bit phase values are sent into thealgorithm cosine function property can judge the highestsecond highest and third highest bit According to certainmapping relation the highest 3 bits of 16-bit phase value andphase can be reduced to 3

1015840119887000 and 0 1205874 respectivelyThe highest bit controls sine data symbol If the bit is 1 thealgorithm flips the sine data and adds 1 On the other handthe algorithm does not process input data The highest bitand second highest bit control cosine data symbol If they aredifferent the algorithm flips sine data and adds 1 Otherwiseit remains to be the input data The second highest bit andthird highest bit control the location of cosine data and sine


plusmn

plusmn

Sign

plusmn

plusmn

Sign

plusmn

plusmn plusmn

Sign

plusmn

plusmn plusmn

plusmn plusmn plusmn

x1 y1

z1

z2

z3

z4

z5

plusmn plusmn

1205791

1205792

1205793

1205795

1205794

Sign

plusmn

plusmn

≪10 ≪9

≪5 ≪5

≪ 4≪ 4

≪3

≪2 ≪2

≪3

≪13 ≪10 ≪9≪13

x4

x5

y4

y5

S6

≪1 ≪1

xm+2 ym+2

Figure 3 Phase rotation estimation based on pipeline structure

data If they are different the algorithm exchanges cosine dataand sine data Or else it remains to be the input data

42 Internal Architecture Design and the Major Implementa-tion Steps According to formula (10) our algorithm needs119898 = lceil(16 minus log

23)3rceil = 5 times for traditional phase

rotation and one time for rotation phase estimation If 1205796=

2minus9 + 2minus10 + 2minus13 the pipeline structure is shown in Figure 3Each level only needs three adder-subtractors two or sixphase shift registers and a phase coefficient memory andreduces more than a half of the rotation phase judgmentand shift operation For reducing the critical path in thepipelined implementation of traditional CORDIC the differ-ential CORDIC (D-CORDIC) algorithm based on digit-on-line pipelined CORDIC circuits [17] can be used to achievehigher throughput and lower pipeline latency D-CORDICalgorithm is equivalent to the usual CORDIC in terms ofaccuracy as well as convergence The system architectureuses parallel and pipeline differential CORDIC architectureto reduce latency and improve throughout Digit-on-linepipelined CORDIC circuits take place of continuous phaseaccumulation in Figure 3

From what has been discussed above the major steps ofour algorithm are as follows

Table 1 Comparison of resource use

Algorithm Delay119873 = 16 119873 = 24 119873 = 32

Traditional pipeline structure 64119879119865119860

110119879119865119860

160119879119865119860

Hybrid CORDIC algorithm 43119879119865119860

72119879119865119860

96119879119865119860

Ours 26119879119865119860

37119879119865119860

72119879119865119860

Step 1 Phase rotation is limited in the range of minus1205874 1205874

Step 2 Traditional or differential CORDIC algorithm imple-ments partial phase rotation

Step 3 Using a relatively simple prediction scheme we divideoriginal CORDIC rotations into the lower part and the higherpart

Step 4 Differential CORDIC or traditional architecture isproposed to compute rotation direction The lower part iscomputed by continuous accumulation or online architecture[18] based on differential CORDIC and the higher part ispredicted by rotation phase estimation

Step 5 According to phase mapping relationship therequired high precision and high speed cosine data is pro-duced

43 Simulation Results Table 1 compares the delay of someCORDIC rotation methods Our proposed algorithm couldobtain good performance in delay and resource

To compare our pipeline CORDIC algorithm with otherpreviously proposed methods fairly we assume CSA isuniversal adder in all algorithms and fast carry-propagateadders (CPA) are used in the last stage to take carry-saveforms back to the input initial phase value

In [13] the first 119898 iterations use the traditional con-tinuous comparison method the same as the traditionalCORDIC The delay increases logarithmically with the max-imum number of shifts If the delay of carry-propagate adder(CPA) is lceillog

2119873rceil sdot 119879

119865119860 the latency of (119873 minus 119898) iterations

increases linearly with the word length and the delay is(41198733) sdot 119879

119865119860

Based on the calculation method above the traditionalCORDIC based on pipeline architecture has the delay oflceillog2119873rceil sdot 119879

119865119860sdot 119873

Unlike the abovemethods our proposedmethod reducesthe number of iterations and simplifies the 119885 datapath Thefirst iterations still adopt the traditional CORDIC algorithmwhere a delay of lceillog

2119873rceil sdot 119879

119865119860is assumed for an 119873-bit CPA

The accumulations of final iteration use repetitive multipleaccumulations architecture [19] which has much higherthroughput and less delay compared with serial accumulatorand pipelined adder based on carry-save addition as wellThe last iteration increases linearly and the delay is (41198703) sdot

119879119865119860 where119870 is the full-adder number for the accumulations

based on adder-tree architecture


0 50 100 150 200 250 300 350

0

5Si

ne er

ror v

alue

Phase (deg)

minus5

times10minus4

(a) Sine value error

Phase (deg)0 50 100 150 200 250 300 350

Cos

ine e

rror

val

ue

0

5

minus5

times10minus4

(b) Cosine value error

Figure 4 Sine and cosine error statistics of traditional pipeline structure

0 50 100 150 200 250 300 350

0

5

Sine

erro

r val

ue

minus5

times10minus4

Phase (deg)


Cos

ine e

rror

val

ue

0 50 100 150 200 250 300 350

0

5

minus5

times10minus4

Phase (deg)


Figure 5 Sine and cosine error statistics of ours

Table 2 Algorithm resource use comparison

Algorithm ResourceLogic unit Register Storage size

Traditional pipeline structure 1177 754 63Hybrid CORDIC algorithm 1034 576 26Ours 819 393 26

According to the structure shown in Figure 2 traditionalpipeline structure and efficient pipeline structure based onrotation phase estimation are implemented by verilog lan-guage respectively Hardware platform is a Cyclone II seriesEP2C8Q208C8 chip and software platform is in Quartus II ofAltera company Modelsim 100 simulation software tests theexperience result Firstly the input frequency control wordphase control word and clock frequency are set to 16

1015840ℎ19991610158401198890 and 100MHz Output frequency is 10MHz Comparedto the use of resources the result can be expressed in Table 2

Through the comparison in Table 2 our proposed algo-rithm reduced resource obviously

This algorithm precision is the same as traditionalCORDIC algorithm Δ120579min = 21205872119873 The input frequencycontrol word phase control word and clock frequency are setto 161015840ℎ001198616 and 1610158401198890 The output frequency is 03125MHz

Compared with the theoretical value and experiment valuethe error statistic is shown in Figures 4 and 5The simulationruntime of our proposed algorithm costs less than thetraditional CORDIC algorithm in Figure 6

Compared with Figures 4 and 5 our proposed algorithmhas the larger error volatility while the two kinds of thealgorithm error will be controlled in (minus5 times 10

minus4 5 times 10minus4)Though our algorithm structure reduces logic unit it

guarantees the cosine data accuracy Figure 7 shows the NCOsimulation waveform of efficient pipeline structure

It is necessary to obtain efficient bits of phase optimumiteration number and data width We do the above experi-ment 200 times The random angle value is restricted from 0to 45∘When the iteration number is 5sim8 and the series of datawidth are 15 16 18 and 21 we can obtain the effective bitsTherelationship of effective bit number with iteration times anddata width is shown in Table 3 The data unit is degree


0 2 4 6 8 100

500

1000

1500

2000

t (s)

The number of simulations

Traditional pipeline architectureOurs

Figure 6 The runtime of algorithmsrsquo comparison

Figure 7 NCO simulation waveform

Table 3 Relationship of effective bit number with iteration numberand data width

Data width (iteration number)14 (5) 16 (6) 18 (7) 21 (8)

Estimated value 22137 22137 22137 22137Simulated value 23672 22458 22281 22132

The algorithm error will be controlled in (minus5 times 10minus4 5 times

10minus4) when the iteration number is greater than 6 Theexperimental results show that the effective bit number is 13Through calculating the minimum number of microrotationthe effective bit number is generally seven greater thaniteration numberThe calculation of total quantization errorscould be calculated through this method

5 Conclusion

In this paper the hybrid CORDIC algorithm based on phaserotation estimation is proposed to design NCO In the caseof assuring the high precision output the efficient CORDICalgorithm reduces more than a half of the rotation phasejudgment and shift operation Resource consumption oper-ation speed and system delay have much better performancethan traditional CORDIC algorithm In terms of electroniccountermeasures it has a certain practicality The algorithmhas been successfully used in high speed broadband ADS-Breceiver and shows good performance

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Thiswork is supported by theNational Natural Science Foun-dation of China (Grant no 61172159) and the FundamentalResearch Funds for the Central Universities (HEUCFT1101)

References

[1] L Guo S Tian Z Wang and J Luo ldquoStudy of NCO realizationin parallel digital down anversionrdquo Chinese Journal of ScientificInstrument vol 33 no 5 pp 998ndash1004 2012

[2] Q Zhang Y Luo S Chen and J Yan ldquoDesign and implementa-tion of NCO based on phase rotationrdquo Systems Engineering andElectronics vol 32 no 5 pp 908ndash911 2010

[3] X-N Yang Y-C Lou and J-L Xu Software Radio Technologyand Application Beijing Institute of Technology Press BeijingChina 2010

[4] W-B Qin L-Y Luo and T-Y Li ldquoStudy on the efficient tech-nology applied to high precision and high resolution storagein high speed NCOrdquo Journal of Sichuan University (EngineeringScience Edition) vol 39 no 1 pp 156ndash159 2007

[5] J Volder ldquoThe CORDIC trigonometric computing techniquerdquoIRE Transactions on Electronic Computers vol 8 no 3 pp 330ndash334 1959

[6] J Walther ldquoA unified algorithm for elementary functionsrdquo inProceedings of the Spring Joint Computer Conference vol 38 pp379ndash385 1971

[7] S-Q Wan W-F Chen S-R Huang H Ji and Z Yu ldquoImple-mentation of a high-speed direct digital frequency synthesizerbased on improved CORDIC algorithmrdquo Chinese Journal ofScientific Instrument vol 31 no 11 pp 2586ndash2591 2010

[8] S Y Park and Y J Yu ldquoFixed-point analysis and parameterselections of MSR-CORDIC with applications to FFT designsrdquoIEEE Transactions on Signal Processing vol 60 no 12 pp 6245ndash6256 2012

[9] S Aggarwal P K Meher and K Khare ldquoScale-free hyperbolicCORDIC processor and its application to waveform gener-ationrdquo IEEE Transactions on Circuits and Systems I RegularPapers vol 60 no 2 pp 314ndash326 2013

[10] H Huang and L Xiao ldquoCORDIC based fast radix-2 DCTalgorithmrdquo IEEE Signal Processing Letters vol 20 no 5 pp 483ndash486 2013

[11] T Juang ldquoLow latency angle recoding methods for the higherbit-width parallel CORDIC rotator implementationsrdquo IEEETransactions on Circuits and Systems II Express Briefs vol 55no 11 pp 1139ndash1143 2008

[12] P K Meher and S Y Park ldquoCORDIC designs for fixed angleof rotationrdquo IEEE Transactions on Very Large Scale Integration(VLSI) Systems vol 21 no 2 pp 217ndash228 2013

[13] S Wang V Piuri and E E Swartzlander Jr ldquoHybrid CORDICalgorithmsrdquo IEEE Transactions on Computers vol 46 no 11 pp1202ndash1207 1997

[14] S-F Hsiao Y-H Hu and T-B Juang ldquoA memory-efficient andhigh-speed sinecosine generator based on parallel CORDICrotationsrdquo IEEE Signal Processing Letters vol 11 no 2 pp 152ndash155 2004


[15] X Zhang R Xin Q Wang and H Li ldquoDesign of direct digitalfrequency synthesizer based on improved hybrid CORDICalgorithmrdquo Acta Electronica Sinica vol 36 no 6 pp 1144ndash11482008

[16] Y H Hu ldquoThe quantization effects of the CORDIC algorithmrdquoIEEE Transactions on Signal Processing vol 40 no 4 pp 834ndash844 1992

[17] H Dawid and H Meyr ldquoThe differential CORDIC algorithmconstant scale factor redundant implementation without cor-recting iterationsrdquo IEEE Transactions on Computers vol 45 no3 pp 307ndash318 1996

[18] M D Ercegovac and T Lang ldquoRedundant and on-lineCORDIC application to matrix triangularization and SVDrdquoIEEE Transactions on Computers vol 39 no 6 pp 725ndash7401990

[19] P K Meher ldquoNew approach to scalable parallel and pipelinedrealization of repetitive multiple accumulationsrdquo IEEE Transac-tions on Circuits and Systems II Express Briefs vol 55 no 9 pp902ndash906 2008


(xi yi)

(xj yj)

xo

y

120579

Figure 1 CORDIC vector rotation diagram

Fourier transform discrete cosine transform digital modula-tiondemodulator and stream processors [7ndash10] Accordingto certain phase starting point (119909

119894 119910119894) rotates continuously

and approaches the final point gradually Rotation vectordiagram is shown in Figure 1

In Figure 1 it is easy to get

[119909119895

119910119895

] = [cos 120579 minus sin 120579

sin 120579 cos 120579 ] [119909119894

119910119894

] (1)

From the start to the end position spinning process canbe done by several steps and each step only rotates a certainphase

[119909119899+1

119910119899+1

] = [cos 120579119899

minus sin 120579119899

sin 120579119899

cos 120579119899

] [119909119899

119910119899

] (2)

After extracting cos 120579119899 formula (2) can be expressed as

follows

[119909119899+1

119910119899+1

] = cos 120579119899[

1 minus tan 120579119899

tan 120579119899

1] [

119909119899

119910119899

] (3)

In order to simplify the hardware implementation everyoperation sets each rotation phase to 120579

119899= arctan(2minus119899) The

total rotation phase is 120579 = sum119878119899120579119899 So tan 120579

119899= 1198781198992minus119899 Formula

(3) can be expressed as follows

[119909119899+1

119910119899+1

] = cos 120579119899[

1 minus1198781198992minus119899

1198781198992minus119899 1

] [119909119899

119910119899

] (4)

From formula (4) in addition to the cos 120579119899coefficient the

operation is simple shift and addition

In the final result cos 120579119899can be eliminated bymultiplying

a known constant For example 119875 the number of iterationsis 16 and |120579| le 1205874 119870 can be expressed as follows

119870 =

16

prod119899=1

cos 120579119899=

16

prod119899=1


))

=

16

prod119899=1

(1 minus 2minus2119894

)minus12

(5)

In the phase rotation process approximative rotationaliterative formula is

[119909119899+1

119910119899+1

] = [1 minus119878

1198992minus119899

1198781198992minus119899 1

] [119909119899

119910119899

] (6)

Parameter 119911 is used to judge when the iteration is over119911119899+1

= 119911119899minus 120579119899 1199110

= 120579 When 119911119899

lt 120579 119878119899

= minus1 When119911119899

ge 120579 119878119899

= +1 If the initial value is (119909119894 119910119894) = (119909

0 1199100) =

(119870 0) (119909119899 119910119899) of 119875th iteration will converge to (cos 120579 sin 120579)

The phase convergence satisfies the CORDIC convergencetheorem [6] The constant scaling factor 119870 is fixed and canbe precomputed as long as the precision 119873 is determinedAfter analysis of traditional CORDIC algorithm calculationaccuracy the iteration number and phase precision areexpressed as follows

119875 ge minuslog2[tan (Δ120579min)] (7)

where Δ120579min = 21205872119873 and the input phase data width is 119873

3 Hybrid CORDIC Algorithm Based onPhase Rotation Estimation

Common operation structures are iteration pipeline anddifferential CORDIC algorithm Iterative structure occupiesless hardware resources but the processing data efficiency islow Although the pipeline structure occupies more hardwareresources it can improve the throughput Based on the tworealization structures implementation schemes have parallelpipelines hybrid rotation CORDIC angle encoding methodand so forth [11 12] The work in [13] puts forward the wayof prediction rotation direction The algorithm applied inerror analysis and elimination has the advantages of fastspeed But it does not optimize hardware structure Usingthe structure of the parallel hybrid CORDIC algorithmthe prediction scheme of [14] is more regular and simplercompared to previous approaches which can reduce thenumber of iterations by more than 50 percent Howeverthe judgment of rotation direction is not optimized whichincreases latency time and resources so that it affects thethroughput The work in [15] puts forward a modifiedhybrid CORDIC algorithm and improved the precision ofoutput data but the method is more complex Trading offthe disadvantages of the above methods and advantages ofpipeline structure and iterative structure this paper simplifiesthe CORDIC algorithm further By using the arctangentfunction property it reduces the rotating judgment and add-subtract unit operation




120579 =

119875

sum119899=1

119878119899120579119899=

119875

sum119899=1

119878119899arctan (2

minus119899) (8)




119899=


120576119899= 2minus119899

minus [2minus119899

minus1

3(2minus119899

)3

+1

5(2minus119899

)5


=1

3(2minus119899

)3

minus1

5(2minus119899

)5

+ sdot sdot sdot

(9)

where 120576119899




23

3 (10)




120579 =

119873minus2

sum119899=1

119878119899120579119899=

119898

sum119899=1

119878119899arctan (2

minus119899) +

119873minus2

sum119899=119898+1

1198781198992minus119899

(11)



sum120579119899=

119873minus2

sum119899=119898+1

1198781198992minus119899

= 119878119898+1

2minus119898minus1

+ 119878119898+2

2minus119898minus2


2minus119873+2

lt 2minus119898

(12)



119898+1asymp sum120579

119899 So

119911119898+1


119898+1is 120601119898+1


is 120601119898+1

Thus tan 120579119898+1

= 119878119898+1

120579119898+1

When 119911119898+1

lt 0 119878119898+1

=

minus1 When 119911119898+1

ge 0 119878119898+1


[119909119898+2

119910119898+2

] = cos120601119898+1

[1 minus119878

119898+1120579119898+1

119878119898+1

120579119898+1

1] [

119909119898+1

119910119898+1

] (13)


and 119910119898+2



119898+1should



119898+1 120579119898+1


119879 =

119873minus2

sum119894=1

1198601198942119894= 119860119873minus2

2119873minus2

+ 119860119873minus3

2119873minus3

+ sdot sdot sdot + 119860121 (14)


119894= 119882[119894] When


120579119898+1

=2120587119879

2119873=

2119873 (15)


=

119873

sum119894=1

1198611198942119894= 2120587119879

=

119873

sum119894=1

1198601198942119894(22+ 2 + 2

minus2+ 2minus5

+ 2minus9

)

(16)


120579119898+1

=


sum119894=1

1198611198942119894minus119873

= 119861119873minus119898minus1

2minus119898minus1


+ 119861121minus119873

(17)


[119909119898+2

119910119898+2

] =

[[[[[

[

119909119898+1

minus 119878119898+1

sdot


sum119894=1

1198611198942119894minus119873 sdot 119910

119898+1

119878119898+1

sdot


sum119894=1

1198611198942119894minus119873 sdot 119909

119898+1+ 119910119898+1

]]]]]

]

(18)

where 119878119898+1

= plusmn1 and 119861119894= 1 or 0





=

119898+1

prod119899=1


)) (19)


ROM

PhaseadderROM

Phaseadder

16

16

Sin output

Cos output


algorithm


Exchange

16bits

16bits

16bits

16bits

16bitsFew


7 timestraditional

phase rotation

Antidataand

add one

Non-exchange




119910119898+2







1003816100381610038161003816E1198991003816100381610038161003816 le

10038161003816100381610038161003816100381610038161003816100381610038161003816

1003816100381610038161003816e1198991003816100381610038161003816 +

119899minus1

sum119894=0

(

119899minus1

prod119895=119894

119878119895)

1003816100381610038161003816e1198941003816100381610038161003816

10038161003816100381610038161003816100381610038161003816100381610038161003816

(20)


119899is the 119899th



119890119910119894]119879 |119890119909119894| le 120576

|119890119910119894| le 120576 120576 = 2minus119873minus1 |e


1003816100381610038161003816e1198941003816100381610038161003816 = radic1198902

119909119894+ 1198902119910119894


(21)


|119860| =

10038161003816100381610038161003816119881 minus 1198811015840

10038161003816100381610038161003816100381610038161003816100381611988110158401003816100381610038161003816

le 2 sin(Δ120579

2) le Δ120579 le 120593 (22)



|119861| =

119873minus3

sum119894=119898+2

(2minus119894


asymp

119873minus3

sum119894=119898+2

(2minus119894


minus5

(23)







plusmn

plusmn

Sign

plusmn

plusmn

Sign

plusmn

plusmn plusmn

Sign

plusmn

plusmn plusmn


x1 y1

z1

z2

z3

z4

z5

plusmn plusmn

1205791

1205792

1205793

1205795

1205794

Sign

plusmn

plusmn

≪10 ≪9

≪5 ≪5

≪ 4≪ 4

≪3

≪2 ≪2

≪3

≪13 ≪10 ≪9≪13

x4

x5

y4

y5

S6

≪1 ≪1

xm+2 ym+2









Algorithm Delay119873 = 16 119873 = 24 119873 = 32


110119879119865119860

160119879119865119860


72119879119865119860

96119879119865119860

Ours 26119879119865119860

37119879119865119860

72119879119865119860









2119873rceil sdot 119879



119865119860


119865119860sdot 119873


2119873rceil sdot 119879






0 50 100 150 200 250 300 350

0

5Si

ne er

ror v

alue

Phase (deg)

minus5

times10minus4


Phase (deg)0 50 100 150 200 250 300 350

Cos

ine e

rror

val

ue

0

5

minus5

times10minus4



0 50 100 150 200 250 300 350

0

5

Sine

erro

r val

ue

minus5

times10minus4

Phase (deg)


Cos

ine e

rror

val

ue

0 50 100 150 200 250 300 350

0

5

minus5

times10minus4

Phase (deg)
















0 2 4 6 8 100

500

1000

1500

2000

t (s)










5 Conclusion




Acknowledgments


References
























120579 =

119875

sum119899=1

119878119899120579119899=

119875

sum119899=1

119878119899arctan (2

minus119899) (8)




119899=


120576119899= 2minus119899

minus [2minus119899

minus1

3(2minus119899

)3

+1

5(2minus119899

)5


=1

3(2minus119899

)3

minus1

5(2minus119899

)5

+ sdot sdot sdot

(9)

where 120576119899




23

3 (10)




120579 =

119873minus2

sum119899=1

119878119899120579119899=

119898

sum119899=1

119878119899arctan (2

minus119899) +

119873minus2

sum119899=119898+1

1198781198992minus119899

(11)



sum120579119899=

119873minus2

sum119899=119898+1

1198781198992minus119899

= 119878119898+1

2minus119898minus1

+ 119878119898+2

2minus119898minus2


2minus119873+2

lt 2minus119898

(12)



119898+1asymp sum120579

119899 So

119911119898+1


119898+1is 120601119898+1


is 120601119898+1

Thus tan 120579119898+1

= 119878119898+1

120579119898+1

When 119911119898+1

lt 0 119878119898+1

=

minus1 When 119911119898+1

ge 0 119878119898+1


[119909119898+2

119910119898+2

] = cos120601119898+1

[1 minus119878

119898+1120579119898+1

119878119898+1

120579119898+1

1] [

119909119898+1

119910119898+1

] (13)


and 119910119898+2



119898+1should



119898+1 120579119898+1


119879 =

119873minus2

sum119894=1

1198601198942119894= 119860119873minus2

2119873minus2

+ 119860119873minus3

2119873minus3

+ sdot sdot sdot + 119860121 (14)


119894= 119882[119894] When


120579119898+1

=2120587119879

2119873=

2119873 (15)


=

119873

sum119894=1

1198611198942119894= 2120587119879

=

119873

sum119894=1

1198601198942119894(22+ 2 + 2

minus2+ 2minus5

+ 2minus9

)

(16)


120579119898+1

=


sum119894=1

1198611198942119894minus119873

= 119861119873minus119898minus1

2minus119898minus1


+ 119861121minus119873

(17)


[119909119898+2

119910119898+2

] =

[[[[[

[

119909119898+1

minus 119878119898+1

sdot


sum119894=1

1198611198942119894minus119873 sdot 119910

119898+1

119878119898+1

sdot


sum119894=1

1198611198942119894minus119873 sdot 119909

119898+1+ 119910119898+1

]]]]]

]

(18)

where 119878119898+1

= plusmn1 and 119861119894= 1 or 0





=

119898+1

prod119899=1


)) (19)


ROM

PhaseadderROM

Phaseadder

16

16

Sin output

Cos output


algorithm


Exchange

16bits

16bits

16bits

16bits

16bitsFew


7 timestraditional

phase rotation

Antidataand

add one

Non-exchange




119910119898+2







1003816100381610038161003816E1198991003816100381610038161003816 le

10038161003816100381610038161003816100381610038161003816100381610038161003816

1003816100381610038161003816e1198991003816100381610038161003816 +

119899minus1

sum119894=0

(

119899minus1

prod119895=119894

119878119895)

1003816100381610038161003816e1198941003816100381610038161003816

10038161003816100381610038161003816100381610038161003816100381610038161003816

(20)


119899is the 119899th



119890119910119894]119879 |119890119909119894| le 120576

|119890119910119894| le 120576 120576 = 2minus119873minus1 |e


1003816100381610038161003816e1198941003816100381610038161003816 = radic1198902

119909119894+ 1198902119910119894


(21)


|119860| =

10038161003816100381610038161003816119881 minus 1198811015840

10038161003816100381610038161003816100381610038161003816100381611988110158401003816100381610038161003816

le 2 sin(Δ120579

2) le Δ120579 le 120593 (22)



|119861| =

119873minus3

sum119894=119898+2

(2minus119894


asymp

119873minus3

sum119894=119898+2

(2minus119894


minus5

(23)







plusmn

plusmn

Sign

plusmn

plusmn

Sign

plusmn

plusmn plusmn

Sign

plusmn

plusmn plusmn


x1 y1

z1

z2

z3

z4

z5

plusmn plusmn

1205791

1205792

1205793

1205795

1205794

Sign

plusmn

plusmn

≪10 ≪9

≪5 ≪5

≪ 4≪ 4

≪3

≪2 ≪2

≪3

≪13 ≪10 ≪9≪13

x4

x5

y4

y5

S6

≪1 ≪1

xm+2 ym+2









Algorithm Delay119873 = 16 119873 = 24 119873 = 32


110119879119865119860

160119879119865119860


72119879119865119860

96119879119865119860

Ours 26119879119865119860

37119879119865119860

72119879119865119860









2119873rceil sdot 119879



119865119860


119865119860sdot 119873


2119873rceil sdot 119879






0 50 100 150 200 250 300 350

0

5Si

ne er

ror v

alue

Phase (deg)

minus5

times10minus4


Phase (deg)0 50 100 150 200 250 300 350

Cos

ine e

rror

val

ue

0

5

minus5

times10minus4



0 50 100 150 200 250 300 350

0

5

Sine

erro

r val

ue

minus5

times10minus4

Phase (deg)


Cos

ine e

rror

val

ue

0 50 100 150 200 250 300 350

0

5

minus5

times10minus4

Phase (deg)
















0 2 4 6 8 100

500

1000

1500

2000

t (s)










5 Conclusion




Acknowledgments


References






















ROM

PhaseadderROM

Phaseadder

16

16

Sin output

Cos output


algorithm


Exchange

16bits

16bits

16bits

16bits

16bitsFew


7 timestraditional

phase rotation

Antidataand

add one

Non-exchange




119910119898+2







1003816100381610038161003816E1198991003816100381610038161003816 le

10038161003816100381610038161003816100381610038161003816100381610038161003816

1003816100381610038161003816e1198991003816100381610038161003816 +

119899minus1

sum119894=0

(

119899minus1

prod119895=119894

119878119895)

1003816100381610038161003816e1198941003816100381610038161003816

10038161003816100381610038161003816100381610038161003816100381610038161003816

(20)


119899is the 119899th



119890119910119894]119879 |119890119909119894| le 120576

|119890119910119894| le 120576 120576 = 2minus119873minus1 |e


1003816100381610038161003816e1198941003816100381610038161003816 = radic1198902

119909119894+ 1198902119910119894


(21)


|119860| =

10038161003816100381610038161003816119881 minus 1198811015840

10038161003816100381610038161003816100381610038161003816100381611988110158401003816100381610038161003816

le 2 sin(Δ120579

2) le Δ120579 le 120593 (22)



|119861| =

119873minus3

sum119894=119898+2

(2minus119894


asymp

119873minus3

sum119894=119898+2

(2minus119894


minus5

(23)







plusmn

plusmn

Sign

plusmn

plusmn

Sign

plusmn

plusmn plusmn

Sign

plusmn

plusmn plusmn


x1 y1

z1

z2

z3

z4

z5

plusmn plusmn

1205791

1205792

1205793

1205795

1205794

Sign

plusmn

plusmn

≪10 ≪9

≪5 ≪5

≪ 4≪ 4

≪3

≪2 ≪2

≪3

≪13 ≪10 ≪9≪13

x4

x5

y4

y5

S6

≪1 ≪1

xm+2 ym+2









Algorithm Delay119873 = 16 119873 = 24 119873 = 32


110119879119865119860

160119879119865119860


72119879119865119860

96119879119865119860

Ours 26119879119865119860

37119879119865119860

72119879119865119860









2119873rceil sdot 119879



119865119860


119865119860sdot 119873


2119873rceil sdot 119879






0 50 100 150 200 250 300 350

0

5Si

ne er

ror v

alue

Phase (deg)

minus5

times10minus4


Phase (deg)0 50 100 150 200 250 300 350

Cos

ine e

rror

val

ue

0

5

minus5

times10minus4



0 50 100 150 200 250 300 350

0

5

Sine

erro

r val

ue

minus5

times10minus4

Phase (deg)


Cos

ine e

rror

val

ue

0 50 100 150 200 250 300 350

0

5

minus5

times10minus4

Phase (deg)
















0 2 4 6 8 100

500

1000

1500

2000

t (s)










5 Conclusion




Acknowledgments


References






















plusmn

plusmn

Sign

plusmn

plusmn

Sign

plusmn

plusmn plusmn

Sign

plusmn

plusmn plusmn


x1 y1

z1

z2

z3

z4

z5

plusmn plusmn

1205791

1205792

1205793

1205795

1205794

Sign

plusmn

plusmn

≪10 ≪9

≪5 ≪5

≪ 4≪ 4

≪3

≪2 ≪2

≪3

≪13 ≪10 ≪9≪13

x4

x5

y4

y5

S6

≪1 ≪1

xm+2 ym+2









Algorithm Delay119873 = 16 119873 = 24 119873 = 32


110119879119865119860

160119879119865119860


72119879119865119860

96119879119865119860

Ours 26119879119865119860

37119879119865119860

72119879119865119860









2119873rceil sdot 119879



119865119860


119865119860sdot 119873


2119873rceil sdot 119879






0 50 100 150 200 250 300 350

0

5Si

ne er

ror v

alue

Phase (deg)

minus5

times10minus4


Phase (deg)0 50 100 150 200 250 300 350

Cos

ine e

rror

val

ue

0

5

minus5

times10minus4



0 50 100 150 200 250 300 350

0

5

Sine

erro

r val

ue

minus5

times10minus4

Phase (deg)


Cos

ine e

rror

val

ue

0 50 100 150 200 250 300 350

0

5

minus5

times10minus4

Phase (deg)
















0 2 4 6 8 100

500

1000

1500

2000

t (s)










5 Conclusion




Acknowledgments


References






















0 50 100 150 200 250 300 350

0

5Si

ne er

ror v

alue

Phase (deg)

minus5

times10minus4


Phase (deg)0 50 100 150 200 250 300 350

Cos

ine e

rror

val

ue

0

5

minus5

times10minus4



0 50 100 150 200 250 300 350

0

5

Sine

erro

r val

ue

minus5

times10minus4

Phase (deg)


Cos

ine e

rror

val

ue

0 50 100 150 200 250 300 350

0

5

minus5

times10minus4

Phase (deg)
















0 2 4 6 8 100

500

1000

1500

2000

t (s)










5 Conclusion




Acknowledgments


References






















0 2 4 6 8 100

500

1000

1500

2000

t (s)










5 Conclusion




Acknowledgments


References





















Design and Implementation of Hybrid CORDIC Algorithm Based ...

Documents

Transcript of Design and Implementation of Hybrid CORDIC Algorithm Based ...