2014 International Conference on Communication and Network Technologies (ICCNT)

FPGA Implementation of Fast and Area Efficient CORDIC Algorithm

M. Chinnathambi, N. Bharanidharan, S. Rajaram
Department of Electronics and Communication,
Thiagarajar College of Engineering,
Madurai, India.

Abstract: This paper presents a fast and area-efficient CORDIC (COordinate Rotation DIgital Computer) algorithm for sine and cosine wave generation. Pipelining is used to decrease the critical path delay, and a multiplexer-based CORDIC structure is used to reduce the area. A six-stage CORDIC is implemented with two schemes, the unrolled CORDIC and the multiplexer-based CORDIC, each with and without pipelining, giving four methods. Pipelining is included in four stages (excluding the first and last stages). An 8-bit CORDIC for generating sine and cosine waves is designed, implemented and compared using all four methods on a Xilinx Spartan-3E (XC3S250E).

Keywords: CORDIC algorithm, FPGA, Multiplexer, Pipelining, Unrolled CORDIC, Sine/Cosine

978-1-4799-6266-2/14/$31.00 © 2014 IEEE

I. INTRODUCTION

An FPGA platform is well suited to CORDIC implementation, since it combines the flexibility of microprocessors with the speed and computational power of ASICs. The CORDIC (COordinate Rotation DIgital Computer) algorithm uses planar rotation and vectoring to compute elementary trigonometric functions when given suitable initial conditions. It is an iterative algorithm introduced by Jack E. Volder [1] and later refined by Walther [2] and others. CORDIC is popular because it uses only shifts and adds to compute a number of functions, including trigonometric functions, vector rotations, and hyperbolic and logarithmic functions. In communication applications it is widely used to implement universal modulators [3] and demodulators [4]. CORDIC is also used in calculators, mathematical coprocessor units, clock recovery circuits and waveform generators.

The CORDIC algorithm has two modes: rotation mode and vectoring mode. In rotation mode, CORDIC converts a vector from polar to rectangular form; vectoring mode performs the reverse operation.

The objective of this paper is to design, analyze and compare how an FPGA-based unrolled CORDIC performs when multiplexers are used instead of shifters and adders, with and without pipelining [5-8].

The organization of the paper is as follows: Section II gives the basics of the CORDIC algorithm. The general unrolled CORDIC algorithm is presented in Section III; Section IV explains the multiplexer-based CORDIC; Section V describes the unrolled CORDIC with pipelining; Section VI presents the pipelined multiplexer-based unrolled CORDIC. Section VII presents the results and discussion, followed by the conclusion in Section VIII.

II. BASICS OF CORDIC ALGORITHM

The CORDIC algorithm is an iterative method for performing vector rotations through arbitrary angles using only shifts and adds. Planar rotation of a vector A from (Xi, Yi) to (Xj, Yj) through an angle θ can be defined in matrix form as

$$\begin{bmatrix} X_j \\ Y_j \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X_i \\ Y_i \end{bmatrix} \tag{1}$$

With a small rearrangement, the stepwise rotation is performed by

$$\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix} = \cos\theta_n \begin{bmatrix} 1 & -\tan\theta_n \\ \tan\theta_n & 1 \end{bmatrix} \begin{bmatrix} X_n \\ Y_n \end{bmatrix} \tag{2}$$

The angle parameter for each step is

$$\theta_n = \arctan\left(\frac{1}{2^{n}}\right) \tag{3}$$

The rotation angle θ is represented as

$$\sum_{n=0}^{\infty} S_n \theta_n = \theta \tag{4}$$

where Sn ∈ {-1, +1} depends on Zn. Rewriting the equation for rotation,

$$\begin{bmatrix} X_{n+1} \\ Y_{n+1} \end{bmatrix} = \cos\theta_n \begin{bmatrix} 1 & -S_n 2^{-n} \\ S_n 2^{-n} & 1 \end{bmatrix} \begin{bmatrix} X_n \\ Y_n \end{bmatrix} \tag{5}$$

Because cos θn is the same whether Sn is +1 or -1, the product of the cos θn factors is a constant K that can be applied once, at the end; the remaining multiplications by 2^-n are replaced with shift operations to simplify the hardware. The residue Z, which gives the angle difference between the expected rotation and the iterative rotations, is defined as

$$Z_{n+1} = \theta - \sum_{i=0}^{n} S_i \theta_i = \theta - \sum_{i=0}^{n} S_i \arctan\left(\frac{1}{2^{i}}\right) \tag{6}$$

The rotation parameter is

$$S_n = \begin{cases} -1, & Z_n < 0 \\ +1, & Z_n \ge 0 \end{cases} \tag{7}$$

With proper selection of the initial values Xi, Yi and Zi, the required function is computed using equations (5)-(7). In rotation mode, the iterations drive the residue Z toward zero. For generating sine and cosine values, the initial values should be Xi = 1, Yi = 0 and Zi = θ.
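To make equations (3)-(7) concrete, here is a minimal floating-point sketch in Python of rotation-mode CORDIC with six stages and angles in degrees. It is not the authors' hardware design: the function name `cordic_rotate` is hypothetical, and folding the gain K into the initial X value mirrors the choice the paper makes for its hardware in Section III.

```python
import math


def cordic_rotate(theta_deg: float, stages: int = 6) -> tuple[float, float]:
    """Rotation-mode CORDIC following equations (3)-(7), in floating point.

    Returns approximations of (cos(theta), sin(theta)). The gain
    K = prod(cos(theta_n)) is folded into the initial X value.
    """
    # Angle constants theta_n = arctan(2^-n), equation (3), in degrees.
    angles = [math.degrees(math.atan(2.0 ** -n)) for n in range(stages)]
    # Product of the cos(theta_n) factors; it does not depend on the signs S_n.
    k = math.prod(math.cos(math.radians(a)) for a in angles)

    x, y, z = k, 0.0, theta_deg          # X_i = K, Y_i = 0, Z_i = theta
    for n in range(stages):
        s = 1 if z >= 0 else -1          # rotation parameter S_n, equation (7)
        # Equation (5) without the cos(theta_n) factor (already in K).
        x, y = x - s * y * 2.0 ** -n, y + s * x * 2.0 ** -n
        z -= s * angles[n]               # residue update, equation (6)
    return x, y
```

For example, `cordic_rotate(30)` returns approximately (0.86, 0.51): the six signed rotations add up to about 30.7° rather than exactly 30°, and that leftover angle is what additional stages would reduce.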

Fig.1. CORDIC angle rotations.

Then, at the end of the last (N-th) iteration, the final result is

$$\begin{bmatrix} X_N \\ Y_N \\ Z_N \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \tag{8}$$

III. GENERAL UNROLLED CORDIC ALGORITHM

The architecture of the six-stage unrolled CORDIC is shown in Fig.2. It consists only of shifters, adders and subtractors, and the accuracy of the computation increases with the number of stages. In every rotation of the vector, the angle constant is added or subtracted depending on the most significant (sign) bit of the previous residual angle. Division by powers of two is performed by right shifts, which requires much less hardware than a divider. Initially, Xi = 1 and Yi = 0 for sine and cosine wave generation. The X and Y values entering stage i are shifted right by i bits, where i ∈ {0, 1, 2, 3, 4, 5}, which divides x and y by 1, 2, 4, 8, 16 and 32 in the successive stages. In this rotation mode, the given vector is iteratively rotated, forming new vectors at the intermediate stages, until it reaches the desired angle Zi. In the first stage, Z1 is calculated by subtraction, since Zi is always positive as it varies from 0 to 90 degrees.

If the initial conditions are Xi = 1 and Yi = 0, the resulting discrete sine and cosine values vary from -1 to 1. Such fractional values are not easily realized on an FPGA. Hence, to make the discrete sine and cosine values vary from -100 to 100, the initial values are multiplied by 100, i.e., Xi = 100 and Yi = 0.

An 8-bit CORDIC is used to represent sine and cosine values in this range, since 8 bits can represent -128 to 127. Generally, the constant K (the product of the cos θn factors of equation (5)) is multiplied into the final results after the last iteration. To improve the accuracy, however, it is multiplied into the initial conditions instead. K ≈ 0.607 for six CORDIC stages, so the initial values are Xi = 61 (≈ 100 × 0.607) and Yi = 0.

Fig.2. General unrolled CORDIC structure.

For the generation of sine and cosine waves, Zi is changed on every clock pulse. The step size is the difference between two successive Zi values; a smaller step size gives more accuracy. For a step size of 15°, Zi takes the values 0, 15, 30, ..., 90, producing the values … at Y6 (sine values) and … at X6 (cosine values) on successive clock pulses. The remaining values are computed using the quadrature symmetry of the sine and cosine waves.

IV. MULTIPLEXER BASED CORDIC STRUCTURE

The idea of reducing the area of the CORDIC structure by using multiplexers was proposed for ASIC implementation in [3]. This concept is adopted here for an FPGA-based implementation, with multiplexers replacing the first three stages of the general unrolled CORDIC. Because Yi = 0, the output of the first stage of the original unrolled CORDIC is simply Xi, so the first-stage output is

$$Y_1 = X_i = 61, \qquad X_1 = X_i = 61, \qquad Z_1 = Z_i - 45 \tag{9}$$

If Z1 is positive, the second-stage output is

$$Y_2 = Y_1 + \frac{X_1}{2} = \frac{3X_1}{2} = 91, \qquad X_2 = X_1 - \frac{Y_1}{2} = \frac{X_1}{2} = 31 \tag{10}$$

If Z1 is negative, the second-stage output is

$$Y_2 = Y_1 - \frac{X_1}{2} = \frac{X_1}{2} = 31, \qquad X_2 = X_1 + \frac{Y_1}{2} = \frac{3X_1}{2} = 91 \tag{11}$$

Fig.3. Multiplexers to replace second stage of unrolled CORDIC.
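The general unrolled datapath of Section III can be modelled at integer precision as below. This is a sketch under stated assumptions: Xi = 61 and Yi = 0 as in Section III, right shifts for the divisions by 2^n, and angle constants kept as floating-point degrees (the paper does not say how they are quantized). The helper names `unrolled_cordic` and `sine_cosine_table` are hypothetical. Recording every stage's output shows that the second stage can only produce (X2, Y2) = (31, 91) when Z1 ≥ 0 or (91, 31) when Z1 < 0, which is what makes a multiplexer replacement possible; the table helper applies quadrature symmetry to the first-quadrant results.

```python
import math

# Angle constants theta_n = arctan(2^-n) in degrees. Keeping them as floats is an
# assumption of this sketch; the paper does not state how they are quantized.
ANGLES = [math.degrees(math.atan(2.0 ** -n)) for n in range(6)]
X_INIT = 61  # X_i = 100 * K (rounded), Y_i = 0, as in Section III


def unrolled_cordic(z_deg: float) -> list[tuple[int, int]]:
    """Six-stage integer unrolled CORDIC; returns (X_n, Y_n) after each stage."""
    x, y, z = X_INIT, 0, z_deg
    stage_outputs = []
    for n, angle in enumerate(ANGLES):
        s = 1 if z >= 0 else -1                      # S_n from the sign of Z_n
        # >> on Python ints is an arithmetic (sign-preserving) right shift.
        x, y = x - s * (y >> n), y + s * (x >> n)
        z -= s * angle                               # next residue Z_{n+1}
        stage_outputs.append((x, y))
    return stage_outputs


def sine_cosine_table(step: int = 15) -> list[tuple[int, int, int]]:
    """(angle, sine, cosine) over a full period using quadrature symmetry.

    Only 0 <= angle < 90 is computed by CORDIC; step must divide 90.
    """
    quadrant = {}
    for deg in range(0, 90, step):
        x6, y6 = unrolled_cordic(deg)[-1]
        quadrant[deg] = (y6, x6)                     # Y6 ~ 100 sin, X6 ~ 100 cos
    table = []
    for deg in range(0, 360, step):
        q, r = divmod(deg, 90)
        s, c = quadrant[r]
        # sin/cos of (90q + r) expressed through sin r and cos r.
        sin_v, cos_v = [(s, c), (c, -s), (-s, -c), (-c, s)][q]
        table.append((deg, sin_v, cos_v))
    return table
```

For Zi = 30 the sketch ends at (X6, Y6) = (87, 50), against 100·cos 30° ≈ 86.6 and 100·sin 30° = 50.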

As the second-stage output can take only these two fixed values, two multiplexers are used, as shown in Fig.3. Similarly, the third stage is implemented using four multiplexers, according to the following equations. Z2 is computed using equation (6) and is used as the selection line of the third-stage multiplexers.

For Z1 = +ve, Z2 = +ve:

$$Y_3 = Y_2 + \frac{X_2}{4} = \frac{3X_i}{2} + \frac{X_i}{8} = \frac{13X_i}{8} = 99, \qquad X_3 = X_2 - \frac{Y_2}{4} = \frac{X_i}{2} - \frac{3X_i}{8} = \frac{X_i}{8} = 7 \tag{12}$$

For Z1 = -ve, Z2 = +ve:

$$Y_3 = Y_2 + \frac{X_2}{4} = \frac{X_i}{2} + \frac{3X_i}{8} = \frac{7X_i}{8} = 53, \qquad X_3 = X_2 - \frac{Y_2}{4} = \frac{3X_i}{2} - \frac{X_i}{8} = \frac{11X_i}{8} = 83 \tag{13}$$

For Z1 = +ve, Z2 = -ve:

$$Y_3 = Y_2 - \frac{X_2}{4} = \frac{3X_i}{2} - \frac{X_i}{8} = \frac{11X_i}{8} = 83, \qquad X_3 = X_2 + \frac{Y_2}{4} = \frac{X_i}{2} + \frac{3X_i}{8} = \frac{7X_i}{8} = 53 \tag{14}$$

For Z1 = -ve, Z2 = -ve:

$$Y_3 = Y_2 - \frac{X_2}{4} = \frac{X_i}{2} - \frac{3X_i}{8} = \frac{X_i}{8} = 7, \qquad X_3 = X_2 + \frac{Y_2}{4} = \frac{3X_i}{2} + \frac{X_i}{8} = \frac{13X_i}{8} = 99 \tag{15}$$

Using multiplexers for the first three stages removes one clock pulse of delay for the first output compared with the ordinary unrolled CORDIC. When the adders are replaced with multiplexers, the area is reduced up to the third stage. Beyond that point, however, replacing adders and shifters with multiplexers makes the number of multiplexers grow exponentially: 6, 14 and 30 multiplexers are needed to construct three, four and five stages, respectively (stage k needs 2^(k-1) multiplexers, so k stages need 2 + 4 + ... + 2^(k-1) = 2^k - 2).

Fig.4. Multiplexer based CORDIC for first three stages and ordinary CORDIC for remaining three stages.

V. UNROLLED CORDIC WITH PIPELINING

In general, a pipeline is a set of simple data-processing elements connected in series, so that the output of one element is the input of the next. Pipelining is mainly used to decrease the critical path delay, which makes the system suitable for high-speed applications. Hence the unrolled CORDIC with pipelining has a higher maximum operating frequency than the ordinary one. Pipelining also has disadvantages, however: it increases the area on the FPGA, and when N pipeline registers are used there is an N-clock delay before the first output. After that N-clock delay, one output appears on every clock pulse. The position and number of pipeline registers were explored iteratively, and the optimized configuration uses pipeline registers at the four intermediate stages (excluding the first and last stages), as shown in Fig.5.

VI. PIPELINED MULTIPLEXER BASED CORDIC

Pipelining in CORDIC increases the area but improves the speed of operation, while the multiplexer-based CORDIC reduces the area utilization but decreases the speed of operation. Hence there is a trade-off between area and speed of operation in the pipelined multiplexer-based CORDIC [9]. The first three stages are replaced by multiplexers, while the fourth and fifth stages are pipelined using registers. Pipelining at two stages adds two clock pulses of delay for the first output, while the use of multiplexers removes one clock pulse of delay, so the first output appears only one clock pulse later than usual.
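A minimal sketch of the multiplexer front end, assuming the third-stage values of equations (12)-(15) are hard-wired multiplexer inputs selected by the signs of Z1 and Z2, and that stages 4-6 use the same shift-and-add update as the general unrolled CORDIC. The names `STAGE3_MUX` and `mux_cordic` are hypothetical, and the floating-point angle constants are an assumption, not the authors' quantization.

```python
import math

# theta_n = arctan(2^-n) in degrees (floating point is an assumption of this sketch).
ANGLES = [math.degrees(math.atan(2.0 ** -n)) for n in range(6)]

# Third-stage outputs (X_3, Y_3) from equations (12)-(15), keyed by
# (Z_1 >= 0, Z_2 >= 0). In hardware these would be multiplexer inputs.
STAGE3_MUX = {
    (True, True): (7, 99),
    (False, True): (83, 53),
    (True, False): (53, 83),
    (False, False): (99, 7),
}


def mux_cordic(z_deg: float) -> tuple[int, int]:
    """Multiplexer selection for stages 1-3, shift-and-add for stages 4-6."""
    z1 = z_deg - ANGLES[0]                       # equation (9): Z_1 = Z_i - 45
    s1 = 1 if z1 >= 0 else -1
    z2 = z1 - s1 * ANGLES[1]                     # Z_2 from equation (6)
    s2 = 1 if z2 >= 0 else -1
    x, y = STAGE3_MUX[(z1 >= 0, z2 >= 0)]        # select instead of compute
    z = z2 - s2 * ANGLES[2]                      # Z_3 drives the next stage
    for n in range(3, 6):                        # ordinary CORDIC stages 4-6
        s = 1 if z >= 0 else -1
        x, y = x - s * (y >> n), y + s * (x >> n)
        z -= s * ANGLES[n]
    return x, y
```

Because the table stores fractions of Xi rounded down (for example 13Xi/8 = 99.1 becomes 99) instead of what a shift-and-add third stage would produce (X3 = 9, Y3 = 98 for Zi = 90), the final values can be off by a unit or two: `mux_cordic(90)` gives (-2, 100) and `mux_cordic(0)` gives (100, -2).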

Fig.5. Unrolled CORDIC with pipelining at intermediate stages.

Fig.6. Pipelined multiplexer based CORDIC.

VII. RESULTS AND DISCUSSION

All four schemes are implemented on a Xilinx Spartan-3E (XC3S250E). The first output of an N-stage unrolled 8-bit CORDIC appears only at the (N+1)th clock pulse because of the N iterations. After the first (N+1) clock cycles, however, a new output appears on every clock cycle.

TABLE I. COMPARISON OF FOUR SCHEMES BASED ON FIRST OUTPUT APPEARANCE

| Scheme | Clock pulse at which first output appears |
|---|---|
| General unrolled CORDIC | 7 |
| Multiplexer based CORDIC | 6 |
| Pipelined unrolled CORDIC | 11 |
| Pipelined multiplexer based CORDIC | 8 |

Fig.7. Simulation output of Unrolled CORDIC.

In Fig.7, "clk" is the input signal that determines the frequency of the generated sine and cosine waves, while "sinout" carries the discrete sine values and "cosout" the discrete cosine values. In the unrolled CORDIC, because of the six iteration stages, the output appears at the 7th clock pulse. Since the first stage is replaced without using any multiplexers, the first output of the multiplexer-based CORDIC appears at the 6th clock pulse.

The pipelined unrolled CORDIC uses 4 pipeline registers, so its first output appears at the 11th clock pulse (7 + 4 = 11), while the pipelined multiplexer-based CORDIC produces its first output at the 8th clock pulse (6 + 2 = 8). The multiplexer-based schemes show a comparatively higher percentage error, due to quantization of the initial values.

When implemented on the Xilinx FPGA, the schemes show a clear trade-off between critical path delay and FPGA area, as shown in Table II.

TABLE II. IMPLEMENTATION OF FOUR SCHEMES IN XILINX

| Scheme | Number of slice flip-flops | Number of occupied slices | Critical path delay (ns) |
|---|---|---|---|
| General unrolled CORDIC | 134 | 160 | 9.0 |
| Multiplexer based CORDIC | 109 | 131 | 9.6 |
| Pipelined unrolled CORDIC | 187 | 166 | 6.3 |
| Pipelined multiplexer based CORDIC | 125 | 133 | 6.1 |
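The first-output timings in Table I and the figures in Table II can be cross-checked with a few lines of arithmetic. The latency rule below is an interpretation of Sections V-VII (N stages give N + 1, the multiplexer front end saves one pulse, each pipeline register adds one), not a formula stated by the authors, and treating 1 / (critical path delay) as the maximum clock frequency is a common first-order approximation. The function names are hypothetical.

```python
# First-output clock pulse, modelled from Sections V-VII (an interpretation,
# not the authors' formula): N stages give N + 1, the multiplexer front end
# saves one pulse, and each pipeline register adds one.
def first_output_clock(stages: int, mux_front_end: bool, pipeline_regs: int) -> int:
    return stages + 1 - (1 if mux_front_end else 0) + pipeline_regs


# Table II figures: (occupied slices, critical path delay in ns).
TABLE_II = {
    "General unrolled CORDIC": (160, 9.0),
    "Multiplexer based CORDIC": (131, 9.6),
    "Pipelined unrolled CORDIC": (166, 6.3),
    "Pipelined multiplexer based CORDIC": (133, 6.1),
}


def derived_metrics() -> dict[str, tuple[float, float]]:
    """Approximate f_max in MHz (1000 / delay in ns) and slice change vs. baseline in %."""
    base_slices, _ = TABLE_II["General unrolled CORDIC"]
    return {
        name: (1000.0 / delay, 100.0 * (slices - base_slices) / base_slices)
        for name, (slices, delay) in TABLE_II.items()
    }


if __name__ == "__main__":
    # prints: 7 6 11 8 (the Table I values)
    print(first_output_clock(6, False, 0), first_output_clock(6, True, 0),
          first_output_clock(6, False, 4), first_output_clock(6, True, 2))
    for name, (fmax, area) in derived_metrics().items():
        print(f"{name}: {fmax:.1f} MHz, {area:+.1f}% slices")
```

By this estimate the two pipelined schemes run at roughly 159 MHz and 164 MHz, against about 111 MHz for the general unrolled CORDIC, while the two multiplexer-based schemes use about 17-18% fewer occupied slices.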

Thus the pipelined unrolled CORDIC provides a higher operating speed than the unrolled CORDIC without pipelining, since its critical path delay is reduced, at the cost of area. The multiplexer-based CORDIC provides the largest area reduction but a comparatively lower speed of operation.

Fig.8. Comparison of four schemes based on results obtained by Xilinx implementation.

VIII. CONCLUSION

In this paper, a detailed analysis of four CORDIC schemes for sine and cosine wave generation is made by comparing the results obtained from their Xilinx FPGA implementations. Owing to the changes made to the initial values, as described in Section III, the error in the computed sine and cosine values (which range from -100 to 100) is only 2.42%. Among the four methods, the multiplexer-based CORDIC with pipelining is preferred, as it offers a good trade-off between area and critical path delay. As discussed in the preceding sections, the multiplexer-based CORDIC reduces both area and speed of operation, while pipelining increases both area and speed of operation. Hence, one of the four schemes can be selected according to the requirements of the particular application.

REFERENCES
[1] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Transactions on Electronic Computers, vol. EC-8, pp. 330-334, 1959.
[2] J. Walther, "A unified algorithm for elementary functions," Proc. Spring Joint Computer Conf., vol. 38, pp. 379-385, 1971.
[3] J. Vankka, M. Kosunen, J. Hubach, K. Halonen, "A CORDIC-based multicarrier QAM modulator," Global Telecommunications Conference (GLOBECOM '99), vol. 1a, 1999.
[4] A. Chen, R. McDanell, M. Boytim, R. Pogue, "Modified CORDIC demodulator implementation for digital IF-sampled receiver," Global Telecommunications Conference (GLOBECOM '95), IEEE, vol. 2, pp. 1450-1454, 14-16 Nov. 1995.
[5] V. Naresh, B. Venkataramani, R. Raja, "An area efficient multiplexer based CORDIC," International Conference on Computer Communication and Informatics (ICCCI-2013), Coimbatore, Jan. 04-06, 2013.
[6] P. Nilsson, "Complexity reduction in unrolled CORDIC architectures," Electronics, Circuits, and Systems (ICECS), pp. 868-871.
[7] P. Nilsson, "Complexity reductions in unrolled CORDIC architectures," 16th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2009), 13-16 Dec. 2009, pp. 868-871.
[8] N. Kumar, A. S. Sappal, "Coordinate Rotation Digital Computer Algorithm: Design and Architectures," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 2, no. 4, 2011.
[9] E. Deprettere, P. Dewilde, R. Udo, "Pipelined CORDIC architectures for fast VLSI filtering and array processing," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '84), vol. 9, pp. 250-253, Mar. 1984.
[10] U. Meyer-Baese, "Digital Signal Processing with Field Programmable Gate Arrays," 3rd Edition, Berlin: Springer.
