Pipelined Parallel FFT Architecture through Folding Transformation
M. S. Krishna Priya, M.Tech Student, Department of ECE, Shri Vishnu Engineering College for Women
D. Murali Krishna, Sr. Asst. Professor, Department of ECE, Shri Vishnu Engineering College for Women
ABSTRACT:
This project presents a FUSING FFT system for OFDM applications. It is demonstrated by a software-reconfigurable OFDM system using a programmable floating-point DSP. A new VLSI architecture for a real-time pipelined FFT processor is proposed. In this project, high-radix floating-point butterflies are implemented more efficiently with two fused floating-point operations: a two-term dot product and an add-subtract unit. Both discrete and fused radix processors are implemented and compared in terms of area. The OFDM systems and the associated clock cycles required to demodulate data using looped and straight-line FFT programming methods are also described. Higher execution speed is achieved by using straight-line code instead of looped code; the tradeoff of this optimization is the larger program memory required by the straight-line assembly code.
KEYWORDS: Fusing, OFDM, FFT, Radix, Dot
product, Folding transformation, Optimization,
complex-valued Fourier transform, Add-subtract
unit.
1. INTRODUCTION: OFDM is a multicarrier modulation and multiple-access technique used in a number of commercial wired and wireless applications. On the wired side, it is used for variants of digital subscriber line (DSL). For wireless, OFDM is the basis for several television and radio broadcast applications, including the European digital broadcast television standard, as well as digital radio in North America. The FUSING platform provides software control of a variety of modulation schemes, wideband or narrowband operation, communications security functions such as frequency hopping, and the waveform requirements of current and evolving standards over a broad frequency range. It can be viewed as a single radio platform providing services to multiple cellular standards. The FAST FOURIER TRANSFORM (FFT) is widely used in the field of digital signal processing (DSP), e.g., for filtering and spectral analysis, to compute the discrete Fourier transform (DFT). The FFT plays a critical role in modern digital communications such as digital video broadcasting and orthogonal frequency-division multiplexing (OFDM) systems. Much research has been carried out on designing pipelined architectures for the computation of the FFT of complex-valued signals (CFFT). Various algorithms have been developed to reduce the computational complexity, of which the Cooley-Tukey radix-2 FFT [1] is the most popular.
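For reference, the Cooley-Tukey radix-2 decimation-in-time algorithm can be sketched in a few lines of Python. This is only a software model used to show the recursion and the twiddle factors, not the pipelined hardware architecture discussed in this paper:

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time Cooley-Tukey FFT.
    The length of x must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_radix2(x[0::2])        # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])         # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * k / N)   # twiddle factor W_N^k
        out[k] = even[k] + w * odd[k]
        out[k + N // 2] = even[k] - w * odd[k]
    return out

print(fft_radix2([1, 0, 0, 0]))       # impulse -> flat spectrum [1, 1, 1, 1]
```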
Note that this is not the only way to represent floating-point numbers; it is simply the IEEE standard way of doing it. The representation has three fields:
| S | E | F |
where
S is one bit representing the sign of the number,
E is an 8-bit biased integer representing the exponent, and
F is an unsigned integer holding the fraction bits.
The value represented is
(-1)^S x f x 2^e
where e = E - bias and f = (F / 2^n) + 1.
For single-precision representation (used here):
n = 23
bias = 127
For double-precision representation (a 64-bit format):
n = 52 (there are 52 bits for the mantissa field)
bias = 1023 (there are 11 bits for the exponent field)
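The field decoding above can be checked with a short Python sketch (normalized numbers only; denormals, infinities, and NaN are ignored, and the function name is illustrative):

```python
import struct

def decode_single(x):
    """Split a value into the IEEE-754 single-precision fields S, E, F
    and rebuild it as (-1)^S * f * 2^e with e = E - 127, f = F/2^23 + 1."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    S = (bits >> 31) & 0x1          # 1-bit sign
    E = (bits >> 23) & 0xFF         # 8-bit biased exponent
    F = bits & 0x7FFFFF             # 23-bit fraction field

    bias, n = 127, 23
    e = E - bias
    f = F / (2 ** n) + 1            # normalized significand, 1 <= f < 2
    return S, E, F, ((-1) ** S) * f * (2 ** e)

print(decode_single(-6.5))          # (1, 129, 5242880, -6.5)
```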
2. FUSED FLOATING-POINT ADD-SUBTRACT UNIT: The floating-point fused add-subtract unit (Fused AS) performs an addition and a subtraction in parallel on the same pair of data. The fused add-subtract unit is based on a conventional floating-point adder [8]. Although higher-speed adder designs are available (see [9] for example), the basic design shown here serves to demonstrate the concept. A block diagram of the fused add-subtract unit is shown in Fig. 5 (after the initial design from [10]). Some details, such as the LZA and normalization logic, are omitted here to simplify the figure. The exponent difference calculation, significand swapping, and the significand shifting for both the add and the subtract operations are performed with a single set of hardware, and the results are shared by both operations. This significantly reduces the required circuit area. The significand swapping and shifting is done based solely on the values of the exponents (i.e., without comparing the significands).
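The sharing described above can be illustrated with a small behavioural sketch in Python. Operands are modelled as (exponent, integer significand) pairs; the LZA, rounding, and normalization logic are omitted just as in the block diagram, and the names are illustrative:

```python
def fused_add_sub(a, b):
    """Behavioural sketch of a fused floating-point add-subtract unit.
    a and b are (exponent, integer significand) pairs.  The exponent
    difference, operand swap, and alignment shift are computed once,
    and the aligned significands are reused by both the add and the
    subtract datapaths."""
    (ea, ma), (eb, mb) = a, b

    # single exponent-compare / alignment stage (shared hardware)
    if ea >= eb:
        e_res = ea
        ma_al, mb_al = ma, mb >> (ea - eb)
    else:
        e_res = eb
        ma_al, mb_al = ma >> (eb - ea), mb

    # both results reuse the same aligned significands
    return (e_res, ma_al + mb_al), (e_res, ma_al - mb_al)

# example: 1.5*2^3 and 1.0*2^1 with 8-bit significands (hidden 1 included)
print(fused_add_sub((3, 0b1100_0000), (1, 0b1000_0000)))   # ((3, 224), (3, 160))
```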
To demonstrate the utility of the Fused DP and Fused
AS units for FFT implementation, FFT butterfly unit
designs using both the discrete and the fused units
have been made. First, a radix-2 decimation in
frequency FFT butterfly was designed. All lines carry
complex pairs of 32-bit IEEE-754 numbers and all
operations are complex. The complex add, subtract,
and multiply operations can be realized with a
discrete implementation that uses two real adders to
perform the complex add or subtract and four real
multipliers and two real adders to perform the
complex multiply.
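A behavioural model of this radix-2 decimation-in-frequency butterfly is sketched below. The complex multiply is written as two two-term dot products, the operation that the fused DP unit provides; the code is only a numerical model with illustrative names, not the hardware design:

```python
def fused_dp(a, b, c, d, subtract=False):
    """Two-term dot product a*b +/- c*d.  In the fused hardware unit the
    two products are summed before a single rounding/normalization; this
    float model only approximates that behaviour."""
    return a * b - c * d if subtract else a * b + c * d

def radix2_dif_butterfly(x, y, w):
    """Radix-2 DIF butterfly on complex pairs (re, im):
       top output:    x + y
       bottom output: (x - y) * w   (twiddle-factor multiply)."""
    (xr, xi), (yr, yi), (wr, wi) = x, y, w

    # complex add and subtract: two real adders each in the discrete design
    sr, si = xr + yr, xi + yi
    dr, di = xr - yr, xi - yi

    # complex multiply via two dot products
    # (four real multipliers and two real adders in the discrete design)
    br = fused_dp(dr, wr, di, wi, subtract=True)   # dr*wr - di*wi
    bi = fused_dp(dr, wi, di, wr)                  # dr*wi + di*wr
    return (sr, si), (br, bi)

print(radix2_dif_butterfly((1.0, 2.0), (3.0, -1.0), (0.0, -1.0)))
# ((4.0, 1.0), (3.0, 2.0))
```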
Although there is a multiplicative factor "j" after the first stage, the first two stages consist of only a real-valued datapath. We need only combine the real and imaginary parts and send them as an input to the multiplier in the next stage. For this, we do not need a full complex butterfly stage. The factor is handled in the second butterfly stage using bypass logic that forwards the two samples as the real and imaginary parts to the input of the multiplier. The adder and subtractor in the butterfly remain inactive during that time. Scheduling Method 2: Another way of scheduling is proposed, which modifies the architecture slightly and also reduces the required number of delay elements. In this scheduling, the input samples are processed sequentially instead of processing the even and odd samples separately. This can be derived using the following folding sets: the nodes A0, ..., A7 represent the eight butterflies in the first stage of the FFT and B0, ..., B7 represent the butterflies in the second stage. Assume the butterflies have only one multiplier, at the bottom output rather than at both outputs.
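The folding sets themselves are given in a figure that is not reproduced here, so the snippet below is only an illustrative sketch of the idea: each stage is mapped onto one hardware butterfly, and the position of a node inside its folding set fixes the time partition (clock cycle modulo the folding factor) in which it executes. The specific orderings shown are hypothetical, not this paper's schedule:

```python
# Hypothetical folding sets for a two-stage example with eight butterflies
# per stage: one butterfly unit per stage, folding factor 8.
folding_sets = {
    "BF1": ["A0", "A1", "A2", "A3", "A4", "A5", "A6", "A7"],  # stage-1 butterflies
    "BF2": ["B0", "B1", "B2", "B3", "B4", "B5", "B6", "B7"],  # stage-2 butterflies
}

def time_partition(unit, node):
    """Cycle (mod folding factor) in which `node` runs on `unit`."""
    return folding_sets[unit].index(node)

print(time_partition("BF1", "A3"))   # A3 runs in time partition 3 on BF1
```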
3. HIGH THROUGHPUT FFT ARCHITECTURE: The proposed architecture consists of the following main parts, together with their specific novelties and advantages. (i) A memory unit composed of 16 dual-port memory banks, which facilitates 16-way parallel data access. (ii) A memory bank index and address generation unit (BAGU), which generates conflict-free and in-place memory bank indexes and addresses for the radix-16 FFT operation. (iii) Four commutator blocks, located in front of the input side and after the output side of the memory, which provide an efficient data routing mechanism governed by the BAGU signals. (iv) A scaling unit (SU), which coordinates controlled scaling for block floating-point (BFP) operation and achieves a higher signal-to-quantization-noise ratio (SQNR) than existing designs. (v) The kernel processing engine, a high-performance computing engine for radix-16 butterfly operations, containing four radix-16 PEs (PE_R16 0 through PE_R16 3), two sets of radix-2 PEs (each set contains four radix-2 PEs), and four sets of complex multipliers (each contains four complex multipliers) for twiddle-factor multiplications. Those multipliers are optimized with the help of a common-subexpression sharing technique and a new
twiddle-factor multiplication scheme. All the function units inside the kernel processing engine are detailed below. To avoid possible conflicts when simultaneously reading (or writing) 16 data words from (or to) the memory banks during FFT operations, a proper memory addressing scheme is necessary. The well-known conflict-free memory addressing schemes [5], [7] are applicable only to the radix-2 FFT algorithm. Although the addressing scheme in [6] covers general radix- FFT operations, its FFT size must be a power-of- number. Besides, those schemes are limited to single-PE architectures. On the other hand, the radix-2 addressing scheme for multiple PEs [16] is relatively inefficient compared with higher-radix schemes. The proposed scheme has three special features. First, it ensures conflict-free butterfly executions during the entire FFT operation. Second, it supports parallel data outputs in normal order, which is desirable for providing immediate, normal-order FFT outputs to succeeding functional blocks such as a channel estimator. Third, like many other designs, the in-place FFT computation strategy is adopted to keep the memory overhead low.
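The exact BAGU mapping is specific to this design. As a generic illustration of how a conflict-free bank assignment can work, the sketch below uses the well-known digit-sum mapping over 16 banks; this is an assumption for illustration only, not necessarily the scheme implemented here:

```python
NUM_BANKS = 16   # matches the 16 dual-port memory banks described above

def bank_and_address(index, n_digits):
    """Digit-sum mapping: indices that differ in exactly one radix-16
    digit land in different banks, so the 16 operands of a radix-16
    butterfly can be read or written in parallel without conflicts."""
    digits = [(index >> (4 * i)) & 0xF for i in range(n_digits)]
    bank = sum(digits) % NUM_BANKS      # bank index
    address = index // NUM_BANKS        # word address inside the bank
    return bank, address

# the 16 inputs of the first radix-16 butterfly of a 256-point FFT
# (indices 0, 16, 32, ..., 240) map to 16 distinct banks:
print(sorted(bank_and_address(16 * k, 2)[0] for k in range(16)))
```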
4. REORDERING OF THE OUTPUT SAMPLES: Reordering of the output samples is an inherent problem in FFT computation. In serial architectures the outputs are obtained in bit-reversed order [5]. In general, the problem is solved using a memory of size N: samples are stored in the memory in natural order, using a counter for the addresses, and are then read in bit-reversed order by reversing the bits of the counter. In embedded DSP systems, special memory addressing schemes have been developed to solve this problem, but in real-time systems this leads to an increase in latency and area. The order of the output samples in the proposed architectures is not the bit-reversed order; the output order changes for different architectures because of the different folding sets/scheduling schemes.
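For comparison, the conventional counter-based reordering mentioned above (write in natural order, read with the counter bits reversed) can be modelled as follows; the function names are illustrative:

```python
def bit_reverse(k, n_bits):
    """Reverse the n_bits-bit binary representation of k."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r

def reorder(samples):
    """Memory of size N: write with a natural-order counter, then read
    back with the counter bits reversed (N must be a power of two)."""
    N = len(samples)
    n_bits = N.bit_length() - 1
    memory = list(samples)                                     # write phase
    return [memory[bit_reverse(k, n_bits)] for k in range(N)]  # read phase

print(reorder([0, 1, 2, 3, 4, 5, 6, 7]))   # [0, 4, 2, 6, 1, 5, 3, 7]
```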
5. BOOTH ENCODER FOR MULTIPLICATION:
We use the sign-extension circuitry developed in [2] and [3]. The conventional MBE partial-product array has two drawbacks: 1) an additional partial-product term at the (n-2)th bit position; and 2) poor performance in the LSB part compared with a non-Booth design when the TDM algorithm is used. To remedy these two drawbacks, the LSB part of the partial-product array is modified: the Row_LSB term (gray circle) and the Neg_cin term are combined and further simplified using Boolean minimization. All of this is implemented efficiently using the modified Booth algorithm. The figure below shows the architecture of the commonly used modified Booth multiplier. The inputs of the multiplier are the multiplicand X and the multiplier Y. The Booth encoder encodes input Y and derives the encoded signals, as shown in the figure. The Booth decoder generates the partial products from the encoded signals and the other input X according to the logic diagram. The carry-save tree reduces the generated partial products to two rows, and these last two rows are then added to produce the final multiplication result.
Fig: Modified Booth Encoder
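For reference, the standard radix-4 modified Booth encoding that such a multiplier is built around can be modelled as below. The LSB-part simplification (Row_LSB/Neg_cin merging) and the sign-extension circuitry of [2], [3] are not modelled, and the function names are illustrative:

```python
def booth_encode(Y, n_bits):
    """Radix-4 modified Booth encoding: each overlapping 3-bit group
    (y[2i+1], y[2i], y[2i-1]) of the multiplier selects a digit in
    {-2, -1, 0, +1, +2}.  Y is taken as an n_bits two's-complement value."""
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    y = Y & ((1 << n_bits) - 1)
    digits, prev = [], 0                  # implicit y[-1] = 0
    for i in range(0, n_bits, 2):
        group = (((y >> i) & 0b11) << 1) | prev
        digits.append(table[group])
        prev = (y >> (i + 1)) & 1
    return digits                         # least-significant digit first

def booth_multiply(X, Y, n_bits=8):
    """Sum the partial products X * d_i * 4^i selected by the digits."""
    return sum(d * X * (4 ** i) for i, d in enumerate(booth_encode(Y, n_bits)))

print(booth_multiply(13, 11))             # 143
```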
6. RESULT:
7. CONCLUSION: This paper describes the design of two new fused floating-point arithmetic units and their application to the implementation of FFT butterfly operations. Although the fused add-subtract unit is specific to FFT applications, the fused dot product is applicable to a wide variety of signal processing applications. Both the fused dot-product unit and the fused add-subtract unit are smaller than parallel implementations constructed with discrete floating-point adders and multipliers. The fused dot product is faster than the conventional implementation, since rounding and normalization are not required as part of each multiplication. Due to longer interconnections, the fused add-subtract unit is slightly slower than the discrete implementation. An efficient and more flexible FFT architecture is designed, and experimental results are obtained with XILINX.
REFERENCES:
[1] J. W. Cooley and J. Tukey, “An algorithm for
machine calculation of complex fourier series,”
Math. Comput., vol. 19, pp. 297–301, Apr. 1965.
[2] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1998.
[3] P. Duhamel, “Implementation of split-radix FFT
algorithms for complex, real, and real-symmetric
data,” IEEE Trans. Acoust., Speech, Signal Process.,
vol. 34, no. 2, pp. 285–295, Apr. 1986.
[4] S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in Proc. of IPPS, 1996, pp. 766–770.
[5] L. R. Rabiner and B. Gold, Theory and
Application of Digital Signal Processing. Englewood
Cliffs, NJ: Prentice-Hall, 1975.
[6] E. H. Wold and A. M. Despain, “Pipeline and
parallel-pipeline FFT processors for VLSI
implementation,” IEEE Trans. Comput.,vol. C-33,
no. 5, pp. 414–426, May 1984.
[7] A. M. Despain, “Fourier transform using CORDIC iterations,” IEEE Trans. Comput., vol. C-23, no. 10, pp. 993–1001, Oct. 1974.
[8] E. E. Swartzlander, W. K. W. Young, and S. J.
Joseph, “A radix-4 delay commutator for fast Fourier
transform processor implementation,” IEEE J. Solid-
State Circuits, vol. SC-19, no. 5, pp. 702–709, Oct.
1984.
[9] E. E. Swartzlander, V. K. Jain, and H. Hikawa,
“A radix-8 wafer scale FFT processor,” J. VLSI
Signal Process., vol. 4, no. 2/3, pp. 165–176, May
1992.
[10] G. Bi and E. V. Jones, “A pipelined FFT
processor for word-sequential data,” IEEE Trans.
Acoust., Speech, Signal Process., vol. 37, no. 12, pp.
1982–1985, Dec. 1989.
[11] Y. W. Lin, H. Y. Liu, and C. Y. Lee, “A 1-GS/s
FFT/IFFT processor for UWB applications,” IEEE J.
Solid-State Circuits, vol. 40, no. 8, pp. 1726–1735,
Aug. 2005.
[12] J. Lee, H. Lee, S. I. Cho, and S. S. Choi, “A
High-Speed two parallel radix- FFT/IFFT processor
for MB-OFDM UWB systems,” in Proc. IEEE Int.
Symp. Circuits Syst., 2006, pp. 4719–4722.
[13] J. Palmer and B. Nelson, “A parallel FFT
architecture for FPGAs,” Lecture Notes Comput. Sci.,
vol. 3203, pp. 948–953, 2004.
[14] M. Shin and H. Lee, “A high-speed four parallel
radix- FFT/IFFT processor for UWB applications,” in
Proc. IEEE ISCAS, 2008, pp. 960–963.
[15] M. Garrido, “Efficient hardware architectures
for the computation of the FFT and other related
signal processing algorithms in real time,”Ph.D.
dissertation, Dept. Signal, Syst., Radio commun.,
Univ. Politecnica Madrid, Madrid, Spain, 2009.