IOSR Journal of Engineering (IOSRJEN) www.iosrjen.org
ISSN (e): 2250-3021, ISSN (p): 2278-8719
Vol. 05, Issue 04 (April. 2015), ||V1|| PP 01-05
International organization of Scientific Research 1 | P a g e
Design And Implementation Of Modified Booth Recoder Using
Fused Add Multiply Operator
S. Vijaya, Mr. K. Santhakumar,
ME (VLSI Design), PG Scholar, Nandha Engineering College, Perundurai, Erode
Associate Professor of ECE Department, Nandha Engineering College, Perundurai, Erode.
Abstract: - Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications.
This paper presents and implements an efficient design of a modified Booth multiplier. Low-cost finite
impulse response (FIR) filter designs are presented using the concept of faithfully rounded truncated multipliers.
This work focuses on optimizing the design of the fused Add-Multiply (FAM) operator to increase
performance.
We introduce a structured and efficient recoding technique and explore three different schemes by
incorporating them in FAM designs. Compared with FAM designs that use existing recoding
schemes, the proposed technique yields considerable reductions in critical delay, hardware complexity
and power consumption of the FAM unit.
Keywords: - Add-multiply operation, Modified Booth recoding, FIR filter design.
I. INTRODUCTION
Modern consumer electronics make extensive use of Digital Signal Processing (DSP), providing custom
accelerators for domains such as multimedia and communications. Recent research in the field of
arithmetic optimization [1], [2] has shown that designing arithmetic components that combine operations
sharing data can lead to significant performance improvements. Based on the observation that an addition
can often be subsequent to a multiplication (e.g., in symmetric FIR filters), the Multiply-Accumulator (MAC) and
Multiply-Add (MAD) units were introduced [3], leading to more efficient implementations of DSP algorithms
than conventional ones, which use only primitive resources [4]. In [12], the author proposes a
two-stage recoder which converts a number in carry-save form to its Modified Booth (MB) representation.
Although the direct recoding of the sum of two numbers in its MB form leads to a more efficient
implementation of the fused Add-Multiply (FAM) unit compared to the conventional one, existing recoding
schemes are based on complex bit-level manipulations, which are implemented by dedicated gate-level
circuits. In this work, we propose a new recoding technique, the Sum to Modified Booth (S-MB) algorithm,
which decreases the critical path delay and reduces area and power consumption. The proposed S-MB
algorithm is structured, simple and can be easily modified to apply to either signed (in 2's complement
representation) or unsigned numbers with an odd or even number of bits. We explore three alternative
schemes of the proposed S-MB approach using conventional and signed-bit Full Adders (FAs) and Half
Adders (HAs) as building blocks. The proposed recoding technique delivers optimized solutions for the FAM
design, enabling the targeted operator to be timing functional (no timing violations) for a larger range of
frequencies. Also, under the same timing constraints, the proposed designs deliver improvements in both
area occupation and power consumption, thus outperforming the existing S-MB recoding solutions.
An important design issue of FIR filter implementation is the optimization of the bit widths for filter
coefficients, which has direct impact on the area cost of arithmetic units and registers. Moreover, since the bit
widths after multiplications grow, many DSP applications do not need full-precision outputs. Instead, it is
desirable to generate faithfully rounded outputs where the total error introduced in quantization and rounding is
no more than one unit of the last place (ulp) defined as the weighting of the least significant bit (LSB) of the
outputs. In this brief, we present low-cost implementations of FIR filters based on the direct structure in Fig.
1(a) with faithfully rounded truncated multipliers. The multiple constant multiplication/accumulation (MCMA)
module is realized by accumulating all the partial products (PPs), where unnecessary PP bits (PPBs) are
removed without affecting the final precision of the outputs. The bit widths of all the filter coefficients are
minimized using nonuniform quantization with unequal word lengths in order to reduce the hardware cost
while still satisfying the specification of the frequency response.
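The idea of removing PPBs while keeping a faithfully rounded output can be illustrated with a small Python sketch. It is not the truncation scheme of this brief: the function name `truncated_multiply`, the number of guard columns and the half-ulp bias constant are all assumptions chosen for illustration. The sketch multiplies two unsigned n-bit operands, never forms the PPBs in the lowest columns, and returns the product rounded to a multiple of one ulp (here 2^n).

```python
def truncated_multiply(a: int, b: int, n: int = 8, guard: int = 3) -> int:
    """Unsigned n x n multiply that keeps only the top n bits of the product.

    PPBs in columns below (n - guard) are never formed. A constant bias
    (an assumed compensation value of half an ulp) is added before the final
    truncation. The result is returned as a multiple of 1 ulp = 2**n.
    """
    cut = n - guard                      # lowest PPB column that is kept
    keep_mask = ~((1 << cut) - 1)        # clears every column below `cut`
    acc = 0
    for i in range(n):
        if (a >> i) & 1:                 # row i of the PP array is b << i
            acc += (b << i) & keep_mask  # drop the PPBs of the deleted columns
    bias = 1 << (n - 1)                  # assumed bias: half an ulp
    return ((acc + bias) >> n) << n      # truncate to a multiple of one ulp


if __name__ == "__main__":
    n = 8
    worst = max(abs(truncated_multiply(a, b, n) - a * b)
                for a in range(1 << n) for b in range(1 << n))
    print("worst-case error:", worst, "ulp:", 1 << n)
```

For the 8-bit, 3-guard-column setting used here, the exhaustive loop in the usage part confirms that the error stays below one ulp. Other word lengths or guard settings would need to be re-checked in the same way.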
II. MOTIVATION AND FUSED AM IMPLEMENTATION
A. Motivation
In this paper, we focus on AM units which implement the operation Z = X·(A + B). The conventional design
of the AM operator (Fig. 1(a)) requires that its inputs A and B are first driven to an adder and then the input X
and the sum Y = A + B are driven to a multiplier in order to get Z. The drawback of using an adder is that it inserts
a significant delay in the critical path of the AM. As there are carry signals to be propagated inside the adder, the
critical path depends on the bit-width of the inputs. In order to decrease this delay, a Carry-Look-Ahead (CLA)
adder can be used which, however, increases the area occupation and power dissipation. The fused design
instead recodes the sum Y = A + B directly into its MB representation, so that no carry-propagate adder
precedes the multiplier. As a result, significant area savings are observed and the critical path delay of the
recoding process is reduced and decoupled from the bit-width of its inputs. In this work, we present a new
technique for direct recoding of two numbers in the MB representation of their sum.
B. Review of the Modified Booth Form
Modified Booth (MB) is a prevalent form used in multiplication [15], [20], [24]. It is a redundant
signed-digit radix-4 encoding technique. Its main advantage is that it halves the number of partial
products in multiplication compared to any radix-2 representation.
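To make the radix-4 encoding concrete, the following Python sketch computes the MB digits of a two's complement number using the standard relation digit_j = -2·y(2j+1) + y(2j) + y(2j-1), with y(-1) = 0. The function names `mb_digits` and `am_via_mb` are assumptions for illustration. Note that `am_via_mb` forms the sum Y = A + B first and then recodes it, so it models the conventional data path numerically and not the direct S-MB recoding proposed in this paper.

```python
def mb_digits(y: int, n: int) -> list:
    """Radix-4 Modified Booth digits of an n-bit two's complement integer (n even).

    Digit j equals -2*y[2j+1] + y[2j] + y[2j-1], with y[-1] = 0, so every
    digit lies in {-2, -1, 0, 1, 2}. Least significant digit first.
    """
    assert n % 2 == 0 and -(1 << (n - 1)) <= y < (1 << (n - 1))
    # Python's >> sign-extends, so this yields the two's complement bits of y.
    bits = [(y >> k) & 1 for k in range(n)]
    digits = []
    prev = 0                                   # y[-1]
    for j in range(n // 2):
        digits.append(-2 * bits[2 * j + 1] + bits[2 * j] + prev)
        prev = bits[2 * j + 1]
    return digits


def am_via_mb(x: int, a: int, b: int, n: int) -> int:
    """Z = X*(A+B) accumulated from the n/2 MB partial products of Y = A+B."""
    y = a + b                                  # conventional path: add first
    return sum((d * x) << (2 * j) for j, d in enumerate(mb_digits(y, n)))


if __name__ == "__main__":
    print(mb_digits(-4, 8))          # four digits instead of eight bits
    print(am_via_mb(13, 5, -9, 8))   # equals 13 * (5 - 9)
```

An n-bit operand yields n/2 digits, and each digit selects one partial product from {0, ±X, ±2X}, which is where the halving of the partial product count comes from.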
Fig. 1. FAM operator based on (a) the conventional design and (b) the implementation with a truncated multiplier.
The multiplier is a basic parallel multiplier based on the MB algorithm. The terms CT, CSA Tree and CLA
Adder refer to the Correction Term, the Carry-Save Adder Tree and the final Carry-Look-Ahead Adder
of the multiplier.
Three bits are involved in forming each MB digit. The most significant of them is negatively weighted
while the two least significant of them have positive weight. Consequently, in order to transform the pairs
of consecutive bits of the inputs A and B into MB form, we need to use signed-bit arithmetic. For this
purpose, we develop a set of bit-level signed Half Adders (HAs) and Full Adders (FAs) considering their
inputs and outputs to be signed.
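As a small illustration of signed-bit arithmetic, the Python sketch below models one possible signed half adder in which the input q carries negative weight. The cell, its name `signed_half_adder` and its sign convention are assumptions made for this example and are not necessarily the cells used in the proposed S-MB schemes. Its outputs satisfy p - q = -2·c + s, so the carry c is negatively weighted and the sum s is positively weighted.

```python
def signed_half_adder(p: int, q: int):
    """Half adder with p weighted +1 and q weighted -1 (assumed convention).

    Returns (c, s) such that p - q == -2*c + s, i.e. the carry output is
    negatively weighted and the sum output is positively weighted.
    """
    s = p ^ q              # sum bit: same XOR as a conventional half adder
    c = (1 - p) & q        # borrow-like carry: set only when p = 0 and q = 1
    return c, s


if __name__ == "__main__":
    for p in (0, 1):
        for q in (0, 1):
            c, s = signed_half_adder(p, q)
            print(p, q, "->", c, s, "value:", -2 * c + s)
```

The only change with respect to a conventional half adder is the carry logic, which suggests why such cells can be built at a cost comparable to ordinary HAs.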
III. FIR FILTER IMPLEMENTATION
The finite impulse response (FIR) digital filter is one of the fundamental components in many digital
signal processing (DSP) and communication systems. It is also widely used in many portable applications with
limited area and power budgets.
A general FIR filter of order M can be expressed as
$$y[n] = \sum_{i=0}^{M-1} a_i \, x[n-i].$$
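The expression above can be evaluated directly, as in the following Python sketch. The function name `fir_direct` and the zero initial condition (x[k] = 0 for k < 0) are assumptions made for illustration.

```python
def fir_direct(a: list, x: list) -> list:
    """Direct evaluation of y[n] = sum_{i=0}^{M-1} a[i] * x[n-i].

    Samples before the start of the input are assumed to be zero.
    """
    M = len(a)
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(M):
            if n - i >= 0:               # skip taps that reach before x[0]
                acc += a[i] * x[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    # A 3-tap symmetric (linear-phase) example with made-up coefficients.
    print(fir_direct([1, 2, 1], [1, 0, 0, 2]))
```

Feeding a unit impulse returns the coefficients themselves, which is a convenient sanity check for any implementation of the filter.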
There are two basic FIR structures, direct form and transposed form, as shown in Fig. 1 for a
linear-phase even-order FIR filter. In the direct form in Fig. 1(a), the multiple constant multiplication
(MCM)/accumulation (MCMA) module performs the concurrent multiplications of individual delayed signals
and respective filter coefficients, followed by accumulation of all the products. Thus, the operands of the
multipliers in MCMA are the delayed input signals x[n − i] and the coefficients a_i.
In the transposed form in Fig. 1(b), the operands of the multipliers in the MCM module are the current
input signal x[n] and the coefficients. The results of individual constant multiplications go through structure adders
(SAs) and delay elements. Over the past decades, many papers have addressed the design and implementation of
low-cost or high-speed FIR filters [1]–[13], [15]–[19]. In order to avoid costly multipliers, most prior hardware
implementations of digital FIR filters fall into two categories: multiplierless and memory
based.
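The transposed form described above can be sketched behaviorally in Python as follows. The function name `fir_transposed` and the zero initial register contents are assumptions. Each input sample is multiplied by every coefficient at once (the MCM step), and the products then pass through the structure adders and delay elements.

```python
def fir_transposed(a: list, x: list) -> list:
    """Behavioral model of a transposed-form FIR filter.

    state[k] models the delay element that feeds structure adder k.
    All registers are assumed to start at zero.
    """
    M = len(a)
    state = [0] * (M - 1)
    y = []
    for sample in x:
        products = [c * sample for c in a]        # MCM: x[n] times every coefficient
        y.append(products[0] + (state[0] if M > 1 else 0))
        # Update registers in increasing order so that state[k + 1] is still
        # the value stored during the previous sample when it is read.
        for k in range(M - 2):
            state[k] = products[k + 1] + state[k + 1]
        if M > 1:
            state[M - 2] = products[M - 1]
    return y


if __name__ == "__main__":
    print(fir_transposed([1, 2, 1], [1, 0, 0, 2]))
```

Both structures compute the same output sequence. They differ in where the delays sit, and therefore in which operands the constant multipliers see.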
Fig. 2. Stages of digital FIR filter design and implementation.
An important design issue of FIR filter implementation is the optimization of the bit widths for filter
coefficients, which has direct impact on the area cost of arithmetic units and registers. Moreover, since the bit
widths after multiplications grow, many DSP applications do not need full-precision outputs. Instead, it is
desirable to generate faithfully rounded outputs where the total error introduced in quantization and rounding is
no more than one unit of the last place (ulp) defined as the weighting of the least significant bit (LSB) of the
outputs.
IV. COEFFICIENT QUANTIZATION AND OPTIMIZATION
A generic flow of FIR filter design and implementation can be divided into three stages: finding filter
order and coefficients, coefficient quantization, and hardware optimization, as shown in Fig. 2. In the first stage,
the filter order and the corresponding coefficients of infinite precision are determined to satisfy the specification
of the frequency response. Then, the coefficients are quantized to finite bit accuracy. Finally, various
optimization approaches such as common subexpression elimination (CSE) are used to minimize the area
cost of hardware implementations. Most prior FIR filter implementations focus on the hardware optimization stage.
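The second stage, coefficient quantization, can be illustrated with a minimal Python sketch. It uses plain uniform rounding to a fixed number of fractional bits, which is an assumption made for simplicity and not the nonuniform scheme used in this brief. The function names and the sample coefficients are hypothetical.

```python
def quantize(coeffs: list, bits: int) -> list:
    """Round each real coefficient to an integer with `bits` fractional bits."""
    scale = 1 << bits
    return [round(c * scale) for c in coeffs]


def max_quantization_error(coeffs: list, bits: int) -> float:
    """Largest absolute difference between a coefficient and its quantized value."""
    scale = 1 << bits
    return max(abs(c - q / scale) for c, q in zip(coeffs, quantize(coeffs, bits)))


if __name__ == "__main__":
    coeffs = [0.0312, -0.1187, 0.2871, 0.6, 0.2871, -0.1187, 0.0312]  # hypothetical
    for bits in (6, 8, 10):
        print(bits, quantize(coeffs, bits), max_quantization_error(coeffs, bits))
```

With rounding, the error per coefficient is at most half of the quantization step, 2^-(bits+1). A design flow would then check whether the quantized coefficients still meet the frequency response specification.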
After FIR filter operations, the output signals have a larger bit width due to bit width expansion after
multiplications. In many practical situations, only some of the bits of the full-precision outputs are needed. For
example, assuming that the input signals of the FIR filter have 12 bits and the filter coefficients are quantized to
10 bits, the bit width of the resultant FIR filter output signals is at least 22 bits, but we might need only the 12
most significant bits for subsequent processing.
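The bit-width arithmetic in this example can be checked with a short Python sketch. The helper name `keep_msbs` is an assumption, and plain truncation (dropping the low bits) is used here only to show the word-length bookkeeping. It is not the faithful rounding scheme discussed earlier.

```python
IN_BITS, COEF_BITS, OUT_BITS = 12, 10, 12
FULL_BITS = IN_BITS + COEF_BITS          # 22-bit two's complement product


def keep_msbs(value: int, full_bits: int = FULL_BITS, out_bits: int = OUT_BITS) -> int:
    """Keep the out_bits most significant bits of a full_bits-wide value.

    An arithmetic right shift drops the low bits, i.e. rounds toward -infinity.
    """
    return value >> (full_bits - out_bits)


if __name__ == "__main__":
    x_min, x_max = -(1 << (IN_BITS - 1)), (1 << (IN_BITS - 1)) - 1
    c_min, c_max = -(1 << (COEF_BITS - 1)), (1 << (COEF_BITS - 1)) - 1
    extremes = [x * c for x in (x_min, x_max) for c in (c_min, c_max)]
    print("product range:", min(extremes), max(extremes))
    print("kept MSBs:", [keep_msbs(p) for p in extremes])
```

A single product already needs the full 22 bits. Accumulating several taps adds further growth, which is why the text says at least 22 bits.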
Our proposed FIR filter design has four versions. MCMA is the baseline implementation using
combined PP compression [similar to Fig. 4(b)] with uniformly quantized coefficients. MCMA_opt is an
improved version that adopts the nonuniform quantization in Fig. 3 for coefficient optimization. MCMAT_I
and MCMAT_II faithfully truncate PPBs using the approaches in Fig. 6(a) and (b), respectively.
Although the area costs of the proposed designs are significantly reduced, the critical path delay is
increased because all the operations in the MCMA are executed within one clock cycle. It is possible to reduce
the delay by adding pipeline registers in the PP compression as suggested in [17], where the major goal is to
minimize the number of FAs, HAs, and registers (including algorithmic registers and pipelined registers) using
integer linear programming. In this brief, we focus on low-cost FIR filter designs with moderate speed
performance for mobile applications where area and power are important design considerations.
V. EXPERIMENTAL RESULTS AND COMPARISONS
In multiplierless designs with transposed structure, CSE can effectively reduce the number of adders in
MCM compared with canonical signed digit (CSD) recoding. Nonrecursive signed CSE (NRSCSE) [1] and multiroot
binary partition graph (MBPG) [2] belong to the category of CSE methods. Note that SAs are not optimized. In [11],
the constant multiplication is realized by storing the odd multiples of the constant in a LUT implemented with
dual-port segmented memory sharing memory cells. This approach needs full-custom design of the LUT circuits.
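For reference, the CSD recoding used as the baseline in this comparison can be sketched in Python as follows. The function name `csd` is an assumption, and the sketch covers only the digit recoding of a single positive constant. It does not implement CSE, NRSCSE or MBPG.

```python
def csd(c: int) -> list:
    """Canonical signed digit form of a positive integer, least significant first.

    Digits are in {-1, 0, 1} and no two adjacent digits are nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)      # +1 if c % 4 == 1, -1 if c % 4 == 3
            c -= d               # after this step c is divisible by 4
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


if __name__ == "__main__":
    for const in (7, 91):
        d = csd(const)
        nonzero = sum(1 for v in d if v)
        print(const, d, "binary ones:", bin(const).count("1"), "CSD nonzeros:", nonzero)
```

In a shift-and-add multiplier, the number of adders or subtractors for one constant is the number of nonzero digits minus one. For example, 7 needs two adders in binary (4 + 2 + 1) but only one subtractor in CSD (8 − 1).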
Fig. 4. Result analysis of FIR filter implementation of FAM techniques.
A. Results Evaluation
Table 1. Comparison of outputs for the existing and proposed systems.
Most prior FIR filter designs are based on the transposed structure because their major goal is to
minimize the cost of adders in MCM, which takes less than 20% of the total area. Indeed, the MCM cost in
transposed-form NRSCSE and MBPG (and with further coefficient optimization [18], [19]) can be effectively
reduced.
Fig. 5. (a) Area, (b) delay, and (c) power of the proposed designs for filter C.
VI. CONCLUSION
This brief has presented low-cost FIR filter designs by jointly considering the optimization of
coefficient bit width and hardware resources in implementations. Although most prior designs are based on the
transposed form, we observe that the direct FIR structure with faithfully rounded MCMAT leads to the smallest
area cost and power consumption.
REFERENCES
[1] M. M. Peiro, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a
nonrecursive signed common subexpression algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process., vol. 49, no. 3, pp. 196–203, Mar. 2002.
[2] C.-H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR
filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.
[3] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution - A new approach to versatile subexpressions
sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp.
559–571, Mar. 2008.
[4] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression
elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695–
700, Oct. 2005.
[5] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed
digit representations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12, pp.
1525–1529, Dec. 2002.
[6] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression-
elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers,
vol. 51, no. 11, pp. 2215–2221, Sep. 2004.
[7] O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.
[8] Y. Voronenko and M. Puschel, "Multiplierless multiple constant multiplication," ACM Trans.
Algorithms, vol. 3, no. 2, pp. 1–38, May 2007.
[9] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum
number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.
[10] R. Huang, C.-H. H. Chang, M. Faust, N. Lotze, and Y. Manoli, "Sign-extension avoidance and word-
length optimization by positive-offset representation for FIR filter design," IEEE Trans. Circuits Syst. II,
Exp. Briefs, vol. 58, no. 12, pp. 916–920, Oct. 2011.
[11] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter,"
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.
[12] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible
systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017,
Jul. 2008.
[13] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR
filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463–466, May 2004.

A05410105

  • 1. IOSR Journal of Engineering (IOSRJEN) www.iosrjen.org ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 05, Issue 04 (April. 2015), ||V1|| PP 01-05 International organization of Scientific Research 1 | P a g e Design And Implementation Of Modified Booth Recoder Using Fused Add Multiply Operator S.Vijaya, Mr.k.Santhakumar, ME(VLSI Design),PG Scholder, Nandha Enggineering college, perundurai, erode Associate professor of ECE department,Nandha Enggineering college, perundurai, erode. Abstract: - Complex arithmetic operations are widely used in Digital Signal Processing (DSP) applications. This paper presents an efficient design of modified booth multiplier and then also implements it. Low-cost finite impulse response (FIR) designs are presented using the concept of faithfully rounded truncated multipliers. In this work, focus on optimizing the design of the fused Add-Multiply (FAM) operator for increasing performance. In this project introduce a structured and efficient recoding technique and explore three different schemes by incorporating them in FAM designs. Comparing them with the FAM designs which use existing recoding schemes, the proposed technique yields considerable reductions in terms of critical delay, hardware complexity and power consumption of the FAM unit. Keywords: - Add multiply operation, Modified Booth Recoding, FIR filter design. I. INTRODUCTION Modern consumer electronics make extensive use of Digital Signal Processing (DSP) providing custom accelerators for the domains of multimedia, communications etc. Recent research activities in the field of arithmetic optimization [1], [2] have shown that the design of arithmetic components combining operations which share data, can lead to significant performance improvements. 
Based on the observation that an addition can often be subsequent to a multiplication symmetric FIR filters), the Multiply-Accumulator (MAC) and Multiply-Add (MAD) units were introduced [3] leading to more efficient implementations of DSP algorithms compared to the conventional ones, which use only primitive resources [4]. In [12], the author proposes a two- stage recoder which converts a number in carry-save form to its MB representation. Although the direct recoding of the sum of two numbers in its MB form leads to a more efficient implementation of the fused Add-Multiply (FAM) unit compared to the conventional one, existing recoding schemes are based on complex manipulations in bit-level, which are implemented by dedicated circuits in gate- level. More specifically, propose a new recoding technique which decreases the critical path delay and reduces area and power consumption. The proposed S-MB algorithm is structured, simple and can be easily modified in order to be applied either in signed (in 2’s complement representation) or unsigned numbers, which comprise of odd or even number of bits. We explore three alternative schemes of the proposed S-MB approach using conventional and signed-bit Full Adders (FAs) and Half Adders (HAs) as building blocks. The proposed recoding technique delivers optimized solutions for the FAM design enabling the targeted operator to be timing functional (no timing violations) for a larger range of frequencies. Also, under the same timing constraints, the proposed designs deliver improvements in both area occupation and power consumption, thus outperforming the existing S-MB recoding solutions. An important design issue of FIR filter implementation is the optimization of the bit widths for filter coefficients, which has direct impact on the area cost of arithmetic units and registers. Moreover, since the bit widths after multiplications grow, many DSP applications do not need full-precision outputs. 
Instead, it is desirable to generate faithfully rounded outputs where the total error introduced in quantization and rounding is no more than one unit of the last place (ulp) defined as the weighting of the least significant bit (LSB) of the outputs. In this brief, we present low-cost implementations of FIR filters based on the direct structure in Fig. 1(a) with faithfully rounded truncated multipliers. The MCMA module is realized by accumulating all the partial products (PPs) where unnecessary PP bits (PPBs) are removed without affecting the final precision of the outputs. The bit widths of all the filter coefficients are minimized using non uniform quantization with unequal word lengths in order to reduce the hardware cost while still satisfying the specification of the frequency response.
  • 2. Design And Implementation of Modified Booth Recoder Using Fused Add Multiply Operator International organization of Scientific Research 2 | P a g e II. MOTIVATION AND FUSED AM IMPLEMENTATION A. Motivation In this paper, focus on AM units which implement the operation Z=X. (A+B). The conventional design of the AM operator (Fig. 1(a)) requires that its inputs A and B are first driven to an adder and then the input X and the sum Y=A+B are driven to a multiplier in order to get Z. The drawback of using an adder is that it inserts a significant delay in the critical path of the AM. As there are carry signals to be propagated inside the adder, the critical path depends on the bit-width of the inputs. In order to decrease this delay, a Carry-Look-Ahead (CLA) adder can be used which, however, increases the area occupation and power dissipation. As a result, significant area savings are observed and the critical path delay of the recoding process is reduced and decoupled from the bit-width of its inputs. In this work, we present a new technique for direct recoding of two numbers in the MB representation of their sum. B. Review of the Modified Booth Form Modified Booth (MB) is a prevalent form used in multiplication [15], [20], [24]. It is a redundant signed-digit radix-4 en-coding technique. Its main advantage is that it reduces by half the number of partial products in multiplication comparing to any other radix-2 representation. Fig. 1.FAM operator based on the (a) conventional design and (b) Implementation with truncated multiplier. The multiplier is a basic parallel multiplier based on the MB algorithm. The terms CT, CSA Tree and CLA Adder are referred to the Correction Term, the Carry-Save Adder Tree and the final Carry-Look-Ahead Adder of the multiplier. The most significant of them is negatively weighted while the two least significant of them have positive weight. 
Consequently, in order to transform the two aforementioned pairs of bits into MB form, we need to use signed-bit arithmetic. For this purpose, we develop a set of bit-level signed Half Adders (HA) and Full Adders (FA), considering their inputs and outputs to be signed.
III. FIR FILTER IMPLEMENTATION
The finite impulse response (FIR) digital filter is one of the fundamental components in many digital signal processing (DSP) and communication systems. It is also widely used in many portable applications with limited area and power budgets. A general FIR filter of order M can be expressed as
$y[n] = \sum_{i=0}^{M-1} a_i x[n-i]$.
There are two basic FIR structures, direct form and transposed form, as shown in Fig. 1 for a linear-phase even-order FIR filter. In the direct form in Fig. 1(a), the multiple constant multiplication (MCM)/accumulation (MCMA) module performs the concurrent multiplications of individual delayed signals and the respective filter coefficients, followed by accumulation of all the products. Thus, the operands of the multipliers in the MCMA are the delayed input signals x[n − i] and the coefficients a_i. In the transposed form in Fig. 1(b), the operands of the multipliers in the MCM module are the current input signal x[n] and the coefficients. The results of the individual constant multiplications go through structure adders (SAs) and delay elements.
In past decades, many papers have addressed the design and implementation of low-cost or high-speed FIR filters [1]–[13], [15]–[19]. In order to avoid costly multipliers, most prior hardware implementations of digital FIR filters can be divided into two categories: multiplierless-based and memory-based.
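The two structures described in this section can be compared with a short Python sketch. It evaluates the FIR sum in direct form, where delayed inputs meet their coefficients and the products are accumulated, and in transposed form, where the current input is multiplied by every coefficient and the products pass through structure adders and delay elements. The function names and the integer test values are assumptions for illustration, and samples before n = 0 are taken to be zero.

```python
def fir_direct(x, a):
    """Direct form: y[n] is the accumulated sum of a[i] * x[n - i]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(a):
            if n - i >= 0:             # inputs before n = 0 are treated as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def fir_transposed(x, a):
    """Transposed form: products of the current input feed a chain of
    structure adders separated by delay registers."""
    m = len(a)
    state = [0] * (m - 1)              # delay registers between structure adders
    y = []
    for sample in x:
        products = [coeff * sample for coeff in a]   # MCM on the current input
        y.append(products[0] + (state[0] if state else 0))
        for k in range(m - 1):
            # state[k + 1] still holds its previous-cycle value at this point
            carried = state[k + 1] if k + 1 < m - 1 else 0
            state[k] = products[k + 1] + carried
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]                # hypothetical input samples
    a = [1, 2, 3]                      # hypothetical coefficients
    print(fir_direct(x, a))
    print(fir_transposed(x, a))
```

Both functions produce the same output sequence; they differ only in where the delays sit, which is what determines whether the hardware cost is dominated by the MCMA module or by the MCM module plus the structure adders.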
Fig. 2. Stages of digital FIR filter design and implementation.
An important design issue of FIR filter implementation is the optimization of the bit widths of the filter coefficients, which has a direct impact on the area cost of arithmetic units and registers. Moreover, since the bit widths grow after multiplication, many DSP applications do not need full-precision outputs. Instead, it is desirable to generate faithfully rounded outputs, where the total error introduced by quantization and rounding is no more than one unit in the last place (ulp), defined as the weight of the least significant bit (LSB) of the outputs.
IV. COEFFICIENT QUANTIZATION AND OPTIMIZATION
A generic flow of FIR filter design and implementation can be divided into three stages: finding the filter order and coefficients, coefficient quantization, and hardware optimization, as shown in Fig. 2. In the first stage, the filter order and the corresponding coefficients of infinite precision are determined to satisfy the specification of the frequency response. Then, the coefficients are quantized to finite bit accuracy. Finally, various optimization approaches, such as CSE, are used to minimize the area cost of hardware implementations. Most prior FIR filter implementations focus on the hardware optimization stage.
After FIR filter operations, the output signals have a larger bit width due to bit-width expansion after multiplication. In many practical situations, only some of the bits of the full-precision outputs are needed. For example, assuming that the input signals of the FIR filter have 12 bits and the filter coefficients are quantized to 10 bits, the bit width of the resultant FIR filter output signals is at least 22 bits, but we might need only the 12 most significant bits for subsequent processing.
Our proposed FIR filter design has four versions.
MCMA is the baseline implementation using combined PP compression [similar to Fig. 4(b)] with uniformly quantized coefficients. MCMA_opt is an improved version that adopts the nonuniform quantization in Fig. 3 for coefficient optimization. MCMAT_I and MCMAT_II faithfully truncate PPBs using the approaches in Fig. 6(a) and (b), respectively. Although the area costs of the proposed designs are significantly reduced, the critical path delay is increased because all the operations in the MCMA are executed within one clock cycle. It is possible to reduce the delay by adding pipeline registers in the PP compression, as suggested in [17], where the major goal is to minimize the number of FAs, HAs, and registers (including algorithmic registers and pipeline registers) using integer linear programming. In this brief, we focus on low-cost FIR filter designs with moderate speed performance for mobile applications where area and power are important design considerations.
V. EXPERIMENTAL RESULTS AND COMPARISONS
In multiplierless designs with the transposed structure, CSE can effectively reduce the number of adders in the MCM compared with CSD recoding. Nonrecursive signed CSE (NRSCSE) [1] and the multiroot binary partition graph (MBPG) [2] belong to the category of CSE methods. Note that the SAs are not optimized. In [11], the constant multiplication is realized by storing the odd multiples of the constant in a LUT implemented with dual-port segmented memory sharing memory cells. This approach needs full-custom design of the LUT circuits.
Fig. 4. Result analysis of FIR filter implementation of FAM techniques.
A. Results Evaluation
Table 1. Comparison of outputs for the existing and proposed systems.
Most prior FIR filter designs are based on the transposed structure because the major goal is to minimize the cost of the adders in the MCM, which takes less than 20% of the total area. Indeed, the MCM cost in transposed-form NRSCSE and MBPG (and with further coefficient optimization [18], [19]) can be effectively reduced.
Fig. 5. (a) Area, (b) delay, and (c) power of the proposed designs for filter C.
VI. CONCLUSION
This brief has presented low-cost FIR filter designs by jointly considering the optimization of coefficient bit width and hardware resources in implementations. Although most prior designs are based on the transposed form, we observe that the direct FIR structure with faithfully rounded MCMAT leads to the smallest area cost and power consumption.
REFERENCES
[1] M. M. Peiro, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 3, pp. 196–203, Mar. 2002.
[2] C.-H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.
[3] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution: A new approach to versatile subexpressions sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp. 559–571, Mar. 2008.
[4] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695–700, Oct. 2005.
[5] I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed digit representations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12, pp. 1525–1529, Dec. 2002.
[6] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215–2221, Sep. 2004.
[7] O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.
[8] Y. Voronenko and M. Puschel, "Multiplierless multiple constant multiplication," ACM Trans. Algorithms, vol. 3, no. 2, pp. 1–38, May 2007.
[9] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.
[10] R. Huang, C.-H. H. Chang, M. Faust, N. Lotze, and Y. Manoli, "Sign-extension avoidance and word-length optimization by positive-offset representation for FIR filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 12, pp. 916–920, Oct. 2011.
[11] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.
[12] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, Jul. 2008.
[13] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463–466, May 2004.