VLSI IMPLEMENTATION OF OFDM
Orthogonal Frequency Division Multiplexing or OFDM is a modulation
format that is being used for many of the latest wireless and
telecommunications standards.
OFDM has been adopted in the Wi-Fi arena, where it is used by standards
such as 802.11a, 802.11n and 802.11ac. It has also been chosen for the
cellular telecommunications standards LTE and LTE-A, and it has been
adopted by other standards such as WiMAX and many others.
Orthogonal frequency division multiplexing has also been adopted for a
number of broadcast standards from DAB Digital Radio to the Digital
Video Broadcast standards, DVB.
Although OFDM is more complicated than earlier signal formats, it
provides some distinct advantages for data transmission, especially where
high data rates are needed along with relatively wide bandwidths.
WHAT IS OFDM?
An OFDM signal consists of a number of closely spaced modulated carriers.
When modulation of any form (voice, data, etc.) is applied to a carrier,
sidebands spread out on either side.
A receiver must receive the whole signal in order to demodulate the data
successfully. As a result, when signals are transmitted close to one another,
they must be spaced so that the receiver can separate them using a filter,
and there must be a guard band between them.
This is not the case with OFDM. Although the sidebands from each carrier
overlap, they can still be received without the interference that might be
expected because they are orthogonal to one another. This is achieved by
making the carrier spacing equal to the reciprocal of the symbol period.
Figure: Traditional view of receiving signals carrying modulation
To see how OFDM works, it is necessary to look at the receiver. This acts
as a bank of demodulators, translating each carrier down to DC. The
resulting signal is integrated over the symbol period to regenerate the
data from that carrier. The same demodulator also sees the other
carriers. Because the carrier spacing equals the reciprocal of the symbol
period, each of the other carriers has a whole number of cycles in the
symbol period, so its contribution sums to zero. In other words, there is
no interference contribution.
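The integrate-over-one-symbol argument can be checked numerically. The sketch below is only an illustration: the 64 samples per symbol and the function name `correlate` are assumptions, not part of the design. It mixes carrier `m` down by carrier `k` and averages over one symbol period.

```python
import cmath

N_SAMPLES = 64  # samples per symbol period (assumed for illustration)

def correlate(k, m, n_samples=N_SAMPLES):
    """Mix carrier m down by carrier k and integrate over one symbol period."""
    total = 0
    for n in range(n_samples):
        carrier = cmath.exp(2j * cmath.pi * m * n / n_samples)
        local_osc = cmath.exp(-2j * cmath.pi * k * n / n_samples)
        total += carrier * local_osc
    return total / n_samples

wanted = correlate(3, 3)        # same carrier: full contribution
neighbour = correlate(3, 4)     # spacing of 1/T: contribution sums to zero
off_grid = correlate(3, 3.5)    # spacing of 0.5/T: orthogonality is lost
```

A carrier that does not sit on the 1/T grid (the last case) leaks into the demodulator output, which is why frequency offsets impair orthogonality.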
One requirement of the OFDM transmitting and receiving systems is that
they must be linear. Any non-linearity will cause interference between the
carriers as a result of inter-modulation distortion. This will introduce
unwanted signals that would cause interference and impair the
orthogonality of the transmission.
Figure: OFDM spectrum
DATA ON OFDM
The data to be transmitted on an OFDM signal is spread across the carriers of the
signal, each carrier taking part of the payload. This reduces the data rate taken by
each carrier. The lower data rate means longer symbols, so interference from
reflections is much less critical. The effect of reflections is reduced further by
adding a guard time, or guard interval, into the system. This ensures that the data
is only sampled when the signal is stable and no new delayed signals arrive that
would alter the timing and phase of the signal.
The distribution of the data across a large number of carriers in the OFDM signal has
some further advantages. Error-coding techniques, which do mean adding further data
to the transmitted signal, enable many or all of the corrupted data to be
reconstructed within the receiver.
BASIC PRINCIPLE OF OFDM
Figure: One product modulator per sub-carrier, with separate local oscillators to
generate each individual sub-carrier
OFDM SYSTEM
CORRELATION RECEIVER
OFDM ADVANTAGES
OFDM can easily adapt to severe channel conditions without the need for
complex channel equalisation algorithms.
It is robust against narrow-band co-channel interference. As only some of
the carriers will be affected, not all data is lost, and error coding can
recover the rest.
Intersymbol interference (ISI) is less of a problem with OFDM because each
carrier carries a low data rate.
It provides high levels of spectral efficiency.
It is relatively insensitive to timing errors.
It allows single frequency networks to be used, which is particularly
important for broadcasters, where this facility gives a significant
improvement in spectral usage.
OFDM DISADVANTAGES
High peak-to-average power ratio (PAPR). This puts a high demand on
amplifier linearity.
Phase noise degrades OFDM system performance.
Very sensitive to time and frequency synchronization errors.
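The PAPR problem can be made concrete with a small numerical sketch. It assumes 64 sub-carriers and uses a direct inverse DFT; the helper names `idft` and `papr` are illustrative only. When all carriers add in phase the peak power is 64 times the average, whereas a single carrier has a PAPR of 1.

```python
import cmath

def idft(freq):
    """Direct inverse DFT: frequency-domain carriers to time-domain samples."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def papr(samples):
    """Peak-to-average power ratio (linear, not dB) of a block of samples."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))

all_in_phase = papr(idft([1] * 64))               # worst case: every carrier aligned
single_carrier = papr(idft([0, 1] + [0] * 62))    # one active carrier
```

Real payloads fall between these two extremes, which is why clipping levels are studied in simulation.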
OFDM TRANSCEIVER AND IMPLEMENTATION
Figure: OFDM transceiver block diagram.
Transmitter: Data input to transmitter, Scrambler, Coding, Interleaving, QPSK
mapping, Serial to parallel, IFFT (Tx), Parallel to serial, Add cyclic extension
and windowing, Output of transmitter.
Receiver: Input to receiver, Remove cyclic extension, Synchronization, Serial to
parallel, FFT (Rx), Equalizer with channel estimate, Parallel to serial, QPSK
demapping, De-interleaving, Decoding, Descrambler, Data received.
SCRAMBLER(RANDOMIZER)
In the proposed design, a standard 7 bit scrambler has been used to
randomize the incoming bits.
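A 7-bit scrambler of this kind can be modelled as a linear feedback shift register. The sketch below assumes the 802.11a generator polynomial x^7 + x^4 + 1 and an arbitrary non-zero seed; the slides do not state either, so both are assumptions. Because the scrambling sequence is simply XORed onto the data, running the same function again with the same seed descrambles.

```python
def scramble(bits, seed=0b1011101):
    """Additive scrambler with assumed polynomial x^7 + x^4 + 1.

    `seed` is a hypothetical non-zero 7-bit initial state.
    """
    state = [(seed >> i) & 1 for i in range(7)]  # state[0] = x^1 ... state[6] = x^7
    out = []
    for b in bits:
        feedback = state[3] ^ state[6]   # taps at x^4 and x^7
        out.append(b ^ feedback)         # XOR the sequence onto the data
        state = [feedback] + state[:6]   # shift the register by one
    return out

data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
scrambled = scramble(data)
recovered = scramble(scrambled)          # same seed: descrambles

sequence = scramble([0] * 254)           # all-zero input exposes the raw sequence
```

The raw sequence repeats every 127 bits, the maximum period for a 7-bit register.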
INTERLEAVER
Two memory elements (usually RAMs) are used. In the first RAM the incoming block
of bits is stored in sequential order. The data is then read out of the first RAM
in a permuted order (given by an algorithm) so that the bits are re-arranged,
stored in the second RAM and then read out.
The three building blocks of the interleaver are:
Block Memory
Controller
Address ROM
The job of the controller is to guide the incoming block of data to the
correct memory blocks, to switch the RAMs between reading and writing
modes, and to switch between the two RAMs for 16 alternate bits in
writing mode. This is done by using counters.
The address ROM is basically a 64x6 ROM that stores read addresses for
the RAMs.
Counter C is a 3-bit counter that controls switching between either RAM
1A and RAM 2A or RAM 1B and RAM 2B, depending upon which RAMs
are in write mode. Counter1 and Counter2 are 5-bit counters. After
every 8th count, control switches to either Counter1 or Counter2, as
directed by Counter C.
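The address-ROM idea can be sketched in a few lines. The 4-row by 16-column geometry below is an assumption chosen only so that the block has 64 entries with 6-bit addresses, and the function names are hypothetical. The block is written sequentially and read out through the ROM, and the de-interleaver uses the same structure with rows and columns interchanged.

```python
def address_rom(rows, cols):
    """Read addresses for a block written row by row and read column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]

def permute(block, rom):
    """Read the stored block out in the order given by the address ROM."""
    return [block[addr] for addr in rom]

ROWS, COLS = 4, 16                        # assumed geometry: 4 x 16 = 64 bits
interleave_rom = address_rom(ROWS, COLS)
deinterleave_rom = address_rom(COLS, ROWS)  # rows and columns interchanged

block = [(i * 7) % 2 for i in range(64)]  # arbitrary 64-bit test block
interleaved = permute(block, interleave_rom)
restored = permute(interleaved, deinterleave_rom)
```

Only the ROM contents differ between the two directions, so the RAMs and controller can be shared.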
CONSTELLATION MAPPER
Figure: Signal constellation of QPSK
Mapping of bits to constellation points
In QPSK two bits make up one symbol.
A ROM is used to store the constellation points. Each constellation point is
represented by 48 bits in binary. In these 48 bits, the most significant 24 bits
represent the real part and the least significant 24 bits represent the
imaginary part.
In both the real and imaginary parts, the most significant 8 bits are the integer
part and the least significant 16 bits represent the fractional part. 2's
complement notation is used to represent negative numbers.
The size of the ROM is 4x48. The two incoming input bits act as the address for
the ROM, and each value stored in the ROM is the constellation point
corresponding to those data bits.
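The 48-bit word layout can be illustrated as follows. The constellation values (plus or minus 1/sqrt(2)) and the bit-to-point assignment are assumptions made for the sketch, since the mapping table is not reproduced here; only the 8.16 fixed-point format, the 24/24 split and the 4x48 ROM size come from the description above.

```python
import math

FRAC_BITS = 16                 # fractional bits per component
WORD_BITS = 24                 # 8 integer + 16 fractional bits
MASK = (1 << WORD_BITS) - 1

def to_fixed(x):
    """Encode a real number as a 24-bit 2's complement 8.16 value."""
    return round(x * (1 << FRAC_BITS)) & MASK

def from_fixed(word):
    """Decode a 24-bit 2's complement 8.16 value back to a float."""
    if word & (1 << (WORD_BITS - 1)):
        word -= 1 << WORD_BITS
    return word / (1 << FRAC_BITS)

def pack(point):
    """Real part in the upper 24 bits, imaginary part in the lower 24 bits."""
    return (to_fixed(point.real) << WORD_BITS) | to_fixed(point.imag)

def unpack(word):
    return complex(from_fixed(word >> WORD_BITS), from_fixed(word & MASK))

A = 1 / math.sqrt(2)           # assumed unit-energy QPSK amplitude
# Assumed mapping for addresses 00, 01, 10, 11
POINTS = [complex(-A, -A), complex(-A, A), complex(A, -A), complex(A, A)]
QPSK_ROM = [pack(p) for p in POINTS]     # 4 entries of 48 bits each

def map_bits(b1, b0):
    """The two input bits form the ROM address."""
    return QPSK_ROM[(b1 << 1) | b0]
```

With 16 fractional bits the stored values differ from the ideal points by less than one least significant bit.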
SERIAL TO PARALLEL MODULE
The data arrives serially on the input port SERIN. The parallel data is output
on the DOUT port. Output port DRDY is asserted (1) when the start bit, the 8-bit
data and the parity bit have been received. Output port PERRn is asserted (0) when
the parity bit received differs from the parity generated inside the serial to
parallel circuit. When a parity error is detected, the serial to parallel circuit
must be reset before normal operation can resume.
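A behavioural sketch of this module is given below. It assumes a 10-bit frame (start bit of value 1, eight data bits with the most significant bit first, then one parity bit) and even parity; the parity sense and the function name are assumptions, since the slides do not specify them.

```python
def deserialize(frame):
    """Return (DOUT, DRDY, PERRn) for one received frame.

    frame: list of 10 bits = start bit, 8 data bits (MSB first), parity bit.
    Even parity is assumed for illustration.
    """
    if len(frame) != 10 or frame[0] != 1:
        return None, 0, 1                  # no valid start bit: nothing ready
    data_bits = frame[1:9]
    dout = 0
    for bit in data_bits:
        dout = (dout << 1) | bit           # shift in, MSB first
    expected_parity = sum(data_bits) % 2   # parity generated locally
    perr_n = 1 if frame[9] == expected_parity else 0   # active-low error flag
    return dout, 1, perr_n

good = deserialize([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])   # data 0xA5, correct parity
bad = deserialize([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])    # same data, wrong parity
```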
IFFT DESIGN
64-point Radix-2^2 fixed-point DIT FFT
Since the proposed design has 64 sub-carriers, the input to the FFT is
64 complex numbers, and hence a 64-point FFT is required.
PARALLEL TO SERIAL MODULE
A parallel to serial converter is a special application of a shift register. The
data is loaded in parallel into the shift register and then shifted out bit by
bit, framed by a start bit and a stop bit.
Data to be transmitted is first loaded in parallel and then transmitted bit by
bit, beginning with a start bit of value 1. This is followed by the 8-bit data,
leftmost bit first. The converter holds the output low when the transmission is
completed.
CYCLIC PREFIX ADDER
Causes intercarrier interference (ICI)
If the multipath delay is less than the cyclic prefix, there is no
intersymbol or intercarrier interference.
RECEIVER DESIGN AND IMPLEMENTATION
The receiver follows exactly the reverse of the procedure followed in the
transmitter. It receives the complex (modulated) output points, performs
demodulation and recovers the original bits sent to the transmitter.
CYCLIC PREFIX REMOVER
The cyclic prefix was added at the transmitting end in order to avoid
inter-symbol interference, so it must be removed during reception before
any further processing of the received signal. This is done by simply
skipping the first eight samples of the received OFDM symbol. In
hardware this is implemented in the control unit, which only enables the
next block (FFT) once the first eight samples of the received OFDM
symbol have been skipped.
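The effect of adding and then skipping the prefix can be sketched as follows, assuming a 64-sample symbol with an eight-sample prefix and a hypothetical two-path channel (the tap values and function names are illustrative). With the prefix in place, a multipath delay shorter than the prefix turns the channel into a circular convolution of the symbol, which is what keeps the sub-carriers orthogonal.

```python
CP_LEN = 8      # prefix length in samples, as skipped by the control unit
N = 64          # samples per OFDM symbol

def add_cyclic_prefix(symbol, cp_len=CP_LEN):
    """Copy the last cp_len samples of the symbol to its front."""
    return symbol[-cp_len:] + symbol

def remove_cyclic_prefix(received, cp_len=CP_LEN, n=N):
    """Skip the first cp_len samples and keep the next n."""
    return received[cp_len:cp_len + n]

def multipath(tx, taps):
    """Linear convolution with channel taps (tap index = delay in samples)."""
    return [
        sum(h * tx[m - d] for d, h in enumerate(taps) if m - d >= 0)
        for m in range(len(tx))
    ]

def circular(symbol, taps):
    """Circular convolution of one symbol with the channel taps."""
    n = len(symbol)
    return [sum(h * symbol[(i - d) % n] for d, h in enumerate(taps)) for i in range(n)]

symbol = list(range(N))           # stand-in for 64 time-domain samples
taps = [1, 0, 0, 2]               # hypothetical echo delayed by 3 samples
with_cp = remove_cyclic_prefix(multipath(add_cyclic_prefix(symbol), taps))
without_cp = multipath(symbol, taps)
```

Here `with_cp` matches the circular convolution exactly, while `without_cp` does not, because the start of the unprotected symbol is missing the echo of its own tail.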
FAST FOURIER TRANSFORM
The FFT is implemented in hardware with the same algorithm as the IFFT. The
only differences are that the divider is removed and the real and imaginary
parts at the input are swapped, i.e. real becomes imaginary and imaginary
becomes real. The real and imaginary parts at the output are swapped in the
same way.
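The swap trick follows from the identity FFT(x) = swap(N * IFFT(swap(x))), where swap exchanges the real and imaginary parts. The sketch below checks it with a direct O(N^2) transform rather than the radix-2^2 architecture, purely to illustrate the identity; the function names are hypothetical.

```python
import cmath

def idft(x, scale=True):
    """Direct inverse DFT; scale=False models the IFFT with its divider removed."""
    n = len(x)
    out = []
    for k in range(n):
        acc = sum(x[i] * cmath.exp(2j * cmath.pi * k * i / n) for i in range(n))
        out.append(acc / n if scale else acc)
    return out

def dft(x):
    """Direct forward DFT, used only as the reference result."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]

def swap(z):
    """Exchange real and imaginary parts."""
    return complex(z.imag, z.real)

def dft_via_idft(x):
    """Forward transform on the inverse-transform datapath: swap, IDFT, swap."""
    return [swap(v) for v in idft([swap(v) for v in x], scale=False)]

samples = [complex(i, 1 - 2 * i) for i in range(8)]
reference = dft(samples)
reused = dft_via_idft(samples)
```

This is what allows a half-duplex design to share one transform block between transmit and receive.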
CONSTELLATION DE-MAPPER
The incoming constellation points are mapped back onto the data bits as shown
in the table. This can be implemented by direct coding.
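A hard-decision QPSK de-mapper reduces to two sign checks. The sketch below assumes a mapping in which the first bit sets the sign of the real part and the second bit sets the sign of the imaginary part (1 for positive); the actual table is not reproduced in this text, so that assignment is hypothetical.

```python
def demap_qpsk(point):
    """Return (b1, b0) for the quadrant in which the received point lies.

    Assumed mapping: b1 = 1 when the real part is non-negative,
    b0 = 1 when the imaginary part is non-negative.
    """
    b1 = 1 if point.real >= 0 else 0
    b0 = 1 if point.imag >= 0 else 0
    return b1, b0

noisy = complex(0.6, -0.8)       # received point in the lower-right quadrant
bits = demap_qpsk(noisy)
```

In hardware the same decision is just the inverted sign bit of each 2's complement component.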
DE-INTERLEAVER
De-interleaving performs the inverse task. It re-arranges the interleaved bits into their
original order. De-interleaving is done the same way as interleaving, the difference being
that the number of rows and the number of columns for de-interleaving are
interchanged. Hence the only difference in the hardware architectures of the interleaver
and de-interleaver is the contents of the address ROM, which provides the read
addresses to the RAM that stores the data to be de-interleaved.
DESCRAMBLER
The above setup simply descrambles the scrambled data.
VITERBI DECODER
The Viterbi decoder decodes convolutional codes. Altera's Viterbi IP core is a
parameterized, synthesizable IP core that allows for parallel as well as
hybrid implementation of the Viterbi decoder.
BMU
The branch metric computation unit calculates the Hamming distances between the
incoming pair of code bits and each of the four possible code pairs.
ACS
Add, compare and select unit is used to update the path metric for all the 64 states
and select the predecessor. For each of the 64 states, it adds current path metric
and branch metric for both the predecessor states and selects the lower of the two
as the new path metric and the predecessor information is passed on to the SMU
unit.
The width of the path metric register and of the ACS adders and subtractor
depends on whether a soft-decision or a hard-decision Viterbi decoder is used. It
also depends on the maximum metric accumulated by the metric registers before
normalization is done.
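The BMU and a single ACS operation can be sketched for the hard-decision case as follows. The function names are hypothetical and the trellis connectivity (which two predecessors feed each of the 64 states) is left out, since it depends on the code polynomials.

```python
def branch_metrics(received):
    """Hamming distance from the received bit pair to each possible pair 00..11."""
    r1, r0 = received
    return [(r1 ^ (code >> 1)) + (r0 ^ (code & 1)) for code in range(4)]

def acs(path_metric_0, branch_metric_0, path_metric_1, branch_metric_1):
    """Add, compare, select for one state.

    Returns the surviving path metric and a decision bit (0 or 1) naming
    the chosen predecessor, which is what gets passed on to the SMU.
    """
    candidate_0 = path_metric_0 + branch_metric_0
    candidate_1 = path_metric_1 + branch_metric_1
    if candidate_0 <= candidate_1:
        return candidate_0, 0
    return candidate_1, 1

metrics = branch_metrics((1, 0))     # distances to 00, 01, 10, 11
survivor = acs(3, 1, 2, 0)           # the second predecessor is cheaper
```

A full decoder would run `acs` once per state for every received pair, which is why the unit is instantiated 64 times in a parallel design.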
VLSI IMPLEMENTATION
Lower gate count compared to DSP+RAM+ROM, hence lower cost.
Low power consumption
DESIGN METHODOLOGY
Early in the development cycle, different communication and signal processing
algorithms are evaluated for their performance under different conditions such as
noise, multipath channels and radio non-linearity. Since most of these algorithms
are coded in "C" or in tools like MATLAB, it is important to have a verification
mechanism which ensures that the hardware implementation (RTL) matches the "C"
implementation of the algorithm. The flow is shown in the Figure.
SPECIFICATIONS OF THE OFDM TRANSCEIVER
Data rates to be supported
Range and multipath tolerance
Indoor/Outdoor applications
Multi-mode: 802.11a only or 802.11a+HiperLAN/2
DESIGN TRADE-OFF
Area - The smaller the die size, the lower the chip cost
Power - Low power crucial for battery operated mobile devices
Ease of implementation - Easy to debug and maintain
Customizability - Should be customizable to future standards with variations
in OFDM parameters
ALGORITHM SURVEY & SIMULATION
The simulation at the algorithmic level determines the performance of the algorithms
under various non-linearities and imperfections. The algorithms are tweaked and
fine-tuned to get the required performance. The following
algorithms/parameters are verified:
Channel estimation and compensation for different channel models (Rayleigh,
Rician, JTC, Two ray) for different delay spreads
Correlated performance for different delay spreads and different SNR
Frequency estimation algorithm for different SNR and frequency offsets
Compensation for Phase noise and error in Frequency offset estimation
System tolerance for I/Q phase and amplitude imbalance
FFT simulation to determine the optimum fixed-point widths
Wave shaping filter to get the desired spectrum mask
Determine clipping levels for efficient PA use
Effect of ADC/DAC width on the EVM and optimum ADC/DAC width
FIXED POINT SIMULATION
One of the decisions to be taken early in the design cycle is the format or
representation of data. A floating-point implementation results in higher hardware
costs and additional circuits for normalizing numbers. Floating-point
representation is useful when dealing with data of widely different ranges. That
is not the case here, as the Baseband circuits have a fair idea of the range of
values they will work on, so a fixed-point representation is more efficient.
Further, in fixed point a choice can be made between sign-magnitude and 2's
complement representation.
The width of representation need not be constant throughout the Baseband and it
depends on the accuracy needed at different points in transmit or receive path. A
small change in the number of bits in the representation could result in a significant
change in the size of arithmetic circuits especially multipliers.
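The width trade-off can be explored with a very small fixed-point model. The sketch below assumes 2's complement with rounding and saturation; the helper names and the test signal are hypothetical. It reports the worst-case quantization error for two candidate fractional widths.

```python
import math

def quantize(x, int_bits=8, frac_bits=16):
    """Round x to a 2's complement fixed-point grid, saturating at the limits."""
    scale = 1 << frac_bits
    total_bits = int_bits + frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = max(lowest, min(highest, round(x * scale)))
    return code / scale

def worst_error(samples, int_bits, frac_bits):
    """Largest absolute quantization error over a set of samples."""
    return max(abs(s - quantize(s, int_bits, frac_bits)) for s in samples)

signal = [math.sin(0.1 * n) for n in range(100)]   # stand-in test signal
error_8 = worst_error(signal, 8, 8)                # 16-bit word
error_16 = worst_error(signal, 8, 16)              # 24-bit word
```

Sweeping `frac_bits` in this way, block by block, is one way to find the narrowest width that still meets the accuracy target before committing to multiplier sizes.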
SIMULATION SETUP
The algorithms could be simulated in a variety of tools/languages
like SPW, MATLAB, C or a mix of these.
SPW has an exhaustive floating-point and fixed-point library. SPW
also provides a feature to plug in RTL modules and co-simulate the
SPW system with Verilog. This helps in verifying the
RTL implementation of algorithms against the SPW/C
implementation.
HARDWARE DESIGN
Baseband interfaces with two external modules: MAC and Radio.
INTERFACE TO MAC
Baseband should support the following for MAC
Should support transfer of data at different rates
Transmit and receive control
Register programming for power and frequency control
Following options are available for MAC interface:
Serial data interface - Clock provided along with data. Clock speed changes for
different data rates.
Varying data width, single speed clock - The number of data lines varies according
to the data rate. The clock remains the same for all rates.
Single clock, parallel data with ready indication - Clock speed and data width are
the same for all data rates. A ready signal is used to indicate valid data.
INTERFACE TO RADIO
Two kinds of radio interfaces are described below
I/Q interface
On the transmit side, the complex Baseband signal is sent to the radio unit, which
first does a Quadrature modulation followed by up-conversion to 5 GHz. On the
receive side, following the down-conversion to IF, Quadrature demodulation is
done and the complex I/Q signal is sent to the Baseband. Shown below is the interface.
IF interface
The Baseband does the Quadrature modulation and demodulation digitally.
CLOCKING STRATEGY
802.11a supports different data rates from 6 Mbps to 54 Mbps. The clock scheme
chosen for the Baseband should be able to support all rates and also result in low power
consumption. We know from our basic ASIC design guidelines that most circuits should
run at the lowest possible clock rate.
Two options are shown below:
Above scheme requires different clock sources or a very high clock rate from which all
these clocks could be generated.
The modules must work for the highest frequency of 54 MHz.
Shown in the figure is a simpler clocking scheme with only one clock speed for all
data rates
Varying duty cycles for different data rates are provided by the data enable signal
All the circuits in the transmit and receive chain work on parallel data (4 bits)
Overhead is the Data enable logic in all the modules
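The data enable idea can be sketched with a simple accumulator. The 20 MHz clock is an assumption chosen for illustration (the slides do not give the clock frequency), while the 4-bit parallel data width comes from the text above. The enable is asserted on just enough clock cycles to carry the requested data rate.

```python
def enable_pattern(rate_mbps, clock_mhz=20, width_bits=4, n_cycles=80):
    """Data enable waveform for one data rate on a single fixed clock.

    clock_mhz is a hypothetical value; capacity is the throughput if the
    enable were high on every cycle.
    """
    capacity = clock_mhz * width_bits
    accumulator = 0
    pattern = []
    for _ in range(n_cycles):
        accumulator += rate_mbps
        if accumulator >= capacity:
            accumulator -= capacity
            pattern.append(1)      # this cycle carries 4 valid bits
        else:
            pattern.append(0)      # idle cycle: downstream logic holds its state
    return pattern

slow = enable_pattern(6)           # 6 Mbps: enable high on 6 of 80 cycles
fast = enable_pattern(54)          # 54 Mbps: enable high on 54 of 80 cycles
```

Every module then needs only the one clock plus the enable, at the cost of the enable logic mentioned above.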
Optimize Usage Of Hardware Resources By Reusing Different Blocks
Hardware resources can be reused because the 802.11a system is a half-duplex
system. The following blocks are re-used:
FFT/IFFT
Interleaver/De-interleaver
Scrambler/Descrambler
Intermediate data buffers
Since adders and multipliers are costly resources, special attention should be given to
reusing them. An example is shown below, where an adder/multiplier pool is created and
different blocks are connected to it.
Optimize the widely used circuits
Identify the blocks that are used at several places (several instances of the same unit)
and optimize them. Optimization can be done for power and area. Some of the
circuits that can be optimized are:
Multipliers
They are the most widely used circuits. Synthesis tools usually provide highly
optimized circuits for multipliers and adders.
In case optimized multipliers are not available, multipliers could be designed using
different techniques.
ACS unit
There are 64 instantiations of ACS unit in the Viterbi decoder. Optimization of ACS
unit results in significant savings.
Custom cell design (using foundry information) for adders and comparators could
be considered.