Design and FPGA implementation of low-complexity multiuser vector precoders
M. Barrenechea, M. Mendicute, L. Barbero, J. Thompson
Signal Theory and Communications Area, Mondragon Goi Eskola Politeknikoa, University of Mondragon
Outline
- Vector precoding
- Fixed sphere encoder
- Channel matrix pre-processing
- Simulation results
- FPGA implementation and optimization
- Implementation results
- Summary and conclusions
Vector precoding
In uncoordinated receiver scenarios, the use of precoding techniques at the base station can allow the separation of the users' information streams.
[Figure: multiuser MIMO downlink channel. At the base station, the precoder maps the symbols s_1, ..., s_K onto the transmitted signals x_1, ..., x_M, which reach users 1, ..., K (received signals y_1, ..., y_K) through the wireless K x M channel matrix H.]
Vector precoding: linear precoding techniques
Main linear approaches:
- Zero-Forcing
- Regularized
- MMSE (WF)
Vector precoding
The perturbation vector a is chosen to minimize the unscaled transmitted power. Another approach is to choose it to minimize the MSE (WF-VP).
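As a rough illustration of the perturbation search, the sketch below assumes the commonly used formulation in which a is picked to minimize ||P (s + tau a)||^2. The precoding matrix P, the symbol vector s, the modulo constant tau and the small candidate set are all made-up values for illustration and are not taken from the slides. It uses plain exhaustive search, which is only feasible for very small systems.

```python
from itertools import product

def matvec(P, v):
    """Multiply matrix P (list of rows) by vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def power(v):
    """Squared Euclidean norm of a complex vector."""
    return sum(abs(x) ** 2 for x in v)

def brute_force_perturbation(P, s, tau, values=(-1, 0, 1)):
    """Exhaustively search for the a that minimizes ||P (s + tau*a)||^2."""
    points = [complex(re, im) for re in values for im in values]
    best_a, best_power = None, float("inf")
    for a in product(points, repeat=len(s)):
        x = matvec(P, [si + tau * ai for si, ai in zip(s, a)])
        p = power(x)
        if p < best_power:
            best_a, best_power = a, p
    return list(best_a), best_power

# Hypothetical 2-user example (P and s are illustrative values only).
P = [[1.0 + 0.5j, 0.8 - 0.2j],
     [0.3 + 0.1j, 1.2 + 0.4j]]
s = [3 + 3j, -3 + 1j]   # 16-QAM symbols on the grid {-3, -1, 1, 3}
tau = 8.0               # assumed modulo constant for this grid
a, perturbed_power = brute_force_perturbation(P, s, tau)
unperturbed_power = power(matvec(P, s))
```

Because a = 0 belongs to the candidate set, the perturbed power can never exceed the unperturbed one.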
Vector precoding
Solution: search for the closest point in a lattice. The problem is similar to maximum likelihood (ML) detection in MIMO systems. The main differences are the following:
1.- The VP lattice, which is infinite, must be reduced to be implemented.
2.- The VP search is not affected by noise.
3.- Quantization is less critical in VP, since both s and a belong to known sets.
4.- A failure of the search causes bit errors in MIMO detection, whereas in VP it only means a larger unscaled power and a noisier reception, which may affect the BER slightly.
Fixed Sphere Encoder: sphere encoder (SE)
- Reduces the complexity in comparison to an exhaustive search.
- The search is constrained to the perturbation vectors a belonging to a hypersphere of radius R around the signal s.
- The triangular matrix U, obtained through the Cholesky or QR decomposition of the precoding matrix, is used to enable a recursive search through a tree.
Fixed Sphere Encoder: sphere encoder search tree
- Sequential algorithm → suboptimal resource usage.
- Variable complexity → variable throughput.
Fixed Sphere Encoder
- Originally designed for signal detection in MIMO scenarios [Barbero06].
- Performs a suboptimum fixed-complexity tree search.
- The tree is defined by a tree configuration vector.
[Barbero06] L. Barbero, "Rapid prototyping of a fixed-complexity sphere decoder and its application to iterative decoding of turbo-MIMO systems," PhD dissertation, University of Edinburgh, 2006.
Fixed Sphere Encoder
In order to fix the tree, the lattice must be reduced. The following candidate points (25 per level) have been considered:
[Figure: candidate points in the complex plane (real and imaginary axes).]
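The fixed-complexity search can be sketched as follows. This is a minimal illustration under stated assumptions, not the implemented design: it assumes the cost ||U (s + tau a)||^2 with an upper-triangular U, an intermediate point z_i built from the ratios u_ij/u_ii, a 5x5 grid of integer candidates per level, and a tree configuration vector n whose last entry corresponds to the top tree level. The function name, the example matrix and the value of tau are hypothetical.

```python
def fse_search(U, s, tau, n, grid=range(-2, 3)):
    """Fixed-complexity tree search for a that minimizes ||U (s + tau*a)||^2.

    U    : upper-triangular matrix (list of rows) with non-zero diagonal
    n[i] : number of closest candidates expanded at level i
    grid : allowed integer values for the real and imaginary parts of a_i
    """
    M = len(s)
    candidates = [complex(re, im) for re in grid for im in grid]  # 25 points by default
    paths = [([0j] * M, 0.0)]              # (partial vector a, accumulated distance)
    for i in range(M - 1, -1, -1):         # top tree level first
        new_paths = []
        for a, aed in paths:
            # Contribution of the levels j > i that are already decided.
            z = -s[i] / tau - sum(U[i][j] / U[i][i] * (a[j] + s[j] / tau)
                                  for j in range(i + 1, M))
            closest = sorted(candidates, key=lambda c: abs(c - z))[:n[i]]
            for c in closest:
                ped = (tau * abs(U[i][i]) * abs(c - z)) ** 2   # partial distance
                child = a[:]
                child[i] = c
                new_paths.append((child, aed + ped))
        paths = new_paths
    return min(paths, key=lambda p: p[1])

# Hypothetical 3-user example (illustrative values only).
U = [[1.0, 0.3 + 0.2j, -0.1j],
     [0.0, 0.8, 0.4 - 0.1j],
     [0.0, 0.0, 0.6]]
s = [1 + 3j, -3 - 1j, 3 - 3j]
tau = 8.0
a_best, aed_best = fse_search(U, s, tau, n=[1, 2, 3])
```

The number of evaluated paths is the product of the entries of n, so the amount of work is known in advance and every path can be processed in parallel.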
Channel matrix pre-processing: ordering of the channel matrix
Since most of the branches of the SE are going to be removed to design the FSE, the following considerations must be taken into account:
- The mean number of visited nodes per tree level is inversely proportional to [...].
- This effect is more relevant at the top levels of the tree, since T_i decreases with i.
The following ordering strategies have been considered the best:
- A V-BLAST-like iterative algorithm from the MIMO detection literature, based on the minimization of the norm of the pseudoinverse of the precoding matrix.
- A simple non-iterative ordering of the columns of the precoding matrix according to their norm.
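The simpler of the two strategies, the norm-based column ordering, might look like the sketch below. The slides do not state whether the columns are sorted in ascending or descending order, so the direction is left as a parameter; the function names and the example matrix are hypothetical.

```python
import math

def column_norms(P):
    """Euclidean norm of each column of P (list of rows)."""
    rows, cols = len(P), len(P[0])
    return [math.sqrt(sum(abs(P[r][c]) ** 2 for r in range(rows)))
            for c in range(cols)]

def order_columns_by_norm(P, descending=False):
    """Return P with its columns sorted by norm, plus the permutation used."""
    norms = column_norms(P)
    perm = sorted(range(len(norms)), key=lambda c: norms[c], reverse=descending)
    ordered = [[row[c] for c in perm] for row in P]
    return ordered, perm

# Illustrative matrix with column norms 5, 1 and 2.
P = [[3, 1, 2j],
     [4j, 0, 0]]
ordered, perm = order_columns_by_norm(P)
```

The same permutation would presumably have to be applied to the entries of s and undone on the resulting perturbation vector.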
Channel matrix pre-processing: ordering of the channel matrix
Averaged values of u_ii for the different levels, depending on the ordering. Averaged number of evaluated nodes at each level.
Channel matrix pre-processing
Effect of ordering on the number of evaluated nodes: 6x6 system with 16-QAM modulation.
Simulation results
Multiuser setups considered: 4x4, 6x6 and 8x8.
Tree configurations:
- n_4x4 = [1, 1, 2, 5]
- n_6x6 = [1, 1, 1, 2, 3, 4]
- n_8x8 = [1, 1, 1, 1, 2, 2, 3, 4]
Rayleigh channel, constant over each block. 16-QAM modulation.
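To get a feel for what these tree configurations imply, the following sketch counts the full candidate paths and the expanded nodes for each vector. The counting convention is an assumption (the last entry of n is taken as the top tree level, and one node is counted per expanded child), so the totals need not match the node counts reported in the slides.

```python
from math import prod

def fse_tree_size(n):
    """Return (number of full candidate paths, total number of expanded nodes)."""
    paths = prod(n)
    nodes, width = 0, 1
    for children in reversed(n):   # assume the last entry is the top tree level
        width *= children          # nodes alive at this level
        nodes += width
    return paths, nodes

configs = {
    "4x4": [1, 1, 2, 5],
    "6x6": [1, 1, 1, 2, 3, 4],
    "8x8": [1, 1, 1, 1, 2, 2, 3, 4],
}
sizes = {name: fse_tree_size(n) for name, n in configs.items()}
# sizes == {"4x4": (10, 35), "6x6": (24, 112), "8x8": (48, 280)}
```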
Simulation results
[Figure: number of visited nodes.]
Simulation results
[Figure: BER comparison.]
FPGA implementation and optimization: implemented VP FSE algorithm
- 6x6 system
- 16-QAM modulation
- Tree configuration vector
- 3 pipeline stages
- Restricted group (5x5 = 25 points) of integers instead of the lattice.
- Channel ordering, which is carried out every transmission block, has not been considered.
- Distance computation: PED (partial Euclidean distance) and AED (accumulated Euclidean distance).
FPGA implementation and optimization: algorithm implementation
Implemented using Xilinx System Generator for DSP.
FPGA implementation and optimization: special features
PED 6
- The n_6 = 6 closest points to each symbol s_6 are known beforehand.
- Due to symmetries, the set of 6 points can be computed by mapping each symbol to its equivalent in the first quadrant and varying the sign of the set for the equivalent point accordingly.
PED 5
- A 2-point slicer is needed to compute the 2 closest points: the first closest point, and a second closest point chosen between two alternatives.
FPGA implementation and optimization
274 multipliers required → prohibitive for a low-cost FPGA implementation. A series of hardware optimizations has been proposed to reduce the number of required embedded multipliers.
Optimization 1: Rearrangement of complex multiplications
- Initial system → 4 multipliers and 2 adders
- Alternative complex multiplication → 3 multipliers and 5 adders
- Required number of multipliers after OPT. 1 → 224
Optimization 2: Hard quantization
If the values of u_ij/u_ii are quantized to a very small number of bits, and the multiplications required to compute z_i are implemented using programmable logic, the number of multipliers is reduced to 74, although the number of required slices increases slightly. A small degradation is introduced.
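The trade-off behind Optimization 1 can be illustrated with one well-known rearrangement of the complex product (a + jb)(c + jd). The slides do not say which variant was implemented, so the version below is an assumption chosen because it matches the stated count of 3 multipliers and 5 adders; the function names are hypothetical.

```python
def cmul_4m(a, b, c, d):
    """(a + jb)(c + jd) with 4 multiplications and 2 additions."""
    return a * c - b * d, a * d + b * c

def cmul_3m(a, b, c, d):
    """Same product with 3 multiplications and 5 additions."""
    k1 = c * (a + b)        # 1 addition, 1 multiplication
    k2 = a * (d - c)        # 1 addition, 1 multiplication
    k3 = b * (c + d)        # 1 addition, 1 multiplication
    return k1 - k3, k1 + k2 # 2 additions: real part, imaginary part

real, imag = cmul_3m(1, 2, 3, 4)   # (1 + 2j) * (3 + 4j) = -5 + 10j
```

Each complex multiplication handled this way saves one embedded multiplier at the cost of three extra adders, which are cheap in programmable logic.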
FPGA implementation and optimization
Optimization 3: Approximated Euclidean distance
Replace the ℓ2-norm calculation performed to obtain the PEDs by a simpler method:
1.- The Manhattan distance metric (ℓ1-norm)
2.- The [...] metric
Both of these techniques introduce a small BER performance degradation. However, after the implementation of OPT3 the number of multipliers has been reduced to 30.
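The sketch below contrasts the squared Euclidean distance with the Manhattan metric for a complex distance term d. It is only meant to show why the approximation needs no multipliers and why it can occasionally rank candidates differently; the function names and example values are illustrative.

```python
def squared_euclidean(d):
    """Squared Euclidean distance: two multiplications per complex term."""
    return d.real * d.real + d.imag * d.imag

def manhattan(d):
    """Manhattan distance: absolute values and one addition, no multiplications."""
    return abs(d.real) + abs(d.imag)

d1 = 3 + 0j   # Euclidean 9.0 (squared), Manhattan 3.0
d2 = 2 + 2j   # Euclidean 8.0 (squared), Manhattan 4.0

euclidean_prefers_d2 = squared_euclidean(d2) < squared_euclidean(d1)
manhattan_prefers_d1 = manhattan(d1) < manhattan(d2)
```

In this example the two metrics disagree on which term is smaller, which is the kind of mismatch that can explain a small BER degradation.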
FPGA implementation and optimization
Optimization 4: Simplified 2-point slicer
So far, deciding which of the two alternatives was the second closest point required the computation of both distances. A new technique that does not require any extra distance calculation has been derived. It applies a separate condition to interior, edge and vertex points.
No performance loss after OPT4. The total number of multipliers is 22.
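The exact conditions used in the slides were lost, so the following is only a sketch of how a 2-point slicer can avoid extra distance calculations. It assumes an unbounded unit-spaced integer grid (roughly the interior case) and does not handle the edge and vertex cases of the restricted 5x5 set. The second closest point is found by stepping from the first one along the axis with the larger residual.

```python
def two_point_slicer(z):
    """Return the two integer-grid points closest to z (unbounded grid)."""
    first = complex(round(z.real), round(z.imag))
    r = z - first                       # residual, each part within [-0.5, 0.5]
    if abs(r.real) >= abs(r.imag):
        step = 1.0 if r.real >= 0 else -1.0
        second = first + step           # move along the real axis
    else:
        step = 1.0 if r.imag >= 0 else -1.0
        second = first + step * 1j      # move along the imaginary axis
    return first, second

first, second = two_point_slicer(0.3 + 0.1j)   # first = 0, second = 1
```

Only comparisons of the residual components are needed, since the squared distances to the two axial neighbours differ by 2(|Im r| - |Re r|).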
FPGA implementation and optimization: summary of results
The performance loss derived from the implementation of the optimization strategies is just 0.2 dB at a BER of 10^-4. As for the HW resources, a reduced-complexity implementation has been achieved.
Summary and conclusions
- The FSE (fixed sphere encoder) achieves close-to-optimal BER performance for VP systems with:
  - Reduced complexity. Only a small subset of tree branches is analyzed in comparison to the SE.
  - Fixed architecture and throughput. The number of nodes and branches to be computed is fixed and can be parallelized.
- A 6x6 FPGA implementation has been presented which achieves good performance and throughput.
- Implementation and optimization issues have been presented which show that the sphere encoder is less sensitive to quantization and suboptimal design choices than its MIMO detection counterpart.
End Thank you for your attention!! You can send any comments/requests/questions to: Dr. Mikel Mendicute [email_address]

Design and Hardware Implementation of Low-Complexity Multiuser Precoders (ETH Zurich, 2010)
