A Framework for Training-Based Estimation in Arbitrarily Correlated Rician MIMO Channels With Rician Disturbance

Emil Björnson and Björn Ottersten
Abstract—In this paper, we create a framework for training-based channel estimation under different channel and interference statistics. The minimum mean square error (MMSE) estimator for channel matrix estimation in Rician fading multi-antenna systems is analyzed, and especially the design of mean square error (MSE) minimizing training sequences. By considering Kronecker-structured systems with a combination of noise and interference and arbitrary training sequence length, we collect and generalize several previous results in the framework. We clarify the conditions for achieving the optimal training sequence structure and show when the spatial training power allocation can be solved explicitly. We also prove that spatial correlation improves the estimation performance and establish how it determines the optimal training sequence length. The analytic results for Kronecker-structured systems are used to derive a heuristic training sequence under general unstructured statistics.

The MMSE estimator of the squared Frobenius norm of the channel matrix is also derived and shown to provide far better gain estimates than other approaches. It is shown under which conditions training sequences that minimize the non-convex MSE can be derived explicitly or with low complexity. Numerical examples are used to evaluate the performance of the two estimators for different training sequences and system statistics. We also illustrate how the optimal length of the training sequence often can be shorter than the number of transmit antennas.

Index Terms—Arbitrary correlation, channel matrix estimation, majorization, MIMO systems, MMSE estimation, norm estimation, Rician fading, training sequence optimization.

I. INTRODUCTION

WIRELESS communication systems with antenna arrays at both the transmitter and the receiver have gained much attention due to their potential of greatly improving the performance over single-antenna systems. In flat fading systems, the capacity and spectral efficiency have been shown to increase rapidly with the number of antennas [1], [2].

These results are based on the idealized assumption of full channel state information (CSI) and independent and identically distributed (i.i.d.) channel coefficients. In practice, field measurements have shown that the channel coefficients often are spatially correlated in outdoor scenarios [3], but correlation also frequently occurs in indoor environments [4], [5]. When it comes to acquiring CSI, the long-term statistics can usually be regarded as known, through reverse-link estimation or a negligible signaling overhead [6]. Instantaneous CSI, however, needs to be estimated with limited resources (time and power) due to the channel fading and interference.

In this paper, we consider training-based estimation of instantaneous CSI in multiple-input multiple-output (MIMO) systems. Thus, the estimation is conditioned on the received signal from a known training sequence, which potentially can be adapted to the long-term statistics. By nature, the channel is stochastic, which motivates Bayesian estimation—that is, modeling of the current channel state as a realization from a known multi-variate probability density function (PDF). There is also a large amount of literature on estimation of deterministic MIMO channels, which is analytically tractable but in general provides less accurate channel estimates, as shown in [7], [8]. Herein, we concentrate on minimum mean square error (MMSE) estimation of the channel matrix and its squared Frobenius norm, given the first and second order system statistics.

Training-based MMSE estimation of MIMO channel matrices has previously been considered for Kronecker-structured Rayleigh fading systems that are either noise-limited [9]–[11] or interference-limited [12]. In these papers, optimization of the training sequence was considered under various limitations on the long-term statistics, and analogous structures of the optimal training sequence were derived. These results reduce the training optimization to a convex power allocation problem that can be solved explicitly in some special cases. When mentioning previous work, it is worth noting that simplified channel matrix estimators have been developed in [8] and [13] and claimed to be MMSE estimators, but we show herein that these estimators are in general restrictive.

Manuscript received September 21, 2009; accepted October 25, 2009. First published November 24, 2009; current version published February 10, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Amir Leshem. This work was supported in part by the ERC under FP7 Grant Agreement No. 228044 and the FP6 project Cooperative and Opportunistic Communications in Wireless Networks (COOPCOM), Project No. FP6-033533. This work was also partly performed in the framework of the CELTIC project CP5-026 WINNER+. Parts of this work were previously presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 19–24, 2009.

E. Björnson is with the Signal Processing Laboratory, ACCESS Linnaeus Center, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden (e-mail: [email protected]).

B. Ottersten is with the Signal Processing Laboratory, ACCESS Linnaeus Center, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden, and also with securityandtrust.lu, University of Luxembourg, L-1359 Luxembourg-Kirchberg, Luxembourg (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2009.2037352

Authorized licensed use limited to: National Taiwan University. Downloaded on March 09,2023 at 05:21:44 UTC from IEEE Xplore. Restrictions apply.

1808 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010

In the present work, we collect previous results in a framework with general system properties and arbitrary length of the training sequence. The MMSE estimator is given for Kronecker-structured Rician fading channels that are corrupted by some Gaussian disturbance, where disturbance denotes a combination of noise and interference. The purpose of our framework is to enable joint analysis of different types of disturbance, including the noise-limited and interference-limited scenarios considered in [9]–[12] and certain combinations of both noise and interference. In this manner, we show that the MSE minimizing training sequence has the same structure and asymptotic properties under a wide range of different disturbance statistics. We give statistical conditions for finding the optimal training sequence explicitly, and propose a heuristic solution under general unstructured statistics. Finally, we prove analytically that the MSE decreases with increasing spatial correlation at both the transmitter and the receiver side. Based on this observation, we show that the optimal number of training symbols can be considerably fewer than the number of transmit antennas in correlated systems. This result is a generalization of [14], where completely uncorrelated systems were considered, and similar observations have been made in [15], [16].

Although estimation of the channel matrix is important for receive and transmit processing, knowledge of the squared Frobenius norm of the channel matrix provides instantaneous gain information and can be exploited for rate adaptation and scheduling [17], [18]. The squared norm can be determined indirectly from an estimated channel matrix, but as shown in [16] this approach gives poor estimation performance at most signal-to-interference-and-noise ratios (SINRs). The MMSE estimator of the squared channel norm was introduced in [16] for Kronecker-structured Rayleigh fading channels, assuming the same training structure as for channel matrix estimation. Herein, the estimator is proved and generalized to Rician fading channels, along with the design of MSE minimizing training sequences. Although the MSE is non-convex, we show that the optimal training sequence can be determined with limited complexity.

A. Outline

In Section II, the system model and the training-based estimation framework are introduced. The MMSE channel matrix estimator is given and discussed in Section III for arbitrary training sequences. In Section IV, MSE minimizing training sequence design is considered. The general structure and asymptotic properties are derived. It is also shown under which covariance conditions there exist explicit solutions, and how the estimation performance and the optimal length of the training sequence vary with the spatial correlation. Section V derives the MMSE estimator of the squared channel norm and analyzes training sequence design with respect to its MSE. The error performance of the different estimators is illustrated numerically in Section VI and conclusions are drawn in Section VII. Finally, proofs of the theorems are given in Appendix A.

B. Notations

Boldface (lower case) is used for column vectors, x, and (upper case) for matrices, X. Let X^T, X^H, and X^* denote the transpose, the conjugate transpose, and the conjugate of X, respectively. The Kronecker product of two matrices X and Y is denoted X ⊗ Y, vec(X) is the column vector obtained by stacking the columns of X, tr(X) is the matrix trace, and diag(x_1, …, x_N) is the N-by-N diagonal matrix with x_1, …, x_N at the main diagonal. The squared Frobenius norm of a matrix X is denoted ‖X‖² and is defined as the sum of the squared absolute values of all the elements. The functions max(·) and min(·) give the maximal and minimal value of the input parameters, respectively. CN(x̄, R) is used to denote circularly symmetric complex Gaussian random vectors, where x̄ is the mean and R the covariance matrix. The notation ≜ is used for definitions.

II. SYSTEM MODEL

We consider flat and block-fading MIMO systems with a transmitter equipped with an array of n_T transmit antennas and a receiver with an array of n_R receive antennas. The symbol-sampled complex baseband equivalent of the flat fading channel when transmitting at channel use t is modeled as

y(t) = H x(t) + n(t)   (1)

where x(t) ∈ C^{n_T} and y(t) ∈ C^{n_R} are the transmitted and received signals, respectively, and n(t) ∈ C^{n_R} represents arbitrarily correlated Gaussian disturbance. This disturbance models the sum of background noise and interference from adjacent communication links and is a stochastic process in t. The channel is represented by H ∈ C^{n_R×n_T} and is modeled as Rician fading with mean H̄ and the positive definite covariance matrix R, which is defined on the column stacking of the channel matrix. Thus, vec(H) ∈ CN(vec(H̄), R). In the estimation parts of this paper, the channel and disturbance statistics are known at the receiver. In the training sequence design, the statistics are also known to the transmitter.

Herein, estimation of the channel matrix and its squared Frobenius norm are considered. The receiver knows the long-term statistics, but in order to estimate the value of some function of the unknown realization of H, the transmitter typically needs to send a sequence of known training vectors that spans C^{n_T}. We consider training sequences of arbitrary length B under a total power constraint, and in Section IV-A the optimal value of B is studied.

Let the training matrix P ∈ C^{n_T×B} represent the training sequence. This matrix fulfills the total power constraint tr(P^H P) ≤ 𝒫 and its maximal rank is min(n_T, B), which represents the maximal number of spatial channel directions that the training can excite. The columns p_t of P are used as transmit signal in (1) for B channel uses (i.e., x(t) = p_t). The combined received matrix Y ≜ [y(1), …, y(B)] of the training transmission is

Y = H P + N   (2)

where the combined disturbance matrix N ≜ [n(1), …, n(B)] is uncorrelated with the channel H. The disturbance is modeled as vec(N) ∈ CN(vec(N̄), S), where S is the positive definite covariance matrix and N̄ is the mean disturbance.

The multipath propagation is modeled as quasi-static block fading; that is, the channel realization is constant during
BJÖRNSON AND OTTERSTEN: TRAINING-BASED ESTIMATION IN ARBITRARILY CORRELATED RICIAN MIMO CHANNELS WITH RICIAN DISTURBANCE 1809
the whole training transmission and independent of previous channel estimates.

A. Preliminaries on Spatial Correlation and Majorization

A measure of the spatial channel correlation is the eigenvalue distribution of the channel covariance matrix; weak correlation is represented by almost identical eigenvalues, while strong correlation means that a few eigenvalues dominate. Thus, in a highly correlated system, the channel is approximately confined to a small eigensubspace, while all eigenvectors are equally important in an uncorrelated system. In urban cellular systems, base stations are typically elevated and exposed to little near-field scattering. Thus, their antennas are strongly spatially correlated, while the non-line-of-sight mobile users are exposed to rich scattering and have weak antenna correlation if the antenna spacing is sufficiently large [19].

The notion of majorization provides a useful measure of the spatial correlation [20]–[22] and will be used herein for various purposes. Let x = [x_1, …, x_L]^T and y = [y_1, …, y_L]^T be two non-negative real-valued vectors of arbitrary length L. We say that x majorizes y if

Σ_{k=1}^{m} x_[k] ≥ Σ_{k=1}^{m} y_[k],  m = 1, …, L−1,  and  Σ_{k=1}^{L} x_[k] = Σ_{k=1}^{L} y_[k]   (3)

where x_[k] and y_[k] are the kth largest ordered elements of x and y, respectively. This majorization property is denoted x ≻ y. If x and y contain eigenvalues of channel covariance matrices, then x ≻ y corresponds to that x is more spatially correlated than y. Majorization only provides a partial order of vectors, but is still very powerful due to its connection to certain order-preserving functions:

A function f(·) is said to be Schur-convex if f(x) ≥ f(y) for all x and y such that x ≻ y. Similarly, f(·) is said to be Schur-concave if x ≻ y implies that f(x) ≤ f(y).

III. MMSE ESTIMATION OF CHANNEL MATRICES

There are many reasons for estimating the channel matrix at the receiver. Instantaneous CSI can, for example, be used for receive processing (improved interference suppression and simplified detection) and feedback (to employ beamforming and rate adaptation). In this section, we consider MMSE estimation of the channel matrix from the observation during training transmission. In general, the MMSE estimator of a vector x from an observation y is

x̂ = E{x|y} = ∫ x f(x|y) dx   (4)

where E{·} denotes the expected value and f(x|y) is the conditional (posterior) PDF of x given y [23, Section 11.4]. The MMSE estimator minimizes the MSE, E{‖x − x̂‖²}, and the optimal MSE can be calculated as the trace of the covariance matrix of x|y, averaged over y. The MMSE estimator is the Bayesian counterpart to the minimum variance unbiased (MVU) estimator developed for deterministic channels [23, Section 3.4].

By vectorizing the received signal in (2) and applying vec(HP) = (P^T ⊗ I) vec(H), the received training signal of our system can be expressed as

vec(Y) = P̃ vec(H) + vec(N)   (5)

where P̃ ≜ P^T ⊗ I. Then, by pre-subtracting the mean disturbance vec(N̄) from vec(Y), it is straightforward to apply the results of [23, Chapter 15.8] to conclude that the MMSE estimator, Ĥ, of the Rician fading channel matrix is

vec(Ĥ) = vec(H̄) + R P̃^H (P̃ R P̃^H + S)^{-1} (vec(Y) − vec(N̄) − P̃ vec(H̄)).   (6)

The error covariance

C ≜ E{(vec(H) − vec(Ĥ))(vec(H) − vec(Ĥ))^H}

becomes

C = (R^{-1} + P̃^H S^{-1} P̃)^{-1}   (7)

and the MSE is

MSE = tr(C) = tr{(R^{-1} + P̃^H S^{-1} P̃)^{-1}}.   (8)

We stress that the general MMSE estimator in (6) is in fact linear (affine), but nonetheless it has repeatedly been referred to as the linear MMSE (LMMSE) estimator [10]–[12], which is correct but could lead to the incorrect conclusion that there may exist better non-linear estimators. The MMSE estimator in (6) is also the maximum a posteriori (MAP) estimator [23, Chapter 15.8] and the LMMSE estimator in the case of non-Gaussian fading and disturbance (with known first and second order statistics, independent fading and disturbance, and possibly unknown types of distributions [23, Chapter 12.3]).

Note that the computation of (6) only requires a multiplication of the observation with a matrix and the addition of a vector, both of which depend only on the system statistics. Thus, the computational complexity of the estimator is limited.

Remark 1: For Rayleigh fading channels, the MMSE estimator in (6) has the general linear form vec(Ĥ) = A vec(Y). A special kind of linear estimators with the alternative one-sided structure Ĥ = Y W was studied in [8] and [13] and claimed to give rise to LMMSE estimators. In general, this claim is incorrect, which is seen by vectorizing the estimate; vec(Y W) = (W^T ⊗ I) vec(Y) and thus the estimators in [8] and [13] belong to a subset of linear estimators with A = W^T ⊗ I. The general MMSE estimator belongs to this subset when applied to Kronecker-structured systems with identical receive channel and disturbance covariance matrices,¹ while the difference between A and W^T ⊗ I increases with the difference in receive-side correlation and how far from Kronecker-structured the statistics are.

¹In this special case, the estimation of each row of H can be separated into independent problems with identical statistics.
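Under the Kronecker factorization of Definition 1 in the next section (R = R_T^T ⊗ R_R, S = Q^T ⊗ S_R, with R_R and S_R sharing eigenvectors) and a training matrix of the form P = U_T Σ_P U_Q^H, the MSE (8) separates into a sum over per-eigenvalue terms; this separation is what turns the training design of Section IV into a power allocation problem. The sketch below (hypothetical dimensions, B = n_T so that Σ_P is square; the scalar form is our own diagonalization of (8), not a formula quoted from the paper) verifies the identity numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
nT = nR = 3
B = nT                                    # B = nT keeps Sigma_P square

def rand_psd(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + n * np.eye(n)

def eig_sorted(M, descending):
    lam, U = np.linalg.eigh(M)            # eigh returns ascending order
    order = np.argsort(lam)[::-1] if descending else np.argsort(lam)
    return lam[order].real, U[:, order]

RT, Q = rand_psd(nT), rand_psd(B)
lamT, UT = eig_sorted(RT, descending=True)    # channel: strongest first
lamQ, UQ = eig_sorted(Q, descending=False)    # disturbance: weakest first

# Receive-side matrices share eigenvectors, as assumed in Definition 1.
UR = np.linalg.eigh(rand_psd(nR))[1]
lamR, lamS = rng.uniform(0.5, 2.0, nR), rng.uniform(0.5, 2.0, nR)
RR = UR @ np.diag(lamR) @ UR.conj().T
SR = UR @ np.diag(lamS) @ UR.conj().T

p = rng.uniform(0.1, 1.0, nT)                 # arbitrary training powers
P = UT @ np.diag(np.sqrt(p)) @ UQ.conj().T    # P = U_T Sigma_P U_Q^H

# MSE (8) computed from the full matrices ...
R, S = np.kron(RT.T, RR), np.kron(Q.T, SR)
Ptil = np.kron(P.T, np.eye(nR))
mse_mat = np.trace(np.linalg.inv(
    np.linalg.inv(R) + Ptil.conj().T @ np.linalg.inv(S) @ Ptil)).real

# ... and from the eigenvalues alone (any consistent pairing of k works).
mse_eig = sum(1.0 / (1.0 / (lamT[k] * lamR[j]) + p[k] / (lamQ[k] * lamS[j]))
              for k in range(nT) for j in range(nR))
assert np.isclose(mse_mat, mse_eig)
```

Because each term depends on a single power p_k, minimizing the sum subject to Σ_k p_k ≤ 𝒫 is a separable convex problem, which is the structure exploited throughout Section IV.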
IV. TRAINING SEQUENCE OPTIMIZATION FOR CHANNEL MATRIX ESTIMATION

Next, we consider the problem of designing the training sequence to optimize the performance of the MMSE estimator in (6). The performance measure is the MSE and thus, from (8), the optimization problem can be formulated as

minimize_P  tr{(R^{-1} + P̃^H S^{-1} P̃)^{-1}}  subject to  tr(P^H P) ≤ 𝒫.   (9)

Observe that the MSE depends on the training matrix and on the covariance matrices of the channel and disturbance statistics, while it is unaffected by the mean values. Thus, the training matrix can potentially be designed to optimize the performance by adaptation to the second order statistics [9]–[12]. The intuition behind this training optimization is that more power should be allocated to estimate the channel in strong eigendirections (i.e., large eigenvalues). Observe that training optimization is useful in systems with dedicated training for each receiver, while multiuser systems with common training may require fixed or codebook-based training matrices (if users do not have the same channel statistics).

For general channel and disturbance statistics, the MSE minimizing training matrix will not have any special form that can be exploited when solving (9). However, if the covariance matrices R and S are structured, the optimal P may inherit this structure. Previous work in training optimization has shown that in Kronecker-structured systems with either noise-limited [9]–[11] or interference-limited [12] disturbance, the optimal training matrix has a certain structure based on the transmit-side channel covariance and the temporal disturbance covariance. Herein, this result is generalized by showing that the same optimal structure appears in systems with both noise and interference. Then, we will show how the training matrix behaves asymptotically and under which conditions there exist explicit solutions to (9). Finally, we analyze how the statistics and the total training power determine the smallest length of the training sequence necessary to achieve the minimal MSE.

Since the training matrix only affects the channel matrix, H, from the right hand (transmit) side in (2), we consider covariance matrices that also can be separated between the transmit and receive side. Thus, the covariance between the transmit antennas is identical irrespective of where the receiver is located, and vice versa [24]. This model is known as the Kronecker structure and is naturally applicable in uncorrelated systems. In practice, for example, insufficient antenna spacing leads to antenna correlation, but field measurements have verified the Kronecker structure for certain correlated channels [3], [4]. In general, certain weak scattering scenarios can be created and observed where the Kronecker structure is not satisfied [25], and thus the Kronecker model should be seen as a good approximation that enables analysis. We will show numerically in Section VI that training sequences optimized based on this approximation perform well when applied for estimation under general conditions. In our context, we define Kronecker-structured systems in the following way.

Definition 1: In a Kronecker-structured system, the channel covariance, R, and disturbance covariance matrix, S, can be factorized as

R = R_T^T ⊗ R_R,   S = Q^T ⊗ S_R.   (10)

Here, R_T ∈ C^{n_T×n_T} and R_R ∈ C^{n_R×n_R} represent the spatial covariance matrices at the transmitter and receiver side, respectively, while Q ∈ C^{B×B} and S_R ∈ C^{n_R×n_R} represent the temporal covariance matrix and the received spatial covariance matrix.

We also assume that R_R and S_R have identical eigenvectors. This means that the disturbance is either spatially uncorrelated or shares the spatial structure of the channel (i.e., arriving from the same spatial directions). This assumption was first made in [12] for estimation of interference-limited systems. Under this assumption, we can jointly describe several types of disturbance, including the following examples:
• Noise-limited, S = σ²I, with some variance σ²;
• Interference-limited, Q = Σ_l Q_l for a set of interferers with temporal covariance matrices Q_l;²
• Noise and temporally uncorrelated interference, with Q proportional to the identity matrix and S_R combining the noise variance and the spatial interference covariance;
• Noise and spatially uncorrelated interference, with S_R proportional to the identity matrix and the temporal interference structure collected in Q.

²It is worth noting that since a flat and block-fading channel model was assumed in (1), the potential temporal covariance in Q primarily originates from the interfering signals and not from their channels. Also observe that if S_R ≠ I, the interference will be received from the same spatial directions as the training signal.

To simplify the notation, we will use the following eigenvalue decompositions:

R_T = U_T Λ_T U_T^H,   Q = U_Q Λ_Q U_Q^H   (11)

R_R = U_R Λ_R U_R^H,   S_R = U_R Λ_S U_R^H   (12)

where the eigenvalues λ_{T,1}, …, λ_{T,n_T} of R_T and λ_{Q,1}, …, λ_{Q,B} of Q are ordered decreasingly and increasingly, respectively. The diagonal eigenvalue matrices Λ_R = diag(λ_{R,1}, …, λ_{R,n_R}) and Λ_S = diag(λ_{S,1}, …, λ_{S,n_R}) are arbitrarily ordered.

Next, we provide a theorem that derives the general structure of the MSE minimizing training sequence, along with its asymptotic properties.

Theorem 1: Under the Kronecker-structured assumptions, the solution to (9) has the singular value decomposition P = U_T Σ_P U_Q^H, where Σ_P ∈ C^{n_T×B} has √p_1, …, √p_{min(n_T,B)} on its main diagonal. Thereby, the kth strongest channel eigendirection is paired with the kth weakest disturbance eigendirection. The MSE with such a training matrix is convex with respect to the positive training powers p_k, and the MSE minimizing power allocation, p_1, …, p_{min(n_T,B)}, is achieved from the following system of equations:

Σ_{j=1}^{n_R} λ_{T,k}² λ_{R,j}² λ_{Q,k} λ_{S,j} / (λ_{Q,k} λ_{S,j} + p_k λ_{T,k} λ_{R,j})² = ν   (13)
for all k such that p_k > 0, and p_k = 0 otherwise. The Lagrange multiplier ν > 0 is chosen to fulfill the constraint Σ_k p_k = 𝒫.

The limiting training matrix at high power is given by p_k → 𝒫 √λ_{Q,k}/c for all k, where c ≜ Σ_l √λ_{Q,l}. At low power 𝒫, let m be the minimum of the multiplicities of the largest λ_{T,k} and the smallest λ_{Q,k}. Then, the limiting training matrix is given by allocating all power in an arbitrary manner among p_1, …, p_m, while p_k = 0 for k > m.

Proof: The proof is given in Appendix A.

The theorem showed that the MSE minimizing training matrix in Kronecker-structured systems has a special structure based on the eigenvectors of the channel at the transmitter side and the temporal disturbance; the kth strongest channel eigendirection is assigned to the kth weakest disturbance eigendirection (i.e., in opposite order of magnitude). In other words, the strongest channel direction is estimated when the disturbance is as weak as possible (and vice versa). This was proved in [12] for interference-limited systems, and Theorem 1 generalizes it to cover various combinations of noise and interference.

At high training power, the power should be allocated to the statistically strongest eigendirections of the channel, and proportionally to the square root of the weakest eigendirections of the disturbance. At low training power, all power should be allocated in a single direction where a certain combination of strong channel gain and weak disturbance is maximized. These asymptotic results unify previous results, including the special cases of uncorrelated noise [9], [11] and single-antenna receivers [26].

Although the structure of the MSE minimizing training sequence is given in Theorem 1, the solution to the remaining power allocation problem is in general unknown. Since the problem is convex, the solution can however be derived with limited computational effort. The following corollary summarizes results on when the power allocation can be solved explicitly.

Corollary 1: If Λ_T = λ_T I and Λ_Q = λ_Q I, then equal power allocation (p_k = 𝒫/min(n_T, B) for all k) minimizes the MSE. If R_R = S_R, then the MSE minimizing power allocation is given by

p_k = max(0, √(λ_{Q,k} tr(R_R)/ν) − λ_{Q,k}/λ_{T,k})   (14)

where the Lagrange multiplier ν is chosen to fulfill the power constraint Σ_k p_k = 𝒫.

Proof: In the first case, the conditions in (13) are identical for all k and thus the solutions are identical. In the second case, an explicit expression for each p_k can be achieved from (13) since each term of the sum is identical. See [12, Theorem 5.3] for details.

The first part of the corollary represents the case of uncorrelated transmit antennas and temporal disturbance, and has previously been shown in [9] for noise-limited systems. The waterfilling solution in the second part of the corollary was derived in [12] for interference-limited disturbance, but is also valid in noise-limited systems with uncorrelated receive antennas, as was shown in [9]–[11].

Next, we give a theorem that shows how the MSE with an optimal training sequence depends on the spatial correlation at the transmitter and receiver side.

Theorem 2: The MSE with the MSE minimizing training matrix is Schur-concave with respect to the eigenvalues of R_T (for fixed Q). If R_R = S_R, then the MSE is also Schur-concave with respect to the eigenvalues of R_R (for fixed R_T and Q).

Proof: The proof is given in Appendix A.

The interpretation of the theorem is that the MSE with an optimal training matrix will decrease with increasing spatial correlation. This result is intuitive if we consider the extremes: it is easier to estimate the channel in one eigendirection with full training power than in two eigendirections where each receives half the training power. This analytical behavior provides insight into the selection of parameters like the length of the training sequence, B, and the total training power 𝒫; as the spatial correlation increases, less power is required to achieve a given MSE and this power will be concentrated in the most important eigendirections of the channel. This will be further analyzed in Section IV-A.

To summarize the results of this section, we have shown the structure of the MSE minimizing training matrix in Kronecker-structured systems and analyzed the allocation of power between the eigendirections. Based on these results, we propose a heuristic training matrix that can be applied under general system conditions. Observe that even when Kronecker-structured approximations are used in the training sequence design, the general MMSE estimator in (6) should always be applied without these approximations.

Heuristic 1: Let R̃_T and Q̃ be transmit-side and temporal covariance factors that approximate the general R and S. Let their eigenvalue decompositions be R̃_T = Ũ_T Λ̃_T Ũ_T^H and Q̃ = Ũ_Q Λ̃_Q Ũ_Q^H, where the eigenvalues are ordered decreasingly and increasingly, respectively. Then, the training matrix P = Ũ_T Σ_P Ũ_Q^H, with diagonal elements in Σ_P that are calculated by inserting the eigenvalues in Λ̃_T and Λ̃_Q into (14), should provide good performance and minimize the MSE under the Kronecker-structured conditions given in Corollary 1.

It will be illustrated numerically in Section VI that this heuristic training matrix yields good performance, even when the covariance matrices are far from being Kronecker-structured.

A. Optimal Length of Training Sequences

The results of this paper are derived for an arbitrary training sequence length B. Next, we will provide some guidance on how to select this variable under different system statistics, based on the rank of P. Recall from Theorem 1 that all power is allocated in a single eigendirection for low 𝒫 (i.e., rank(P) = 1). Corollary 1 gave a waterfilling solution to the power allocation, and thus strong eigendirections receive more power than weak ones, and only a subset of the powers {p_k} with some cardinality r will receive any power. Under these conditions, the rank of P is equal to r, which in principle means that the training power is spread in the temporal dimension at the r best channel uses
out of the B allocated for training. Unless the disturbance varies heavily over time, it is not worth wasting channel uses just waiting for better disturbance conditions. Thus, we should select B = rank(P). This observation is formalized by the following general theorem.

Theorem 3: Let P = U_P Σ_P V_P^H denote the singular value decomposition of the training matrix for some length B and suppose that Q = I. If rank(P) = r < B, then identical MSE is achieved by the r-dimensional training sequence U_P [Σ_P]_{1:r}. Here, [X]_{m:n} denotes the minor matrix that contains column m to column n of the given matrix X.

Proof: The proof is given in Appendix A.

The interpretation of Theorem 3 is that the optimal training sequence length in noise-limited systems is equal to the rank of P. In this case, optimal means that it is the smallest length that can achieve the minimal MSE. In general, the rank of P can only be determined numerically. In certain Kronecker-structured systems, the rank can however be derived explicitly. This is shown by the following corollary, which also relaxes the requirement of uncorrelated disturbance.

Corollary 2: In a Kronecker-structured system with R_R = S_R, the MSE minimizing training matrix will have full rank min(n_T, B) if the condition in (15) holds, and will otherwise have rank equal to the positive integer r̃ that fulfills the condition in (16).

The rank is small when there is a strong eigenvalue spread in either R_T or Q (i.e., strong spatial or temporal correlation). Even if the disturbance is correlated so that Theorem 3 cannot be applied, the training sequence length can sometimes be reduced towards rank(P) with only a slight degradation in MSE and with an improved overall data throughput. The optimal training sequence length under non-Kronecker conditions will be illustrated numerically in Section VI.

V. MMSE ESTIMATION OF SQUARED CHANNEL NORMS

In many applications, it is of great interest to estimate the squared Frobenius norm ‖H‖² of the channel matrix. This norm corresponds directly to the SINR in space-time block coded (STBC) systems and has a large impact on the SINR in many other types of systems [17], [28]. The channel norm can be estimated indirectly from an estimated channel matrix, for example using the estimator in (6). This will however lead to suboptimal performance and gives poor estimates at low training power (see Section VI). Thus, we consider training-based MMSE estimation of ‖H‖² in this section.

Analysis of the squared channel norm is considerably more involved than for the channel matrix. The next theorem gives a general expression for the MMSE estimator and its MSE, and special expressions for Kronecker-structured systems. In order to derive these expressions, we limit the analysis to training matrices with the structure P = U_T Σ_P U_Q^H. It is our conjecture that the MSE minimizing training matrix has this form,³ as was proved in Theorem 1 for channel matrix estimation. This training matrix structure is also of most practical importance, since the same training signalling will be used to estimate both H and ‖H‖².

Theorem 4: The MMSE estimator of ‖H‖², with the observation Y and training sequence P, is

E{‖H‖² | Y} = ‖vec(Ĥ)‖² + tr(C)   (17)

where vec(Ĥ) is the channel estimate in (6) and C is the error covariance in (7). In addition, if the system is Kronecker-structured, the estimator can be written in the explicit form (18),
where and are the th elements of and , respectively. The corresponding MSE is

(19)

Proof: The proof is given in Appendix A.

The explicit estimator in (18), and its MSE, can also be expressed as matrix multiplications for simplified implementation; see [16] for examples.

low training power can be derived explicitly. Observe that the MSE in (19) depends on the mean value of the channel, while the MSE for channel matrix estimation is independent of the mean. The limiting solutions are however similar in the sense that all power is allocated in a single eigendirection at low power and spread in all spatial directions at high power. The definition of the strongest direction at low training power and the proportional power distribution at large power are however different, which means that the MSE minimizing training matrices usually are different for matrix and squared norm estimation.

The next theorem shows that under certain conditions, the training power allocation can be solved with low complexity, and a unique solution exists if all eigendirections are required to carry a minimal amount of training power.

Corollary 3: If , then the MSE minimizing power allocation is given by either or

(21)
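The qualitative behavior described above, all training power in one eigendirection at low power and power spread over all directions at high power, can be illustrated numerically. The sketch below is not the paper's exact KKT system in (20)–(21); it minimizes the simpler illustrative Kronecker-style MSE sum_i lam_i*sigma2/(lam_i*p_i + sigma2) under a total power constraint, whose solution exhibits the same structure. All numerical values are assumptions for illustration.

```python
import numpy as np

def train_power_alloc(lam, sigma2, P, iters=200):
    """Minimize sum_i lam_i*sigma2/(lam_i*p_i + sigma2) s.t. sum_i p_i = P, p_i >= 0.

    KKT stationarity gives p_i = max(0, sqrt(sigma2/mu) - sigma2/lam_i);
    the Lagrange multiplier mu is found by bisection on the power constraint.
    """
    lam = np.asarray(lam, dtype=float)

    def alloc(mu):
        return np.maximum(0.0, np.sqrt(sigma2 / mu) - sigma2 / lam)

    lo, hi = 1e-12, 1e12  # bracket for the multiplier
    for _ in range(iters):
        mu = np.sqrt(lo * hi)  # geometric bisection
        if alloc(mu).sum() > P:
            lo = mu  # too much power allocated -> increase mu
        else:
            hi = mu
    return alloc(np.sqrt(lo * hi))

lam, sigma2 = [2.0, 1.0, 0.5], 1.0           # illustrative eigenvalues and noise power
p_low = train_power_alloc(lam, sigma2, P=0.1)    # only the strongest direction is active
p_high = train_power_alloc(lam, sigma2, P=100.0)  # all directions carry power
```

At low total power only the strongest eigendirection receives training power, while at high power every direction is active, matching the limiting solutions discussed above.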
1814 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010
Fig. 2. The normalized MSEs of channel matrix estimation as a function of the total training power in a system with the Weichselberger model and the coupling matrix proposed in [29, Eq. 28]. The MMSE estimator with three different training matrices is compared with the one-sided linear estimator.

Fig. 4. The normalized MSEs of channel squared norm estimation as a function of the total training power in a system with uncorrelated receive antennas and a transmit antenna correlation of 0.8. The MMSE estimator is compared with indirect estimation from an MMSE estimated channel matrix for different training matrices.
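The MMSE channel matrix estimator compared in Fig. 2 is, in vectorized form, a standard linear MMSE filter. A minimal numerical sketch, assuming the training model Y = H P + N so that vec(Y) = (P^T kron I) vec(H) + vec(N), with an illustrative Kronecker-structured covariance; all antenna counts, correlation values, and noise levels below are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
nr, nt, B = 2, 3, 4  # receive antennas, transmit antennas, training length (assumed)

# Illustrative Rician statistics of vec(H): mean m and Kronecker covariance R
m = (rng.normal(size=nr * nt) + 1j * rng.normal(size=nr * nt)) / np.sqrt(2)
Rt = 0.7 ** np.abs(np.subtract.outer(np.arange(nt), np.arange(nt)))  # transmit correlation
Rr = np.eye(nr)                                                      # uncorrelated receive side
R = np.kron(Rt, Rr)       # covariance of vec(H) under the Kronecker model (Rt is symmetric)
S = 0.1 * np.eye(nr * B)  # white disturbance covariance

Ptr = (rng.normal(size=(nt, B)) + 1j * rng.normal(size=(nt, B))) / np.sqrt(2)  # training matrix
A = np.kron(Ptr.T, np.eye(nr))  # Y = H Ptr + N  <=>  vec(Y) = A vec(H) + vec(N)

# Linear MMSE filter and its MSE = tr(R - R A^H (A R A^H + S)^{-1} A R)
G = R @ A.conj().T @ np.linalg.inv(A @ R @ A.conj().T + S)
mse = np.trace(R - G @ A @ R).real

# Apply the filter to one synthetic training block
h = m + np.linalg.cholesky(R) @ ((rng.normal(size=nr * nt) + 1j * rng.normal(size=nr * nt)) / np.sqrt(2))
y = A @ h + np.sqrt(0.1) * (rng.normal(size=nr * B) + 1j * rng.normal(size=nr * B)) / np.sqrt(2)
h_hat = m + G @ (y - A @ m)
```

The posterior MSE trace is always strictly smaller than the prior uncertainty tr(R), which is why any nonzero training reduces the estimation error in this linear-Gaussian setting.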
and the total training power. An interesting result was that the optimal training sequence length can be considerably smaller than the number of transmit antennas in systems with strong spatial correlation. This was proved analytically for certain Kronecker-structured systems.

Finally, the framework was extended to MMSE estimation of the squared Frobenius norm of the channel, using the same type of training sequences as for channel matrix estimation. Although the MSE of this estimator can be non-convex, the limiting solutions at high and low training power were derived and it was shown under which conditions the solution can be derived explicitly or with low complexity.

APPENDIX A
COLLECTION OF LEMMAS AND PROOFS

In the appendix, we will first state two lemmas and then apply them when proving the theorems of this paper. The first lemma provides the necessary structure of the training matrix when the weighted sum of MSEs is minimized, and is essentially a generalization of [12, Corollary 5.1], where a single MSE was minimized (i.e., ).

Lemma 1: Let and be positive coefficients, and let and be diagonal matrices with strictly positive elements ordered decreasingly and increasingly, respectively. Then, the optimization problem

(23)

is solved by being a rectangular diagonal matrix that satisfies and gives decreasingly ordered diagonal elements of (i.e., the same order as for ).

Proof: We will derive the structure of the optimal by contradiction; that is, for every that fulfills the constraint we can find a solution that satisfies the given structure and achieves a smaller or identical function value. Observe that the function is strictly convex in each eigenvalue of its argument matrix. Therefore, if the constraint is not fulfilled with equality for a given , we can always achieve a smaller function value by replacing it by for some and still satisfy the constraint.

Suppose that fulfills the constraint with equality, and let its singular value decomposition be denoted . We will first show that can be removed if the diagonal elements of are reordered. For this purpose we introduce and let its singular value decomposition be denoted , where the singular values in are ordered decreasingly. Now, observe that only appears in the cost function as and thus we can modify without affecting the function value. Using the new notation, the power constraint can be expressed as

where denotes the th largest eigenvalue. The last inequality is given in [22, Theorem 20.A.4] and is fulfilled with equality if and only if is diagonal with elements in the opposite order of , which means that would minimize the constraint. For this , we have the relationship

(25)

which is satisfied if and the diagonal values of are ordered such that is in decreasing order. If this is not fulfilled for the given , we can always find a better solution that fulfills them by first reordering the elements of and removing , which will give strict inequality in the constraint. Then, a smaller function value is achieved by scaling the new solution to achieve equality in the constraint. Thus, the optimal solution has the structure , where is ordered as described.

Finally, for a solution of the type , we will show that we can always reduce the function value by selecting . Let , and observe that

(26)

As mentioned in the beginning of the proof, each component of the sum is strictly convex in its eigenvalue. Thus, (26) is a Schur-convex function for all [20, Proposition 2.7]. Recall that is a linear combination of and with positive coefficients for each . Then, we have from [20, Theorem 2.11] that each is minimized when the eigenvalues of and are added together in opposite order. If , we can therefore decrease the function value by replacing it by an identity matrix, without affecting the power constraint.

To summarize, we have shown that for every given , we can reduce the cost function by removing the unitary matrices of its singular value decomposition, reordering the diagonal elements, and scaling the remaining matrix to satisfy the constraint with equality.

The next lemma provides a simple condition to determine if a function that originates from an optimal power allocation is Schur-convex or Schur-concave.

Lemma 2: Consider a continuous and twice continuously differentiable function of two non-negative vectors and . For every that is convex and the Hessian and all its square minors are non-singular with respect to , the solution to the optimization

(27)

is differentiable. The partial derivatives of the solution at optimal power allocation are

(28)
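The classification in Lemma 2 rests on Schur's condition (the Schur–Ostrowski criterion used in (32)): a symmetric function is Schur-convex if and only if (x1 - x2)(df/dx1 - df/dx2) >= 0, and Schur-concave with the inequality reversed. The criterion is easy to check numerically; the sketch below uses two textbook example functions, not the MSE expressions of the paper.

```python
import numpy as np

def schur_indicator(f, x, eps=1e-6):
    """Return (x1 - x2) * (df/dx1 - df/dx2) at x, via central differences.

    Non-negative for Schur-convex f and non-positive for Schur-concave f
    (Schur-Ostrowski criterion, for symmetric f).
    """
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return (x[0] - x[1]) * (g[0] - g[1])

f_convex = lambda x: float(np.sum(x ** 2))       # sum of squares: Schur-convex
f_concave = lambda x: float(np.sum(np.sqrt(x)))  # sum of square roots: Schur-concave
x = np.array([3.0, 1.0])
```

For the point x = (3, 1), the indicator is positive for the Schur-convex function and negative for the Schur-concave one, mirroring the two cases at the end of Lemma 2.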
Proof: Since the cost function is convex with respect to for every given and the domain of is closed, the Karush-Kuhn-Tucker (KKT) conditions guarantee the existence of one or several solutions to (27), and these are given by the following system of stationarity equations

(29)

(31)

where the last equality follows from the fact that implies that . Thus, we have proved (28). The last sentence of the lemma follows directly from Schur's condition in [22, Theorem 3.A.4], which states that is Schur-convex if and only if

(32)

for all and , and Schur-concave if the conditions are fulfilled with inverted inequalities.

Finally, we give the proofs of Theorems 1–5 and Corollary 3.

Proof of Theorem 1: First, we derive the structure of the MSE minimizing training matrix. For Kronecker-structured systems, the MSE can be expressed as . By taking the conjugate transpose of the training transmission model in (2) and then applying the results of [23, Chapter 15.8] in the same manner as in Section III, we achieve an alternative expression of the MSE:

(35)

which is minimized by for all (using straightforward Lagrangian methods). At low power, we approximate (34) as

(36)

using a first order Taylor polynomial. This expression is minimized by assigning all power in an arbitrary manner among the strongest term/terms of the second sum.

Proof of Theorem 2: First, we will prove that the MSE in (34) is Schur-concave with respect to the eigenvalues . It is straightforward to show that the MSE is convex in the power allocation, differentiable with respect to and for
all , and that the determinant of the Hessian is non-zero if the eigenvalues of and are distinct. Thus, we can apply Lemma 2. According to the lemma, it is sufficient to show that for all such that , where MSE denotes the pre-optimization MSE in (34) evaluated at the optimal solution. Thus, we can calculate the partial derivatives of (34) as

(37)

and for . Observe that the derivatives are positive and that and only appear in the denominator of (37). From Theorem 1, we have that whenever . Hence, it follows that and that the MSE is Schur-concave.

Next, we have the case when , and then the MSE in (34) can be expressed as

(39)

The theorem follows from the fact that (39) is independent of and that .

Proof of Corollary 2: The rank of is equal to the number of active training powers . From Theorem 1, we have that the th training power is active if and only if . Suppose we only have active training powers, then . Substitution into the power constraint gives

estimation. We can therefore use the shorter training sequence without any loss in performance.

Proof of Theorem 4: In the general case, the integral expression of the MMSE estimator in (17) follows directly from the definition of by exploiting that the posterior distribution, , is complex Gaussian distributed with the mean and covariance matrix derived in [23, Chapter 15.8].

To derive the explicit expressions, we begin with the one-dimensional case with the received signal , where is the training signal, , and . Using Bayes' formula or [23, Chapter 15.8], the posterior distribution can be expressed as

(41)

where , , , and is the modified Bessel function of the first kind. The last equality in (42) follows by applying the formula [33, Eq. 8.431.3]. The first and second order central moments of are

(43)
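The one-dimensional case in the proof of Theorem 4 admits a simple sanity check: since the posterior of h given y is complex Gaussian, say CN(mu, sigma^2), the MMSE estimate of |h|^2 is E[|h|^2 | y] = |mu|^2 + sigma^2. The sketch below verifies this by Monte Carlo for an assumed scalar Rician prior; all constants are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 1.0 + 0.5j, 0.8   # prior h ~ CN(m, r), assumed values
p, s = 2.0, 0.5          # training signal p and noise variance s; y = p*h + n

# Draw one training observation
h = m + np.sqrt(r / 2) * (rng.normal() + 1j * rng.normal())
y = p * h + np.sqrt(s / 2) * (rng.normal() + 1j * rng.normal())

# Gaussian posterior of h given y (standard scalar linear MMSE update)
k = r * np.conj(p) / (abs(p) ** 2 * r + s)
mu = m + k * (y - p * m)
var = r - (k * p * r).real   # posterior variance (real scalar)

est = abs(mu) ** 2 + var     # MMSE estimate of |h|^2

# Monte Carlo check: average |h|^2 over posterior samples
z = (rng.normal(size=200000) + 1j * rng.normal(size=200000)) * np.sqrt(var / 2)
mc = np.mean(np.abs(mu + z) ** 2)
```

The Monte Carlo average agrees with the closed form to within sampling error, illustrating why the squared-norm estimator reduces to posterior moments of a (noncentral) quadratic form.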
from a MIMO transformation of (43), with replaced with its average.

Proof of Theorem 5: A function is convex if and only if its second derivative is non-negative. The second derivative of the MSE in (19) with respect to is

(46)

using first order Taylor expansions of the denominators and disregarding terms with in the numerator. Hence, the MSE is minimized by allocating all the power to the associated with the largest . If there is multiplicity in the largest value of the sum, the power can be allocated freely among these eigendirections.

Proof of Corollary 3: The condition means that for all , and therefore we can remove the dependence of in the denominator of (20). For all active training powers , the remaining expression in (20) can be formulated as a third degree polynomial equation in : , using the notation . Its three solutions ( , 0, 1) are

(47)

where . Observe that this expression has the form , where are positive real-valued constants. Thus, in order for any of the solutions to be real-valued we need . If , this condition can be expressed as

(48)

(50)

where the multiplicative term outside the brackets is positive for all and . The bracketed term can be expressed as for . Then, the intervals follow from the observation that and .

Finally, we see that the second derivative of the MSE in (44) is positive if we limit ourselves to , since then each term in the sum is positive. Thus, the MSE will be convex with respect to these and the KKT conditions in (20) become necessary and sufficient. In the special case , we can strengthen the condition since we know that the necessary KKT conditions only give a single feasible solution if . In both the general and special case, these conditions need to be combined with the original constraint .
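Numerically, the third-degree polynomial step in the proof of Corollary 3 amounts to, per active eigendirection, solving a cubic and keeping the real non-negative root. With numpy this is mechanical; the coefficients below are hypothetical stand-ins for the actual KKT coefficients in (20), chosen so the roots are known by construction.

```python
import numpy as np

# Hypothetical cubic c3*p^3 + c2*p^2 + c1*p + c0 = 0 from a KKT stationarity condition
coeffs = [1.0, -2.0, -1.0, 2.0]          # (p - 2)(p - 1)(p + 1): roots 2, 1, -1
roots = np.roots(coeffs)
real = roots[np.abs(roots.imag) < 1e-9].real
feasible = real[real >= 0]               # a training power must be non-negative
p_star = feasible.max()                  # keep the feasible real root
```

Whether all three roots are real is exactly the discriminant-type condition discussed above; in the complex case only the single real root can be a candidate training power.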