
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010, 1807

A Framework for Training-Based Estimation in Arbitrarily Correlated Rician MIMO Channels With Rician Disturbance
Emil Björnson, Student Member, IEEE, and Björn Ottersten, Fellow, IEEE

Abstract—In this paper, we create a framework for training-based channel estimation under different channel and interference statistics. The minimum mean square error (MMSE) estimator for channel matrix estimation in Rician fading multi-antenna systems is analyzed, with particular focus on the design of mean square error (MSE) minimizing training sequences. By considering Kronecker-structured systems with a combination of noise and interference and arbitrary training sequence length, we collect and generalize several previous results in the framework. We clarify the conditions for achieving the optimal training sequence structure and show when the spatial training power allocation can be solved explicitly. We also prove that spatial correlation improves the estimation performance and establish how it determines the optimal training sequence length. The analytic results for Kronecker-structured systems are used to derive a heuristic training sequence under general unstructured statistics.

The MMSE estimator of the squared Frobenius norm of the channel matrix is also derived and shown to provide far better gain estimates than other approaches. It is shown under which conditions training sequences that minimize the non-convex MSE can be derived explicitly or with low complexity. Numerical examples are used to evaluate the performance of the two estimators for different training sequences and system statistics. We also illustrate how the optimal length of the training sequence can often be shorter than the number of transmit antennas.

Index Terms—Arbitrary correlation, channel matrix estimation, majorization, MIMO systems, MMSE estimation, norm estimation, Rician fading, training sequence optimization.

Manuscript received September 21, 2009; accepted October 25, 2009. First published November 24, 2009; current version published February 10, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Amir Leshem. This work was supported in part by the ERC under FP7 Grant Agreement No. 228044 and the FP6 project Cooperative and Opportunistic Communications in Wireless Networks (COOPCOM), Project No. FP6-033533. This work was also partly performed in the framework of the CELTIC project CP5-026 WINNER+. Parts of this work were previously presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 19-24, 2009.

E. Björnson is with the Signal Processing Laboratory, ACCESS Linnaeus Center, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden (e-mail: [email protected]).

B. Ottersten is with the Signal Processing Laboratory, ACCESS Linnaeus Center, Royal Institute of Technology (KTH), SE-100 44 Stockholm, Sweden, and also with securityandtrust.lu, University of Luxembourg, L-1359 Luxembourg-Kirchberg, Luxembourg (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2009.2037352

1053-587X/$26.00 © 2010 IEEE

Authorized licensed use limited to: National Taiwan University. Downloaded on March 09,2023 at 05:21:44 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

WIRELESS communication systems with antenna arrays at both the transmitter and the receiver have gained much attention due to their potential of greatly improving the performance over single-antenna systems. In flat fading systems, the capacity and spectral efficiency have been shown to increase rapidly with the number of antennas [1], [2].

These results are based on the idealized assumption of full channel state information (CSI) and independent and identically distributed (i.i.d.) channel coefficients. In practice, field measurements have shown that the channel coefficients are often spatially correlated in outdoor scenarios [3], but correlation also frequently occurs in indoor environments [4], [5]. When it comes to acquiring CSI, the long-term statistics can usually be regarded as known, through reverse-link estimation or a negligible signaling overhead [6]. Instantaneous CSI, however, needs to be estimated with limited resources (time and power) due to the channel fading and interference.

In this paper, we consider training-based estimation of instantaneous CSI in multiple-input multiple-output (MIMO) systems. Thus, the estimation is conditioned on the received signal from a known training sequence, which potentially can be adapted to the long-term statistics. By nature, the channel is stochastic, which motivates Bayesian estimation—that is, modeling of the current channel state as a realization from a known multi-variate probability density function (PDF). There is also a large amount of literature on estimation of deterministic MIMO channels, which is analytically tractable but in general provides less accurate channel estimates, as shown in [7], [8]. Herein, we concentrate on minimum mean square error (MMSE) estimation of the channel matrix and its squared Frobenius norm, given the first and second order system statistics.

Training-based MMSE estimation of MIMO channel matrices has previously been considered for Kronecker-structured Rayleigh fading systems that are either noise-limited [9]–[11] or interference-limited [12]. In these papers, optimization of the training sequence was considered under various limitations on the long-term statistics, and analogous structures of the optimal training sequence were derived. These results reduce the training optimization to a convex power allocation problem that can be solved explicitly in some special cases. Regarding previous work, it is worth noting that simplified channel matrix estimators have been developed in [8] and [13] and claimed to be MMSE estimators, but we show herein that these estimators are in general restrictive.

In the present work, we collect previous results in a framework with general system properties and arbitrary length of the training sequence. The MMSE estimator is given for Kronecker-structured Rician fading channels that are corrupted by some Gaussian disturbance, where disturbance denotes a combination of noise and interference. The purpose of our framework is to enable joint analysis of different types of disturbance, including the noise-limited and interference-limited scenarios considered in [9]–[12] and certain combinations of both noise and interference. In this manner, we show that the MSE minimizing training sequence has the same structure and asymptotic properties under a wide range of different disturbance statistics. We give statistical conditions for finding the optimal training sequence explicitly, and propose a heuristic solution under general unstructured statistics. Finally, we prove analytically that the MSE decreases with increasing spatial correlation at both the transmitter and the receiver side. Based on this observation, we show that the optimal number of training symbols can be considerably fewer than the number of transmit antennas in correlated systems. This result is a generalization of [14], where completely uncorrelated systems were considered, and similar observations have been made in [15], [16].

Although estimation of the channel matrix is important for receive and transmit processing, knowledge of the squared Frobenius norm of the channel matrix provides instantaneous gain information and can be exploited for rate adaptation and scheduling [17], [18]. The squared norm can be determined indirectly from an estimated channel matrix, but as shown in [16] this approach gives poor estimation performance at most signal-to-interference-and-noise ratios (SINRs). The MMSE estimator of the squared channel norm was introduced in [16] for Kronecker-structured Rayleigh fading channels, assuming the same training structure as for channel matrix estimation. Herein, the estimator is proved and generalized to Rician fading channels, along with the design of MSE minimizing training sequences. Although the MSE is non-convex, we show that the optimal training sequence can be determined with limited complexity.

A. Outline

In Section II, the system model and the training-based estimation framework are introduced. The MMSE channel matrix estimator is given and discussed in Section III for arbitrary training sequences. In Section IV, MSE minimizing training sequence design is considered. The general structure and asymptotic properties are derived. It is also shown under which covariance conditions there exist explicit solutions, and how the estimation performance and the optimal length of the training sequence vary with the spatial correlation. Section V derives the MMSE estimator of the squared channel norm and analyzes training sequence design with respect to its MSE. The error performance of the different estimators is illustrated numerically in Section VI and conclusions are drawn in Section VII. Finally, proofs of the theorems are given in Appendix A.

B. Notations

Boldface lower case is used for column vectors, e.g., $\mathbf{x}$, and upper case for matrices, e.g., $\mathbf{X}$. Let $\mathbf{X}^T$, $\mathbf{X}^H$, and $\mathbf{X}^*$ denote the transpose, the conjugate transpose, and the conjugate of $\mathbf{X}$, respectively. The Kronecker product of two matrices $\mathbf{X}$ and $\mathbf{Y}$ is denoted $\mathbf{X} \otimes \mathbf{Y}$, $\mathrm{vec}(\mathbf{X})$ is the column vector obtained by stacking the columns of $\mathbf{X}$, $\mathrm{tr}(\mathbf{X})$ is the matrix trace, and $\mathrm{diag}(x_1, \ldots, x_N)$ is the $N$-by-$N$ diagonal matrix with $x_1, \ldots, x_N$ at the main diagonal. The squared Frobenius norm of a matrix $\mathbf{X}$ is denoted $\|\mathbf{X}\|_F^2$ and is defined as the sum of the squared absolute values of all the elements. The functions $\max(\cdot)$ and $\min(\cdot)$ give the maximal and minimal value of the input parameters, respectively. $\mathcal{CN}(\bar{\mathbf{x}}, \mathbf{C})$ is used to denote circularly symmetric complex Gaussian random vectors, where $\bar{\mathbf{x}}$ is the mean and $\mathbf{C}$ the covariance matrix. The notation $\triangleq$ is used for definitions.

II. SYSTEM MODEL

We consider flat and block-fading MIMO systems with a transmitter equipped with an array of $N_t$ transmit antennas and a receiver with an array of $N_r$ receive antennas. The symbol-sampled complex baseband equivalent of the flat fading channel when transmitting at channel use $t$ is modeled as

$$\mathbf{y}(t) = \mathbf{H}\mathbf{x}(t) + \mathbf{n}(t) \qquad (1)$$

where $\mathbf{x}(t) \in \mathbb{C}^{N_t}$ and $\mathbf{y}(t) \in \mathbb{C}^{N_r}$ are the transmitted and received signals, respectively, and $\mathbf{n}(t) \in \mathbb{C}^{N_r}$ represents arbitrarily correlated Gaussian disturbance. This disturbance models the sum of background noise and interference from adjacent communication links and is a stochastic process in $t$. The channel is represented by $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ and is modeled as Rician fading with mean $\bar{\mathbf{H}}$ and the positive definite covariance matrix $\mathbf{R} \in \mathbb{C}^{N_t N_r \times N_t N_r}$, which is defined on the column stacking of the channel matrix. Thus, $\mathrm{vec}(\mathbf{H}) \in \mathcal{CN}(\mathrm{vec}(\bar{\mathbf{H}}), \mathbf{R})$. In the estimation parts of this paper, the channel and disturbance statistics are known at the receiver. In the training sequence design, the statistics are also known to the transmitter.

Herein, estimation of the channel matrix $\mathbf{H}$ and its squared Frobenius norm $\|\mathbf{H}\|_F^2$ are considered. The receiver knows the long-term statistics, but in order to estimate the value of some function of the unknown realization of $\mathbf{H}$, the transmitter typically needs to send a sequence of known training vectors that spans $\mathbb{C}^{N_t}$. We consider training sequences of arbitrary length $B$ under a total power constraint, and in Section IV-A the optimal value of $B$ is studied.

Let the training matrix $\mathbf{P} \in \mathbb{C}^{N_t \times B}$ represent the training sequence. This matrix fulfills the total power constraint $\mathrm{tr}\{\mathbf{P}\mathbf{P}^H\} \leq \mathcal{P}$ and its maximal rank is $\min(B, N_t)$, which represents the maximal number of spatial channel directions that the training can excite. The columns of $\mathbf{P}$ are used as transmit signal in (1) for $B$ channel uses (e.g., $\mathbf{x}(t) = \mathbf{p}_t$, the $t$th column of $\mathbf{P}$). The combined received matrix $\mathbf{Y} \triangleq [\mathbf{y}(1), \ldots, \mathbf{y}(B)]$ of the training transmission is

$$\mathbf{Y} = \mathbf{H}\mathbf{P} + \mathbf{N} \qquad (2)$$

where the combined disturbance matrix $\mathbf{N} \triangleq [\mathbf{n}(1), \ldots, \mathbf{n}(B)]$ is uncorrelated with the channel $\mathbf{H}$. The disturbance is modeled as $\mathrm{vec}(\mathbf{N}) \in \mathcal{CN}(\mathrm{vec}(\bar{\mathbf{N}}), \mathbf{S})$, where $\mathbf{S}$ is the positive definite covariance matrix and $\bar{\mathbf{N}}$ is the mean disturbance.

The multipath propagation is modeled as quasi-static block fading; that is, the channel realization is constant during


the whole training transmission and independent of previous channel estimates.

A. Preliminaries on Spatial Correlation and Majorization

A measure of the spatial channel correlation is the eigenvalue distribution of the channel covariance matrix; weak correlation is represented by almost identical eigenvalues, while strong correlation means that a few eigenvalues dominate. Thus, in a highly correlated system, the channel is approximately confined to a small eigensubspace, while all eigenvectors are equally important in an uncorrelated system. In urban cellular systems, base stations are typically elevated and exposed to little near-field scattering. Thus, their antennas are strongly spatially correlated, while the non-line-of-sight mobile users are exposed to rich scattering and have weak antenna correlation if the antenna spacing is sufficiently large [19].

The notion of majorization provides a useful measure of the spatial correlation [20]–[22] and will be used herein for various purposes. Let $\mathbf{x} = [x_1, \ldots, x_N]^T$ and $\mathbf{y} = [y_1, \ldots, y_N]^T$ be two non-negative real-valued vectors of arbitrary length $N$. We say that $\mathbf{x}$ majorizes $\mathbf{y}$ if

$$\sum_{k=1}^{m} x_{[k]} \geq \sum_{k=1}^{m} y_{[k]}, \quad m = 1, \ldots, N-1, \qquad \text{and} \qquad \sum_{k=1}^{N} x_{[k]} = \sum_{k=1}^{N} y_{[k]} \qquad (3)$$

where $x_{[k]}$ and $y_{[k]}$ are the $k$th largest ordered elements of $\mathbf{x}$ and $\mathbf{y}$, respectively. This majorization property is denoted $\mathbf{x} \succeq \mathbf{y}$. If $\mathbf{x}$ and $\mathbf{y}$ contain eigenvalues of channel covariance matrices, then $\mathbf{x} \succeq \mathbf{y}$ corresponds to $\mathbf{x}$ being more spatially correlated than $\mathbf{y}$. Majorization only provides a partial order of vectors, but is still very powerful due to its connection to certain order-preserving functions:

A function $f(\cdot)$ is said to be Schur-convex if $f(\mathbf{x}) \geq f(\mathbf{y})$ for all $\mathbf{x}$ and $\mathbf{y}$ such that $\mathbf{x} \succeq \mathbf{y}$. Similarly, $f(\cdot)$ is said to be Schur-concave if $\mathbf{x} \succeq \mathbf{y}$ implies that $f(\mathbf{x}) \leq f(\mathbf{y})$.

III. MMSE ESTIMATION OF CHANNEL MATRICES

There are many reasons for estimating the channel matrix at the receiver. Instantaneous CSI can, for example, be used for receive processing (improved interference suppression and simplified detection) and feedback (to employ beamforming and rate adaptation). In this section, we consider MMSE estimation of the channel matrix from the observation during training transmission. In general, the MMSE estimator of a vector $\mathbf{h}$ from an observation $\mathbf{y}$ is

$$\hat{\mathbf{h}}_{\mathrm{MMSE}} = E\{\mathbf{h} \,|\, \mathbf{y}\} = \int \mathbf{h}\, f(\mathbf{h} \,|\, \mathbf{y})\, d\mathbf{h} \qquad (4)$$

where $E\{\cdot\}$ denotes the expected value and $f(\mathbf{h} \,|\, \mathbf{y})$ is the conditional (posterior) PDF of $\mathbf{h}$ given $\mathbf{y}$ [23, Section 11.4]. The MMSE estimator minimizes the MSE, $E\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\}$, and the optimal MSE can be calculated as the trace of the covariance matrix of $\mathbf{h}$ averaged over $\mathbf{y}$. The MMSE estimator is the Bayesian counterpart to the minimum variance unbiased (MVU) estimator developed for deterministic channels [23, Section 3.4].

By vectorizing the received signal in (2) and applying $\mathrm{vec}(\mathbf{H}\mathbf{P}) = (\mathbf{P}^T \otimes \mathbf{I}_{N_r})\,\mathrm{vec}(\mathbf{H})$, the received training signal of our system can be expressed as

$$\mathrm{vec}(\mathbf{Y}) = \tilde{\mathbf{P}}\,\mathrm{vec}(\mathbf{H}) + \mathrm{vec}(\mathbf{N}) \qquad (5)$$

where $\tilde{\mathbf{P}} \triangleq \mathbf{P}^T \otimes \mathbf{I}_{N_r}$. Then, by pre-subtracting the mean disturbance $\mathrm{vec}(\bar{\mathbf{N}})$ from $\mathrm{vec}(\mathbf{Y})$, it is straightforward to apply the results of [23, Chapter 15.8] to conclude that the MMSE estimator, $\hat{\mathbf{H}}$, of the Rician fading channel matrix is

$$\mathrm{vec}(\hat{\mathbf{H}}) = \mathrm{vec}(\bar{\mathbf{H}}) + \mathbf{R}\tilde{\mathbf{P}}^H \big(\tilde{\mathbf{P}}\mathbf{R}\tilde{\mathbf{P}}^H + \mathbf{S}\big)^{-1} \big(\mathrm{vec}(\mathbf{Y}) - \mathrm{vec}(\bar{\mathbf{N}}) - \tilde{\mathbf{P}}\,\mathrm{vec}(\bar{\mathbf{H}})\big) \qquad (6)$$

The error covariance $\mathbf{C} \triangleq E\{(\mathrm{vec}(\mathbf{H}) - \mathrm{vec}(\hat{\mathbf{H}}))(\mathrm{vec}(\mathbf{H}) - \mathrm{vec}(\hat{\mathbf{H}}))^H\}$ becomes

$$\mathbf{C} = \big(\mathbf{R}^{-1} + \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}}\big)^{-1} \qquad (7)$$

and the MSE is

$$\mathrm{MSE} = \mathrm{tr}(\mathbf{C}) = \mathrm{tr}\Big\{\big(\mathbf{R}^{-1} + \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}}\big)^{-1}\Big\} \qquad (8)$$

We stress that the general MMSE estimator in (6) is in fact linear (affine), but nonetheless it has repeatedly been referred to as the linear MMSE (LMMSE) estimator [10]–[12], which is correct but could lead to the incorrect conclusion that there may exist better non-linear estimators. The MMSE estimator in (6) is also the maximum a posteriori (MAP) estimator [23, Chapter 15.8] and the LMMSE estimator in the case of non-Gaussian fading and disturbance (with known first and second order statistics, independent fading and disturbance, and possibly unknown types of distributions [23, Chapter 12.3]).

Note that the computation of (6) only requires a multiplication of the observation with a matrix and adding a vector, both of which depend only on the system statistics. Thus, the computational complexity of the estimator is limited.

Remark 1: For Rayleigh fading channels, the MMSE estimator in (6) has the general linear form $\mathrm{vec}(\hat{\mathbf{H}}) = \mathbf{A}\,\mathrm{vec}(\mathbf{Y})$. A special kind of linear estimators with the alternative structure $\hat{\mathbf{H}} = \mathbf{Y}\mathbf{B}$ were studied in [8] and [13] and claimed to give rise to LMMSE estimators. In general, this claim is incorrect, which is seen by vectorizing the estimate; $\mathrm{vec}(\hat{\mathbf{H}}) = (\mathbf{B}^T \otimes \mathbf{I}_{N_r})\,\mathrm{vec}(\mathbf{Y})$, and thus the estimators in [8] and [13] belong to a subset of linear estimators with $\mathbf{A} = \mathbf{B}^T \otimes \mathbf{I}_{N_r}$. The general MMSE estimator belongs to this subset when applied to Kronecker-structured systems with identical receive channel and disturbance covariance matrices,¹ while the difference between the two forms increases with the difference in receive-side correlation and how far from Kronecker-structured the statistics are.

¹In this special case, the estimation of each row of $\mathbf{H}$ can be separated into independent problems with identical statistics.
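The majorization order defined in (3) is straightforward to check numerically. Below is a minimal sketch (the function name `majorizes` is our own, not from the paper) that tests the partial-sum and equal-total conditions for two non-negative vectors:

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """Return True if the non-negative vector x majorizes y, cf. (3)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]  # decreasing order
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    if xs.shape != ys.shape:
        raise ValueError("vectors must have equal length")
    # Partial sums of the m largest elements of x must dominate those of y...
    partial = np.cumsum(xs) - np.cumsum(ys)
    # ...and the total sums must be equal.
    return bool(np.all(partial >= -tol) and abs(partial[-1]) <= tol)

# A "spatially correlated" eigenvalue profile (few dominating eigenvalues)
# majorizes a flatter profile with the same total power:
strong = [3.0, 0.5, 0.5]
weak = [2.0, 1.0, 1.0]
```

Here `majorizes(strong, weak)` is true while `majorizes(weak, strong)` is false, matching the interpretation that `strong` is the more spatially correlated eigenvalue profile.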
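To make the estimator concrete, the expressions (5)–(8) can be exercised numerically. The sketch below (dimensions, variable names, and the random statistics are our own illustration, not from the paper) draws one training transmission $\mathbf{Y} = \mathbf{H}\mathbf{P} + \mathbf{N}$ and forms the MMSE estimate and its MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
Nr, Nt, B = 2, 3, 4  # assumed small dimensions for illustration

def random_cov(n):
    """Random positive definite covariance matrix."""
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + n * np.eye(n)

R = random_cov(Nt * Nr)                 # channel covariance, of vec(H)
S = random_cov(B * Nr)                  # disturbance covariance, of vec(N)
H_mean = rng.standard_normal((Nr, Nt))  # Rician mean component
N_mean = np.zeros((Nr, B))              # mean disturbance
P = rng.standard_normal((Nt, B))        # some training matrix

# vec(HP) = (P^T kron I) vec(H): the "effective" training matrix of (5)
P_tilde = np.kron(P.T, np.eye(Nr))

def mmse_channel_estimate(Y, P_tilde, R, S, H_mean, N_mean):
    """Affine MMSE estimate of vec(H) from vec(Y), cf. (6)-(7)."""
    innovation = (Y - N_mean).flatten('F') - P_tilde @ H_mean.flatten('F')
    gain = R @ P_tilde.conj().T @ np.linalg.solve(
        P_tilde @ R @ P_tilde.conj().T + S, innovation)
    h_hat = H_mean.flatten('F') + gain
    # Error covariance C = (R^{-1} + P~^H S^{-1} P~)^{-1}
    C = np.linalg.inv(np.linalg.inv(R)
                      + P_tilde.conj().T @ np.linalg.solve(S, P_tilde))
    return h_hat, C

# Simulate one training transmission Y = H P + N
L = np.linalg.cholesky(R)
h = H_mean.flatten('F') + L @ (rng.standard_normal(Nt * Nr)
                               + 1j * rng.standard_normal(Nt * Nr)) / np.sqrt(2)
H = h.reshape((Nr, Nt), order='F')
Ls = np.linalg.cholesky(S)
n = Ls @ (rng.standard_normal(B * Nr)
          + 1j * rng.standard_normal(B * Nr)) / np.sqrt(2)
Y = H @ P + n.reshape((Nr, B), order='F')

h_hat, C = mmse_channel_estimate(Y, P_tilde, R, S, H_mean, N_mean)
mse = np.trace(C).real  # the MSE in (8)
```

As a sanity check, the MSE is always smaller than the prior uncertainty, `np.trace(R)`, since training can only reduce the error covariance.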

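The trace formula in (8) simplifies dramatically for Kronecker-factorized covariances (formalized in Definition 1 below): when the training matrix shares the eigenvectors of the transmit-side channel and temporal disturbance factors, the MSE separates into a sum over per-eigenvalue terms. The sketch below (our own notation and helper names, under the assumption that the receive-side channel and disturbance factors share eigenvectors) verifies this separation numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
Nt, Nr, B = 3, 2, 3

def random_unitary(n):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n))
                        + 1j * rng.standard_normal((n, n)))
    return Q

# Kronecker factors: receive-side factors R_R and S_R share eigenvectors U_R.
U_T, U_Q, U_R = random_unitary(Nt), random_unitary(B), random_unitary(Nr)
lam_T = np.array([2.0, 1.0, 0.5])   # transmit channel eigenvalues, decreasing
lam_Q = np.array([0.4, 0.8, 1.2])   # temporal disturbance eigenvalues, increasing
lam_R = np.array([1.5, 0.5])        # receive channel eigenvalues
lam_RS = np.array([1.0, 2.0])       # receive disturbance eigenvalues

R_T = U_T @ np.diag(lam_T) @ U_T.conj().T
S_Q = U_Q @ np.diag(lam_Q) @ U_Q.conj().T
R_R = U_R @ np.diag(lam_R) @ U_R.conj().T
S_R = U_R @ np.diag(lam_RS) @ U_R.conj().T

R = np.kron(R_T.T, R_R)   # channel covariance of vec(H)
S = np.kron(S_Q.T, S_R)   # disturbance covariance of vec(N)

# Training matrix aligned with the factor eigenvectors, powers p_k
p = np.array([2.0, 1.0, 0.5])
P = U_T @ np.diag(np.sqrt(p)) @ U_Q.conj().T
P_tilde = np.kron(P.T, np.eye(Nr))

# MSE from (8), evaluated numerically...
mse_numeric = np.trace(np.linalg.inv(
    np.linalg.inv(R) + P_tilde.conj().T @ np.linalg.solve(S, P_tilde))).real

# ...and the corresponding closed-form eigenvalue expression
mse_closed = sum(1.0 / (1.0 / (lt * lr) + pk / (lq * ls))
                 for lt, lq, pk in zip(lam_T, lam_Q, p)
                 for lr, ls in zip(lam_R, lam_RS))
```

The two values agree to numerical precision, which is the decoupling that makes the training optimization in Section IV a tractable power allocation problem.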

IV. TRAINING SEQUENCE OPTIMIZATION FOR CHANNEL MATRIX ESTIMATION

Next, we consider the problem of designing the training sequence to optimize the performance of the MMSE estimator in (6). The performance measure is the MSE and thus, from (8), the optimization problem can be formulated as

$$\min_{\mathbf{P}:\, \mathrm{tr}\{\mathbf{P}\mathbf{P}^H\} \leq \mathcal{P}}\; \mathrm{tr}\Big\{\big(\mathbf{R}^{-1} + \tilde{\mathbf{P}}^H \mathbf{S}^{-1} \tilde{\mathbf{P}}\big)^{-1}\Big\} \qquad (9)$$

Observe that the MSE depends on the training matrix and on the covariance matrices of the channel and disturbance statistics, while it is unaffected by the mean values. Thus, the training matrix can potentially be designed to optimize the performance by adaptation to the second order statistics [9]–[12]. The intuition behind this training optimization is that more power should be allocated to estimate the channel in strong eigendirections (i.e., large eigenvalues). Observe that training optimization is useful in systems with dedicated training for each receiver, while multiuser systems with common training may require fixed or codebook-based training matrices (if users do not have the same channel statistics).

For general channel and disturbance statistics, the MSE minimizing training matrix will not have any special form that can be exploited when solving (9). However, if the covariance matrices $\mathbf{R}$ and $\mathbf{S}$ are structured, the optimal $\mathbf{P}$ may inherit this structure. Previous work in training optimization has shown that in Kronecker-structured systems with either noise-limited [9]–[11] or interference-limited [12] disturbance, the optimal training matrix has a certain structure based on the transmit-side channel covariance and temporal disturbance covariance. Herein, this result is generalized by showing that the same optimal structure appears in systems with both noise and interference. Then, we will show how the training matrix behaves asymptotically and under which conditions there exist explicit solutions to (9). Finally, we analyze how the statistics and total training power determine the smallest length of the training sequence necessary to achieve the minimal MSE.

Since the training matrix only affects the channel matrix, $\mathbf{H}$, from the right hand (transmit) side in (2), we consider covariance matrices that also can be separated between the transmit and receive side. Thus, the covariance between the transmit antennas is identical irrespective of where the receiver is located, and vice versa [24]. This model is known as the Kronecker structure and is naturally applicable in uncorrelated systems. In practice, for example, insufficient antenna spacing leads to antenna correlation, but field measurements have verified the Kronecker structure for certain correlated channels [3], [4]. In general, certain weak scattering scenarios can be created and observed where the Kronecker structure is not satisfied [25], and thus the Kronecker model should be seen as a good approximation that enables analysis. We will show numerically in Section VI that training sequences optimized based on this approximation perform well when applied for estimation under general conditions. In our context, we define Kronecker-structured systems in the following way.

Definition 1: In a Kronecker-structured system, the channel covariance, $\mathbf{R}$, and disturbance covariance matrix, $\mathbf{S}$, can be factorized as

$$\mathbf{R} = \mathbf{R}_T^T \otimes \mathbf{R}_R, \qquad \mathbf{S} = \mathbf{S}_Q^T \otimes \mathbf{S}_R \qquad (10)$$

Here, $\mathbf{R}_T \in \mathbb{C}^{N_t \times N_t}$ and $\mathbf{R}_R \in \mathbb{C}^{N_r \times N_r}$ represent the spatial covariance matrices at the transmitter and receiver side, respectively, while $\mathbf{S}_Q \in \mathbb{C}^{B \times B}$ and $\mathbf{S}_R \in \mathbb{C}^{N_r \times N_r}$ represent the temporal covariance matrix and the received spatial covariance matrix.

We also assume that $\mathbf{R}_R$ and $\mathbf{S}_R$ have identical eigenvectors. This means that the disturbance is either spatially uncorrelated or shares the spatial structure of the channel (i.e., arriving from the same spatial direction). This assumption was first made in [12] for estimation of interference-limited systems. Under this assumption, we can jointly describe several types of disturbance, including the following examples:

• Noise-limited disturbance, with some variance $\sigma^2$;
• Interference-limited disturbance, for a set of interferers with temporal covariance matrices;²
• Noise and temporally uncorrelated interference;
• Noise and spatially uncorrelated interference.

²It is worth noting that since a flat and block fading channel model was assumed in (1), the potential temporal covariance in the disturbance primarily originates from the interfering signals and not from their channels. Also observe that if $\mathbf{S}_R \neq \mathbf{I}$, the interference will be received from the same spatial direction as the training signal.

To simplify the notation, we will use the following eigenvalue decompositions:

$$\mathbf{R}_T = \mathbf{U}_T \boldsymbol{\Lambda}_T \mathbf{U}_T^H, \qquad \mathbf{S}_Q = \mathbf{U}_Q \boldsymbol{\Lambda}_Q \mathbf{U}_Q^H \qquad (11)$$

$$\mathbf{R}_R = \mathbf{U}_R \boldsymbol{\Lambda}_R \mathbf{U}_R^H, \qquad \mathbf{S}_R = \mathbf{U}_R \boldsymbol{\Lambda}_{RS} \mathbf{U}_R^H \qquad (12)$$

where the eigenvalues of $\mathbf{R}_T$ and $\mathbf{S}_Q$, collected in $\boldsymbol{\Lambda}_T = \mathrm{diag}(\lambda_{T,1}, \ldots, \lambda_{T,N_t})$ and $\boldsymbol{\Lambda}_Q = \mathrm{diag}(\lambda_{Q,1}, \ldots, \lambda_{Q,B})$, are ordered decreasingly and increasingly, respectively. The diagonal eigenvalue matrices $\boldsymbol{\Lambda}_R$ and $\boldsymbol{\Lambda}_{RS}$ are arbitrarily ordered.

Next, we provide a theorem that derives the general structure of the MSE minimizing training sequence, along with its asymptotic properties.

Theorem 1: Under the Kronecker-structured assumptions, the solution to (9) has the singular value decomposition $\mathbf{P} = \mathbf{U}_T \boldsymbol{\Sigma}_P \mathbf{U}_Q^H$, where $\boldsymbol{\Sigma}_P$ has $\sqrt{p_1}, \ldots, \sqrt{p_{\min(B,N_t)}}$ on its main diagonal. The MSE with such a training matrix is convex with respect to the positive training powers $p_1, \ldots, p_{\min(B,N_t)}$, and the training powers should be ordered such that $p_k$ decreases with $k$ (i.e., in the same order as $\lambda_{T,k}$). The MSE minimizing power allocation, $p_1, \ldots, p_{\min(B,N_t)}$, is achieved from the following system of equations:

$$\sum_{j=1}^{N_r} \frac{(\lambda_{Q,k}\lambda_{RS,j})^{-1}}{\left(\dfrac{1}{\lambda_{T,k}\lambda_{R,j}} + \dfrac{p_k}{\lambda_{Q,k}\lambda_{RS,j}}\right)^{2}} = \mu \qquad (13)$$


for all $k$ such that $p_k > 0$, and $p_k = 0$ otherwise. The Lagrange multiplier $\mu$ is chosen to fulfill the constraint $\sum_k p_k = \mathcal{P}$.

The limiting training matrix at high power is given by $p_k \to \mathcal{P}\sqrt{\lambda_{Q,k}} \big/ \sum_i \sqrt{\lambda_{Q,i}}$ for all $k$, where the sum is taken over all excited eigendirections. At low power $\mathcal{P} \to 0$, let $m$ be the minimum of the multiplicities of the largest $\lambda_{T,k}$ and the smallest $\lambda_{Q,k}$. Then, the limiting training matrix is given by allocating all power in an arbitrary manner among $p_1, \ldots, p_m$, while $p_k = 0$ for $k > m$.

Proof: The proof is given in Appendix A.

The theorem showed that the MSE minimizing training matrix in Kronecker-structured systems has a special structure based on the eigenvectors of the channel at the transmitter side and the temporal disturbance; the $k$th strongest channel eigendirection is assigned to the $k$th weakest disturbance eigendirection (i.e., in opposite order of magnitude). In other words, the strongest channel direction is estimated when the disturbance is as weak as possible (and vice versa). This was proved in [12] for interference-limited systems, and Theorem 1 generalizes it to cover various combinations of noise and interference.

At high training power, the power should be allocated to the statistically strongest eigendirections of the channel, proportionally to the square root of the eigenvalues of the weakest eigendirections of the disturbance. At low training power, all power should be allocated in a single direction where a certain combination of strong channel gain and weak disturbance is maximized. These asymptotic results unify previous results, including the special cases of uncorrelated noise [9], [11] and single-antenna receivers [26].

Although the structure of the MSE minimizing training sequence is given in Theorem 1, the solution to the remaining power allocation problem is in general unknown. Since the problem is convex, the solution can however be derived with limited computational effort. The following corollary summarizes results on when the power allocation can be solved explicitly.

Corollary 1: If $\boldsymbol{\Lambda}_T = \lambda_T\mathbf{I}$ and $\boldsymbol{\Lambda}_Q = \lambda_Q\mathbf{I}$, then equal power allocation ($p_k = \mathcal{P}/\min(B,N_t)$ for all $k$) minimizes the MSE. If $\boldsymbol{\Lambda}_R = \lambda_R\mathbf{I}$ and $\boldsymbol{\Lambda}_{RS} = \lambda_{RS}\mathbf{I}$, then the MSE minimizing power allocation is given by

$$p_k = \max\left(0,\; \sqrt{\frac{N_r\,\lambda_{Q,k}\lambda_{RS}}{\mu}} - \frac{\lambda_{Q,k}\lambda_{RS}}{\lambda_{T,k}\lambda_{R}}\right) \qquad (14)$$

where the Lagrange multiplier $\mu$ is chosen to fulfill the power constraint $\sum_k p_k = \mathcal{P}$.

Proof: In the first case, the conditions in (13) are identical for all $k$ and thus the solutions are identical. In the second case, an explicit expression for each $p_k$ can be achieved from (13) since each term of the sum is identical. See [12, Theorem 5.3] for details.

The first part of the corollary represents the case of uncorrelated transmit antennas and temporal disturbance, and has previously been shown in [9] for noise-limited systems. The waterfilling solution in the second part of the corollary was derived in [12] for interference-limited disturbance, but is also valid in noise-limited systems with uncorrelated receive antennas, as was shown in [9]–[11].

Next, we give a theorem that shows how the MSE with an optimal training sequence depends on the spatial correlation at the transmitter and receiver side.

Theorem 2: The MSE with the MSE minimizing training matrix is Schur-concave with respect to the eigenvalues of $\mathbf{R}_T$ (for fixed $\mathbf{R}_R$). If the disturbance is spatially uncorrelated, then the MSE is also Schur-concave with respect to the eigenvalues of $\mathbf{R}_R$ (for fixed $\mathbf{R}_T$).

Proof: The proof is given in Appendix A.

The interpretation of the theorem is that the MSE with an optimal training matrix will decrease with increasing spatial correlation. This result is intuitive if we consider the extreme: it is easier to estimate the channel in one eigendirection with full training power than in two eigendirections where each receives half the training power. This analytical behavior provides insight into the selection of parameters like the length of the training sequence, $B$, and the total training power $\mathcal{P}$; as the spatial correlation increases, less power is required to achieve a given MSE and this power will be concentrated in the most important eigendirections of the channel. This will be further analyzed in Section IV-A.

To summarize the results of this section, we have shown the structure of the MSE minimizing training matrix in Kronecker-structured systems and analyzed the allocation of power between the eigendirections. Based on these results, we propose a heuristic training matrix that can be applied under general system conditions. Observe that even when Kronecker-structured approximations are used in the training sequence design, the general MMSE estimator in (6) should always be applied without these approximations.

Heuristic 1: Let $\hat{\mathbf{R}}_T$ and $\hat{\mathbf{S}}_Q$ denote Kronecker-structured transmit-side and temporal approximations of the general covariance matrices $\mathbf{R}$ and $\mathbf{S}$. Let their eigenvalue decompositions be $\hat{\mathbf{R}}_T = \hat{\mathbf{U}}_T\hat{\boldsymbol{\Lambda}}_T\hat{\mathbf{U}}_T^H$ and $\hat{\mathbf{S}}_Q = \hat{\mathbf{U}}_Q\hat{\boldsymbol{\Lambda}}_Q\hat{\mathbf{U}}_Q^H$, where the eigenvalues are ordered decreasingly and increasingly, respectively. Then, the training matrix $\mathbf{P} = \hat{\mathbf{U}}_T\boldsymbol{\Sigma}_P\hat{\mathbf{U}}_Q^H$, with diagonal elements in $\boldsymbol{\Sigma}_P$ that are calculated by inserting the eigenvalues in $\hat{\boldsymbol{\Lambda}}_T$ and $\hat{\boldsymbol{\Lambda}}_Q$ into (14), should provide good performance and minimizes the MSE under the Kronecker-structured conditions given in Corollary 1.

It will be illustrated numerically in Section VI that this heuristic training matrix yields good performance, even when the covariance matrices are far from being Kronecker-structured.

A. Optimal Length of Training Sequences

The results of this paper are derived for an arbitrary training sequence length $B$. Next, we will provide some guidance on how to select this variable under different system statistics and based on the rank of $\mathbf{P}$. Recall from Theorem 1 that all power is allocated in a single eigendirection for low $\mathcal{P}$ (i.e., $\mathrm{rank}(\mathbf{P}) = 1$). Corollary 1 gave a waterfilling solution to the power allocation, and thus strong eigendirections receive more power than weak ones and only a subset of $p_1, \ldots, p_{\min(B,N_t)}$ will receive any power. Under these conditions, the rank of $\mathbf{P}$ is equal to the cardinality of this subset, which in principle means that the training power is spread in the temporal dimension at the best channel uses
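The waterfilling behind the corollary can be sketched by bisecting on the Lagrange multiplier. The code below treats the simplest noise-limited special case, minimizing the per-direction MSE terms of the form $1/(1/\lambda_k + p_k/\sigma^2)$; the function name and the scalar-noise simplification are our own, not the paper's exact expressions:

```python
import numpy as np

def waterfilling_power(lam_T, sigma2, P_total, iters=100):
    """MSE-minimizing training powers for the noise-limited special case:
    minimize sum_k 1/(1/lam_k + p_k/sigma2) s.t. sum_k p_k = P_total.
    Solved by bisection on the Lagrange multiplier mu, cf. (14)."""
    lam = np.asarray(lam_T, dtype=float)

    def alloc(mu):
        # Stationarity of the Lagrangian gives this waterfilling form.
        return np.maximum(0.0, np.sqrt(sigma2 / mu) - sigma2 / lam)

    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mu = np.sqrt(lo * hi)  # bisect on a log scale
        if alloc(mu).sum() > P_total:
            lo = mu            # too much power allocated: raise mu
        else:
            hi = mu
    return alloc(np.sqrt(lo * hi))

# Strong eigendirections receive more power; weak ones may get none at all.
p = waterfilling_power([2.0, 1.0, 0.25], sigma2=1.0, P_total=2.0)
```

For these example eigenvalues the closed-form solution is $p = (1.25, 0.75, 0)$: the weakest direction is switched off entirely, which anticipates the rank discussion in Section IV-A.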


out of the allocated for training. Unless the disturbance varies a strong eigenvalue spread in either or (i.e., strong spatial
heavily over time, it is not worth wasting channel uses or temporal correlation). Even if the disturbance is correlated
just waiting for better disturbance conditions. Thus, we should so that Theorem 3 cannot be applied, the training sequence
select . This observation is formalized by the following length can sometimes be reduced towards with only
general theorem. a slight degradation in MSE and with an improved overall
Theorem 3: Let denote the singular value data throughput. The optimal training sequence length under
decomposition of the training matrix for and suppose that . If , then identical MSE is achieved by the -dimensional training sequence . Here, denotes the minor matrix that contains columns to of the given matrix .
Proof: The proof is given in Appendix A.
The interpretation of Theorem 3 is that the optimal training sequence length in noise-limited systems is equal to the rank of . In this case, optimal means that it is the smallest length that can achieve the minimal MSE. In general, the rank of can only be determined numerically. In certain Kronecker-structured systems, the rank can however be derived explicitly. This is shown by the following corollary, which also relaxes the requirement of uncorrelated disturbance.
Corollary 2: In a Kronecker-structured system with , the MSE minimizing training matrix will have rank if

(15)

and otherwise have rank if , where the positive integer fulfills

(16)

In addition, if and there exists an integer in that factorizes as , for some and , then identical MSE is achieved by the -dimensional training sequence .
Proof: The proof is given in Appendix A.
According to the corollary, is rank deficient in systems with pronounced spatial correlation and/or limited total training power . Corollary 2 relaxed the conditions in Theorem 3 by proving that the optimal training sequence length also depends on under certain correlated disturbances. The conditions for this are for example satisfied when . Theorem 3 and Corollary 2 constitute a generalization of [14], where it was shown that the optimal training sequence length in spatially uncorrelated and noise-limited systems is exactly equal to . Observe that the generalized results in Corollary 2 stand in contrast to the belief that the training sequence length needs to be at least in correlated systems [27]. Under general system statistics, one can expect that is rank deficient when the training power is limited and there is pronounced spatial correlation; non-Kronecker conditions will be illustrated numerically in Section VI.

V. MMSE ESTIMATION OF SQUARED CHANNEL NORMS

In many applications, it is of great interest to estimate the squared Frobenius norm of the channel matrix. This norm corresponds directly to the SINR in space-time block coded (STBC) systems and has a large impact on the SINR in many other types of systems [17], [28]. The channel norm can be estimated indirectly from an estimated channel matrix, for example using the estimator in (6). This will however lead to suboptimal performance and give poor estimates at low training power (see Section VI). Thus, we consider training-based MMSE estimation of in this section.
Analysis of the squared channel norm is considerably more involved than for the channel matrix. The next theorem gives a general expression for the MMSE estimator and its MSE, and special expressions for Kronecker-structured systems. In order to derive these expressions, we limit the analysis to training matrices with the structure . It is our conjecture that the MSE minimizing training matrix has this form,³ as was proved in Theorem 1 for channel matrix estimation. This training matrix structure is also of most practical importance, since the same training signalling will be used to estimate both and .
Theorem 4: The MMSE estimator of , with the observation and training sequence , is

(17)

where and are defined in (6) and (7), respectively. The corresponding MSE is

In Kronecker-structured systems with the eigenvalue decompositions in (11) and a training matrix with the structure , the estimator in (17) can be evaluated as

(18)

³If the mean channel component is strong and has different directivity than the strongest eigenvectors, it might be necessary to permute the eigenvectors in when constructing the MSE minimizing training matrix . To simplify the notation, this has been ignored herein, but it is only a matter of reordering the eigenvalues in (11).
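The suboptimality of indirect norm estimation claimed above can be illustrated with a small Monte-Carlo sketch. This is an illustrative toy model, not the paper's setup: a scalar zero-mean complex Gaussian channel observed in white noise, so the posterior of the channel given the observation is Gaussian, the MMSE estimate of the squared magnitude is the posterior mean-power plus the posterior variance, and the indirect plug-in estimate omits the variance term.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, p, N = 1.0, 0.5, 200_000        # prior variance, training power, trials
h = np.sqrt(gamma / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
n = np.sqrt(0.5) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = np.sqrt(p) * h + n                 # scalar training observation

v = gamma / (p * gamma + 1)            # posterior variance of h given y
m = np.sqrt(p) * v * y                 # posterior mean of h given y
direct = np.abs(m) ** 2 + v            # E[|h|^2 | y]: direct MMSE estimate
indirect = np.abs(m) ** 2              # plug-in from the channel estimate
true = np.abs(h) ** 2

mse_direct = np.mean((direct - true) ** 2)
mse_indirect = np.mean((indirect - true) ** 2)
assert mse_direct < mse_indirect       # the plug-in estimate is strictly worse
```

Since the two estimates differ by the constant posterior variance, the indirect MSE exceeds the direct MSE by exactly that variance squared, in line with the low-power behavior reported in Section VI.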

Authorized licensed use limited to: National Taiwan University. Downloaded on March 09,2023 at 05:21:44 UTC from IEEE Xplore. Restrictions apply.
BJÖRNSON AND OTTERSTEN: TRAINING-BASED ESTIMATION IN ARBITRARILY CORRELATED RICIAN MIMO CHANNELS WITH RICIAN DISTURBANCE 1813

where and are the th elements of and , respectively. The corresponding MSE is

(19)

Proof: The proof is given in Appendix A.
The explicit estimator in (18), and its MSE, can also be expressed as matrix multiplications for simplified implementation; see [16] for examples.

A. Training Sequence Design for Channel Norm Estimation

Next, we consider minimization of the MSE of the explicit estimator in (18) by training sequence optimization, which means that we seek the training power allocation in that minimizes the MSE. The optimization principles in this section will be similar to those for training matrix estimation, but the MSE of squared norm estimation is not always convex in the training powers, which makes it difficult to derive explicit solutions. The following theorem will however give necessary conditions on the convexity, and provide equations that can be used to determine the solution. We will also analyze the asymptotic behaviors of the power allocation.
Theorem 5: The MSE in (19) is convex in the training power if for all . In general, the MSE can however be non-convex in the training powers, but the set of that minimizes the MSE is always given as one of the solutions to the following system of equations:

(20)

for all active (among ) and otherwise. The Lagrange multiplier is chosen to fulfill the power constraint .
The limiting training matrix at high power is given by , where . At low power , the limiting solution is given by for and for all . If the solution has multiplicity, the power can be distributed arbitrarily among the different .
Proof: The proof is given in Appendix A.
Although the MSE cannot be guaranteed to be convex, Theorem 5 showed that the limiting training sequences at high and low training power can be derived explicitly. Observe that the MSE in (19) depends on the mean value of the channel, while the MSE for channel matrix estimation is independent of the mean. The limiting solutions are however similar in the sense that all power is allocated in a single eigendirection at low power and is spread in all spatial directions at high power. The definition of the strongest direction at low training power and the proportional power distribution at large power are however different, which means that the MSE minimizing training matrices usually are different for matrix and squared norm estimation.
The next theorem shows that under certain conditions, the training power allocation can be solved with low complexity, and a unique solution exists if all eigendirections are required to carry a minimal amount of training power.
Corollary 3: If , then the MSE minimizing power allocation is given by either or

(21)

for , where and . The Lagrange multiplier is chosen to fulfill the power constraint , and the solutions in (21) are only feasible if and when they are positive. Depending on , solutions in the interval are given by , while the interval can be achieved by in (21). Thus, if for some , then will never give a feasible solution for .
If training sequence optimization is combined with the additional constraints for all and , the resulting MSE is guaranteed to be convex in the training powers. Then, the system of equations in (20) has a unique solution. In the special case , the constraint can be relaxed to , and the optimal power allocation is given by in (21) for all active (i.e., those larger than the new lower bound).
Proof: The proof is given in Appendix A.
The corollary has two important implications. Firstly, in an interference-limited system or in the case of uncorrelated receive antennas, the worst case complexity of finding the solution to the potentially non-convex problem scales with the number of transmit antennas as . Secondly, if we impose the additional constraint that all eigendirections are allocated a minimum amount of training power, the power allocation is assured

1814 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 3, MARCH 2010

to be convex and has a unique solution. Observe that in some cases (e.g., for channels with strong mean components), the suggested additional constraint in Corollary 3 can be identical to for some , and then the MSE is convex with respect to this without the need of imposing any constraints.
To summarize the results of this section, we have derived an explicit MMSE estimator of the squared channel norm based on the type of training matrices derived in Theorem 1. The power allocation in the training sequence has been analyzed and solved in certain cases. Based on these results, we conclude this section with a heuristic training matrix that can be applied in general Kronecker-structured systems.
Heuristic 2: The training matrix , with diagonal elements in from

(22)

where the Lagrange multiplier is chosen to fulfill the power constraint , should provide good performance in Kronecker-structured systems. Here, , , , and for all . If and , then the power allocation in (22) will minimize the MSE.

VI. NUMERICAL EXAMPLES

In this section, the performance of the MMSE estimators and the training sequence design will be illustrated numerically. The MSE performance of the channel matrix estimator was thoroughly evaluated in [12] for interference-limited Kronecker-structured systems. Thus, we consider the opposite setting of a noise-limited non-Kronecker-structured system, and we will compare the MMSE estimation performance with other recently proposed estimators. This section will also illustrate the advantage of direct MMSE estimation of the squared channel norm over indirect calculation from an estimated channel matrix. Finally, we will illustrate how the smallest necessary length of the training sequence depends on the spatial correlation and available training power.
To illustrate the performance of the training sequence design for channel matrix estimation in Section IV under general channel conditions, we consider the Weichselberger model [25]. This model has recently attracted much attention for its accurate representation of measurement data. According to this model, the channel matrix can be expressed as , where are unitary matrices and has independent elements with variances given by the corresponding elements of the coupling matrix . The unitary matrices will not affect the performance when MSE minimizing precoding design is employed, and can therefore be selected as identity matrices. Without loss of generality, we always scale the coupling matrices as , to make sure that the training SINR can be described by the training power constraint: . To enable comparison with other estimators, the channel is zero-mean, but recall from the MSE expression in (8) that the performance is unaffected by non-zero mean components.

Fig. 1. The average normalized MSEs of channel matrix estimation as a function of the total training power in a system with the Weichselberger model and -distributed coupling matrices. The performance of four different estimators with MSE minimizing training matrices is compared. The performance with the training matrix design in Heuristic 1 is also given.

We define the normalized MSE as . In Fig. 1, we give the normalized MSEs averaged over 5,000 scenarios with different coupling matrices with , , and independent -distributed elements. The performance of four different estimators with MSE minimizing training matrices is compared: the MVU/ML channel estimator [8], the one-sided linear estimator in [8], [13] that was incorrectly claimed to be the linear MMSE estimator, the two-sided Bayesian linear estimator proposed in [27], and the MMSE estimator in (6). The MVU/ML estimator⁴ is unaware of the channel statistics (i.e., non-Bayesian), and it is clear from Fig. 1 that this leads to poor estimation performance. The two-sided linear estimator also performs poorly under the given premises, but can provide good performance in special cases [27]. The performance gap between the one-sided linear estimator and the MMSE estimator (which is also linear) is noticeable, while the difference between employing the optimal training matrix and the one proposed in Heuristic 1 is small. It should be pointed out that the use of independent -distributed elements in the coupling matrix induces a spatially correlated environment with a few dominating paths. In less correlated scenarios, the difference between the estimators decreases, but the order of quality is usually the same.
In Fig. 2, the performance of the MMSE estimator is shown for a uniform training matrix , an MSE minimizing training matrix (achieved numerically), and the simple explicit training matrix proposed in Heuristic 1. The one-sided linear estimator is given as a reference. In this simulation, we used the coupling matrix that was proposed in [29, Eq. 28] to describe an environment with two small scatterers, two big scatterers, and one large cluster. It is clear that the gain of employing an MSE minimizing training sequence is substantial, and the heuristic approach captures most of this gain, although uniform training is asymptotically optimal at high training power.

⁴For this problem, the maximum likelihood (ML) estimator is equivalent to the MVU [23, Theorem 7.5].
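The gap between a statistics-aware Bayesian estimator and a statistics-blind one can be reproduced in a few lines. The sketch below is a simplified stand-in, not the paper's exact simulation: with identity eigenbases, the covariance of the vectorized Weichselberger channel is diagonal with the coupling coefficients as variances, and the theoretical MSEs of the linear MMSE estimator and the least-squares (ML-type) estimator can be compared directly. The dimensions, the coupling draw, the normalization, and the uniform training matrix are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
nt = nr = 4
Omega = rng.chisquare(2, size=(nr, nt))      # random coupling matrix (illustrative)
Omega *= nr * nt / Omega.sum()               # normalize the total channel gain
R = np.diag(Omega.flatten('F'))              # cov of vec(H) when U_R, U_T = I

Ptot = 4.0
Ptrain = np.sqrt(Ptot / nt) * np.eye(nt)     # uniform training, length B = nt
A = np.kron(Ptrain.T, np.eye(nr))            # y = A vec(H) + n, unit-variance noise

# theoretical MSEs: Bayesian linear MMSE vs. statistics-blind least squares
mse_mmse = np.trace(R - R @ A.conj().T @
                    np.linalg.solve(A @ R @ A.conj().T + np.eye(nr * nt), A @ R)).real
mse_ml = np.trace(np.linalg.inv(A.conj().T @ A)).real
assert mse_mmse < mse_ml                     # prior knowledge always helps here
```

With uniform training the MMSE reduces to a sum of terms of the form variance/(1+power*variance), each strictly below the corresponding least-squares term, mirroring the ordering of the curves in Fig. 1.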


Fig. 2. The normalized MSEs of channel matrix estimation as a function of the total training power in a system with the Weichselberger model and the coupling matrix proposed in [29, Eq. 28]. The MMSE estimator with three different training matrices is compared with the one-sided linear estimator.

Fig. 4. The normalized MSEs of channel squared norm estimation as a function of the total training power in a system with uncorrelated receive antennas and a transmit antenna correlation of 0.8. The MMSE estimator is compared with indirect estimation from an MMSE estimated channel matrix for different training matrices.

Fig. 3. The average optimal training sequence length (smallest length that minimizes the MSE) as a function of the total training power P. The system follows the Weichselberger model where the jth column of the coupling matrix has independent -distributed elements scaled by , for different . Decreasing means increasing spatial correlation.

Next, we illustrate the optimal length of the training sequence for varying spatial correlation and training power. Recall from Theorem 3 that the optimal length in noise-limited systems is equal to the rank of the training matrix. We consider coupling matrices with , , and independent -distributed elements, and we induce random transmit-side correlation by scaling the th column by for different values of . The average optimal training sequence length (i.e., the average rank of ) is shown in Fig. 3 for both an MSE minimizing training matrix and the training matrix proposed in Heuristic 1. The average length is given as a function of the total training power and for the spatial correlation induced by .
In the case of identically distributed elements of the coupling matrix , there is sufficient spatial correlation to have at low training power. As the spatial correlation increases (i.e., decreases), the optimal training length decreases and the convergence towards full rank becomes slower. The heuristic training approach is clearly overestimating the training length, which explains the performance difference in Fig. 1. An important observation is that the conclusion of [14], that the optimal length in an uncorrelated system is equal to the number of transmit antennas, does not hold in general. Careful system analysis is always required to determine the optimal length under general statistics, and the loss in performance from employing an even shorter training sequence may be minor compared with the gain of having more data symbols.
Finally, we illustrate the performance of squared norm estimation. The normalized MSEs for channel squared norm estimation, defined as , are given in Fig. 4 as a function of the total training power. In this case, we limit the simulation to Kronecker-structured systems (i.e., rank-one coupling matrices), since the explicit estimator in Theorem 5 is based on this assumption. We consider uncorrelated receive antennas and a correlation between adjacent transmit antennas of 0.8, using the exponential model [30]. The performance of the MMSE estimator in Theorem 5 is compared with indirect calculation of the squared norm from a channel matrix that is estimated using (6). In both approaches, uniform and optimal training sequences are considered. For the MMSE estimator, the performance with a channel matrix optimized training sequence is also shown for comparison. This is probably the most important case in practice; the training sequence will be used to optimize estimation of the channel matrix (or some receive filter), but the received training signal can simultaneously be used to calculate an MMSE estimate of the squared norm (e.g., for the purpose of feedback). The first observation from Fig. 4 is that the indirect approach yields poor performance at low SINR (even worse than the purely statistical estimator , which would give unit normalized MSE) and is not even asymptotically optimal at high SINR. The performance of the MMSE estimator can be considerably improved by proper training sequence design. A training sequence designed for channel matrix estimation will improve the performance over uniform training at low SINR, but they both share the same suboptimal asymptotic behavior.
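The exponential correlation model [30] underlying the Fig. 4 setup is simple to reproduce. As a hedged side check (with an illustrative dimension and the helper name `exp_corr` chosen here, not taken from the paper), note that stronger correlation spreads the eigenvalues of the correlation matrix while keeping the trace fixed, which is the majorization mechanism behind the correlation results discussed in this paper:

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation model: (R)_{ij} = rho^{|i-j|}."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

eig_hi = np.linalg.eigvalsh(exp_corr(4, 0.8))  # strongly correlated antennas
eig_lo = np.linalg.eigvalsh(exp_corr(4, 0.2))  # weakly correlated antennas

# same total power (trace), but more spread-out eigenvalues when correlated
assert np.isclose(eig_hi.sum(), eig_lo.sum())
assert eig_hi.max() > eig_lo.max()
```

The more spread-out the eigenvalues, the fewer directions need to carry training power, which is consistent with the shrinking optimal training length observed in Fig. 3.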


VII. CONCLUSION

A framework for training-based estimation of Rician fading MIMO channel matrices has been introduced, for the purpose of joint analysis under different noise and interference conditions. The MMSE estimator was analyzed in terms of the MSE minimizing training sequence, and the optimal training structure was derived in Kronecker-structured systems. The limiting solutions at high and low training power were given, along with sufficient conditions for when the training optimization can be solved explicitly. Based on these results, a heuristic training sequence was proposed for arbitrary system statistics.
In addition, we proved analytically that the MSE improves with the spatial correlation at both the transmitter and the receiver side. This result was used to clarify how the optimal length of the training sequence depends on the system statistics and the total training power. An interesting result was that the optimal training sequence length can be considerably smaller than the number of transmit antennas in systems with strong spatial correlation. This was proved analytically for certain Kronecker-structured systems.
Finally, the framework was extended to MMSE estimation of the squared Frobenius norm of the channel, using the same type of training sequences as for channel matrix estimation. Although the MSE of this estimator can be non-convex, the limiting solutions at high and low training power were derived, and it was shown under which conditions the solution can be derived explicitly or with low complexity.

APPENDIX A
COLLECTION OF LEMMAS AND PROOFS

In the appendix, we will first state two lemmas and then apply them when proving the theorems of this paper. The first lemma provides the necessary structure of the training matrix when the weighted sum of MSEs is minimized, and is essentially a generalization of [12, Corollary 5.1], where a single MSE was minimized (i.e., ).
Lemma 1: Let and be positive coefficients, and let and be diagonal matrices with strictly positive elements ordered decreasingly and increasingly, respectively. Then, the optimization problem

(23)

is solved by being a rectangular diagonal matrix that satisfies and gives decreasingly ordered diagonal elements of (i.e., the same order as for ).
Proof: We will derive the structure of the optimal by contradiction; that is, for every that fulfills the constraint we can find a solution that satisfies the given structure and achieves a smaller or identical function value. Observe that the function is strictly convex in each eigenvalue of its argument matrix. Therefore, if the constraint is not fulfilled with equality for a given , we can always achieve a smaller function value by replacing it by for some and still satisfy the constraint.
Suppose that fulfills the constraint with equality, and let its singular value decomposition be denoted . We will first show that can be removed if the diagonal elements of are reordered. For this purpose we introduce and let its singular value decomposition be denoted , where the singular values in are ordered decreasingly. Now, observe that only appears in the cost function as , and thus we can modify without affecting the function value. Using the new notation, the power constraint can be expressed as

(24)

where denotes the th largest eigenvalue. The last inequality is given in [22, Theorem 20.A.4] and is fulfilled with equality if and only if is diagonal with elements in the opposite order of , which means that would minimize the constraint. For this , we have the relationship

(25)

which is satisfied if and the diagonal values of are ordered such that is in decreasing order. If this is not fulfilled for the given , we can always find a better solution that fulfills these conditions by first reordering the elements of and removing , which will give strict inequality in the constraint. Then, a smaller function value is achieved by scaling the new solution to achieve equality in the constraint. Thus, the optimal solution has the structure , where is ordered as described.
Finally, for a solution of the type , we will show that we can always reduce the function value by selecting . Let , and observe that

(26)

As mentioned in the beginning of the proof, each component of the sum is strictly convex in its eigenvalue. Thus, (26) is a Schur-convex function for all [20, Proposition 2.7]. Recall that is a linear combination of and with positive coefficients for each . Then, we have from [20, Theorem 2.11] that each is minimized when the eigenvalues of and are added together in opposite order. If , we can therefore decrease the function value by replacing it by an identity matrix, without affecting the power constraint.
To summarize, we have shown that for every given , we can reduce the cost function by removing the unitary matrices of its singular value decomposition, reordering the diagonal elements, and scaling the remaining matrix to satisfy the constraint with equality.
The next lemma provides a simple condition to determine if a function that originates from an optimal power allocation is Schur-convex or Schur-concave.
Lemma 2: Consider a continuous and twice continuously differentiable function of two non-negative vectors and . For every for which is convex and the Hessian and all its square minors are non-singular with respect to , the solution to the optimization

(27)

is differentiable. The partial derivatives of the solution at the optimal power allocation are

(28)

Then, the function is Schur-convex with respect to if and only if for all , and Schur-concave if and only if .
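The message of Lemma 1 — that the optimal training is diagonal in the eigenbasis of the channel statistics, with matched ordering of powers and eigenvalues — can be spot-checked numerically. The sketch below uses a deliberately simplified single-receive-antenna model with white noise, for which the error covariance is (R⁻¹ + P Pᴴ)⁻¹; the dimensions, eigenvalues, powers, and the function name `mse` are illustrative assumptions, not the paper's general disturbance model.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([4.0, 2.0, 1.0, 0.5])        # channel covariance eigenvalues (desc.)
R = np.diag(lam)
p = np.array([4.0, 3.0, 2.0, 1.0])          # training powers, matched ordering

def mse(P):
    """Trace of the error covariance (R^{-1} + P P^H)^{-1} for training matrix P."""
    return np.trace(np.linalg.inv(np.linalg.inv(R) + P @ P.conj().T)).real

P_aligned = np.diag(np.sqrt(p))             # diagonal in the eigenbasis of R
Q, _ = np.linalg.qr(rng.standard_normal((4, 4))
                    + 1j * rng.standard_normal((4, 4)))
P_rotated = Q @ P_aligned                   # same singular values, rotated basis

assert mse(P_aligned) < mse(P_rotated)      # misalignment can only hurt
```

For the aligned choice the error covariance is diagonal, so the MSE collapses to the sum of 1/(1/λₖ + pₖ) terms, exactly the kind of separable expression exploited throughout the proofs.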


Proof: Since the cost function is convex with respect to for every given and the domain of is closed, the Karush-Kuhn-Tucker (KKT) conditions guarantee the existence of one or several solutions to (27), and these are given by the following system of stationarity equations:

(29)

for all (otherwise ), where the Lagrangian multiplier makes sure that [31]. Let denote the index set of all non-zero and those for which the corresponding equation in (29) also would be satisfied with equality (i.e., those that are on the boundary of becoming active). Observe that the Jacobian of the equation system in (29) for these will be identical to a minor of the Hessian of with respect to , and thus non-singular by assumption. If we denote the power allocation solution in (27) as a function , we can then apply the implicit function theorem to conclude that all elements in with indexes in are differentiable with respect to [32, Theorem 9.28]. For those with , this variable can be replaced with a zero in the optimization problem without affecting the solution, and thus its derivative can be defined as being zero.
We can now use that is differentiable with respect to to calculate the partial derivative of with respect to :

(30)

Since for and for , we have that

(31)

where the last equality follows from the fact that implies that . Thus, we have proved (28). The last sentence of the lemma follows directly from Schur's condition in [22, Theorem 3.A.4], which states that is Schur-convex if and only if

(32)

for all and , and Schur-concave if the conditions are fulfilled with inverted inequalities.
Finally, we give the proofs of Theorems 1–5 and Corollary 3.
Proof of Theorem 1: First, we derive the structure of the MSE minimizing training matrix. For Kronecker-structured systems, the MSE can be expressed as . By taking the conjugate transpose of the training transmission model in (2) and then applying the results of [23, Chapter 15.8] in the same manner as in Section III, we achieve an alternative expression of the MSE:

(33)

where the second equality follows from the fact that the identical eigenvectors of and do not affect the trace, and that the trace of a block matrix is equal to the sum of the traces of its blocks.
Using the notation , , and , we can apply Lemma 1 to conclude that the MSE minimizing should be rectangularly diagonal, fulfill the element ordering given in the theorem, and satisfy the power constraint with equality. With a training matrix of this type, the argument in (33) will be diagonal, and the MSE can be expressed as

(34)

which is a convex function with respect to each (since is a convex function for all ). Thus, the KKT conditions give the necessary and sufficient conditions for the optimal power allocation [31, Ch. 5.5], and these are summarized in (13).
Finally, we consider the two asymptotic cases. At high power, we approximate the MSE in (34) as

(35)

which is minimized by for all (using straightforward Lagrangian methods). At low power, we approximate (34) as

(36)

using a first order Taylor polynomial. This expression is minimized by assigning all power in an arbitrary manner among the strongest term/terms of the second sum.
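The asymptotic behavior just derived — a single active direction at low power, a near-uniform spread at high power — can be sketched with a toy waterfilling. This is a hedged simplification: the per-direction MSE terms are taken as λ/(1+λp) with unit noise, whereas the exact condition (13) involves both transmit and receive statistics, and the function name `train_power_alloc` is chosen here for illustration.

```python
import numpy as np

def train_power_alloc(lam, P, iters=200):
    """Minimize sum_k lam_k/(1+lam_k*p_k) s.t. sum p_k = P, p_k >= 0.
    Stationarity gives p_k = max(0, 1/sqrt(mu) - 1/lam_k); bisect on mu."""
    lam = np.asarray(lam, float)
    lo, hi = 1e-12, lam.max() ** 2          # at mu = hi, every power is inactive
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        p = np.maximum(0.0, 1 / np.sqrt(mu) - 1 / lam)
        if p.sum() > P:
            lo = mu                          # too much power spent: raise mu
        else:
            hi = mu
    return p

lam = np.array([4.0, 2.0, 1.0, 0.5])
p_low = train_power_alloc(lam, 0.01)         # low power: one active direction
p_high = train_power_alloc(lam, 1000.0)      # high power: all directions active

assert np.count_nonzero(p_low > 1e-9) == 1
assert np.count_nonzero(p_high > 1e-9) == 4
assert np.ptp(p_high) / p_high.mean() < 0.05  # nearly uniform at high power
```

The number of nonzero powers is the rank of the optimal training matrix, so this toy also reproduces the training-length growth discussed around Theorem 3 and Corollary 2.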


Proof of Theorem 2: First, we will prove that the MSE in (34) is Schur-concave with respect to the eigenvalues . It is straightforward to show that the MSE is convex in the power allocation, differentiable with respect to and for all , and that the determinant of the Hessian is non-zero if the eigenvalues of and are distinct. Thus, we can apply Lemma 2. According to the lemma, it is sufficient to show that for all such that , where MSE denotes the pre-optimization MSE in (34) evaluated at the optimal solution. Thus, we can calculate the partial derivatives of (34) as

(37)

and for . Observe that the derivatives are positive and that and only appear in the denominator of (37). From Theorem 1, we have that whenever . Hence, it follows that and that the MSE is Schur-concave.
Next, we have the case when , and then the MSE in (34) can be expressed as

(38)

which is a concave function in for all . We apply [22, Proposition 3.C.1] to conclude that parts and are both Schur-concave with respect to , and thus the MSE is Schur-concave.
Proof of Theorem 3: For , the MSE in (9) becomes

(39)

The theorem follows from the facts that (39) is independent of and that .
Proof of Corollary 2: The rank of is equal to the number of active training powers . From Theorem 1, we have that the th training power is active if and only if . Suppose we only have active training powers; then . Substitution into the power constraint gives

(40)

for . All will be active if and only if is larger than the constraint for . Finally, if there exists a that fulfills the requirements, then can be factorized as , where and are independent. Thus, neither contains information about the channel matrix nor is correlated with previous disturbance in , and will therefore not affect the estimation. We can therefore use the shorter training sequence without any loss in performance.
Proof of Theorem 4: In the general case, the integral expression of the MMSE estimator in (17) follows directly from the definition of by exploiting that the posterior distribution, , is complex Gaussian distributed with the mean and covariance matrix derived in [23, Chapter 15.8].
To derive the explicit expressions, we begin with the one-dimensional case with the received signal , where is the training signal, , and . Using Bayes' formula or [23, Chapter 15.8], the posterior distribution can be expressed as

(41)

We want to estimate , while the phase is not of interest. To achieve the conditional distribution , we change variables in to (with the Jacobian 1/2) and marginalize the distribution by integrating over the phase :

(42)

where , , , and is the modified Bessel function of the first kind. The last equality in (42) follows by applying the formula [33, Eq. 8.431.3]. The first and second order central moments of are

(43)

respectively. These moments follow from straightforward integration, by noting that , (for , ), and by identifying the Maclaurin expansion of . The MSE is achieved by replacing in the expression of with its average .
In the MIMO case, observe that the elements of are independent. Since the Frobenius norm is the sum of the squared magnitudes of each element, we will have a sum of independent variables that can be estimated separately.
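The conditional moments in (43) admit a brute-force sanity check: for a Gaussian posterior, the conditional mean of the squared magnitude equals the squared posterior mean plus the posterior variance, regardless of the phase-marginalization route taken above. The sketch below conditions by binning Monte-Carlo samples near one chosen observation; all parameter values, the window width, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
mbar, gamma, p, N = 1 + 0.5j, 0.8, 2.0, 2_000_000   # Rician mean, variance, power
h = mbar + np.sqrt(gamma / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = np.sqrt(p) * h + np.sqrt(0.5) * (rng.standard_normal(N)
                                     + 1j * rng.standard_normal(N))

y0 = np.sqrt(p) * mbar                    # condition on an observation near the mean
v = gamma / (p * gamma + 1)               # posterior variance of h given y
mu = mbar + np.sqrt(p) * v * (y0 - np.sqrt(p) * mbar)   # posterior mean (= mbar here)
closed_form = np.abs(mu) ** 2 + v         # E[|h|^2 | y = y0]

sel = np.abs(y - y0) < 0.1                # crude conditioning window around y0
mc = np.mean(np.abs(h[sel]) ** 2)         # empirical conditional mean
assert abs(mc - closed_form) < 0.1
```

The agreement confirms that the explicit estimator reduces to posterior mean-power plus posterior variance per element, which is what the MIMO summation argument above exploits.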


Thus, the MMSE estimate and MSE in (18) and (19) follow from a MIMO transformation of (43), with replaced with its average.
Proof of Theorem 5: A function is convex if and only if its second derivative is non-negative. The second derivative of the MSE in (19) with respect to is

(44)

which in general is negative in the neighborhood of , and thus the MSE is non-convex (for small values of ). If the condition for convexity in the theorem is fulfilled, all terms in the sum will however be positive at . Even if the MSE is non-convex, the KKT conditions give necessary conditions for the optimal power allocation [31, Chapter 5.5]. By a straightforward Lagrangian approach, the power allocation that minimizes (19) needs to fulfill the stationarity conditions in (20).
At high training power, the necessary condition in (20) can be approximated and simplified as

(45)

which has the unique solution for all .
At low training power, the MSE in (19) can be approximated as

(46)

using a first order Taylor expansion of the denominators and disregarding terms with in the numerator. Hence, the MSE is minimized by allocating all the power to the associated with the largest . If there is multiplicity in the largest value of the sum, the power can be allocated freely among these eigendirections.
Proof of Corollary 3: The condition means that for all , and therefore we can remove the dependence on in the denominator of (20). For all active training powers , the remaining expression in (20) can be formulated as a third degree polynomial equation in : , using the notation . Its three solutions ( , 0, 1) are

(47)

where . Observe that this expression has the form , where are positive real-valued constants. Thus, in order for any of the solutions to be real-valued, we need . If , this condition can be expressed as

(48)

which has no solutions in the interval. For all , we observe that , which satisfies the condition . Thus, for these we can rewrite (47) as

(49)

where we used that with defined as in the corollary. Since , will only give negative solutions. For , 0, we see that the interval boundary gives the coinciding solution , while the limit gives and , respectively. Thus, in order to show the intervals for the solutions, it remains to show that is monotonically decreasing in for and increasing in for . The derivative of with respect to can be expressed as

(50)

where the multiplicative term outside the brackets is positive for all and . The bracketed term can be expressed as for . Then, the intervals follow from the observation that and .
Finally, we see that the second derivative of the MSE in (44) is positive if we limit ourselves to , since then each term in the sum is positive. Thus, the MSE will be convex with respect to these , and the KKT conditions in (20) become necessary and sufficient. In the special case , we can strengthen the condition since we know that the necessary KKT conditions only give a single feasible solution if . In both the general and special cases, these conditions need to be combined with the original constraint .

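The exact expressions in (19), (20), and (45)–(46) are not legible in this scan, so the following is only a sketch under an assumed simplified single-sided correlation model: with correlation eigenvalues lambda_i, per-direction training powers p_i, and unit noise power, the estimation MSE becomes sum_i 1/(1/lambda_i + p_i) (cf. the setups in [8], [11]). Solving the corresponding KKT conditions by bisection on the Lagrange water level then illustrates numerically the two limits argued in the proof: all power on the dominant eigendirection at low training power, and an essentially uniform allocation at high training power.

```python
import numpy as np

def waterfill_training_power(lam, P, iters=200):
    """Minimize sum_i 1/(1/lam_i + p_i) subject to sum_i p_i = P, p_i >= 0.

    The stationarity conditions give p_i = max(0, mu - 1/lam_i) for a
    water level mu fixed by the total-power constraint; mu is found by
    bisection. This models MMSE estimation with single-sided correlation
    only (an assumption, not the paper's general Rician setting).
    """
    inv = 1.0 / np.asarray(lam, dtype=float)
    lo, hi = 0.0, inv.max() + P  # the water level certainly lies in [lo, hi]
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - inv).sum() > P:
            hi = mu  # water level too high: allocation exceeds budget
        else:
            lo = mu
    return np.maximum(0.0, 0.5 * (lo + hi) - inv)

lam = np.array([4.0, 2.0, 1.0, 0.5])  # illustrative correlation eigenvalues

# Low training power: everything goes to the strongest eigendirection.
p_low = waterfill_training_power(lam, 0.1)

# High training power: the allocation approaches uniform P/n.
p_high = waterfill_training_power(lam, 1000.0)
```

Under this assumed model, p_low concentrates on the largest eigenvalue while p_high/P approaches 1/4 in every direction, matching the low- and high-power asymptotics derived above.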
Authorized licensed use limited to: National Taiwan University. Downloaded on March 09,2023 at 05:21:44 UTC from IEEE Xplore. Restrictions apply.
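The coefficients of the third-degree polynomial in the proof of Corollary 3 are not recoverable from this excerpt. Purely as an illustration of the root-selection step (one root is always negative and is discarded; the remaining interval argument isolates a single solution satisfying the original power constraint), here is a generic sketch with made-up coefficients:

```python
import numpy as np

def feasible_power(coeffs, p_max):
    """Return the unique real root of a cubic that lies in [0, p_max].

    Mirrors the structure of the proof's argument: discard roots outside
    the feasible power interval and require that exactly one remains.
    The coefficients passed in are illustrative only, not from the paper.
    """
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    feas = [r for r in real if 0.0 <= r <= p_max]
    if len(feas) != 1:
        raise ValueError("stationarity condition does not isolate a single root")
    return feas[0]

# Cubic with roots -2, 0.5, and 3: only 0.5 lies in the interval [0, 1].
p = feasible_power([1.0, -1.5, -5.5, 3.0], p_max=1.0)
```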

ACKNOWLEDGMENT

The authors would like to thank the reviewers and the associate editor for their insightful comments and suggestions, which led to a more precise and readable paper.

REFERENCES

[1] G. J. Foschini and M. J. Gans, "On limits of wireless communications in a fading environment when using multiple antennas," Wireless Personal Commun., vol. 6, pp. 311–335, 1998.
[2] E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun., vol. 10, pp. 585–595, 1999.
[3] D. Chizhik, J. Ling, P. Wolniansky, R. Valenzuela, N. Costa, and K. Huber, "Multiple-input-multiple-output measurements and modeling in Manhattan," IEEE J. Sel. Areas Commun., vol. 21, no. 3, pp. 321–331, 2003.
[4] K. Yu, M. Bengtsson, B. Ottersten, D. McNamara, P. Karlsson, and M. Beach, "Modeling of wideband MIMO radio channels based on NLOS indoor measurements," IEEE Trans. Veh. Technol., vol. 53, no. 3, pp. 655–665, 2004.
[5] J. Wallace and M. Jensen, "Measured characteristics of the MIMO wireless channel," in Proc. IEEE VTC'01-Fall, 2001, vol. 4, pp. 2038–2042.
[6] K. Werner and M. Jansson, "Estimating MIMO channel covariances from training data under the Kronecker model," Signal Process., vol. 89, pp. 1–13, 2009.
[7] F. Dietrich and W. Utschick, "Pilot-assisted channel estimation based on second-order statistics," IEEE Trans. Signal Process., vol. 53, no. 3, 2005.
[8] M. Biguesh and A. Gershman, "Training-based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 884–893, 2006.
[9] P. Jiyong, L. Jiandong, L. Zhuo, Z. Linjing, and C. Liang, "Optimal training sequences for MIMO systems under correlated fading," J. Syst. Eng. Electron., vol. 19, pp. 33–38, 2008.
[10] X. Ma, L. Yang, and G. Giannakis, "Optimal training for MIMO frequency-selective fading channels," IEEE Trans. Wireless Commun., vol. 4, no. 2, pp. 453–466, 2005.
[11] J. Kotecha and A. Sayeed, "Transmit signal design for optimal estimation of correlated MIMO channels," IEEE Trans. Signal Process., vol. 52, no. 2, pp. 546–557, 2004.
[12] Y. Liu, T. Wong, and W. Hager, "Training signal design for estimation of correlated MIMO channels with colored interference," IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1486–1497, 2007.
[13] D. Katselis, E. Kofidis, and S. Theodoridis, "Training-based estimation of correlated MIMO fading channels in the presence of colored interference," Signal Process., vol. 87, pp. 2177–2187, 2007.
[14] B. Hassibi and B. Hochwald, "How much training is needed in multiple-antenna wireless links?," IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 951–963, 2003.
[15] J. Pang, J. Li, L. Zhao, and Z. Lü, "Optimal training sequences for MIMO channel estimation with spatial correlation," in Proc. IEEE VTC'07-Fall, 2007, pp. 651–655.
[16] E. Björnson and B. Ottersten, "Training-based Bayesian MIMO channel and channel norm estimation," in Proc. IEEE ICASSP'09, 2009, pp. 2701–2704.
[17] E. Björnson, D. Hammarwall, and B. Ottersten, "Exploiting quantized channel norm feedback through conditional statistics in arbitrarily correlated MIMO systems," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 4027–4041, 2009.
[18] X. Zhang, E. Jorswieck, and B. Ottersten, "User selection schemes in multiple antenna broadcast channels with guaranteed performance," presented at the IEEE SPAWC'07, Helsinki, Finland, Jun. 17–20, 2007.
[19] R. Ertel, P. Cardieri, K. Sowerby, T. Rappaport, and J. Reed, "Overview of spatial channel models for antenna array communication systems," IEEE Personal Commun. Mag., vol. 5, pp. 10–22, 1998.
[20] E. Jorswieck and H. Boche, "Majorization and matrix-monotone functions in wireless communications," Foundations and Trends in Communication and Information Theory, vol. 3, pp. 553–701, 2007.
[21] E. Björnson, E. Jorswieck, and B. Ottersten, "Impact of spatial correlation and precoding design in OSTBC MIMO systems," IEEE Trans. Wireless Commun., submitted for publication.
[22] A. Marshall and I. Olkin, Inequalities: Theory of Majorization and Its Applications. Boston, MA: Academic Press, 1979.
[23] S. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice Hall, 1993.
[24] A. Tulino, A. Lozano, and S. Verdú, "Impact of antenna correlation on the capacity of multiantenna channels," IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2491–2509, 2005.
[25] W. Weichselberger, M. Herdin, H. Özcelik, and E. Bonek, "A stochastic MIMO channel model with joint correlation of both link ends," IEEE Trans. Wireless Commun., vol. 5, no. 1, pp. 90–100, 2006.
[26] W. Hager, Y. Liu, and T. Wong, "Optimization of generalized mean square error in signal processing and communication," Linear Algebra and Its Applications, vol. 416, pp. 815–834, 2006.
[27] D. Katselis, E. Kofidis, and S. Theodoridis, "On training optimization for estimation of correlated MIMO channels in the presence of multiuser interference," IEEE Trans. Signal Process., vol. 56, no. 10, pp. 4892–4904, 2008.
[28] E. Björnson and B. Ottersten, "Post-user-selection quantization and estimation of correlated Frobenius and spectral channel norms," presented at the IEEE PIMRC'08, Cannes, France, Sep. 15–18, 2008.
[29] V. Veeravalli, Y. Liang, and A. Sayeed, "Correlated MIMO wireless channels: Capacity, optimal signaling, and asymptotics," IEEE Trans. Inf. Theory, vol. 51, no. 6, pp. 2058–2072, 2005.
[30] S. Loyka, "Channel capacity of MIMO architecture using the exponential correlation matrix," IEEE Commun. Lett., vol. 5, pp. 369–371, 2001.
[31] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[32] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1976.
[33] I. Gradshteyn and I. Ryzhik, Table of Integrals, Series, and Products. Boston, MA: Academic Press, 1980.

Emil Björnson (S'07) was born in Malmö, Sweden, in 1983. He received the M.S. degree in engineering mathematics from Lund University, Lund, Sweden, in 2007. He is currently working towards the Ph.D. degree in telecommunications at the Signal Processing Laboratory, Royal Institute of Technology (KTH), Stockholm, Sweden.
His research interests include wireless communications, resource allocation, estimation theory, stochastic signal processing, and mathematical optimization.
For his work on MIMO communications, he received a Best Paper Award at the 2009 International Conference on Wireless Communications and Signal Processing (WCSP 2009).

Björn Ottersten (S'87-M'89-SM'99-F'04) was born in Stockholm, Sweden, in 1961. He received the M.S. degree in electrical engineering and applied physics from Linköping University, Linköping, Sweden, in 1986 and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1989.
He has held research positions at the Department of Electrical Engineering, Linköping University; the Information Systems Laboratory, Stanford University; and the Katholieke Universiteit Leuven, Leuven, Belgium. During 1996–1997, he was Director of Research at ArrayComm Inc., San Jose, CA, a start-up company based on Ottersten's patented technology. In 1991, he was appointed Professor of Signal Processing at the Royal Institute of Technology (KTH), Stockholm, Sweden. From 2004 to 2008, he was Dean of the School of Electrical Engineering at KTH, and from 1992 to 2004 he was head of the Department for Signals, Sensors, and Systems at KTH. He is also Director of security and trust at the University of Luxembourg. His research interests include wireless communications, stochastic signal processing, sensor array processing, and time-series analysis.
Dr. Ottersten has coauthored papers that received an IEEE Signal Processing Society Best Paper Award in 1993, 2001, and 2006. He has served as Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING and on the Editorial Board of the IEEE Signal Processing Magazine. He is currently Editor-in-Chief of the EURASIP Signal Processing Journal and a member of the Editorial Board of the EURASIP Journal of Advances in Signal Processing. He is a Fellow of EURASIP. He is one of the first recipients of the European Research Council advanced research grant.
