Semi-Blind Strategies for MMSE Channel Estimation Utilizing Generative Priors

Franz Weißer, Nurettin Turan, Dominik Semmler, Fares Ben Jazia, and Wolfgang Utschick

This work was supported by the Federal Ministry of Education and Research of Germany in the programme of “Souverän. Digital. Vernetzt.”. Joint project 6G-life, project identification number: 16KISK002. An earlier version of this work was presented at ICASSP’24 [1]. The authors are with the TUM School of Computation, Information and Technology, Technical University of Munich, 80333 Munich, Germany (e-mail: [email protected]).

Abstract

This paper investigates semi-blind channel estimation for massive multiple-input multiple-output (MIMO) systems. To this end, we first estimate a subspace based on all received symbols (pilot and payload) to provide additional information for subsequent channel estimation. We show how this additional information enhances minimum mean square error (MMSE) channel estimation. Two variants of the linear MMSE (LMMSE) estimator are formulated, where the first one solves the estimation within the subspace, and the second one uses a subspace projection as a preprocessing step. Theoretical derivations show the superior estimation performance of the latter method in terms of mean square error for uncorrelated Rayleigh fading. Subsequently, we introduce parameterizations of this semi-blind LMMSE estimator based on two different conditional Gaussian latent models, i.e., the Gaussian mixture model and the variational autoencoder. Both models learn the underlying channel distribution of the propagation environment based on training data and serve as generative priors for semi-blind channel estimation. Extensive simulations for real-world measurement data and spatial channel models show the superior performance of the proposed methods compared to state-of-the-art semi-blind channel estimators with respect to the MSE.

Index Terms:

Semi-blind channel estimation, Gaussian mixture model, variational autoencoder, measurement data.

I Introduction

Accurate channel state information (CSI) is crucial for achieving the high data rates promised by multiple-input multiple-output (MIMO) systems [2, 3, 4]. The CSI describes the communication link between transmitter and receiver. This link is time-varying and frequency-selective and can change rapidly, which makes channel estimation a complex task [5]. As accurate channel estimates are essential for the successful transmission of data, channel estimation is at the center of several research efforts [6, 7].

The most widely adopted methods in wireless communication utilize known training or pilot symbols transmitted across the channel using some of the radio resource blocks [8]. Afterward, the receiver uses the observed signals to determine a reliable CSI estimate. As the number of pilots scales with the number of users, the spectral efficiency decreases for a larger number of users because fewer symbols are available for transmitting data. To enhance channel estimation without increasing the number of pilot symbols, various methods have been developed that leverage the information embedded in the observed data symbols at the receiver to infer channel characteristics [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. These methods exploit structure and redundancy within the transmitted data and yield more accurate CSI estimates.

The benefit of semi-blind channel estimation was first studied in [9], where Cramér-Rao bounds (CRBs) for blind, semi-blind, and training-based channel estimation were investigated in the context of single-input multiple-output (SIMO) systems. In [10, 11], semi-blind channel estimation schemes based on maximum likelihood (ML) estimation were introduced. The asymptotic performance of the respective estimators was also studied in [10] for infinitely long data sequences. Another asymptotic behavior, where the number of antennas grows to infinity, was studied in [12]. Here, the authors identified two interference components in semi-blind channel estimation, which do not vanish even for large numbers of antennas. Early work on improving the least squares (LS) estimator using a semi-blind algorithm was conducted in [13]. The authors in [14] propose to use partially decoded data for channel estimation. Furthermore, in [15], semi-blind and blind channel estimation was studied to enhance the maximum a-posteriori (MAP) channel estimates in massive MIMO systems. These findings are based on favorable propagation, which only holds for large antenna arrays deployed at the base station (BS). In [16], two semi-blind channel estimators based on the expectation maximization (EM) algorithm are studied. The assumption of a Gaussian distribution for the data was verified, leading to a closed-form solution for the E-step. Another iterative framework optimizing the likelihood based on message passing (MP) is used in [17, 18] and references therein. In [19], a data-aided iterative scheme is proposed for orthogonal time frequency space (OTFS) systems by employing affine-precoded superimposed pilots. The performance improvement achieved with these iterative approaches generally requires computationally costly updates. A low-complexity iterative LS channel estimation algorithm is proposed in [20] for a massive MIMO turbo-receiver.
In [21], the concept of semi-blind channel estimation was adapted for time domain synchronous-orthogonal frequency division multiplexing (OFDM) systems, where in addition to the pseudo noise sequence in the guard interval, the sent OFDM data symbols are exploited for the channel estimation. In [22], a framework was introduced for choosing reliable decoded data symbols, which can be interpreted as additional pilots. Similarly, reliably detected symbols are used for a semi-data-aided channel estimation in [23]. In [24], peak-power carriers in an OFDM system are selected to eliminate the need to determine reliable data symbols at the receiver. Recently, a diffusion model-based approach for joint channel estimation and detection was proposed in [25], where a diffusion process is constructed that models the joint distribution of the channels and symbols given noisy observations.

In this article, we focus on pilot-based estimators which minimize the MSE and investigate how these estimators can be extended for the semi-blind case. Notably, in [26] a subspace formulation of the MMSE estimator was used to mitigate pilot contamination in massive MIMO systems. The MMSE estimator is known to be the conditional mean estimator (CME) [27, Ch. 10], which, in general, is intractable and cannot be computed in closed form. Recently, powerful approximations based on machine learning were presented in [28, 29, 30, 31, 32]. The benefit of machine learning is to enhance the task at hand by using prior information captured during the learning stage. For a given BS cell environment, the probability density function (PDF) representing potential user channels can be considered valuable prior information. Since this true underlying distribution is unknown, machine learning methods rely on a representative data set, which is assumed to be available at the BS. Based on this data set, the PDF of the user channels can be learned. The first proposal of using a Gaussian mixture model (GMM) to formulate an estimator was made in [28] for the case of image processing. The approaches in [30, 31, 32] build on that by constructing a conditionally Gaussian latent model (CGLM) for the PDF of a BS cell environment. The learned CGLM not only enables MMSE channel estimation in [30, 31, 32] but can also be used, e.g., for a limited feedback scheme as in [33]. In this work, we propose to utilize CGLMs to parameterize the CME in the semi-blind setting.

The contributions of this work are summarized as follows:

•

We introduce two variants of the linear MMSE (LMMSE) estimator incorporating subspace knowledge provided by the payload data symbols. First, we show how the LMMSE channel estimator can solve a subspace estimation problem [26]. As an alternative, we propose a projection method that is computationally more efficient since it allows for the pre-calculation of LMMSE filters.
•

With theoretical derivations we show the superior MSE performance of the proposed projection method in the case of uncorrelated Rayleigh fading and perfect subspace knowledge.
•

We show how the GMM [30] and variational autoencoder (VAE) [32], instances of the class of CGLMs, can be used to parameterize the semi-blind LMMSE estimator.
•

Extensive simulations on different datasets, consisting of typical massive MIMO systems with multiple users and including real-world measurement data, show the superior performance of our proposed methods compared to state-of-the-art semi-blind channel estimation algorithms with respect to the MSE.

Preliminary results were presented in [1] and extended to the multi-user MIMO case in [34], which we extend further in the following aspects. The theoretical analyses in Section III enhance the foundation of the proposed semi-blind channel estimation strategies and provide analytic insights into the superior performance of the proposed projection method. We extend our concept of semi-blind MMSE channel estimation to the whole class of CGLMs, providing a more general framework to parameterize the semi-blind LMMSE estimator. Finally, we provide more comprehensive simulation results to show the strengths of our proposed strategies.

Notations: Matrices and vectors are denoted with boldface symbols. $\bm{0}$ and $\mathbf{I}_{N}$ denote the zero vector of appropriate size and the identity matrix of size $N\times N$ , respectively. $\mathbb{E}[\cdot]$ , $\mathrm{tr}(\cdot)$ , $\mathrm{range}(\cdot)$ , and $\mathrm{rank}(\cdot)$ denote the expectation, trace, range, and rank operators, respectively. We use $(\cdot)^{\mathrm{T}}$ , $(\cdot)^{\mathrm{H}}$ , $(\cdot)^{-1}$ to denote the transpose, conjugate transpose, and inverse. $\|\cdot\|$ denotes the $2$ -norm of a vector. $\mathcal{N}_{\mathbb{C}}(\bm{\mu},{\bm{C}})$ denotes the circularly symmetric complex Gaussian distribution with mean $\bm{\mu}$ and covariance matrix ${\bm{C}}$ .

II System and Channel Model

We consider a multi-user uplink system with $J$ single-antenna users and a BS equipped with $M$ receive antennas. The received signal vector at time instance $n$ is then

\displaystyle{\bm{y}}(n)={\bm{H}}{\bm{x}}(n)+{\bm{n}}(n),\quad n=1,\dots,N,

(1)

where ${\bm{x}}(n)=[x_{1}(n),\dots,x_{J}(n)]^{\mathrm{T}}\in\mathbb{C}^{J}$ and ${\bm{n}}(n)\in\mathbb{C}^{M}$ denote the signal sent by each of the $J$ users and the additive noise, respectively, whereas ${\bm{H}}=[{\bm{h}}_{1},\dots,{\bm{h}}_{J}]$ contains the individual channels of the users ${\bm{h}}_{j}\in\mathbb{C}^{M}$. The case of multiple antennas at the users can be transformed into (1) by viewing each active stream as a different user. The corresponding channel ${\bm{h}}_{j}$ can then be seen as the effective channel. For further details on semi-blind channel estimation in multi-user MIMO, we refer the reader to [34]. For the task of channel estimation, we consider a channel coherence interval larger than the number of snapshots $N$, i.e., the channels are constant over all snapshots. We assume that the noise is Gaussian with ${\bm{n}}(n)\sim\mathcal{N}_{\mathbb{C}}(\bm{0},{\bm{C}}_{\bm{n}}=\sigma^{2}\mathbf{I}_{M})$.

In conventional channel estimation schemes, each user’s signals include $N_{p}$ uplink pilots. These pilots are known to the BS. Hence, the received observations at the BS side are

\displaystyle{\bm{Y}}=\left[{\bm{Y}}_{p}^{\prime},{\bm{Y}}_{d}\right]={\bm{H}}\left[{\bm{P}},{\bm{D}}\right]+{\bm{N}}={\bm{H}}{\bm{X}}+{\bm{N}},

(2)

where ${\bm{Y}}\in\mathbb{C}^{M\times N}$, ${\bm{Y}}^{\prime}_{p}\in\mathbb{C}^{M\times N_{p}}$, ${\bm{Y}}_{d}\in\mathbb{C}^{M\times(N-N_{p})}$, ${\bm{P}}\in\mathbb{C}^{J\times N_{p}}$, and ${\bm{D}}\in\mathbb{C}^{J\times(N-N_{p})}$ denote all received observations, received pilot observations, received payload data observations, sent pilots, and sent payload data symbols, respectively. To fully illuminate the channels, the number of pilots must be at least the number of users, i.e., $N_{p}\geq J$, and orthogonal pilots are used. We set $N_{p}=J$ and utilize discrete Fourier transform (DFT) pilot sequences. Since the pilots are orthogonal with ${\bm{P}}{\bm{P}}^{\mathrm{H}}=\mathbf{I}_{J}$, decorrelating the pilot sequences simplifies the received pilot observations to

\displaystyle{\bm{Y}}_{p}={\bm{Y}}^{\prime}_{p}{\bm{P}}^{\mathrm{H}}={\bm{H}}{\bm{P}}{\bm{P}}^{\mathrm{H}}+{\bm{N}}{\bm{P}}^{\mathrm{H}}={\bm{H}}+{\bm{N}}_{p},

(3)

where ${\bm{N}}_{p}$ has the same statistics as ${\bm{N}}$ and, hence, we can omit the subscript. This allows us to consider channel estimation from a per-user perspective in the subsequent discussions. For better readability, we therefore drop the user index in the following. Consequently, we denote the pilot observation of a user as

\displaystyle{\bm{y}}_{p}={\bm{h}}+{\bm{n}},

(4)

with ${\bm{n}}\sim\mathcal{N}_{\mathbb{C}}(\bm{0},{\bm{C}}_{\bm{n}}=\sigma^{2}\mathbf{I}_{M})$.
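The pilot decorrelation in (2)-(4) can be sketched numerically. The following NumPy snippet is a minimal illustration, assuming unit-variance i.i.d. channel entries and a unitary DFT pilot matrix; the helper names `dft_pilots`, `decorrelate_pilots`, and `complex_gaussian` as well as the chosen values of $M$, $J$, and $\sigma^{2}$ are illustrative assumptions and not taken from the original work.

```python
import numpy as np


def dft_pilots(num_users):
    """Unitary DFT pilot matrix P (J x J), so that P @ P^H = I (assumed scaling)."""
    return np.fft.fft(np.eye(num_users), norm="ortho")


def decorrelate_pilots(Y_p_raw, P):
    """Eq. (3): Y_p = Y'_p P^H, which yields one noisy channel column per user."""
    return Y_p_raw @ P.conj().T


def complex_gaussian(rng, shape, var=1.0):
    """Circularly symmetric complex Gaussian samples with variance `var`."""
    real = rng.standard_normal(shape)
    imag = rng.standard_normal(shape)
    return np.sqrt(var / 2.0) * (real + 1j * imag)


rng = np.random.default_rng(0)
M, J, sigma2 = 64, 8, 0.1           # illustrative values
H = complex_gaussian(rng, (M, J))   # placeholder channels, not the model of Section II-A
P = dft_pilots(J)
N_pilot = complex_gaussian(rng, (M, J), sigma2)
Y_p = decorrelate_pilots(H @ P + N_pilot, P)   # Y_p = H + N_p
y_p = Y_p[:, 0]                     # pilot observation of the first user, Eq. (4)
```

Because the pilot matrix is unitary, the decorrelated noise keeps the covariance $\sigma^{2}\mathbf{I}_{M}$, which is why each column of `Y_p` can be treated as a separate single-user observation.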

II-A Spatial Channel Model

We consider a spatial channel model based on [35], where the channel vectors are considered as conditionally Gaussian distributed [29]

\displaystyle{\bm{h}}\mid{\bm{\delta}}\sim\mathcal{N}_{\mathbb{C}}\left(\bm{0},{\bm{C}}_{\bm{\delta}}\right),

(5)

based on a set of parameters ${\bm{\delta}}$ , which describe the directions and properties of the multi-path propagation clusters. The main angles are drawn independently from a uniform distribution in $[0,2\pi]$ and the path gains are independent zero-mean Gaussians. The spatial covariance matrix is given by

\displaystyle{\bm{C}}_{\bm{\delta}}=\int_{-\pi}^{\pi}g(\vartheta,{\bm{\delta}}){\bm{a}}(\vartheta){\bm{a}}^{\mathrm{H}}(\vartheta)\mathrm{d}\vartheta,

(6)

where $g(\vartheta,{\bm{\delta}})$ is the power density consisting of a weighted sum of Laplace densities, which have standard deviations $\sigma_{\text{AS}}$ corresponding to the angular spread of the propagation clusters. The BS employs a uniform linear array (ULA) with $M=64$ antennas and $\lambda/2$ spacing. The steering vector is then given as

\displaystyle{\bm{a}}(\vartheta)=\frac{1}{\sqrt{M}}\left[1,\mathrm{e}^{-\mathrm{j}\pi\sin(\vartheta)},\dots,\mathrm{e}^{-\mathrm{j}\pi(M-1)\sin(\vartheta)}\right]^{\mathrm{T}}.

(7)

For every channel sample, we consider a new ${\bm{\delta}}$ and draw the sample according to ${\bm{h}}\sim\mathcal{N}_{\mathbb{C}}\left(\bm{0},{\bm{C}}_{\bm{\delta}}\right)$ .
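A possible way to approximate (6) and draw a channel sample is sketched below. This is a simplified illustration and not the simulation code of the original work: it assumes a single propagation cluster, an angular spread given in radians, a Riemann sum on a uniform angle grid, and a power density normalized to unit total power. The function names and the grid size are hypothetical.

```python
import numpy as np


def steering_matrix(theta, M):
    """Columns are the ULA steering vectors of Eq. (7) for all angles in `theta`."""
    phase = np.outer(np.arange(M), np.sin(theta))
    return np.exp(-1j * np.pi * phase) / np.sqrt(M)


def spatial_covariance(center, sigma_as, M, num_grid=2048):
    """Riemann-sum approximation of Eq. (6) for one Laplace-shaped cluster."""
    theta = np.linspace(-np.pi, np.pi, num_grid, endpoint=False)
    # Wrapped angular distance to the cluster center.
    diff = np.angle(np.exp(1j * (theta - center)))
    # A Laplace density with scale b has standard deviation sqrt(2) * b.
    scale = sigma_as / np.sqrt(2.0)
    g = np.exp(-np.abs(diff) / scale)
    g /= g.sum()  # assumed normalization: tr(C) = 1
    A = steering_matrix(theta, M)
    return (A * g) @ A.conj().T


def sample_channel(C, rng):
    """Draw h ~ N_C(0, C) via the eigendecomposition (C may be rank deficient)."""
    eigvals, U = np.linalg.eigh(C)
    eigvals = np.clip(eigvals, 0.0, None)
    z = (rng.standard_normal(len(eigvals)) + 1j * rng.standard_normal(len(eigvals)))
    return U @ (np.sqrt(eigvals / 2.0) * z)


rng = np.random.default_rng(0)
center = rng.uniform(0.0, 2.0 * np.pi)   # main angle of the cluster
C_delta = spatial_covariance(center, sigma_as=np.deg2rad(2.0), M=64)
h = sample_channel(C_delta, rng)
```

Several clusters could be handled by summing weighted copies of `g` before the normalization step.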

II-B Measurement Campaign

Since synthetic data capture real-world CSI characteristics only up to some extent, we utilize real-world data from a measurement campaign conducted at the Nokia campus in Stuttgart, Germany, in October/November 2017, cf. [36]. The BS antenna with a uniform rectangular array (URA) comprises $N_{v}=4$ vertical ( $\lambda$ spacing) and $N_{h}=16$ horizontal ( $\lambda/2$ spacing) single polarized patch antennas. The operating carrier frequency is $2.18$ GHz and the antenna was mounted on a rooftop approximately $20$ meters above the ground. For further details, we refer the reader to [36].

III Semi-Blind Channel Estimation using Perfect Statistical Knowledge

In this section, we introduce two variants of the LMMSE estimator incorporating information provided by the payload data symbols. To do so, we first restrict ourselves to the case where perfect statistical knowledge is available at the receiver.

Commonly, only the pilot observation ${\bm{y}}_{p}$ is considered for channel estimation. The MSE-optimal estimator given the pilot observation ${\bm{y}}_{p}$ is the CME

\displaystyle\hat{{\bm{h}}}_{\text{CME}}=\mathbb{E}\left[{\bm{h}}\mid{\bm{y}}_{p}\right].

(8)

If the channel sample ${\bm{h}}$ is drawn from a Gaussian distribution according to

\displaystyle{\bm{h}}\sim\mathcal{N}_{\mathbb{C}}(\bm{0},{\bm{C}}),

(9)

and if further this statistic is known at the receiver, the genie-aided CME can be formulated as [27, Ch. 10]

\displaystyle\hat{{\bm{h}}}_{\text{CME}}={\bm{C}}\left({\bm{C}}+{\bm{C}}_{\bm{n}}\right)^{-1}{\bm{y}}_{p}.

(10)

This LMMSE estimator achieves the MSE of

\displaystyle\mathrm{MSE}^{\mathrm{plain}}=\mathrm{tr}\left[{\bm{C}}-{\bm{C}}({\bm{C}}+{\bm{C}}_{\bm{n}})^{-1}{\bm{C}}\right].

(11)

For ${\bm{C}}_{\bm{n}}=\sigma^{2}\mathbf{I}_{M}$ , the MSE can be expressed using the Woodbury identity as

\displaystyle\mathrm{MSE}^{\mathrm{plain}}=\sum_{i=1}^{M}\frac{{\rho}_{i}\sigma^{2}}{{\rho}_{i}+\sigma^{2}},

(12)

where $\rho_{i}$ are the eigenvalues of ${\bm{C}}$ .
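The equivalence of (11) and (12) can be checked numerically: in the eigenbasis of ${\bm{C}}$, each eigenvalue contributes $\rho_{i}-\rho_{i}^{2}/(\rho_{i}+\sigma^{2})=\rho_{i}\sigma^{2}/(\rho_{i}+\sigma^{2})$. The short NumPy sketch below uses a randomly generated covariance matrix purely as a stand-in; the function names are illustrative.

```python
import numpy as np


def mse_plain_trace(C, sigma2):
    """Eq. (11) with C_n = sigma2 * I."""
    M = C.shape[0]
    gain = C @ np.linalg.solve(C + sigma2 * np.eye(M), C)
    return np.real(np.trace(C - gain))


def mse_plain_eig(C, sigma2):
    """Eq. (12), evaluated from the eigenvalues of C."""
    rho = np.linalg.eigvalsh(C)
    return np.sum(rho * sigma2 / (rho + sigma2))


rng = np.random.default_rng(1)
M = 16
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
C = A @ A.conj().T / M   # arbitrary Hermitian positive semidefinite test matrix
gap = abs(mse_plain_trace(C, 0.1) - mse_plain_eig(C, 0.1))
```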

For our considerations concerning semi-blind channel estimation, we assume knowledge about the subspace defined by $\mathrm{range}({\bm{H}})=\mathrm{range}({\bm{V}})$ , where we denote with ${\bm{V}}$ the left singular vectors of ${\bm{H}}$ , which span the same subspace as the columns of ${\bm{H}}$ . Generally, we can formulate the ML estimate of ${\bm{H}}$ in view of (2) as [16]

\displaystyle\min_{{\bm{H}},{\bm{X}}}\sum_{n=1}^{J}\left\|{\bm{y}}(n)-{\bm{h}}_{n}\right\|^{2}+\sum_{n=J+1}^{N}\left\|{\bm{y}}(n)-{\bm{H}}{\bm{x}}(n)\right\|^{2},

(13)

where the first term belongs to the pilot observations and the second part refers to the observation obtained from the payload symbols. In [16] the EM algorithm is introduced to solve this channel estimation problem in terms of maximum likelihood, whereas, in [11] a semi-blind method is derived based on utilizing the subspace $\mathrm{range}({\bm{H}})=\mathrm{range}({\bm{V}})$ . Now, given the subspace $\mathrm{range}({\bm{V}})$ , we can reformulate (13) as [11]

\displaystyle\min_{{\bm{S}},{\bm{X}}}\sum_{n=1}^{J}\left\|{\bm{y}}(n)-{\bm{V}}{\bm{s}}_{n}\right\|^{2}+\sum_{n=J+1}^{N}\left\|{\bm{y}}(n)-{\bm{V}}{\bm{S}}{\bm{x}}(n)\right\|^{2},

(14)

with ${\bm{H}}={\bm{V}}{\bm{S}}$ and ${\bm{S}}=[{\bm{s}}_{1},\dots,{\bm{s}}_{J}]\in\mathbb{C}^{J\times J}$. The optimal solution for ${\bm{X}}$ is given as

\displaystyle{\bm{x}}^{*}(n)=({\bm{S}}^{\mathrm{H}}{\bm{V}}^{\mathrm{H}}{\bm{V}}{\bm{S}})^{-1}{\bm{S}}^{\mathrm{H}}{\bm{V}}^{\mathrm{H}}{\bm{y}}(n)

(15)

\displaystyle=({\bm{S}}^{\mathrm{H}}{\bm{S}})^{-1}{\bm{S}}^{\mathrm{H}}{\bm{V}}^{\mathrm{H}}{\bm{y}}(n).

(16)

Reinserting this solution into (14) simplifies the second term of the objective, since ${\bm{S}}({\bm{S}}^{\mathrm{H}}{\bm{S}})^{-1}{\bm{S}}^{\mathrm{H}}=\mathbf{I}_{J}$ for an invertible ${\bm{S}}$, which yields

\displaystyle\min_{{\bm{S}}}\sum_{n=1}^{J}\left\|{\bm{y}}(n)-{\bm{V}}{\bm{s}}_{n}\right\|^{2}+\sum_{n=J+1}^{N}\left\|{\bm{y}}(n)-{\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{y}}(n)\right\|^{2},

(17)

where only the first term depends on ${\bm{S}}$. Thus, the ML problem for the user of interest results in [11]

\displaystyle\min_{\bm{s}}\|{\bm{y}}_{p}-{\bm{V}}{\bm{s}}\|^{2},

(18)

with the closed-form solution $\hat{{\bm{h}}}_{\text{ML}}={\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}$. The MSE of this estimator is

\displaystyle\mathrm{MSE}^{\mathrm{ML}}=\mathbb{E}\left[\|{\bm{h}}-{\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}\|^{2}\right]=J\sigma^{2}.

(19)
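The ML estimate and its MSE in (19) can be illustrated with a few lines of NumPy. The sketch assumes perfect subspace knowledge, obtained here from the singular value decomposition of a randomly drawn ${\bm{H}}$; the names `subspace_basis` and `ml_estimate` are hypothetical. Since the projector leaves ${\bm{h}}$ untouched, the error is the projected noise, whose power is $\sigma^{2}\,\mathrm{tr}({\bm{V}}{\bm{V}}^{\mathrm{H}})=J\sigma^{2}$.

```python
import numpy as np


def subspace_basis(H):
    """Orthonormal basis V of range(H): the left singular vectors of H."""
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    return U  # M x J if H has full column rank


def ml_estimate(V, y_p):
    """Closed-form solution of Eq. (18): projection of y_p onto range(V)."""
    return V @ (V.conj().T @ y_p)


rng = np.random.default_rng(2)
M, J, sigma2 = 64, 8, 0.1   # illustrative values
H = (rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))) / np.sqrt(2.0)
V = subspace_basis(H)
h = H[:, 0]
n = np.sqrt(sigma2 / 2.0) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
h_ml = ml_estimate(V, h + n)

# Analytic MSE of Eq. (19): sigma^2 times the trace of the projector.
mse_ml = sigma2 * np.real(np.trace(V @ V.conj().T))
```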

In the following, we introduce two channel estimation strategies combining the information provided by ${\bm{C}}$ and $\mathrm{range}({\bm{V}})$ .

III-A Subspace Channel Estimator

Using the information in $\mathrm{range}({{\bm{V}}})$ , we can solve the estimation within the subspace as previously proposed in [26]. For this, the pilot system model in (4) is projected into the $J$ -dimensional subspace as

\displaystyle{\bm{y}}^{\prime}={\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}={\bm{V}}^{\mathrm{H}}{\bm{h}}+{\bm{V}}^{\mathrm{H}}{\bm{n}}={\bm{h}}^{\prime}+{\bm{n}}^{\prime}.

(20)

Under the assumption that ${\bm{V}}$ is chosen independently of ${\bm{h}}$, the distribution ${\bm{h}}^{\prime}\sim\mathcal{N}_{\mathbb{C}}(\bm{0},{\bm{V}}^{\mathrm{H}}{\bm{C}}{\bm{V}})$ can be used to formulate the estimate [26]

\displaystyle\hat{{\bm{h}}}^{\prime}={\bm{V}}^{\mathrm{H}}{\bm{C}}{\bm{V}}\left({\bm{V}}^{\mathrm{H}}{\bm{C}}{\bm{V}}+\sigma^{2}\mathbf{I}_{J}\right)^{-1}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}.

(21)

One should note that by design ${\bm{V}}$ actually depends on ${\bm{h}}$ and, hence, (21) is a suboptimal but feasible estimate for ${\bm{h}}^{\prime}$ . After solving the estimation in the subspace for ${\bm{h}}^{\prime}$ , the solution can be transformed back using [26]

\displaystyle\hat{{\bm{h}}}_{\text{sub}}={\bm{V}}\hat{{\bm{h}}}^{\prime}.

(22)
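Equations (20)-(22) translate into a short routine. The NumPy sketch below is one possible implementation under the assumption of perfect knowledge of ${\bm{C}}$, ${\bm{V}}$, and $\sigma^{2}$; the function name `subspace_lmmse` is illustrative.

```python
import numpy as np


def subspace_lmmse(C, V, y_p, sigma2):
    """Subspace LMMSE estimate: solve in the J-dim. subspace, then map back."""
    J = V.shape[1]
    C_sub = V.conj().T @ C @ V            # covariance of h' = V^H h
    y_sub = V.conj().T @ y_p              # Eq. (20)
    h_sub = C_sub @ np.linalg.solve(C_sub + sigma2 * np.eye(J), y_sub)  # Eq. (21)
    return V @ h_sub                      # Eq. (22)


rng = np.random.default_rng(3)
M, J, sigma2 = 16, 4, 0.5   # illustrative values
V, _ = np.linalg.qr(rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J)))
y_p = rng.standard_normal(M) + 1j * rng.standard_normal(M)
h_hat = subspace_lmmse(np.eye(M), V, y_p, sigma2)
```

Only a $J\times J$ system has to be solved, but the filter depends on ${\bm{V}}$ and therefore has to be recomputed for every new subspace.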

III-B Projected Channel Estimator

As an alternative approach, we propose the use of the orthogonal subspace projection ${\bm{P}}_{\bm{H}}={\bm{V}}{\bm{V}}^{\mathrm{H}}$ as a preprocessing filter. Since the projector ${\bm{P}}_{\bm{H}}$ does not affect ${\bm{h}}$ , the resulting projected observation is given by

\displaystyle\tilde{{\bm{y}}}={\bm{P}}_{\bm{H}}{\bm{y}}_{p}={\bm{h}}+{\bm{P}}_{\bm{H}}{\bm{n}}={\bm{h}}+\tilde{{\bm{n}}}.

(23)

To formulate the CME $\hat{{\bm{h}}}=\mathbb{E}\left[{\bm{h}}\mid\tilde{{\bm{y}}}\right]$ , we need to calculate the statistic of the noise $\tilde{{\bm{n}}}$ with

\displaystyle{\bm{C}}_{\tilde{{\bm{n}}}}=\mathbb{E}\left[\tilde{{\bm{n}}}\tilde{{\bm{n}}}^{\mathrm{H}}\right]=\mathbb{E}\left[\sigma^{2}{\bm{P}}_{\bm{H}}\right].

(24)

To get an intuitive understanding of (24), let us consider a scenario involving spatially uncorrelated channels, meaning that path gains and channel directions are uncorrelated. This is the case when users are uniformly distributed over the directions, e.g., in the spatial channel model in Section II-A, resulting in a channel covariance matrix of the scenario that is a scaled identity [37, Def. 2.3]. In such a case, the eigenvector matrices of the sample covariance matrix of such channels are Haar distributed [38, Chap. 1], i.e., uniformly distributed on the manifold of unitary matrices. Since $\mathrm{tr}({\bm{P}}_{\bm{H}})=J$ and the expectation in (24) is then invariant under unitary rotations, it is a scaled identity with trace $\sigma^{2}J$. Assuming spatially uncorrelated channels, (24) thus results in

\displaystyle{\bm{C}}_{\tilde{{\bm{n}}}}=\sigma^{2}\frac{J}{M}\mathbf{I}_{M},

(25)

which we assume to hold for the remainder of this section. We can then formulate the projected LMMSE estimator as

\displaystyle\hat{{\bm{h}}}_{\text{proj}}={\bm{C}}\left({\bm{C}}+{\bm{C}}_{\tilde{{\bm{n}}}}\right)^{-1}\tilde{{\bm{y}}}.

(26)
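The projected estimator in (23)-(26) can be sketched as follows. The snippet assumes that (25) holds and that ${\bm{C}}$ is Hermitian; the function names are hypothetical. It also contains a small Monte Carlo check of (25) for i.i.d. Gaussian channels, which is meant as an illustration rather than a proof.

```python
import numpy as np


def projected_filter(C, sigma2, J):
    """LMMSE filter of Eq. (26); it does not depend on V and can be precomputed."""
    M = C.shape[0]
    C_noise = sigma2 * J / M * np.eye(M)      # Eq. (25)
    # For Hermitian C, C (C + c I)^{-1} equals (C + c I)^{-1} C.
    return np.linalg.solve(C + C_noise, C)


def projected_lmmse(C, V, y_p, sigma2):
    """Project y_p onto range(V), Eq. (23), then apply the filter of Eq. (26)."""
    y_tilde = V @ (V.conj().T @ y_p)
    return projected_filter(C, sigma2, V.shape[1]) @ y_tilde


def average_projector(M, J, trials, rng):
    """Monte Carlo estimate of E[V V^H] for i.i.d. complex Gaussian channels."""
    acc = np.zeros((M, M), dtype=complex)
    for _ in range(trials):
        H = rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))
        Q, _ = np.linalg.qr(H)                # orthonormal basis of range(H)
        acc += Q @ Q.conj().T
    return acc / trials


rng = np.random.default_rng(4)
P_avg = average_projector(M=6, J=2, trials=4000, rng=rng)   # close to (J/M) I
```

Because the filter returned by `projected_filter` depends only on ${\bm{C}}$, $\sigma^{2}$, $J$, and $M$, it can be stored offline, whereas the filter of the subspace estimator changes with every ${\bm{V}}$.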

III-C Performance Analysis

If (25) holds, the MSE of the proposed projected channel estimator can directly be written as (cf. Appendix A)

\displaystyle\mathrm{MSE}^{\mathrm{proj}}=\mathrm{tr}\left({{\bm{C}}}-{{\bm{C}}}\left({{\bm{C}}}+\sigma^{2}\frac{J}{M}\mathbf{I}_{M}\right)^{-1}{{\bm{C}}}\right)

(27)

\displaystyle=\sum_{i=1}^{M}\frac{\rho_{i}\sigma^{2}}{\frac{M}{J}\rho_{i}+\sigma^{2}}.

(28)

Comparing the performance to the plain LMMSE we see that

\displaystyle\frac{\rho_{i}\sigma^{2}}{\frac{M}{J}\rho_{i}+\sigma^{2}}\leq\frac{\rho_{i}\sigma^{2}}{\rho_{i}+\sigma^{2}},

(29)

holds for every $i=1,\dots,M$ , resulting in $\mathrm{MSE}^{\mathrm{proj}}\leq\mathrm{MSE}^{\mathrm{plain}}$ , cf. (12). The inequality in (29) only holds with equality if $J=M$ . Additionally, we can compare the MSE of the projected LMMSE to (19) by reformulating (28) as

\displaystyle\mathrm{MSE}^{\mathrm{proj}}=\frac{J}{M}\sigma^{2}\sum_{i=1}^{M}\frac{\rho_{i}}{\rho_{i}+\frac{J}{M}\sigma^{2}}\leq J\sigma^{2}=\mathrm{MSE}^{\text{ML}}.

(30)

To compare the projected variant to the subspace LMMSE, let us consider uncorrelated Rayleigh fading such that the channel covariance matrix is given as ${\bm{C}}=\mathbf{I}_{M}$ . The MSE of the projected variant results in

\displaystyle\mathrm{MSE}^{\mathrm{proj}}_{\mathrm{iid}}=\frac{JM\sigma^{2}}{{M}+J\sigma^{2}}.

(31)

For the case of ${\bm{C}}=\mathbf{I}_{M}$ , the subspace LMMSE estimator boils down to

\displaystyle\hat{{\bm{h}}}_{\mathrm{sub}}=\frac{1}{1+\sigma^{2}}{\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p},

(32)

with its corresponding MSE given as (cf. Appendix B)

\displaystyle\mathrm{MSE}^{\mathrm{sub}}_{\mathrm{iid}}=\frac{\sigma^{2}(M\sigma^{2}+J)}{(1+\sigma^{2})^{2}}\geq\mathrm{MSE}^{\mathrm{proj}}_{\mathrm{iid}}.

(33)

Fig. 1 shows the performance of the individual channel estimators based on perfect genie knowledge for the case of uncorrelated Rayleigh fading. As one can see, the projected channel estimator outperforms all other estimators across the whole signal-to-noise ratio (SNR) range. Further, we observe that the MSE of the subspace LMMSE estimator converges to that of the ML method from above at high SNR.

Figure 1: MSE over the SNR for the considered channel estimators based on perfect subspace and perfect statistical knowledge in a scenario with $J=8$ users and $M=64$ antennas with uncorrelated Rayleigh fading.
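The closed-form expressions (12), (19), (31), and (33) can be evaluated directly for the setting of Fig. 1. The sketch below assumes ${\bm{C}}=\mathbf{I}_{M}$ and defines the SNR as $1/\sigma^{2}$, which is an assumption of this illustration; the function name and the SNR grid are hypothetical. The ordering follows from $\mathrm{MSE}^{\mathrm{sub}}_{\mathrm{iid}}-\mathrm{MSE}^{\mathrm{proj}}_{\mathrm{iid}}\propto\sigma^{2}(M-J)^{2}\geq 0$ after cross-multiplication.

```python
import numpy as np


def iid_mse_curves(snr_db, M=64, J=8):
    """Closed-form MSEs for C = I; SNR is assumed to equal 1 / sigma^2."""
    sigma2 = 10.0 ** (-np.asarray(snr_db, dtype=float) / 10.0)
    return {
        "plain": M * sigma2 / (1.0 + sigma2),                      # Eq. (12) with rho_i = 1
        "ml": J * sigma2,                                          # Eq. (19)
        "proj": J * M * sigma2 / (M + J * sigma2),                 # Eq. (31)
        "sub": sigma2 * (M * sigma2 + J) / (1.0 + sigma2) ** 2,    # Eq. (33)
    }


snr_db = np.arange(-10, 31, 5)
curves = iid_mse_curves(snr_db)
for name, values in curves.items():
    print(name, np.round(10.0 * np.log10(values), 2))   # MSE in dB per SNR point
```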

IV Proposed Semi-Blind Channel Estimation - Utilizing Generative Prior

In practice, (10) cannot be utilized directly, as it requires the channels to be Gaussian distributed with a known distribution. In general, the CME given the pilot observation ${\bm{y}}_{p}$ is formulated as

\displaystyle\hat{{\bm{h}}}_{\text{CME}}=\mathbb{E}\left[{\bm{h}}\mid{\bm{y}}_{p}\right]=\int{\bm{h}}\frac{p_{{\bm{n}}}({\bm{y}}_{p}-{\bm{h}})p({\bm{h}})}{p({\bm{y}}_{p})}\mathrm{d}{\bm{h}}.

(34)

As can be seen in (34), the CME generally cannot be computed analytically. First, the CME needs access to $p({\bm{h}})$, which is generally unavailable in practice. Additionally, no closed-form solution exists to the integral in (34).

In order to reformulate the CME, we first use the property that for any arbitrarily distributed random variable ${\bm{h}}$, we can always find a condition ${\bm{c}}$ which makes the conditional distribution Gaussian. Secondly, it has been shown in [39] that for wireless communication channels, this conditional Gaussian distribution preserves the zero-mean property as

\displaystyle{\bm{h}}\mid{\bm{c}}\sim\mathcal{N}_{\mathbb{C}}(\bm{0},{\bm{C}}_{{\bm{h}}\mid{\bm{c}}}).

(35)

Thus, we can reformulate the CME as

\displaystyle\mathbb{E}\left[{\bm{h}}\mid{\bm{y}}_{p}\right]=\mathbb{E}\left[\mathbb{E}\left[{\bm{h}}\mid{\bm{y}}_{p},{\bm{c}}\right]\mid{\bm{y}}_{p}\right]

(36)

\displaystyle=\int\mathbb{E}\left[{\bm{h}}\mid{\bm{y}}_{p},{\bm{c}}\right]p({\bm{c}}\mid{\bm{y}}_{p})\mathrm{d}{\bm{c}}

(37)

\displaystyle\approx\int\hat{{\bm{h}}}_{\bm{c}}({\bm{y}}_{p})p({\bm{c}}\mid{\bm{y}}_{p})\mathrm{d}{\bm{c}},

(38)

where

\displaystyle\hat{{\bm{h}}}_{\bm{c}}({\bm{y}}_{p})={\bm{C}}_{{\bm{h}}\mid{\bm{c}}}\left({\bm{C}}_{{\bm{h}}\mid{\bm{c}}}+{\bm{C}}_{\bm{n}}\right)^{-1}{\bm{y}}_{p},

(39)

denotes the LMMSE estimate given ${\bm{c}}$. However, finding a suitable condition ${\bm{c}}$ can be challenging, in particular as the true distribution of ${\bm{h}}$ is unknown. To this end, CGLMs were proposed in [30, 31, 32], which approximate the CME based on a GMM, mixtures of factor analyzers (MFA), and VAE, respectively. All three methods learn a model that provides the conditional Gaussian distribution ${\bm{h}}\mid{\bm{c}}$ based on a discrete (GMM, MFA) or continuous (VAE) latent variable ${\bm{c}}$. In this work, we focus on the GMM and VAE, which we adapt to semi-blind channel estimation in the following.

IV-A GMM-based Semi-blind Channel Estimation

Based on the universal approximation property of GMMs [40], the PDF of ${\bm{h}}$ is approximated by

\displaystyle f_{\bm{h}}^{(K)}({\bm{h}})=\sum_{k=1}^{K}p(k)\mathcal{N}_{\mathbb{C}}({\bm{h}};\bm{\mu}_{k},{\bm{C}}_{k}),

(40)

where $p(k)$ , $\bm{\mu}_{k}$ and ${\bm{C}}_{k}$ are the mixing coefficients, means and covariance matrices of the $k$ -th GMM component, respectively. As we are considering wireless channels, the mean of each component is set to $\bm{\mu}_{k}=\bm{0}$ , cf. [39]. The fitting of the components in (40) is accomplished with the well-known EM algorithm [41] based on a set $\mathcal{H}=\{{\bm{h}}_{t}\}^{T}_{t=1}$ of $T$ channel samples as training data. Based on the formulation in (40) the conditional PDF given $k$ is

\displaystyle{\bm{h}}\mid k\sim\mathcal{N}_{\mathbb{C}}(\bm{0},{\bm{C}}_{k}).

(41)

Thus, in the case of a GMM, we have a discrete latent variable, which helps us parameterize the CME. The resulting semi-blind subspace GMM estimator can be formulated as

\displaystyle\hat{{\bm{h}}}_{\text{sub. GMM}}={\bm{V}}\hat{{\bm{h}}}^{\prime}_{\text{GMM}}={\bm{V}}\sum_{k=1}^{K}p(k\mid{\bm{y}}^{\prime})\hat{{\bm{h}}}^{\prime}_{\text{GMM},k},

(42)

with

\displaystyle\hat{{\bm{h}}}^{\prime}_{\text{GMM},k}={\bm{V}}^{\mathrm{H}}{\bm{C}}_{k}{\bm{V}}\left({\bm{V}}^{\mathrm{H}}{\bm{C}}_{k}{\bm{V}}+\sigma^{2}\mathbf{I}_{J}\right)^{-1}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p},

(43)

and the corresponding responsibilities

\displaystyle p(k\mid{\bm{y}}^{\prime})=\frac{p(k)\mathcal{N}_{\mathbb{C}}\left({\bm{y}}^{\prime};\bm{0},{\bm{V}}^{\mathrm{H}}{\bm{C}}_{k}{\bm{V}}+\sigma^{2}\mathbf{I}_{J}\right)}{\sum_{i=1}^{K}p(i)\mathcal{N}_{\mathbb{C}}\left({\bm{y}}^{\prime};\bm{0},{\bm{V}}^{\mathrm{H}}{\bm{C}}_{i}{\bm{V}}+\sigma^{2}\mathbf{I}_{J}\right)}.

(44)

The projected GMM estimator is

\displaystyle\hat{{\bm{h}}}_{\text{proj. GMM}}=\sum_{k=1}^{K}p(k\mid\tilde{{\bm{y}}})\hat{{\bm{h}}}_{\text{proj. GMM},k},

(45)

with

\displaystyle\hat{{\bm{h}}}_{\text{proj. GMM},k}={\bm{C}}_{k}\left({\bm{C}}_{k}+{\bm{C}}_{\tilde{{\bm{n}}}}\right)^{-1}\tilde{{\bm{y}}}

(46)

and the associated responsibilities

\displaystyle p(k\mid\tilde{{\bm{y}}})=\frac{p(k)\mathcal{N}_{\mathbb{C}}\left(\tilde{{\bm{y}}};\bm{0},{\bm{C}}_{k}+{\bm{C}}_{\tilde{{\bm{n}}}}\right)}{\sum_{i=1}^{K}p(i)\mathcal{N}_{\mathbb{C}}\left(\tilde{{\bm{y}}};\bm{0},{\bm{C}}_{i}+{\bm{C}}_{\tilde{{\bm{n}}}}\right)}.

(47)

The respective estimators are summarized in Algorithm 1 and Algorithm 2.

Algorithm 1 Subspace GMM Channel Estimator

Offline Training Phase

Training dataset $\mathcal{H}=\{{\bm{h}}_{t}\}_{t=1}^{T}$
Fit the GMM with the EM algorithm, cf. [30]

Input: ${\bm{Y}}=[{\bm{y}}(1),\dots,{\bm{y}}(N)]$, ${\bm{P}}$, $\sigma^{2}$

$\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}\leftarrow\frac{1}{N}{\bm{Y}}{\bm{Y}}^{\mathrm{H}}$
$\hat{{\bm{V}}}\leftarrow J$ dominant eigenvectors of $\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}$
${\bm{Y}}_{p}=[{\bm{y}}_{p,1},\dots,{\bm{y}}_{p,J}]\leftarrow{\bm{Y}}^{\prime}_{p}{\bm{P}}^{\mathrm{H}}$
for $j=1,\dots,J$ do
    ${\bm{y}}^{\prime}\leftarrow\hat{{\bm{V}}}^{\mathrm{H}}{\bm{y}}_{p,j}$
    for $k=1,\dots,K$ do
        $\hat{{\bm{h}}}^{\prime}_{k}\leftarrow\hat{{\bm{V}}}^{\mathrm{H}}{\bm{C}}_{k}\hat{{\bm{V}}}\left(\hat{{\bm{V}}}^{\mathrm{H}}{\bm{C}}_{k}\hat{{\bm{V}}}+\sigma^{2}\mathbf{I}_{J}\right)^{-1}{\bm{y}}^{\prime}$
    end for
    $\hat{{\bm{h}}}_{j}\leftarrow\hat{{\bm{V}}}\sum_{k=1}^{K}p(k\mid{\bm{y}}^{\prime})\hat{{\bm{h}}}^{\prime}_{k}$
end for
return $\hat{{\bm{h}}}_{j},\forall j=1,\dots,J$
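The subspace estimation step of Algorithm 1 can be sketched in NumPy as follows. The function name `estimate_subspace` is illustrative, and the example data (Gaussian payload symbols, small dimensions) are placeholders chosen only to exercise the code.

```python
import numpy as np


def estimate_subspace(Y, J):
    """Return the J dominant eigenvectors of the sample covariance (1/N) Y Y^H."""
    N = Y.shape[1]
    C_hat = Y @ Y.conj().T / N
    # eigh returns the eigenvalues of a Hermitian matrix in ascending order.
    _, U = np.linalg.eigh(C_hat)
    return U[:, -J:]


rng = np.random.default_rng(5)
M, J, N, sigma2 = 16, 3, 40, 0.01   # illustrative values
H = rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))
X = rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))   # placeholder symbols
noise = np.sqrt(sigma2 / 2.0) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
V_hat = estimate_subspace(H @ X + noise, J)
```

The returned matrix has orthonormal columns, so `V_hat @ V_hat.conj().T` is an orthogonal projector of rank $J$. Without noise and with full-rank payload data, it coincides with the projector onto $\mathrm{range}({\bm{H}})$.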

Algorithm 2 Projected GMM Channel Estimator

Offline Training Phase

Training dataset $\mathcal{H}=\{{\bm{h}}_{t}\}_{t=1}^{T}$
Fit the GMM with the EM algorithm, cf. [30]

Input: ${\bm{Y}}=[{\bm{y}}(1),\dots,{\bm{y}}(N)]$, ${\bm{P}}$, $\sigma^{2}$

$\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}\leftarrow\frac{1}{N}{\bm{Y}}{\bm{Y}}^{\mathrm{H}}$
$\hat{{\bm{V}}}\leftarrow J$ dominant eigenvectors of $\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}$
${\bm{Y}}_{p}=[{\bm{y}}_{p,1},\dots,{\bm{y}}_{p,J}]\leftarrow{\bm{Y}}^{\prime}_{p}{\bm{P}}^{\mathrm{H}}$
for $j=1,\dots,J$ do
    $\tilde{{\bm{y}}}\leftarrow\hat{{\bm{V}}}\hat{{\bm{V}}}^{\mathrm{H}}{\bm{y}}_{p,j}$
    for $k=1,\dots,K$ do
        $\hat{{\bm{h}}}_{k}\leftarrow{\bm{C}}_{k}\left({\bm{C}}_{k}+{\bm{C}}_{\tilde{{\bm{n}}}}\right)^{-1}\tilde{{\bm{y}}}$
    end for
    $\hat{{\bm{h}}}_{j}\leftarrow\sum_{k=1}^{K}p(k\mid\tilde{{\bm{y}}})\hat{{\bm{h}}}_{k}$
end for
return $\hat{{\bm{h}}}_{j},\forall j=1,\dots,J$
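The per-user estimation steps of both algorithms, i.e., (42)-(47), can be sketched as below. This is a minimal illustration and not the authors' implementation: it assumes that the GMM covariances and mixing coefficients are already fitted, that (25) is used for the projected noise covariance, and it evaluates the responsibilities in the log domain for numerical stability. All function names are hypothetical.

```python
import numpy as np


def log_cgauss_zero_mean(y, C):
    """log N_C(y; 0, C) for a Hermitian positive definite C."""
    _, logdet = np.linalg.slogdet(C)
    quad = np.real(y.conj() @ np.linalg.solve(C, y))
    return -len(y) * np.log(np.pi) - logdet - quad


def responsibilities(y, covs_y, priors):
    """Posterior p(k | y) as in Eqs. (44) and (47), computed in the log domain."""
    log_w = np.log(priors) + np.array([log_cgauss_zero_mean(y, C) for C in covs_y])
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


def subspace_gmm(y_p, V, covs, priors, sigma2):
    """Subspace GMM estimate of Eq. (42) for one user."""
    J = V.shape[1]
    y_sub = V.conj().T @ y_p
    covs_sub = [V.conj().T @ C @ V for C in covs]
    covs_y = [C + sigma2 * np.eye(J) for C in covs_sub]
    resp = responsibilities(y_sub, covs_y, priors)
    h_sub = sum(r * C @ np.linalg.solve(Cy, y_sub)
                for r, C, Cy in zip(resp, covs_sub, covs_y))
    return V @ h_sub


def projected_gmm(y_p, V, covs, priors, sigma2):
    """Projected GMM estimate of Eq. (45) for one user, using Eq. (25)."""
    M, J = V.shape
    y_tilde = V @ (V.conj().T @ y_p)
    C_noise = sigma2 * J / M * np.eye(M)
    covs_y = [C + C_noise for C in covs]
    resp = responsibilities(y_tilde, covs_y, priors)
    return sum(r * C @ np.linalg.solve(Cy, y_tilde)
               for r, C, Cy in zip(resp, covs, covs_y))
```

In the projected variant, the matrices `C + C_noise` and the corresponding filters do not depend on the subspace and could be cached per component. For a single component ($K=1$), both functions reduce to the estimators in (22) and (26).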

IV-B VAE-based Semi-blind Channel Estimation

To learn the unknown distribution $f_{\bm{h}}({\bm{h}})$ using a VAE, we lower bound the parameterized likelihood $p_{\bm{\theta}}({\bm{h}})$ using the evidence-lower bound (ELBO). To formulate the ELBO, the variational distributions $q_{\bm{\phi}}({\bm{z}}\mid{\bm{y}}^{\prime})$ and $q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})$ are introduced, which approximate $p({\bm{z}}\mid{\bm{y}}^{\prime})$ and $p({\bm{z}}\mid\tilde{{\bm{y}}})$ , respectively. In contrast to the GMM, the used subspace $\mathrm{range}({\bm{V}})$ is unknown to the encoder of the VAE making $p({\bm{z}}\mid{\bm{y}}^{\prime})$ difficult to learn. Additionally, the dimension of the encoder input would depend on the number of users in the system. Thus, we propose to approximate both posteriors $p({\bm{z}}\mid{\bm{y}}^{\prime})$ and $p({\bm{z}}\mid\tilde{{\bm{y}}})$ with $q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})$ . A version of the ELBO for this case, which is accessible, can be written as [42]

\displaystyle\mathcal{L}_{{\bm{\theta}},{\bm{\phi}}}=\mathbb{E}_{q_{\bm{\phi}}}[\log p_{\bm{\theta}}({\bm{h}}\mid{\bm{z}})]-\mathrm{D}_{\mathrm{KL}}(q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})\,\|\,p({\bm{z}})),

(48)

where $\mathbb{E}_{q_{\bm{\phi}}}[\cdot]=\mathbb{E}_{q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})}[\cdot]$ is the expectation over the variational distribution $q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})$. The second term in (48) is the Kullback-Leibler (KL) divergence

\displaystyle\mathrm{D}_{\mathrm{KL}}(q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})\,\|\,p({\bm{z}}))=\mathbb{E}_{q_{\bm{\phi}}}\left[\log\left(\frac{q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})}{p({\bm{z}})}\right)\right].

(49)

In the VAE framework, the ELBO is optimized using deep neural networks (DNNs) and the reparameterization trick [42]. In order to do so, the involved distributions are defined as

$\displaystyle p({\bm{z}})$	$\displaystyle=\mathcal{N}(\bm{0},\mathbf{I}_{Z}),$
$\displaystyle p_{\bm{\theta}}({\bm{h}}\mid{\bm{z}})$	$\displaystyle=\mathcal{N}_{\mathbb{C}}(\bm{\mu}_{\bm{\theta}}({\bm{z}}),{\bm{C}}_{\bm{\theta}}({\bm{z}})),$	(50)
$\displaystyle q_{\bm{\phi}}({\bm{z}}\mid\tilde{{\bm{y}}})$	$\displaystyle=\mathcal{N}(\bm{\mu}_{\bm{\phi}}(\tilde{{\bm{y}}}),\mathrm{diag}(\bm{\sigma}^{2}_{\bm{\phi}}(\tilde{{\bm{y}}}))).$
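With the Gaussian choices in (50), the KL term in (48) has the well-known closed form $\frac{1}{2}\sum_{i}(\mu_{i}^{2}+\sigma_{i}^{2}-1-\log\sigma_{i}^{2})$, and latent samples can be drawn with the reparameterization trick. The following sketch illustrates both steps for a real-valued latent vector; the function names are hypothetical, and `mu` and `var` stand in for the encoder outputs $\bm{\mu}_{\bm{\phi}}(\tilde{{\bm{y}}})$ and $\bm{\sigma}^{2}_{\bm{\phi}}(\tilde{{\bm{y}}})$.

```python
import numpy as np

def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ) for a real-valued latent vector."""
    return 0.5 * np.sum(mu**2 + var - 1.0 - np.log(var))

def reparameterize(mu, var, rng):
    """Draw z = mu + sqrt(var) * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(var) * eps
```

Writing the sample as a deterministic function of `mu`, `var`, and an independent noise draw is what allows gradients of the ELBO to pass through the sampling step during training.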

The resulting semi-blind VAE structure is shown in Fig. 2. In the case of a ULA or URA at the BS, the channel covariance matrix is Toeplitz or block-Toeplitz, respectively. As shown in [39], the conditional covariance matrix at the output of the VAE preserves this structure. Thus, we parameterize the output covariance matrix as

\displaystyle{\bm{C}}_{\bm{\theta}}({\bm{z}})={\bm{Q}}^{\mathrm{H}}\mathrm{diag}({\bm{c}}_{\bm{\theta}}({\bm{z}})){\bm{Q}},

(51)

where ${\bm{Q}}={\bm{Q}}_{M}$ for the ULA and ${\bm{Q}}={\bm{Q}}^{\prime}_{N_{v}}\otimes{\bm{Q}}^{\prime}_{N_{h}}$ for the URA. Here, ${\bm{Q}}_{M}$ is a DFT matrix of size $M$, resulting in a circulant approximation, cf. [32], and ${\bm{Q}}^{\prime}_{N_{x}}$ contains the first $N_{x}$ columns of the $2N_{x}\times 2N_{x}$ DFT matrix, resulting in a block-Toeplitz parameterization, cf. [43]. Further, for the (block-)Toeplitz parameterization, we can set $\bm{\mu}_{\bm{\theta}}({\bm{z}})=\bm{0}$, cf. [39].
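The structure imposed by (51) can be checked numerically. The sketch below builds the circulant, Toeplitz, and block-Toeplitz variants from a non-negative vector standing in for the decoder output ${\bm{c}}_{\bm{\theta}}({\bm{z}})$. The helper names are hypothetical, and the unitary normalization of the DFT matrix is an assumption that the text does not specify.

```python
import numpy as np

def dft_matrix(n):
    """Unitary n x n DFT matrix (the normalization is an assumption)."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

def circulant_cov(c):
    """C = Q^H diag(c) Q with the full M x M DFT matrix Q."""
    Q = dft_matrix(len(c))
    return Q.conj().T @ (c[:, None] * Q)

def toeplitz_cov(c):
    """C = Q'^H diag(c) Q' with Q' the first N columns of the 2N x 2N DFT matrix."""
    n = len(c) // 2
    Q = dft_matrix(2 * n)[:, :n]
    return Q.conj().T @ (c[:, None] * Q)

def block_toeplitz_cov(c, n_v, n_h):
    """C = Q^H diag(c) Q with Q = Q'_{N_v} kron Q'_{N_h}; c has 4 * n_v * n_h entries."""
    Q = np.kron(dft_matrix(2 * n_v)[:, :n_v], dft_matrix(2 * n_h)[:, :n_h])
    return Q.conj().T @ (c[:, None] * Q)
```

Each entry of the resulting matrix depends only on the index difference, so the (block-)Toeplitz structure holds by construction, and positive entries in `c` yield a Hermitian positive definite matrix. The decoder therefore only has to output a vector instead of a full covariance matrix.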

After successfully training the VAE, the output is a local parameterization of $f_{\bm{h}}({\bm{h}})$ as conditionally Gaussian

\displaystyle{\bm{h}}\mid{\bm{z}}\sim p_{\bm{\theta}}({\bm{h}}\mid{\bm{z}}).

(52)

As analyzed in [32], it is a reasonable approximation to set

\displaystyle p({\bm{z}}\mid\tilde{{\bm{y}}})=\begin{cases}1\quad\text{if }{\bm{z}}=\bm{\mu}_{\bm{\phi}}(\tilde{{\bm{y}}}),\\ 0\quad\text{otherwise}.\end{cases}

(53)

Based on this parameterization we can formulate the semi-blind VAE-based estimators as

\displaystyle\hat{{\bm{h}}}_{\text{proj. VAE}}=\bm{\mu}_{\bm{\theta}}({\bm{z}})+{\bm{C}}_{\bm{\theta}}({\bm{z}})\left({\bm{C}}_{\bm{\theta}}({\bm{z}})+{\bm{C}}_{\tilde{{\bm{n}}}}\right)^{-1}(\tilde{{\bm{y}}}-\bm{\mu}_{\bm{\theta}}({\bm{z}})),

(54)

and

	$\displaystyle\hat{{\bm{h}}}_{\text{sub. VAE}}=\;$	$\displaystyle{\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{C}}_{\bm{\theta}}({\bm{z}}){\bm{V}}\left({\bm{V}}^{\mathrm{H}}{\bm{C}}_{\bm{\theta}}({\bm{z}}){\bm{V}}+\sigma^{2}\mathbf{I}_{J}\right)^{-1}$
		$\displaystyle\times({\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}-{\bm{V}}^{\mathrm{H}}\bm{\mu}_{\bm{\theta}}({\bm{z}}))+{\bm{V}}{\bm{V}}^{\mathrm{H}}\bm{\mu}_{\bm{\theta}}({\bm{z}}).$		(55)

The respective estimators are summarized in Algorithm 3 and Algorithm 4. For a more detailed introduction to the VAE framework and its use for parameterizing the CME, we refer the reader to [32].

Algorithm 3 Subspace VAE Channel Estimator

Offline Training Phase

Training dataset $\mathcal{H}=\{{\bm{h}}_{t}\}_{t=1}^{T}$
Fit the VAE by optimizing the ELBO, cf. [32]

Online Estimation Phase

Input: ${\bm{Y}}=[{\bm{y}}(1),\dots,{\bm{y}}(N)]$, ${\bm{P}}$, $\sigma^{2}$
$\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}\leftarrow\frac{1}{N}{\bm{Y}}{\bm{Y}}^{\mathrm{H}}$
$\hat{{\bm{V}}}\leftarrow$ $J$ dominant eigenvectors of $\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}$
${\bm{Y}}_{p}=[{\bm{y}}_{p,1},\dots,{\bm{y}}_{p,J}]\leftarrow{\bm{Y}}^{\prime}_{p}{\bm{P}}^{\mathrm{H}}$
for $j=1,\dots,J$ do
    $\tilde{{\bm{y}}}\leftarrow\hat{{\bm{V}}}\hat{{\bm{V}}}^{\mathrm{H}}{\bm{y}}_{p,j}$
    $\bm{\mu}_{\bm{\theta}}({\bm{z}}),{\bm{C}}_{\bm{\theta}}({\bm{z}})\leftarrow\mathrm{VAE}(\tilde{{\bm{y}}})$
    $\hat{{\bm{h}}}_{j}\leftarrow\hat{{\bm{V}}}\hat{{\bm{V}}}^{\mathrm{H}}{\bm{C}}_{\bm{\theta}}({\bm{z}})\hat{{\bm{V}}}\left(\hat{{\bm{V}}}^{\mathrm{H}}{\bm{C}}_{\bm{\theta}}({\bm{z}})\hat{{\bm{V}}}+\sigma^{2}\mathbf{I}_{J}\right)^{-1}(\hat{{\bm{V}}}^{\mathrm{H}}{\bm{y}}_{p,j}-\hat{{\bm{V}}}^{\mathrm{H}}\bm{\mu}_{\bm{\theta}}({\bm{z}}))+\hat{{\bm{V}}}\hat{{\bm{V}}}^{\mathrm{H}}\bm{\mu}_{\bm{\theta}}({\bm{z}})$
end for
return $\hat{{\bm{h}}}_{j},\forall j=1,\dots,J$

Algorithm 4 Projected VAE Channel Estimator

Offline Training Phase

Training dataset $\mathcal{H}=\{{\bm{h}}_{t}\}_{t=1}^{T}$
Fit the VAE by optimizing the ELBO, cf. [32]

Online Estimation Phase

Input: ${\bm{Y}}=[{\bm{y}}(1),\dots,{\bm{y}}(N)]$, ${\bm{P}}$, $\sigma^{2}$
$\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}\leftarrow\frac{1}{N}{\bm{Y}}{\bm{Y}}^{\mathrm{H}}$
$\hat{{\bm{V}}}\leftarrow$ $J$ dominant eigenvectors of $\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}$
${\bm{Y}}_{p}=[{\bm{y}}_{p,1},\dots,{\bm{y}}_{p,J}]\leftarrow{\bm{Y}}^{\prime}_{p}{\bm{P}}^{\mathrm{H}}$
for $j=1,\dots,J$ do
    $\tilde{{\bm{y}}}\leftarrow\hat{{\bm{V}}}\hat{{\bm{V}}}^{\mathrm{H}}{\bm{y}}_{p,j}$
    $\bm{\mu}_{\bm{\theta}}({\bm{z}}),{\bm{C}}_{\bm{\theta}}({\bm{z}})\leftarrow\mathrm{VAE}(\tilde{{\bm{y}}})$
    $\hat{{\bm{h}}}_{j}\leftarrow\bm{\mu}_{\bm{\theta}}({\bm{z}})+{\bm{C}}_{\bm{\theta}}({\bm{z}})\left({\bm{C}}_{\bm{\theta}}({\bm{z}})+{\bm{C}}_{\tilde{{\bm{n}}}}\right)^{-1}(\tilde{{\bm{y}}}-\bm{\mu}_{\bm{\theta}}({\bm{z}}))$
end for
return $\hat{{\bm{h}}}_{j},\forall j=1,\dots,J$
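Once the decoder has produced a conditional mean and covariance, the final step of the subspace and projected VAE estimators is a plain LMMSE filter. The sketch below shows only this last step, under stated assumptions: `mu` and `C` are stand-ins for the decoder outputs $\bm{\mu}_{\bm{\theta}}({\bm{z}})$ and ${\bm{C}}_{\bm{\theta}}({\bm{z}})$ evaluated at ${\bm{z}}=\bm{\mu}_{\bm{\phi}}(\tilde{{\bm{y}}})$, no actual VAE is involved, the function names are hypothetical, and ${\bm{C}}_{\tilde{{\bm{n}}}}$ is replaced by the scaled identity $\sigma^{2}\frac{J}{M}\mathbf{I}_{M}$. The subspace variant is written in the standard non-zero-mean LMMSE form applied to the $J$-dimensional coordinates ${\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}$.

```python
import numpy as np

def projected_estimate(y_p, V, mu, C, sigma2):
    """LMMSE filter applied to the projected observation V V^H y_p."""
    M, J = V.shape
    y_t = V @ (V.conj().T @ y_p)
    C_n = sigma2 * J / M * np.eye(M)       # assumed scaled-identity noise covariance
    return mu + C @ np.linalg.solve(C + C_n, y_t - mu)

def subspace_estimate(y_p, V, mu, C, sigma2):
    """LMMSE filter in the coordinates V^H y_p, mapped back to M dimensions by V."""
    J = V.shape[1]
    C_s = V.conj().T @ C @ V               # J x J covariance inside the subspace
    r = V.conj().T @ (y_p - mu)            # centered observation in subspace coordinates
    return V @ (V.conj().T @ mu + C_s @ np.linalg.solve(C_s + sigma2 * np.eye(J), r))
```

The projected variant inverts an $M\times M$ matrix, whereas the subspace variant only inverts a $J\times J$ matrix but needs the products with ${\bm{V}}$. For $J=M$, both functions reduce to the conventional LMMSE estimator.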

IV-C Maximum Likelihood Subspace Estimation

After introducing the methods utilizing the additional subspace information provided by $\mathrm{range}({\bm{V}})$ to enhance the CSI estimation quality, let us consider the estimation of such a subspace. As the received data symbols are also transmitted over the same channel, they can be used to estimate the subspace containing all user channels.

To this end, let us reconsider the ML estimate of ${\bm{H}}$ in (13). Instead of directly optimizing this ML formulation as done in [16], which generally does not result in the MMSE, we only take this log-likelihood formulation as an intermediate step to estimate the subspace $\mathrm{range}({\bm{V}})$. First, let us consider the right term of the objective function in (13). We can then reformulate the problem by again solving for ${\bm{X}}$ first and reinserting the solution, resulting in [11]

\displaystyle\max_{\bm{H}}\;\mathrm{tr}\left({\bm{P}}_{\bm{H}}\hat{{\bm{C}}}^{(d)}_{{\bm{y}}\mid{\bm{H}}}\right),

(56)

where ${\bm{P}}_{\bm{H}}={\bm{H}}({\bm{H}}^{\mathrm{H}}{\bm{H}})^{-1}{\bm{H}}^{\mathrm{H}}={\bm{V}}{\bm{V}}^{\mathrm{H}}$ and $\hat{{\bm{C}}}^{(d)}_{{\bm{y}}\mid{\bm{H}}}=\frac{1}{N-J}{\bm{Y}}_{d}{\bm{Y}}_{d}^{\mathrm{H}}$, with ${\bm{Y}}_{d}$ from (2). The maximization in (56) is solved by setting ${\bm{P}}_{\bm{H}}$ equal to $\hat{{\bm{V}}}_{d}\hat{{\bm{V}}}_{d}^{\mathrm{H}}$ with $\hat{{\bm{V}}}_{d}$ holding the $J$ dominant eigenvectors of the receive sample covariance matrix $\hat{{\bm{C}}}^{(d)}_{{\bm{y}}\mid{\bm{H}}}$. This result has also been used in [26]. Additionally, it is trivial to see that the first term in (13) is minimized by ${\bm{h}}_{n}={\bm{y}}(n)$. The subspace spanned by the solution ${\bm{h}}_{n}={\bm{y}}(n)$ is the same as the subspace spanned by the $J$ eigenvectors of the sample covariance matrix $\hat{{\bm{C}}}^{(p)}_{{\bm{y}}\mid{\bm{H}}}=\frac{1}{J}{\bm{Y}}_{p}{\bm{Y}}_{p}^{\mathrm{H}}$, which ignores the additional phase information contained in the pilot observation. Thus, the overall subspace estimate $\hat{{\bm{V}}}=[{\bm{v}}_{1},\dots,{\bm{v}}_{J}]$ is found by taking the $J$ dominant eigenvectors ${\bm{v}}_{j}$ of the sample covariance matrix defined as

\displaystyle\hat{{\bm{C}}}_{{\bm{y}}\mid{\bm{H}}}=\frac{1}{N}{\bm{Y}}{\bm{Y}}^{\mathrm{H}}.

(57)

To utilize information from the previous coherence intervals, one can adaptively update the subspace using efficient tracking algorithms as proposed in, e.g., [44, 45].
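The batch subspace estimate based on (57) amounts to one sample covariance and one Hermitian eigendecomposition. The following sketch illustrates this step. The function name is hypothetical, and it covers only the batch computation, not the tracking algorithms of [44, 45].

```python
import numpy as np

def estimate_subspace(Y, J):
    """Return the J dominant eigenvectors of (1/N) Y Y^H.

    Y : (M, N) matrix holding all received symbols (pilot and payload) as columns
    """
    N = Y.shape[1]
    C_hat = Y @ Y.conj().T / N
    # eigh returns the eigenvalues of a Hermitian matrix in ascending order
    _, eigvecs = np.linalg.eigh(C_hat)
    return eigvecs[:, -J:][:, ::-1]        # dominant eigenvectors first
```

In the noise-free case with at least $J$ linearly independent snapshots, the returned basis spans exactly the column space of ${\bm{H}}$. With noise, the accuracy of the estimate depends on the number of snapshots $N$.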

IV-D Complexity Analysis

The standalone GMM estimator proposed in [30] precomputes the filters used for the individual components, resulting in a complexity of $\mathcal{O}(KM^{2})$. For the standalone VAE, the complexity is $\mathcal{O}(DM^{2})$ [32], where $D$ denotes the number of layers in the forward pass of the VAE. For our semi-blind methods, the calculation of the subspace requires $\mathcal{O}((N+J)M^{2})$. This results from calculating the sample covariance matrix with $\mathcal{O}(NM^{2})$ and computing the eigenvectors belonging to the $J$ largest eigenvalues to solve (56). Using the projection approximation subspace tracking (PAST) algorithm [45], the computational complexity of calculating the subspace reduces to $\mathcal{O}(JM)$ for every update. In the case of the subspace GMM, the $K$ LMMSE estimates cannot be precomputed, which results in a complexity of $\mathcal{O}(K(M^{2}+JM^{2}+J^{3}))$. Similarly, the subspace VAE exhibits a complexity of $\mathcal{O}(DM^{2}+JM^{2}+J^{3})$. For the projected versions of the GMM and VAE, the complexity becomes $\mathcal{O}(KM^{2}+JM^{2})$ and $\mathcal{O}(DM^{2}+JM^{2})$, respectively. Note that the calculations for the $K$ components of the GMM can be parallelized. Similarly, the computations in the convolutional layers of the VAE can be parallelized, mitigating the complexity.
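To get a feeling for these orders of magnitude, the leading terms can be evaluated for the simulation setup used later ($M=64$, $J=8$, $K=64$, $N=200$). The sketch below simply plugs numbers into the expressions above. It drops all constants hidden by the $\mathcal{O}$-notation, so the values are rough indicators and not operation counts, and the function name is hypothetical. The VAE terms are omitted because they depend on the number of layers $D$.

```python
def leading_order_costs(M, J, K, N):
    """Leading-order terms of the stated complexities, with all constants dropped."""
    return {
        "subspace computation": (N + J) * M**2,
        "standalone GMM": K * M**2,
        "subspace GMM": K * (M**2 + J * M**2 + J**3),
        "projected GMM": K * M**2 + J * M**2,
    }

costs = leading_order_costs(M=64, J=8, K=64, N=200)
for name, value in costs.items():
    print(f"{name}: {value}")
```

Under these assumptions, the subspace GMM term is roughly eight times the projected GMM term, since the projected variant can still reuse precomputed filters.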

V Baseline estimators

To compare our methods, the following baseline channel estimators are considered. Based on the found subspace $\mathrm{range}({\bm{V}})$ we can formulate the pilot-based ML estimator as $\hat{{\bm{h}}}_{\text{ML}}={\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p}$ , which is the closed-form solution to (18). This can be interpreted as the subspace-adjusted version of the conventional least squares (LS) channel estimator given as $\hat{{\bm{h}}}_{\text{LS}}={\bm{y}}_{p}$ .

Another estimator is based on the sample covariance matrix, which we can compute from the training data set ${\mathcal{H}}$ to infer the global statistics of the channels as

\displaystyle{\bm{C}}_{s}=\frac{1}{|{\mathcal{H}}|}\sum_{{\bm{h}}\in{\mathcal{H}}}{\bm{h}}{\bm{h}}^{\mathrm{H}}.

(58)

We can use the matrix ${\bm{C}}_{s}$ as a statistical prior to parameterize the semi-blind channel estimators outlined in Section III-A and Section III-B as

\displaystyle\hat{{\bm{h}}}_{\text{sub. s-cov}}={\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{C}}_{s}{\bm{V}}\left({\bm{V}}^{\mathrm{H}}{\bm{C}}_{s}{\bm{V}}+\sigma^{2}\mathbf{I}_{J}\right)^{-1}{\bm{V}}^{\mathrm{H}}{\bm{y}}_{p},

(59)

and

\displaystyle\hat{{\bm{h}}}_{\text{proj. s-cov}}={\bm{C}}_{s}\left({\bm{C}}_{s}+{\bm{C}}_{\tilde{{\bm{n}}}}\right)^{-1}{\bm{P}}_{\bm{H}}{\bm{y}}_{p}.

(60)
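The simplest baselines need very little code. The sketch below computes the sample covariance matrix of (58) from a training set and the LS and subspace-adjusted ML estimates described at the beginning of this section. The function names and the row-wise storage of the training channels are assumptions made for illustration.

```python
import numpy as np

def sample_covariance(H_train):
    """C_s = (1/T) sum_t h_t h_t^H; the rows of H_train are the T training channels."""
    T = H_train.shape[0]
    return H_train.T @ H_train.conj() / T

def ls_estimate(y_p):
    """Conventional LS estimate: the decorrelated pilot observation itself."""
    return y_p

def ml_estimate(y_p, V):
    """Pilot-based ML estimate: projection of y_p onto range(V)."""
    return V @ (V.conj().T @ y_p)
```

The resulting matrix can be plugged into (59) and (60) in place of a component-specific or latent-conditioned covariance, which is why these estimators act as "global statistics" counterparts of the CGLM-based methods.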

Lastly, we compare our proposed methods to two iterative algorithms optimizing the ML formulation in (13), namely the EM algorithm from [16] and an MP variant similar to [17], both of which we run until convergence or for $500$ iterations, whichever comes first.

VI Numerical Simulations

To evaluate our proposed methods, we use channel realizations that are normalized such that $\mathbb{E}\left[\|{\bm{h}}\|^{2}\right]=M$. Thus, we can define the SNR as $1/\sigma^{2}$. Further, the normalized MSE (NMSE) defined as

\displaystyle\text{NMSE}=\frac{1}{ML}\sum_{\ell=1}^{L}\|{\bm{h}}_{\ell}-\hat{{\bm{h}}}_{\ell}\|^{2},

(61)

is used to characterize the performance of the estimators based on $L=10^{3}$ unseen channel samples stemming from the channel models detailed in Sections II-A and II-B. The assumption of spatially uncorrelated channels, as used in (25), only holds for the spatial channel model of Section II-A. For the case of the measurement campaign described in Section II-B, we approximate the noise covariance matrix (24) as

\displaystyle{\bm{C}}_{\tilde{{\bm{n}}}}\approx\sigma^{2}\frac{J}{M}\mathbf{I}_{M}.

(62)
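The evaluation metric in (61) and the approximation in (62) translate directly into code. The sketch below assumes that the test channels and their estimates are stored row-wise in arrays of shape $(L, M)$; the function names are hypothetical.

```python
import numpy as np

def nmse(H_true, H_est):
    """NMSE over L test channels of dimension M, stored as the rows of the inputs."""
    L, M = H_true.shape
    return np.sum(np.abs(H_true - H_est) ** 2) / (M * L)

def noise_cov_approx(sigma2, J, M):
    """Scaled-identity approximation of the projected-noise covariance."""
    return sigma2 * J / M * np.eye(M)
```

With the normalization $\mathbb{E}[\|{\bm{h}}\|^{2}]=M$, an estimator that always returns the zero vector has an NMSE of about one, which gives a useful reference level when reading the NMSE curves.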

We use $\mathcal{H}=\{{\bm{h}}_{t}\}^{T}_{t=1}$ with $T=1.5\cdot 10^{5}$ training samples from the respective channel model to train the GMM and VAE, where we set the number of components to $K=64$ and the latent dimension to $Z=32$, respectively. Further, in the case of the VAE, we allow non-zero values for $\bm{\mu}_{\bm{\theta}}({\bm{z}})$, as we use the circulant approximation for the spatial channel model and the block-Toeplitz property is not perfectly fulfilled for the measurement data due to hardware imperfections. The “s-cov” variants (“sub. s-cov” and “proj. s-cov”) utilize the same training samples. The number of BS antennas is set to $M=64$, cf. Sections II-A and II-B, serving $J=8=M/8$ users, a representative operating point [37, Chap. 1.3.3]. Further, the number of snapshots is set to $N=200$, if not stated otherwise, corresponding to a scenario that allows high channel dispersion and high mobility, e.g., up to $135$ km/h, cf. [37, Chap. 2.1]. The symbols sent during data transmission generally stem from a discrete constellation, e.g., QPSK or $16$-QAM. For this work, we utilize Gaussian symbols with $x_{j}(n)\sim\mathcal{N}_{\mathbb{C}}(0,P_{j})$ and $P_{j}=1/J$ such that $\sum_{j=1}^{J}P_{j}=1$. Using a continuous symbol constellation has a negligible effect on the results of the simulations, as also previously observed in [16].

Fig. 3a and Fig. 3b show the performance of the different channel estimation methods with respect to the SNR for the spatial channel model (cf. Section II-A) and measurement data (cf. Section II-B), respectively. One can see that the semi-blind methods utilizing the CGLMs perform best across the whole SNR range. The projected variant slightly outperforms its subspace counterpart for most SNR values, which is in line with the derivations in Section III. Interestingly, the order of the semi-blind GMM and semi-blind VAE depends on the utilized channel model. In Fig. 3a, the projected GMM and the projected VAE both show the best overall result, whereas in Fig. 3b, the projected GMM outperforms all other estimators. Additionally, we see that in Fig. 3a the subspace VAE outperforms the subspace GMM, whereas in Fig. 3b the order is reversed. This ordering follows that of the standalone versions, where the plain GMM is better than the plain VAE for the measurement data and worse for the spatial channel model. For high SNR values, all semi-blind variants approach each other except the EM and MP methods. In Fig. 3a, the EM and MP drastically improve from $15$ dB to $20$ dB, showing similar performance at $20$ dB as the other semi-blind methods, whereas in Fig. 3b, they show inferior results also for high SNR. The CGLM-based approaches keep a slight advantage even for high SNR values, which can be attributed to the fact that prior information is beneficial even for high SNR. A notable observation is that, in the mid-SNR range, the semi-blind CGLM variants outperform all related estimators by roughly $3$ dB.

For our proposed strategies, the accuracy of the estimated subspace $\mathrm{range}(\hat{{\bm{V}}})$ influences the performance and, hence, the NMSE depends on the number of snapshots $N$, as shown in Fig. 4. We see that for an increasing number of snapshots, the NMSE of our proposed methods decreases. In the case of the spatial channel model (Fig. 4a), the projected GMM and VAE perform best for low numbers of snapshots, although the standalone VAE surpasses all other methods for $N=20$. Additionally, we observe in Fig. 4a that for high $N$, the subspace VAE becomes the best of all considered methods. In Fig. 4b, we observe again that for the measurement data, the semi-blind GMM variants perform best, where the subspace GMM outperforms all other methods for high $N$ and the projected GMM does so for low $N$. Again, for fewer than $30$ snapshots, the semi-blind methods are outperformed by the standalone GMM and VAE due to inaccuracies in estimating the subspace with a low number of payload data symbols. For both utilized channel models, the subspace variant of the superior CGLM surpasses the projected variant for high numbers of snapshots, converging to a lower error level. Thus, in practice, where the channels are generally not uncorrelated Rayleigh fading, there are cases where the subspace CGLM outperforms its projected counterpart.

A critical decrease in performance can be observed for the EM and MP algorithms. Here, the NMSE increases after a certain point when increasing the number of snapshots. Even though the minimum appears at different $N$ , the overall behavior exhibits similarities. This is because both methods optimize the joint ML formulation in (13), where the optimization of the second term becomes dominant for a high number of snapshots. Hence, the impact of the pilot observations relevant to estimating the phase of the channel vanishes.

The dimension of the subspace $\mathrm{range}(\hat{{\bm{V}}})$ directly influences the estimation quality of the proposed methods, as shown in Fig. 5a and Fig. 5b. For example, in the extreme case where the number of users in the system is equal to the number of BS antennas ($J=M$), the solution to (56) becomes $\hat{{\bm{V}}}\hat{{\bm{V}}}^{\mathrm{H}}=\mathbf{I}$ and, hence, as the number of users in the system increases, all semi-blind estimators approach their respective purely pilot-based versions. We restrict our simulations to the interval $J\leq M/4=16$, which is said to be the preferred operating regime in massive MIMO [37, Chap. 1.3.3], and set the number of snapshots to $N=200$. In the case of a single user, all semi-blind variants exhibit similar performance, except for the subspace sample covariance estimator, the subspace VAE, and, in the case of the spatial channel model (Fig. 5a), the subspace GMM. For all other considered numbers of users, the proposed projected CGLM methods outperform all other channel estimators. Additionally, for the spatial channel model in Fig. 5a, the subspace GMM also shows inferior results to the other CGLM-based methods for all numbers of users.

Overall, we can conclude that the proposed semi-blind CGLMs show superior channel estimation performance across all different setups. Depending on the used channel model, either the semi-blind GMMs or the semi-blind VAEs result in a slightly better NMSE, where only in the case of the spatial channel model, the subspace GMM shows slightly worse performance compared to the other proposed methods. Moreover, the projected CGLMs outperform their respective subspace counterparts for most simulated operating points, showing the superiority of the proposed projection method.

VII Conclusion

This work presented a novel semi-blind channel estimation technique based on the class of CGLMs. To this end, we discussed two methods that incorporate subspace knowledge about the channel into the well-known LMMSE estimator. Both methods exploit the estimated subspace derived from the dominant eigenvectors of sample covariance matrices constructed using the received symbols. A theoretical analysis of the methods showed the superior estimation quality of the proposed projection-based estimator for uncorrelated Rayleigh fading channels. Furthermore, we showed how two examples from the class of CGLMs, i.e., the GMM and VAE, can be used to parameterize these estimators. Extensive simulations based on real-world measurement and spatial channel model data demonstrated the superior estimation performance of the proposed methods compared to standard semi-blind channel estimators.

-A MSE of Projected LMMSE

For any linear estimator $\hat{{\bm{h}}}={\bm{W}}{\bm{y}}_{p}$ , the MSE is given as

	$\displaystyle\mathrm{MSE}$	$\displaystyle=\mathbb{E}\left[\|{\bm{h}}-\hat{{\bm{h}}}\|^{2}\right]=\mathbb{E}\left[\mathrm{tr}\left(\left({\bm{h}}-\hat{{\bm{h}}}\right)\left({\bm{h}}^{\mathrm{H}}-\hat{{\bm{h}}}^{\mathrm{H}}\right)\right)\right]$		(63)
		$\displaystyle=\mathbb{E}\left[\mathrm{tr}({\bm{h}}{\bm{h}}^{\mathrm{H}})-2\mathrm{tr}({\bm{h}}{\bm{y}}^{\mathrm{H}}{\bm{W}}^{\mathrm{H}})+\mathrm{tr}({\bm{W}}{\bm{y}}{\bm{y}}^{\mathrm{H}}{\bm{W}}^{\mathrm{H}})\right].$		(64)

For the case of the projected LMMSE the second term in (64) can be rewritten as

	$\displaystyle\mathbb{E}\left[\mathrm{tr}\left({\bm{h}}\tilde{{\bm{y}}}^{\mathrm{H}}{\bm{W}}^{\mathrm{H}}\right)\right]$		(65)
	$\displaystyle\quad\quad=\mathbb{E}\left[\mathrm{tr}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}{\bm{W}}^{\mathrm{H}}\right)\right]$		(66)
	$\displaystyle\quad\quad=\mathbb{E}\left[\mathrm{tr}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}\left({\bm{C}}+\sigma^{2}\frac{J}{M}\mathbf{I}_{M}\right)^{-1}{\bm{C}}\right)\right]$		(67)
	$\displaystyle\quad\quad=\mathrm{tr}\left({\bm{C}}\left({\bm{C}}+\sigma^{2}\frac{J}{M}\mathbf{I}_{M}\right)^{-1}{\bm{C}}\right).$		(68)

Similarly for the third term in (64) we have

	$\displaystyle\mathbb{E}\left[\mathrm{tr}\left({\bm{W}}\tilde{{\bm{y}}}\tilde{{\bm{y}}}^{\mathrm{H}}{\bm{W}}^{\mathrm{H}}\right)\right]$		(69)
	$\displaystyle\quad\quad=\mathbb{E}\left[\mathrm{tr}\left({\bm{W}}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}+\tilde{{\bm{n}}}\tilde{{\bm{n}}}^{\mathrm{H}}\right){\bm{W}}^{\mathrm{H}}\right)\right]$		(70)
	$\displaystyle\quad\quad=\mathbb{E}\bigg[\mathrm{tr}\bigg({\bm{C}}\left({\bm{C}}+\sigma^{2}\frac{J}{M}\mathbf{I}_{M}\right)^{-1}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}+\tilde{{\bm{n}}}\tilde{{\bm{n}}}^{\mathrm{H}}\right)$
	$\displaystyle\quad\quad\quad\times\left({\bm{C}}+\sigma^{2}\frac{J}{M}\mathbf{I}_{M}\right)^{-1}{\bm{C}}\bigg)\bigg]$		(71)
	$\displaystyle\quad\quad=\mathrm{tr}\left({\bm{C}}\left({\bm{C}}+\sigma^{2}\frac{J}{M}\mathbf{I}_{M}\right)^{-1}{\bm{C}}\right),$		(72)

where we assume that $\mathbb{E}\left[\tilde{{\bm{n}}}\tilde{{\bm{n}}}^{\mathrm{H}}\right]=\sigma^{2}\frac{J}{M}\mathbf{I}_{M}$. From this, the overall MSE in (28) follows directly.
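For completeness, the last step can be written out by inserting (68) and (72) into (64). The following is a sketch that assumes $\mathbb{E}[{\bm{h}}{\bm{h}}^{\mathrm{H}}]={\bm{C}}$, so that the first term of (64) equals $\mathrm{tr}({\bm{C}})$; the result should agree with (28).

```latex
\begin{align*}
\mathrm{MSE}
&= \mathrm{tr}(\bm{C})
 - 2\,\mathrm{tr}\left(\bm{C}\left(\bm{C}+\sigma^{2}\tfrac{J}{M}\mathbf{I}_{M}\right)^{-1}\bm{C}\right)
 + \mathrm{tr}\left(\bm{C}\left(\bm{C}+\sigma^{2}\tfrac{J}{M}\mathbf{I}_{M}\right)^{-1}\bm{C}\right) \\
&= \mathrm{tr}(\bm{C})
 - \mathrm{tr}\left(\bm{C}\left(\bm{C}+\sigma^{2}\tfrac{J}{M}\mathbf{I}_{M}\right)^{-1}\bm{C}\right).
\end{align*}
```

For ${\bm{C}}=\mathbf{I}_{M}$, this expression evaluates to $M-M/(1+\sigma^{2}J/M)=\sigma^{2}JM/(M+\sigma^{2}J)$.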

-B MSE of Subspace LMMSE for Rayleigh Fading

In the case of uncorrelated Rayleigh fading the subspace LMMSE filter is given as

\displaystyle{\bm{W}}_{\mathrm{sub}}=\frac{1}{1+\sigma^{2}}{\bm{V}}{\bm{V}}^{\mathrm{H}}.

(73)

Using this filter the second term in (64) can be rewritten as

	$\displaystyle\mathbb{E}\left[\mathrm{tr}\left({\bm{h}}{\bm{y}}^{\mathrm{H}}{\bm{W}}_{\mathrm{sub}}^{\mathrm{H}}\right)\right]$		(74)
	$\displaystyle\quad\quad=\mathbb{E}_{\bm{h}}\left[\mathbb{E}\left[\mathrm{tr}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}{\bm{W}}_{\mathrm{sub}}^{\mathrm{H}}\right)\mid{\bm{h}}\right]\right]$		(75)
	$\displaystyle\quad\quad=\frac{1}{1+\sigma^{2}}\mathbb{E}_{\bm{h}}\left[\mathbb{E}\left[\mathrm{tr}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}{\bm{V}}{\bm{V}}^{\mathrm{H}}\right)\mid{\bm{h}}\right]\right]$		(76)
	$\displaystyle\quad\quad=\frac{1}{1+\sigma^{2}}\mathbb{E}_{\bm{h}}\left[\mathrm{tr}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}\right)\right]$		(77)
	$\displaystyle\quad\quad=\frac{1}{1+\sigma^{2}}M.$		(78)

Similarly for the third term in (64) we have

	$\displaystyle\mathbb{E}\left[\mathrm{tr}\left({\bm{W}}_{\mathrm{sub}}{\bm{y}}{\bm{y}}^{\mathrm{H}}{\bm{W}}_{\mathrm{sub}}^{\mathrm{H}}\right)\right]$		(79)
	$\displaystyle\quad\quad=\mathbb{E}_{\bm{h}}\left[\mathbb{E}\left[\mathrm{tr}\left({\bm{W}}_{\mathrm{sub}}{\bm{h}}{\bm{h}}^{\mathrm{H}}{\bm{W}}_{\mathrm{sub}}^{\mathrm{H}}\right)\mid{\bm{h}}\right]\right]$
	$\displaystyle\quad\quad\quad+\mathbb{E}_{\bm{h}}\left[\mathbb{E}\left[\mathrm{tr}\left({\bm{W}}_{\mathrm{sub}}{\bm{n}}{\bm{n}}^{\mathrm{H}}{\bm{W}}_{\mathrm{sub}}^{\mathrm{H}}\right)\mid{\bm{h}}\right]\right]$		(80)
	$\displaystyle\quad\quad=\frac{1}{(1+\sigma^{2})^{2}}\Big[\mathbb{E}_{\bm{h}}\left[\mathbb{E}\left[\mathrm{tr}\left({\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{h}}{\bm{h}}^{\mathrm{H}}{\bm{V}}{\bm{V}}^{\mathrm{H}}\right)\mid{\bm{h}}\right]\right]$
	$\displaystyle\quad\quad\quad+\mathbb{E}_{\bm{h}}\left[\mathbb{E}\left[\mathrm{tr}\left({\bm{V}}{\bm{V}}^{\mathrm{H}}{\bm{n}}{\bm{n}}^{\mathrm{H}}{\bm{V}}{\bm{V}}^{\mathrm{H}}\right)\mid{\bm{h}}\right]\right]\Big]$		(81)
	$\displaystyle\quad\quad=\frac{1}{(1+\sigma^{2})^{2}}\left[\mathbb{E}_{\bm{h}}\left[\mathrm{tr}\left({\bm{h}}{\bm{h}}^{\mathrm{H}}\right)\right]+\mathbb{E}_{\bm{h}}\left[\sigma^{2}\mathrm{tr}\left({\bm{V}}{\bm{V}}^{\mathrm{H}}\right)\right]\right]$		(82)
	$\displaystyle\quad\quad=\frac{1}{(1+\sigma^{2})^{2}}(M+J\sigma^{2}).$		(83)

The overall MSE of the subspace variant for ${\bm{C}}=\mathbf{I}_{M}$ is then

	$\displaystyle\mathrm{MSE}_{\mathrm{iid}}^{\mathrm{sub}}$	$\displaystyle=M-2\frac{1}{1+\sigma^{2}}M+\frac{1}{(1+\sigma^{2})^{2}}(M+J\sigma^{2})$		(84)
		$\displaystyle=\frac{\sigma^{2}(M\sigma^{2}+J)}{(1+\sigma^{2})^{2}}.$		(85)
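The closed forms for uncorrelated Rayleigh fading are easy to check numerically. The sketch below evaluates (84) and (85) and compares them with the projected LMMSE error for ${\bm{C}}=\mathbf{I}_{M}$. The projected expression is an assumption in the sense that it is obtained here by inserting ${\bm{C}}=\mathbf{I}_{M}$ into the terms (68) and (72) of Appendix A; the function names are hypothetical.

```python
def mse_subspace_iid(M, J, sigma2):
    """Closed form (85) of the subspace LMMSE error for C = I_M."""
    return sigma2 * (M * sigma2 + J) / (1.0 + sigma2) ** 2

def mse_subspace_iid_terms(M, J, sigma2):
    """Sum of the three terms in (84)."""
    g = 1.0 / (1.0 + sigma2)
    return M - 2.0 * g * M + g**2 * (M + J * sigma2)

def mse_projected_iid(M, J, sigma2):
    """tr(C) - tr(C (C + sigma2 * J/M * I)^{-1} C) evaluated at C = I_M."""
    a = sigma2 * J / M
    return M * a / (1.0 + a)

for M, J, sigma2 in [(64, 8, 1.0), (64, 16, 0.1), (32, 32, 2.0)]:
    print(M, J, sigma2, mse_subspace_iid(M, J, sigma2), mse_projected_iid(M, J, sigma2))
```

Under these assumptions, the difference between the two errors works out to $\sigma^{4}(M-J)^{2}/((1+\sigma^{2})^{2}(M+\sigma^{2}J))$, which is non-negative and vanishes only for $J=M$. This is consistent with the statement that the projected variant is superior for uncorrelated Rayleigh fading.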

References

[1] F. Weißer, N. Turan, D. Semmler, and W. Utschick, “Data-aided channel estimation utilizing Gaussian mixture models,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 8886–8890.
[2] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, 2013.
[3] Y. Kabalci, 5G Mobile Communication Systems: Fundamentals, Challenges, and Key Technologies. Singapore: Springer, 2019, pp. 329–359. [Online]. Available: https://doi.org/10.1007/978-981-13-1768-2_10
[4] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, 2018.
[5] Z. Shang, T. Zhang, G. Hu, Y. Cai, and W. Yang, “Secure transmission for NOMA-based cognitive radio networks with imperfect CSI,” IEEE Communications Letters, vol. 25, no. 8, pp. 2517–2521, 2021.
[6] H. Harkat, P. Monteiro, A. Gameiro, F. Guiomar, and H. Farhana Thariq Ahmed, “A survey on MIMO-OFDM systems: Review of recent trends,” Signals, vol. 3, no. 2, pp. 359–395, 2022.
[7] S. C and J. Sandeep, “A review of channel estimation mechanisms in wireless communication networks,” in 2021 5th International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2021, pp. 603–608.
[8] T. L. Marzetta, “How much training is required for multiuser MIMO?” in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, 2006, pp. 359–363.
[9] E. De Carvalho and D. Slock, “Cramer-Rao bounds for semi-blind, blind and training sequence based channel estimation,” in First IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications, 1997, pp. 129–132.
[10] ——, “Asymptotic performance of ML methods for semi-blind channel estimation,” in Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136), vol. 2, 1997, pp. 1624–1628.
[11] A. Medles and D. Slock, “Augmenting the training sequence part in semiblind estimation for MIMO channels,” in The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, vol. 2, 2003, pp. 1825–1829.
[12] J. Ma and L. Ping, “Data-aided channel estimation in large antenna systems,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3111–3124, 2014.
[13] M. Joham, W. Utschick, J. A. Nossek, and M. D. Zoltowski, “Semi-blind channel estimation: a new least-squares approach,” in International Conference on Telecommunications, Cheju Island, Korea, 1999, pp. 416–420.
[14] S. Park, B. Shim, and J. W. Choi, “Iterative channel estimation using virtual pilot signals for MIMO-OFDM systems,” IEEE Transactions on Signal Processing, vol. 63, no. 12, pp. 3032–3045, 2015.
[15] D. Neumann, M. Joham, and W. Utschick, “Channel estimation in massive MIMO systems,” 2015. [Online]. Available: https://arxiv.org/abs/1503.08691
[16] E. Nayebi and B. D. Rao, “Semi-blind channel estimation for multiuser massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 66, no. 2, pp. 540–553, 2018.
[17] Y. Liu, L. Brunel, and J. J. Boutros, “Joint channel estimation and decoding using Gaussian approximation in a factor graph over multipath channel,” in 2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications, 2009, pp. 3164–3168.
[18] S. Wu, L. Kuang, Z. Ni, D. Huang, Q. Guo, and J. Lu, “Message-passing receiver for joint channel estimation and decoding in 3D massive MIMO-OFDM systems,” IEEE Transactions on Wireless Communications, vol. 15, no. 12, pp. 8122–8138, 2016.
[19] A. Mehrotra, S. Srivastava, A. K. Jagannatham, and L. Hanzo, “Data-aided CSI estimation using affine-precoded superimposed pilots in orthogonal time frequency space modulated MIMO systems,” IEEE Transactions on Communications, vol. 71, no. 8, pp. 4482–4498, 2023.
[20] A. Osinsky, A. Ivanov, D. Lakontsev, R. Bychkov, and D. Yarotsky, “Data-aided LS channel estimation in massive MIMO turbo-receiver,” in 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), 2020, pp. 1–5.
[21] M. Liu, M. Crussiere, and J.-F. Helard, “A novel data-aided channel estimation with reduced complexity for TDS-OFDM systems,” IEEE Transactions on Broadcasting, vol. 58, no. 2, pp. 247–260, 2012.
[22] I. Khan, M. Cheffena, and M. M. Hasan, “Data aided channel estimation for MIMO-OFDM wireless systems using reliable carriers,” IEEE Access, vol. 11, pp. 47 836–47 847, 2023.
[23] T.-K. Kim, Y.-S. Jeon, J. Li, N. Tavangaran, and H. V. Poor, “Semi-data-aided channel estimation for MIMO systems via reinforcement learning,” IEEE Transactions on Wireless Communications, vol. 22, no. 7, pp. 4565–4579, 2023.
[24] I. Khan, M. M. Hasan, and M. Cheffena, “A novel low-complexity peak-power-assisted data-aided channel estimation scheme for MIMO-OFDM wireless systems,” 2024. [Online]. Available: https://arxiv.org/abs/2410.05722
[25] N. Zilberstein, A. Swami, and S. Segarra, “Joint channel estimation and data detection in massive MIMO systems based on diffusion models,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 13 291–13 295.
[26] Y. Deng and T. Ohtsuki, “Low-complexity subspace MMSE channel estimation in massive MU-MIMO system,” IEEE Access, vol. 8, pp. 124 371–124 381, 2020.
[27] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1993.
[28] J. Yang, X. Liao, X. Yuan, P. Llull, D. J. Brady, G. Sapiro, and L. Carin, “Compressive sensing by learning a Gaussian mixture model from measurements,” IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 106–119, 2015.
[29] D. Neumann, T. Wiese, and W. Utschick, “Learning the MMSE channel estimator,” IEEE Transactions on Signal Processing, vol. 66, no. 11, pp. 2905–2917, 2018.
[30] M. Koller, B. Fesl, N. Turan, and W. Utschick, “An asymptotically MSE-optimal estimator based on Gaussian mixture models,” IEEE Transactions on Signal Processing, vol. 70, pp. 4109–4123, 2022.
[31] B. Fesl, N. Turan, and W. Utschick, “Low-rank structured MMSE channel estimation with mixtures of factor analyzers,” in 2023 57th Asilomar Conference on Signals, Systems, and Computers, 2023, pp. 375–380.
[32] M. Baur, B. Fesl, and W. Utschick, “Leveraging variational autoencoders for parameterized MMSE estimation,” IEEE Transactions on Signal Processing, vol. 72, pp. 3731–3744, 2024.
[33] N. Turan, B. Fesl, M. Koller, M. Joham, and W. Utschick, “A versatile low-complexity feedback scheme for FDD systems via generative modeling,” IEEE Transactions on Wireless Communications, vol. 23, no. 6, pp. 6251–6265, 2024.
[34] F. Weißer, D. Semmler, N. Turan, and W. Utschick, “Data-aided MU-MIMO channel estimation utilizing Gaussian mixture models,” in ICC 2024 - IEEE International Conference on Communications, 2024, pp. 6684–6689.
[35] 3GPP, “Spatial channel model for multiple input multiple output (MIMO) simulations,” 3rd Generation Partnership Project (3GPP), Tech. Rep. 25.996 v16.0.0, 2020.
[36] N. Turan, B. Fesl, M. Grundei, M. Koller, and W. Utschick, “Evaluation of a Gaussian mixture model-based channel estimator using measurement data,” in 2022 International Symposium on Wireless Communication Systems (ISWCS), 2022, pp. 1–6.
[37] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, no. 3–4, pp. 154–655, 2017.
[38] V. Milman and G. Schechtman, Asymptotic Theory of Finite Dimensional Normed Spaces, ser. Lecture Notes in Mathematics. Springer, 1986, vol. 1200.
[39] B. Böck, M. Baur, N. Turan, D. Semmler, and W. Utschick, “A statistical characterization of wireless channels conditioned on side information,” IEEE Wireless Communications Letters, pp. 1–1, 2024.
[40] T. T. Nguyen, H. D. Nguyen, F. Chamroukhi, and G. J. McLachlan, “Approximation by finite mixtures of continuous density functions that vanish at infinity,” Cogent Mathematics & Statistics, vol. 7, no. 1, p. 1750861, 2020.
[41] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006.
[42] D. P. Kingma and M. Welling, “An introduction to variational autoencoders,” Foundations and Trends® in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019.
[43] M. Baur, B. Böck, N. Turan, and W. Utschick, “Variational autoencoder for channel estimation: Real-world measurement insights,” in 2024 27th International Workshop on Smart Antennas (WSA), 2024, pp. 117–122.
[44] W. Utschick, “Tracking of signal subspace projectors,” IEEE Transactions on Signal Processing, vol. 50, no. 4, pp. 769–778, 2002.
[45] B. Yang, “Projection approximation subspace tracking,” IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 95–107, 1995.