Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces

Xingyu Zhou,  Le Liang,  Jing Zhang,  Chao-Kai Wen,  and Shi Jin,  X. Zhou, L. Liang, J. Zhang, and S. Jin are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). L. Liang is also with the Purple Mountain Laboratories, Nanjing 211111, China.C.-K. Wen is with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan (e-mail: [email protected]).
Abstract

The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing gradient-based MCMC detectors are heuristically designed and thus are theoretically untenable. To bridge this gap, we introduce a novel sampling algorithm tailored for discrete spaces. This algorithm leverages gradients from the underlying continuous spaces for acceleration while maintaining the validity of probabilistic sampling. We prove the convergence of this method and also analyze its convergence rate using both MCMC theory and empirical diagnostics. On this basis, we develop a MIMO detector that precisely samples from the target discrete distribution and generates posterior Bayesian estimates using these samples, whose performance is thereby theoretically guaranteed. Furthermore, our proposed detector is highly parallelizable and scalable to large MIMO dimensions, positioning it as a compelling candidate for next-generation wireless networks. Simulation results show that our detector achieves near-optimal performance, significantly outperforms state-of-the-art baselines, and showcases resilience to various system setups.

Index Terms:
MIMO detection, Markov chain Monte Carlo, gradient descent, Langevin algorithms, discrete sample space.

I Introduction

Over the past two decades, the multiple-input multiple-output (MIMO) technology has played a crucial role in improving spectral efficiency and network throughput of wireless communication systems [1]. To enhance data rates and coverage even further, future generations of wireless networks are expected to witness an unprecedented rise in the number of antennas, leading to extra-large-scale MIMO systems [2, 3]. However, this dramatically increased number of antennas results in the curse of dimensionality in symbol detection—a significant roadblock to realizing the full potential of MIMO.

The optimal maximum a posteriori (MAP) detection involves exhaustive enumeration and is tractable only for the most trivial cases [4]. Sphere decoding (SD) has been introduced to reduce the complexity of MAP detection while approaching near-optimal performance [5, 6]. Nonetheless, the substantial tree searches still entail complexity that exponentially grows with the system dimension. Linear detectors, such as the linear minimum mean square error (MMSE) method, are widely acknowledged for their low complexity. However, they are prone to severe performance degradation in high-order modulation or correlated channels.

Posterior inference problems like MIMO detection generally involve multidimensional function integration or maximization and become challenging as the dimension increases. Recently, stochastic sampling techniques [7], also known as Monte Carlo methods, have shown great potential in addressing the curse of dimensionality in posterior inference. These methods sample from certain probability distributions of interest and generate reliable estimates based on the drawn samples. Among them, the Markov chain Monte Carlo (MCMC) method [8] has been popular and has found extensive applications in various communication signal processing tasks that can be regarded as posterior inferences [9, 10]. In MCMC, statistical inferences are developed by simulating a Markov chain to generate a sequence of samples that converge to the target posterior distribution and performing Monte Carlo summation using the converged samples. This approach allows for the reduction of the exponential complexity associated with the dimension to a polynomial level. In particular, for MIMO detection, the MCMC method is highly acclaimed for its hardware-friendly architecture [11] and attainment of remarkable performance with relatively low complexity [12, 13].

Discrete data is ubiquitous in real-world applications, ranging from genomes in biology and texts in linguistics to bits and symbols in wireless communications. Exact inference on this type of data, such as optimal MIMO detection, requires sampling from discrete distributions, which has long been recognized as more challenging compared to sampling from continuous distributions [14]. Gibbs sampling [15], a classical MCMC method, is well known as a generic solution to this task; however, this method is subject to the sequential update of variables and hence suffers from low efficiency when the dimension is large or the distribution is highly correlated among its variables. As data dimensions scale up, developing efficient sampling algorithms for discrete distributions is urgent.

In recent years, MCMC has been combined with the gradient descent method, formulating a promising machine learning framework for nonconvex optimization [16]. This gradient-based MCMC paradigm was initially developed for continuous spaces and has demonstrated enhanced efficiency than conventional sampling and optimization methods in solving inference problems. Owing to its superiority in continuous spaces, new research trends have revolved around generalizing gradient-based MCMC to discrete spaces. The core concept is to leverage gradients of the underlying continuous function to navigate the sampling toward the target discrete distribution. Based on this idea, gradients have been used to accelerate Gibbs sampling by indicating the variables that should be prioritized for updating [14]. To further improve sampling efficiency, the Langevin algorithm [17, 18], which is an advanced gradient-based MCMC method originally developed for continuous spaces, has been extended to discrete spaces in [19, 20]. Unlike the Gibbs sampler, the Langevin method updates all variables in parallel to enable large movements and utilizes gradients to direct the sampling to high probability regions of the target distribution, thus showcasing orders-of-magnitude improvements in efficiency.

Given that MIMO detection is a typical posterior inference problem within the discrete space of the transmitted symbols, it is intuitively appealing to apply advanced gradient-based MCMC methods that have been developed for discrete spaces. In fact, this application has been extensively explored in previous works such as [21, 22, 23, 24], achieving both exceptional performance and high efficiency. Specifically, Newton’s method [25] was leveraged to accelerate MCMC sampling in [21]. In their proposed detector, MCMC’s exploration was conducted along the preconditioned gradient descent direction in the continuous-relaxed space, followed by a Metropolis-Hastings (MH) correction [29]. Nevertheless, the derivation of the correction step was based on heuristics. In [22], additional improvements were investigated by employing Nesterov’s accelerated gradient method [26] to expedite MCMC sampling. This method avoided the computationally intensive matrix inversion required by Newton’s method and hence enhanced the scalability of the detector. Moreover, an annealed Langevin algorithm was developed in [23] by setting multiple noise levels with decreasing variances to mimic the discrete prior of the transmitted symbols. This approach enabled the computation of gradients and the transformation of the intricate discrete approximation into a simplified continuous approximation. Additionally, the Langevin-based MIMO detector was independently investigated in [24]. However, their proposed detector did not consider the discrete nature of symbols and is limited to a specific modulation scheme.

Despite the rapid development, a notable deficiency of existing gradient-based MCMC detectors [21, 22, 23, 24] is their lack of theoretical guarantees. These schemes rely on heuristic designs due to the challenges posed by exact sampling in discrete spaces [14, 19] and give no convergence guarantees. More importantly, as shown in this paper, these heuristics can undermine the reliability of MCMC by generating samples that deviate from the target distribution for MIMO detection. Such divergence can lead to substantial errors in inference and severe performance degradation. Hence, the development of a gradient-based MCMC method that ensures precise sampling from the target discrete distribution holds great significance.

In this paper, we propose a near-optimal MIMO detector based on gradient-based MCMC that exactly samples in discrete spaces, ensuring the validity of stochastic sampling. The proposed detector offers an efficient solution for computing posterior Bayesian estimates via the generation of important samples and the subsequent Monte Carlo summation. The performance is guaranteed by the convergence of the sampling algorithm and the law of large numbers behind Monte Carlo methods. We perform extensive numerical experiments to verify the effectiveness of our proposed detector. Results show that our method achieves near-optimal performance with a limited number of samples and significantly outperforms state-of-the-art detectors.111The term “near-optimal” in this context refers to the proposed detector’s ability, theoretically guaranteed, to approach optimal detection performance given sufficient computational resources. Importantly, the proposed detector offers significant complexity savings compared to the optimal MAP detector, rendering it a practically viable solution for MIMO systems with a large number of antennas. Moreover, our proposed detector exhibits remarkable robustness to various channel environments and resilience to imperfect channel state information (CSI). Additionally, our detector is scalable to large MIMO dimensions and highly parallelizable, making it a promising solution for extra-large-scale MIMO systems in the next-generation wireless networks. We summarize the contributions of this paper as follows.

  • Gradient-Based MCMC Sampling for Discrete Distributions: We have developed a discrete analog to the Metropolis adjusted Langevin algorithm (MALA) [18] for the target discrete distributions within the context of the MIMO detection problem. This novel algorithm, referred to as DMALA, achieves high-quality sampling from discrete distributions by leveraging gradients from the underlying continuous function for acceleration, while strictly adhering to the systematic steps of MCMC.

  • Convergence Proof and Analysis: We provide a theoretical proof of DMALA’s convergence and analyze its convergence rate. This theoretically guaranteed property distinguishes our proposed algorithm from heuristic sampling algorithms used in existing gradient-based MCMC detectors. Additionally, we offer empirical diagnostics of the convergence (rate) to validate the theoretical findings.

  • Achievement of Near-Optimal Detection: Utilizing DMALA, we propose a MIMO detector that initially samples from the target discrete distribution and then computes Bayesian estimates (soft decisions) using the converged samples. The method employed for computing soft decisions is meticulously designed in alignment with Monte Carlo theory, setting it apart from conventional heuristic approaches. Consequently, the performance of our proposed detector is theoretically underpinned by the convergence of DMALA and the law of large numbers inherent in Monte Carlo methods. This near-optimal performance has been further corroborated through extensive numerical studies.

Notations: Lowercase and uppercase boldface letters denote column vectors and matrices, respectively. 𝐀1superscript𝐀1\mathbf{A}^{-1}bold_A start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT and 𝐀Tsuperscript𝐀𝑇\mathbf{A}^{T}bold_A start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT represent the inverse and transpose of a matrix 𝐀𝐀\mathbf{A}bold_A, respectively. 𝐈Nsubscript𝐈𝑁\mathbf{I}_{N}bold_I start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT represents an N×N𝑁𝑁N\times Nitalic_N × italic_N identity matrix. \mathbb{R}blackboard_R is the set of real numbers. 𝔼[]𝔼delimited-[]\mathbb{E}[\cdot]blackboard_E [ ⋅ ] denotes the expectation operation. 𝒰(a,b)𝒰𝑎𝑏\mathcal{U}(a,b)caligraphic_U ( italic_a , italic_b ) denotes a uniform distribution between [a,b]𝑎𝑏[a,b][ italic_a , italic_b ]. 𝒩(μ,σ2)𝒩𝜇superscript𝜎2\mathcal{N}(\mu,\sigma^{2})caligraphic_N ( italic_μ , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) indicates a real-valued Gaussian distribution with mean μ𝜇\muitalic_μ and variance σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. F\|\cdot\|_{F}∥ ⋅ ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT denotes Frobenius norm, and \|\cdot\|∥ ⋅ ∥ denotes l2subscript𝑙2l_{2}italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT norm. |||\cdot|| ⋅ | represents the cardinality of a set.

II System Model and Preliminaries

This section commences with an introduction to the system model of the MIMO detection problem. It then proceeds to provide an overview of the basic concepts underlying MCMC.

II-A System Model

Consider a MIMO system with Ntsubscript𝑁tN_{\rm t}italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT antennas for transmitting data streams and Nrsubscript𝑁rN_{\rm r}italic_N start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT antennas for receiving. The message bits are encoded and interleaved to generate the transmitted codeword, which is then partitioned into bit vectors. Each bit vector 𝐛{±1}Nb𝐛superscriptplus-or-minus1subscript𝑁b\mathbf{b}\in\{\pm 1\}^{N_{\rm b}}bold_b ∈ { ± 1 } start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_b end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is mapped into quadrature amplitude modulation (QAM) symbols with unit power in average, constituting the transmitted vector. The equivalent real-valued model for the MIMO transmission is given by

𝐲=𝐇𝐱+𝐧,𝐲𝐇𝐱𝐧\mathbf{y}=\mathbf{Hx}+\mathbf{n},bold_y = bold_Hx + bold_n , (1)

where 𝐱𝒜N×1𝐱superscript𝒜𝑁1\mathbf{x}\in\mathcal{A}^{N\times 1}bold_x ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT denotes the equivalent real-valued symbol vector, where N=2Nt𝑁2subscript𝑁tN=2N_{\rm t}italic_N = 2 italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT and 𝒜𝒜\mathcal{A}caligraphic_A is the finite set of real-valued transmitted symbols with a cardinality of |𝒜|=Q𝒜𝑄|\mathcal{A}|=Q| caligraphic_A | = italic_Q. Therefore, we have Nb=Nlog2Qsubscript𝑁b𝑁subscript2𝑄N_{\rm b}=N\log_{2}Qitalic_N start_POSTSUBSCRIPT roman_b end_POSTSUBSCRIPT = italic_N roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_Q.222We assume that all elements of 𝐱𝐱\mathbf{x}bold_x are selected from a common finite set 𝒜𝒜\mathcal{A}caligraphic_A, implying the use of a single modulation scheme. However, extending the proposed method to handle cases where elements of 𝐱𝐱\mathbf{x}bold_x are drawn from different discrete spaces is straightforward. Moreover, in (1), 𝐲M×1𝐲superscript𝑀1\mathbf{y}\in\mathbb{R}^{M\times 1}bold_y ∈ blackboard_R start_POSTSUPERSCRIPT italic_M × 1 end_POSTSUPERSCRIPT is the received real-valued signal with M=2Nr𝑀2subscript𝑁rM=2N_{\rm r}italic_M = 2 italic_N start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT, 𝐇M×N𝐇superscript𝑀𝑁\mathbf{H}\in\mathbb{R}^{M\times N}bold_H ∈ blackboard_R start_POSTSUPERSCRIPT italic_M × italic_N end_POSTSUPERSCRIPT is the real-valued channel matrix, and 𝐧M×1𝐧superscript𝑀1\mathbf{n}\in\mathbb{R}^{M\times 1}bold_n ∈ blackboard_R start_POSTSUPERSCRIPT italic_M × 1 end_POSTSUPERSCRIPT is the noise vector whose elements independently follow 𝒩(0,σ2/2)𝒩0superscript𝜎22\mathcal{N}(0,\sigma^{2}/2)caligraphic_N ( 0 , italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 ), where σ2/2superscript𝜎22\sigma^{2}/2italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 is the noise variance per real element.

Given the observation 𝐲𝐲\mathbf{y}bold_y and assuming that the channel 𝐇𝐇\mathbf{H}bold_H is known, the posterior distribution of 𝐱𝐱\mathbf{x}bold_x is given by the discrete distribution π(𝐱)=p(𝐱|𝐲)p(𝐱)p(𝐲|𝐱)𝜋𝐱𝑝conditional𝐱𝐲proportional-to𝑝𝐱𝑝conditional𝐲𝐱\pi(\mathbf{x})=p(\mathbf{x}|\mathbf{y})\propto p(\mathbf{x})p(\mathbf{y}|% \mathbf{x})italic_π ( bold_x ) = italic_p ( bold_x | bold_y ) ∝ italic_p ( bold_x ) italic_p ( bold_y | bold_x ), where p(𝐱)𝑝𝐱p(\mathbf{x})italic_p ( bold_x ) is the prior distribution, and p(𝐲|𝐱)𝑝conditional𝐲𝐱p(\mathbf{y}|\mathbf{x})italic_p ( bold_y | bold_x ) is the likelihood. When the prior distribution is uniform, the posterior distribution can be further expressed as

π(𝐱)=1Zexp(f(𝐱))n=1N𝕀xn𝒜,𝜋𝐱1𝑍𝑓𝐱superscriptsubscriptproduct𝑛1𝑁subscript𝕀subscript𝑥𝑛𝒜\displaystyle\pi(\mathbf{x})=\frac{1}{Z}\exp\big{(}f(\mathbf{x})\big{)}\prod_{% n=1}^{N}\mathbb{I}_{x_{n}\in\mathcal{A}},italic_π ( bold_x ) = divide start_ARG 1 end_ARG start_ARG italic_Z end_ARG roman_exp ( italic_f ( bold_x ) ) ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT blackboard_I start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT , (2)

where f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x ) is a metric function given by

f(𝐱)𝑓𝐱\displaystyle f(\mathbf{x})italic_f ( bold_x ) =1σ2𝐲𝐇𝐱2,absent1superscript𝜎2superscriptnorm𝐲𝐇𝐱2\displaystyle=-\frac{1}{\sigma^{2}}\|\mathbf{y}-\mathbf{Hx}\|^{2},= - divide start_ARG 1 end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ bold_y - bold_Hx ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , (3)

Z𝑍Zitalic_Z is a normalization constant whose computation involves multidimensional summation and is generally intractable, xnsubscript𝑥𝑛x_{n}italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT denotes the n𝑛nitalic_n-th entry of 𝐱𝐱\mathbf{x}bold_x, and 𝕀xn𝒜subscript𝕀subscript𝑥𝑛𝒜\mathbb{I}_{x_{n}\in\mathcal{A}}blackboard_I start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT is an indicator function that takes the value of one only if xn𝒜subscript𝑥𝑛𝒜x_{n}\in\mathcal{A}italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ caligraphic_A and zero otherwise.

The MIMO detector forwards soft outputs in terms of log-likelihood ratios (LLRs) to the subsequent channel decoder. The optimal MAP detector computes the posterior LLR for the k𝑘kitalic_k-th element bksubscript𝑏𝑘b_{k}italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT of 𝐛𝐛\mathbf{b}bold_b as

Lk=logp(bk=+1|𝐲)p(bk=1|𝐲)=log𝐱𝒜k+N×1exp(f(𝐱))𝐱𝒜kN×1exp(f(𝐱)),subscript𝐿𝑘𝑝subscript𝑏𝑘conditional1𝐲𝑝subscript𝑏𝑘conditional1𝐲subscript𝐱subscriptsuperscript𝒜𝑁1limit-from𝑘𝑓𝐱subscript𝐱subscriptsuperscript𝒜𝑁1limit-from𝑘𝑓𝐱L_{k}=\log\frac{p(b_{k}=+1|\mathbf{y})}{p(b_{k}=-1|\mathbf{y})}=\log\frac{\sum% _{\mathbf{x}\in\mathcal{A}^{N\times 1}_{k+}}\exp\big{(}f(\mathbf{x})\big{)}}{% \sum_{\mathbf{x}\in\mathcal{A}^{N\times 1}_{k-}}\exp\big{(}f(\mathbf{x})\big{)% }},italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_log divide start_ARG italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 | bold_y ) end_ARG start_ARG italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 | bold_y ) end_ARG = roman_log divide start_ARG ∑ start_POSTSUBSCRIPT bold_x ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_x ) ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT bold_x ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k - end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_x ) ) end_ARG , (4)

where 𝒜k+N×1superscriptsubscript𝒜limit-from𝑘𝑁1\mathcal{A}_{k+}^{N\times 1}caligraphic_A start_POSTSUBSCRIPT italic_k + end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT and 𝒜kN×1superscriptsubscript𝒜limit-from𝑘𝑁1\mathcal{A}_{k-}^{N\times 1}caligraphic_A start_POSTSUBSCRIPT italic_k - end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT denote the subsets of 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT for the transmitted vector 𝐱𝐱\mathbf{x}bold_x mapped from 𝐛𝐛\mathbf{b}bold_b, where bksubscript𝑏𝑘b_{k}italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT corresponds to +11+1+ 1 and 11-1- 1, respectively. This computation involves evaluating two a posteriori probabilities (APPs) that both require iteration over all possible values of 𝐛k=[b1,,bk1,bk+1,,bNb]Tsubscript𝐛𝑘superscriptsubscript𝑏1subscript𝑏𝑘1subscript𝑏𝑘1subscript𝑏subscript𝑁b𝑇\mathbf{b}_{-k}=[b_{1},\ldots,b_{k-1},b_{k+1},\ldots,b_{N_{\rm b}}]^{T}bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT = [ italic_b start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_b start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT , … , italic_b start_POSTSUBSCRIPT italic_N start_POSTSUBSCRIPT roman_b end_POSTSUBSCRIPT end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT and the summation of 2Nb1superscript2subscript𝑁b12^{N_{\rm b}-1}2 start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT roman_b end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT terms, causing prohibitive complexity for large N𝑁Nitalic_N and/or Q𝑄Qitalic_Q. Therefore, a low-complexity alternative should be developed.

II-B Basics of MCMC

We begin with the basics of Monte Carlo methods. Let X𝑋Xitalic_X denote a discrete random variable, possibly multidimensional, with a distribution g(X)𝑔𝑋g(X)italic_g ( italic_X ). Consider evaluating the expectation of some function h(X)𝑋h(X)italic_h ( italic_X ) with respect to g(X)𝑔𝑋g(X)italic_g ( italic_X ), i.e.,

𝔼g[h(X)]=x𝒳h(x)g(x),subscript𝔼𝑔delimited-[]𝑋subscript𝑥𝒳𝑥𝑔𝑥\mathbb{E}_{g}[h(X)]=\sum_{x\in\mathcal{X}}h({x})g(x),blackboard_E start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT [ italic_h ( italic_X ) ] = ∑ start_POSTSUBSCRIPT italic_x ∈ caligraphic_X end_POSTSUBSCRIPT italic_h ( italic_x ) italic_g ( italic_x ) , (5)

where 𝒳𝒳\mathcal{X}caligraphic_X is the domain of X𝑋Xitalic_X. Suppose that a set of S𝑆Sitalic_S samples {x[s]}s=1Ssuperscriptsubscriptsuperscript𝑥delimited-[]𝑠𝑠1𝑆\{x^{[s]}\}_{s=1}^{S}{ italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT with x[s]g(X)similar-tosuperscript𝑥delimited-[]𝑠𝑔𝑋x^{[s]}\sim g(X)italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ∼ italic_g ( italic_X ) is available, then the empirical average

h¯=1Ss=1Sh(x[s])¯1𝑆superscriptsubscript𝑠1𝑆superscript𝑥delimited-[]𝑠\bar{h}=\frac{1}{S}\sum_{s=1}^{S}h\left({x}^{[s]}\right)over¯ start_ARG italic_h end_ARG = divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_h ( italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) (6)

is an unbiased estimate of the expectation 𝔼g[h(X)]subscript𝔼𝑔delimited-[]𝑋\mathbb{E}_{g}[h(X)]blackboard_E start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT [ italic_h ( italic_X ) ], i.e., h¯ a.s. 𝔼g[h(X)]superscript a.s. ¯subscript𝔼𝑔delimited-[]𝑋\bar{h}\stackrel{{\scriptstyle\text{ a.s. }}}{{\longrightarrow}}\mathbb{E}_{g}% [h(X)]over¯ start_ARG italic_h end_ARG start_RELOP SUPERSCRIPTOP start_ARG ⟶ end_ARG start_ARG a.s. end_ARG end_RELOP blackboard_E start_POSTSUBSCRIPT italic_g end_POSTSUBSCRIPT [ italic_h ( italic_X ) ] as S𝑆S\to\inftyitalic_S → ∞. A notable feature of this approach is that the required number of samples S𝑆Sitalic_S to achieve acceptable accuracy is weakly dependent on the dimension of X𝑋Xitalic_X [12]. Therefore, the exponential complexity that is often encountered when computing the summation in (5) can be mitigated.

Importance sampling (IS) is a classical Monte Carlo method that has the potential to further reduce the variance of h¯¯\bar{h}over¯ start_ARG italic_h end_ARG [12]. IS introduces an auxiliary distribution ga(X)subscript𝑔a𝑋g_{\rm a}(X)italic_g start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_X ) on 𝒳𝒳\mathcal{X}caligraphic_X, which has the same support as g(X)𝑔𝑋g(X)italic_g ( italic_X ), and approximates the expectation by

1Ss=1Sg(x[s])ga(x[s])h(x[s]),x[s]ga(X).similar-to1𝑆superscriptsubscript𝑠1𝑆𝑔superscript𝑥delimited-[]𝑠subscript𝑔asuperscript𝑥delimited-[]𝑠superscript𝑥delimited-[]𝑠superscript𝑥delimited-[]𝑠subscript𝑔a𝑋\frac{1}{S}\sum_{s=1}^{S}\frac{g(x^{[s]})}{g_{\rm a}(x^{[s]})}h\left(x^{[s]}% \right),\quad x^{[s]}\sim g_{\rm a}(X).divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT divide start_ARG italic_g ( italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_g start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG italic_h ( italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) , italic_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ∼ italic_g start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_X ) . (7)

It turns out that by a wise selection of ga(X)subscript𝑔a𝑋g_{\rm a}(X)italic_g start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_X ), an accurate approximation can be obtained using fewer samples than the scheme in (6) [7, 27].

The above Monte Carlo methods require samples from a target distribution. MCMC is a popular and generic method for realizing this aim, effective for a wide range of distributions, and scales well with the dimension of the space [7]. Specifically, the MCMC method simulates a Markov chain x(1),x(2),,x(t),superscript𝑥1superscript𝑥2superscript𝑥𝑡x^{(1)},x^{(2)},\ldots,x^{(t)},\ldotsitalic_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ( 2 ) end_POSTSUPERSCRIPT , … , italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , … that is regulated by a transition kernel P(|)P(\cdot|\cdot)italic_P ( ⋅ | ⋅ ), where t𝑡titalic_t is the time step index. Each state in the chain corresponds to a sample, and given the current state x(t)superscript𝑥𝑡x^{(t)}italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT, the new state is generated as x(t+1)P(|x(t))x^{(t+1)}\sim P(\cdot|x^{(t)})italic_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT ∼ italic_P ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ). The transition kernel is designed to ensure that the chain’s stationary distribution coincides with the target distribution, e.g., g𝑔gitalic_g or gasubscript𝑔ag_{\rm a}italic_g start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT. In this manner, the samples {x(t)}superscript𝑥𝑡\{x^{(t)}\}{ italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT } asymptotically converge to the target distribution for large t𝑡titalic_t. These converged samples can then be used for the Monte Carlo summation in (6) or (7). As the most popular MCMC-type scheme, the MH algorithm first samples a simple proposal distribution xq(|x(t))x^{\prime}\sim q(\cdot|x^{(t)})italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∼ italic_q ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ). To ensure convergence to the target distribution, such as g=π𝑔𝜋g=\piitalic_g = italic_π without loss of generality, the proposal xsuperscript𝑥x^{\prime}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is then accepted as the new state x(t+1)superscript𝑥𝑡1x^{(t+1)}italic_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT with a probability [29]

min{1,π(x)q(x(t)|x)π(x(t))q(x|x(t))}.1𝜋superscript𝑥𝑞conditionalsuperscript𝑥𝑡superscript𝑥𝜋superscript𝑥𝑡𝑞conditionalsuperscript𝑥superscript𝑥𝑡\min\left\{1,\frac{\pi({x}^{\prime})q({x}^{(t)}|{x}^{\prime})}{\pi({x}^{(t)})q% ({x}^{\prime}|{x}^{(t)})}\right\}.roman_min { 1 , divide start_ARG italic_π ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_q ( italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT | italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_π ( italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) italic_q ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) end_ARG } . (8)

Otherwise, the current state is retained as the new state, i.e., x(t+1)=x(t)superscript𝑥𝑡1superscript𝑥𝑡x^{(t+1)}=x^{(t)}italic_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT. With this criterion, the corresponding Markov chain admits π𝜋\piitalic_π as the stationary distribution [9]. Meanwhile, this algorithm eliminates the need to evaluate the normalization constant of π𝜋\piitalic_π since π𝜋\piitalic_π appears as a ratio in (8).

III DMALA-Based MIMO Detection

In this section, we derive the DMALA-based MIMO detector. Initially, we develop a highly efficient gradient-based MCMC sampling algorithm tailored to discrete spaces. Subsequently, we establish the convergence of the proposed sampling algorithm and substantiate our claims through empirical verification. Then, we discuss the utilization of the samples for soft decisions, specifically in terms of LLR computation, and examine the computational complexity of our proposed detector. Overall, the proposed detector achieves high accuracy in approximating the exact posterior Bayesian estimate in (4), while mitigating the prohibitive complexity.

III-A Gradient-Based MCMC Sampling in Discrete Spaces

For ease of exposition, we target the posterior distribution π(𝐱)𝜋𝐱\pi(\mathbf{x})italic_π ( bold_x ) in (2) for sample generation. It is noteworthy that this probability mass function can be regarded as a restriction of a continuous distribution defined over the domain N×1superscript𝑁1\mathbb{R}^{N\times 1}blackboard_R start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT to the discrete subset 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT. Therefore, gradients from the logarithm of the underlying continuous distribution, i.e., f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x ) in (3), are informative for the sampling of the discrete distribution.333The proposed algorithm and the subsequent theoretical analysis are not limited to the distribution in (2) and can be generalized to different f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x ). Moreover, we consider gradients from the log-probability to simplify calculation.

We start with introducing the Langevin algorithm [17, 18], which is a powerful gradient-based MCMC method in continuous spaces. Initialized with 𝐱(1)N×1superscript𝐱1superscript𝑁1\mathbf{x}^{(1)}\in\mathbb{R}^{N\times 1}bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, each sampling iteration of this algorithm generates a proposal vector [17, 18, 28]

𝐱=𝐱(t)+α2f(𝐱(t))+α𝐰(t),superscript𝐱superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡𝛼superscript𝐰𝑡\mathbf{x}^{\prime}=\mathbf{x}^{(t)}+\frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)}% )+\sqrt{\alpha}\mathbf{w}^{(t)},bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT + divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) + square-root start_ARG italic_α end_ARG bold_w start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , (9)

where t𝑡titalic_t is the sampling iteration index (equivalent to the time step index of the underlying Markov chain), α>0𝛼0\alpha>0italic_α > 0 is the step size, and 𝐰(t)superscript𝐰𝑡\mathbf{w}^{(t)}bold_w start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT is a random perturbation that follows 𝒩(𝟎,𝐈N)𝒩0subscript𝐈𝑁\mathcal{N}(\mathbf{0},\mathbf{I}_{N})caligraphic_N ( bold_0 , bold_I start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ), where 𝟎0\mathbf{0}bold_0 is a zero vector. f𝑓\nabla f∇ italic_f is the gradient of f(𝐱)𝑓𝐱f(\mathbf{x})italic_f ( bold_x ) given by

f(𝐱)=2σ2𝐇T(𝐲𝐇𝐱).𝑓𝐱2superscript𝜎2superscript𝐇𝑇𝐲𝐇𝐱\nabla f(\mathbf{x})=\frac{2}{\sigma^{2}}\mathbf{H}^{T}(\mathbf{y}-\mathbf{Hx}).∇ italic_f ( bold_x ) = divide start_ARG 2 end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG bold_H start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ( bold_y - bold_Hx ) . (10)

This gradient facilitates efficient exploration of high probability regions of the sample space. It should be noted that the update rule in (9) can be viewed as drawing the proposal vector from the Gaussian distribution

𝒩(𝐱(t)+α2f(𝐱(t)),α𝐈N).𝒩superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡𝛼subscript𝐈𝑁\mathcal{N}\left(\mathbf{x}^{(t)}+\frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)}),% \alpha\mathbf{I}_{N}\right).caligraphic_N ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT + divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) , italic_α bold_I start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) . (11)

Based on (9) and considering the variable domain 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, we derive the discrete proposal function

q𝑞\displaystyle qitalic_q (𝐱|𝐱(t))conditionalsuperscript𝐱superscript𝐱𝑡\displaystyle(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT )
=exp(12α𝐱𝐱(t)α2f(𝐱(t))2)Z𝒜(𝐱(t))n=1N𝕀xn𝒜,absent12𝛼superscriptnormsuperscript𝐱superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡2subscript𝑍𝒜superscript𝐱𝑡superscriptsubscriptproduct𝑛1𝑁subscript𝕀superscriptsubscript𝑥𝑛𝒜\displaystyle=\frac{\exp\left(-\frac{1}{2\alpha}\|\mathbf{x}^{\prime}-\mathbf{% x}^{(t)}-\frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)})\|^{2}\right)}{Z_{\mathcal{% A}}(\mathbf{x}^{(t)})}\prod_{n=1}^{N}\mathbb{I}_{x_{n}^{\prime}\in\mathcal{A}},= divide start_ARG roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG ∥ bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_Z start_POSTSUBSCRIPT caligraphic_A end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) end_ARG ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT blackboard_I start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT , (12)

where xnsuperscriptsubscript𝑥𝑛x_{n}^{\prime}italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is the n𝑛nitalic_n-th element of 𝐱superscript𝐱\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, and the normalization constant Z𝒜(𝐱(t))subscript𝑍𝒜superscript𝐱𝑡Z_{\mathcal{A}}(\mathbf{x}^{(t)})italic_Z start_POSTSUBSCRIPT caligraphic_A end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) is given by

Z𝒜(𝐱(t))=𝐱𝒜N×1exp(12α𝐱𝐱(t)α2f(𝐱(t))2),subscript𝑍𝒜superscript𝐱𝑡subscriptsuperscript𝐱superscript𝒜𝑁112𝛼superscriptnormsuperscript𝐱superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡2Z_{\mathcal{A}}(\mathbf{x}^{(t)})=\sum_{\mathbf{x}^{\prime}\in\mathcal{A}^{N% \times 1}}\exp\Big{(}-\frac{1}{2\alpha}\|\mathbf{x}^{\prime}-\mathbf{x}^{(t)}-% \frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)})\|^{2}\Big{)},italic_Z start_POSTSUBSCRIPT caligraphic_A end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) = ∑ start_POSTSUBSCRIPT bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG ∥ bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (13)

whose computation is generally intractable since it requires traversal over the full space of size QNsuperscript𝑄𝑁Q^{N}italic_Q start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. However, a distinct feature of the proposal in (12) is that it enjoys an elementwise factorization as [19]:

q(𝐱|𝐱(t))=n=1Nqn(xn|xn(t)),𝑞conditionalsuperscript𝐱superscript𝐱𝑡superscriptsubscriptproduct𝑛1𝑁subscript𝑞𝑛conditionalsubscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛q(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})=\prod_{n=1}^{N}q_{n}(x^{\prime}_{n}|x^% {(t)}_{n}),italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) = ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , (14)

where qn(xn|xn(t))subscript𝑞𝑛conditionalsubscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛q_{n}(x^{\prime}_{n}|x^{(t)}_{n})italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) is a categorical distribution given by

qn(xn|xn(t))subscript𝑞𝑛conditionalsubscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛\displaystyle q_{n}(x^{\prime}_{n}|x^{(t)}_{n})italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) =ς(μ(xn))𝕀xn𝒜,absent𝜍𝜇subscriptsuperscript𝑥𝑛subscript𝕀superscriptsubscript𝑥𝑛𝒜\displaystyle=\varsigma\big{(}\mu(x^{\prime}_{n})\big{)}\mathbb{I}_{x_{n}^{% \prime}\in\mathcal{A}},= italic_ς ( italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) blackboard_I start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT , (15)

where μ()𝜇\mu(\cdot)italic_μ ( ⋅ ) denotes one term in the factorization of the l2subscript𝑙2l_{2}italic_l start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT norm in (12), and ς()𝜍\varsigma(\cdot)italic_ς ( ⋅ ) is a softmax function. These two functions are given by

μ(xn)𝜇subscriptsuperscript𝑥𝑛\displaystyle\mu(x^{\prime}_{n})italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) =12[f(𝐱(t))]n(xnxn(t))(xnxn(t))22αabsent12subscriptdelimited-[]𝑓superscript𝐱𝑡𝑛subscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛superscriptsubscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛22𝛼\displaystyle=\frac{1}{2}\left[\nabla f(\mathbf{x}^{(t)})\right]_{n}(x^{\prime% }_{n}-x^{(t)}_{n})-\frac{(x^{\prime}_{n}-x^{(t)}_{n})^{2}}{2\alpha}= divide start_ARG 1 end_ARG start_ARG 2 end_ARG [ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) - divide start_ARG ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_α end_ARG (16)

and

ς(μ(xn))=exp(μ(xn))xn𝒜exp(μ(xn)),𝜍𝜇subscriptsuperscript𝑥𝑛𝜇subscriptsuperscript𝑥𝑛subscriptsuperscriptsubscript𝑥𝑛𝒜𝜇subscriptsuperscript𝑥𝑛\varsigma\big{(}\mu(x^{\prime}_{n})\big{)}=\frac{\exp\big{(}\mu(x^{\prime}_{n}% )\big{)}}{\sum_{x_{n}^{\prime}\in\mathcal{A}}\exp\big{(}\mu(x^{\prime}_{n})% \big{)}},italic_ς ( italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) = divide start_ARG roman_exp ( italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT roman_exp ( italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) ) end_ARG , (17)

where [f(𝐱(t))]nsubscriptdelimited-[]𝑓superscript𝐱𝑡𝑛\left[\nabla f(\mathbf{x}^{(t)})\right]_{n}[ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is the n𝑛nitalic_n-th element of the gradient vector f(𝐱(t))𝑓superscript𝐱𝑡\nabla f(\mathbf{x}^{(t)})∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ). This factorization enables the parallel update of each element in 𝐱(t)superscript𝐱𝑡\mathbf{x}^{(t)}bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT, i.e., xnqn(|xn(t)),n=1,,N{x^{\prime}_{n}\sim q_{n}(\cdot|x^{(t)}_{n})},n=1,\ldots,Nitalic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∼ italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , italic_n = 1 , … , italic_N, after gradient computation, whose cost is only 𝒪(N2)𝒪superscript𝑁2\mathcal{O}(N^{2})caligraphic_O ( italic_N start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ). Therefore, the overall computational cost of constructing the proposal in (15) and parallel updating scales polynomially, rather than exponentially, with N𝑁Nitalic_N.

The update rule in (9) originates from discretizing the stochastic differential equation of Langevin diffusion [17]. Due to discretization errors, the unadjusted Langevin algorithm, i.e., directly letting 𝐱(t+1)=𝐱superscript𝐱𝑡1superscript𝐱\mathbf{x}^{(t+1)}=\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, typically suffers from asymptotic bias towards the target distribution [18, 19, 28]. To deal with this issue, we integrate the MH adjustment with the discrete Langevin proposal (15) to ensure the reversibility of the Markov chain {𝐱(t)}superscript𝐱𝑡\{\mathbf{x}^{(t)}\}{ bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT } and the convergence to the target distribution [29], resulting in DMALA. Specifically, after generating 𝐱superscript𝐱\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT using the proposal distribution in (15), the MH adjustment introduced in Sec. II-B is performed to accept 𝐱superscript𝐱\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT with a probability given by

A(𝐱|𝐱(t))𝐴conditionalsuperscript𝐱superscript𝐱𝑡\displaystyle A(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) =min{1,exp(f(𝐱)f(𝐱(t)))q(𝐱(t)|𝐱)q(𝐱|𝐱(t))},absent1𝑓superscript𝐱𝑓superscript𝐱𝑡𝑞conditionalsuperscript𝐱𝑡superscript𝐱𝑞conditionalsuperscript𝐱superscript𝐱𝑡\displaystyle=\min\left\{1,\exp\big{(}f(\mathbf{x}^{\prime})-f(\mathbf{x}^{(t)% })\big{)}\frac{q(\mathbf{x}^{(t)}|\mathbf{x}^{\prime})}{q(\mathbf{x}^{\prime}|% \mathbf{x}^{(t)})}\right\},= roman_min { 1 , roman_exp ( italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) - italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ) divide start_ARG italic_q ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) end_ARG } , (18)

where q(𝐱(t)|𝐱)𝑞conditionalsuperscript𝐱𝑡superscript𝐱q(\mathbf{x}^{(t)}|\mathbf{x}^{\prime})italic_q ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) is the reverse proposal calculated similarly as the forward proposal q(𝐱|𝐱(t))𝑞conditionalsuperscript𝐱superscript𝐱𝑡q(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ). This equation is derived by substituting the target distribution π𝜋\piitalic_π (2) and the discrete proposal q𝑞qitalic_q (14) into (8). Note that the indicator 𝕀𝕀\mathbb{I}blackboard_I is omitted for simplicity.

Algorithm 1 DMALA Sampler
0:  𝐲𝐲{\bf{y}}bold_y, 𝐇𝐇{\bf{H}}bold_H, σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, α𝛼\alphaitalic_α.
1:  Initialize: 𝐱(1)superscript𝐱1\mathbf{x}^{(1)}bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT, f(𝐱(1))𝑓superscript𝐱1f(\mathbf{x}^{(1)})italic_f ( bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ), f(𝐱(1))𝑓superscript𝐱1\nabla f(\mathbf{x}^{(1)})∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ).
2:  Construct the proposal qn(|xn(1)),n=1,,Nq_{n}(\cdot|x_{n}^{(1)}),n=1,\ldots,Nitalic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT ) , italic_n = 1 , … , italic_N in (15).
3:  for t=1𝑡1t=1italic_t = 1 to T1𝑇1T-1italic_T - 1 do
4:   Sample xnqn(|xn(t))x^{\prime}_{n}\sim q_{n}(\cdot|x^{(t)}_{n})italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∼ italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) for n=1𝑛1n=1italic_n = 1 to N𝑁Nitalic_N in parallel.
5:   Compute q(𝐱|𝐱(t))=n=1Nqn(xn|xn(t))𝑞conditionalsuperscript𝐱superscript𝐱𝑡superscriptsubscriptproduct𝑛1𝑁subscript𝑞𝑛conditionalsubscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛q(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})=\prod_{n=1}^{N}q_{n}(x^{\prime}_{n}|x^% {(t)}_{n})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) = ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ).
6:   Compute f(𝐱)𝑓superscript𝐱\nabla f(\mathbf{x}^{\prime})∇ italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) via (10).
7:   Construct the proposal qn(|xn),n=1,,Nq_{n}(\cdot|x^{\prime}_{n}),n=1,\ldots,Nitalic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , italic_n = 1 , … , italic_N in (15) and compute q(𝐱(t)|𝐱)=n=1Nqn(xn(t)|xn)𝑞conditionalsuperscript𝐱𝑡superscript𝐱superscriptsubscriptproduct𝑛1𝑁subscript𝑞𝑛conditionalsubscriptsuperscript𝑥𝑡𝑛subscriptsuperscript𝑥𝑛q(\mathbf{x}^{(t)}|\mathbf{x}^{\prime})=\prod_{n=1}^{N}q_{n}(x^{(t)}_{n}|x^{% \prime}_{n})italic_q ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ).
8:   Compute f(𝐱)𝑓superscript𝐱f(\mathbf{x}^{\prime})italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) via (3).
9:   Compute A(𝐱|𝐱(t))𝐴conditionalsuperscript𝐱superscript𝐱𝑡A(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) via (18) and sample u𝒰(0,1)similar-to𝑢𝒰01u\sim\mathcal{U}(0,1)italic_u ∼ caligraphic_U ( 0 , 1 ).
10:   if A(𝐱|𝐱(t))>u𝐴conditionalsuperscript𝐱superscript𝐱𝑡𝑢A(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})>uitalic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) > italic_u then
11:     𝐱(t+1)=𝐱superscript𝐱𝑡1superscript𝐱\mathbf{x}^{(t+1)}=\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, f(𝐱(t+1))=f(𝐱)𝑓superscript𝐱𝑡1𝑓superscript𝐱f(\mathbf{x}^{(t+1)})=f(\mathbf{x}^{\prime})italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT ) = italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ), and qn(|xn(t+1))=qn(|xn),n=1,,Nq_{n}(\cdot|x^{(t+1)}_{n})=q_{n}(\cdot|x^{\prime}_{n}),n=1,\ldots,Nitalic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , italic_n = 1 , … , italic_N.
12:   else
13:     𝐱(t+1)=𝐱(t)superscript𝐱𝑡1superscript𝐱𝑡{\mathbf{x}^{(t+1)}=\mathbf{x}^{(t)}}bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT, f(𝐱(t+1))=f(𝐱(t))𝑓superscript𝐱𝑡1𝑓superscript𝐱𝑡{f(\mathbf{x}^{(t+1)})=f(\mathbf{x}^{(t)})}italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT ) = italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ), and qn(|xn(t+1))=qn(|xn(t)),n=1,,N{q_{n}(\cdot|x^{(t+1)}_{n})}=q_{n}(\cdot|x^{(t)}_{n}),n=1,\ldots,Nitalic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = italic_q start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( ⋅ | italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) , italic_n = 1 , … , italic_N.
14:   end if
15:  end for
15:  Samples {𝐱(t)}t=1Tsuperscriptsubscriptsuperscript𝐱𝑡𝑡1𝑇\{\mathbf{x}^{(t)}\}_{t=1}^{T}{ bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT.

The sampler using DMALA is outlined in Algorithm 1. Parallel samplers can be employed for sample generation, facilitating efficient utilization of computational resources while reducing correlation among the generated samples. Utilizing these samples, hard decisions are determined by selecting the sample with the smallest residual norm 𝐲𝐇𝐱norm𝐲𝐇𝐱\|\mathbf{y}-\mathbf{Hx}\|∥ bold_y - bold_Hx ∥, whereas soft decisions are inferred using the Monte Carlo methods discussed in Sec. II-B. In this paper, we focus on the latter approach, with the developed method detailed in Sec. III-C.

Refer to caption
(a) Existing methods
Refer to caption
(b) DMALA
Figure 1: Visual comparison between existing gradient-based MCMC methods [21, 22] and the proposed DMALA for the 16-QAM lattice. The black cross represents the current state of the Markov chain. In (a), the candidate sample is generated via a sequential process of gradient descent, random walk, and QAM mapping. In (b), the left subplot shows the underlying continuous distribution of the target discrete distribution, with colors indicating probability levels. Gradients from the logarithm of this continuous distribution are employed to construct the proposal distribution, as shown by color dots in the right subplot.
Remark 1

The target distribution may demonstrate strong second-order correlations across dimensions. In such cases, using naive gradients in (10) leads to slow convergence [21, 19, 20]. To accelerate convergence, we explore a preconditioned variant of DMALA. Specifically, we precondition the naive gradients using the inverse of a damped Hessian, given by

𝐌=(𝐇T𝐇+γ𝐈N)1,𝐌superscriptsuperscript𝐇𝑇𝐇𝛾subscript𝐈𝑁1\mathbf{M}=(\mathbf{H}^{T}\mathbf{H}+\gamma\mathbf{I}_{N})^{-1},bold_M = ( bold_H start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_H + italic_γ bold_I start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , (19)

where γ>0𝛾0\gamma>0italic_γ > 0 is a Tikhonov damping parameter. This preconditioner 𝐌𝐌\mathbf{M}bold_M utilizes the second-order information from the target distribution to expedite the gradient descent process. With this preconditioner, the Langevin algorithm (in the continuous space) proceeds as

𝐱=𝐱(t)+α2𝐌f(𝐱(t))+αβ𝐰(t).superscript𝐱superscript𝐱𝑡𝛼2𝐌𝑓superscript𝐱𝑡𝛼𝛽superscript𝐰𝑡\mathbf{x}^{\prime}=\mathbf{x}^{(t)}+\frac{\alpha}{2}\mathbf{M}\nabla f(% \mathbf{x}^{(t)})+\sqrt{\alpha\beta}\mathbf{w}^{(t)}.bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT + divide start_ARG italic_α end_ARG start_ARG 2 end_ARG bold_M ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) + square-root start_ARG italic_α italic_β end_ARG bold_w start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT . (20)

In this variant, we also scale the random perturbation with an additional parameter β>0𝛽0\beta>0italic_β > 0, which allows for adjusting the gradient to perturbation intensity ratio. Correspondingly, in the preconditioned DMALA, the proposal in (15) is adapted by substituting μ(xn)𝜇subscriptsuperscript𝑥𝑛\mu(x^{\prime}_{n})italic_μ ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) with ν(xn)𝜈subscriptsuperscript𝑥𝑛\nu(x^{\prime}_{n})italic_ν ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ), represented by

ν(xn)=12β[𝐌f(𝐱(t))]n(xnxn(t))(xnxn(t))22αβ.𝜈subscriptsuperscript𝑥𝑛12𝛽subscriptdelimited-[]𝐌𝑓superscript𝐱𝑡𝑛subscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛superscriptsubscriptsuperscript𝑥𝑛subscriptsuperscript𝑥𝑡𝑛22𝛼𝛽\nu(x^{\prime}_{n})=\frac{1}{2\beta}\left[\mathbf{M}\nabla f(\mathbf{x}^{(t)})% \right]_{n}(x^{\prime}_{n}-x^{(t)}_{n})-\frac{(x^{\prime}_{n}-x^{(t)}_{n})^{2}% }{2\alpha\beta}.italic_ν ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) = divide start_ARG 1 end_ARG start_ARG 2 italic_β end_ARG [ bold_M ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) - divide start_ARG ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_α italic_β end_ARG . (21)
Remark 2

We present a visual comparison between existing gradient-based MCMC methods for MIMO detection [21, 22] and the proposed DMALA, as illustrated in Fig. 1. Fig. 1(a) depicts how existing methods relax the problem to the continuous space, where they generate a candidate sample through a sequence of steps: gradient descent, random walk (by adding a random Gaussian perturbation), and QAM mapping. Regrettably, the QAM mapping, which discretizes the continuous update to its nearest lattice point, is deterministic and lacks an associated proposal probability. This absence makes the exact MH correction unattainable, leading, as shown later, to inexact sampling and significant inference errors. Conversely, as depicted in Fig. 1(b), DMALA constructs the proposal with guidance from gradients of the underlying continuous function, allowing for an exact MH correction. This key differentiation ensures the convergence of our method, a point elaborated upon in the following subsection.

III-B Convergence of the DMALA Sampler

A distinct feature of the DMALA sampler, compared to existing gradient-based MCMC methods [21, 22], is its guaranteed (asymptotic) convergence to the target distribution. To begin with, we analyze the statistical properties of the Markov chain induced by DMALA. For two states 𝐱,𝐱𝒜N×1𝐱superscript𝐱superscript𝒜𝑁1\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1}bold_x , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, the transition probability P(𝐱|𝐱)𝑃conditionalsuperscript𝐱𝐱P(\mathbf{x}^{\prime}|\mathbf{x})italic_P ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) of this chain is given by

P(𝐱|𝐱)𝑃conditionalsuperscript𝐱𝐱\displaystyle P(\mathbf{x}^{\prime}|\mathbf{x})italic_P ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x )
={q(𝐱|𝐱)A(𝐱|𝐱), if 𝐱𝐱,q(𝐱|𝐱)+𝐳𝐱q(𝐳|𝐱)(1A(𝐳|𝐱)), otherwise,absentcases𝑞conditionalsuperscript𝐱𝐱𝐴conditionalsuperscript𝐱𝐱 if superscript𝐱𝐱𝑞conditional𝐱𝐱subscript𝐳𝐱𝑞conditional𝐳𝐱1𝐴conditional𝐳𝐱 otherwise\displaystyle=\left\{\begin{array}[]{ll}q(\mathbf{x}^{\prime}|\mathbf{x})A(% \mathbf{x}^{\prime}|\mathbf{x}),&\text{ if }\mathbf{x}^{\prime}\neq\mathbf{x},% \\ q(\mathbf{x}|\mathbf{x})+\sum_{\mathbf{z}\neq\mathbf{x}}q(\mathbf{z}|\mathbf{x% })\big{(}1-A(\mathbf{z}|\mathbf{x})\big{)},&\text{ otherwise},\end{array}\right.= { start_ARRAY start_ROW start_CELL italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) , end_CELL start_CELL if bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ bold_x , end_CELL end_ROW start_ROW start_CELL italic_q ( bold_x | bold_x ) + ∑ start_POSTSUBSCRIPT bold_z ≠ bold_x end_POSTSUBSCRIPT italic_q ( bold_z | bold_x ) ( 1 - italic_A ( bold_z | bold_x ) ) , end_CELL start_CELL otherwise , end_CELL end_ROW end_ARRAY (24)

where q(|)q(\cdot|\cdot)italic_q ( ⋅ | ⋅ ) represents the proposal probability, computed as per (14), and A(|)A(\cdot|\cdot)italic_A ( ⋅ | ⋅ ) denotes the acceptance probability according to (18). Armed with this transition probability (kernel) in position, we present a lemma detailing the statistical properties of the Markov chain, proved in Appendix A.

Lemma 1

Assume that the probability density function of the noise 𝐧𝐧\mathbf{n}bold_n satisfies 0<p𝐧()<+0subscript𝑝𝐧0<p_{\bf n}(\cdot)<+\infty0 < italic_p start_POSTSUBSCRIPT bold_n end_POSTSUBSCRIPT ( ⋅ ) < + ∞, i.e., σ2>0superscript𝜎20\sigma^{2}>0italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0. Then, the transition kernel of the Markov chain induced by DMALA (α>0𝛼0\alpha>0italic_α > 0) is irreducible and aperiodic. Moreover, the target posterior distribution π𝜋\piitalic_π is a unique stationary distribution of this chain.

Building upon Lemma 1 and the convergence theorem of MCMC, we assert the theorem concerning the exponential convergence of the Markov chain induced by DMALA, as presented herein without an accompanying proof. A comprehensive proof is available in [30, Theorem 4.9].

Theorem 1

Given that Lemma 1 holds, the Markov chain induced by DMALA exhibits exponential convergence to its unique stationary distribution π𝜋\piitalic_π in total variation distance. Specifically, there exist constants C>0𝐶0C>0italic_C > 0 and 0<r<10𝑟10<r<10 < italic_r < 1 such that

P(t)(𝐱,)πTVCrt,𝐱𝒜N×1.formulae-sequencesubscriptnormsuperscript𝑃𝑡𝐱𝜋TV𝐶superscript𝑟𝑡for-all𝐱superscript𝒜𝑁1\|P^{(t)}(\mathbf{x},\cdot)-\pi\|_{\rm TV}\leq Cr^{t},\;\forall\mathbf{x}\in% \mathcal{A}^{N\times 1}.∥ italic_P start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ( bold_x , ⋅ ) - italic_π ∥ start_POSTSUBSCRIPT roman_TV end_POSTSUBSCRIPT ≤ italic_C italic_r start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT , ∀ bold_x ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT . (25)

Herein, r𝑟ritalic_r represents the convergence rate [10], with a smaller r𝑟ritalic_r indicating faster convergence, P(t)(𝐱,)superscript𝑃𝑡𝐱P^{(t)}(\mathbf{x},\cdot)italic_P start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ( bold_x , ⋅ ) denotes the probability distribution induced by the t𝑡titalic_t-step transition function of the Markov chain with the initial state 𝐱𝐱\mathbf{x}bold_x, defined as [10]

P(t)(𝐱,𝐱)=P(𝐱(t+1)=𝐱|𝐱(1)=𝐱),superscript𝑃𝑡𝐱superscript𝐱𝑃superscript𝐱𝑡1conditionalsuperscript𝐱superscript𝐱1𝐱P^{(t)}(\mathbf{x},\mathbf{x}^{\prime})=P\big{(}\mathbf{x}^{(t+1)}=\mathbf{x}^% {\prime}|\mathbf{x}^{(1)}=\mathbf{x}\big{)},italic_P start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ( bold_x , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = italic_P ( bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( 1 ) end_POSTSUPERSCRIPT = bold_x ) , (26)

and π1π2TVsubscriptnormsubscript𝜋1subscript𝜋2TV\|\pi_{1}-\pi_{2}\|_{\rm TV}∥ italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT roman_TV end_POSTSUBSCRIPT signifies the total variation distance between two distributions π1subscript𝜋1{\pi}_{1}italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and π2subscript𝜋2\pi_{2}italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT over 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, defined as

π1π2TV=12𝐱𝒜N×1|π1(𝐱)π2(𝐱)|.subscriptnormsubscript𝜋1subscript𝜋2TV12subscript𝐱superscript𝒜𝑁1subscript𝜋1𝐱subscript𝜋2𝐱\|\pi_{1}-\pi_{2}\|_{\rm TV}=\frac{1}{2}\sum_{\mathbf{x}\in\mathcal{A}^{N% \times 1}}|\pi_{1}(\mathbf{x})-\pi_{2}(\mathbf{x})|.∥ italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT roman_TV end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT bold_x ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT | italic_π start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( bold_x ) - italic_π start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( bold_x ) | . (27)

Theorem 1 implies that the distribution of the samples generated by DMALA exponentially converges to the target distribution for large t𝑡titalic_t. Furthermore, this theorem sheds light on the rate of convergence, denoted as r𝑟ritalic_r. Delving into the convergence rate r𝑟ritalic_r, we consider the discrete space 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT represented by {𝐱1,𝐱2,,𝐱QN}subscript𝐱1subscript𝐱2subscript𝐱superscript𝑄𝑁\left\{\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{Q^{N}}\right\}{ bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , … , bold_x start_POSTSUBSCRIPT italic_Q start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT end_POSTSUBSCRIPT }. The transition kernel of the Markov chain, induced by DMALA, can be modeled as a QN×QNsuperscript𝑄𝑁superscript𝑄𝑁Q^{N}\times Q^{N}italic_Q start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT × italic_Q start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT matrix 𝐏=[Pij]𝐏delimited-[]subscript𝑃𝑖𝑗\mathbf{P}=[P_{ij}]bold_P = [ italic_P start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ]. Here, the entry Pijsubscript𝑃𝑖𝑗P_{ij}italic_P start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT signifies the transition probability from 𝐱isubscript𝐱𝑖\mathbf{x}_{i}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to 𝐱jsubscript𝐱𝑗\mathbf{x}_{j}bold_x start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, calculated as per (24). It is established that the convergence rate r𝑟ritalic_r corresponds to the second largest eigenvalue λ2subscript𝜆2{\lambda_{2}}italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT of 𝐏𝐏\mathbf{P}bold_P [10, 31]. Although deriving analytical expressions for this eigenvalue poses challenges, it is feasible to exactly compute r𝑟ritalic_r in small sample spaces, aiding in convergence diagnostics. Such an analysis not only enhances comprehension of the algorithm’s performance but also illuminates paths for further improvement.

Refer to caption
Figure 2: Total variation distance as a function of the number of sampling iterations within a 2×222{\text{2}\times\text{2}}2 × 2 MIMO system featuring Rayleigh fading channels, QPSK modulation, and SNR=8dBSNR8dB{\text{SNR}=\text{8}\;\text{dB}}SNR = 8 dB.
Refer to caption
Figure 3: Comparison between the sampled and true posterior distributions within the 2×222{\text{2}\times\text{2}}2 × 2 MIMO-QPSK space, which encompasses 16 possible states. States exhibiting a probability lower than 103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT are omitted for clarity.

We conducted an empirical verification of Theorem 1 using a 2×222{\text{2}\times\text{2}}2 × 2 MIMO system with quadrature phase shift keying (QPSK) modulation (QN=24=16superscript𝑄𝑁superscript2416{Q^{N}=2^{4}=16}italic_Q start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT = 2 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT = 16) and the signal-to-noise ratio (SNR) per receiving antenna of 8 dB.444The SNR is defined as SNR=𝔼[𝐇𝐱2]/𝔼[𝐧2]SNR𝔼delimited-[]superscriptnorm𝐇𝐱2𝔼delimited-[]superscriptnorm𝐧2\text{SNR}=\mathbb{E}[\|\mathbf{Hx}\|^{2}]/\mathbb{E}[\|\mathbf{n}\|^{2}]SNR = blackboard_E [ ∥ bold_Hx ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] / blackboard_E [ ∥ bold_n ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ]. Fig. 2 shows the total variation distance between the sampled distribution P(t)(𝐱,)superscript𝑃𝑡𝐱P^{(t)}(\mathbf{x},\cdot)italic_P start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ( bold_x , ⋅ ) from DMALA and the true target distribution as a function of the sampling iteration index t𝑡titalic_t. This sampled distribution was derived by collecting the generated samples of 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT independent samplers at each time step t𝑡titalic_t and counting the number of occurrences to reflect the probability of each state in the 2×222{\text{2}\times\text{2}}2 × 2 MIMO-QPSK space. For this empirical study, we employed the preconditioned version of DMALA. We benchmarked against a state-of-the-art gradient-based MCMC method for MIMO detection, specifically MHGD [21]. An exponential convergence curve of rtsuperscript𝑟𝑡r^{t}italic_r start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT serves as a reference. For this specific channel realization, r=0.657𝑟0.657r=0.657italic_r = 0.657 was determined through the eigendecomposition of the 16×16161616\times 1616 × 16 transition matrix 𝐏𝐏\mathbf{P}bold_P associated with DMALA.

The analysis yields several key insights. First, the total variation distance for DMALA approaches zero, indicating successful convergence to the target distribution, whereas for MHGD, it converges to an approximate value of 0.2, suggesting a deviation from the target. This outcome underscores the superiority of DMALA’s methodical probabilistic sampling approach. Second, the empirical convergence curve of DMALA’s total variation distance (depicted by a solid line) closely aligns with the theoretical exponential convergence curve (dashed line), thereby corroborating the assertions of Theorem 1. Additionally, when comparing the sampled distributions to the true distribution, as shown in Fig. 3 via histograms, it is evident that MHGD’s sampled distribution exhibits bias, while DMALA’s distribution closely matches the true distribution.

Refer to caption
(a) Naive
Refer to caption
(b) Preconditioned
Figure 4: Boxplots illustrating the convergence rates in a 2×222{\text{2}\times\text{2}}2 × 2 MIMO system employing QPSK modulation under Rayleigh fading channels. The convergence rates for naive and preconditioned DMALA configurations are depicted in panels (a) and (b), respectively.

In Fig. 4, we extend our investigation to examine the performance differences between naive and preconditioned DMALA configurations under varying SNR conditions. The system setup remains identical to that described in Fig. 2, with SNR levels adjusted to 4, 6, 8, and 10 dB. For each SNR setting, we simulate 100 independent channel realizations. The transition matrix 𝐏𝐏\mathbf{P}bold_P is constructed for each realization, from which we compute the second largest eigenvalue, i.e., the convergence rate r𝑟ritalic_r. Fig. 4 shows the boxplots555A boxplot is a method for displaying the distribution of a dataset based on the five-number summary: the minimum, first quartile, medium, third quartile, and maximum. The first and third quartiles are denoted by the two ends of the box, and the medium is represented by the middle bar. The points outside the upper and lower bounds are considered outliners. of these convergence rates, distinguishing between the results for naive DMALA in Fig. 4(a) and the preconditioned variant in Fig. 4(b). From Fig. 4, naive DMALA exhibits increasingly slow convergence (r1𝑟1r\to 1italic_r → 1) at SNR values above 8 dB, a phenomenon recognized in the literature as the high SNR stalling issue [12, 13]. Conversely, the preconditioned DMALA demonstrates steady convergence rates across the explored SNR spectrum, indicating notable enhancements in performance.

We further investigate the convergence rate beyond empirical analysis, drawing inspiration from [32]. We first introduce a lemma on the proposal distribution of DMALA, with the proof provided in Appendix B.

Lemma 2

For the Markov chain of the proposed DMALA, there exists a constant ξ>0𝜉0\xi>0italic_ξ > 0 such that

q(𝐱|𝐱(t))π(𝐱)ξG(𝐱(t),𝐱),𝐱(t)𝒜N×1,formulae-sequence𝑞conditionalsuperscript𝐱superscript𝐱𝑡𝜋superscript𝐱𝜉𝐺superscript𝐱𝑡superscript𝐱for-allsuperscript𝐱𝑡superscript𝒜𝑁1\frac{q(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})}{\pi(\mathbf{x}^{\prime})}\geq% \xi\cdot G(\mathbf{x}^{(t)},\mathbf{x}^{\prime}),\;\forall\mathbf{x}^{(t)}\in% \mathcal{A}^{N\times 1},divide start_ARG italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG ≥ italic_ξ ⋅ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , ∀ bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT , (28)

where

ξ=𝐬𝒜N×1exp(f(𝐬))n=1Nxnexp(12α|xn|2)𝜉subscript𝐬superscript𝒜𝑁1𝑓𝐬superscriptsubscriptproduct𝑛1𝑁subscriptsuperscriptsubscript𝑥𝑛12𝛼superscriptsuperscriptsubscript𝑥𝑛2\xi=\frac{\sum_{\mathbf{s}\in\mathcal{A}^{N\times 1}}\exp\big{(}f(\mathbf{s})% \big{)}}{\prod_{n=1}^{N}\sum_{x_{n}^{\prime}\in\mathbb{Z}}\exp\big{(}-\frac{1}% {2\alpha}|x_{n}^{\prime}|^{2}\big{)}}italic_ξ = divide start_ARG ∑ start_POSTSUBSCRIPT bold_s ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_s ) ) end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_Z end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG (29)

with \mathbb{Z}blackboard_Z denoting the set of all integers (encompassing the QAM lattice 𝒜𝒜\mathcal{A}caligraphic_A), and the function G𝐺Gitalic_G is given by

G(𝐱(t),𝐱)=exp(12α𝐱𝐱(t)α2f(𝐱(t))2)exp(f(𝐱)).𝐺superscript𝐱𝑡superscript𝐱12𝛼superscriptnormsuperscript𝐱superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡2𝑓superscript𝐱G(\mathbf{x}^{(t)},\mathbf{x}^{\prime})=\frac{\exp\big{(}-\frac{1}{2\alpha}\|% \mathbf{x}^{\prime}-\mathbf{x}^{(t)}-\frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)}% )\|^{2}\big{)}}{\exp\big{(}f(\mathbf{x}^{\prime})\big{)}}.italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) = divide start_ARG roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG ∥ bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG roman_exp ( italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ) end_ARG . (30)

Based on this lemma, we have

q(𝐱|𝐱(t))ξG(𝐱(t),𝐱)π(𝐱)δπ(𝐱)𝑞conditionalsuperscript𝐱superscript𝐱𝑡𝜉𝐺superscript𝐱𝑡superscript𝐱𝜋superscript𝐱𝛿𝜋superscript𝐱q(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})\geq\xi\cdot G(\mathbf{x}^{(t)},\mathbf% {x}^{\prime})\pi(\mathbf{x}^{\prime})\geq\delta\pi(\mathbf{x}^{\prime})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ≥ italic_ξ ⋅ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ≥ italic_δ italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) (31)

for all 𝐱(t),𝐱𝒜N×1superscript𝐱𝑡superscript𝐱superscript𝒜𝑁1\mathbf{x}^{(t)},\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1}bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, where

δ=ξmin𝐱(t),𝐱𝒜N×1[G(𝐱(t),𝐱)]>0.𝛿𝜉superscript𝐱𝑡superscript𝐱superscript𝒜𝑁1delimited-[]𝐺superscript𝐱𝑡superscript𝐱0\delta=\xi\cdot\underset{\mathbf{x}^{(t)},\mathbf{x}^{\prime}\in\mathcal{A}^{N% \times 1}}{\min}[G(\mathbf{x}^{(t)},\mathbf{x}^{\prime})]>0.italic_δ = italic_ξ ⋅ start_UNDERACCENT bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_UNDERACCENT start_ARG roman_min end_ARG [ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ] > 0 . (32)

Following this relationship, the coupling technique can be employed to establish the following theorem that delineates the exponential convergence rate of DMALA. The related proof is available in [33, Theorem 1].

Theorem 2

The convergence rate r𝑟ritalic_r in Theorem 1 can be specified as r=1δ𝑟1𝛿r=1-\deltaitalic_r = 1 - italic_δ, where δ=ξmin𝐱(t),𝐱𝒜N×1[G(𝐱(t),𝐱)]𝛿𝜉superscript𝐱𝑡superscript𝐱superscript𝒜𝑁1delimited-[]𝐺superscript𝐱𝑡superscript𝐱\delta=\xi\cdot\underset{\mathbf{x}^{(t)},\mathbf{x}^{\prime}\in\mathcal{A}^{N% \times 1}}{\min}[G(\mathbf{x}^{(t)},\mathbf{x}^{\prime})]italic_δ = italic_ξ ⋅ start_UNDERACCENT bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_UNDERACCENT start_ARG roman_min end_ARG [ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) ] also provides a lower bound for the spectral gap (i.e., 1λ21subscript𝜆21-\lambda_{2}1 - italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT) of DMALA’s transition kernel 𝐏𝐏\mathbf{P}bold_P.

This theorem shows that the convergence rate depends on factors such as the step size α𝛼\alphaitalic_α and the system setup. Optimizing these parameters, such as selecting a suitable α𝛼\alphaitalic_α to make ξ𝜉\xiitalic_ξ and G(𝐱(t),𝐱)𝐺superscript𝐱𝑡superscript𝐱G(\mathbf{x}^{(t)},\mathbf{x}^{\prime})italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) approach 1 and improve the convergence rate, is beyond the scope of this work and is left for future research.

III-C LLR Computation Methods

After deriving the sample list \mathcal{L}caligraphic_L, various methods can be utilized for LLR computation. Conventional list-based soft-output detectors [5, 6, 34] approximate the APPs in (4) by using \mathcal{L}caligraphic_L to replace the entire set 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT of all possible vectors. Since the list size is generally much smaller than QNsuperscript𝑄𝑁Q^{N}italic_Q start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT, the exponentially increased complexity can be alleviated. Consequently, the LLR as defined in (4) is approximated by

L^k=log𝐱𝒜k+N×1exp(f(𝐱))𝐱𝒜kN×1exp(f(𝐱)).subscript^𝐿𝑘subscript𝐱subscriptsuperscript𝒜𝑁1limit-from𝑘𝑓𝐱subscript𝐱subscriptsuperscript𝒜𝑁1limit-from𝑘𝑓𝐱\hat{L}_{k}=\log\frac{\sum_{\mathbf{x}\in{\mathcal{L}}\cap\mathcal{A}^{N\times 1% }_{k+}}\exp\big{(}f(\mathbf{x})\big{)}}{\sum_{\mathbf{x}\in{\mathcal{L}}\cap% \mathcal{A}^{N\times 1}_{k-}}\exp\big{(}f(\mathbf{x})\big{)}}.over^ start_ARG italic_L end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_log divide start_ARG ∑ start_POSTSUBSCRIPT bold_x ∈ caligraphic_L ∩ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_x ) ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT bold_x ∈ caligraphic_L ∩ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k - end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_x ) ) end_ARG . (33)

However, this approach is not tailored to the proposed sampling-based detector since the distribution information of the generated samples is not fully harnessed.

To better accommodate the proposed sampling algorithm, we adopt an alternative approach for LLR computation based on the IS-based Monte Carlo method [12, 35]. Initially, to facilitate a Monte Carlo summation for approximation, we reformulate the APPs p(bk=±1|𝐲)𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲p(b_{k}=\pm 1|\mathbf{y})italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y ) in (4) into the form of expectation. Observe that the APPs can be expressed as

p(bk=±1|𝐲)𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲\displaystyle p(b_{k}=\pm 1|\mathbf{y})italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y ) =𝐛kp(bk=±1,𝐛k|𝐲)absentsubscriptsubscript𝐛𝑘𝑝subscript𝑏𝑘plus-or-minus1conditionalsubscript𝐛𝑘𝐲\displaystyle=\sum_{\mathbf{b}_{-k}}p(b_{k}=\pm 1,\mathbf{b}_{-k}|\mathbf{y})= ∑ start_POSTSUBSCRIPT bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT | bold_y )
=𝐛kp(bk=±1|𝐲,𝐛k)p(𝐛k|𝐲),absentsubscriptsubscript𝐛𝑘𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲subscript𝐛𝑘𝑝conditionalsubscript𝐛𝑘𝐲\displaystyle=\sum_{\mathbf{b}_{-k}}p(b_{k}=\pm 1|\mathbf{y},\mathbf{b}_{-k})p% (\mathbf{b}_{-k}|\mathbf{y}),= ∑ start_POSTSUBSCRIPT bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT ) italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT | bold_y ) , (34)

where p(𝐛k|𝐲)𝑝conditionalsubscript𝐛𝑘𝐲p(\mathbf{b}_{-k}|\mathbf{y})italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT | bold_y ) is considered as g(x)𝑔𝑥g(x)italic_g ( italic_x ) and p(bk=±1|𝐲,𝐛k)𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲subscript𝐛𝑘p(b_{k}=\pm 1|\mathbf{y},\mathbf{b}_{-k})italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT ) as h(x)𝑥h(x)italic_h ( italic_x ) in (5). Thus, the computation of APPs translates to evaluating 𝔼p(𝐛k|𝐲)[p(bk=±1|𝐲,𝐛k)]subscript𝔼𝑝conditionalsubscript𝐛𝑘𝐲delimited-[]𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲subscript𝐛𝑘\mathbb{E}_{p(\mathbf{b}_{-k}|\mathbf{y})}[p(b_{k}=\pm 1|\mathbf{y},\mathbf{b}% _{-k})]blackboard_E start_POSTSUBSCRIPT italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT | bold_y ) end_POSTSUBSCRIPT [ italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT ) ], where IS-based Monte Carlo summation becomes applicable.

Furthermore, to implement the IS-based method, we introduce an auxiliary distribution as the target according to [35]:

πa(𝐱)exp(f(𝐱)/τ)n=1N𝕀xn𝒜,proportional-tosubscript𝜋a𝐱𝑓𝐱𝜏superscriptsubscriptproduct𝑛1𝑁subscript𝕀subscript𝑥𝑛𝒜\pi_{\rm a}(\mathbf{x})\propto\exp\big{(}{f(\mathbf{x})/\tau}\big{)}\prod_{n=1% }^{N}\mathbb{I}_{x_{n}\in\mathcal{A}},italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( bold_x ) ∝ roman_exp ( italic_f ( bold_x ) / italic_τ ) ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT blackboard_I start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT , (35)

where τ>1𝜏1\tau>1italic_τ > 1 is a temperature parameter for enhancing the mobility of the Markov chain [12, 34, 36]. Given that πasubscript𝜋a\pi_{\rm a}italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT represents a tempered posterior and maintains a form analogous to the posterior distribution π𝜋\piitalic_π in (2), our proposed DMALA is well-suited to sample from it, following the derivations in Sec. III-A.

Let ={𝐱[s]}s=1Ssuperscriptsubscriptsuperscript𝐱delimited-[]𝑠𝑠1𝑆\mathcal{L}=\{\mathbf{x}^{[s]}\}_{s=1}^{S}caligraphic_L = { bold_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT denote the sample list utilized for LLR computation, where S𝑆Sitalic_S is the size of the sample list (including repetitive samples). To reflect our focus on employing samples nearing convergence for decisions rather than all samples generated during the sampling iterations, we shift the index notation from (t)𝑡(t)( italic_t ) to [s]delimited-[]𝑠[s][ italic_s ]. Building upon (7) and (35) and through some algebraic manipulation, the computation of (4) can be approximated by

L^k=logs=1S11+exp(γk[s])exp(τ1τf(𝐱+1[s]))s=1S11+exp(+γk[s])exp(τ1τf(𝐱1[s])),subscript^𝐿𝑘superscriptsubscript𝑠1𝑆11superscriptsubscript𝛾𝑘delimited-[]𝑠𝜏1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠superscriptsubscript𝑠1𝑆11superscriptsubscript𝛾𝑘delimited-[]𝑠𝜏1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠\hat{L}_{k}=\log\frac{\sum_{s=1}^{S}\frac{1}{1+\exp(-\gamma_{k}^{[s]})}\exp% \left(\frac{\tau-1}{\tau}f(\mathbf{x}_{+1}^{[s]})\right)}{\sum_{s=1}^{S}\frac{% 1}{1+\exp(+\gamma_{k}^{[s]})}\exp\left(\frac{\tau-1}{\tau}f(\mathbf{x}_{-1}^{[% s]})\right)},over^ start_ARG italic_L end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = roman_log divide start_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 1 + roman_exp ( - italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG roman_exp ( divide start_ARG italic_τ - 1 end_ARG start_ARG italic_τ end_ARG italic_f ( bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG 1 + roman_exp ( + italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG roman_exp ( divide start_ARG italic_τ - 1 end_ARG start_ARG italic_τ end_ARG italic_f ( bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) end_ARG , (36)

where γk[s]=1τ(f(𝐱+1[s])f(𝐱1[s]))superscriptsubscript𝛾𝑘delimited-[]𝑠1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠𝑓superscriptsubscript𝐱1delimited-[]𝑠\gamma_{k}^{[s]}=\frac{1}{\tau}\left(f(\mathbf{x}_{+1}^{[s]})-f(\mathbf{x}_{-1% }^{[s]})\right)italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_τ end_ARG ( italic_f ( bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) - italic_f ( bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ), and 𝐱+1[s]superscriptsubscript𝐱1delimited-[]𝑠\mathbf{x}_{+1}^{[s]}bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT and 𝐱1[s]superscriptsubscript𝐱1delimited-[]𝑠\mathbf{x}_{-1}^{[s]}bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT are mapped from 𝐛[s]superscript𝐛delimited-[]𝑠\mathbf{b}^{[s]}bold_b start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT with its k𝑘kitalic_k-th bit bk[s]superscriptsubscript𝑏𝑘delimited-[]𝑠{b}_{k}^{[s]}italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT corresponding to +11+1+ 1 or 11-1- 1, respectively. The bit vector 𝐛[s]superscript𝐛delimited-[]𝑠\mathbf{b}^{[s]}bold_b start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT is demapped from the sample 𝐱[s]superscript𝐱delimited-[]𝑠\mathbf{x}^{[s]}bold_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT drawn from πasubscript𝜋a\pi_{\rm a}italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT by DMALA. A detailed derivation of (36) is provided in Appendix C. The complete process of DMALA-based soft MIMO detection, which integrates the proposed DMALA sampler and IS-based LLR computation, is detailed in Algorithm 2.

Algorithm 2 DMALA-Based Soft MIMO Detection
0:  𝐲𝐲{\bf{y}}bold_y, 𝐇𝐇{\bf{H}}bold_H, σ2superscript𝜎2\sigma^{2}italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, α𝛼\alphaitalic_α, τ𝜏\tauitalic_τ.
1:  Run the DMALA sampler (Algorithm 1 with f()𝑓f(\cdot)italic_f ( ⋅ ) replaced by f()/τ𝑓𝜏f(\cdot)/\tauitalic_f ( ⋅ ) / italic_τ) to sample from πa(𝐱)subscript𝜋a𝐱\pi_{\rm a}(\mathbf{x})italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( bold_x ) in (35).
2:  Select the converged samples ={𝐱[s]}s=1Ssuperscriptsubscriptsuperscript𝐱delimited-[]𝑠𝑠1𝑆\mathcal{L}=\{\mathbf{x}^{[s]}\}_{s=1}^{S}caligraphic_L = { bold_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT for LLR computation.
3:  Demap {𝐱[s]}s=1Ssuperscriptsubscriptsuperscript𝐱delimited-[]𝑠𝑠1𝑆\{\mathbf{x}^{[s]}\}_{s=1}^{S}{ bold_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT to derive {𝐛[s]}s=1Ssuperscriptsubscriptsuperscript𝐛delimited-[]𝑠𝑠1𝑆\{\mathbf{b}^{[s]}\}_{s=1}^{S}{ bold_b start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT.
4:  Find {𝐱+1[s]}s=1Ssuperscriptsubscriptsuperscriptsubscript𝐱1delimited-[]𝑠𝑠1𝑆\{\mathbf{x}_{+1}^{[s]}\}_{s=1}^{S}{ bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT and {𝐱1[s]}s=1Ssuperscriptsubscriptsuperscriptsubscript𝐱1delimited-[]𝑠𝑠1𝑆\{\mathbf{x}_{-1}^{[s]}\}_{s=1}^{S}{ bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT based on {𝐛[s]}s=1Ssuperscriptsubscriptsuperscript𝐛delimited-[]𝑠𝑠1𝑆\{\mathbf{b}^{[s]}\}_{s=1}^{S}{ bold_b start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT } start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT.
5:  for k=1𝑘1k=1italic_k = 1 to Nlog2Q𝑁subscript2𝑄N\log_{2}Qitalic_N roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_Q do
6:   Compute γk[s]=1τ(f(𝐱+1[s])f(𝐱1[s])),s=1,,Sformulae-sequencesuperscriptsubscript𝛾𝑘delimited-[]𝑠1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠𝑓superscriptsubscript𝐱1delimited-[]𝑠𝑠1𝑆\gamma_{k}^{[s]}=\frac{1}{\tau}\left(f(\mathbf{x}_{+1}^{[s]})-f(\mathbf{x}_{-1% }^{[s]})\right),s=1,\ldots,Sitalic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_τ end_ARG ( italic_f ( bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) - italic_f ( bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) , italic_s = 1 , … , italic_S.
7:   Compute L^ksubscript^𝐿𝑘\hat{L}_{k}over^ start_ARG italic_L end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT based on (36).
8:  end for
8:  LLRs {L^k}k=1Nlog2Qsuperscriptsubscriptsubscript^𝐿𝑘𝑘1𝑁subscript2𝑄\{\hat{L}_{k}\}_{k=1}^{N\log_{2}Q}{ over^ start_ARG italic_L end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N roman_log start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT italic_Q end_POSTSUPERSCRIPT.
Remark 3

In alignment with established practices within the MCMC literature, we run the samplers for a sufficiently large T𝑇Titalic_T to ensure the Markov chains have (approximately) converged. Following this, we gather a list of converged samples (typically a limited number) for LLR computation. This strategy aids in excluding potentially inaccurate samples from the initial burn-in phase [12, 10] and also reduces LLR computation complexity. Moreover, we observe that utilizing independent samples from parallel samplers significantly enhances the accuracy of LLR computations over using correlated samples from a single sampler. Therefore, we opt to gather converged samples from a set of parallel samplers. Although each sampler necessitates a burn-in period, this strategy can be more efficient than extending the run time of a single sampler well beyond the necessary T𝑇Titalic_T for convergence to reduce sample correlation.

Remark 4

We also evaluated the LLR computation using Monte Carlo summation in (6) with samples drawn from π𝜋\piitalic_π. We found that this method requires a larger sample size to match the accuracy afforded by the approximation in (7). This observation aligns with the findings in [35] and can be ascribed to the variance reduction effect inherent in IS. Nonetheless, both methods are theoretically guaranteed to asymptotically converge to the exact LLR calculation in (4), a consequence of the law of large numbers. Furthermore, unlike the approach in (33), these methods not only capitalize on the samples collected in \mathcal{L}caligraphic_L but also leverage the occurrence frequencies of these samples, effectively utilizing the probability distribution across the sample space. This aspect of the methodology contributes to performance improvements, as evidenced by our simulation results.

III-D Computational Complexity

We analyze the computational complexity of our proposed detector, focusing on the number of dominant arithmetic operations involved, including multiplications and exponential functions. The analysis is structured into two primary segments: sampling and LLR computation. In this analysis, we denote the number of sampling iterations as T𝑇Titalic_T, the number of parallel samplers as Npsubscript𝑁pN_{\rm p}italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT, and the number of samples selected for LLR computation as S𝑆Sitalic_S.

III-D1 Sampling Complexity

The computational cost during the sampling stage is twofold: initialization and iteration per sample generation. The initialization cost of the preconditioned DMALA is dominated by the matrix inversion required to calculate the preconditioner 𝐌𝐌\mathbf{M}bold_M, leading to 𝒪(N3)𝒪superscript𝑁3\mathcal{O}(N^{3})caligraphic_O ( italic_N start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) real number multiplications. Nonetheless, this preconditioner needs to be computed only once, allowing for reuse across all sampling iterations, which mitigates its impact on overall complexity.

Each sampling iteration’s complexity primarily involves three matrix-vector multiplications: two for computing the gradient f𝑓\nabla f∇ italic_f as shown in (10), and one for applying the preconditioner to the gradient as 𝐌f𝐌𝑓\mathbf{M}\nabla fbold_M ∇ italic_f, resulting in 𝒪(N2+MN)𝒪superscript𝑁2𝑀𝑁\mathcal{O}(N^{2}+MN)caligraphic_O ( italic_N start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_M italic_N ) multiplications. Specifically, the matrix-vector multiplication 𝐇𝐱superscript𝐇𝐱\mathbf{Hx}^{\prime}bold_Hx start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in the gradient computation can be reduced to a summation of 𝐇𝐇\mathbf{H}bold_H’s columns scaled by distinct QAM magnitudes since the elements of 𝐱superscript𝐱\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are from 𝒜𝒜\mathcal{A}caligraphic_A. Additional per-iteration operations, such as proposal construction and probability calculation, impose a modest 𝒪(NQ)𝒪𝑁𝑄\mathcal{O}(NQ)caligraphic_O ( italic_N italic_Q ) on the computational load.

Collectively, the total complexity in the sampling stage is 𝒪(N3+(N2+MN+NQ)TNp)𝒪superscript𝑁3superscript𝑁2𝑀𝑁𝑁𝑄𝑇subscript𝑁p\mathcal{O}\big{(}N^{3}+(N^{2}+MN+NQ)TN_{\rm p}\big{)}caligraphic_O ( italic_N start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT + ( italic_N start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_M italic_N + italic_N italic_Q ) italic_T italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT ), effectively curtailing the exponential rise in complexity with increases in N𝑁Nitalic_N and Q𝑄Qitalic_Q. Furthermore, the parallelization of sampling across multiple samplers significantly diminishes computation delay.

III-D2 LLR Computation Complexity

When using the IS-based LLR computation described in (36), the computational demand for each bit includes the evaluation of γk[s]superscriptsubscript𝛾𝑘delimited-[]𝑠\gamma_{k}^{[s]}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT and 4S4𝑆4S4 italic_S exponential operations. Specifically, the calculation of γk[s]superscriptsubscript𝛾𝑘delimited-[]𝑠\gamma_{k}^{[s]}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT involves f(𝐱+1[s])𝑓superscriptsubscript𝐱1delimited-[]𝑠f(\mathbf{x}_{+1}^{[s]})italic_f ( bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) and f(𝐱1[s])𝑓superscriptsubscript𝐱1delimited-[]𝑠f(\mathbf{x}_{-1}^{[s]})italic_f ( bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ), with at least one of these evaluations already performed in the sampling stage. The remaining evaluation can also be executed efficiently, as 𝐱+1[s]superscriptsubscript𝐱1delimited-[]𝑠\mathbf{x}_{+1}^{[s]}bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT and 𝐱1[s]superscriptsubscript𝐱1delimited-[]𝑠\mathbf{x}_{-1}^{[s]}bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT differ in only one entry [11]. To further reduce the need for exponential function computations, (36) can be reformulated as

L^k=subscript^𝐿𝑘absent\displaystyle\hat{L}_{k}=over^ start_ARG italic_L end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = log[s=1Sexp(τ1τf(𝐱+1[s])F(γk[s]))]superscriptsubscript𝑠1𝑆𝜏1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠𝐹superscriptsubscript𝛾𝑘delimited-[]𝑠\displaystyle\log\left[\sum_{s=1}^{S}\exp\left(\frac{\tau-1}{\tau}f(\mathbf{x}% _{+1}^{[s]})-F(-\gamma_{k}^{[s]})\right)\right]roman_log [ ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT roman_exp ( divide start_ARG italic_τ - 1 end_ARG start_ARG italic_τ end_ARG italic_f ( bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) - italic_F ( - italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) ]
log[s=1Sexp(τ1τf(𝐱1[s])F(+γk[s]))],superscriptsubscript𝑠1𝑆𝜏1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠𝐹superscriptsubscript𝛾𝑘delimited-[]𝑠\displaystyle-\log\left[\sum_{s=1}^{S}\exp\left(\frac{\tau-1}{\tau}f(\mathbf{x% }_{-1}^{[s]})-F(+\gamma_{k}^{[s]})\right)\right],- roman_log [ ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT roman_exp ( divide start_ARG italic_τ - 1 end_ARG start_ARG italic_τ end_ARG italic_f ( bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) - italic_F ( + italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) ] , (37)

where F(a)=log(1+ea),aformulae-sequence𝐹𝑎1superscript𝑒𝑎𝑎{F(a)=\log(1+e^{a}),\;a\in\mathbb{R}}italic_F ( italic_a ) = roman_log ( 1 + italic_e start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT ) , italic_a ∈ blackboard_R. For a0𝑎0a\leq 0italic_a ≤ 0, F(a)𝐹𝑎F(a)italic_F ( italic_a ) can be approximated using pre-computed values in a lookup table; while for a>0𝑎0{a>0}italic_a > 0, the identity F(a)=a+F(a)𝐹𝑎𝑎𝐹𝑎{F(a)=a+F(-a)}italic_F ( italic_a ) = italic_a + italic_F ( - italic_a ) is utilized [35]. On this basis, the computation of v=logs=1Seas𝑣superscriptsubscript𝑠1𝑆superscript𝑒subscript𝑎𝑠v=\log\sum_{s=1}^{S}e^{a_{s}}italic_v = roman_log ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUPERSCRIPT in (37) is simplified using the following recursive operation for s=1𝑠1s=1italic_s = 1 to S𝑆Sitalic_S, initializing v𝑣vitalic_v as -\infty- ∞ [5]:

vlog(ev+eas)=max(v,as)+log(1+e|vas|)F(|vas|).𝑣superscript𝑒𝑣superscript𝑒subscript𝑎𝑠𝑣subscript𝑎𝑠subscript1superscript𝑒𝑣subscript𝑎𝑠𝐹𝑣subscript𝑎𝑠v\leftarrow\log(e^{v}+e^{a_{s}})=\max(v,a_{s})+\underbrace{\log(1+e^{-|v-a_{s}% |})}_{F(-|v-a_{s}|)}.italic_v ← roman_log ( italic_e start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT + italic_e start_POSTSUPERSCRIPT italic_a start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ) = roman_max ( italic_v , italic_a start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT ) + under⏟ start_ARG roman_log ( 1 + italic_e start_POSTSUPERSCRIPT - | italic_v - italic_a start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT | end_POSTSUPERSCRIPT ) end_ARG start_POSTSUBSCRIPT italic_F ( - | italic_v - italic_a start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT | ) end_POSTSUBSCRIPT . (38)

IV Simulation Results

In this section, we demonstrate the near-optimal bit error rate (BER) performance achieved by our proposed DMALA-based detector within coded MIMO systems. Initially, we outline the common configurations employed across our simulation studies. Subsequently, we delve into the assessment of coded BER performance, exploring a variety of MIMO dimensions, channel models, and scenarios with imperfect CSI. For the sake of conciseness, we refer to our proposed detector simply as “DMALA” throughout this section.

IV-A Simulation Setups

In our simulations, we consistently employ a rate-3/4343/43 / 4 low-density parity-check code with a block length of 1944 bits for channel coding, alongside belief propagation for channel decoding. The BER is assessed over 70,000 blocks transmitted across independent channel realizations. Our primary focus is on Rayleigh fading channels, characterized by independently Gaussian-distributed channel coefficients with zero mean and unit variance. Furthermore, to ascertain the robustness of our proposed DMALA detector, we extend our analysis to include different channel models. This encompasses Kronecker spatially correlated channels [37] and the more practical 3rd generation partnership project (3GPP) technical reports (TR) 36.873 3D channels [38], both of which are prevalent in the literature for evaluating MIMO detectors. Unless explicitly noted otherwise, we assume perfect receiver knowledge of the channel matrix 𝐇𝐇\mathbf{H}bold_H.

In light of its demonstrated superiority in Sec. III, we opt for the preconditioned DMALA for our evaluation. For its parameter setups, the step size is set to α=σ2𝛼superscript𝜎2\alpha=\sigma^{2}italic_α = italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Following [21], the perturbation scaling parameter β𝛽\betaitalic_β and the damping parameter γ𝛾\gammaitalic_γ are selected as β=dmin2/σ2𝛽superscriptsubscript𝑑2superscript𝜎2\beta={d_{\min}^{2}}/{{\sigma^{2}}}italic_β = italic_d start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT and γ=σ2/(2dmin2)𝛾superscript𝜎22superscriptsubscript𝑑2\gamma={\sigma^{2}}/({2d_{\min}^{2}})italic_γ = italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / ( 2 italic_d start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), respectively, where dminsubscript𝑑d_{\min}italic_d start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT represents half of the minimum distance between any two QAM lattice points. Benchmark algorithms for comparative analysis encompass the expectation propagation (EP) detector [39], the K𝐾Kitalic_K-best SD detector [6], the state-of-the-art MHGD detector [21], and the optimal MAP detector that calculates exact LLRs using (4). The EP detector is configured to run for 10 iterations as per [39] with LLRs calculated from posterior marginals approximated by Gaussian distributions [40, Eq. (3)]. Given the deterministic nature of its Gaussian approximation, further increases in iterations do not enhance performance. For the K𝐾Kitalic_K-best SD, we set a large list size (denoted by K𝐾Kitalic_K) to establish a near-optimal baseline and utilize (33) for LLR computation due to its deterministic list-based nature. For both MHGD and DMALA, Npsubscript𝑁pN_{\rm p}italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT parallel samplers are employed, each initialized randomly and running for T𝑇Titalic_T iterations, with the final sample from each sampler selected for soft decisions. Therefore, the parameter Npsubscript𝑁pN_{\rm p}italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT also indicates the number of samples used in LLR computation. The rationale for this selection strategy has been discussed in Sec. III-C. IS-based LLR computation is applied to both MHGD and DMALA, with a temperature parameter τ=2𝜏2\tau=2italic_τ = 2 as suggested by [35], unless specified otherwise.

IV-B Small- and Medium-Sized MIMO

Refer to caption
Figure 5: BER performance of DMALA across varying numbers of sampling iterations within a 4×444{\text{4}\times\text{4}}4 × 4 MIMO system employing 16-QAM modulation under Rayleigh fading channels. Both MHGD and DMALA are configured with Np=128subscript𝑁p128N_{\rm p}=128italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 128 parallel samplers.

Fig. 5 illustrates the BER performance of DMALA with different numbers of sampling iterations T{25,100}𝑇25100T\in\{25,100\}italic_T ∈ { 25 , 100 } in a 4×444{\text{4}\times\text{4}}4 × 4 MIMO system with 16-QAM and Rayleigh fading channels. The number of parallel samplers is set as Np=128subscript𝑁p128N_{\rm p}=128italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 128. A notable trend observed is the gradual improvement in DMALA’s BER as T𝑇Titalic_T increases, a phenomenon that resonates with DMALA’s convergence properties detailed in Theorem 1. This improvement underscores the enhanced accuracy in Monte Carlo approximation for LLRs, as the sampled distribution increasingly aligns with the target distribution with larger T𝑇Titalic_T. Nonetheless, the gains become marginal as the performance approaches that of the optimal detector. Contrarily, MHGD’s performance does not exhibit similar improvements with increased T𝑇Titalic_T, and its best achievable BER is inferior to the optimal detector. This result is in accordance with the findings in Fig. 2, where MHGD’s sampled distribution consistently exhibits bias compared to the target distribution, hence leading to an insurmountable gap in soft decisions.

For subsequent simulations, we set T=100𝑇100T=100italic_T = 100, unless noted otherwise, to ensure ample convergence of the Markov chain using DMALA, affirming the performance gains achieved via precise posterior sampling. For a fair comparison, we also set T=100𝑇100T=100italic_T = 100 for MHGD. This setup offers a balance between achieving high accuracy and maintaining moderate computational complexity, as each iteration’s complexity is merely 𝒪(N2)𝒪superscript𝑁2\mathcal{O}(N^{2})caligraphic_O ( italic_N start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).666Similar to DMALA, the per iteration complexity of MHGD is dominated by gradient computation, which is on the order of 𝒪(N2)𝒪superscript𝑁2\mathcal{O}(N^{2})caligraphic_O ( italic_N start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ).

Refer to caption
(a) QPSK
Refer to caption
(b) 64-QAM
Figure 6: BER performance within an 8×888{\text{8}\times\text{8}}8 × 8 MIMO system employing QPSK/64-QAM modulation under Rayleigh fading channels. Both MHGD and DMALA are configured with Np=128subscript𝑁p128N_{\rm p}=128italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 128 parallel samplers.

Fig. 6 explores the BER performance in a medium-sized 8×888{\text{8}\times\text{8}}8 × 8 MIMO system across different modulation schemes under Rayleigh fading channels. As shown in Fig. 6(a), DMALA significantly outperforms both the EP and MHGD detectors and approaches the optimal detector’s performance with QPSK modulation. Such gains are attributed to DMALA’s efficacious sampling from the target distribution, bolstering the accuracy of statistical inference. Given DMALA’s comparable complexity to MHGD and substantially lower complexity relative to the optimal detector, the proposed method demonstrates substantial promise for efficient and accurate MIMO detection.

Additionally, in the context of 64-QAM modulation, as shown in Fig. 6(b), the optimal MAP detector is computationally prohibitive. Consequently, the K𝐾Kitalic_K-best SD detector, with a large list size of K=2048𝐾2048K=2048italic_K = 2048, is used as a near-optimal surrogate. We present the BER performance of DMALA with various numbers of sampling iterations T{25,50,100,150}𝑇2550100150T\in\{25,50,100,150\}italic_T ∈ { 25 , 50 , 100 , 150 } under this high-order modulation. The analysis reveals that the BER of DMALA converges quickly as T𝑇Titalic_T increases. Notably, the DMALA detector outperforms the MHGD detector using only half the iterations (T=50𝑇50T=50italic_T = 50 vs. T=100𝑇100T=100italic_T = 100) and approaches the near-optimal K𝐾Kitalic_K-best SD detector with T=150𝑇150T=150italic_T = 150 iterations. These results demonstrate that increasing the QAM order does not significantly raise the required number of iterations for convergence, highlighting the robustness and efficiency of the proposed DMALA in handling complex modulation scenarios.

Refer to caption
Figure 7: BER performance comparison within a 16×161616{\text{16}\times\text{16}}16 × 16 MIMO system employing 16-QAM modulation under Rayleigh fading channels. Both MHGD and DMALA are configured with Np=128subscript𝑁p128N_{\rm p}=128italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 128 parallel samplers.

IV-C Large MIMO

This subsection delves into the performance evaluation of the proposed detector within MIMO systems featuring an increased antenna count. Fig. 7 showcases a performance comparison in a large 16×161616{\text{16}\times\text{16}}16 × 16 MIMO system using 16-QAM under Rayleigh fading channels. This comparison specifically focuses on the performance distinction between different LLR computation methods as outlined in Sec. III-C: the conventional method, as per (33) (“w/ (33)”), and the IS-based method, as per (36). These methods are evaluated for both MHGD and DMALA detectors, each utilizing Np=128subscript𝑁p128N_{\rm p}=128italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 128 parallel samplers. Notably, the IS-based method outperforms the conventional method for both detectors, particularly at high SNRs. This advantage is largely due to the comprehensive exploitation of distribution information from the generated samples in Monte Carlo approximation, as opposed to solely relying on distinct samples within the sample list. Furthermore, DMALA’s performance benefits become even more evident within this large MIMO setting. Specifically, the DMALA with IS significantly outperforms the K𝐾Kitalic_K-best SD with K=256𝐾256K=256italic_K = 256 while employing merely half the number of samples (128 vs. 256), underscoring the efficiency of sampling from critical regions of the sample space.

Refer to caption
Figure 8: BER performance comparison within a 24×242424{\text{24}\times\text{24}}24 × 24 MIMO system employing 16-QAM modulation under Rayleigh fading channels.

Fig. 8 shows the BER performance of DMALA in an even larger 24×242424{\text{24}\times\text{24}}24 × 24 MIMO system, again with 16-QAM. We consider this system setup because enabling support for more than 24 spatial streams is a key objective in advancing MIMO technologies for the 5.5G era and has been actively pursued in numerous practical massive MIMO testbeds [41, 42]. This configuration reflects a balanced increase in spatial streams and complexity that is representative of advanced MIMO configurations, while still being feasible within the constraints of current computational resources and hardware capabilities. The analysis compares MHGD, DMALA, and K𝐾Kitalic_K-best SD under two different parameter settings. Unsurprisingly, all methods exhibit performance improvements with an increased sample list size (Npsubscript𝑁pN_{\rm p}italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT and K𝐾Kitalic_K) for soft decision-making. The figure distinctly illustrates DMALA’s superior performance over the benchmark methods across both parameter sets. Notably, DMALA maintains the same number of sampling iterations T𝑇Titalic_T as previously demonstrated to achieve these gains, whereas K𝐾Kitalic_K-best SD necessitates a substantial increase in K𝐾Kitalic_K to sustain competitive performance. This distinction highlights the remarkable scalability of our proposed approach.

Refer to caption
Figure 9: BER performance comparison within a 16×161616{\text{16}\times\text{16}}16 × 16 MIMO system employing 16-QAM modulation under spatially correlated channels (ρ=0.5𝜌0.5\rho=0.5italic_ρ = 0.5).

IV-D Correlated MIMO Channels

This subsection delves into the detection performance within Kronecker spatially correlated MIMO channels, following the model introduced in [37]. The channel model is defined as

𝐇=𝐑r1/2𝐆𝐑t1/2,𝐇superscriptsubscript𝐑r12superscriptsubscript𝐆𝐑t12\mathbf{H}=\mathbf{R}_{\rm r}^{1/2}\mathbf{G}\mathbf{R}_{\rm t}^{1/2},bold_H = bold_R start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT bold_GR start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 1 / 2 end_POSTSUPERSCRIPT , (39)

where 𝐆𝐆\mathbf{G}bold_G represents a Rayleigh fading channel, while 𝐑rsubscript𝐑r\mathbf{R}_{\rm r}bold_R start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT and 𝐑tsubscript𝐑t\mathbf{R}_{\rm t}bold_R start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT denote the spatial correlation matrices at the receiver and transmitter, respectively. The exponential correlation model [37] is utilized to generate these correlation matrices, with the correlation coefficient denoted by ρ𝜌\rhoitalic_ρ.

For our simulations, we consider a 16×161616{\text{16}\times\text{16}}16 × 16 MIMO system employing 16-QAM modulation, setting the spatial correlation coefficient as ρ=0.5𝜌0.5\rho=0.5italic_ρ = 0.5. The selection of 0.5 as the spatial correlation coefficient represents a typical value indicating a moderate level of spatial correlation. This level is commonly encountered in practical MIMO systems and is widely adopted in the literature to assess the effectiveness of MIMO detectors [43, 44]. We explore different sample list sizes (128 and 256) for MHGD, DMALA, and K𝐾Kitalic_K-best SD. Fig. 9 shows the comparative performance analysis. In this challenging channel scenario, all detectors undergo performance degradation—approximately a 3 dB loss in SNR compared to their performance in uncorrelated Rayleigh fading channels (as shown in Fig. 7). Despite this, the proposed DMALA consistently achieves performance improvements over the baseline detectors with varying sample sizes, underscoring its robustness to spatial correlation effects.

Refer to caption
Figure 10: BER performance comparison under the 3GPP 3D channel model employing 16-QAM modulation, 64 BS antennas, and 16 single-antenna users, giving an effective 64×166416{\text{64}\times\text{16}}64 × 16 MIMO system.

IV-E 3GPP Massive MIMO Channels

To further attest to the effectiveness of our proposed detector, we delve into performance validation within practical 3GPP 3D MIMO channels [38], employing the QuaDRiGa channel simulator (Version 2.2.0) [45] for this purpose. We simulate the non-line-of-sight massive MIMO uplink transmission (NrNtmuch-greater-thansubscript𝑁rsubscript𝑁tN_{\rm r}\gg N_{\rm t}italic_N start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT ≫ italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT) in an urban macrocell environment, where the base station (BS) is equipped with 32 dual-polarized (Nr=64subscript𝑁r64N_{\rm r}=64italic_N start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT = 64) antennas, arranged in a planar array with half-wavelength spacing and positioned at a height of 25 m. The BS services a cell sector extending to a 500 m radius, with 16 single-antenna users uniformly dispersed within this area, resulting in an effective Nr×Nt=64×16subscript𝑁rsubscript𝑁t6416N_{\rm r}\times N_{\rm t}=\text{64}\times\text{16}italic_N start_POSTSUBSCRIPT roman_r end_POSTSUBSCRIPT × italic_N start_POSTSUBSCRIPT roman_t end_POSTSUBSCRIPT = 64 × 16 MIMO system configuration. Channel sampling is performed at a center frequency of 2.53 GHz, with transformation to the frequency domain via 256 effective subcarriers within a 20 MHz bandwidth. A total of 600 channel realizations are generated, each characterized by independent user locations, creating a channel dataset comprising 600×256600256{\text{600}\times\text{256}}600 × 256 channel matrices for performance evaluation.

Fig. 10 presents the performance comparison within these 3GPP MIMO channels. Notably, the MHGD detector underperforms, even lagging behind the deterministic EP method, indicative of MHGD’s sampled distribution deviating significantly from the target distribution in realistic channel conditions, thereby precipitating considerable errors in final inference (LLR computation). In contrast, the DMALA detector, thanks to its precise sampling capabilities, registers significant enhancements in performance relative to both MHGD and EP. Remarkably, compared to the near-optimal K𝐾Kitalic_K-best SD, DMALA achieves a 0.2 dB performance gain when using 128 samples and maintains comparable performance when using 256 samples. These findings affirm the resilience of our proposed detector in realistic massive MIMO channels.

IV-F Imperfect CSI

This subsection delves into the impact of imperfect CSI on the efficacy of our proposed detector. For the evaluation, we incorporate a linear MMSE channel estimator at the receiver to model the imperfect CSI, resulting in an estimated channel matrix 𝐇^=𝐇+𝐄^𝐇𝐇𝐄\hat{\mathbf{H}}=\mathbf{H}+\mathbf{E}over^ start_ARG bold_H end_ARG = bold_H + bold_E, where 𝐄𝐄\mathbf{E}bold_E is the channel estimation error matrix with entries following 𝒩(0,σe2)𝒩0superscriptsubscript𝜎e2\mathcal{N}(0,\sigma_{\rm e}^{2})caligraphic_N ( 0 , italic_σ start_POSTSUBSCRIPT roman_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ), with σe2superscriptsubscript𝜎e2\sigma_{\rm e}^{2}italic_σ start_POSTSUBSCRIPT roman_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT denoting the error variance [46]. Fig. 11 presents the BER performance across different levels of σe2superscriptsubscript𝜎e2\sigma_{\rm e}^{2}italic_σ start_POSTSUBSCRIPT roman_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT, employing the normalized mean square error (NMSE) as a metric on the x𝑥xitalic_x-axis, defined by

NMSE=𝔼[𝐄F2]𝔼[𝐇F2].NMSE𝔼delimited-[]superscriptsubscriptnorm𝐄𝐹2𝔼delimited-[]superscriptsubscriptnorm𝐇𝐹2\text{NMSE}=\frac{\mathbb{E}[\|\mathbf{E}\|_{F}^{2}]}{\mathbb{E}[\|\mathbf{H}% \|_{F}^{2}]}.NMSE = divide start_ARG blackboard_E [ ∥ bold_E ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] end_ARG start_ARG blackboard_E [ ∥ bold_H ∥ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] end_ARG . (40)

Herein, we consider a 16×161616{\text{16}\times\text{16}}16 × 16 MIMO system under Rayleigh fading channels with 16-QAM modulation at SNR=16.5dBSNR16.5dB{\text{SNR}=\text{16.5}\;\text{dB}}SNR = 16.5 dB. Consistent with the results observed under perfect CSI, the DMALA-based detector notably outperforms baseline detectors, including the K𝐾Kitalic_K-best SD configured with K=256𝐾256K=256italic_K = 256, across various levels of CSI accuracy. Specifically, DMALA achieves equivalent BER performance (BER=103BERsuperscript103\text{BER}=10^{-3}BER = 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT) as MHGD and K𝐾Kitalic_K-best SD at 4 dB and 1 dB higher channel estimation NMSE, respectively. This comparative analysis unequivocally showcases the robustness of our proposed detector to imperfect CSI.

Refer to caption
Figure 11: BER performance with respect to channel estimation errors within a 16×161616{\text{16}\times\text{16}}16 × 16 MIMO system under Rayleigh fading channels with 16-QAM modulation at SNR=16.5dBSNR16.5dB{\text{SNR}=\text{16.5}\;\text{dB}}SNR = 16.5 dB.

IV-G Performance and Complexity Comparisons with Various Baselines

Refer to caption
(a) 4×444{\text{4}\times\text{4}}4 × 4 MIMO
Refer to caption
(b) 24×242424{\text{24}\times\text{24}}24 × 24 MIMO
Figure 12: BER performance comparison in 4×444{\text{4}\times\text{4}}4 × 4 and 24×242424{\text{24}\times\text{24}}24 × 24 MIMO systems employing 16-QAM modulation under Rayleigh fading channels. All MCMC-based detectors are configured with Np=128subscript𝑁p128N_{\rm p}=128italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 128 and Np=256subscript𝑁p256N_{\rm p}=256italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT = 256 parallel samplers for the 4×444{\text{4}\times\text{4}}4 × 4 and 24×242424{\text{24}\times\text{24}}24 × 24 MIMO systems, respectively.

In this subsection, we conduct performance and computational complexity comparisons with additional baselines, including the MMSE detector, and the Excited MCMC (X-MCMC) detector [13]—an advanced Gibbs sampling-based MIMO detector. Furthermore, the performance of the graph neural network-enhanced EP detector (GEPNet) [47] is included to illustrate the comparison against state-of-the-art deep learning-based MIMO detection.777We implement the GEPNet using the code released at https://github.com/GNN-based-MIMO-Detection/GNN-based-MIMO-Detection. We perform the comparison in two typical MIMO setups, including a small 4×444{\text{4}\times\text{4}}4 × 4 MIMO system and a large 24×242424{\text{24}\times\text{24}}24 × 24 MIMO system. Fig. 12 presents the comparison results, showing that the proposed DMALA exhibits the most competitive performance across both system setups.

TABLE I: Computational Complexity Comparison for the MIMO Detectors in Fig. 12
Algorithms Number of FLOPs
4×444{\text{4}\times\text{4}}4 × 4 MIMO 24×242424{\text{24}\times\text{24}}24 × 24 MIMO
MMSE [4] 6.60×1026.60superscript1026.60\times 10^{2}6.60 × 10 start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT 2.76×1042.76superscript1042.76\times 10^{4}2.76 × 10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT
EP [39] 3.14×1043.14superscript1043.14\times 10^{4}3.14 × 10 start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT 2.04×1052.04superscript1052.04\times 10^{5}2.04 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT
K𝐾Kitalic_K-best SD [6] 1.54×1051.54superscript1051.54\times 10^{5}1.54 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT 1.26×107delimited-⟨⟩1.26superscript107{\it\langle 1.26\times 10^{7}\rangle}⟨ italic_1.26 × italic_10 start_POSTSUPERSCRIPT italic_7 end_POSTSUPERSCRIPT ⟩ 1.66×1071.66superscript107{1.66\times 10^{7}}1.66 × 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT 5.40×109delimited-⟨⟩5.40superscript109{\it\langle 5.40\times 10^{9}\rangle}⟨ italic_5.40 × italic_10 start_POSTSUPERSCRIPT italic_9 end_POSTSUPERSCRIPT ⟩
GEPNet [47] 4.90×1064.90superscript1064.90\times 10^{6}4.90 × 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT 1.71×1071.71superscript1071.71\times 10^{7}1.71 × 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT
X-MCMC [13] 9.20×107(8.14×105)9.20superscript1078.14superscript1059.20\times 10^{7}\;(8.14\times 10^{5})9.20 × 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT ( 8.14 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT ) 6.03×109(2.38×107)6.03superscript1092.38superscript1076.03\times 10^{9}\;(2.38\times 10^{7})6.03 × 10 start_POSTSUPERSCRIPT 9 end_POSTSUPERSCRIPT ( 2.38 × 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT )
MHGD [21] 2.63×107(2.32×105)2.63superscript1072.32superscript1052.63\times 10^{7}\;(2.32\times 10^{5})2.63 × 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT ( 2.32 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT ) 6.67×108(2.64×106)6.67superscript1082.64superscript1066.67\times 10^{8}\;(2.64\times 10^{6})6.67 × 10 start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT ( 2.64 × 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT )
DMALA 1.45×107(1.47×105)1.45superscript1071.47superscript105{1.45\times 10^{7}\;(1.47\times 10^{5})}1.45 × 10 start_POSTSUPERSCRIPT 7 end_POSTSUPERSCRIPT ( 1.47 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT ) 1.53×108(6.53×105)1.53superscript1086.53superscript105{1.53\times 10^{8}\;({6.53\times 10^{5}})}1.53 × 10 start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT ( 6.53 × 10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT )
  • Note: (similar-to\sim) denotes the FLOPs count of a single MCMC sampler. delimited-⟨⟩similar-to\langle\sim\rangle⟨ ∼ ⟩ represents the number of comparison operations within the K𝐾Kitalic_K-best SD detector, which is not included in the FLOPs count.

Subsequently, we illustrate the corresponding computational complexity of the compared detectors in Table I, using the number of floating point operations (FLOPs) as the metric. This metric was profiled using the Python package for the performance application programming interface library [48]. The simulations were performed on a machine equipped with an Intel Xeon E5-4627 CPU at 2.6 GHz and 512 GB of memory. For MCMC-based detectors, the FLOPs count for both the whole detection process and one single sampler are presented. Furthermore, it is important to note that the considerable comparison operations888The number of comparison operations within the K𝐾Kitalic_K-best SD detector is separately recorded in Table I, marked with delimited-⟨⟩similar-to\langle\sim\rangle⟨ ∼ ⟩. involved in the list administration of the K𝐾Kitalic_K-best SD detector are not included in the FLOPs count, resulting in a complexity metric that favors the SD detector [6, 49]. Based on Table I, several observations are derived:

  • Compared to other MCMC-based detectors, including X-MCMC and MHGD, the proposed DMALA exhibits lower complexity in both the 4×444{\text{4}\times\text{4}}4 × 4 and 24×242424{\text{24}\times\text{24}}24 × 24 MIMO systems. The advantage over X-MCMC can be attributed to the parallel update of all variables and the utilization of gradient information for acceleration. Additionally, the proposed method circumvents the high-complexity matrix inversion and line search involved in MHGD [21], thereby exhibiting higher computational efficiency. This superiority is particularly pronounced in the larger 24×242424{\text{24}\times\text{24}}24 × 24 MIMO setup, where DMALA achieves approximately a 77% reduction in total complexity compared to MHGD.

  • The overall computational complexity of the proposed DMALA is relatively higher than the K𝐾Kitalic_K-best SD and GEPNet. Nonetheless, the proposed method benefits from parallel implementation, and the complexity of a single sampler is notably reduced. Particularly, in the 24×242424{\text{24}\times\text{24}}24 × 24 MIMO system, the complexity of a single DMALA sampler is significantly lower compared to both K𝐾Kitalic_K-best SD and GEPNet. Therefore, by distributing the sampling process over these parallel samplers, the computation latency can be substantially reduced.

  • The complexity increase from the 4×444{\text{4}\times\text{4}}4 × 4 to the 24×242424{\text{24}\times\text{24}}24 × 24 MIMO setup in the proposed method is less pronounced when compared to the K𝐾Kitalic_K-best SD. This favorable complexity scaling property of DMALA becomes evident especially when considering that a significant complexity component of the K𝐾Kitalic_K-best SD is disregarded.

Taking a holistic view of the performance illustrated in Fig. 12 and the complexity presented in Table I, our proposed detector showcases significant potential in striking a desirable balance between performance and complexity.

V Conclusion

In this study, we introduced a near-optimal MIMO detection scheme leveraging a novel gradient-based MCMC algorithm, designed for exact sampling within discrete spaces. This advancement enables the computation of highly accurate Bayesian estimates (soft decisions) through Monte Carlo techniques. We have rigorously proved the convergence of our proposed sampling algorithm, DMALA, and conducted a thorough analysis of its convergence rate. These theoretical underpinnings not only guarantee the performance of our proposed detector but also clearly delineate it from existing heuristic approaches. Our extensive numerical investigations empirically validate the near-optimality and resilience of the proposed method, affirming its superior performance compared to contemporary state-of-the-art MIMO detectors. Moreover, the proposed detector exhibits exceptional parallelization capabilities and scalability to the number of antennas, rendering it a viable and effective option for next-generation wireless communication systems.

The current limitations of the proposed MCMC detector include increased computational overhead to achieve near-optimal performance and potential sensitivity to model mismatches. Future research should focus on improving the complexity scaling to accommodate ultra-large MIMO configurations and developing efficient hardware implementations for accelerated processing. Additionally, exploring a holistic approach that combines model-driven and data-driven methods would enhance the detector’s adaptability to diverse channel conditions and system configurations.

Appendix A Proof of Lemma 1

To avoid cluttered expressions, we consider the DMALA using the proposal in (15). Nonetheless, the proof also holds for the preconditioned version. We first show that the transition kernel of the Markov chain using DMALA is irreducible and aperiodic. For any two distinct states 𝐱𝐱\mathbf{x}bold_x and 𝐱superscript𝐱\mathbf{x}^{\prime}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT from 𝒜N×1superscript𝒜𝑁1\mathcal{A}^{N\times 1}caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, the transition probability P(𝐱(t+1)=𝐱|𝐱(t)=𝐱)𝑃superscript𝐱𝑡1conditionalsuperscript𝐱superscript𝐱𝑡𝐱P(\mathbf{x}^{(t+1)}=\mathbf{x}^{\prime}|\mathbf{x}^{(t)}=\mathbf{x})italic_P ( bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = bold_x ) is given by q(𝐱|𝐱)A(𝐱|𝐱)𝑞conditionalsuperscript𝐱𝐱𝐴conditionalsuperscript𝐱𝐱q(\mathbf{x}^{\prime}|\mathbf{x})A(\mathbf{x}^{\prime}|\mathbf{x})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ), as stated in (24), where the proposal probability q(𝐱|𝐱)𝑞conditionalsuperscript𝐱𝐱q(\mathbf{x}^{\prime}|\mathbf{x})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) and the acceptance probability A(𝐱|𝐱)𝐴conditionalsuperscript𝐱𝐱A(\mathbf{x}^{\prime}|\mathbf{x})italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) are given in (14) and (18), respectively. The proposal probability q(𝐱|𝐱)𝑞conditionalsuperscript𝐱𝐱q(\mathbf{x}^{\prime}|\mathbf{x})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) satisfies 0<q(𝐱|𝐱)<10𝑞conditionalsuperscript𝐱𝐱10<q(\mathbf{x}^{\prime}|\mathbf{x})<10 < italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) < 1 as long as the step size α>0𝛼0\alpha>0italic_α > 0, given that [f(𝐱)]nsubscriptdelimited-[]𝑓𝐱𝑛\left[\nabla f(\mathbf{x})\right]_{n}[ ∇ italic_f ( bold_x ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT and (xnxn)subscriptsuperscript𝑥𝑛subscript𝑥𝑛(x^{\prime}_{n}-x_{n})( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT ) in (16) are bounded and the proposal probability is normalized. Moreover, based on (3) and (18), the acceptance probability can be written as

A(𝐱\displaystyle A(\mathbf{x}^{\prime}italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT |𝐱)\displaystyle|\mathbf{x})| bold_x )
=\displaystyle== min{1,exp(f(𝐱)f(𝐱))q(𝐱|𝐱)q(𝐱|𝐱)}1𝑓superscript𝐱𝑓𝐱𝑞conditional𝐱superscript𝐱𝑞conditionalsuperscript𝐱𝐱\displaystyle\min\left\{1,\exp\big{(}f(\mathbf{x}^{\prime})-f(\mathbf{x})\big{% )}\frac{q(\mathbf{x}|\mathbf{x}^{\prime})}{q(\mathbf{x}^{\prime}|\mathbf{x})}\right\}roman_min { 1 , roman_exp ( italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) - italic_f ( bold_x ) ) divide start_ARG italic_q ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) end_ARG }
=\displaystyle== min{1,exp(𝐲𝐇𝐱2𝐲𝐇𝐱2σ2)q(𝐱|𝐱)q(𝐱|𝐱)},1superscriptnorm𝐲𝐇𝐱2superscriptnorm𝐲superscript𝐇𝐱2superscript𝜎2𝑞conditional𝐱superscript𝐱𝑞conditionalsuperscript𝐱𝐱\displaystyle\min\left\{1,\exp\left(\frac{\|\mathbf{y}-\mathbf{Hx}\|^{2}-\|% \mathbf{y}-\mathbf{Hx}^{\prime}\|^{2}}{\sigma^{2}}\right)\frac{q(\mathbf{x}|% \mathbf{x}^{\prime})}{q(\mathbf{x}^{\prime}|\mathbf{x})}\right\},roman_min { 1 , roman_exp ( divide start_ARG ∥ bold_y - bold_Hx ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - ∥ bold_y - bold_Hx start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ) divide start_ARG italic_q ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) end_ARG } , (41)

which remains nonzero as long as σ2>0superscript𝜎20\sigma^{2}>0italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0. This ensures that

P(𝐱(t+1)=𝐱|𝐱(t)=𝐱)>0,𝐱,𝐱𝒜N×1,𝐱𝐱,formulae-sequence𝑃superscript𝐱𝑡1conditionalsuperscript𝐱superscript𝐱𝑡𝐱0for-all𝐱formulae-sequencesuperscript𝐱superscript𝒜𝑁1superscript𝐱𝐱P\big{(}\mathbf{x}^{(t+1)}=\mathbf{x}^{\prime}|\mathbf{x}^{(t)}=\mathbf{x}\big% {)}>0,\;\forall\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1},% \mathbf{x}^{\prime}\neq\mathbf{x},italic_P ( bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = bold_x ) > 0 , ∀ bold_x , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ bold_x , (42)

thereby confirming the chain’s irreducibility by definition [50].

On the other hand, when 𝐱=𝐱superscript𝐱𝐱\mathbf{x}^{\prime}=\mathbf{x}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = bold_x, according to (24) we have

P(𝐱(t+1)=𝐱|𝐱(t)=𝐱)𝑃superscript𝐱𝑡1conditional𝐱superscript𝐱𝑡𝐱\displaystyle P\big{(}\mathbf{x}^{(t+1)}=\mathbf{x}|\mathbf{x}^{(t)}=\mathbf{x% }\big{)}italic_P ( bold_x start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = bold_x | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = bold_x )
=q(𝐱|𝐱)+𝐳𝐱q(𝐳|𝐱)(1A(𝐳|𝐱))q(𝐱|𝐱)>0,absent𝑞conditional𝐱𝐱subscript𝐳𝐱𝑞conditional𝐳𝐱1𝐴conditional𝐳𝐱𝑞conditional𝐱𝐱0\displaystyle=q(\mathbf{x}|\mathbf{x})+\sum_{\mathbf{z}\neq\mathbf{x}}q(% \mathbf{z}|\mathbf{x})\big{(}1-A(\mathbf{z}|\mathbf{x})\big{)}\geq q(\mathbf{x% }|\mathbf{x})>0,= italic_q ( bold_x | bold_x ) + ∑ start_POSTSUBSCRIPT bold_z ≠ bold_x end_POSTSUBSCRIPT italic_q ( bold_z | bold_x ) ( 1 - italic_A ( bold_z | bold_x ) ) ≥ italic_q ( bold_x | bold_x ) > 0 , (43)

where the first equation is due to the fact that the chain stays in state 𝐱𝐱\mathbf{x}bold_x either when the proposal is 𝐱𝐱\mathbf{x}bold_x, or when the proposal is 𝐳𝐱𝐳𝐱\mathbf{z}\neq\mathbf{x}bold_z ≠ bold_x but is rejected. The first inequality holds because 0<q(𝐳|𝐱)<10𝑞conditional𝐳𝐱10<q(\mathbf{z}|\mathbf{x})<10 < italic_q ( bold_z | bold_x ) < 1 based on the preceding analysis, and 0<A(𝐳|𝐱)10𝐴conditional𝐳𝐱10<A(\mathbf{z}|\mathbf{x})\leq 10 < italic_A ( bold_z | bold_x ) ≤ 1 according to (41). The second inequality holds for the same reason as q(𝐱|𝐱)>0(𝐱𝐱)𝑞conditionalsuperscript𝐱𝐱0superscript𝐱𝐱q(\mathbf{x}^{\prime}|\mathbf{x})>0\;(\mathbf{x}^{\prime}\neq\mathbf{x})italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) > 0 ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ bold_x ). Leveraging (43), we deduce that

gcd{m:P(𝐱(t+m)=𝐱|𝐱(t)=𝐱)>0}=1gcdconditional-set𝑚𝑃superscript𝐱𝑡𝑚conditional𝐱superscript𝐱𝑡𝐱01\text{gcd}\left\{m:P\big{(}\mathbf{x}^{(t+m)}=\mathbf{x}|\mathbf{x}^{(t)}=% \mathbf{x}\big{)}>0\right\}=1gcd { italic_m : italic_P ( bold_x start_POSTSUPERSCRIPT ( italic_t + italic_m ) end_POSTSUPERSCRIPT = bold_x | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = bold_x ) > 0 } = 1 (44)

for any positive integer m>0𝑚0m>0italic_m > 0, where “gcd” denotes the greatest common divisor. This implies that the chain is aperiodic by definition [50].

We then show that the chain induced by DMALA is reversible with respect to the target posterior distribution π𝜋\piitalic_π, i.e., given two states 𝐱,𝐱𝒜N×1𝐱superscript𝐱superscript𝒜𝑁1\mathbf{x},\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1}bold_x , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT, the following equation (also called detailed balance condition [7]) holds:

π(𝐱)P(𝐱|𝐱)=π(𝐱)P(𝐱|𝐱),𝜋𝐱𝑃conditionalsuperscript𝐱𝐱𝜋superscript𝐱𝑃conditional𝐱superscript𝐱\pi(\mathbf{x})P(\mathbf{x}^{\prime}|\mathbf{x})=\pi(\mathbf{x}^{\prime})P(% \mathbf{x}|\mathbf{x}^{\prime}),italic_π ( bold_x ) italic_P ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) = italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_P ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , (45)

which is a sufficient condition for guaranteeing that π𝜋\piitalic_π is a stationary distribution of the Markov chain. Since (45) naturally holds when 𝐱=𝐱superscript𝐱𝐱\mathbf{x}^{\prime}=\mathbf{x}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = bold_x, we consider the case where 𝐱𝐱superscript𝐱𝐱\mathbf{x}^{\prime}\neq\mathbf{x}bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≠ bold_x. In this case, we have

π(𝐱)P(𝐱|𝐱)𝜋𝐱𝑃conditionalsuperscript𝐱𝐱\displaystyle\pi(\mathbf{x})P(\mathbf{x}^{\prime}|\mathbf{x})italic_π ( bold_x ) italic_P ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) =π(𝐱)q(𝐱|𝐱)A(𝐱|𝐱)absent𝜋𝐱𝑞conditionalsuperscript𝐱𝐱𝐴conditionalsuperscript𝐱𝐱\displaystyle=\pi(\mathbf{x})q(\mathbf{x}^{\prime}|\mathbf{x})A(\mathbf{x}^{% \prime}|\mathbf{x})= italic_π ( bold_x ) italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) italic_A ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x )
=min{π(𝐱)q(𝐱|𝐱),π(𝐱)q(𝐱|𝐱)},absent𝜋𝐱𝑞conditionalsuperscript𝐱𝐱𝜋superscript𝐱𝑞conditional𝐱superscript𝐱\displaystyle=\min\left\{\pi(\mathbf{x})q(\mathbf{x}^{\prime}|\mathbf{x}),{\pi% (\mathbf{x}^{\prime})q(\mathbf{x}|\mathbf{x}^{\prime})}\right\},= roman_min { italic_π ( bold_x ) italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) , italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_q ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) } , (46)

and by virtue of the same rationale,

π(𝐱)P(𝐱|𝐱)𝜋superscript𝐱𝑃conditional𝐱superscript𝐱\displaystyle\pi(\mathbf{x}^{\prime})P(\mathbf{x}|\mathbf{x}^{\prime})italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_P ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) =π(𝐱)q(𝐱|𝐱)A(𝐱|𝐱)absent𝜋superscript𝐱𝑞conditional𝐱superscript𝐱𝐴conditional𝐱superscript𝐱\displaystyle=\pi(\mathbf{x}^{\prime})q(\mathbf{x}|\mathbf{x}^{\prime})A(% \mathbf{x}|\mathbf{x}^{\prime})= italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_q ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_A ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT )
=min{π(𝐱)q(𝐱|𝐱),π(𝐱)q(𝐱|𝐱)}.absent𝜋superscript𝐱𝑞conditional𝐱superscript𝐱𝜋𝐱𝑞conditionalsuperscript𝐱𝐱\displaystyle=\min\left\{{\pi(\mathbf{x}^{\prime})q(\mathbf{x}|\mathbf{x}^{% \prime})},\pi(\mathbf{x})q(\mathbf{x}^{\prime}|\mathbf{x})\right\}.= roman_min { italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_q ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , italic_π ( bold_x ) italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) } . (47)

Hence, we justify that (45) holds. With this equation, we have

𝐱𝒜N×1π(𝐱)P(𝐱|𝐱)subscriptsuperscript𝐱superscript𝒜𝑁1𝜋superscript𝐱𝑃conditional𝐱superscript𝐱\displaystyle\sum_{\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1}}\pi(\mathbf{x% }^{\prime})P(\mathbf{x}|\mathbf{x}^{\prime})∑ start_POSTSUBSCRIPT bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) italic_P ( bold_x | bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) =𝐱𝒜N×1π(𝐱)P(𝐱|𝐱)absentsubscriptsuperscript𝐱superscript𝒜𝑁1𝜋𝐱𝑃conditionalsuperscript𝐱𝐱\displaystyle=\sum_{\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1}}\pi(\mathbf{% x})P(\mathbf{x}^{\prime}|\mathbf{x})= ∑ start_POSTSUBSCRIPT bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_π ( bold_x ) italic_P ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x )
=π(𝐱)𝐱𝒜N×1P(𝐱|𝐱)=π(𝐱),absent𝜋𝐱subscriptsuperscript𝐱superscript𝒜𝑁1𝑃conditionalsuperscript𝐱𝐱𝜋𝐱\displaystyle=\pi(\mathbf{x})\sum_{\mathbf{x}^{\prime}\in\mathcal{A}^{N\times 1% }}P(\mathbf{x}^{\prime}|\mathbf{x})=\pi(\mathbf{x}),= italic_π ( bold_x ) ∑ start_POSTSUBSCRIPT bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_P ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x ) = italic_π ( bold_x ) , (48)

completing the proof of π𝜋\piitalic_π being a stationary distribution of the chain using DMALA. Furthermore, since the chain is irreducible, this stationary distribution is unique [30, Corollary 1.17], completing the proof of Lemma 1.

Appendix B Proof of Lemma 2

The proposal distribution of DMALA can be expressed as

q𝑞\displaystyle qitalic_q (𝐱|𝐱(t))conditionalsuperscript𝐱superscript𝐱𝑡\displaystyle(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT )
=exp(12α𝐱𝐱(t)α2f(𝐱(t))2)n=1Nxn𝒜exp(12α|xnxn(t)α2[f(𝐱(t))]n|2).absent12𝛼superscriptnormsuperscript𝐱superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡2superscriptsubscriptproduct𝑛1𝑁subscriptsuperscriptsubscript𝑥𝑛𝒜12𝛼superscriptsuperscriptsubscript𝑥𝑛superscriptsubscript𝑥𝑛𝑡𝛼2subscriptdelimited-[]𝑓superscript𝐱𝑡𝑛2\displaystyle=\frac{\exp\big{(}-\frac{1}{2\alpha}\|\mathbf{x}^{\prime}-\mathbf% {x}^{(t)}-\frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)})\|^{2}\big{)}}{\prod_{n=1}% ^{N}\sum_{x_{n}^{\prime}\in\mathcal{A}}\exp\big{(}-\frac{1}{2\alpha}|x_{n}^{% \prime}-x_{n}^{(t)}-\frac{\alpha}{2}[\nabla f(\mathbf{x}^{(t)})]_{n}|^{2}\big{% )}}.= divide start_ARG roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG ∥ bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG [ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG . (49)

Combining this equation with the expression of the target posterior distribution, we have

q(𝐱|𝐱(t))π(𝐱)𝑞conditionalsuperscript𝐱superscript𝐱𝑡𝜋superscript𝐱\displaystyle\frac{q(\mathbf{x}^{\prime}|\mathbf{x}^{(t)})}{\pi(\mathbf{x}^{% \prime})}divide start_ARG italic_q ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_π ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG
=\displaystyle=\;= exp(12α𝐱𝐱(t)α2f(𝐱(t))2)n=1Nxn𝒜exp(12α|xnxn(t)α2[f(𝐱(t))]n|2)12𝛼superscriptnormsuperscript𝐱superscript𝐱𝑡𝛼2𝑓superscript𝐱𝑡2superscriptsubscriptproduct𝑛1𝑁subscriptsuperscriptsubscript𝑥𝑛𝒜12𝛼superscriptsuperscriptsubscript𝑥𝑛superscriptsubscript𝑥𝑛𝑡𝛼2subscriptdelimited-[]𝑓superscript𝐱𝑡𝑛2\displaystyle\frac{\exp\big{(}-\frac{1}{2\alpha}\|\mathbf{x}^{\prime}-\mathbf{% x}^{(t)}-\frac{\alpha}{2}\nabla f(\mathbf{x}^{(t)})\|^{2}\big{)}}{\prod_{n=1}^% {N}\sum_{x_{n}^{\prime}\in\mathcal{A}}\exp\big{(}-\frac{1}{2\alpha}|x_{n}^{% \prime}-x_{n}^{(t)}-\frac{\alpha}{2}[\nabla f(\mathbf{x}^{(t)})]_{n}|^{2}\big{% )}}divide start_ARG roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG ∥ bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ∥ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG [ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG
𝐬𝒜N×1exp(f(𝐬))expf(𝐱)absentsubscript𝐬superscript𝒜𝑁1𝑓𝐬𝑓superscript𝐱\displaystyle\cdot\frac{\sum_{\mathbf{s}\in\mathcal{A}^{N\times 1}}\exp\big{(}% f(\mathbf{s})\big{)}}{\exp f(\mathbf{x}^{\prime})}⋅ divide start_ARG ∑ start_POSTSUBSCRIPT bold_s ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_s ) ) end_ARG start_ARG roman_exp italic_f ( bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG
\displaystyle\geq\; 𝐬𝒜N×1exp(f(𝐬))n=1Nxnexp(12α|xnxn(t)α2[f(𝐱(t))]n|2)subscript𝐬superscript𝒜𝑁1𝑓𝐬superscriptsubscriptproduct𝑛1𝑁subscriptsuperscriptsubscript𝑥𝑛12𝛼superscriptsuperscriptsubscript𝑥𝑛superscriptsubscript𝑥𝑛𝑡𝛼2subscriptdelimited-[]𝑓superscript𝐱𝑡𝑛2\displaystyle\frac{\sum_{\mathbf{s}\in\mathcal{A}^{N\times 1}}\exp\big{(}f(% \mathbf{s})\big{)}}{\prod_{n=1}^{N}\sum_{x_{n}^{\prime}\in\mathbb{Z}}\exp\big{% (}-\frac{1}{2\alpha}|x_{n}^{\prime}-x_{n}^{(t)}-\frac{\alpha}{2}[\nabla f(% \mathbf{x}^{(t)})]_{n}|^{2}\big{)}}divide start_ARG ∑ start_POSTSUBSCRIPT bold_s ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_s ) ) end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_Z end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG [ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG
G(𝐱(t),𝐱)absent𝐺superscript𝐱𝑡superscript𝐱\displaystyle\cdot G(\mathbf{x}^{(t)},\mathbf{x}^{\prime})⋅ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT )
\displaystyle\geq\; 𝐬𝒜N×1exp(f(𝐬))n=1Nxnexp(12α|xn|2)G(𝐱(t),𝐱)subscript𝐬superscript𝒜𝑁1𝑓𝐬superscriptsubscriptproduct𝑛1𝑁subscriptsuperscriptsubscript𝑥𝑛12𝛼superscriptsuperscriptsubscript𝑥𝑛2𝐺superscript𝐱𝑡superscript𝐱\displaystyle\frac{\sum_{\mathbf{s}\in\mathcal{A}^{N\times 1}}\exp\big{(}f(% \mathbf{s})\big{)}}{\prod_{n=1}^{N}\sum_{x_{n}^{\prime}\in\mathbb{Z}}\exp\big{% (}-\frac{1}{2\alpha}|x_{n}^{\prime}|^{2}\big{)}}\cdot G(\mathbf{x}^{(t)},% \mathbf{x}^{\prime})divide start_ARG ∑ start_POSTSUBSCRIPT bold_s ∈ caligraphic_A start_POSTSUPERSCRIPT italic_N × 1 end_POSTSUPERSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_f ( bold_s ) ) end_ARG start_ARG ∏ start_POSTSUBSCRIPT italic_n = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_Z end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG ⋅ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT )
=\displaystyle=\;= ξG(𝐱(t),𝐱),𝜉𝐺superscript𝐱𝑡superscript𝐱\displaystyle\xi\cdot G(\mathbf{x}^{(t)},\mathbf{x}^{\prime}),italic_ξ ⋅ italic_G ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT , bold_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) , (50)

where the two inequalities follow the fact from the lattice theory [51]:

xn𝒜exp(12α|xnxn(t)α2[f(𝐱(t))]n|2)subscriptsuperscriptsubscript𝑥𝑛𝒜12𝛼superscriptsuperscriptsubscript𝑥𝑛superscriptsubscript𝑥𝑛𝑡𝛼2subscriptdelimited-[]𝑓superscript𝐱𝑡𝑛2\displaystyle\sum_{x_{n}^{\prime}\in\mathcal{A}}\exp\bigg{(}-\frac{1}{2\alpha}% |x_{n}^{\prime}-x_{n}^{(t)}-\frac{\alpha}{2}[\nabla f(\mathbf{x}^{(t)})]_{n}|^% {2}\bigg{)}∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ caligraphic_A end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG [ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )
\displaystyle\leq xnexp(12α|xnxn(t)α2[f(𝐱(t))]n|2)subscriptsuperscriptsubscript𝑥𝑛12𝛼superscriptsuperscriptsubscript𝑥𝑛superscriptsubscript𝑥𝑛𝑡𝛼2subscriptdelimited-[]𝑓superscript𝐱𝑡𝑛2\displaystyle\sum_{x_{n}^{\prime}\in\mathbb{Z}}\exp\bigg{(}-\frac{1}{2\alpha}|% x_{n}^{\prime}-x_{n}^{(t)}-\frac{\alpha}{2}[\nabla f(\mathbf{x}^{(t)})]_{n}|^{% 2}\bigg{)}∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_Z end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT - italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - divide start_ARG italic_α end_ARG start_ARG 2 end_ARG [ ∇ italic_f ( bold_x start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ) ] start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT )
\displaystyle\leq xnexp(12α|xn|2),subscriptsuperscriptsubscript𝑥𝑛12𝛼superscriptsuperscriptsubscript𝑥𝑛2\displaystyle\sum_{x_{n}^{\prime}\in\mathbb{Z}}\exp\bigg{(}-\frac{1}{2\alpha}|% x_{n}^{\prime}|^{2}\bigg{)},∑ start_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ blackboard_Z end_POSTSUBSCRIPT roman_exp ( - divide start_ARG 1 end_ARG start_ARG 2 italic_α end_ARG | italic_x start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) , (51)

completing the proof of Lemma 2.

Appendix C Derivation of (36)

According to the IS-based Monte Carlo summation in (7), by treating p(bk=±1|𝐲)𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲p(b_{k}=\pm 1|\mathbf{y})italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y ) as the expectation 𝔼p(𝐛k|𝐲)[p(bk=±1|𝐲,𝐛k)]subscript𝔼𝑝conditionalsubscript𝐛𝑘𝐲delimited-[]𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲subscript𝐛𝑘\mathbb{E}_{p(\mathbf{b}_{-k}|\mathbf{y})}[p(b_{k}=\pm 1|\mathbf{y},\mathbf{b}% _{-k})]blackboard_E start_POSTSUBSCRIPT italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT | bold_y ) end_POSTSUBSCRIPT [ italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT ) ], these APPs can be approximated by

1Ss=1Sp(bk=±1|𝐲,𝐛k[s])p(𝐛k[s]|𝐲)πa(𝐛k[s]),1𝑆superscriptsubscript𝑠1𝑆𝑝subscript𝑏𝑘plus-or-minusconditional1𝐲superscriptsubscript𝐛𝑘delimited-[]𝑠𝑝conditionalsuperscriptsubscript𝐛𝑘delimited-[]𝑠𝐲subscript𝜋asuperscriptsubscript𝐛𝑘delimited-[]𝑠\frac{1}{S}\sum_{s=1}^{S}p(b_{k}=\pm 1|\mathbf{y},\mathbf{b}_{-k}^{[s]})\frac{% p(\mathbf{b}_{-k}^{[s]}|\mathbf{y})}{\pi_{\rm a}(\mathbf{b}_{-k}^{[s]})},divide start_ARG 1 end_ARG start_ARG italic_S end_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) divide start_ARG italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT | bold_y ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG , (52)

where 𝐛k[s]superscriptsubscript𝐛𝑘delimited-[]𝑠\mathbf{b}_{-k}^{[s]}bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT is obtained by dropping the k𝑘kitalic_k-th bit after demapping the sample 𝐱[s]superscript𝐱delimited-[]𝑠\mathbf{x}^{[s]}bold_x start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT drawn from πasubscript𝜋a\pi_{\rm a}italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT. Note that herein we use the argument 𝐛𝐛\mathbf{b}bold_b for πasubscript𝜋a\pi_{\rm a}italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT to simplify the derivation of the LLR expression. Consequently, the posterior LLR of the k𝑘kitalic_k-th bit can be approximated as

L^ksubscript^𝐿𝑘\displaystyle\hat{L}_{k}over^ start_ARG italic_L end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT =logs=1Sp(bk=+1|𝐲,𝐛k[s])p(𝐛k[s]|𝐲)πa(𝐛k[s])s=1Sp(bk=1|𝐲,𝐛k[s])p(𝐛k[s]|𝐲)πa(𝐛k[s])absentsuperscriptsubscript𝑠1𝑆𝑝subscript𝑏𝑘conditional1𝐲superscriptsubscript𝐛𝑘delimited-[]𝑠𝑝conditionalsuperscriptsubscript𝐛𝑘delimited-[]𝑠𝐲subscript𝜋asuperscriptsubscript𝐛𝑘delimited-[]𝑠superscriptsubscript𝑠1𝑆𝑝subscript𝑏𝑘conditional1𝐲superscriptsubscript𝐛𝑘delimited-[]𝑠𝑝conditionalsuperscriptsubscript𝐛𝑘delimited-[]𝑠𝐲subscript𝜋asuperscriptsubscript𝐛𝑘delimited-[]𝑠\displaystyle=\log\frac{\sum_{s=1}^{S}p(b_{k}=+1|\mathbf{y},\mathbf{b}_{-k}^{[% s]})\frac{p(\mathbf{b}_{-k}^{[s]}|\mathbf{y})}{\pi_{\rm a}(\mathbf{b}_{-k}^{[s% ]})}}{\sum_{s=1}^{S}p(b_{k}=-1|\mathbf{y},\mathbf{b}_{-k}^{[s]})\frac{p(% \mathbf{b}_{-k}^{[s]}|\mathbf{y})}{\pi_{\rm a}(\mathbf{b}_{-k}^{[s]})}}= roman_log divide start_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) divide start_ARG italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT | bold_y ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 | bold_y , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) divide start_ARG italic_p ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT | bold_y ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG end_ARG
=logs=1Sπa(bk=+1|𝐛k[s])p(bk=+1,𝐛k[s]|𝐲)πa(bk=+1,𝐛k[s])s=1Sπa(bk=1|𝐛k[s])p(bk=1,𝐛k[s]|𝐲)πa(bk=1,𝐛k[s]),absentsuperscriptsubscript𝑠1𝑆subscript𝜋asubscript𝑏𝑘conditional1superscriptsubscript𝐛𝑘delimited-[]𝑠𝑝subscript𝑏𝑘1conditionalsuperscriptsubscript𝐛𝑘delimited-[]𝑠𝐲subscript𝜋asubscript𝑏𝑘1superscriptsubscript𝐛𝑘delimited-[]𝑠superscriptsubscript𝑠1𝑆subscript𝜋asubscript𝑏𝑘conditional1superscriptsubscript𝐛𝑘delimited-[]𝑠𝑝subscript𝑏𝑘1conditionalsuperscriptsubscript𝐛𝑘delimited-[]𝑠𝐲subscript𝜋asubscript𝑏𝑘1superscriptsubscript𝐛𝑘delimited-[]𝑠\displaystyle=\log\frac{\sum_{s=1}^{S}\pi_{\rm a}(b_{k}=+1|\mathbf{b}_{-k}^{[s% ]})\frac{p(b_{k}=+1,\mathbf{b}_{-k}^{[s]}|\mathbf{y})}{\pi_{\rm a}(b_{k}=+1,% \mathbf{b}_{-k}^{[s]})}}{\sum_{s=1}^{S}\pi_{\rm a}(b_{k}=-1|\mathbf{b}_{-k}^{[% s]})\frac{p(b_{k}=-1,\mathbf{b}_{-k}^{[s]}|\mathbf{y})}{\pi_{\rm a}(b_{k}=-1,% \mathbf{b}_{-k}^{[s]})}},= roman_log divide start_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 | bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) divide start_ARG italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT | bold_y ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_s = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_S end_POSTSUPERSCRIPT italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 | bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) divide start_ARG italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT | bold_y ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG end_ARG , (53)

where the second equality follows from the Bayes rule p(a1,a2)=p(a2|a1)p(a1)𝑝subscript𝑎1subscript𝑎2𝑝conditionalsubscript𝑎2subscript𝑎1𝑝subscript𝑎1p(a_{1},a_{2})=p(a_{2}|a_{1})p(a_{1})italic_p ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = italic_p ( italic_a start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT | italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) italic_p ( italic_a start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ). For the conditional probability πa(bk=±1|𝐛k[s])subscript𝜋asubscript𝑏𝑘plus-or-minusconditional1superscriptsubscript𝐛𝑘delimited-[]𝑠\pi_{\rm a}(b_{k}=\pm 1|\mathbf{b}_{-k}^{[s]})italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ), to avoid numerical instability, we consider a log-domain implementation by defining

γk[s]superscriptsubscript𝛾𝑘delimited-[]𝑠\displaystyle\gamma_{k}^{[s]}italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT =logπa(bk=+1|𝐛k[s])πa(bk=1|𝐛k[s])=logπa(bk=+1,𝐛k[s])πa(bk=1,𝐛k[s])absentsubscript𝜋asubscript𝑏𝑘conditional1superscriptsubscript𝐛𝑘delimited-[]𝑠subscript𝜋asubscript𝑏𝑘conditional1superscriptsubscript𝐛𝑘delimited-[]𝑠subscript𝜋asubscript𝑏𝑘1superscriptsubscript𝐛𝑘delimited-[]𝑠subscript𝜋asubscript𝑏𝑘1superscriptsubscript𝐛𝑘delimited-[]𝑠\displaystyle=\log\frac{\pi_{\rm a}(b_{k}=+1|\mathbf{b}_{-k}^{[s]})}{\pi_{\rm a% }(b_{k}=-1|\mathbf{b}_{-k}^{[s]})}=\log\frac{\pi_{\rm a}(b_{k}=+1,\mathbf{b}_{% -k}^{[s]})}{\pi_{\rm a}(b_{k}=-1,\mathbf{b}_{-k}^{[s]})}= roman_log divide start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 | bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 | bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG = roman_log divide start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = + 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = - 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG
=1τ(f(𝐱+1[s])f(𝐱1[s])).absent1𝜏𝑓superscriptsubscript𝐱1delimited-[]𝑠𝑓superscriptsubscript𝐱1delimited-[]𝑠\displaystyle=\frac{1}{\tau}\left(f(\mathbf{x}_{+1}^{[s]})-f(\mathbf{x}_{-1}^{% [s]})\right).= divide start_ARG 1 end_ARG start_ARG italic_τ end_ARG ( italic_f ( bold_x start_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) - italic_f ( bold_x start_POSTSUBSCRIPT - 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) . (54)

Therefore, we have

πa(bk=±1|𝐛k[s])=11+exp(γk[s]).subscript𝜋asubscript𝑏𝑘plus-or-minusconditional1superscriptsubscript𝐛𝑘delimited-[]𝑠11minus-or-plussuperscriptsubscript𝛾𝑘delimited-[]𝑠\pi_{\rm a}(b_{k}=\pm 1|\mathbf{b}_{-k}^{[s]})=\frac{1}{1+\exp(\mp\gamma_{k}^{% [s]})}.italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 | bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) = divide start_ARG 1 end_ARG start_ARG 1 + roman_exp ( ∓ italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG . (55)

Moreover, based on the definition of π𝜋\piitalic_π and πasubscript𝜋a\pi_{\rm a}italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT, we have

p(bk=±1,𝐛k[s]|𝐲)πa(bk=±1,𝐛k[s])π(𝐱±1[s])π(𝐱±1[s])1/τ=exp(τ1τf(𝐱±1[s])).proportional-to𝑝subscript𝑏𝑘plus-or-minus1conditionalsuperscriptsubscript𝐛𝑘delimited-[]𝑠𝐲subscript𝜋asubscript𝑏𝑘plus-or-minus1superscriptsubscript𝐛𝑘delimited-[]𝑠𝜋superscriptsubscript𝐱plus-or-minus1delimited-[]𝑠𝜋superscriptsuperscriptsubscript𝐱plus-or-minus1delimited-[]𝑠1𝜏𝜏1𝜏𝑓superscriptsubscript𝐱plus-or-minus1delimited-[]𝑠\frac{p(b_{k}=\pm 1,\mathbf{b}_{-k}^{[s]}|\mathbf{y})}{\pi_{\rm a}(b_{k}=\pm 1% ,\mathbf{b}_{-k}^{[s]})}\propto\frac{\pi(\mathbf{x}_{\pm 1}^{[s]})}{\pi(% \mathbf{x}_{\pm 1}^{[s]})^{1/\tau}}=\exp\left(\frac{\tau-1}{\tau}f(\mathbf{x}_% {\pm 1}^{[s]})\right).divide start_ARG italic_p ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT | bold_y ) end_ARG start_ARG italic_π start_POSTSUBSCRIPT roman_a end_POSTSUBSCRIPT ( italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ± 1 , bold_b start_POSTSUBSCRIPT - italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG ∝ divide start_ARG italic_π ( bold_x start_POSTSUBSCRIPT ± 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) end_ARG start_ARG italic_π ( bold_x start_POSTSUBSCRIPT ± 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 1 / italic_τ end_POSTSUPERSCRIPT end_ARG = roman_exp ( divide start_ARG italic_τ - 1 end_ARG start_ARG italic_τ end_ARG italic_f ( bold_x start_POSTSUBSCRIPT ± 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT [ italic_s ] end_POSTSUPERSCRIPT ) ) . (56)

Substituting (55) and (56) into (53), we derive (36).

References

  • [1] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.
  • [2] E. Björnson, L. Sanguinetti, H. Wymeersch, J. Hoydis, and T. L. Marzetta, “Massive MIMO is a reality—What is next?: Five promising research directions for antenna arrays,” Digit. Signal Process., vol. 94, pp. 3–20, Nov. 2019.
  • [3] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637–1660, Aug. 2020.
  • [4] S. Yang and L. Hanzo, “Fifty years of MIMO detection: The road to large-scale MIMOs,” IEEE Commun. Surv. Tuts., vol. 17, no. 4, pp. 1941–1988, 2015.
  • [5] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.
  • [6] Z. Guo and P. Nilsson, “Algorithm and implementation of the K-best sphere decoding for MIMO detection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491–503, Mar. 2006.
  • [7] C. M. Bishop, Pattern Recognition and Machine Learning.   New York, NY, USA: Springer, 2006.
  • [8] S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo.    Boca Raton, FL, USA: Chapman & Hall, 2011.
  • [9] A. Doucet and X. Wang, “Monte Carlo methods for signal processing: A review in the statistical signal processing context,” IEEE Signal Process. Mag., vol. 22, no. 6, pp. 152–170, Nov. 2005.
  • [10] R. Chen, J. Liu, and X. Wang, “Convergence analyses and comparisons of Markov chain Monte Carlo algorithms in digital communications,” IEEE Trans. Signal Process., vol. 50, no. 2, pp. 255–270, Feb. 2002.
  • [11] S. A. Laraway and B. Farhang-Boroujeny, “Implementation of a Markov chain Monte Carlo based multiuser/MIMO detector,” IEEE Trans. Circuits and Syst. I: Reg. Papers, vol. 56, no. 1, pp. 246–255, Jan. 2009.
  • [12] B. Farhang-Boroujeny, H. Zhu, and Z. Shi, “Markov chain Monte Carlo algorithms for CDMA and MIMO communication systems,” IEEE Trans. Signal Process., vol. 54, no. 5, pp. 1896–1909, May 2006.
  • [13] J. C. Hedstrom, C. H. Yuen, R.-R. Chen, and B. Farhang-Boroujeny, “Achieving near MAP performance with an excited Markov chain Monte Carlo MIMO detector,” IEEE Trans. Wireless Commun., vol. 16, no. 12, pp. 7718–7732, Dec. 2017.
  • [14] W. Grathwohl, K. Swersky, M. Hashemi, D. Duvenaud, and C. Maddison, “Oops I took a gradient: Scalable sampling for discrete distributions,” in Proc. Int. Conf. Mach. Learn. (ICML), Jul. 2021, pp. 3831–3841.
  • [15] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, no. 6, pp. 721–741, Nov. 1984.
  • [16] Y.-A. Ma, Y. Chen, C. Jin, N. Flammarion, and M. I. Jordan, “Sampling can be faster than optimization,” Proc. Nat. Acad. Sci., vol. 116, no. 42, pp. 20 881–20 885, Sep. 2019.
  • [17] G. O. Roberts and R. L. Tweedie, “Exponential convergence of Langevin distributions and their discrete approximations,” Bernoulli, vol. 2, no. 4, pp. 341–363, 1996.
  • [18] G. O. Roberts and O. Stramer, “Langevin diffusions and Metropolis-Hastings algorithms,” Methodol. Comput. Appl. Probab., vol. 4, no. 4, pp. 337–357, Dec. 2002.
  • [19] R. Zhang, X. Liu, and Q. Liu, “A Langevin-like sampler for discrete distributions,” in Proc. Int. Conf. Mach. Learn. (ICML), Jun. 2022, pp. 26 375–26 396.
  • [20] B. Rhodes and M. Gutmann, “Enhanced gradient-based MCMC in discrete spaces,” Trans. Mach. Learn. Res., pp. 1–30, Oct. 2022.
  • [21] N. M. Gowda, S. Krishnamurthy, and A. Belogolovy, “Metropolis-Hastings random walk along the gradient descent direction for MIMO detection,” in Proc. IEEE Int. Conf. Commun. (ICC), Montreal, QC, Canada, Jun. 2021, pp. 1–7.
  • [22] X. Zhou, L. Liang, J. Zhang, C.-K. Wen, and S. Jin, “Gradient-based Markov chain Monte Carlo for MIMO detection,” IEEE Trans. Wireless Commun. vol. 23, no. 7, pp. 7566–7581, Jul. 2024.
  • [23] N. Zilberstein, C. Dick, R. Doost-Mohammady, A. Sabharwal, and S. Segarra, “Annealed Langevin dynamics for massive MIMO detection,” IEEE Trans. Wireless Commun., vol. 22, no. 6, pp. 3762–3776, Jun. 2023.
  • [24] Z. Wu and H. Li, “Stochastic gradient Langevin dynamics for massive MIMO detection,” IEEE Commun. Lett., vol. 26, no. 5, pp. 1062–1065, May 2022.
  • [25] D. G. Luenberger, Introduction to Linear and Nonlinear Programming.   Reading, MA, USA: Addison-Wesley, 1973.
  • [26] Y. Nesterov, “A method for solving the convex programming problem with convergence rate o (1/k^ 2),” in Dokl. Akad. Nauk SSSR, vol. 269, pp. 543–547, 1983.
  • [27] F. M. Kashif, H. Wymeersch, and M. Z. Win, “Monte Carlo equalization for nonlinear dispersive satellite channels,” IEEE J. Sel. Areas Commun., vol. 26, no. 2, pp. 245–255, Feb. 2008.
  • [28] M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,” in Proc. Int. Conf. Mach. Learn. (ICML), 2011, pp. 681-688.
  • [29] W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, Apr. 1970.
  • [30] D. A. Levin, Y. Peres, and E. L. Wilmer, Markov Chains and Mixing Times.   Providence, RI, USA: Amer. Math. Soc., 2008.
  • [31] I. Kontoyiannis and S. P. Meyn, “Geometric ergodicity and the spectral gap of non-reversible Markov chains,” Proc. Probab. Theory Related Fields, vol. 154, no. 1-2, pp. 327–339, 2012.
  • [32] Z. Wang, Y. Xia, C. Ling, and Y. Huang, “Randomized iterative sampling decoding algorithm for large-scale MIMO detection,” IEEE Trans. Signal Process., vol. 72, pp. 580–593, 2024.
  • [33] Z. Wang and C. Ling, “On the geometric ergodicity of Metropolis-Hastings algorithms for lattice Gaussian sampling,” IEEE Trans. Inf. Theory, vol. 64, no. 2, pp. 738–751, Feb. 2018.
  • [34] Z. Wang, S. Lyu, Y. Xia, and Q. Wu, “Expectation propagation-based sampling decoding: Enhancement and optimization,” IEEE Trans. Signal Process., vol. 69, pp. 195–209, 2021.
  • [35] M. Senst and G. Ascheid, “A Rao-Blackwellized Markov chain Monte Carlo algorithm for efficient MIMO detection,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2011, pp. 1–6.
  • [36] B. Hassibi, M. Hansen, A. G. Dimakis, H. A. J. Alshamary, and W. Xu, “Optimized Markov chain Monte Carlo for signal detection in MIMO systems: An analysis of the stationary distribution and mixing time,” IEEE Trans. Signal Process., vol. 62, no. 17, pp. 4436–4450, Sep. 2014.
  • [37] S. Loyka, “Channel capacity of MIMO architecture using the exponential correlation matrix,” IEEE Commun. Lett., vol. 5, no. 9, pp. 369–371, Sep. 2001.
  • [38] 3GPP TR 36.873, “Study on 3D channel model for LTE (Release 12),” Tech. Rep., Jun. 2017. [Online]. Available: https://www.3gpp.org/ftp/Specs/archive/36_series/36.873/.
  • [39] J. Céspedes, P. M. Olmos, M. Sánchez-Fernández, and F. Perez-Cruz, “Probabilistic MIMO symbol detection with expectation consistency approximate inference,” IEEE Trans. Veh. Technol., vol. 67, no. 4, pp. 3481–3494, Apr. 2018.
  • [40] X. Zhou, J. Zhang, C.-K. Wen, S. Jin, and S. Han, “Graph neural network-enhanced expectation propagation algorithm for MIMO turbo receivers,” IEEE Trans. Signal Process., vol. 71, pp. 3458–3473, 2023.
  • [41] S. Suyama, J. Shen, H. Suzuki, K. Fukawa, and Y. Okumura, “Evaluation of 30 Gbps super high bit rate mobile communications using channel data in 11 GHz band 24×\times×24 MIMO experiment,” in Proc. IEEE Int. Conf. Commun., Jun. 2014, pp. 5203–5208.
  • [42] P. Gröschel et al., “An ultra-versatile massive MIMO transceiver testbed for multi-Gb/s communication,” in Proc. IEEE 5G World Forum, Sep. 2019, pp. 1–6.
  • [43] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Model-driven deep learning for MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 1702–1715, Mar. 2020.
  • [44] Y. Wei, M.-M. Zhao, M. Hong, M.-J. Zhao, and M. Lei, “Learned conjugate gradient descent network for massive MIMO detection,” IEEE Trans. Signal Process., vol. 68, pp. 6336–6349, Nov. 2020.
  • [45] S. Jaeckel, L. Raschkowski, K. Börner, L. Thiele, F. Burkhardt, and E. Eberlein, “QuaDRiGa—Quasi deterministic radio channel generator user manual and documentation,” Fraunhofer Heinrich Hertz Inst., Berlin, Germany, Tech. Rep. V2.2.0, Aug. 2017.
  • [46] T. Weber, A. Sklavos, and M. Meurer, “Imperfect channel-state information in MIMO transmission,” IEEE Trans. Commun., vol 54, no. 3, pp. 543–552, Mar. 2006.
  • [47] A. Kosasih, V. Onasis, V. Miloslavskaya, W. Hardjawana, V. Andrean, and B. Vucetic, “Graph neural network aided MU-MIMO detectors,” IEEE J. Sel. Areas Commun., vol. 40, no. 9, pp. 2540–2555, Sep. 2022.
  • [48] Flozz/PYPAPI: Python binding for the PAPI (Performance Application Programming Interface) library. Accessed: Aug. 30, 2024. [Online]. Available: https://github.com/flozz/pypapi.
  • [49] C. Studer and H. Bolcskei, “Soft-input soft-output single tree-search sphere decoding,” IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 4827–4842, Oct. 2010.
  • [50] J. R. Norris, Markov Chains.   Cambridge, U.K.: Cambridge Univ. Press, 1997.
  • [51] D. Micciancio and O. Regev, “Worst-case to average-case reductions based on Gaussian measures,” in Proc. Ann. Symp. Found. Computer Science, Rome, Italy, Oct. 2004, pp. 372–381.