Einstein from Noise: Statistical Analysis
Abstract
“Einstein from noise” (EfN) is a prominent example of the model bias phenomenon: systematic errors in the statistical model that lead to erroneous but consistent estimates. In the EfN experiment, one falsely believes that a set of observations contains noisy, shifted copies of a template signal (e.g., an Einstein image), whereas in reality, it contains only pure noise observations. To estimate the signal, the observations are first aligned with the template using cross-correlation, and then averaged. Although the observations contain nothing but noise, it was recognized early on that this process produces a signal that resembles the template signal! This pitfall was at the heart of a central scientific controversy about validation techniques in structural biology.
This paper provides a comprehensive statistical analysis of the EfN phenomenon above. We show that the Fourier phases of the EfN estimator (namely, the average of the aligned noise observations) converge to the Fourier phases of the template signal, explaining the observed structural similarity. Additionally, we prove that the convergence rate is inversely proportional to the number of noise observations and, in the high-dimensional regime, to the Fourier magnitudes of the template signal. Moreover, in the high-dimensional regime, the Fourier magnitudes converge to a scaled version of the template signal’s Fourier magnitudes. This work not only deepens the theoretical understanding of the EfN phenomenon but also highlights potential pitfalls in template matching techniques and emphasizes the need for careful interpretation of noisy observations across disciplines in engineering, statistics, physics, and biology.
1 Introduction
Model bias is a fundamental pitfall arising across a broad range of statistical problems, leading to consistent but inaccurate estimations due to systematic errors in the model. This paper focuses on the Einstein from Noise (EfN) experiment: a prototype example of model bias that appears in template matching techniques. Consider a scenario where scientists acquire observational data and genuinely believe their observations contain noisy, shifted copies of a known template signal. However, in reality, their data consists of pure noise with no actual signal present. To estimate the (absent) signal, the scientists align each observation by cross-correlating it with the template and then average the aligned observations. Remarkably, empirical evidence has shown, multiple times, that the reconstructed structure from this process is structurally similar to the template, even when all the measurements are pure noise [18, 36, 38]. This phenomenon stands in striking contrast to the prediction of the unbiased model, that averaging pure noise signals would converge towards a signal of zeros, as the number of noisy observations diverges. Thus, the above EfN estimation procedure is biased towards the template signal.
Although widely recognized, the EfN phenomenon has not yet been analyzed theoretically. This article fills this gap by precisely characterizing the relationship between the reconstructed and the template signals. The authors of the original article presenting the EfN phenomenon chose an image of Einstein as the template signal, hence the name [36]. Accordingly, we refer to the average of the aligned pure noise signals as the EfN estimator. The problem is formulated in detail in Section 3, and is illustrated in Figure 1.

Main results.
The central results of this work are as follows. Our first result, stated in Theorem 4.1, shows that the Fourier phases of the EfN estimator converge to the Fourier phases of the template signal, as the number of noise observations, denoted by $M$, tends to infinity. We also show that the corresponding mean squared error (MSE) decays to zero at a rate of $1/M$. Since it is known that the Fourier phases are responsible for the formation of geometrical image elements, like contours and edges [30, 37], this clarifies why the resulting EfN estimator image exhibits a structural similarity to the template, but not necessarily a full recovery. Our second result, stated in Theorem 4.3, proves that in the high-dimensional regime, where the dimension of the signal diverges, the convergence rate of the Fourier phases is inversely proportional to the square of the Fourier magnitudes of the template signal. In this case, the Fourier magnitudes of the EfN estimator converge to a scaled version of the template’s Fourier magnitudes.
Organization.
The rest of this paper is organized as follows. The next section discusses the connection between the EfN problem and single-particle cryo-electron microscopy (cryo-EM)—the main motivation of this paper—and provides empirical demonstrations. Section 3 formulates the problem in detail. Our main results, Theorems 4.1 and 4.3, are presented in Section 4, and proved in Appendix B and Appendix C, respectively. Finally, our conclusions and outlook appear in Section 5.
2 Cryo-EM and Empirical Demonstration
Cryo-EM is a powerful tool of modern structural biology, offering advanced methods to visualize complex biological macromolecules with ever-increasing precision. One of its central advantages lies in its capability to resolve the structures of proteins that are hard to crystallize in traditional methods, especially, in a near-physiological environment, see e.g., [29, 40]. This advantage enables researchers to delve into the dynamic behaviors of proteins and their complexes, shedding light on fundamental biological processes.
Cryo-EM uses single-particle electron microscopy to reconstruct 3D structures from 2D tomographic projection images [6]. Typically, the 3D reconstruction involves two main steps: detecting and extracting single particle images using a particle picking algorithm [35, 17, 8, 16], and then reconstructing the 3D density map [34, 33]. Most detection algorithms use template-matching techniques, which can introduce bias if improper templates are chosen, especially in low signal-to-noise ratio (SNR) conditions, which is the standard scenario in cryo-EM.
The EfN controversy.
A publication of the 3D structure of an HIV molecule in PNAS in 2013 [27] initiated a fundamental controversy about validation techniques within the cryo-EM community, published as four follow-up PNAS publications [18, 46, 44, 26]. The EfN pitfall played a central role in this discussion. The primary question of the discussion was whether the collected datasets contained informative biological data or merely pure noise images. The core of the debate emphasized the importance of exercising caution and implementing cross-validation techniques when fitting data to a predefined model. This precautionary approach aims to mitigate the risk of erroneous fittings, which could ultimately lead to inaccuracies in 3D density map reconstruction. Model bias is still a fundamental problem in cryo-EM, as highlighted by an ongoing debate concerning validation tools, see for example, [43, 36, 19, 11, 12, 20, 21, 42].
Empirical demonstration.
The EfN phenomenon depends on several key parameters: (1) the number of observations, which we denote by $M$; (2) the dimension of the signal, denoted by $d$; and (3) the power spectral density (PSD) of the template signal. To demonstrate the dependency on these parameters and provide insight into our main results, Figures 2 and 3 show the convergence of the EfN estimator. Specifically, Figure 2 illustrates the behavior of the Fourier phases as a function of $M$. Figure 2(c) highlights that the convergence rate is proportional to $1/M$. It can be seen that the convergence rate is faster for higher spectral components. Figure 3 illustrates the impact of the PSD of the template signal on the cross-correlation between the template and the EfN estimator. Notably, a flatter PSD (i.e., a faster decay of the auto-correlation) leads to a higher correlation between the template and the estimator signals. These empirical results are proved theoretically in Theorems 4.1 and 4.3.
More applications.
The EfN phenomenon extends to various applications employing template matching, whether through a feature-based or direct template-based approach. For instance, template matching plays a significant role in computational anatomy, where it aids in discovering an unknown diffeomorphism to align a template image with a target image [10]. Other areas include medical image processing [1], manufacturing quality control [3], and navigation systems for mobile robots [22]. This pitfall may also arise in the feature-based approach, which relies on extracting image features like shapes, textures, and colors to match a target image by neural networks and deep-learning classifiers [50, 28, 45, 24].


3 Problem Formulation and Notation
This section outlines the probabilistic model behind the EfN experiment and delineates our main mathematical objectives. Although the EfN phenomenon is typically described for images, we will formulate and analyze it for one-dimensional signals, bearing in mind that the extension to two-dimensional images is straightforward (see Section 5 for more details).
Let $x \in \mathbb{R}^d$ denote the template signal (e.g., an Einstein image), and let $n_i \in \mathbb{R}^d$, for $i = 0, 1, \ldots, M-1$, be a set of independent and identically distributed (i.i.d.) $d$-dimensional Gaussian noise vectors. Here, $M$ denotes the number of observations, and without loss of generality, we assume that $d$ is even. To describe the EfN estimation process, we define the circular shift operator. Fix $\ell \in \mathbb{Z}$, and let $T_\ell : \mathbb{R}^d \to \mathbb{R}^d$ denote the operator that acts on $y \in \mathbb{R}^d$ as $[T_\ell y]_m = y_{(m - \ell) \bmod d}$, for $0 \le m \le d-1$. Throughout this paper, we assume that the template signal is normalized, i.e., $\|x\|_2 = 1$, where $\|\cdot\|_2$ is the Euclidean norm, and further assume that its Fourier transform is non-vanishing. The first assumption is used for convenience and does not alter (up to a normalization factor) our main results in Theorems 4.1 and 4.3. The second assumption is essential for the theoretical analysis of the EfN process and is expected to hold in many applications, including cryo-EM. A similar assumption is frequently made in related work, e.g., [5, 31, 7].
We are now in a position to define the EfN estimation process. First, for each noise observation $n_i$, we compute the maximal correlation shift,
$$\hat{R}_i \triangleq \arg\max_{0 \le \ell \le d-1} \langle n_i, T_\ell x \rangle, \tag{3.1}$$
where $T_\ell$ denotes the circular shift by $\ell$ entries. Then, the EfN estimator is given by the average of the noise observations, each first aligned according to the above maximal shifts, i.e.,
$$\hat{x} \triangleq \frac{1}{M} \sum_{i=0}^{M-1} T_{-\hat{R}_i} n_i. \tag{3.2}$$
The EfN phenomenon states that, at least empirically, $\hat{x}$ and $x$ appear “close” in some sense; our goal is to understand this phenomenon mathematically. To that end, we will consider the two asymptotic regimes where either $M \to \infty$ and $d$ is fixed, or $M, d \to \infty$. It should be noted that since the spectrum of $x$ is non-vanishing, $\hat{R}_i$ is unique almost surely.
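The estimation procedure described above can be sketched numerically. The following is a minimal NumPy illustration, not the authors' code: it assumes unit-variance Gaussian noise, a random unit-norm template, and the convention that a circular shift by $\ell$ entries is `np.roll(y, l)`. The function name `efn_estimator` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def efn_estimator(template, noise):
    """Align each noise row to the template by circular cross-correlation, then average.

    Convention assumed here: a shift by l entries is np.roll(y, l).
    """
    d = template.shape[0]
    X = np.fft.fft(template)
    N = np.fft.fft(noise, axis=1)
    # corr[i, l] = <noise[i], np.roll(template, l)> for every shift l, via the FFT.
    corr = np.fft.ifft(N * np.conj(X), axis=1).real
    shifts = np.argmax(corr, axis=1)              # maximal-correlation shift per row
    # Undo each shift: row i becomes np.roll(noise[i], -shifts[i]).
    idx = (np.arange(d)[None, :] + shifts[:, None]) % d
    aligned = np.take_along_axis(noise, idx, axis=1)
    return aligned.mean(axis=0), shifts

rng = np.random.default_rng(0)
d, M = 64, 5000                                   # illustrative sizes (assumed)
template = rng.standard_normal(d)
template /= np.linalg.norm(template)              # unit-norm template
noise = rng.standard_normal((M, d))               # pure noise, no signal present

x_hat, shifts = efn_estimator(template, noise)
naive = noise.mean(axis=0)                        # average without alignment

corr_efn = x_hat @ template / np.linalg.norm(x_hat)
corr_naive = naive @ template / np.linalg.norm(naive)
print(f"aligned average: {corr_efn:.3f}, plain average: {corr_naive:.3f}")
```

In this sketch the plain average of the noise has no systematic correlation with the template, whereas the aligned average does, which is the bias that the rest of the analysis quantifies.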
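As will become clear in the next sections, it is convenient to work in the Fourier domain. Let $\phi_z$ denote the phase of a complex number $z \in \mathbb{C}$, and recall that the discrete Fourier transform (DFT) of a $d$-length signal $y \in \mathbb{R}^d$ is given by,
$$Y[k] \triangleq \frac{1}{\sqrt{d}} \sum_{m=0}^{d-1} y_m \, e^{-\frac{2\pi j}{d} k m}, \tag{3.3}$$
where $j \triangleq \sqrt{-1}$, and $0 \le k \le d-1$. Accordingly, we let $X[k]$, $\hat{X}[k]$, and $N_i[k]$ denote the DFTs of $x$, $\hat{x}$, and $n_i$, respectively, for $0 \le i \le M-1$. These DFT sequences can be equivalently represented in the magnitude-phase domain as follows,
$$X[k] = |X[k]| \, e^{j \phi_X[k]}, \qquad \hat{X}[k] = |\hat{X}[k]| \, e^{j \phi_{\hat{X}}[k]}, \qquad N_i[k] = |N_i[k]| \, e^{j \phi_{N_i}[k]}, \tag{3.4}$$
for $0 \le k \le d-1$. Note that the random variables $\{|N_i[k]|\}$ and $\{\phi_{N_i}[k]\}$ are two independent sequences of i.i.d. random variables, such that $|N_i[k]|$ has a Rayleigh distribution, and the phase $\phi_{N_i}[k]$ is uniformly distributed over $[-\pi, \pi)$.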
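The distributional claims about the noise DFT can be checked empirically. The sketch below assumes unit-variance Gaussian noise and a unitary DFT (the `norm="ortho"` option of `np.fft.fft`); under these assumptions a complex-valued bin has $\mathbb{E}|N[k]|^2 = 1$, so the Rayleigh mean is $\sqrt{\pi}/2$. The bin index and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, M = 32, 20000                                  # illustrative sizes (assumed)
noise = rng.standard_normal((M, d))               # unit-variance Gaussian noise (assumed)
N = np.fft.fft(noise, axis=1, norm="ortho")       # unitary DFT, scaling 1/sqrt(d)

k = 3                                             # any bin with 0 < k < d/2 is complex-valued
mag = np.abs(N[:, k])
phase = np.angle(N[:, k])                         # values in (-pi, pi]

mean_mag = mag.mean()                             # Rayleigh mean, sqrt(pi)/2 under these assumptions
mean_sq = (mag ** 2).mean()                       # should be close to 1
mean_phase = phase.mean()                         # close to 0 for a uniform phase
mag_phase_corr = np.corrcoef(mag, phase)[0, 1]    # close to 0 if magnitude and phase are independent

print(mean_mag, np.sqrt(np.pi) / 2, mean_sq, mean_phase, mag_phase_corr)
```

The bins $k = 0$ and $k = d/2$ are real-valued for real signals, so they are excluded from this check.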
With the definitions above, we can express the estimation process in the Fourier domain. Since a shift in real-space corresponds to a linear phase shift in the Fourier space, it follows that,
$$\hat{X}[k] = \frac{1}{M} \sum_{i=0}^{M-1} |N_i[k]| \, e^{j \phi_{N_i}[k]} \, e^{\frac{2\pi j}{d} k \hat{R}_i}, \tag{3.5}$$
for $0 \le k \le d-1$. It is important to note that the location of the maximum correlation, i.e., $\hat{R}_i$, captures the dependency on the template signal, as well as the connections between the different spectral components.
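The equivalence between averaging aligned observations in real space and averaging phase-shifted DFT coefficients can be verified directly. This is a minimal sketch under the assumption that undoing a shift by $R$ is `np.roll(n, -R)`, which multiplies the $k$-th DFT coefficient by $e^{2\pi j k R / d}$; the variable names and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
d, M = 16, 50                                     # illustrative sizes (assumed)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
noise = rng.standard_normal((M, d))

X = np.fft.fft(x)
N = np.fft.fft(noise, axis=1)
# Maximal-correlation shift of each observation against the template.
R = np.argmax(np.fft.ifft(N * np.conj(X), axis=1).real, axis=1)

# Real-space route: undo each shift, then average, then transform.
aligned = np.stack([np.roll(n, -r) for n, r in zip(noise, R)])
X_hat_real_space = np.fft.fft(aligned.mean(axis=0))

# Fourier-space route: multiply each DFT by a linear phase ramp, then average.
k = np.arange(d)
ramp = np.exp(2j * np.pi * np.outer(R, k) / d)
X_hat_fourier = (N * ramp).mean(axis=0)

max_diff = np.max(np.abs(X_hat_real_space - X_hat_fourier))
print(max_diff)                                   # agreement up to floating-point error
```

Both routes use the same DFT normalization, so the comparison does not depend on the scaling convention.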
We mention that recent research has explored a related, but distinct, problem [47]. In this alternate problem formulation, rather than averaging over all shifted noisy signals, only the “most biased” members (those with the highest cross-correlation values with the template) are averaged. Finally, throughout the rest of this paper, we use $\xrightarrow{\mathcal{D}}$, $\xrightarrow{\mathcal{P}}$, $\xrightarrow{\text{a.s.}}$, and $\xrightarrow{\mathcal{L}^p}$, to denote the convergence of sequences of random variables in distribution, in probability, almost surely, and in $\mathcal{L}^p$ norm, respectively. We denote by $\mathbb{E}\,|\phi_{\hat{X}}[k] - \phi_X[k]|^2$ the MSE of the Fourier phases of the $k$-th spectral component.
4 Main Results
In this section, we present our main results. We begin by analyzing the regime where $M \to \infty$ and $d$ is fixed. We show that the Fourier phases of the EfN estimator converge to the Fourier phases of the template signal. We also characterize the convergence of the magnitudes. Then, we consider the high-dimensional regime, where $d \to \infty$ as well. Here, we will prove stronger convergence results, provided that some additional assumptions are met. Throughout the following theorems, we assume that the template signal has unit norm, and that its spectrum is non-vanishing, as discussed in the previous section.
Theorem 4.1.
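Fix $d$ and assume that $X[k] \neq 0$, for all $0 \le k \le d-1$.
- 1. For any $0 \le k \le d-1$, we have,
$$\phi_{\hat{X}}[k] \xrightarrow{\text{a.s.}} \phi_X[k], \tag{4.1}$$
as $M \to \infty$. Furthermore,
$$\lim_{M \to \infty} \frac{\mathbb{E}\,|\phi_{\hat{X}}[k] - \phi_X[k]|^2}{1/M} = C_k, \tag{4.2}$$
for a finite constant $C_k < \infty$.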
- 2.
Theorem 4.1 captures two central properties. The first addresses the convergence of the EfN estimator’s phases to those of the template signal, with a corresponding convergence rate in MSE that is proportional to $1/M$. The second captures the convergence of the EfN estimator’s magnitudes to the term given in the right-hand-side (r.h.s.) of (4.3). Interestingly, this term is not necessarily proportional to the magnitudes of the template signal.
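The $1/M$ decay of the phase MSE can be illustrated with a small Monte Carlo experiment. The sketch below is only a sanity check under stated assumptions: unit-variance Gaussian noise, a template with flat Fourier magnitudes and random phases (chosen here so that every bin is well away from zero), and hypothetical helper names `flat_template` and `phase_mse`. Quadrupling $M$ should reduce the empirical MSE by a factor of roughly four.

```python
import numpy as np

def flat_template(d, rng):
    """Real, unit-norm signal with flat DFT magnitudes and random phases (demo assumption)."""
    X = np.ones(d, dtype=complex)                 # bins k = 0 and k = d/2 stay real
    X[1:d // 2] = np.exp(1j * rng.uniform(-np.pi, np.pi, d // 2 - 1))
    X[d // 2 + 1:] = np.conj(X[1:d // 2][::-1])   # conjugate symmetry gives a real signal
    x = np.fft.ifft(X).real
    return x / np.linalg.norm(x)

def phase_mse(x, M, trials, rng):
    """Empirical MSE of the estimator's Fourier phases, averaged over bins 1..d/2-1."""
    d = x.shape[0]
    X = np.fft.fft(x)
    k = np.arange(d)
    ks = np.arange(1, d // 2)                     # skip the real-valued bins 0 and d/2
    total = 0.0
    for _ in range(trials):
        noise = rng.standard_normal((M, d))
        N = np.fft.fft(noise, axis=1)
        R = np.argmax(np.fft.ifft(N * np.conj(X), axis=1).real, axis=1)
        X_hat = (N * np.exp(2j * np.pi * np.outer(R, k) / d)).mean(axis=0)
        err = np.angle(X_hat[ks] * np.conj(X[ks]))  # wrapped phase difference
        total += np.mean(err ** 2)
    return total / trials

rng = np.random.default_rng(3)
x = flat_template(16, rng)
mse_small = phase_mse(x, M=400, trials=200, rng=rng)
mse_large = phase_mse(x, M=1600, trials=200, rng=rng)
ratio = mse_small / mse_large
print(mse_small, mse_large, ratio)
```

The ratio is a noisy estimate, so it should be read as consistent with a $1/M$ rate rather than as a precise measurement of the constant in (4.2).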
Next, we consider the high-dimensional regime (after taking ). Here, we place several additional technical assumptions on the template signal. Roughly speaking, we need to control the decay rate of the auto-correlation function of the template signal as a function of ; the auto-correlation function of the signal should decay faster than , and each one of the template’s Fourier magnitudes components should decay faster than . Note that both and depend on . We define the auto-correlation function of the signal by , and recall the auto-correlation function is given by the Fourier transform of the PSD of , i.e., .
Assumption 4.2.
Let and let be the auto-correlation of the signal . We say that the template signal satisfies Assumption 4.2 if the following hold:
-
1.
The auto-correlation satisfies,
(4.4) -
2.
The magnitudes satisfy,
(4.5) -
3.
The signal’s DC component is zero, i.e., .
Theorem 4.3.
Assume that , for all , and that satisfies Assumption 4.2. Then,
-
1.
For any , we have,
(4.6) -
2.
For any , we have,
(4.7) as .
Based on Theorem 4.3, as $M, d \to \infty$, the convergence rate of the Fourier phases of the EfN estimator is inversely proportional to the squared Fourier magnitudes of the template. In addition, unlike Theorem 4.1, the Fourier magnitudes of the EfN estimator converge to those of the template signal, up to a constant factor. Therefore, when $M, d \to \infty$, under Assumption 4.2, the EfN estimator recovers the template signal, up to a known normalization factor. This, in turn, implies that the normalized cross-correlation between the template and the EfN estimator approaches unity.
Figure 4 exemplifies the results of Theorem 4.3 and shows that the convergence rate depends on the template signal’s PSD. In particular, Figure 4(b) demonstrates how the vector length and spectral density impact the convergence of the magnitudes of the estimator, showing that longer vectors and flatter spectral densities yield higher cross-correlation between the estimator and the template signal. In addition, Figure 4(c) demonstrates that the estimator’s phases convergence closely aligns with our analytical prediction as the PSD flattens. We remark that the last assumption in Assumption 4.2 about the zero DC component is not necessary empirically but is used as part of the proof.
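The qualitative effect of the PSD on the cross-correlation can be reproduced in a few lines. The following sketch compares a template with flat Fourier magnitudes against one whose magnitudes decay as $1/k^2$; both have random phases and a zero DC component, in the spirit of Assumption 4.2. The decay law, the sizes, and the helper names `make_template` and `mean_correlation` are assumptions made for this example, and the noise is taken to be unit-variance Gaussian.

```python
import numpy as np

def make_template(mags, rng):
    """Real unit-norm signal with DFT magnitudes `mags` on bins 1..d/2 and zero DC."""
    half = len(mags)
    d = 2 * half
    X = np.zeros(d, dtype=complex)
    X[1:half + 1] = mags * np.exp(1j * rng.uniform(-np.pi, np.pi, half))
    X[half] = mags[-1]                            # the bin k = d/2 must be real
    X[half + 1:] = np.conj(X[1:half][::-1])       # conjugate symmetry
    x = np.fft.ifft(X).real
    return x / np.linalg.norm(x)

def mean_correlation(x, M, trials, rng):
    """Average normalized correlation between the EfN estimator and the template."""
    d = x.shape[0]
    X = np.fft.fft(x)
    k = np.arange(d)
    vals = []
    for _ in range(trials):
        noise = rng.standard_normal((M, d))
        N = np.fft.fft(noise, axis=1)
        R = np.argmax(np.fft.ifft(N * np.conj(X), axis=1).real, axis=1)
        X_hat = (N * np.exp(2j * np.pi * np.outer(R, k) / d)).mean(axis=0)
        x_hat = np.fft.ifft(X_hat).real
        vals.append(x_hat @ x / np.linalg.norm(x_hat))
    return float(np.mean(vals))

rng = np.random.default_rng(4)
half = 32                                         # signal length d = 64 (assumed)
kk = np.arange(1, half + 1)
x_flat = make_template(np.ones(half), rng)        # flat PSD
x_decay = make_template(1.0 / kk ** 2, rng)       # rapidly decaying PSD

corr_flat = mean_correlation(x_flat, M=100, trials=20, rng=rng)
corr_decay = mean_correlation(x_decay, M=100, trials=20, rng=rng)
print(f"flat PSD: {corr_flat:.3f}, decaying PSD: {corr_decay:.3f}")
```

With a modest number of observations, the flat-spectrum template is reproduced more faithfully than the decaying one, in line with the trend described for Figure 4(b).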

Strategy of the proofs.
We discuss briefly the ideas behind the proofs of our main results. The proof of Theorem 4.1 relies on two central ingredients. Specifically, it can be shown that the central limit theorem (CLT) and the law of large numbers (LLN) imply that as , we have , where is a Gaussian random variable with zero mean and variance , with some explicit formula for . Then, by using certain properties of cyclo-stationary Gaussian processes, we show that , for any , from which we deduce (4.1)–(4.2). Then, to obtain the refined convergence rate in Theorem 4.3 for (namely, that ), we utilize results from the theory of extrema of Gaussian processes, particularly, the convergence of the maximum of a stationary Gaussian process to the Gumbel distribution, see, e.g., [23, 9, 2, 4].
5 Conclusion and outlook
In this paper, we have shown that the Fourier phases of the EfN estimator converge to those of the template signal for an asymptotic number of observations. Since Fourier phases are crucial for perceiving image structure, the reconstructed image appears structurally similar to the template signal, even in cases where the estimator’s spectral magnitudes differ from those of the template [30, 37]. We also show that the Fourier phases of spectral components with higher magnitudes converge faster, leading to faster structural similarity in the overall image perception.
Perspective.
We anticipate that the findings of this paper will be beneficial in various fields. For example, the paper sheds light on a fundamental pitfall in template matching techniques, which may lead engineers and statisticians to misleading results. In addition, physicists and biologists working with data sets of low SNRs will benefit from understanding limitations and potential biases introduced by template matching techniques. More generally, this work provides a cautionary framework for the broader scientific community, highlighting the importance of exercising care when interpreting noisy observations.
Implications to cryo-EM.
These findings have practical implications for cryo-EM. Typically, protein spectra exhibit rapid decay at low frequencies (known as the Guinier plot) and remain relatively constant at high frequencies, a behavior characterized by Wilson in [49] and known as Wilson statistics. Wilson statistics is used to sharpen 3-D structures [39]. To mitigate the risk of model bias, we suggest using templates with reduced high frequencies, recommending filtered, smooth templates. This insight may also relate to, or support the common practice of initializing the expectation-maximization (EM) algorithm for 3-D refinement with a smooth 3-D volume. Each iteration of the EM algorithm effectively applies a version of template matching multiple times, although projection images typically contain a signal, not just noise as in the EfN case.
Extension to higher dimensions and asymptotic regimes.
While this paper focuses on one-dimensional signals, the analysis can be readily extended to higher dimensions. This extension involves replacing the one-dimensional DFT with its multidimensional counterpart. The symmetry properties established in Theorem 4.1, including the results in Propositions B.1 and B.3, remain valid. For the high-dimensional case of Theorem 4.3, the conditions on the PSD must be adjusted to the multidimensional case: the auto-correlation of the multidimensional array should satisfy the decay condition of Assumption 4.2 in each dimension. Finally, while our focus in this paper was on the asymptotic regimes where either $d$ is fixed and $M \to \infty$, or both $M, d \to \infty$, there are other challenging asymptotic regimes that should be studied. Most notably, it is interesting to understand what happens in the regime where both $M, d \to \infty$ with a fixed ratio between them; it seems that more powerful techniques are needed to analyze this scenario.
Hard assignment algorithms and the EM algorithm.
One promising avenue for future research involves examining hard-assignment algorithms. These algorithms iteratively refine estimates of an underlying signal from noisy observations, where the signal is obscured by high noise (unlike the pure noise scenario in EfN). The process begins by aligning observations with a template signal in the initial iteration and averaging them to improve the template for subsequent iterations. A primary goal is to characterize the model bias in this iterative algorithm, particularly focusing on the relationship between the output and the initial model.
Another important direction is investigating the EM algorithm, a cornerstone of cryo-EM algorithms [34, 33]. EM maximizes the likelihood function of models incorporating nuisance parameters [14], a topic of significant recent interest [13, 48]. Unlike hard-assignment algorithms, EM operates iteratively as a soft assignment algorithm, assigning probabilities to various possibilities and computing a weighted average rather than selecting a single optimal alignment per observation. Further exploration of EM could provide deeper insights into iterative methodologies in cryo-EM and their associated model biases.
Acknowledgment
T.B. is supported in part by BSF under Grant 2020159, in part by NSF-BSF under Grant 2019752, and in part by ISF under Grant 1924/21. W.H. is supported by ISF under Grant 1734/21.
References
- [1] Ashley Aberneithy. Automatic detection of calcified nodules of patients with tuberculous. University College, London, 2007.
- [2] Robert J Adler and Jonathan E Taylor. Random fields and geometry. Springer Science & Business Media, 2009.
- [3] MS Aksoy, Orhan Torkul, and Ismail Hakki Cedimoglu. An industrial visual inspection system that uses inductive learning. Journal of Intelligent Manufacturing, 15:569–574, 2004.
- [4] Jean-Marc Azaïs and Mario Wschebor. Level sets and extrema of random processes and fields. John Wiley & Sons, 2009.
- [5] Afonso S Bandeira, Ben Blum-Smith, Joe Kileel, Jonathan Niles-Weed, Amelia Perry, and Alexander S Wein. Estimation under group actions: recovering orbits from invariants. Applied and Computational Harmonic Analysis, 66:236–319, 2023.
- [6] Tamir Bendory, Alberto Bartesaghi, and Amit Singer. Single-particle cryo-electron microscopy: Mathematical theory, computational challenges, and opportunities. IEEE signal processing magazine, 37(2):58–76, 2020.
- [7] Tamir Bendory, Nicolas Boumal, Chao Ma, Zhizhen Zhao, and Amit Singer. Bispectrum inversion with application to multireference alignment. IEEE Transactions on signal processing, 66(4):1037–1050, 2017.
- [8] Tristan Bepler, Andrew Morin, Micah Rapp, Julia Brasch, Lawrence Shapiro, Alex J Noble, and Bonnie Berger. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nature methods, 16(11):1153–1160, 2019.
- [9] Simeon M Berman. Limit theorems for the maximum term in stationary sequences. The Annals of Mathematical Statistics, pages 502–516, 1964.
- [10] Gary E Christensen, Richard D Rabbitt, and Michael I Miller. Deformable templates using large deformation kinematics. IEEE transactions on image processing, 5(10):1435–1447, 1996.
- [11] Jon Cohen. Is high-tech view of HIV too good to be true?, 2013.
- [12] Pilar Cossio. Need for cross-validation of single particle cryo-EM. Journal of Chemical Information and Modeling, 60(5):2413–2418, 2020.
- [13] Constantinos Daskalakis, Christos Tzamos, and Manolis Zampetakis. Ten steps of EM suffice for mixtures of two gaussians. In Conference on Learning Theory, pages 704–710. PMLR, 2017.
- [14] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society: series B (methodological), 39(1):1–22, 1977.
- [15] Rick Durrett. Probability: theory and examples, volume 49. Cambridge university press, 2019.
- [16] Amitay Eldar, Keren Mor Waknin, Samuel Davenport, Tamir Bendory, Armin Schwartzman, and Yoel Shkolnisky. Object detection under the linear subspace model with application to cryo-EM images. arXiv preprint arXiv:2405.00364, 2024.
- [17] Ayelet Heimowitz, Joakim Andén, and Amit Singer. APPLE picker: Automatic particle picking, a low-effort cryo-EM framework. Journal of structural biology, 204(2):215–227, 2018.
- [18] Richard Henderson. Avoiding the pitfalls of single particle cryo-electron microscopy: Einstein from noise. Proceedings of the National Academy of Sciences, 110(45):18037–18041, 2013.
- [19] Richard Henderson, Andrej Sali, Matthew L Baker, Bridget Carragher, Batsal Devkota, Kenneth H Downing, Edward H Egelman, Zukang Feng, Joachim Frank, Nikolaus Grigorieff, et al. Outcome of the first electron microscopy validation task force meeting. Structure, 20(2):205–214, 2012.
- [20] J Bernard Heymann. Validation of 3D EM reconstructions: The phantom in the noise. AIMS biophysics, 2(1):21, 2015.
- [21] Gerard J Kleywegt, Paul D Adams, Sarah J Butcher, Catherine L Lawson, Alexis Rohou, Peter B Rosenthal, Sriram Subramaniam, Maya Topf, Sanja Abbott, Philip R Baldwin, et al. Community recommendations on cryoEM data archiving and validation. IUCrJ, 11(2), 2024.
- [22] Theocharis Kyriacou, Guido Bugmann, and Stanislao Lauria. Vision-based urban navigation procedures for verbally instructed robots. Robotics and Autonomous Systems, 51(1):69–80, 2005.
- [23] Malcolm R Leadbetter, Georg Lindgren, and Holger Rootzén. Extremes and related properties of random sequences and processes. Springer Science & Business Media, 2012.
- [24] Yuhai Li, Jian Liu, Jinwen Tian, and Hongbo Xu. A fast rotated template matching based on point feature. In MIPPR 2005: SAR and Multispectral Image Processing, volume 6043, pages 453–459. SPIE, 2005.
- [25] Sergio I Lopez and Leandro PR Pimentel. On the location of the maximum of a process: Lévy, gaussian and multidimensional cases. arXiv preprint arXiv:1611.02334, 2016.
- [26] Youdong Mao, Luis R Castillo-Menendez, and Joseph G Sodroski. Reply to subramaniam, van heel, and henderson: Validity of the cryo-electron microscopy structures of the HIV-1 envelope glycoprotein complex. Proceedings of the National Academy of Sciences, 110(45):E4178–E4182, 2013.
- [27] Youdong Mao, Liping Wang, Christopher Gu, Alon Herschhorn, Anik Désormeaux, Andrés Finzi, Shi-Hua Xiang, and Joseph G Sodroski. Molecular architecture of the uncleaved HIV-1 envelope glycoprotein trimer. Proceedings of the National Academy of Sciences, 110(30):12438–12443, 2013.
- [28] Amit Moscovich and Saharon Rosset. On the cross-validation bias due to unsupervised preprocessing. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4):1474–1502, 2022.
- [29] Eva Nogales. The development of cryo-EM into a mainstream structural biology technique. Nature methods, 13(1):24–27, 2016.
- [30] Alan V Oppenheim and Jae S Lim. The importance of phase in signals. Proceedings of the IEEE, 69(5):529–541, 1981.
- [31] Amelia Perry, Jonathan Weed, Afonso S Bandeira, Philippe Rigollet, and Amit Singer. The sample complexity of multireference alignment. SIAM Journal on Mathematics of Data Science, 1(3):497–517, 2019.
- [32] Leandro PR Pimentel. On the location of the maximum of a continuous stochastic process. Journal of Applied Probability, 51(1):152–161, 2014.
- [33] Ali Punjani, John L Rubinstein, David J Fleet, and Marcus A Brubaker. cryosparc: algorithms for rapid unsupervised cryo-em structure determination. Nature methods, 14(3):290–296, 2017.
- [34] Sjors HW Scheres. RELION: implementation of a bayesian approach to cryo-EM structure determination. Journal of structural biology, 180(3):519–530, 2012.
- [35] Sjors HW Scheres. Semi-automated selection of cryo-EM particles in relion-1.3. Journal of structural biology, 189(2):114–122, 2015.
- [36] Maxim Shatsky, Richard J Hall, Steven E Brenner, and Robert M Glaeser. A method for the alignment of heterogeneous macromolecules from electron microscopy. Journal of structural biology, 166(1):67–78, 2009.
- [37] Yoav Shechtman, Yonina C Eldar, Oren Cohen, Henry Nicholas Chapman, Jianwei Miao, and Mordechai Segev. Phase retrieval with application to optical imaging: a contemporary overview. IEEE signal processing magazine, 32(3):87–109, 2015.
- [38] Fred J Sigworth. A maximum-likelihood approach to single-particle image refinement. Journal of structural biology, 122(3):328–339, 1998.
- [39] Amit Singer. Wilson statistics: derivation, generalization and applications to electron cryomicroscopy. Acta Crystallographica Section A: Foundations and Advances, 77(5):472–479, 2021.
- [40] Amit Singer and Fred J Sigworth. Computational methods for single-particle electron cryomicroscopy. Annual review of biomedical data science, 3:163–190, 2020.
- [41] E. Slutsky. Über stochastische Asymptoten und Grenzwerte. 1925.
- [42] Carlos OS Sorzano, JL Vilas, Erney Ramírez-Aportela, J Krieger, D Del Hoyo, David Herreros, Estrella Fernandez-Giménez, D Marchán, JR Macías, I Sánchez, et al. Image processing tools for the validation of CryoEM maps. Faraday Discussions, 240:210–227, 2022.
- [43] Alex Stewart and Nikolaus Grigorieff. Noise bias in the refinement of structures derived from single particles. Ultramicroscopy, 102(1):67–84, 2004.
- [44] Sriram Subramaniam. Structure of trimeric HIV-1 envelope glycoproteins. Proceedings of the National Academy of Sciences, 110(45):E4172–E4174, 2013.
- [45] Itamar Talmi, Roey Mechrez, and Lihi Zelnik-Manor. Template matching with deformable diversity similarity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 175–183, 2017.
- [46] Marin van Heel. Finding trimeric HIV-1 envelope glycoproteins in random noise. Proceedings of the National Academy of Sciences, 110(45):E4175–E4177, 2013.
- [47] Shao-Hsuan Wang, Yi-Ching Yao, Wei-Hau Chang, and I-Ping Tu. Quantification of model bias underlying the phenomenon of “Einstein from noise”. Statistica Sinica, 31:2355–2379, 2021.
- [48] Ji Xu, Daniel J Hsu, and Arian Maleki. Global analysis of expectation maximization for mixtures of two gaussians. Advances in Neural Information Processing Systems, 29, 2016.
- [49] SH Yü. Determination of absolute from relative X-ray intensity data. Nature, 150(3796):151–152, 1942.
- [50] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
Appendix A Preliminaries
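Before we delve into the proofs of Theorems 4.1 and 4.3, we fix some notations and definitions. Recall the definition of the Fourier transform of $x$ and $n_i$ in (3.4). Note that since $x$ and $n_i$ are real-valued, their Fourier coefficients satisfy the conjugate-symmetry relation:
$$X[k] = \overline{X[d-k]}, \qquad N_i[k] = \overline{N_i[d-k]}, \qquad 1 \le k \le d-1.$$
In particular, $|N_i[k]| = |N_i[d-k]|$ and $\phi_{N_i}[k] = -\phi_{N_i}[d-k]$, which implies that only the first $d/2+1$ components of $N_i$ are statistically independent.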
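The definition of the maximal correlation in (3.1) can be represented in the Fourier domain as follows,
$$\hat{R}_i = \arg\max_{0 \le \ell \le d-1} \sum_{k=0}^{d-1} |X[k]| \, |N_i[k]| \cos\left( \frac{2\pi k \ell}{d} + \phi_{N_i}[k] - \phi_X[k] \right). \tag{A.1}$$
To simplify notation, we define
$$S_i[\ell] \triangleq \sum_{k=0}^{d-1} |X[k]| \, |N_i[k]| \cos\left( \frac{2\pi k \ell}{d} + \phi_{N_i}[k] - \phi_X[k] \right), \tag{A.2}$$
for $0 \le \ell \le d-1$, and therefore, $\hat{R}_i = \arg\max_{0 \le \ell \le d-1} S_i[\ell]$. We note that for any $0 \le i \le M-1$, the random vector $(S_i[0], \ldots, S_i[d-1])$ is Gaussian distributed, with zero mean vector, and a circulant covariance matrix; therefore, it is a cyclo-stationary random process.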
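The Fourier-domain form of the correlation can be checked against the direct real-space inner products. The sketch below assumes a unitary DFT (`norm="ortho"`), so that Parseval's identity holds without extra factors, and the shift convention `np.roll(x, l)`; under these assumptions the inner product with the shifted template equals a sum of cosines weighted by the product of the Fourier magnitudes. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 16                                            # illustrative length (assumed)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
n = rng.standard_normal(d)

X = np.fft.fft(x, norm="ortho")                   # unitary DFT, so Parseval holds as-is
N = np.fft.fft(n, norm="ortho")

# Real-space correlations <n, shift_l(x)> for every shift l.
direct = np.array([n @ np.roll(x, l) for l in range(d)])

# Fourier-domain cosine sum: one row per shift l, one column per frequency k.
k = np.arange(d)
l = np.arange(d)
angles = 2 * np.pi * np.outer(l, k) / d + np.angle(N) - np.angle(X)
S = (np.abs(X) * np.abs(N) * np.cos(angles)).sum(axis=1)

max_diff = np.max(np.abs(direct - S))
same_argmax = int(np.argmax(direct)) == int(np.argmax(S))
print(max_diff, same_argmax)
```

Because the two expressions agree for every shift, the maximizing shift is the same whether it is computed in real space or from the magnitudes and phases.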
Throughout the proofs, we condition on the -th Fourier coefficient of the noise realization. Specifically, note that , where
(A.3) |
for , and
(A.4) |
for , where is defined by:
(A.5) |
Without loss of generality, we normalize the diagonal elements of the above covariance matrix by setting,
(A.6) |
Note that the conditional process is Gaussian because it is given by a linear transform of i.i.d. Gaussian variables. Also, since its covariance matrix is circulant and depends only on the difference between the two indices, i.e., , it is cycle-stationary with a cosine trend. The eigenvalues of this circulant matrix are given by the DFT of its first row, and thus its -th eigenvalue equals to , for . Finally, for simplicity of notation, whenever it is clear from the context, we will omit the dependence of the above quantities on the -th observation and -th frequency indices, and use and , instead. Furthermore, for convenience, we will assume the template vector is normalized to unity, i.e., .
Our goal is to investigate the phase and magnitude of the estimator in (3.5). Simple manipulations reveal that, for any , the estimator’s phases are given by,
(A.7) |
where we define,
(A.8) |
and
(A.9) |
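To make the object of study concrete, the following is a minimal simulation sketch of the EfN procedure: pure-noise observations are aligned to a template by maximizing the circular cross-correlation, and then averaged. The function names (`efn_estimate`, `mean_phase_error`), the dimension, and the numbers of observations are illustrative assumptions, and the code is not the authors' implementation.

```python
import numpy as np

def efn_estimate(template, num_obs, rng):
    """Align pure-noise observations to `template` and average them."""
    d = len(template)
    noise = rng.standard_normal((num_obs, d))             # no signal at all
    cross_spectrum = np.fft.fft(noise, axis=1) * np.conj(np.fft.fft(template))
    # corr[i, r] = sum_m noise[i, (m + r) % d] * template[m]
    corr = np.fft.ifft(cross_spectrum, axis=1).real
    shifts = np.argmax(corr, axis=1)                      # best shift per row
    idx = (np.arange(d)[None, :] + shifts[:, None]) % d
    aligned = np.take_along_axis(noise, idx, axis=1)      # undo the shift
    return aligned.mean(axis=0)

def mean_phase_error(estimate, template):
    """Mean absolute Fourier-phase difference over nonzero frequencies.

    The k = 0 coefficient is skipped: a cyclic shift does not change it,
    so alignment cannot bias its phase.
    """
    ratio = np.fft.fft(estimate)[1:] * np.conj(np.fft.fft(template)[1:])
    return np.mean(np.abs(np.angle(ratio)))

rng = np.random.default_rng(1)
d = 32
template = rng.standard_normal(d)
template /= np.linalg.norm(template)                      # unit-norm template

est_small = efn_estimate(template, 50, rng)
est_large = efn_estimate(template, 5000, rng)
err_small = mean_phase_error(est_small, template)
err_large = mean_phase_error(est_large, template)
similarity = est_large @ template / np.linalg.norm(est_large)
```

In this sketch the phase error shrinks as the number of noise observations grows, and the average of the aligned noise is positively correlated with the template, in line with the behavior analyzed in the proofs that follow.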
Appendix B Proof of Theorem 4.1
Proposition B.1.
Remark B.2.
Note that the above result implies that in (4.2) is given by,
(B.3) |
Proposition B.3.
Fix , and assume that for all . Then, for any ,
(B.4) |
By definition, Proposition B.3 implies that there is a positive correlation between the EfN estimator and the template . We are now in a position to prove Theorem 4.1.
Proof of Theorem 4.1.
We start with the convergence of the estimator’s magnitudes. Following (A.9), by applying the strong law of large numbers (SLLN), we get,
(B.5) |
where we have used the fact that the sequences of random variables and are i.i.d. with finite means and variances. Now, using simple symmetry arguments, we show in the proof of Proposition B.1 (see (B.38)) that,
(B.6) |
Together with (B.5), this proves the second item of Theorem 4.1.
Next, we prove the first item of Theorem 4.1, starting with (4.1). To this end, by (A.7) and the continuous mapping theorem, it suffices to prove that . This, however, follows by applying the SLLN,
(B.7) |
where and . We already saw that , while by Proposition B.3, we have that , and thus their ratio converges a.s. to zero by the Continuous Mapping Theorem. Thus, we proved that .
Finally, we prove (4.2). To that end, we use the Portmanteau Lemma, which states that if is a sequence of random variables such that , then for any bounded and continuous function we have , as . In our case, for any fixed , we let this sequence of random variables, indexed by , be defined as , as in Proposition B.1, and we note that the index is hidden also in . Then, Proposition B.1 implies that , where we recall that . The Portmanteau Lemma then implies that for any bounded ,
(B.8) |
We are now ready to prove (4.2). Using (A.7) it is clear that,
(B.9) |
and thus
(B.10) |
Therefore, we have,
(B.11) |
Theorem 4.1 is a direct application of the following result.
Lemma B.4.
Recall the definition of in (B.2). Then,
(B.12) |
Proof of Lemma B.4.
One should note that the denominator can be zero with positive probability, and thus we need to control such an event. To that end, recall that . Then, for any , we decompose,
(B.13) |
We first show that the second term on the r.h.s. of (B.13) converges to zero with rate . Indeed, since , for any , we have
(B.14) | ||||
(B.15) | ||||
(B.16) | ||||
(B.17) |
Let us denote the summand in the denominator in (B.9) by , for . Then, we note that,
(B.18) |
Thus, by Chebyshev’s inequality,
(B.19) |
Now, by the definition of , we have,
(B.20) | ||||
(B.21) |
Thus, it is clear that there is a constant (which depends on the second and fourth moments of ), such that,
(B.22) |
Thus, plugging (B.19) and (B.22) into (B.17) leads to,
(B.23) |
Thus, the second term at the r.h.s. of (B.13) indeed converges to zero as . Next, we analyze the first term at the r.h.s. of (B.13). We will show that,
(B.24) |
As this is true for every , it would imply that,
(B.25) |
Let us denote by . Since , as , and because , the following holds from the Taylor expansion of , around ,
(B.26) |
As the sum in (B.26) converges for every , and converges to zero as , then,
(B.27) |
Now, since the term on the left-hand side of (B.27), as well as the first term on the right-hand side of (B.27), is bounded for every and converges to zero, the last term on the right-hand side of (B.27) is also bounded for every and converges to zero as . Specifically, we note that the last term converges to zero with rate , while the first term on the right-hand side converges to zero with rate . Thus,
(B.28) |
Also, note that
(B.29) |
Thus, combining (B.28) and (B.29) we get,
(B.30) |
which proves the upper bound in (B.24). Similarly, since , for any , we have,
(B.31) | ||||
(B.32) |
Since (B.32) is true for every , we get the lower bound in (B.24), which concludes the proof of (B.25). Substituting (B.23) and (B.25) in (B.13), leads to the proof of the lemma. ∎
∎
B.1 Proof of Proposition B.1
Recall the relation in (B.9) and the definitions of and . Since is an i.i.d. sequence of random variables, and because each depends on solely (in particular, independent of , for ), we have that and are two sequences of i.i.d. random variables. Accordingly, let,
(B.33) | ||||
(B.34) |
which are the mean value and variance of , as defined in (B.9). Then, by the CLT:
(B.35) |
where .
Next, we show that . Indeed, let , and recall the definition of in (A.1). Note that depends on and only through . Accordingly, viewing as a function of , we have,
(B.36) |
namely, by flipping the signs of all the phases, the location of the maximum flips its sign as well. Then, by the law of total expectation,
(B.37) |
The inner expectation in (B.37) is taken w.r.t. the randomness of the phases . However, due to (B.36), and since the sine function is odd, the integral in (B.37) vanishes. Therefore,
(B.38) |
and thus .
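The symmetry used in this argument, namely that negating all the phases reflects the location of the maximum, can be illustrated numerically. In the sketch below, complex conjugation of the cross-spectrum plays the role of the phase flip; the variable names and the dimension are illustrative assumptions.

```python
import numpy as np

def argmax_shift(cross_spectrum):
    """Location of the maximum of the circular cross-correlation."""
    return int(np.argmax(np.fft.ifft(cross_spectrum).real))

rng = np.random.default_rng(2)
d = 16
template = rng.standard_normal(d)
noise = rng.standard_normal(d)

# The cross-spectrum depends on the phases of the noise and the template
# only through their differences.
cross_spectrum = np.fft.fft(noise) * np.conj(np.fft.fft(template))

shift = argmax_shift(cross_spectrum)
# Complex conjugation negates every phase difference and keeps the magnitudes.
shift_flipped = argmax_shift(np.conj(cross_spectrum))
```

Here `shift_flipped` equals `(-shift) % d`, i.e., the maximizer is reflected modulo the signal length. Since the sine function is odd, the contributions of a phase configuration and its negation cancel in the expectation.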
Next, we analyze the denominator in (B.9). Specifically, we already saw that form a sequence of i.i.d. random variables, and thus by the SLLN we have , where,
(B.39) |
We will prove in Proposition B.3 that . Finally, applying Slutsky’s Theorem on the ratio , we obtain,
(B.40) |
which concludes the proof.
B.2 Proof of Proposition B.3
To prove Proposition B.3, we will first establish some notations and state two auxiliary results. Let and be two -dimensional Gaussian vectors, where is the circulant covariance matrix defined in (A.4). We define the entries of as,
(B.41) |
for , and . Note that , for . Define,
(B.42) | ||||
(B.43) |
We further claim that due to the assumption that , for all in Proposition B.3, it follows that . Indeed, for the rank of the covariance matrix to be larger than , at least half of its eigenvalues should be non-zero. Now, as mentioned in Appendix A, the eigenvalues of are given by , for . Thus, assuming that the spectrum of is not vanishing clearly implies that ; in fact it implies that , which is larger than , for . We have the following result.
Lemma B.5.
Before proving Lemma B.5, we use its result to establish Proposition B.3. Our goal is to prove that . By the law of total expectation, we have,
(B.47) |
More explicitly, we can write,
(B.48) |
Now, note that the inner integral can be written as,
(B.49) |
The main observation here is that the conditional distribution of on the event coincides with that of in (B.42), and similarly the conditional distribution of on the event coincides with that of in (B.43). Thus, the sum of the integrands at the r.h.s. of (B.49) is exactly the left-hand-side (l.h.s.) of (B.46), and thus by Lemma B.5, this sum is positive for every . Together with (B.48), this concludes the proof of Proposition B.3. It is left to prove Lemma B.5.
Proof of Lemma B.5.
By definition, it is clear that,
(B.50) |
for and . Note that and can be decomposed as and , where is a cyclo-stationary process, and is defined in (B.41). Then,
(B.51) |
and,
(B.52) |
We will show that for any such that , we have,
(B.53) |
which in turn implies that .
By definition, since is a cyclo-stationary random process, its cumulative distribution function is invariant under cyclic shifts, i.e.,
(B.54) |
for any , where the indices are taken modulo . Furthermore, the time indices can be reverted and the distribution will remain the same, namely,
(B.55) |
Combining (B.54) and (B.55) yields,
(B.56) |
Accordingly, let us define the Gaussian vectors and , such that their -th entry is,
(B.57) |
(B.58) |
for . It is clear from (B.56) that and have the same cumulative distribution function, i.e.,
(B.59) |
Therefore, the following holds,
(B.60) |
where the second equality follows from (B.59). Next, we note that for every and ,
(B.61) |
Therefore,
(B.62) |
which implies
(B.63) |
or, equivalently,
(B.64) |
Note that the above inequality is strict if , which implies that most of the inequalities are strict (at least of the inequalities for ). Thus, using (B.63) and (B.64), we get for that,
(B.65) |
and
(B.66) |
where the strict inequality arises because there are at least arguments from the l.h.s. that are greater than their corresponding counterparts on the r.h.s. Therefore, a direct consequence of (B.66) is that since , and thus,
(B.67) |
Combining (B.60) and (B.67) leads to,
(B.68) |
or, equivalently,
(B.69) |
A similar result can be obtained for the case where , i.e.,
(B.70) |
Finally, we prove (B.46). By definition, it is clear that
(B.71) |
where we have used the fact that , for any . By Lemma B.5, for any such that it holds that , otherwise, for such that , it holds that . Therefore,
(B.72) |
which in light of (B.71) concludes the proof.
∎
Appendix C Proof of Theorem 4.3
Remark on notation. In this section, we omit the dependence on , where this is clear from the context, i.e., and .
The proof of Theorem 4.3 relies on the following central result. To state the result, it is convenient to define the functions,
(C.1) |
and
(C.2) |
for . Note that and correspond to the denominator and numerator in (B.3), respectively.
Lemma C.1.
The proof of Lemma C.1 relies on the following proposition.
Proposition C.2.
Let be a -dimensional Gaussian random vector, with mean and a covariance matrix . Assume that , where is a sequence of real-valued numbers such that , , and , as . Assume also that , as , and let . Then, for a bounded deterministic function , we have,
(C.5) |
where .
Proof of Theorem 4.3.
The signal , which satisfies the conditions of Theorem 4.3, also satisfies the conditions of Lemma C.1. By definition, the constant in (B.3) can be rewritten as,
(C.6) |
Accordingly, Lemma C.1 and (C.6) imply that,
(C.7) |
Thus, to prove (4.6) it is left to show that , as . By Assumption 4.2, and the definition of the eigenvalues of the covariance matrix (see, (A.5)), it follows that,
(C.8) |
Therefore, since we assume that the template signal is normalized, i.e., , it follows from (A.6) and (C.8) that,
(C.9) |
as claimed. Combining the last result with (C.7), we obtain that,
(C.10) |
which proves (4.6).
C.1 Proof of Lemma C.1
Our goal is to prove (C.3) and (C.4). By the law of total expectation, we have,
(C.12) |
and
(C.13) |
Accordingly, we will prove
(C.14) |
and,
(C.15) |
which yields the desired result. To that end, we will use Proposition C.2, which is proved in Section C.2.
Recall our definition for the vector in (A.2). When conditioned on , this random vector is Gaussian with mean , defined in (A.3), and a covariance matrix , defined in (A.4). Recall that we assume that these mean vectors and covariance matrices satisfy Assumption 4.2, and we claim that these satisfy the conditions of Proposition C.2 as well. Indeed, note that:
1. By the definition of the covariance matrix of in (A.4), which is circulant and entirely defined by the eigenvalues given by (Eq. (A.5)) for , we have , where is the Fourier transform of the PSD of (or equivalently, the auto-correlation of ), and is defined in Proposition C.2. By Assumption 4.2, , for asymptotic ; thus, satisfying the requirement of the decay rate of the covariance entries. One should note that the eigenvalues of the covariance matrix of are and not , yet also satisfies the conditions of Proposition C.2. Also,
(C.16) |
which follows from the definition of in (A.5). Therefore, satisfies the conditions of Lemma C.3.
2. By Assumption 4.2, we have , as , for every , implying that , where the term in is finite and independent of .
We invoke Proposition C.2 and (C.5) on and , conditioned on . Since the conditions of Proposition C.2 are satisfied, it implies that,
(C.17) |
and
(C.18) |
where . Next, we evaluate the terms at the left-hand-side of (C.17) and (C.18). Specifically, we first prove that
(C.19) |
The definition of implies that,
(C.20) |
almost surely. By definition,
(C.21) |
Now, by Assumption 4.2, we have as . Then, (C.21) and the continuous mapping theorem imply that,
(C.22) |
Therefore, using the fact that , we deduce from the continuous mapping theorem and (C.22) that,
(C.23) |
Similarly, applying the continuous mapping theorem we also have,
(C.24) |
Now, by definition,
(C.25) |
and so,
(C.26) |
as . Therefore, combining (C.23)–(C.26), we get,
(C.27) |
which proves (C.19).
Let us now deduce (C.3). Note that,
(C.28) |
and
(C.29) |
Therefore, substituting (C.19), (C.28), and (C.29), in (C.17), yields,
(C.30) |
Denote the term at the left-hand-side of (C.27) by , and the right-hand-side by , and so . By definition, note that , and it is clear that . Therefore, by the dominated convergence theorem, we have , and in particular,
(C.31) |
Thus, by the law of total expectation, we obtain,
(C.32) |
as , which proves (C.3).
C.2 Proof of Proposition C.2
The proof of Proposition C.2 relies on an auxiliary result, which we prove in Section C.3. To state this result, we introduce some additional notations. Let , for , be a discrete stochastic process. We define the function as follows,
(C.35) |
where is a bounded deterministic function, and . We further define,
(C.36) |
and
(C.37) |
Note that and are random variables. Finally, we denote . We have the following result.
Lemma C.3.
The following holds,
(C.38) |
Lemma C.3 implies that finding the expected value of is related directly to the derivative of the expected value of the maximum around zero. Thus, the problem of finding the expected value of is related to finding the expected value of the maximum of the stochastic process. In our case, will be a Gaussian vector with mean given by (A.3) and a covariance matrix given by (A.4). Thus, our goal now is to find the expected value of the maximum of . For this purpose, we will recall some well-known results on the maximum of Gaussian processes.
It is known that for an i.i.d. sequence of normally distributed random variables , the asymptotic distribution of the maximum is the Gumbel distribution, i.e., for any ,
(C.39) |
as , where,
(C.40) |
and,
(C.41) |
It turns out that the above convergence result remains valid even if the normally distributed sequence is not i.i.d. Specifically, as shown in [23, Theorem 6.2.1], a similar result holds for Gaussian random variables with a covariance matrix that decays such that , and with a mean vector whose maximum value decays faster than . These conditions precisely match those specified in Theorem 4.3.
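For reference, the classical Gumbel limit for the maximum of i.i.d. standard normal random variables can be written with the standard normalizing constants as follows. The symbols $\xi_i$, $M_d$, $a_d$, and $b_d$ are chosen here for illustration and may differ from the notation used in (C.39)–(C.41).

```latex
\[
\lim_{d\to\infty} \mathbb{P}\!\left( a_d \,(M_d - b_d) \le x \right)
  = \exp\!\left(-e^{-x}\right),
\qquad
M_d \triangleq \max_{0 \le i \le d-1} \xi_i ,
\]
\[
a_d = \sqrt{2\log d},
\qquad
b_d = \sqrt{2\log d} - \frac{\log\log d + \log(4\pi)}{2\sqrt{2\log d}} .
\]
```

In particular, the maximum concentrates around $\sqrt{2\log d}$, with fluctuations of order $1/\sqrt{2\log d}$.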
Proof of Proposition C.2.
Conditioned on , the Gaussian vector (see, (A.3) and (A.4)) can be represented as,
(C.42) |
where is a zero mean Gaussian random vector with covariance matrix given by (A.4) and is given by (A.3). Define,
(C.43) |
where we use the same notations as in Lemma C.3. Then, using Lemma C.3,
(C.44) |
where . Therefore, our goal is now to find the derivative of .
Using [23, Theorem 6.2.1], under the assumptions of Proposition C.2, for a sufficiently small value of such that , we have for any ,
(C.45) |
where and are given in (C.40) and (C.41), respectively, and
(C.46) |
For brevity, we denote,
(C.47) |
and we note that,
(C.48) |
and so,
(C.49) |
for any . The following result shows that converges to zero in the sense.
Lemma C.4.
For any ,
(C.50) |
i.e., , as .
Proof of Lemma C.4.
To prove (C.50), we will first show that converges to zero in probability. Because is uniformly integrable, this is sufficient for the desired convergence above. Specifically, recall from (C.45) that converges in distribution to the Gumbel random variable with location zero and unit scale, i.e., , as . Furthermore, it is clear that , as . Thus, Slutsky’s theorem [41] implies that,
(C.51) |
It is known that convergence in distribution to a constant implies also convergence in probability to the same constant [15], and thus,
(C.52) |
Therefore, the above result together with the continuous mapping theorem [15] implies that,
(C.53) |
for every .
Next, we show that is bounded with probability one. Indeed, by the definition of in (C.36), we have,
(C.54) |
for some , where we have used the fact that is bounded. Furthermore, note that,
(C.55) |
which is bounded because,
(C.56) |
Combining (C.49), (C.54) and (C.56), leads to,
(C.57) |
Now, since is bounded, it is also uniformly integrable, and thus when combined with (C.53) we may conclude that,
(C.58) |
as claimed. ∎
We continue with the proof of Proposition C.2. First, we show that,
(C.59) |
Indeed, note that,
(C.60) |
where is the probability measure associated with . From (C.57) we know that is bounded. Thus, applying the dominated convergence theorem, we obtain,
(C.61) |
Since the integral at the right-hand-side of (C.61) is finite and bounded for each value of , and for each value of , the order of the limits can be exchanged, thus leading to (C.59). Therefore, from (C.49) and (C.59), we have,
(C.62) | ||||
(C.63) |
Now, Lemma C.4 implies that the left-hand-side of (C.62) nullifies, and thus,
(C.64) |
Finally, combining (C.55) and (C.64), we obtain (C.5), which concludes the proof.
∎
C.3 Proof of Lemma C.3
The proof technique of Lemma C.3 is similar to the technique used in [32, 25], but with a non-trivial adaptation to the discrete case. To prove this lemma, we will first establish a deterministic counterpart of (C.38). Specifically, we define,
(C.65) |
where . The functions , and are assumed bounded and deterministic. We further assume that is injective, i.e., for , we have . Define,
(C.66) |
and note that is well-defined over the supports of and , and it is a continuous function of around . Finally, we let,
(C.67) |
We have the following result.
Lemma C.5.
The following relation holds,
(C.68) |
Proof of Lemma C.5.
Note that,
(C.69) |
By the definition of , we have,
(C.70) |
and
(C.71) |
Now, the main observation here is that for a sufficiently small value of around zero, we must have that and are equal because can take discrete values only, and it is unique. Thus, for , we have,
(C.72) |
Combining (C.69)–(C.72) yields,
(C.73) | ||||
(C.74) |
which concludes the proof. ∎
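The idea behind this argument can be checked numerically under a simplifying assumption. The definitions in (C.65)–(C.67) are not reproduced here; the sketch assumes the perturbation enters additively, as `a[i] + eps * b[i]`, with hypothetical finite sequences `a` (distinct values, hence a unique maximizer) and `b` (bounded). For a sufficiently small `eps` the maximizer does not move, so the derivative of the maximum at zero equals `b` evaluated at the maximizer of `a`.

```python
import numpy as np

def perturbed_max(a, b, eps):
    """max_i (a[i] + eps * b[i]) for finite sequences a and b."""
    return np.max(a + eps * b)

a = np.array([0.3, 1.7, -0.4, 1.2, 0.9])    # distinct values: unique maximizer
b = np.array([2.0, -1.5, 0.7, 3.0, -0.2])   # bounded perturbation
eps = 1e-6                                   # small enough to keep the argmax fixed

# Finite-difference approximation of the derivative of the maximum at eps = 0.
finite_diff = (perturbed_max(a, b, eps) - perturbed_max(a, b, 0.0)) / eps
value_at_argmax = b[np.argmax(a)]            # b evaluated at the maximizer of a
```

The two quantities `finite_diff` and `value_at_argmax` agree up to floating-point error, which is the discrete analogue of differentiating a maximum through its unique maximizer.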
We are now in a position to prove Lemma C.3. Similarly to the deterministic case, we define the random function,
(C.75) |
where is a discrete stochastic process, and is a deterministic function. We assume that has a continuous probability distribution with no atoms, i.e., no single point has positive probability. Using Lemma C.5, for each realization of , such that is injective, we have,
(C.76) |
Under the above assumption on , the set of realizations for which is not injective has measure zero. Therefore, the fact that is bounded (see (C.54)) and (C.76) imply that,
(C.77) |
which concludes the proof.