Improving variable selection properties by leveraging external data

Rognon-Vael, Paul; Rossell, David; Zwiernik, Piotr

arXiv:2502.15584 (math)

[Submitted on 21 Feb 2025 (v1), last revised 29 Mar 2025 (this version, v2)]

Abstract: Sparse high-dimensional signal recovery is possible only under certain conditions on the number of parameters, the sample size, the signal strength and the underlying sparsity. We show that leveraging external information, as available in data integration or transfer learning, allows one to push these mathematical limits. Specifically, we consider external information that allows the parameters to be split into blocks, first in a simplified setting, the Gaussian sequence model, and then in the general linear regression setting. We show how block-based $\ell_0$ penalties that depend on the external information attain model selection consistency under milder conditions than standard $\ell_0$ penalties, and how they also attain faster model recovery rates. We first provide results for oracle-based $\ell_0$ penalties that have access to perfect sparsity and signal-strength information. We then propose an empirical Bayes data analysis method that does not require oracle information and for which efficient computation is possible via standard MCMC techniques. Our results provide a mathematical basis for the use of data integration methods in high-dimensional structural learning.
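To make the block-penalty idea concrete in the Gaussian sequence model $y_i = \theta_i + \varepsilon_i$, here is a minimal Python sketch, assuming unit noise variance and an oracle that knows the number of nonzeros in each block. The penalty $2\log(n/s)$ and the names `oracle_penalty`, `l0_select` and `block_l0_select` are hypothetical illustrations, not the penalties or code from the paper, and the observations are made-up numbers. The one fact the sketch relies on is that minimizing $\sum_i (y_i - \theta_i)^2 + \lambda \lVert\theta\rVert_0$ reduces to hard thresholding: coordinate $i$ is selected exactly when $y_i^2 > \lambda$.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def oracle_penalty(n: int, s: int) -> float:
    """Hypothetical l0 penalty 2*log(n/s) for n coordinates with s nonzeros.

    Illustrative sparsity-adaptive choice only, not the paper's penalty.
    """
    return 2.0 * math.log(n / s)


def l0_select(y: Sequence[float], lam: float) -> List[int]:
    """Support of argmin_theta sum_i (y_i - theta_i)^2 + lam * ||theta||_0.

    Per coordinate, keeping theta_i = y_i costs lam while setting theta_i = 0
    costs y_i**2, so i is selected exactly when y_i**2 > lam.
    """
    return [i for i, yi in enumerate(y) if yi * yi > lam]


def block_l0_select(y: Sequence[float], blocks: Sequence[int],
                    lams: Dict[int, float]) -> List[int]:
    """Same criterion, but with a separate penalty lams[b] for each block b."""
    return [i for i, (yi, b) in enumerate(zip(y, blocks)) if yi * yi > lams[b]]


# Made-up observations with sigma = 1; the assumed true support is {0, 1, 4}.
y = [1.6, 1.5, 0.3, -0.2,                        # block 0: external data suggest signal
     3.0, 2.1, 0.5, -0.8, 0.1, -0.4, 0.7, -1.1,
     0.2, 0.9, -0.6, 0.3, 1.0, -0.5, 0.4, -0.9]  # block 1: mostly null
blocks = [0] * 4 + [1] * 16
true_s = {0: 2, 1: 1}  # oracle number of nonzeros in each block

# Standard l0 penalty: one threshold for all 20 coordinates.
global_sel = l0_select(y, oracle_penalty(len(y), sum(true_s.values())))

# Block-based l0 penalty: one threshold per externally defined block.
sizes = Counter(blocks)
block_sel = block_l0_select(
    y, blocks, {b: oracle_penalty(sizes[b], true_s[b]) for b in sizes})

print(global_sel)  # [4, 5]
print(block_sel)   # [0, 1, 4]
```

With a single global penalty the two moderate signals in block 0 fall below the threshold while a noise coordinate in block 1 clears it; block-specific penalties lower the bar where the external information points to signal and raise it elsewhere, which in this toy example recovers the assumed support exactly.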

Subjects:	Statistics Theory (math.ST); Methodology (stat.ME)
MSC classes:	62F07 (Primary) 62C12, 62R07 (Secondary)
Cite as:	arXiv:2502.15584 [math.ST]
	(or arXiv:2502.15584v2 [math.ST] for this version)
	https://doi.org/10.48550/arXiv.2502.15584

Submission history

From: Paul Rognon-Vael
[v1] Fri, 21 Feb 2025 16:52:20 UTC (465 KB)
[v2] Sat, 29 Mar 2025 16:55:46 UTC (471 KB)