Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization

Chen, Yang; Yang, Long; Liang, Yitao; Lin, Zhouchen

arXiv:2410.08898 (cs)

[Submitted on 11 Oct 2024 (v1), last revised 9 Jun 2025 (this version, v2)]

Abstract: Low-Dimension-to-High-Dimension (LDHD) generalization is a special case of Out-of-Distribution (OOD) generalization in which the training data are restricted to a low-dimensional subspace of the high-dimensional testing space. Assuming that each instance is generated from a latent variable whose dimension reflects the problem scale, the inherent scaling challenge in length generalization can be captured by LDHD generalization in the latent space. We theoretically demonstrate that LDHD generalization is generally unattainable without exploiting prior knowledge to provide an appropriate inductive bias. Specifically, we explore LDHD generalization in Boolean functions. We verify that different architectures trained with (S)GD converge to min-degree interpolators w.r.t. different independent sets. LDHD generalization is achievable if and only if the target function coincides with this inductive bias. Applying the insights from LDHD generalization to length generalization, we explain the effectiveness of Chain-of-Thought (CoT) as changing the structure of the latent space to enable better LDHD generalization. We also propose a principle for position embedding design that handles both the inherent LDHD generalization and nuisances such as the data format. Following this principle, we propose a novel position embedding, RPE-Square, which remedies the relative position embedding (RPE) so that it can deal with the data format nuisance.
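The role of inductive bias described in the abstract can be illustrated with a small sketch. It is not the authors' construction: it assumes inputs in {-1, +1}^n, a training set in which the last n - k coordinates are fixed to +1, the standard monomial basis, and a brute-force search for the lowest degree that interpolates the training data (ties within that degree are broken by the minimum-norm least-squares solution). The helper names `monomials`, `features`, `lowest_degree_fit`, and `ldhd_accuracy` are hypothetical.

```python
import itertools
import numpy as np

def monomials(n, max_degree):
    """All index subsets of {0..n-1} with size <= max_degree, lowest degree first."""
    return [s for d in range(max_degree + 1)
            for s in itertools.combinations(range(n), d)]

def features(X, subsets):
    """Evaluate each monomial prod_{i in S} x_i on every row of X."""
    return np.stack([np.prod(X[:, list(s)], axis=1) for s in subsets], axis=1)

def lowest_degree_fit(X_train, y_train):
    """Assumed stand-in for a min-degree interpolator: smallest degree that fits exactly."""
    n = X_train.shape[1]
    for degree in range(n + 1):
        subsets = monomials(n, degree)
        Phi = features(X_train, subsets)
        # lstsq returns the minimum-norm solution when Phi is rank deficient.
        coef, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
        if np.allclose(Phi @ coef, y_train):
            return subsets, coef
    raise ValueError("no interpolator found")

def ldhd_accuracy(target, n=3, k=2):
    """Train on the k-dimensional slice (last n-k coordinates = +1), test on the full cube."""
    cube = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    train = cube[np.all(cube[:, k:] == 1.0, axis=1)]
    subsets, coef = lowest_degree_fit(train, target(train))
    pred = np.sign(features(cube, subsets) @ coef)
    return float(np.mean(pred == target(cube)))

# Target that only uses coordinates seen to vary during training.
acc_seen = ldhd_accuracy(lambda X: X[:, 0] * X[:, 1])
# Target that also depends on a coordinate that was constant during training.
acc_unseen = ldhd_accuracy(lambda X: X[:, 0] * X[:, 2])
print(acc_seen, acc_unseen)
```

Under these assumptions the first target is recovered on the whole cube (accuracy 1.0), while the second is fit on the training slice as x_1 alone and agrees with the true function on only half of the cube (accuracy 0.5). This is a toy instance of the abstract's claim that LDHD generalization succeeds only when the target matches the learner's inductive bias.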

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2410.08898 [cs.LG]
	(or arXiv:2410.08898v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.08898

Submission history

From: Yang Chen
[v1] Fri, 11 Oct 2024 15:18:43 UTC (111 KB)
[v2] Mon, 9 Jun 2025 07:19:42 UTC (275 KB)