R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Li, Zhongyang; Li, Ziyue; Zhou, Tianyi

Computer Science > Machine Learning

arXiv:2502.20395 (cs)

[Submitted on 27 Feb 2025 (v1), last revised 1 Mar 2025 (this version, v2)]

Abstract: In large multimodal models (LMMs), the perception of non-language modalities (e.g., visual representations) is usually not on par with the powerful reasoning capabilities of large language models (LLMs), which limits LMMs' performance on challenging downstream tasks. This weakness has recently been mitigated by replacing the vision encoder with a mixture-of-experts (MoE), which provides the rich, multi-granularity, and diverse representations that different downstream tasks require. The performance of a multimodal MoE depends largely on its router, which reweights and mixes the representations of different experts for each input. However, we find that the end-to-end trained router does not always produce the optimal routing weights for every test sample. To bridge this gap, we propose a novel and efficient method, "Re-Routing in Test-Time (R2-T2)", that locally optimizes the vector of routing weights at test time by moving it toward the routing vectors of correctly predicted samples in a neighborhood of the test sample. We propose three R2-T2 strategies with different optimization objectives and neighbor-search spaces. R2-T2 consistently and substantially improves state-of-the-art LMMs' performance on challenging benchmarks of diverse tasks, without training any base-model parameters.
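To make the core re-routing idea concrete, here is a minimal Python sketch of one simple way to move a test sample's routing weights toward those of its correctly predicted neighbors. It is an illustration under stated assumptions, not a reproduction of any of the paper's three strategies: it assumes a hypothetical reference set of sample embeddings and routing vectors already restricted to correctly predicted samples, uses cosine similarity to define the neighborhood, applies a softmax kernel over the k nearest neighbors, and takes a single interpolation step. The function name `reroute` and the parameters `k`, `step_size`, and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reroute(
    test_embedding: Sequence[float],
    test_routing: Sequence[float],
    ref_embeddings: Sequence[Sequence[float]],
    ref_routings: Sequence[Sequence[float]],
    k: int = 3,
    step_size: float = 0.5,
    temperature: float = 0.1,
) -> List[float]:
    """Hypothetical sketch: step the test routing vector toward a kernel-weighted
    average of its neighbors' routing vectors.

    Assumes ref_embeddings / ref_routings contain only reference samples that the
    model predicted correctly; this setup is illustrative, not the paper's code.
    """
    if not ref_embeddings:
        return list(test_routing)
    # 1. Find the k reference samples most similar to the test sample.
    sims = [cosine_similarity(test_embedding, e) for e in ref_embeddings]
    neighbors = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # 2. Softmax kernel over neighbor similarities (shifted by the max for stability).
    top = max(sims[i] for i in neighbors)
    kernel = [math.exp((sims[i] - top) / temperature) for i in neighbors]
    total = sum(kernel)
    # 3. Target routing vector: kernel-weighted mean of the neighbors' routing vectors.
    num_experts = len(test_routing)
    target = [
        sum(w * ref_routings[i][j] for w, i in zip(kernel, neighbors)) / total
        for j in range(num_experts)
    ]
    # 4. Interpolate from the current routing weights toward the target.
    updated = [(1 - step_size) * r + step_size * t for r, t in zip(test_routing, target)]
    # Renormalize so the weights remain a distribution over experts.
    s = sum(updated)
    return [u / s for u in updated]


if __name__ == "__main__":
    # Two experts; both nearest neighbors route entirely to expert 1.
    new_weights = reroute(
        test_embedding=[1.0, 0.0],
        test_routing=[0.5, 0.5],
        ref_embeddings=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
        ref_routings=[[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
        k=2,
    )
    print(new_weights)  # [0.25, 0.75]
```

Because the updated vector is a convex combination of probability vectors, it stays a valid distribution over experts; `step_size` trades off the router's original decision against the neighbors' routing. Only the routing weights change, consistent with the abstract's point that no base-model parameters are trained.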

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2502.20395 [cs.LG]
	(or arXiv:2502.20395v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.20395

Submission history

From: Zhongyang Li
[v1] Thu, 27 Feb 2025 18:59:32 UTC (7,316 KB)
[v2] Sat, 1 Mar 2025 02:17:00 UTC (7,316 KB)
