Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

Hemker, Konstantin; Simidjievski, Nikola; Jamnik, Mateja

arXiv:2405.19950 (cs)

[Submitted on 30 May 2024 (v1), last revised 16 Apr 2025 (this version, v2)]

Abstract: Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, graphs, time series, or tabular data. While there are many available multimodal fusion and alignment approaches, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents Multimodal Lego (MM-Lego), a general-purpose fusion framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We achieve this by introducing a wrapper for any unimodal encoder that enforces shape consistency between modality representations. It harmonises these representations by learning features in the frequency domain to enable model merging with little signal interference. We show that MM-Lego 1) can be used as a model merging method which achieves competitive performance with end-to-end fusion models without any fine-tuning, 2) can operate on any unimodal encoder, and 3) is a model fusion method that, with minimal fine-tuning, surpasses all benchmarks in five out of seven datasets.
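The abstract names three ingredients (a shape-enforcing wrapper, frequency-domain features, and merging) without giving their details. The following is only an illustrative sketch of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption: the names `ShapeWrapper` and `merge_latents` are hypothetical, a fixed random projection stands in for whatever the wrapper actually learns, and the merge rule (averaging magnitude and phase per frequency bin) is chosen purely for illustration. A naive DFT is used so the example needs only the standard library.

```python
import cmath
import random


def dft(x):
    """Naive discrete Fourier transform of a sequence (O(n^2))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


class ShapeWrapper:
    """Hypothetical wrapper mapping any encoder output to a fixed-length latent.

    Assumption: a fixed random linear projection stands in for a learned one.
    """

    def __init__(self, encoder, in_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = encoder
        self.weights = [[rng.gauss(0.0, 1.0) / in_dim ** 0.5 for _ in range(in_dim)]
                        for _ in range(latent_dim)]

    def __call__(self, x):
        h = self.encoder(x)
        return [sum(w * v for w, v in zip(row, h)) for row in self.weights]


def merge_latents(latents):
    """Merge same-shaped latents in the frequency domain.

    Assumed rule (illustrative only): per frequency bin, take the arithmetic
    mean of the magnitudes and of the phases, then invert and keep the real part.
    """
    spectra = [dft(z) for z in latents]
    count = len(spectra)
    merged = []
    for k in range(len(spectra[0])):
        magnitude = sum(abs(s[k]) for s in spectra) / count
        phase = sum(cmath.phase(s[k]) for s in spectra) / count
        merged.append(cmath.rect(magnitude, phase))
    return [value.real for value in idft(merged)]


# Two toy "encoders" with different output sizes (stand-ins for real models).
tabular_encoder = lambda x: x          # 4 features
sequence_encoder = lambda x: x[:3]     # 3 features

wrap_tabular = ShapeWrapper(tabular_encoder, in_dim=4, latent_dim=8, seed=0)
wrap_sequence = ShapeWrapper(sequence_encoder, in_dim=3, latent_dim=8, seed=1)

z_tabular = wrap_tabular([1.0, 2.0, 3.0, 4.0])
z_sequence = wrap_sequence([0.5, -1.0, 2.0])
z_merged = merge_latents([z_tabular, z_sequence])
```

Because both wrapped encoders emit latents of the same length, the merge step never needs to know which modality a latent came from. Merging a latent with itself returns that latent (up to floating-point error), which is a quick sanity check for any merge rule of this kind.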

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.19950 [cs.LG]
	(or arXiv:2405.19950v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.19950

Submission history

From: Konstantin Hemker
[v1] Thu, 30 May 2024 11:14:01 UTC (9,702 KB)
[v2] Wed, 16 Apr 2025 16:43:35 UTC (2,385 KB)
