Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Sun, Lingchen; Wu, Rongyuan; Ma, Zhiyuan; Liu, Shuaizheng; Yi, Qiaosi; Zhang, Lei

arXiv:2412.03017 (cs)

[Submitted on 4 Dec 2024 (v1), last revised 3 Apr 2025 (this version, v2)]

Abstract: Diffusion prior-based methods have shown impressive results in real-world image super-resolution (SR). However, most existing methods entangle pixel-level and semantic-level SR objectives during training and therefore struggle to balance pixel-wise fidelity and perceptual quality. Meanwhile, users have varying preferences for SR results, which calls for an adjustable SR model that can be tailored to different fidelity-perception preferences at inference time without re-training. We present Pixel-level and Semantic-level Adjustable SR (PiSA-SR), which learns two LoRA modules on top of the pre-trained Stable Diffusion (SD) model to achieve improved and adjustable SR results. We first formulate the SD-based SR problem as learning the residual between the low-quality input and the high-quality output, then show that the learning objective can be decoupled into two distinct LoRA weight spaces: one is characterized by the $\ell_2$ loss for pixel-level regression, and the other by the LPIPS and classifier score distillation losses, which extract semantic information from pre-trained classification and SD models. In its default setting, PiSA-SR runs in a single diffusion step, achieving leading real-world SR results in both quality and efficiency. By introducing two adjustable guidance scales on the two LoRA modules to control the strengths of pixel-wise fidelity and semantic-level details during inference, PiSA-SR can offer flexible SR results according to user preference without re-training. Code and models can be found at this https URL.
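
The abstract describes residual learning plus two guidance scales but does not give the combination formula. The following is a minimal sketch of one plausible reading, in the style of classifier-free guidance: the output is the low-quality input minus a weighted mix of a pixel-level residual and the extra residual contributed by the semantic module. The function `combine_residuals`, its parameter names, and the toy predictors are hypothetical illustrations, not the authors' implementation.

```python
from typing import Callable, List

Vector = List[float]


def combine_residuals(
    z_low: Vector,
    pixel_residual: Callable[[Vector], Vector],
    full_residual: Callable[[Vector], Vector],
    pixel_scale: float = 1.0,
    semantic_scale: float = 1.0,
) -> Vector:
    """Return z_low minus a guided mix of two predicted residuals.

    Assumption (not taken from the abstract):
    pixel_residual stands for a pass with only the pixel-level LoRA active,
    full_residual for a pass with both LoRA modules active.
    """
    r_pix = pixel_residual(z_low)
    r_full = full_residual(z_low)
    out = []
    for z, rp, rf in zip(z_low, r_pix, r_full):
        semantic_part = rf - rp  # what the semantic module adds on top of the pixel one
        out.append(z - (pixel_scale * rp + semantic_scale * semantic_part))
    return out


# Toy stand-ins for the two network passes (not real models).
def toy_pixel(z: Vector) -> Vector:
    return [0.5 * v for v in z]


def toy_full(z: Vector) -> Vector:
    return [0.5 * v + 0.1 for v in z]


z_low = [1.0, 2.0]
default = combine_residuals(z_low, toy_pixel, toy_full)
pixel_only = combine_residuals(z_low, toy_pixel, toy_full, semantic_scale=0.0)
```

With both scales at 1.0 the mix reduces to the full residual, which corresponds to a single default pass. Setting `semantic_scale` to 0.0 keeps only the pixel-level correction, and raising it above 1.0 would amplify the semantic contribution under this assumed formulation.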

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.03017 [cs.CV]
	(or arXiv:2412.03017v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.03017

Submission history

From: Lingchen Sun
[v1] Wed, 4 Dec 2024 04:07:49 UTC (38,133 KB)
[v2] Thu, 3 Apr 2025 07:58:27 UTC (22,320 KB)
