Ultra-Resolution Adaptation with Ease
Ruonan Yu*, Songhua Liu*, Zhenxiong Tan, and Xinchao Wang
xML Lab, National University of Singapore
- Easy-to-Use High-Quality and High-Resolution Generation😊: Ultra-Resolution Adaptation with Ease, or URAE in short, generates high-resolution images with FLUX, with minimal code modifications.
- Easy Training🚀: URAE tames light-weight adapters with only a handful of synthetic samples from FLUX1.1 Pro Ultra.
[2025/05/01] URAE is accepted by ICML2025! 🎉
[2025/03/20] We release models and codes for both training and inference of URAE.
Text-to-image diffusion models have achieved remarkable progress in recent years. However, training models for high-resolution image generation remains challenging, particularly when training data and computational resources are limited. In this paper, we explore this practical problem from two key perspectives: data and parameter efficiency, and propose a set of key guidelines for ultra-resolution adaptation termed URAE. For data efficiency, we theoretically and empirically demonstrate that synthetic data generated by some teacher models can significantly promote training convergence. For parameter efficiency, we find that tuning minor components of the weight matrices outperforms widely-used low-rank adapters when synthetic data are unavailable, offering substantial performance gains while maintaining efficiency. Additionally, for models leveraging guidance distillation, such as FLUX, we show that disabling classifier-free guidance, i.e., setting the guidance scale to 1 during adaptation, is crucial for satisfactory performance. Extensive experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations, while setting new benchmarks for 4K-resolution generation.
- If you have not already, install PyTorch, diffusers, transformers, and peft.
- Clone this repo to your project directory:

  ```bash
  git clone https://github.com/Huage001/URAE.git
  cd URAE
  ```
- You only need minimal modifications!

  ```diff
    import torch
  - from diffusers import FluxPipeline
  + from pipeline_flux import FluxPipeline
  + from transformer_flux import FluxTransformer2DModel

    bfl_repo = "black-forest-labs/FLUX.1-dev"
  + transformer = FluxTransformer2DModel.from_pretrained(bfl_repo, subfolder="transformer", torch_dtype=torch.bfloat16)
  - pipe = FluxPipeline.from_pretrained(bfl_repo, torch_dtype=torch.bfloat16)
  + pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=transformer, torch_dtype=torch.bfloat16)
  + pipe.scheduler.config.use_dynamic_shifting = False
  + pipe.scheduler.config.time_shift = 10
    pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
  + pipe.load_lora_weights("Huage001/URAE", weight_name="urae_2k_adapter.safetensors")

    prompt = "An astronaut riding a green horse"
    image = pipe(
        prompt,
  -     height=1024,
  -     width=1024,
  +     height=2048,
  +     width=2048,
        guidance_scale=3.5,
        num_inference_steps=50,
        max_sequence_length=512,
        generator=torch.Generator("cpu").manual_seed(0)
    ).images[0]
    image.save("flux-urae.png")
  ```
⚠️ FLUX requires at least 28GB of GPU memory to operate at 2K resolution. A 48GB GPU is recommended for the full functionality of URAE, including both 2K and 4K. We are actively integrating model lightweighting strategies into URAE! If you have a good idea, feel free to submit a PR!
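In the meantime, if you are short on VRAM, diffusers' built-in memory savers may help at the cost of speed. A rough sketch, applied to the `pipe` object from the example above:

```python
# Optional memory savers (slower, but much lower peak VRAM).
pipe.enable_sequential_cpu_offload()  # use instead of pipe.enable_model_cpu_offload()
# Tile and slice the VAE so decoding a 2K/4K image does not spike memory.
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()
```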
Do not want to run the code yourself? No worries! Try the model on Hugging Face Spaces!
- URAE w. FLUX1.schnell (Faster)
- URAE w. FLUX1.dev (Higher Quality)
- Clone this repo to your project directory:

  ```bash
  git clone https://github.com/Huage001/URAE.git
  cd URAE
  ```
- URAE has been tested on `torch==2.5.1` and `diffusers==0.31.0`, but it should also be compatible with similar versions. You can set up a new environment if you wish and install the packages listed in `requirements.txt`:

  ```bash
  conda create -n URAE python=3.12
  conda activate URAE
  pip install -r requirements.txt
  ```
- We use LoRA adapters to adapt FLUX.1-dev to 2K resolution. You can try the corresponding URAE model in `inference_2k.ipynb`.
- We also support FLUX.1-schnell for faster inference. Please refer to `inference_2k_schnell.ipynb`.
- Instead of LoRA, we use minor-component adapters at the 4K stage. You can try the models for FLUX.1-dev and FLUX.1-schnell in `inference_4k.ipynb` and `inference_4k_schnell.ipynb`.
- Alternatively, although these adapters differ from LoRA, they can also be converted to the LoRA format, so that the interfaces provided by `peft` can be used for more convenient loading, as sketched below. If you only want to try the models instead of understanding the principle, `inference_4k_lora_conversion.ipynb` and `inference_4k_lora_conversion_schnell.ipynb` are what you want! Their outputs should be equivalent to the above counterparts.
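The conversion is possible because a weight update of bounded rank can always be re-factored into the two LoRA matrices. Below is a minimal sketch of the idea, assuming a hypothetical helper `delta_to_lora` and a known adapter rank; the actual per-layer key naming and export logic live in the conversion notebooks.

```python
import torch

def delta_to_lora(delta_w: torch.Tensor, rank: int):
    """Hypothetical helper: factor a rank-limited weight update into LoRA (A, B).

    delta_w: (out_features, in_features) difference between the adapted weight
    and the original weight produced by the minor-component adapter.
    """
    # SVD of the update; if the update truly has rank <= `rank`, truncation is lossless.
    u, s, vh = torch.linalg.svd(delta_w.float(), full_matrices=False)
    u, s, vh = u[:, :rank], s[:rank], vh[:rank, :]
    # Standard LoRA parameterization: delta_w ≈ lora_B @ lora_A
    lora_B = u * s.sqrt()                # (out_features, rank)
    lora_A = s.sqrt().unsqueeze(1) * vh  # (rank, in_features)
    return lora_A, lora_B
```

In principle, collecting these factors for every adapted layer under the usual LoRA key names yields a state dict that `pipe.load_lora_weights` can consume.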
🚧 The 4K model is still in beta, and its performance may not be stable. A more reliable use case for the current 4K model is to integrate it with training-free high-resolution generation pipelines based on coarse-to-fine strategies, such as SDEdit (refer to this repo for a sample usage) and I-Max, loading the 4K adapter at the high-resolution stages.
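As an illustration of this coarse-to-fine usage, the sketch below upsamples a coarse result (e.g., the 2K image from the inference example above) and refines it at 4K with an SDEdit-style image-to-image pass via diffusers' `FluxImg2ImgPipeline`. The adapter directory and file name as well as the `strength` value are placeholders; the repo's notebooks and the I-Max integration remain the reference usage.

```python
import torch
from diffusers import FluxImg2ImgPipeline
from diffusers.utils import load_image

bfl_repo = "black-forest-labs/FLUX.1-dev"

# SDEdit-style refinement: start from an upsampled coarse image instead of pure noise.
pipe = FluxImg2ImgPipeline.from_pretrained(bfl_repo, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
# Load the 4K adapter converted to LoRA format; directory and file name are placeholders.
pipe.load_lora_weights("path/to/4k_adapter_dir", weight_name="urae_4k_lora.safetensors")

coarse = load_image("flux-urae.png").resize((4096, 4096))  # e.g., the 2K result from above
prompt = "An astronaut riding a green horse"
image = pipe(
    prompt,
    image=coarse,
    strength=0.6,  # placeholder; lower values stay closer to the coarse image
    height=4096,
    width=4096,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-urae-4k.png")
```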
- Prepare training data.
- To train the 2K model, we collect 3,000 images generated by FLUX1.1 Pro Ultra. You can use your own API access and follow the instructions on acquiring images.
- To train the 4K model, we use ~16,000 images with resolution greater than 4K from LAION-High-Resolution.
- The training data folder should be organized in the following format:

  ```
  train_data
  ├── image_0.jpg
  ├── image_0.json
  ├── image_1.jpg
  ├── image_1.json
  ├── ...
  ```
  The JSON file should contain a dictionary with entries `prompt` and/or `generated_prompt`, specifying the original caption and a caption with detailed descriptions generated by a VLM such as GPT-4o or Llama 3, respectively (an example is sketched below).
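  For illustration, a metadata file might look as follows; this is a minimal sketch where the captions are placeholders and only the keys matter (written from Python purely for convenience):

  ```python
  import json

  # Hypothetical contents of train_data/image_0.json; the captions are placeholders.
  metadata = {
      "prompt": "an astronaut riding a green horse",  # original caption
      "generated_prompt": (
          "A detailed photo of an astronaut in a white space suit riding a "
          "green horse across a rocky desert under a clear blue sky."
      ),  # detailed re-caption from a VLM such as GPT-4o
  }
  with open("train_data/image_0.json", "w") as f:
      json.dump(metadata, f, indent=2)
  ```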
- The T5 and VAE can take a large amount of GPU memory, which can trigger OOM when training at high resolutions. Therefore, we pre-cache the T5 and VAE features instead of computing them online:

  ```bash
  bash cache_prompt_embeds.sh
  bash cache_latent_codes.sh
  ```
  Make sure to modify these bash files beforehand and configure the number of parallel processes (`$NUM_WORKERS`), the training data folder (`$DATA_DIR`), the target resolution (`--resolution`), and the key to the prompt in the JSON files (`--column`, which can be `prompt` or `generated_prompt`). A rough sketch of what gets cached follows.
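  Roughly, the caching scripts perform something like the following for every image/caption pair. This is only an illustrative sketch: the exact safetensors key names and preprocessing are defined by the repo's scripts, so refer to `cache_prompt_embeds.sh` and `cache_latent_codes.sh` for the authoritative behavior.

  ```python
  import torch
  from diffusers import AutoencoderKL, FluxPipeline
  from safetensors.torch import save_file
  from torchvision import transforms
  from PIL import Image

  repo = "black-forest-labs/FLUX.1-dev"

  # Cache CLIP/T5 prompt embeddings; the transformer and VAE are not needed here.
  pipe = FluxPipeline.from_pretrained(repo, transformer=None, vae=None, torch_dtype=torch.bfloat16).to("cuda")
  prompt_embeds, pooled_embeds, _ = pipe.encode_prompt(prompt="a sample caption", prompt_2=None, max_sequence_length=512)
  save_file(
      {"prompt_embeds": prompt_embeds.cpu(), "pooled_prompt_embeds": pooled_embeds.cpu()},
      "train_data/image_0_prompt_embed.safetensors",
  )

  # Cache the VAE latent code of one training image.
  vae = AutoencoderKL.from_pretrained(repo, subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
  image = Image.open("train_data/image_0.jpg").convert("RGB")
  pixels = transforms.ToTensor()(image).unsqueeze(0).to("cuda", torch.bfloat16) * 2 - 1  # scale to [-1, 1]
  with torch.no_grad():
      latent = vae.encode(pixels).latent_dist.sample()
      latent = (latent - vae.config.shift_factor) * vae.config.scaling_factor
  save_file({"latent_code": latent.cpu()}, "train_data/image_0_latent_code.safetensors")
  ```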
- The final format of the training data folder should be:

  ```
  train_data
  ├── image_0_generated_prompt_embed.safetensors
  ├── image_0.jpg
  ├── image_0.json
  ├── image_0_latent_code.safetensors
  ├── image_0_prompt_embed.safetensors
  ├── image_1_generated_prompt_embed.safetensors
  ├── image_1.jpg
  ├── image_1.json
  ├── image_1_latent_code.safetensors
  ├── image_1_prompt_embed.safetensors
  ├── ...
  ```
- Let's start training! For the 2K model, make sure to modify `train_2k.sh` and configure the training data folder (`$DATA_DIR`), the output folder (`$OUTPUT_DIR`), and the number of GPUs (`--num_processes`).

  ```bash
  bash train_2k.sh
  ```
- The 4K model is based on the 2K LoRA trained previously. Make sure to modify `train_4k.sh` to configure its path (`--pretrained_lora`) in addition to the aforementioned items.

  ```bash
  bash train_4k.sh
  ```

⚠️ Currently, if the training images have different resolutions, `--train_batch_size` can only be 1 because we did not customize the data sampler to handle this case. Nevertheless, in practice, at such large resolutions the maximal batch size per GPU (before gradient accumulation) can only be 1 even on 80GB GPUs 😂.
- FLUX for the source models.
- diffusers, CLEAR, and I-Max for the code base.
- patch_conv for the solution to the VAE OOM error at high resolutions.
- @Xinyin Ma for valuable discussions.
- NUS IT's Research Computing group, under grant number NUSREC-HPC-00001.
If you find this repo helpful, please consider citing:
@article{yu2025urae,
title = {Ultra-Resolution Adaptation with Ease},
author = {Yu, Ruonan and Liu, Songhua and Tan, Zhenxiong and Wang, Xinchao},
journal = {International Conference on Machine Learning},
year = {2025},
}