PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Rai, Nitin; Schumann, Arnold W.; Boyd, Nathan

doi:10.48550/arXiv.2505.01823

Computer Science > Computer Vision and Pattern Recognition

arXiv:2505.01823 (cs)

[Submitted on 3 May 2025]

Title:PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Authors:Nitin Rai, Arnold W. Schumann, Nathan Boyd

View PDF HTML (experimental)

Abstract:Collecting large-scale crop disease images in the field is labor-intensive and time-consuming. Generative models (GMs) offer an alternative by creating synthetic samples that resemble real-world images. However, existing research primarily relies on Generative Adversarial Networks (GANs)-based image-to-image translation and lack a comprehensive analysis of computational requirements in agriculture. Therefore, this research explores a multi-modal text-to-image approach for generating synthetic crop disease images and is the first to provide computational benchmarking in this context. We trained three Stable Diffusion (SD) variants-SDXL, SD3.5M (medium), and SD3.5L (large)-and fine-tuned them using Dreambooth and Low-Rank Adaptation (LoRA) fine-tuning techniques to enhance generalization. SD3.5M outperformed the others, with an average memory usage of 18 GB, power consumption of 180 W, and total energy use of 1.02 kWh/500 images (0.002 kWh per image) during inference task. Our results demonstrate SD3.5M's ability to generate 500 synthetic images from just 36 in-field samples in 1.5 hours. We recommend SD3.5M for efficient crop disease data generation.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Cite as:	arXiv:2505.01823 [cs.CV]
	(or arXiv:2505.01823v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2505.01823
Related DOI:	https://doi.org/10.48550/arXiv.2505.01823

Submission history

From: Nitin Rai [view email]
[v1] Sat, 3 May 2025 14:03:42 UTC (2,687 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:PhytoSynth: Leveraging Multi-modal Generative Models for Crop Disease Data Generation with Novel Benchmarking and Prompt Engineering Approach

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators