A Tiered GAN Approach For Monet-Style Image Generation
FNU Neha, Dept. of Computer Science, Kent State University, Kent, OH, USA ([email protected])
Deepshikha Bhati, Dept. of Computer Science, Kent State University, Kent, OH, USA ([email protected])
Deepak Kumar Shukla, Rutgers Business School, Rutgers University, Newark, New Jersey, USA ([email protected])
Md Amiruzzaman, Dept. of Computer Science, West Chester University, West Chester, PA, USA ([email protected])
arXiv:2412.05724v1 [cs.CV] 7 Dec 2024
$$\max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big], \quad (6)$$

$$D(x; \theta_D), \quad (2)$$
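Equation (6) is the standard discriminator objective: the discriminator is rewarded for assigning high probability to real samples and low probability to generated ones. As a minimal numerical sketch (not code from the paper; the function name and sample values are illustrative), the objective can be estimated from a batch of discriminator outputs:

```python
import numpy as np

def discriminator_objective(d_real, d_fake):
    """Monte Carlo estimate of Eq. (6): E[log D(x)] + E[log(1 - D(G(z)))],
    where d_real are D's probabilities on real images and d_fake on generated ones."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A confused discriminator (0.5 on everything) gives 2*log(0.5) ~ -1.386;
# the objective increases as D separates real from fake.
value_confused = discriminator_objective([0.5, 0.5], [0.5, 0.5])
value_sharp = discriminator_objective([0.9, 0.95], [0.1, 0.05])
```

The generator is trained against this same quantity, pushing $D(G(z))$ upward, which is what produces the adversarial loss curves discussed below.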
Fig. 6: Training loss curves for the discriminator (d loss) and generator (g loss) across four phases (a-d), showing the adversarial dynamics and progressive model refinement. Panels include (c) M2 to M1 training loss and (d) M1 to MF training loss.

constant values of 1, highlighting the generator's inability to learn meaningful features. This required a thorough re-evaluation of the model's architecture and training strategy. Figure 6 shows the training loss for each GAN tier, with a high initial loss, particularly for the generator, that gradually decreases as the generator and discriminator learn to balance each other. Specifically, Figure 6c highlights the continued refinement of the model during this phase, where the generator and discriminator show improved synchronization, reducing loss and producing more coherent outputs.

B. Model Re-evaluation and Alternative Approaches

Issues were identified in the initial tiered GAN models, particularly with the downsampling convolution layers in the generator. To address these challenges, two alternative approaches were explored: first, using a small array of 128 random values and upsampling them to the target image size, as suggested by [22]; second, replacing the input dense layers with convolutional layers to reduce memory usage and support larger input sizes.

Initial attempts with standard Conv2D layers proved ineffective. Drawing on methods from [21] and [24], the approach was adjusted to use deconvolution layers (Conv2DTranspose) for upsampling, along with batch normalization to improve training stability and reduce convergence time. Despite these modifications, the generated images lacked clarity and artistic resemblance to Monet's style.

The use of the smaller 128-element input array with upsampling showed better results. While clarity remained a limitation, the generated images exhibited recognizable patterns of light and dark areas, indicating that the model had started learning relevant features from the dataset. Figures 7 and 8 show the training loss and an example of the generated images for this approach.

Fig. 8: Generated Image from Small Input Vector with Upsampling

The generated images were evaluated for their resemblance to Monet's style and their ability to capture artistic details. Quantitative metrics such as training loss monitored model performance, while human evaluation ensured alignment with Monet's unique style. This combination of automated metrics and human assessment identified areas for further refinement in the model's architecture and training.
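The upsampling path described above can be sketched framework-agnostically. The paper uses Keras Conv2DTranspose layers with batch normalization; the NumPy implementation below, the 8x16 reshape of the 128-element latent, the 4x4 kernel, and the two-step growth are illustrative assumptions, not the paper's exact architecture. It shows the core mechanism: a stride-2 transposed convolution grows spatial resolution according to stride*(in - 1) + kernel_size, so a tiny latent array is progressively expanded toward the target image size.

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=2):
    """Minimal single-channel transposed convolution ('valid' padding).
    Each input pixel scatters a scaled copy of the kernel into the output,
    giving output size stride*(in - 1) + kernel_size per dimension."""
    in_h, in_w = x.shape
    k = kernel.shape[0]
    out = np.zeros((stride * (in_h - 1) + k, stride * (in_w - 1) + k))
    for i in range(in_h):
        for j in range(in_w):
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

def batch_norm(x, eps=1e-5):
    """Normalize activations to roughly zero mean / unit variance,
    the stabilization step applied after each upsampling layer."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

# Grow a 128-element latent vector (reshaped to 8x16) through two upsampling steps.
rng = np.random.default_rng(0)
z = rng.normal(size=128).reshape(8, 16)
kernel = rng.normal(size=(4, 4))
h = batch_norm(conv2d_transpose(z, kernel))   # 8x16  -> 18x34
h = batch_norm(conv2d_transpose(h, kernel))   # 18x34 -> 38x70
```

In the actual model the kernels are learned and each layer has many channels, but the shape arithmetic and the normalize-after-upsample pattern are the same.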
VI. CONCLUSION AND FUTURE WORK

This research introduced a tiered GAN architecture that employs multiple GANs sequentially to enhance image quality, transforming low-quality images into refined representations of Monet's style. The training methodology efficiently handled large images using downsampling and convolutional layers, enabling high-quality artistic generation with limited computational resources.

Experimental results were mixed; while the system showed potential, it struggled to fully capture the intricacies of Monet's style. The limited dataset of 300 images likely constrained the model's ability to learn complex artistic features. A larger and more diverse dataset could improve the model's performance and learning capability.

Future work will address these challenges through three strategies: (1) using larger datasets, augmented with bootstrap aggregating (bagging), to enhance prediction stability and robustness [25]; (2) employing distributed computing inspired by Firebase for efficient processing and synchronization of large datasets [26]; and (3) incorporating pre-trained models via transfer learning to accelerate convergence and better capture Monet's artistic style.

These approaches aim to refine the tiered GAN system by leveraging real-time data handling and distributed computing, addressing current limitations in computational resources and dataset size.

REFERENCES

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
[2] C. Hu, T. Tu, Y. Gong, J. Jiang, Z. Zheng, and D. Cheng, "Tackling multiplayer interaction for federated generative adversarial networks," IEEE Transactions on Mobile Computing, 2024.
[3] Y. Chang, "Enhancing super resolution of oil painting patterns through optimization of unet architecture model," Soft Computing, vol. 28, no. 2, pp. 1295-1316, 2024.
[4] Z. Cai, Z. Xiong, H. Xu, P. Wang, W. Li, and Y. Pan, "Generative adversarial networks: A survey toward private and secure applications," ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1-38, 2021.
[5] P. Sharma, M. Kumar, H. K. Sharma, and S. M. Biju, "Generative adversarial networks (gans): Introduction, taxonomy, variants, limitations, and applications," Multimedia Tools and Applications, pp. 1-48, 2024.
[6] D. Bhati, F. Neha, and M. Amiruzzaman, "A survey on explainable artificial intelligence (XAI) techniques for visualizing deep learning models in medical imaging," Journal of Imaging, vol. 10, no. 10, p. 239, 2024.
[7] N. Shi, Z. Chen, L. Chen, and R. S. Lee, "Relu-oscillator: Chaotic vgg10 model for real-time neural style transfer on painting authentication," Expert Systems with Applications, p. 124510, 2024.
[8] A. Jang, A. S. Uzsoy, and P. Culliton, "I'm something of a painter myself," https://www.kaggle.com/competitions/gan-getting-started, 2020.
[9] L. Vela, F. Fuentes-Hurtado, and A. Colomer, "Improving the quality of image generation in art with top-k training and cyclic generative methods," Scientific Reports, vol. 13, no. 1, p. 17764, 2023.
[10] X. Jin, "[retracted] art style transfer of oil painting based on parallel convolutional neural network," Security and Communication Networks, vol. 2022, no. 1, p. 5087129, 2022.
[11] M. Mirza, "Conditional generative adversarial nets," arXiv preprint arXiv:1411.1784, 2014.
[12] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232.
[13] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, "Analyzing and improving the image quality of stylegan," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[14] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of gans for improved quality, stability, and variation," 2018. [Online]. Available: https://arxiv.org/abs/1710.10196
[15] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, "Photo-realistic single image super-resolution using a generative adversarial network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105-114.
[16] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2414-2423.
[17] W. R. Tan, C. S. Chan, H. Aguirre, and K. Tanaka, "Artgan: Artwork synthesis with conditional categorical gans," 2017. [Online]. Available: https://arxiv.org/abs/1702.03410
[18] A. Elgammal, B. Liu, M. Elhoseiny, and M. Mazzone, "Can: Creative adversarial networks, generating "art" by learning about styles and deviating from style norms," 2017. [Online]. Available: https://arxiv.org/abs/1706.07068
[19] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein gan," arXiv preprint arXiv:1701.07875, 2017.
[20] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, "Spectral normalization for generative adversarial networks," in International Conference on Learning Representations, 2018.
[21] A. Jang, "Monet cyclegan tutorial," https://www.kaggle.com/code/amyjang/monet-cyclegan-tutorial/notebook, 2020, accessed 23 July 2023.
[22] N. Renotte, "Build a generative adversarial neural network with tensorflow and python | deep learning projects," YouTube, June 2022, accessed 18 July 2023. [Online]. Available: https://www.youtube.com/watch?v=AALBGpLbj6Q
[23] PodcastPrereamea, "Cyclegan monet paintings," https://www.kaggle.com/code/podcastprereamea/cyclegan-monet-paintings/data, 2024.
[24] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of gans for improved quality, stability, and variation," in International Conference on Learning Representations, 2018.
[25] A. Darweesh, A. Abouelfarag, and R. Kadry, "Real time adaptive approach for image processing using mobile nodes," in 2018 6th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW). IEEE, 2018, pp. 158-163.
[26] R. Eltehewy, A. Abouelfarag, and S. N. Saleh, "Efficient classification of imbalanced natural disasters data using generative adversarial networks for data augmentation," ISPRS International Journal of Geo-Information, vol. 12, no. 6, p. 245, 2023.