Odisha University of Technology and Research
VisuScript: Text-to-Image Generation Using StackGAN
Minor Project, 6th Semester
Presented by:
Rahul Kumar Sahoo - 2211100132
Manasi Naik - 2211100190
Sovan Sekhar Senapati -
Project Mentor: Dr. Subhalaxmi Das
OVERVIEW
• Introduction
• Motivation
• Literature Review
• Research Gap
• Objectives
• Proposed Model
• Result Analysis
• Conclusion
• Future Work
• References
INTRODUCTION
• Generative Adversarial Networks (GANs)
GANs revolutionize data synthesis through a dual-network system, a Generator
versus a Discriminator, engaged in a competitive learning process to produce
highly realistic synthetic data; a minimal training sketch follows these bullets.
• StackGAN Innovation
StackGAN extends the GAN framework with a multi-stage
architecture, refining image generation in stages—from low-resolution
sketches to high-quality photorealistic images.
• Text-to-Image Translation
By converting natural language descriptions into detailed visual
content, StackGAN enables AI to interpret and visualize linguistic input
effectively.
• Impact and Applications
This model paves the way for breakthroughs in AI creativity, content
generation, and human-computer interaction, particularly in areas like
design, media, and assistive tech.
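As a concrete illustration of the adversarial setup described above, here is a minimal, hypothetical PyTorch training step for a generic GAN. The network definitions, dimensions, and hyperparameters are placeholder assumptions for illustration, not the project's actual implementation.

```python
import torch
import torch.nn as nn

# Placeholder networks; a real StackGAN generator and discriminator
# would be convolutional and conditioned on text embeddings.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    batch = real.size(0)
    z = torch.randn(batch, 100)  # random noise input to the generator

    # Discriminator step: real images should score 1, generated images 0.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(G(z).detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool the discriminator into scoring fakes as 1.
    opt_g.zero_grad()
    g_loss = bce(D(G(z)), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# Example usage with a random "real" batch of flattened 28x28 images in [-1, 1].
d_l, g_l = train_step(torch.rand(32, 784) * 2 - 1)
print(f"d_loss={d_l:.3f}, g_loss={g_l:.3f}")
```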
MOTIVATION
• Text-to-Image Bridging: To enable AI systems to translate natural language
descriptions into vivid and realistic images.
• High-Quality Image Generation: To generate high-resolution, photorealistic
images using a multi-stage GAN (StackGAN) framework.
• Enhanced Text Understanding: To use LSTM encoders for capturing deep
semantic and sequential meaning from text inputs.
• Training Stability & Diversity: To integrate Conditioning Augmentation (CA)
for stabilizing training and promoting output diversity.
• To support creative and practical applications in fields like design, media,
and assistive technology.
LITERATURE REVIEW
RESEARCH GAP
Limitations in Existing Models:
• Similar architectures often produce blurry or low-quality images at high
resolutions.
• The generator may fail to explore the full variety of possible images,
producing only a limited set of outputs (mode collapse).
• Most existing text-to-image models rely on conditional CNNs or Bag-of-Words
text encoders, which often fail to capture deeper contextual and semantic
relationships in complex textual descriptions, limiting the richness and
accuracy of generated images.
Need for a Solution:
• Methods like conditional CNNs and Bag-of-Words miss deep contextual meaning
in text, limiting image detail.
• There is a gap in generating high-resolution images with sharp details that
align well with complex textual descriptions.
• Hybrid architectures combining CNN and RNN capabilities are needed for
richer text-image alignment and better feature representation.
• LSTM encoders offer better contextual understanding than CNN or Bag-of-Words
encoders for complex text.
OBJECTIVES
• To design a text-to-image synthesis model using StackGAN for
generating high-resolution, realistic images.
• To implement LSTM-based text encoders for capturing deep semantic
and contextual meaning from textual descriptions.
• To integrate Conditioning Augmentation (CA) for improving training
stability and enhancing diversity in generated outputs.
• To evaluate the model's performance using loss curves and qualitative
image outputs for realism and relevance.
• To evaluate the generated images using metrics like Inception Score
(IS) and Fréchet Inception Distance (FID); a minimal evaluation sketch
follows this list.
• To explore potential improvements and real-world applications of text-
to-image generation in AI, design, and creative industries.
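The following is a minimal, hypothetical sketch of how IS and FID could be computed with the torchmetrics library; the metric settings and the random image tensors are illustrative assumptions, not the project's actual evaluation code.

```python
# Requires: pip install torchmetrics torch-fidelity
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore

# Hypothetical batches of real and generated images. These Inception-based
# metrics expect uint8 tensors in [0, 255] of shape (N, 3, H, W) by default.
real_images = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 256, 256), dtype=torch.uint8)

# FID compares Inception feature statistics of real vs. generated images.
# A small feature layer (64) keeps this toy example numerically stable.
fid = FrechetInceptionDistance(feature=64)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print("FID:", fid.compute().item())  # lower is better

# IS measures confidence and diversity of Inception predictions
# on generated images only.
inception = InceptionScore()
inception.update(fake_images)
is_mean, is_std = inception.compute()
print(f"IS: {is_mean:.2f} +/- {is_std:.2f}")  # higher is better
```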
PROPOSED MODEL
Text Encoding (LSTM)
• Processes input text sequentially, capturing
word relationships and context
• Outputs a fixed-dimensional semantic vector
representing the full description
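A minimal sketch of such an LSTM text encoder, assuming a toy vocabulary and embedding sizes chosen purely for illustration (the actual model's dimensions are not specified here):

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encodes a tokenized caption into one fixed-length semantic vector."""
    def __init__(self, vocab_size=5000, embed_dim=256, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):          # token_ids: (batch, seq_len)
        x = self.embed(token_ids)          # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)         # final hidden state: (1, batch, hidden_dim)
        return h_n.squeeze(0)              # fixed-size vector: (batch, hidden_dim)

# Example: two captions of length 8, represented as integer token ids.
encoder = TextEncoder()
captions = torch.randint(0, 5000, (2, 8))
print(encoder(captions).shape)  # torch.Size([2, 128])
```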
Conditioning Augmentation (CA)
• Adds controlled noise to text embeddings to
increase output diversity
• Ensures robustness against minor text
variations while preserving core meaning
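A minimal sketch of Conditioning Augmentation as described in the StackGAN paper: the text embedding is mapped to a mean and log-variance, and a conditioning vector is resampled from that Gaussian on every forward pass. The dimensions here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Samples a conditioning vector from a Gaussian around the text embedding."""
    def __init__(self, text_dim=128, cond_dim=64):
        super().__init__()
        # One linear layer predicts both the mean and the log-variance.
        self.fc = nn.Linear(text_dim, cond_dim * 2)

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=1)
        eps = torch.randn_like(mu)
        c = mu + eps * torch.exp(0.5 * logvar)  # reparameterization trick
        # KL term regularizes the distribution toward N(0, I) during training.
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c, kl

ca = ConditioningAugmentation()
c, kl = ca(torch.randn(2, 128))
print(c.shape, kl.item())  # torch.Size([2, 64]) and a scalar KL value
```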
Stage I (Low-Res Generation)
• Produces a 64×64 base image with correct
layout and primary colors
• Focuses on structural accuracy rather than
fine details
Stage II (High-Res Refinement)
• Enhances resolution to 256×256 while
adding realistic textures
• Uses cross-modal attention to align visual
details with text descriptions
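A highly simplified sketch of the two generator stages, assuming placeholder channel counts; the real StackGAN stages use residual blocks and text-conditioned refinement, which are omitted here for brevity.

```python
import torch
import torch.nn as nn

def upsample(in_ch, out_ch):
    """Doubles spatial resolution, then convolves to the target channel count."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )

class StageIGenerator(nn.Module):
    """Noise + conditioning vector -> rough 64x64 image (layout, primary colors)."""
    def __init__(self, z_dim=100, cond_dim=64):
        super().__init__()
        self.fc = nn.Linear(z_dim + cond_dim, 128 * 4 * 4)
        self.net = nn.Sequential(
            upsample(128, 64), upsample(64, 32),   # 4 -> 8 -> 16
            upsample(32, 16), upsample(16, 8),     # 16 -> 32 -> 64
            nn.Conv2d(8, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z, c):
        x = self.fc(torch.cat([z, c], dim=1)).view(-1, 128, 4, 4)
        return self.net(x)                         # (batch, 3, 64, 64)

class StageIIGenerator(nn.Module):
    """64x64 image + conditioning -> refined 256x256 image (textures, detail)."""
    def __init__(self, cond_dim=64):
        super().__init__()
        self.encode = nn.Conv2d(3 + cond_dim, 32, 3, padding=1)
        self.net = nn.Sequential(
            upsample(32, 16), upsample(16, 8),     # 64 -> 128 -> 256
            nn.Conv2d(8, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, img64, c):
        # Broadcast the conditioning vector over the spatial grid.
        c_map = c[:, :, None, None].expand(-1, -1, 64, 64)
        x = self.encode(torch.cat([img64, c_map], dim=1))
        return self.net(x)                         # (batch, 3, 256, 256)

z, c = torch.randn(2, 100), torch.randn(2, 64)
img64 = StageIGenerator()(z, c)
img256 = StageIIGenerator()(img64, c)
print(img64.shape, img256.shape)
```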
Datasets used: MNIST handwritten digits, CUB Birds, and Oxford-102 Flowers.
RESULT ANALYSIS
Figure: Epoch 20, real (top) vs. generated (bottom); Epoch 50, real (top) vs. generated (bottom).
Figure: Graphical representation of generator loss and discriminator loss.
CONCLUSION
• StackGAN proves effective for high-quality text-to-image synthesis.
• The two-stage architecture enhances resolution and realism.
• The approach is applicable across many domains, from design to forensics.
• StackGAN enables text-to-image synthesis that can be applied in advertising,
gaming, virtual reality, and accessibility tools.
• Improving and refining GANs leads to higher fidelity, making them more
useful in real-world deployment.
FUTURE WORK
• Use transformer-based text encoders for better
semantic understanding
• Enable real-time text-to-image generation
• Add support for multi-modal inputs (e.g., audio,
sketches)
• Train on larger datasets to handle complex scene
generation
• Integrate the model into creative tools and
applications
REFERENCES
Arya, R., Bhakuni, V. S., Joshi, D., Sharma, K., Vats, S., & Sharma, V. (2024).
Stacked Generative Adversarial Networks (StackGAN) Text-to-Image Generator.
Sahithi, Y. L., Sunny, N., Deepak, M. M. L., & Amrutha, S. (2023). Text-to-Image
Synthesis using StackGAN. In 2023 Global Conference on Information Technologies
and Communications (GCITC), Karnataka, India. IEEE.
Dhivya, K., & Navas, N. S. (2020). Text to Realistic Image Generation Using
StackGAN. In 2020 7th International Conference on Smart Structures and Systems
(ICSSS). IEEE.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. (2017).
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative
Adversarial Networks. arXiv preprint arXiv:1612.03242.
THANK YOU