Image Captioning: - A Deep Learning Approach

The document discusses image captioning using deep learning. Image captioning generates textual descriptions of images using both computer vision and natural language processing. It combines image and text processing to build useful applications. Some applications include image search tools, guidance devices, self-driving cars, and web development. The document outlines the technologies used, including Keras with TensorFlow, pretrained ResNet50 models, CNNs, LSTMs, RNNs and the Flickr_8K dataset. It shows the implementation, output, accuracy predictions over epochs, and future applications before concluding.

Uploaded by

Pallavi Bharti

IMAGE CAPTIONING

- A DEEP LEARNING APPROACH

Presented By:
Pornima Nikam (18141101)
Priyanka Gundesha (18141106)
Pallavi Bharti (18141112)

Guided By:
Prof. C. P. Garware
Introduction
Image captioning is the process of generating a textual description of an image. It uses both Natural Language Processing and Computer Vision to generate the captions. It is a multimodal topic in which we combine image and text processing to build a useful deep learning application.
Motivation

We must first understand how important image captioning is in real-world scenarios. Let us look at a few applications where this model can be a solution:

01 Image Search Tool
02 Guidance Device
03 Self Driving Cars
04 Web Development
Technologies Used
Platform, Tools and Dataset

01 Kaggle Platform
02 Keras with TensorFlow as backend
03 Pre-trained ResNet50 model
04 Flickr_8K Dataset
05 CNN
06 LSTM, RNN
CNN: Convolutional Neural Network

A Convolutional Neural Network (CNN) is a deep learning algorithm that takes an input image, assigns importance to various aspects or objects in the image, and learns to differentiate one from another.
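To make the convolution step concrete, here is a minimal sketch in plain Python (no Keras), assuming a single-channel image stored as a list of lists. The function name `conv2d_valid`, the toy image, and the hand-picked edge kernel are illustrative assumptions and are not taken from the presented project; in a real CNN the kernel weights are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x5 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-picked vertical-edge kernel (an assumption for illustration).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)
```

The feature map is zero over the flat dark region and non-zero where the patch overlaps the dark-to-bright boundary, which is one way a network can "assign importance" to a particular aspect of the image.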
Recurrent Neural Network (RNN)

RNNs are a type of neural network in which the output from the previous step is fed as input to the current step. The main feature of an RNN is its hidden state, which remembers some information about a sequence.
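The role of the hidden state can be sketched with a scalar recurrent step in plain Python. This is a simplified illustration rather than the project's code: the names `rnn_step` and `run_rnn` and the weight values are assumptions, and a practical RNN would use weight matrices and vectors instead of single numbers.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """Compute the new hidden state from the current input and the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the recurrent step, collecting every hidden state."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Only the first input is non-zero; later states still carry a trace of it.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Although the second and third inputs are zero, the hidden state stays positive and fades gradually, which is the sense in which it remembers information about the sequence.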
ResNet50: Residual Network

ResNet is short for Residual Network, a network that supports residual learning. The 50 indicates the number of layers it has. In residual learning, instead of learning a feature mapping directly, each block learns a residual. The residual can be understood simply as the feature to be learned minus the input of that layer, so the block's output is its input plus the learned residual.
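The skip connection behind residual learning can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: `residual_block`, `zero_residual`, and `small_correction` are hypothetical names, the stand-in functions replace the learned convolutional layers of a real block, and the activation that ResNet applies after the addition is omitted for brevity.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where `residual_fn` plays the role of the learned F."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """A block that has learned nothing: the residual is zero everywhere."""
    return [0.0 for _ in x]


def small_correction(x):
    """A stand-in for learned layers that nudge each input by 10%."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)      # input passes through unchanged
corrected_out = residual_block(x, small_correction)  # input plus a small residual

print(identity_out)
print(corrected_out)
```

When the residual is zero the block reduces to the identity, which is one reason very deep stacks of such blocks remain trainable: each block only has to learn a correction to its input.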
Flickr_8K

It contains 8000 images, most of them featuring people and animals in a state of action. Each image is provided with five different captions.

LSTM

For generating the captions, we make use of Long Short-Term Memory (LSTM) networks. LSTMs are a variant of Recurrent Neural Networks which are widely used in Natural Language Processing.
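How an LSTM differs from a plain RNN can be sketched with one scalar cell step in plain Python, using the standard forget, input, and output gates. This is not the Keras layer used in the project: the function `lstm_step`, the `params` layout, and all weight values are assumptions chosen only to make the gating visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps each gate name to a (w_x, w_h, b) tuple (hypothetical layout).
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much old memory to keep
    i = sigmoid(pre_activation("input"))        # how much new information to write
    o = sigmoid(pre_activation("output"))       # how much memory to expose
    g = math.tanh(pre_activation("candidate"))  # candidate new memory content
    c_t = f * c_prev + i * g                    # updated cell state
    h_t = o * math.tanh(c_t)                    # updated hidden state
    return h_t, c_t


# Illustrative weights: a large forget-gate bias keeps the gate close to 1.
params = {
    "forget": (0.0, 0.0, 5.0),
    "input": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
cell_states = []
for x_t in [1.0, 0.0, 0.0]:
    h, c = lstm_step(x_t, h, c, params)
    cell_states.append(c)

print(cell_states)
```

With the forget gate near 1, the cell state written at the first step is almost fully preserved over the following steps, whereas a plain recurrent step lets it fade faster. This longer memory is useful when a caption is generated word by word.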
Implementation
Output
Accuracy and Predictions

[Figure: two plots of accuracy (y-axis) against epoch (x-axis), one covering epochs 2 to 7 and the other epochs 46 to 50.]
Future Scope

Monitoring Device
Guidance Tool
Self Driving Cars
Speech Conversion
Web Application
Conclusion

Thus we have implemented a deep learning approach for the captioning of images. The sequential API of Keras was used with TensorFlow as a backend to implement a deep learning architecture.
