
A GUIDE FOR VISUALLY IMPAIRED PEOPLE

19L038 – DEEP LEARNING

VIBOOSITHASRI N S (19L147)
SHARATH S (19L247)

Presentation submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF ENGINEERING

Branch: ELECTRONICS AND COMMUNICATION ENGINEERING
of Anna University

NOVEMBER 2022

PSG COLLEGE OF TECHNOLOGY


(Autonomous Institution)

COIMBATORE – 641 004


TABLE OF CONTENTS

CHAPTER NO.  TITLE

   ABSTRACT
1. Introduction
   1.1 Problem Statement
   1.2 Approach to the Problem Statement
2. Dataset
   2.1 Why Flickr8k dataset?
   2.2 Understanding the data
   2.3 How to Featurize images?
3. Caption Preprocessing
   3.1 Sequential Data preparation
4. Code
5. Conclusion
6. References

ABSTRACT
Navigating and understanding an outdoor environment often requires the ability to see. People with visual impairments therefore face significant challenges in exploring these environments. Deep learning has the potential to alleviate some of the frustrations they face. In this project, we assess the effectiveness of using deep learning to assist people with visual impairments.

Visually impaired and blind people frequently have no awareness of outdoor obstacles and need guidance in order to avoid collision risks. The aim of this project is to develop a mobile-based navigation system that helps visually impaired people navigate outdoors. The proposed system reduces obstacle collision risks by enabling users to walk outside smoothly with voice awareness. Currently used systems for guiding the visually impaired have several drawbacks, such as cost, dependency, and usability.

The suggested solution is a mobile camera vision system with internet connectivity, built as an independent application for indoor and outdoor navigation. The system is highly usable for guiding visually impaired people through unfamiliar environments such as parks, roads, and so on. In the presented work, deep learning algorithms are employed for image recognition and implemented as a mobile navigation application. An RNN/LSTM-based visual object recognition model is used to implement the system. The suggested smartphone-based system is not restricted to predefined outdoor environments and does not depend on any other positioning system. The proposed solution is therefore not limited to any specific environment and provides voice cues about surrounding obstacles to the user.

1. INTRODUCTION

1.1 PROBLEM STATEMENT

Blind people find it difficult to walk outdoors because they cannot see obstacles. We suggest a solution that helps visually impaired people identify obstacles when they go out alone, with the help of deep learning techniques.

Here, we propose an approach that:

1. uses a smartphone to capture real-time images as input to a deep learning model for image recognition;
2. provides a voice guide to let visually impaired people know about possible obstacles on the path.

1.2 APPROACH TO THE PROBLEM STATEMENT

We will tackle this problem using an Encoder-Decoder model. Our encoder will combine the encoded form of the image and the encoded form of the text caption and feed the result to the decoder.

Our model will treat the CNN as the 'image model' and the RNN/LSTM as the 'language model' that encodes text sequences of varying lengths. The vectors resulting from both encodings are then merged and processed by a Dense layer to make a final prediction.

We will create a merge architecture in order to keep the image out of the
RNN/LSTM and thus be able to train the part of the neural network that
handles images and the part that handles language separately, using images
and sentences from separate training sets.

In our merge model, a different representation of the image can be combined with the final RNN state before each prediction, as sketched below.
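
As a preview, the following is a minimal Keras-style sketch of this merge decoder (layer sizes match the model built later in the Code chapter; values such as vocab_size and max_length are placeholders here, since they are only computed from the data later):

from keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from keras.models import Model

vocab_size, max_length, embedding_dim = 1660, 34, 200   # placeholder values

# image branch: InceptionV3 feature vector -> dense projection
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation='relu')(Dropout(0.5)(img_in))

# language branch: partial caption -> embedding -> LSTM state
cap_in = Input(shape=(max_length,))
cap_vec = LSTM(256)(Embedding(vocab_size, embedding_dim, mask_zero=True)(cap_in))

# merge the two encodings and predict the next word
merged = add([img_vec, cap_vec])
outputs = Dense(vocab_size, activation='softmax')(Dense(256, activation='relu')(merged))

preview_model = Model(inputs=[img_in, cap_in], outputs=outputs)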

[Figure: block diagram of the proposed merge encoder-decoder approach.]

Merging the image features with the text encodings at a later stage in the architecture is advantageous and can generate better-quality captions with smaller layers than the traditional inject architecture (CNN as encoder and RNN as decoder).

To encode our image features we will make use of transfer learning. There are many models we could use, such as VGG-16, InceptionV3, and ResNet. We will use the InceptionV3 model, which has the fewest training parameters in comparison to the others and also outperforms them.

To encode our text sequence we will map every word to a 200-dimensional vector. For this we will use a pre-trained GloVe model. This mapping is done in a separate layer after the input layer, called the embedding layer.

To generate the caption we will use two popular methods, Greedy Search and Beam Search. These methods help us pick the best words to accurately describe the image.

2. DATASET

A number of datasets are used for training, testing, and evaluating image captioning methods. The datasets differ in various respects, such as the number of images, the number of captions per image, the format of the captions, and the image size. Three datasets are popularly used: Flickr8k, Flickr30k, and the MS COCO dataset.

For training our model we are using the Flickr8k dataset. It consists of 8000 unique images, and each image is mapped to five different sentences that describe it. By associating each image with multiple, independently produced sentences, the dataset captures some of the linguistic variety that can be used to describe the same image.

2.1 Why Flickr8k dataset?

1. It is small in size, so the model can be trained easily on low-end laptops/desktops.
2. The data is properly labelled: for each image, 5 captions are provided.
3. The dataset is available for free.

Flickr8k is a good starting dataset as it is small in size and can be trained easily on low-end laptops/desktops using a CPU.
Our dataset structure is as follows:-

• Flickr8k/
o Flickr8k_Dataset/ :- contains the 8000 images
o Flickr8k_text/
▪ Flickr8k.token.txt :- contains the image IDs along with their 5 captions
▪ Flickr_8k.trainImages.txt :- contains the training image IDs
▪ Flickr_8k.testImages.txt :- contains the test image IDs

2.2 Understanding the data

Data pre-processing and cleaning is an important part of the whole model building process. Understanding the data helps us to build more accurate models.

After extracting the zip files you will find the following folders:

Flickr8k_Dataset: Contains a total of 8092 images in JPEG format with different shapes and sizes. Of these, 6000 are used for training, 1000 for testing, and 1000 for development.

Flickr8k_text: Contains text files describing the training and test splits. Flickr8k.token.txt contains 5 captions for each image, i.e. 40460 captions in total.

We have mainly two types of data:

1. Images
2. Captions (Text)

The size of the training vocabulary is 7371. Since words that occur very rarely do not carry much information, we consider only words that appear at least 10 times.

2.3 How to Featurize images?

Keras already provides models pre-trained on the standard ImageNet dataset. ImageNet is a standard dataset used for classification. It contains more than 14 million images spread across a little more than 21 thousand groups or classes.

We will be using InceptionV3 by Google.

Why Inception?

1. It has a small weight file of approximately 96 MB.
2. It is faster to train.

We will remove the softmax layer from Inception since we want to use it as a feature extractor. For a given input image, Inception gives us a 2048-dimensional feature vector.

For every training image, we resize it to (299, 299) and then pass it to Inception for feature extraction.

3. Caption Preprocessing
Each image in the dataset is provided with 5 captions. Captions are read from the Flickr8k.token.txt file and stored in a dictionary where the key is the image ID and the value is the list of its captions. Since there are 5 captions for each image, we preprocess and encode each of them in the format below:

“startseq “ + caption + “ endseq”

The reason behind startseq and endseq is:

startseq : acts as the first word when the feature vector extracted from the image is fed to the decoder. It kick-starts the caption generation process.

endseq : tells the decoder when to stop. We stop predicting words as soon as endseq appears or once the maximum caption length is reached, whichever comes first.

3.1 Sequential Data preparation

First, the image is fed to Inception to obtain a 2048-dimensional feature vector. The caption is then expanded into a sequence of (partial caption, next word) training pairs, as sketched below.

Caption: startseq a bunch of people swimming in water endseq.
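
To make this concrete, the sketch below (illustrative only; the real pipeline does the same thing with word indices, padding, and one-hot targets inside the data generator of Chapter 4) shows how the example caption expands into (partial caption, next word) pairs, each paired with the same 2048-dimensional image vector:

caption = 'startseq a bunch of people swimming in water endseq'.split()

for i in range(1, len(caption)):
    partial, target = caption[:i], caption[i]
    # every pair is trained together with the same image feature vector
    print(' '.join(partial), '->', target)

# startseq -> a
# startseq a -> bunch
# ...
# startseq a bunch of people swimming in water -> endseq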

4. CODE

Step 1:- Import the required libraries

Here we will be making use of the Keras library for creating our model and
training it. You can make use of Google Colab or Kaggle notebooks if you
want a GPU to train it.

import numpy as np
from numpy import array
import matplotlib.pyplot as plt
%matplotlib inline

import string
import os
import glob
from PIL import Image
from time import time

from keras import Input, layers
from keras import optimizers
from keras.optimizers import Adam
from keras.preprocessing import sequence
from keras.preprocessing import image
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import LSTM, Embedding, Dense, Activation, Flatten, Reshape, Dropout
from keras.layers import Bidirectional
from keras.layers import add
from keras.applications.inception_v3 import InceptionV3
from keras.applications.inception_v3 import preprocess_input
from keras.models import Model
from keras.utils import to_categorical

Step 2:- Data loading and Preprocessing

We will define all the paths to the files that we require and save the image IDs and their captions.

token_path = '...'         # PROVIDE PATH
train_images_path = '...'  # PROVIDE PATH
test_images_path = '...'   # PROVIDE PATH
images_path = '...'        # PROVIDE PATH
glove_path = '...'         # PROVIDE PATH

doc = open(token_path,'r').read()
print(doc[:410])

So, we can see the format in which our image IDs and their captions are stored. Next, we create a dictionary named "descriptions" which contains the name of each image as a key and the list of its 5 captions as the value.

descriptions = dict()
for line in doc.split('\n'):
    tokens = line.split()
    if len(line) > 2:
        image_id = tokens[0].split('.')[0]
        image_desc = ' '.join(tokens[1:])
        if image_id not in descriptions:
            descriptions[image_id] = list()
        descriptions[image_id].append(image_desc)

Now let's perform some basic text cleaning to get rid of punctuation and convert our descriptions to lowercase.

table = str.maketrans('', '', string.punctuation)
for key, desc_list in descriptions.items():
    for i in range(len(desc_list)):
        desc = desc_list[i]
        desc = desc.split()
        desc = [word.lower() for word in desc]
        desc = [w.translate(table) for w in desc]
        desc_list[i] = ' '.join(desc)

Next, we create a vocabulary of all the unique words present across all the
8000*5 (i.e. 40000) image captions in the data set. We have 8828 unique
words across all the 40000 image captions.

vocabulary = set()
for key in descriptions.keys():
    [vocabulary.update(d.split()) for d in descriptions[key]]
print('Original Vocabulary Size: %d' % len(vocabulary))

Now let's save the image IDs and their new cleaned captions in the same format as the Flickr8k.token.txt file:-

lines = list()
for key, desc_list in descriptions.items():
    for desc in desc_list:
        lines.append(key + ' ' + desc)
new_descriptions = '\n'.join(lines)

Next, we load all the 6000 training image IDs into a variable train from the 'Flickr_8k.trainImages.txt' file:-

doc = open(train_images_path,'r').read()
dataset = list()
for line in doc.split('\n'):
    if len(line) > 1:
        identifier = line.split('.')[0]
        dataset.append(identifier)
train = set(dataset)

Now we save all the training and testing images in train_img and test_img
lists respectively:-

img = glob.glob(images_path + '*.jpg')

train_images = set(open(train_images_path, 'r').read().strip().split('\n'))
train_img = []
for i in img:
    if i[len(images_path):] in train_images:
        train_img.append(i)

test_images = set(open(test_images_path, 'r').read().strip().split('\n'))
test_img = []
for i in img:
    if i[len(images_path):] in test_images:
        test_img.append(i)

Now, we load the descriptions of the training images into a dictionary. However, we will add two tokens to every caption, 'startseq' and 'endseq':-

train_descriptions = dict()
for line in new_descriptions.split('\n'):
    tokens = line.split()
    image_id, image_desc = tokens[0], tokens[1:]
    if image_id in train:
        if image_id not in train_descriptions:
            train_descriptions[image_id] = list()
        desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
        train_descriptions[image_id].append(desc)

Create a list of all the training captions:-

all_train_captions = []
for key, val in train_descriptions.items():
    for cap in val:
        all_train_captions.append(cap)

To make our model more robust we will reduce our vocabulary to only those
words which occur at least 10 times in the entire corpus.

word_count_threshold = 10
word_counts = {}
nsents = 0
for sent in all_train_captions:
    nsents += 1
    for w in sent.split(' '):
        word_counts[w] = word_counts.get(w, 0) + 1
vocab = [w for w in word_counts if word_counts[w] >= word_count_threshold]
print('Vocabulary = %d' % (len(vocab)))

Now we create two dictionaries to map words to an index and vice versa. We also add 1 to our vocabulary size, since index 0 is reserved for the padding used to make all captions the same length.

ixtoword = {}
wordtoix = {}
ix = 1
for w in vocab:
    wordtoix[w] = ix
    ixtoword[ix] = w
    ix += 1
vocab_size = len(ixtoword) + 1

Hence, our total vocabulary size is now 1660. We also need to find the maximum length of a caption, since we cannot have captions of arbitrary length.

all_desc = list()
for key in train_descriptions.keys():
    [all_desc.append(d) for d in train_descriptions[key]]
lines = all_desc
max_length = max(len(d.split()) for d in lines)

print('Description Length: %d' % max_length)

Step 3:- Glove Embeddings

Word vectors map words to a vector space where similar words are clustered together and dissimilar words are separated. The advantage of GloVe over Word2Vec is that GloVe does not rely only on the local context of words; it also incorporates global word co-occurrence statistics to obtain the word vectors.

The basic premise behind GloVe is that we can derive semantic relationships between words from the co-occurrence matrix. For our model, we will map all the words in our 38-word-long captions to a 200-dimensional vector using GloVe.

embeddings_index = {}
f = open(os.path.join(glove_path, 'glove.6B.200d.txt'), encoding="utf-8")
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
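
As a quick, optional sanity check on the loaded vectors (the example words below are assumed to be present in the GloVe vocabulary), semantically related words should score a noticeably higher cosine similarity than unrelated ones:

def cosine_similarity(u, v):
    # cosine of the angle between two word vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(embeddings_index['dog'], embeddings_index['puppy']))   # relatively high
print(cosine_similarity(embeddings_index['dog'], embeddings_index['guitar']))  # noticeably lower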

Next, we build the matrix of shape (1660, 200) holding the 200-d vector for each word in our vocabulary.

embedding_dim = 200
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in wordtoix.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

Step 4:- Model Building and Training

As you have seen from our approach, we have opted for transfer learning using the InceptionV3 network, which is pre-trained on the ImageNet dataset.

model = InceptionV3(weights='imagenet')

We must remember that we do not need to classify the images here; we only need to extract an image vector for our images. Hence we remove the softmax layer from the InceptionV3 model.

model_new = Model(model.input, model.layers[-2].output)

Since we are using InceptionV3, we need to pre-process our input before feeding it into the model. Hence we define a preprocess function to resize the images to (299 x 299) and feed them to the preprocess_input() function of Keras.

def preprocess(image_path):
    img = image.load_img(image_path, target_size=(299, 299))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x

Now we can go ahead and encode our training and testing images, i.e. extract the image vectors of shape (2048,).

def encode(image):
    image = preprocess(image)
    fea_vec = model_new.predict(image)
    fea_vec = np.reshape(fea_vec, fea_vec.shape[1])
    return fea_vec

encoding_train = {}
for img in train_img:
    encoding_train[img[len(images_path):]] = encode(img)
train_features = encoding_train

encoding_test = {}
for img in test_img:
    encoding_test[img[len(images_path):]] = encode(img)
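
Encoding all of the images through InceptionV3 is the slowest part of this step, so it can be convenient (this is optional, and the file names below are only placeholders) to cache the feature dictionaries on disk and reload them on later runs:

import pickle

with open('encoded_train_images.pkl', 'wb') as f:
    pickle.dump(encoding_train, f)
with open('encoded_test_images.pkl', 'wb') as f:
    pickle.dump(encoding_test, f)

# on later runs, reload instead of re-encoding:
# with open('encoded_train_images.pkl', 'rb') as f:
#     train_features = pickle.load(f)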

Now let’s define our model.

We are creating a merge model where we combine the image vector and the partial caption. Therefore our model will have 3 major steps:

1. Processing the sequence from the text
2. Extracting the feature vector from the image
3. Decoding the output using softmax after combining the above two branches

inputs1 = Input(shape=(2048,))
fe1 = Dropout(0.5)(inputs1)
fe2 = Dense(256, activation='relu')(fe1)

inputs2 = Input(shape=(max_length,))
se1 = Embedding(vocab_size, embedding_dim, mask_zero=True)(inputs2)
se2 = Dropout(0.5)(se1)
se3 = LSTM(256)(se2)

decoder1 = add([fe2, se3])
decoder2 = Dense(256, activation='relu')(decoder1)
outputs = Dense(vocab_size, activation='softmax')(decoder2)

model = Model(inputs=[inputs1, inputs2], outputs=outputs)
model.summary()

Input_3 is the partial caption of max length 34, which is fed into the embedding layer. This is where the words are mapped to the 200-d GloVe embedding. It is followed by a dropout of 0.5 to avoid overfitting and is then fed into the LSTM for processing the sequence.

Input_2 is the image vector extracted by our InceptionV3 network. It is followed by a dropout of 0.5 to avoid overfitting and is then fed into a fully connected layer.

The image branch and the language branch are then merged by addition and fed into another fully connected layer. The final layer is a softmax layer that provides probabilities over our 1660-word vocabulary.

Step 5:- Model Training

Before training the model we need to keep in mind that we do not want
to retrain the weights in our embedding layer (pre-trained Glove vectors).

model.layers[2].set_weights([embedding_matrix])
model.layers[2].trainable = False

Next, compile the model using categorical cross-entropy as the loss function and Adam as the optimizer.

model.compile(loss='categorical_crossentropy', optimizer='adam')

Since our dataset has 6000 training images and 40000 captions, we will create a generator function that feeds the data to the model in batches.

def data_generator(descriptions, photos, wordtoix, max_length, num_photos_per_batch):
    X1, X2, y = list(), list(), list()
    n = 0
    # loop forever over images
    while 1:
        for key, desc_list in descriptions.items():
            n += 1
            # retrieve the photo feature
            photo = photos[key + '.jpg']
            for desc in desc_list:
                # encode the sequence
                seq = [wordtoix[word] for word in desc.split(' ') if word in wordtoix]
                # split one sequence into multiple X, y pairs
                for i in range(1, len(seq)):
                    # split into input and output pair
                    in_seq, out_seq = seq[:i], seq[i]
                    # pad input sequence
                    in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
                    # encode output sequence
                    out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
                    # store
                    X1.append(photo)
                    X2.append(in_seq)
                    y.append(out_seq)

            if n == num_photos_per_batch:
                yield ([array(X1), array(X2)], array(y))
                X1, X2, y = list(), list(), list()
                n = 0

Next, let's train our model for 30 epochs with a batch size of 3 and 2000 steps per epoch. The complete training of the model took 1 hour and 40 minutes on a Kaggle GPU.

epochs = 30
batch_size = 3
steps = len(train_descriptions)//batch_size

generator = data_generator(train_descriptions, train_features, wordtoix, max_length, batch_size)
model.fit(generator, epochs=epochs, steps_per_epoch=steps, verbose=1)
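
Since training takes a while, it is worth saving the trained weights once training finishes (the file name below is only a placeholder) so the model can be reused later without retraining:

model.save_weights('model_weights.h5')

# to reuse the trained model later, rebuild the same architecture and then:
# model.load_weights('model_weights.h5')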

Step 6:- Greedy and Beam Search

As the model generates a 1660-long vector with a probability distribution across all the words in the vocabulary, we greedily pick the word with the highest probability as the next word. This method is called Greedy Search.

def greedySearch(photo):
    in_text = 'startseq'
    for i in range(max_length):
        sequence = [wordtoix[w] for w in in_text.split() if w in wordtoix]
        sequence = pad_sequences([sequence], maxlen=max_length)
        yhat = model.predict([photo, sequence], verbose=0)
        yhat = np.argmax(yhat)
        word = ixtoword[yhat]
        in_text += ' ' + word
        if word == 'endseq':
            break

    final = in_text.split()
    final = final[1:-1]
    final = ' '.join(final)
    return final

Beam Search is where we take the top k predictions, feed them back into the model, and then sort the resulting sequences using the probabilities returned by the model. The list always contains the top k candidate sequences; we take the one with the highest probability and expand it until we encounter 'endseq' or reach the maximum caption length.

def beam_search_predictions(image, beam_index=3):
    start = [wordtoix["startseq"]]
    start_word = [[start, 0.0]]
    while len(start_word[0][0]) < max_length:
        temp = []
        for s in start_word:
            par_caps = sequence.pad_sequences([s[0]], maxlen=max_length, padding='post')
            preds = model.predict([image, par_caps], verbose=0)
            word_preds = np.argsort(preds[0])[-beam_index:]
            # Getting the top <beam_index>(n) predictions and creating a
            # new list so as to put them through the model again
            for w in word_preds:
                next_cap, prob = s[0][:], s[1]
                next_cap.append(w)
                prob += preds[0][w]
                temp.append([next_cap, prob])

        start_word = temp
        # Sorting according to the probabilities
        start_word = sorted(start_word, reverse=False, key=lambda l: l[1])
        # Getting the top words
        start_word = start_word[-beam_index:]

    start_word = start_word[-1][0]
    intermediate_caption = [ixtoword[i] for i in start_word]
    final_caption = []

    for i in intermediate_caption:
        if i != 'endseq':
            final_caption.append(i)
        else:
            break
    final_caption = ' '.join(final_caption[1:])
    return final_caption

Step 7:- Evaluation

Let’s now test our model on different images and see what captions it
generates. We will also look at the different captions generated by Greedy
search and Beam search with different k values.

First, we will take a look at an example image from the test set. Its reference caption is 'A black dog and a brown dog in the snow'. Let's see how our model compares.

pic = '2398605966_1d0c9e6a20.jpg'
image = encoding_test[pic].reshape((1,2048))
x = plt.imread(images_path + pic)
plt.imshow(x)
plt.show()

print("Greedy Search:",greedySearch(image))
print("Beam Search, K = 3:",beam_search_predictions(image, beam_index = 3))
print("Beam Search, K = 5:",beam_search_predictions(image, beam_index = 5))
print("Beam Search, K = 7:",beam_search_predictions(image, beam_index = 7))
print("Beam Search, K = 10:",beam_search_predictions(image, beam_index = 10))

OUTPUT:

[Figure: test image with the captions generated by Greedy Search and Beam Search (k = 3, 5, 7, 10).]
You can see that our model was able to identify two dogs in the snow. But at
the same time, it misclassified the black dog as a white dog. Nevertheless, it
was able to form a proper sentence to describe the image as a human would.

pic = list(encoding_test.keys())[1]
image = encoding_test[pic].reshape((1,2048))
x = plt.imread(images_path + pic)
plt.imshow(x)
plt.show()

print("Greedy:",greedySearch(image))
print("Beam Search, K = 3:",beam_search_predictions(image, beam_index = 3))
print("Beam Search, K = 5:",beam_search_predictions(image, beam_index = 5))
print("Beam Search, K = 7:",beam_search_predictions(image, beam_index = 7))

OUTPUT:

[Figure: second test image with the captions generated by Greedy Search and Beam Search (k = 3, 5, 7).]

Here we can see that the model accurately described what was happening in the image. You will also notice that the captions generated with Beam Search are better than those from Greedy Search.

5. CONCLUSION
What we have developed here is just a start. There has been a lot of research on this topic, and much better image caption generators can be built.

Things you can implement to improve your model:-

1. Make use of larger datasets, especially the MS COCO dataset or the Stock3M dataset, which is 26 times larger than MS COCO.
2. Make use of an evaluation metric such as BLEU (Bilingual Evaluation Understudy) to measure the quality of the machine-generated text (a sketch follows this list).
3. Implement an attention-based model: attention mechanisms are becoming increasingly popular in deep learning because they can dynamically focus on different parts of the input image while the output sequence is being produced.
4. Image-based factual descriptions are not enough to generate high-quality captions. We can add external knowledge in order to generate attractive image captions. Therefore, working on open-domain datasets can be an interesting prospect.
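
For point 2 above, here is a minimal sketch of BLEU scoring with NLTK (assuming the nltk package is available; the reference captions and the generated caption shown are illustrative, not actual model output):

from nltk.translate.bleu_score import corpus_bleu

# one list of reference (human) captions per test image, tokenized into words
references = [[
    'a black dog and a brown dog play in the snow'.split(),
    'two dogs are running through the snow'.split(),
]]
# one generated caption per test image
candidates = ['two dogs playing in the snow'.split()]

print('BLEU-1: %.3f' % corpus_bleu(references, candidates, weights=(1.0, 0, 0, 0)))
print('BLEU-2: %.3f' % corpus_bleu(references, candidates, weights=(0.5, 0.5, 0, 0)))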

We have brought the fields of Computer Vision and Natural Language Processing together and implemented a method, Beam Search, that is able to generate better descriptions than the standard Greedy Search.

There is still a lot to improve right from the datasets used to the
methodologies implemented.

6. REFERENCES

• [Link]ning_based_Object_Detection_and_Recognition_Framework_for_the_Visually-Impaired
• [Link]image-caption-generator-using-keras/
• [Link][Link]
• [Link]
• [Link]
• [Link]flickr8k-dataset-bleu-4bcba0b52926
• [Link]help-visually-impaired-people-4fcdc76816b2
