
IMAGE COLORIZATION USING CNNS

A PROJECT REPORT

Submitted by

S VISHNUVARDHAN [Reg No: RA1511003010506]


ANKIT PASAYAT [Reg No: RA1511003010693]

Under the guidance of


Dr. R. I. Minu, Ph.D
(Asst Professor, Department of Computer Science & Engineering)

in partial fulfillment for the award of the degree


of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE & ENGINEERING


of

FACULTY OF ENGINEERING AND TECHNOLOGY

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
(Under Section 3 of UGC Act, 1956)

S.R.M. Nagar, Kattankulathur, Kancheepuram District

MAY 2019

BONAFIDE CERTIFICATE

Certified that this project report titled "IMAGE COLORIZATION USING CNNS" is the
bonafide work of S VISHNUVARDHAN [Reg No: RA1511003010506] and ANKIT PASAYAT
[Reg No: RA1511003010693], who carried out the project work under my supervision.
Certified further that, to the best of my knowledge, the work reported herein does not
form part of any other project report or dissertation on the basis of which a degree or
award was conferred on an earlier occasion on this or any other candidate.

SIGNATURE                                      SIGNATURE

Dr. R. I. Minu, Ph.D                           Dr. B. Amutha
GUIDE                                          HEAD OF THE DEPARTMENT
Asst Professor                                 Dept. of Computer Science & Engineering
Dept. of Computer Science & Engineering

Signature of the Internal Examiner Signature of the External Examiner


Own Work Declaration
Department of Computer Science and Engineering

SRM Institute of Science & Technology

Own Work* Declaration Form

This sheet must be filled in (each box ticked to show that the condition has been met). It must be signed and dated
along with your student registration number and included with all assignments you submit – work will not be
marked unless this is done.

To be completed by the student for all assessments

Degree/ Course : B. Tech. / Computer Science and Engineering

Student Name : S Vishnuvardhan, Ankit Pasayat

Registration Number : RA1511003010506, RA1511003010693

Title of Work : Image Colorization Using CNNs

I / We hereby certify that this assessment complies with the University's Rules and Regulations relating to
academic misconduct and plagiarism**, as listed on the University website, Regulations, and the Education
Committee guidelines.

I / We confirm that all the work contained in this assessment is my / our own except where indicated, and that I / We
have met the following conditions:

• Clearly referenced / listed all sources as appropriate
• Referenced and put in inverted commas all quoted text (from books, web, etc.)
• Given the sources of all pictures, data etc. that are not my own
• Not made any use of the report(s) or essay(s) of any other student(s), either past or present
• Acknowledged in appropriate places any help that I have received from others (e.g. fellow students,
technicians, statisticians, external sources)
• Complied with any other plagiarism criteria specified in the Course handbook / University website

I understand that any false claim for this work will be penalized in accordance with the University policies and
regulations.

DECLARATION:
I am aware of and understand the University's policy on academic misconduct and plagiarism, and I certify that
this assessment is my / our own work, except where indicated by referencing, and that I have followed the good
academic practices noted above.

If you are working in a group, please write your registration numbers and sign with the date for every student in
your group.
ABSTRACT

This report presents a survey of the current technological processes that are used to
colorize images with little or no human intervention, using image processing and machine
learning techniques. We also analyze the more recent deep learning based approaches,
specifically those using Convolutional Neural Networks (CNNs), which are specially
optimized mathematical models for image processing and manipulation. By diving deep
into the mathematical and statistical theory behind the implementation of these models
in our reference papers, we have managed to decipher the essence of the process of
colorization. We present a comparative study of our reference papers and then propose
changes that may theoretically increase the accuracy and color reproducibility of these
models. Finally, we present our findings using our new system architecture and improved
training methodology.


ACKNOWLEDGEMENT

We express our humble gratitude to Dr. Sandeep Sancheti, Vice Chancellor, SRM Insti-
tute of Science and Technology, for the facilities extended for the project work and his
continued support.
We extend our sincere thanks to Dr. C. Muthamizhchelvan, Director, Faculty of En-
gineering and Technology, SRM Institute of Science and Technology, for his invaluable
support.
We wish to thank Dr. B. Amutha, Professor & Head, Department of Computer
Science and Engineering, SRM Institute of Science and Technology, for her valuable
suggestions and encouragement throughout the period of the project work.
We are extremely grateful to our Academic Advisor Dr. K. Annapurani, Associate
Professor, Department of Computer Science and Engineering, SRM Institute of Science
and Technology, for her great support at all the stages of project work.
We would like to convey our thanks to our Panel Head, Dr. S.S. Sridhar, Profes-
sor, Department of Computer Science and Engineering, SRM Institute of Science and
Technology, for his inputs during the project reviews.
We register our immeasurable thanks to our Faculty Advisor, Mrs. R. Anita, Asst
Professor, Department of Computer Science and Engineering, SRM Institute of Science
and Technology, for leading and helping us to complete our course.
Our inexpressible respect and thanks to our guide, Dr. R. I. Minu, Asst Professor,
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
for providing us an opportunity to pursue our project under her mentorship. She provided
us the freedom and support to explore the research topics of our interest. Her passion for
solving real problems and making a difference in the world has always been inspiring.

We sincerely thank the staff and students of the Computer Science and Engineering
Department, SRM Institute of Science and Technology, for their help during our research.
Finally, we would like to thank our parents, our family members and our friends for
their unconditional love, constant support and encouragement.

S Vishnuvardhan
Ankit Pasayat

TABLE OF CONTENTS

ABSTRACT iv

ACKNOWLEDGEMENT v

LIST OF TABLES ix

LIST OF FIGURES x

ABBREVIATIONS xi

LIST OF SYMBOLS xii

1 INTRODUCTION 1
1.1 Image Colorization . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Artificial or Automatic Colorization . . . . . . . . . . . . . . . . . 2
1.3 Formal Problem Definition . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4.1 Scope of System . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 LITERATURE SURVEY 4
2.1 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 General . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Literature Review . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.3 Literature Review On Image Colorization . . . . . . . . . . 4
2.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 SYSTEM ANALYSIS 15
3.1 Existing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Use of Feature Extractors . . . . . . . . . . . . . . . . . . . 16

3.2 Proposed System . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Advantages of Proposed System . . . . . . . . . . . . . . . . . . . 18
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4 SYSTEM DESIGN 20
4.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Cross-Channel Encoder . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 UML Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3.1 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . 21
4.3.2 State Diagram . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5 SYSTEM REQUIREMENTS AND SPECIFICATIONS 23


5.1 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . . 23
5.1.1 Client Requirements . . . . . . . . . . . . . . . . . . . . . 23
5.1.2 Server Requirements . . . . . . . . . . . . . . . . . . . . . 23
5.2 Software Requirements . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.1 Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.2 Python 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.3 Tensorflow . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2.4 Anaconda (Python Distribution) . . . . . . . . . . . . . . . . 25
5.2.5 Apache Tomcat . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

6 SYSTEM IMPLEMENTATION 26
6.1 Algorithm Explanation . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2.1 Flora and Fauna, Figure 6.2 . . . . . . . . . . . . . . . . . 28
6.2.2 Skies and Monuments, Figure 6.3 . . . . . . . . . . . . . . 29
6.2.3 Humans and Humanoid subjects, Figure 6.4 . . . . . . . . . 29
6.2.4 Animals and Reptiles, Figure 6.5 . . . . . . . . . . . . . . . 30
6.3 Implementation Screenshots . . . . . . . . . . . . . . . . . . . . . 30
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

7 CONCLUSION AND FUTURE ENHANCEMENTS 32
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
7.2 Future Enhancement . . . . . . . . . . . . . . . . . . . . . . . . . 32

A CODING AND TESTING 35


A.1 Read_Model.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
A.2 Batch_Dataset_Reader.py . . . . . . . . . . . . . . . . . . . . . . . 35
A.3 Load_Bin_Centers.py . . . . . . . . . . . . . . . . . . . . . . . . . 39
A.4 Image_Colorization.py . . . . . . . . . . . . . . . . . . . . . . . . 39

B Contributions 53
B.1 S Vishnuvardhan . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
B.2 Ankit Pasayat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
LIST OF TABLES

2.1 Summary of Literature Survey . . . . . . . . . . . . . . . . . . . . 14

6.1 Turing Test Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 28

LIST OF FIGURES

1.1 Example of manual image colorization (George Melies) . . . . . . . 1

3.1 Macro-architecture of the Visual Geometry Group (VGG)16 Network 15


3.2 A comparison between different colorization models . . . . . . . . 17
3.3 Old Training Architecture . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 New Training Architecture . . . . . . . . . . . . . . . . . . . . . . 19

4.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 20


4.2 Cross-Channel Encoder . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 UML Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 UML State Diagram . . . . . . . . . . . . . . . . . . . . . . . . . 22

6.1 313 Quantized colors in ab space . . . . . . . . . . . . . . . . . . . 27


6.2 Flora and Fauna . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.3 Skies and Monuments . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.4 Humans and Humanoid subjects . . . . . . . . . . . . . . . . . . . 29
6.5 Animals and Reptiles . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.6 Web app before Colorization . . . . . . . . . . . . . . . . . . . . . 30
6.7 Web app after Colorization . . . . . . . . . . . . . . . . . . . . . . 31

ABBREVIATIONS

AC Artificial Colorization

CIE International Commission on Illumination

CNN Convolutional Neural Network

DLSS Deep Learning Super-Sampling

NN Neural Network

ReLU Rectified Linear Unit

RGB Red Green Blue

VGG Visual Geometry Group

LIST OF SYMBOLS

α, β Color Channels
Q Quantized Color space
L Luminance (Lightness) Channel
h, H Height of Convolutional Layer
w, W Width of Convolutional Layer
Ẑ Vector of 313 (Q) Values
X Vector of Convolutional Layers
F (X) Mathematical representation of the CNN
Ŷ Final output matrix

CHAPTER 1

INTRODUCTION

1.1 Image Colorization

Before the advent of neural networks and deep learning, image colorization was a
tedious, manual and expensive task that was done by professional artists and painters.
These painters used to add colors to images painstakingly, using either primitive
software tools or physical media (Swanepoel, 2012; Hautner, 2007). The number of
images that were manually colorized before 2015 is far greater than the number colorized
by automatic methods. Figure 1.1 shows an example of an image that was colorized
manually by a digital artist.

Figure 1.1: Example of manual image colorization (George Melies)

While manual colorization can be very accurate and visually similar to the ground
truth image, the process is very slow. It takes 3-4 weeks for a professional and experi-
enced artist to colorize one full-sized photograph. Also, the presence of any artifacts or
blemishes on the images reduces the ability of the artist to replicate the original scene
in the photograph, which is quite common in old photographs and video frame captures
(the main use-case of image colorization).

The process of adding color to legacy film tapes and old video footage is termed Film
Colorization; it is used in the movie industry to refurbish and republish classic movies
and photographs from old photoshoots.
1.2 Artificial or Automatic Colorization

Artificial Colorization (AC) is a process whereby a grayscale (black and white) image
is converted into an equivalent color image, without any color information or human
intervention.

Colorization using computers first appeared in the 1970s, using a technique perfected
by Wilson Markle. These algorithms produced images that were not far off from the
black-and-white base images and had a signature sepia (brown) tint (Vanderbolt, 2010).
This tint is observed quite frequently in old photographs. Some film companies applied
a bluish aniline tint to the sky-facing frames of their movies to create the illusion of
color video footage.

Artificial colorization has evolved steadily through the years and now we can utilize
the raw learning ability of deep neural networks to study images in a dataset and use
that information to color any new image that is fed to the system.

1.3 Formal Problem Definition

In terms of problem formulation, image colorization is a problem of information recovery,
where the information in the color channels of the input image is lost. The system needs
to recreate this lost information from the data that it has encountered during the
training phase.

In mathematical terms, the system needs to map each grayscale pixel of an image to a
color tuple (R, G, B), where R, G and B each take values in [0, 255] and stand for the
respective primary color channels of the image.

Theoretically, a simple probabilistic filter model would be able to compute the most
probable color tuple (R, G, B) for a given pixel in a 2-dimensional image space. But in
practice, such naive implementations of an image colorizer produce results that are not
coherent and are unpleasing to the human eye, due to the predictability of the system.
However, if a statistical colorizer is trained using several different images of an object
taken from similar angles, that is, if the training images and the input image are visually
similar to a large extent, then the colorization produced is satisfactory.
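To make the idea of such a naive statistical colorizer concrete, the sketch below builds a
per-intensity lookup table from paired grayscale/color training images and always picks the
most frequent color for each gray level. The names here are illustrative only; this is a
minimal sketch of the naive approach discussed above, not our actual system.

```python
import numpy as np

def build_lookup(gray_images, color_images):
    """Count, for every grayscale intensity 0-255, how often each RGB
    tuple co-occurs with it in the training pairs."""
    counts = {}
    for gray, rgb in zip(gray_images, color_images):
        for g, c in zip(gray.reshape(-1), rgb.reshape(-1, 3)):
            table = counts.setdefault(int(g), {})
            table[tuple(c)] = table.get(tuple(c), 0) + 1
    return counts

def colorize_naive(gray, counts):
    """Assign every pixel the color most frequently seen for its intensity;
    this is exactly the predictable behaviour criticised above."""
    out = np.zeros(gray.shape + (3,), dtype=np.uint8)
    for g in np.unique(gray):
        table = counts.get(int(g))
        out[gray == g] = max(table, key=table.get) if table else (g, g, g)
    return out
```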

Generally, image processing problems can be classified as computational problems of a
2-dimensional nature (Hwang and Zhou, 2016). The objective is to analyze a 2-dimensional
matrix of pixel and color information from the input image and make computational changes
to this layer, so as to produce some significant change or achieve a required result. In
the case of image colorization, this change is the inference of the color information
from the grayscale pixel information.

1.4 Motivations

As we described in Section 1.1, automatic image colorization has a number of use-cases
in the film and digital art industry. In this section we describe other fields and areas
that can benefit from automating the process of color conversion. We also analyze the
more recent deep-learning based approaches, specifically those using Convolutional
Neural Networks (CNNs) (Walter, 2010), which are specially optimized mathematical
models for image processing and manipulation. By diving deep into the mathematical
and statistical theory behind the implementation of these models in our reference
papers, we have managed to decipher the essence of the process of colorization.

1.4.1 Scope of System

Areas where image colorization can be applied include old paintings, black and white
photographs, monochrome video feeds, medical scans, geographical and relief maps, and
color correction.

1.5 Summary

We have explored the old and new methods of introducing color to black and white
images in this chapter, while introducing the objective of our project. In the next
chapter, we summarize the formal literature survey conducted for our project.
CHAPTER 2

LITERATURE SURVEY

2.1 Literature Review

2.1.1 General

In this chapter, we present the results of the formal literature survey conducted for our
project. We identify the technological processes and methodologies used in the field of
image colorization and image processing. This survey helped us identify the key features
of our final model.

2.1.2 Literature Review

A literature review is a study of published papers, articles or other academic work.
Literature reviews are not original content and are always considered paraphrasing work.
In the next subsection we describe the unique features, methodologies and differences of
each of our reference papers.

2.1.3 Literature Review On Image Colorization

[1] Daly, R., Zhao, T. (2016). CNN Assisted Colorization of Gray-scale Images.
Stanford University. This paper was published in the internal journal of Stanford
University and talks in detail about the process of image colorization with deep neural
networks. The main takeaway from their work is that compressing the input image matrix
to a high degree and sampling the output at the same scale will considerably reduce the
amount of computation that needs to be done by the machine. This creates an intelligent
system that operates at a low level while maintaining sufficient quality in the images
produced. The color palette used by their system is a 224x224 matrix, chosen to lower
the need for complex computation. Their model can store inferences from a large number
of images in a small file with this method. The model has been configured to operate
strictly on low resolution grayscale images with a 1:1 aspect ratio. The authors have
used a custom neural network with an underlying VGG16Net CNN. This is a standard
implementation that has seen much use in image manipulation applications. The authors
have managed to achieve acceptable accuracy for sparse contrast images. The dataset used
is a subset of ImageNet ILSVRC2012, with Training Loss, Validation Loss and Percentage
Accuracy of Hints as the metrics.

The first thing that we noticed from this paper is that their algorithm is partial to the
color information that it interprets: some hues are expressed better than others in the
output images, which is not ideal for a colorization algorithm. However, the neural
network used by the authors is of high quality, and we feel that we can improve on their
work by using our training optimization techniques. This is a good starting point to
build our model on, as it has acceptable base accuracy for most standard cases. We have
decided to change the hyperparameters of the CNN and use different performance metrics
than those used here.

[2] Hwang, J., Zhou, Y. (2016). Image colorization with deep convolutional
neural networks. Stanford University, Tech. Rep., 2016. [Online]. Available:
http://cs231n.stanford.edu/reports2016/219Report.pdf. Hwang and Zhou (2016).
Hwang et al. have managed to use statistical-learning for improving the understand-
ing of artificial image colorization. In their work, they have used a probabilistic filter to
extract color information from images that are sparsely distributed along the color spec-
trum. They are attempting to harness the innate ability of CNNs interwoven on a hidden
layer level with a higher order harmonic series manifesting an absolute correlation ma-
trix. (Krizhevsky et al., 2012) This system makes use of pixel level harmonics and they
have recently succeeded in using the reactive manifestation of quantum dot technology
dialectic projections. Focusing on a single pixel in 2d-space can lead to hyperbolic sub-
resonant variations combined with isotropic transfer functions encapsulating modular
partitions. The scope of the images is 224 x 224 x 3 channel images with clearly de-
fined edges. They have employed a learning Pipeline with underlying VGG16 CNN and
Rectified Linear Unit (ReLU) activation function. The regression network outputs are
reasonably accurate: green tones are restricted to areas of the image with foliage, and
there seems to be a slight amount of color tinting in the sky. We note, however, that the
images are severely de-saturated and generally unattractive. These results are expected
given their similarity to the sample outputs of their base paper and their hypothesis.
We will use this paper as a reference for the regression network architecture; their
model has high potential for serving as a base model, the base architecture is well
suited to image colorization, and fine tuning of the hyperparameters should allow us to
produce better results.

[3] Zhang, R., Isola, P., Efros, A. A. (2016, October). Colorful image coloriza-
tion. UC Berkeley. Springer, Cham. Zhang et al. (2016) Zhang et al have used a
regression network paired on top of a CNN to create their colorization system. This is
a very good approach since the false positive channel activations that may occur due
to the convolutional network can be normalized by the regression network easily. By
tuning the parameters of the regression network, they have used it as a high-pass filter
for the false colorizations produced by the model. This causes their model to produce
highly life-like colorizations for most of their test cases. The images used are variably
sized grayscale images with at least 3 channels of information (Luminosity, U, V) and
are trained by CNN assisted Regression Network with a custom objective function. A
VGG16 classifier was not able to distinguish between stock images and coloured ones,
implying that channel information was coded with high accuracy. The colorization
quality is reasonably high even for runt images and sparse images with low detail or
contrast. Sky-tinting is not observed and skin-tones are reproduced fairly well.

This model is a very good example of a fully balanced and stable convolutional
network for image analysis. By tweaking hyper-parameters in this model, we can re-
purpose it for any task that is related to image computation. We can also see that this
combination of a neural network and a regression network can produce very bold and
saturated, and thus visually aesthetic results with input images of low quality. This
model is highly advanced and has a high level of accuracy. Therefore, we can take
hints about the internal architecture of our custom CNN from this model. Moreover, we
can also use the same image dataset to compare our model's performance with this
baseline model.

[4] Baldassarre, F., Morín, D. G., Rodés-Guirao, L. (2017). Deep Koalarization:
Image Colorization using CNNs and Inception-ResNet-v2. KTH Royal Institute of
Technology. Baldassarre et al. (2017)

Baldassarre et al have also created a combinatorial model, as in our previous reference
paper by Zhang et al, but instead of combining a neural network with a statistical
regression network, they have combined it with another neural network, essentially
creating a serial chain of networks where the output of the first network is the input
of the second. This aids the colorization task in two ways. First, because one network
is fed pre-convoluted data, the overhead associated with the production of index matrices
and model metadata is greatly reduced. Second, the second network can operate directly
on the convoluted data produced by the first network, because both networks are
symmetrical and have the same input and output dimensions.

The images used to train this model are of an arbitrary size but must contain the three
LAB channel information, that is, Luminance, alpha and beta channels. The images
were sourced from the ImageNet dataset, which is freely available for academic use
and were post processed to contain the required channel information

The layers in their model all have very specific functions, so as to aid the high
cohesiveness of the data classes in the images provided. For example, the layers in the
network that are closer to the input layer convolve features such as differences in
contrast and ridge-cuts (Vedaldi and Lenc, 2015). The layers that follow these contrast
detection layers amplify and balance the data produced by the earlier layers. The last
layers, adjacent to the output layer, are used to rectify the data into the output
dimensions of the image matrix; without these layers, the produced output would not be
compliant with the output dimension specification of the network and the second network
would not be able to process it.

When this model was trained, the image dataset used was not standardized. Thus,
they fed the model unclassed data but still have managed to generate near ground truth
colorizations for most of their input images. The dataset used was very small in com-
parison to a standardized dataset such as ImageNet and thus was not able to widen the
scope of the type of images that can be colorized by it.

Their model is suitable for a specific type of image class, that is, images in which all
three Lab channels of information are present. For images that do not have this, the
model tries to guess the most probable channel information configuration. More often than
not, this reduces the accuracy of colorization considerably. That said, if the field of
image classes fed to this model were widened, the range of output images that could
potentially be produced would be endless.

[5] Welsh, T., Ashikhmin, M., Mueller, K. (2002, July). Transferring color to
grayscale images. In ACM Transactions on Graphics (TOG) (Vol. 21, No. 3).
ACM. Welsh and Ashikhmin (2002)

Welsh et al have followed a new approach for adding color to black and white images by
using neural networks. Their model uses a guiding image, which is studied by the model
to infer the color information and the objects in the scene. As described by them, to
colorize an image of a ball, their model requires a colored image of a ball, from which
it can learn to add color channel information to the black and white image. The
consequence of this method is that if the model has not encountered any image of a
particular object during the training period, the colorization produced is absurd and at
times disturbing: holographic objects and color clouds are produced, and odd color
patterns are applied randomly across the image as the model tries, and fails, to estimate
the unknown object's color composition.

However, when the image is present in the training dataset, the colorization pro-
duced for the input image is considerably decent, even if it overfits the training image
color channel information. By overfit here, we mean to imply that if the model has seen
the image of a blue ball during its training phase, the probability that it will color all
subsequent balls blue is very high. This causes the model to be very deterministic and
predictable when it comes to colorizing certain objects.

The images used to train this model originate from the ImageNet database and are
specifically of 300x300 size with clearly visible contours (Yao et al., 2010). The
testing methodology used by Welsh et al is also standard for image colorization models -
the acceptance evaluation, which consists of calculating the accuracy of a model by
having a human pick out the artificially colorized image from a set of images.

The model works reasonably well for the use-cases presented by the authors, such as
empty landscapes and seascapes. The reason that the authors have chosen to use these
types of images for their model is that the information present in them is very low. For
comparison, images containing humans, animals or other clearly discernible features
encompass more than 100 times the inference information of empty landscape and seascape
images. This makes it far easier for their model to discern the appropriate color to
place in a particular region, because the model has already seen the specific scene
before and only has to adapt the present information in such a way that the integrity of
the scene is not destroyed.

[6] Levin, A., Lischinski, D., Weiss, Y. (2004, August). Colorization using
optimization. In ACM Transactions on Graphics (TOG) (Vol. 23, No. 3, pp. 689-694).
ACM.

In this paper, Levin et al present a solution for colorization of black and white
images that is very different from the current and existing means of solving the problem
of image colorization. They employ a statistical model instead of a neural network
and regression layers to mimic the color information from the input image and the
training dataset into a new color space. How this works is based on the fact that color
information can retain certain aspects of object contouring and contrasting features.
When this is inferred by their model, the contour and contrast information is actually
trapped in the color channels of the training images and is transferred to the input images
from there without loss of information or dimensionality.

To retain information in the color channels that cannot usually be kept there, they use
a human element to tell the model what parts of the training image are useful. Due to
this fact, their model cannot fall into the category of fully automated image colorizers:
the artist still needs to point out to the model what parts of the image to store and
what parts to discard. The indicated regions of the training image are retained by the
model and processing happens only at those specific points. This reduces the overhead of
this model by a large margin and makes it very efficient.

Images of size H x W with color annotations from human interaction were used. Quadratic
weight matching with the correlation affinity is derived from the image by relating the
color channel information of the training images and the input image (Zomet and Peleg,
2002; Torralba et al., 2003).

The model exhibits low training time, little algorithm downtime and overall better
efficiency due to human interaction. By using manual pointers to mark the important
regions of the training images, the model ignores a lot of features that are not pointed
to by the human involved. In theory, if the human succeeds in correctly marking the areas
of interest in the dataset, the model can achieve over 90% accuracy in the testing phase.
However, if the human marks only the least important parts of the images, this gain in
accuracy deteriorates and low quality colorizations are produced.

[7] Nie, D., Ma, Q., Ma, L., Xiao, S. (2007). Optimization based grayscale image
colorization. Pattern Recognition Letters, 28(12), 1445-1451. Elsevier. Nie et al.
(2007)

Nie et al introduce a new technique for adding color to images by exploiting the
features of the advanced tensorflow CNN engine. The technique employed is mathe-
matically very similar to the one followed by (Koleini et al., 2009). Instead of using
quadratic weights for computing the layer activations, they have used a polynomial ac-
tivation function to reduce the error levels that are associated with linear and quadratic
functions. The polynomial function that they use is calculated in two parts.

The first phase involves deciding what powers of the indeterminate (x) variable
needs to appear in the final function. This is done by observing the error levels produced
by each degree of the variable in the final image and employing trial and error methods.
Once the terms of the function are decided, the next and final step is to calculate the
coefficients of the various terms in the final function.

This is done by calculation of solutions (or zeroes) of the equations and comparing
the like-degree terms on both the sides. Once the coefficients are calculated, the func-
tion is fed to the last activation layer in the model. (Chen et al., 2016). By analyzing the
output errors, it is determined if there is need for the whole function to be multiplied or
divided by a constant factor k, which will optimise and result in better error levels than
the ordinary polynomial function.

The validation curves of this model are not very satisfactory considering the effort they
put into calculating the perfect validation and activation polynomial function. This may
be due to the fact that, even with a perfect activation layer and function, the dataset
used and the classes of images fed to the model are of below-par quality. The model tops
out at a 50% loss curve, which is not very impressive for a neural network based model.

[8] Chen, T., Wang, Y., Schillings, V., Meinel, C. (2014, January). Grayscale
image matting and colorization. In Proceedings of IEEE Asian Conference on
Computer Vision (Vol. 6).

This paper by Chen et al provides a useful insight into the process of colorization by
use of a lesser-known technique called image matting. Image matting has been studied in
detail since the early 1990s and is considered a fully deterministic way of introducing
color into grayscale images without any prior color information. The main feature of
image matting is that the framework is based on probability, using Bayes' theorem for
the calculation of conditional probability.

Mathematically, this algorithm tries to do the following computation. Given a pixel with
red, green and blue values predetermined from the input image, the model needs to
estimate the probability of a particular color X being present at the location of the
pixel, given the values of those particular pixel channels. This introduces several
complexities in implementing this method of image colorization.
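The conditional probability described above is just Bayes' theorem applied per pixel. The
toy sketch below is only an illustration of that idea (the prior and likelihood values are
made up), not Chen et al.'s implementation.

```python
import numpy as np

def posterior_color(likelihood, prior):
    """P(color | observed channels) is proportional to
    P(observed channels | color) * P(color)."""
    joint = likelihood * prior
    return joint / joint.sum()

# Toy 4-color swatch: priors come from training data, likelihoods from the pixel.
prior = np.array([0.40, 0.30, 0.20, 0.10])
likelihood = np.array([0.05, 0.60, 0.25, 0.10])
print(posterior_color(likelihood, prior))   # the most probable color is index 1
```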

The first complexity is that there is a finite set of colors for which the model can
check the conditional probability at a particular location. The introduction of a new
color swatch or group of colors introduces additional time and space complexity for the
entire model. For illustration, if this model operates on a 16-bit color swatch, which
results in about 256 unique colors that can be present in the input image, and a new
color channel bit is added to this swatch, making it 17-bit, the complexity increase will
be of the order of 17² = 289 times the original computational complexity
(Persch et al., 2017).

The bigger problem arises because the images that we access, view and download today use
huge swatches of color information encoded into each image. Some of the higher definition
images retrieved from DSLR and professional cameras can contain 16 million or even 32
million unique colors, whereas this model

If this crippling limitation of the model is ignored, the colorizations it produces are
aesthetically pleasing and semantically plausible. If, in the future, some computational
development allows this process to be expanded to near-infinite channel color
information, we could use it to efficiently and reliably colorize any grayscale image
without prior information or human intervention.

[9] Kekre, H. B., Thepade, S. D. (2008, July). Color traits transfer to grayscale
images. In Emerging Trends in Engineering and Technology, 2008. ICETET 2008.
First International Conference on (pp. 82-85). IEEE. Kekre and Thepade (2008).

In this paper, Kekre et al have used the process of color traits transfer to produce
their model for image colorization. This process is highly complex and mainly relies on
the statistical aspect of neural networks. The model proposed by the authors is quite
innovative, featuring two neural networks that run in parallel in the system, each
colorizing one channel of combinatorial color information. This allows the model to
produce fast colorization even with high scale or resolution of input images.

The first neural network computes the contrast information and alpha channel infor-
mation from the input image. The input layer of the first network is optimised to extract
contrast information more than color information, and thus functions as the primary
edge/object detector in the overall system. (Lavvafi et al., 2010). The contrast infor-
mation is combined with the alpha channel information by a proprietary combination
function and is fed to the last part of the system, a multi-channel encoder.

The second neural network layers are optimized to extract color information and
ignore all semblance of contrast and contours present in the input image. This part of
the system functions as the primary color detector in the system and provides the color
cues needed by the final multichannel encoder to produce the final colorized image.

The multichannel encoder is a simple logical combinator that uses an (x, y) based
function to combine the information produced by the two parallel neural networks into a
streamlined channel of continuous data that can be encoded into an output image
without loss of accuracy, color reproduction or scale. The main feature of this model is

[10] Horiuchi, T., Hirano, S. (2013, September). Colorization algorithm for


grayscale image by propagating seed pixels. In Image Processing, 2013. ICIP
2013. Proceedings. 2013 International Conference on (Vol. 1, pp. I-457). IEEE.
Horiuchi and Hirano (2008).

Horiuchi et al describe the task of image colorization as a problem of data
representation. The logic behind their assumption is that each image is simply a
two-dimensional matrix of floating point values and thus mathematical data. While this
is inherently correct, many images contain more data than just two dimensions of pixel
information. Most modern digital devices have the ability to store metadata in images,
and this is a problem for their model: the amount of metadata contained in images
produced by mobile phone, point-and-shoot and professional cameras increases in that
order. Professional cameras also store geolocation and details of the camera
configuration, such as aperture, shutter speed and ISO values, when a picture is taken.

If we naively assume that the images we encounter during the testing phase of this model
will contain only two-dimensional data, simple statistical data retention is not enough
to produce life-like colorizations with these handicaps in place. However, the authors
clearly acknowledge this fact and provide solutions for this problem. They recommend
downsizing the input image and truncating information that is not needed by the model;
this way, the model trains itself only on the vital information and the dimensionality of
the model is preserved.

2.2 Summary

In the following Table 2.1, we group the main approaches for image colorization in the
reference papers and describe important features of each approach.

Paper                      Algorithm                Dataset

Daly et al (2016)          VGG16Net                 ImageNet
Hwang et al (2016)         Learning Pipelines       ImageNet
Zhang et al (2016)         CNN + Regression Net     ImageNet
Baldassarre et al (2017)   Inception-ResNet v2      CIFAR-10
Welsh et al (2002)         Global Image Matching    CIFAR-10
Levin et al (2004)         Quadratic matching       CIFAR-10
Nie et al (2007)           Polynomial matching      ImageNet
Chen et al (2014)          Digital Matting          ImageGarden
Kekre et al (2008)         Color Traits Transfer    ImageNet
Horiuchi et al (2013)      Pseudo-coloring          ImageNet

Table 2.1: Summary of Literature Survey

CHAPTER 3

SYSTEM ANALYSIS

3.1 Existing System

Convolutional neural networks are highly suitable for processing visual data due to
their implicit structure and characteristics. One of the most popular implementations of
CNNs for image processing is VGG16 (Figure 3.1) (Zhang, 2016a). VGG16 is used for object
classification, and its architecture and trained weights are freely available for general
use on the authors' website. The model achieves a 92.7% top-5 test accuracy on ImageNet,
the standard dataset for image processing and training tasks, which is a collection of
about 14 million images made available for research and academic use. The VGG16 network
(Figure 3.1) contains 16 layers that are designed to manipulate and retain information
from the training images in order to detect macroscopic objects and subjects.

Figure 3.1: Macro-architecture of the VGG16 Network

At this point, we start to analyze the work done by Zhang et al in their ECCV 2016
submission (Zhang et al., 2016). They have used a custom VGG-style architecture and
achieved positive results in their so-called "colorization Turing Test", in which human
participants are asked to differentiate between a real image and an artificially
colorized one. Zhang et al have managed to deal sufficiently well with the object color
redundancy problem, i.e. the fundamental fact that an object may have multiple plausible
colorizations. To achieve this, they have incorporated a unique loss metric which is
re-weighted during training time instead of being static. This allows their model to use
the entire information in the training dataset instead of converging on a few highly
contributing images. Finally, they take the annealed mean of the predicted distribution
to produce the final colorization. Here, the parameter T acts as a temperature on the
predicted color distribution: lowering T pushes the result towards the mode of the
distribution and produces more vivid, saturated colors, while T = 1 returns the plain
mean of the distribution and produces desaturated results. By trial and error, they have
chosen the value T = 0.38, which gives satisfactory results.
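As a rough illustration, the annealed mean can be written as a temperature-scaled
re-normalization of the predicted distribution followed by an expectation over the
quantized bin centres. The snippet below is a minimal NumPy sketch under that assumption;
the variable names and the bin-centre array are ours, not Zhang et al.'s code.

```python
import numpy as np

def annealed_mean(z, bin_centers, T=0.38):
    """z: (H, W, Q) per-pixel color distribution from the network.
    bin_centers: (Q, 2) ab coordinates of the quantized bins.
    Low T moves the estimate towards the mode (more vivid colors);
    T = 1 reduces to the plain, desaturating mean."""
    z_t = np.exp(np.log(z + 1e-8) / T)        # sharpen the distribution
    z_t /= z_t.sum(axis=2, keepdims=True)     # renormalize per pixel
    return z_t @ bin_centers                  # (H, W, 2) expected ab values
```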

3.1.1 Use of Feature Extractors

In (Baldassarre et al., 2017), the authors have used a pre-trained feature extractor to
validate the color information generated by their primary CNN. They achieve this by using
a system of dual-channel decoders and encoders. The source grayscale image, which
initially contains only the Luminance (L) channel, is down-sampled to a
computation-friendly 224x224 dimension. One copy of this down-sampled image is fed to the
pre-trained feature extractor Inception-ResNet, which produces a vector containing
coherent information about the image. This information is then 'fused' with the output of
the encoder in the primary system, which convolves the down-sampled image through 8
layers. The fusion allows the primary system to recognize features that are not
immediately apparent from the L channel of the image alone. The fused result is then fed
to the decoder, which is essentially a reverse convolution network designed to amplify
certain features in images. The results from the decoder are up-sampled to produce the
colored version of the grayscale image. Each module in the system has a ReLU (Rectified
Linear Unit) activation function, except for the last layer, which uses a hyperbolic
tangent. The work by Baldassarre et al generates images with high accuracy and
photorealism. However, due to the relatively low volume of the dataset used (about 60,000
out of the 14 million images in ImageNet), it performs better when some familiar features
are present in the image. Figure 3.2 shows a side-by-side comparison of images that have
been colored by 4 of our reference papers, and has been taken from Baldassarre et al.

Figure 3.2: A comparison between different colorization models

3.2 Proposed System

By reviewing and studying reference papers related to image processing and coloriza-
tion, we have gathered vital information regarding colorization processes and tech-
niques. We have learned about the strengths and weaknesses of each technique and
described 2 of the techniques in detail. Here are some of our observations from the
reference papers:

1. The L x α x β color space is objectively better for training colorization systems
   due to the localized nature of information in Lαβ channel images.

2. Using a pre-trained feature extraction model or an object classification model as a
   secondary source of information from the source image can cut down considerably on
   the training time and resources required.

3. Down-sampling of the source images to a standardized small size is essential to
   reduce computational complexity, even if it reduces the amount of detail that can
   be produced in the final images (a minimal preprocessing sketch is given after this
   list).

4. Up-sampling or decoding of the encoded channel images leads to unwanted noise
   and artifacts in the output images that reduce visual appeal.

5. The computational complexity of the overall system is directly proportional
   to the number of neurons in each layer of the CNN.
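A minimal preprocessing sketch consistent with observations 1 and 3 above, using OpenCV.
The target size and the channel scaling are assumptions for illustration, not the exact
values used in our pipeline.

```python
import cv2
import numpy as np

def preprocess(path, size=224):
    """Resize an image to a fixed, computation-friendly size and split it
    into the L (network input) and ab (training target) channels."""
    bgr = cv2.resize(cv2.imread(path), (size, size))
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    L = lab[:, :, :1] * (100.0 / 255.0) - 50.0   # lightness, roughly centred on 0
    ab = lab[:, :, 1:] - 128.0                   # color channels shifted to ~[-128, 127]
    return L, ab
```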

Figure 3.3: Old Training Architecture

We propose several improvements that can be made to increase the accuracy or col-
orization plausibility of the models described above. In the first model by (Zhang et al.,
2016), the training set was vast, but the impact of each individual image was negligible,
thus the overall accuracy of the system could theoretically be increased by weighting
the input training images by their relevance or by using a separate feature-weight dis-
tributor as shown in Figure 3.3. This can lead to more accurate colorization of abstract
objects with no context. Their choice of T-value and learning rate is optimal and the
only other improvements in their models can be made by replacing the architecture with
a more complex one, similar to (Segalos, 2014). However, the addition of dimensions
in the model leads to reduction in information distribution within the model and re-
quires careful optimization and tweaking to perfect. We realized that, for colorizing
each image, we had to contact the remote cloud server which contained the model. This
causes a huge overhead to the system. Therefore we have moved the model to our client
machine, reducing processing time by trading color reproduction accuracy as shown in
Figure 3.4.

3.3 Advantages of Proposed System

Our proposed system will reduce training time exponentially without a noticeable accuracy
loss. This will reduce the resources and expenses utilized by the system. At the same
time, it will also increase the number of colorizations done in a particular time frame,
making it a much more viable colorization option when compared to its predecessor models.

Figure 3.4: New Training Architecture

3.4 Summary

The proposed system also has to solve the problem of color class rebalancing. The
distribution of αβ values in ground truth pictures is heavily biased towards desaturated
(small αβ) values, because of the prevalence of backgrounds such as pavements, clouds,
walls and dirt. This can be seen in the empirical distribution of pixels in αβ space,
collected from 1.3M training images from ImageNet: the number of pixels in ground truth
pictures at desaturated colors is significantly higher than at saturated colors. We
compensate for this imbalance by re-weighting the loss of every pixel during training
based upon the rarity of its color. This is equivalent to the more direct approach of
resampling the training space. Every pixel is weighted by a factor based on its nearest
αβ bin. To obtain a smoothed empirical distribution, we estimate the empirical
probability of each color in the αβ color space from the complete ImageNet data set and
smooth the distribution with a Gaussian kernel. We then mix this distribution with a
uniform distribution using weight λ, take the reciprocal, and normalize so that the
weighting factor is 1 on expectation. The values λ = 1/2 and σ = 5 worked well.
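A minimal sketch of how such rebalancing weights could be computed. The empirical bin
probabilities are assumed to be precomputed, and the Gaussian smoothing is done here over
the flattened bin index for brevity, whereas the reference work smooths in the 2-D αβ
plane.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rebalancing_weights(empirical_p, lam=0.5, sigma=5):
    """Per-bin loss weights that counteract the bias towards desaturated colors."""
    p = gaussian_filter(empirical_p, sigma=sigma)   # smooth the empirical distribution
    p /= p.sum()
    mixed = (1 - lam) * p + lam / len(p)            # mix with the uniform distribution
    w = 1.0 / mixed                                 # rarer colors receive larger weights
    return w / (w * p).sum()                        # expected weight over pixels is 1
```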

CHAPTER 4

SYSTEM DESIGN

4.1 System Architecture

Our system architecture is shown in Figure 4.1. Every convolutional block is a set of 2
or 3 repeated convolution and ReLU layers ending in a BatchNorm layer. The neural net
does not contain pooling layers; resolution changes happen through spatial down-sampling
or up-sampling between convolutional blocks.
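A minimal Keras sketch of one such block, assuming stride-2 convolutions handle the
down-sampling; the filter counts below are illustrative and not the exact trained
configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, n_convs=2, downsample=False):
    """2-3 conv+ReLU layers ending in BatchNorm; no pooling anywhere,
    resolution changes only through the stride of the last convolution."""
    for i in range(n_convs):
        stride = 2 if downsample and i == n_convs - 1 else 1
        x = layers.Conv2D(filters, 3, strides=stride, padding="same",
                          activation="relu")(x)
    return layers.BatchNormalization()(x)

inputs = tf.keras.Input(shape=(224, 224, 1))     # the L channel
x = conv_block(inputs, 64, downsample=True)      # 224 -> 112
x = conv_block(x, 128, downsample=True)          # 112 -> 56
x = conv_block(x, 256, n_convs=3)                # stays at 56, deeper features
```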

Figure 4.1: System Architecture

4.2 Cross-Channel Encoder

Apart from the visual aspect of colorization, we also look into how colorization can be
seen as a first step towards representation learning. Our model is similar to an
autoencoder, with the input and output being separate image channels, or more concisely
a cross-channel encoder, as shown in Figure 4.2.

Figure 4.2: Cross-Channel Encoder

To evaluate the feature representation obtained through this cross-channel encoding, we
run two separate tests on the model. First, we check the task generalization capability
of the features by freezing the learned representation and training classifiers to
linearly classify objects on already seen data. Second, we fine-tune the network on the
PASCAL data set for classification as well as segmentation and detection (Pollard, 2015).
Here, we test not only held-out tasks but also how the learned representation generalizes
to new data. For an accurate comparison with older feature learning models, we retrain an
AlexNet on the task, using our complete process, for 450k iterations.

4.3 UML Diagrams

4.3.1 Activity Diagram

Figure 4.3: UML Activity Diagram

UML Diagram 4.3 shows the multiple threads and activities which take place in the
client-server system from when the script starts executing till the time it ends.

4.3.2 State Diagram

Figure 4.4: UML State Diagram

UML Diagram 4.4 shows the different processes and states the image matrix under-
goes in the colorization process in two different sub-systems, the client and the server.

4.4 Summary

Image colorization is a complex problem because it deals with missing information in the
input images. By using a myriad of techniques, we can recover this lost information to
produce colorizations that are plausible to the human eye. Convolutional Neural Networks
are an emerging tool for processing and colorizing images. Advances in the architecture
of the CNNs used and optimization of the training methodologies will help improve the
accuracy of the colorization produced (Zhang, 2016b).

CHAPTER 5

SYSTEM REQUIREMENTS AND SPECIFICATIONS

5.1 Hardware Requirements

5.1.1 Client Requirements

Any normal home PC (laptop or desktop) will suffice as a client, as long as it can run
Python files and has a relatively decent internet connection. The minimum requirements
are:

• 4GB RAM

• 20GB HDD space (mainly to store the data set)

• 8 Mbps internet connection

5.1.2 Server Requirements

The cloud service we have used is DigitalOcean, which is economically more reasonable
when compared to Amazon Web Services or Microsoft Azure. Our earliest training
architecture stored the data set on the server, which increased the overhead
exponentially. Now, we store the data set on our client and send the Red Green Blue
(RGB) matrix to the server, which reduces overhead and increases efficiency with a
negligible accuracy trade-off.

So now the server PC requires:

• 10GB of HDD to store the model, input and output matrix

• Preferably distributed multiprocessor system with high core and thread count

• GPU with a high CUDA core count, if using Graphics Processing Unit (GPU) mode

• At least 16GB of RAM


5.2 Software Requirements

5.2.1 Linux

Linux is an openly available operating system that is used by servers worldwide. It is
open source, developed by people around the world, and was created by Linus Torvalds; it
inherits heavily from the Unix family of operating systems. We suggest using Linux
instead of Windows because of the dependencies of our project: packages such as caffe and
pypy do not compile on Windows without disabling compiler signature verification, and
C-level libraries such as zlib and openssl are absent there.

5.2.2 Python 3.7

Python is a popular, high-level programming language. It was created by Guido van Rossum
in 1991 with support for C/C++ module integration, but has quickly become an all-purpose
language with support for thousands of openly available modules and plugins. We have
chosen Python for our project because of the simplicity of its syntax and its tight
integration with many machine learning and image processing libraries (Python, 2016). We
had also considered using R, but the advantages of using R over Python were outweighed by
its module installation difficulties. Here, we are using Python 3.7.3, which is the
latest stable version of Python 3 available at this point in time.

5.2.3 Tensorflow

Tensorflow is an open-source machine learning library that is used by developers and


enterprises for the ML and deep learning needs. Tensorflow is a big package to install
and easily exceeds 500MB with all the plugins and mappers. Tensorflow provides the
easy to use high level Caffe and Keras APIs for easing machine learning model creation
and training. Tensorflow can be run on the CPU as well as the GPU, which can really
speed up machine learning tasks (Ranier, 2016).

5.2.4 Anaconda (Python Distribution)

Anaconda is a distribution of Python which comes preloaded with the tools required for
complex scientific calculations and metric processing. Some of the modules that come
pre-installed with an up-to-date Anaconda package are numpy, pypy, scikit-image,
scikit-learn and so on. We have used Anaconda to build our TensorFlow dependencies
easily. Anaconda is used in the StackOS Linux distribution but is supported on Windows,
Linux and Mac OS.

5.2.5 Apache Tomcat

Apache Tomcat is a popular web server and client-facing reverse proxy that is open source
and developed by the Apache Software Foundation. Tomcat generally only allows Java
applications to be hosted on it directly. We have used an open-source fork of the Tomcat
server that comes with the mod_wsgi package installed, which allows WSGI applications to
be hosted directly. We then packaged our Python web application as a standalone WSGI
container and deployed it locally for access on the network. This way, clients requesting
images to be colorized can do so over HTTP and HTTPS.
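A minimal WSGI sketch of the kind of colorization endpoint described above. The
colorize_bytes function is a placeholder for the model call, and the routing is
illustrative rather than our deployed code.

```python
def colorize_bytes(image_bytes):
    """Placeholder: run the uploaded grayscale image through the model
    and return the colorized result as PNG bytes."""
    raise NotImplementedError

def application(environ, start_response):
    # mod_wsgi (or wsgiref for local testing) calls this for every request.
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = colorize_bytes(environ["wsgi.input"].read(length))
        start_response("200 OK", [("Content-Type", "image/png")])
        return [body]
    start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
    return [b"POST a grayscale image to this endpoint"]
```

For local testing, the same callable can be served with Python's built-in
wsgiref.simple_server before being placed behind the Tomcat/mod_wsgi front end.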

5.3 Summary

What makes our project unique is that it can be run from any home desktop or PC with a
steady, average internet connection. This is because (even though Windows is a viable
option) Linux is an open-source and free operating system whose distros are free to
download from the internet. Linux can also be run from a USB drive in demo mode or on a
virtual machine over a Windows environment. The dependencies appear to be multiple
separate units, but they can be satisfied by a single small Python install script,
because most of the important machine learning and data science libraries and templates
are already bundled in the Anaconda Distribution, which is a 560 MB download. When it
comes to hardware dependencies, DigitalOcean is one of the better suited cloud services
for our task and has many reasonable and economical options when it comes to compute
services.

CHAPTER 6

SYSTEM IMPLEMENTATION

6.1 Algorithm Explanation

We first restate the colorization problem in terms of the International Commission on Illumination (CIE) Lab color space. Like RGB, Lab is a three-channel color space, but in a Lab color matrix the color information is stored only in the a (red-green component) and b (yellow-blue component) channels, while the L (lightness) channel stores luminance information only.

The black and white picture we want colorized can be imagined as the L channel of the picture in the CIE Lab color space, and our problem is finding the a and b channels (Daly and Zhao, 2016). The Lab picture so obtained is converted back to an RGB color matrix by putting it through standard transforms. For example, in OpenCV this is done with cvtColor using the COLOR_Lab2BGR option (and COLOR_BGR2Lab for the forward direction).
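
As a small illustration (the file name is a placeholder, not part of the project), the round trip between BGR and Lab in OpenCV looks like this:

import cv2 as cv
import numpy as np

bgr = cv.imread("input.png")                                   # placeholder input image
lab = cv.cvtColor(bgr.astype(np.float32) / 255.0, cv.COLOR_BGR2Lab)
L, a, b = cv.split(lab)                                        # L in [0, 100]; a, b roughly in [-127, 127]
bgr_back = cv.cvtColor(lab, cv.COLOR_Lab2BGR)                  # reverse transform back to BGR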

To simplify computations, the ab space of the Lab color space is divided into 313 bins, as shown in Figure 6.1. Instead of calculating the exact a and b values for every pixel, this division means we only need a bin index in the range 0 to 312. Yet another way of framing the problem: given the L channel, which takes values from 0 to 255, find the ab bin, which takes values from 0 to 312 (Koleini et al., 2010). The color prediction task is thus converted into a multinomial classification problem in which every gray pixel has 313 classes to choose from. The model used by (Zhang et al., 2016) is a VGG-style network built from several blocks of convolutional layers. Each block has two to three convolutional layers, each followed by a ReLU, and terminates in a Batch Normalization layer. Unlike the VGG net, there are no pooling or fully connected layers.
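
To make the quantization concrete, the sketch below maps arbitrary ab values to the nearest of the 313 bin centers, assuming the centers are available as the pts_in_hull.npy file used in Appendix A.3:

import numpy as np

pts_in_hull = np.load("pts_in_hull.npy")        # shape (313, 2): the quantized ab bin centers

def ab_to_bin(ab_values):
    # ab_values: (N, 2) array of (a, b) pairs; returns the index of the nearest bin, in [0, 312]
    dists = np.linalg.norm(ab_values[:, None, :] - pts_in_hull[None, :, :], axis=2)
    return np.argmin(dists, axis=1)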

The input picture is downscaled to 224 x 224. Let us call this rescaled grayscale input image X. When it passes through the Neural Network (NN), it is converted to Ẑ. Mathematically, this mapping G performed by the NN can be represented as:

Ẑ = G(X)

Figure 6.1: 313 Quantized colors in ab space

The shape of Ẑ is H x W x Q, where W (= 56) and H (= 56) are the width and height of the output of the last CNN block. For each of the H x W pixels, Ẑ contains a vector of Q (= 313) values, where every value gives the probability of that pixel belonging to a particular color class. Our job is to compute a distinct pair of ab channel values from each probability distribution Ẑ_h,w.

To get the output image from Ẑ: the CNN gives us a vector of probabilities in Ẑ from the downscaled input picture X. We now need to compute a single ab value pair from each distribution in Ẑ.

One might be inclined to simply take the mean of each distribution and pick the ab pair of the closest quantized bin center. However, the distribution is not Gaussian, and its mean is almost always an unnatural, desaturated color (Hussein and Yang, 2014). To see why, consider the color of the sky: it is sometimes blue and sometimes orange-yellow, so the distribution of sky colors is bi-modal. When recoloring the sky, either blue or yellow will be acceptable depending on the scene, but the average of blue and yellow is an unacceptable gray color.

We could instead use the mode of the distribution to obtain a blue or yellow color. We tried this and it gave vibrant colors, but it sometimes broke spatial consistency. Our alternative was to interpolate between the mean and mode estimates to obtain a value called the annealed mean (Chen et al., 2004). A parameter called temperature (T) controls the degree of interpolation; a final value of T = 0.38 was chosen as a trade-off between the two extremes.
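
The sketch below shows one way to compute the annealed mean, assuming Ẑ is available as a NumPy array of shape H x W x 313 and the bin centers as pts_in_hull; it illustrates the idea rather than the exact implementation used in (Zhang et al., 2016).

import numpy as np

def annealed_mean(Z, pts_in_hull, T=0.38):
    # Z: (H, W, 313) per-pixel probability distributions over the ab bins
    logits = np.log(np.clip(Z, 1e-8, 1.0)) / T     # dividing by the temperature T sharpens the distribution
    q = np.exp(logits)
    q /= q.sum(axis=2, keepdims=True)              # renormalize each pixel's distribution
    return q @ pts_in_hull                         # (H, W, 2): interpolated ab value per pixel

With T = 1 this reduces to the ordinary mean, and as T approaches 0 it approaches the mode, so T = 0.38 sits between the two behaviours.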

6.2 Performance Evaluation


• We followed the human ’Turing Test’ methodology for testing our results.

• We gathered 15 participants (friends, relatives and colleagues) and had them guess, for 4 sets of two images each, which was the ground truth image and which was colorized by our model.

The results are shown in Table 6.1:

Number of correct guesses    Number of humans fooled    ’Turing Test’ accuracy
           23                          37                      61.66%

Table 6.1: Turing Test Accuracy

6.2.1 Flora and Fauna, Figure 6.2

Figure 6.2: Flora and Fauna

Our model performed on par with the other colorization models discussed, including (Zhang et al., 2016).

6.2.2 Skies and Monuments, Figure 6.3

Figure 6.3: Skies and Monuments

Our model performed significantly better here because of its feature extraction technique and the absence of down-scaling, which produced a more vibrant image compared to the dull ground truth image.

6.2.3 Humans and Humanoid subjects, Figure 6.4

Figure 6.4: Humans and Humanoid subjects

Our model fared well in this category and was, to some extent, able to separate facial features and skin tones.

6.2.4 Animals and Reptiles, Figure 6.5

Figure 6.5: Animals and Reptiles

Sadly, our model showed bias when tested on animals and reptiles: it misplaced facial colors (especially mouths). In the case of snakes and other reptiles, the results were plausible because of the wide variety of colorful reptiles found in nature.

6.3 Implementation Screenshots

Figure 6.6: Web app before Colorization

The web app is a PHP and Tomcat based server running locally, with a Python script that performs the colorization in the backend. The front end is plain HTML. When the page is loaded and a grayscale image is uploaded, the web app looks like Figure 6.6. When the user presses enter, the client starts contacting the server in the backend of the web app, as shown in the UML diagrams 4.3 and 4.4. The client sends the Lαβ color matrix for both the training set and the input image to the server. The server starts training the model based on the data received and then transforms the input matrix into a new matrix based on the generated classifier. This output matrix is sent back to the client, where it is upscaled and converted into the output image. After the script finishes executing, which does not take long with our new, reduced-overhead architecture, the colorized image is presented to the user as shown in Figure 6.7.

Figure 6.7: Web app after Colorization

6.4 Summary

Our machine learning model is heavily based on Richard Zhang's colorization model. There are, however, several different approaches we took that make our model much more stable and scalable. One of these is the storage of the training set on the client. We learned the hard way that a large, well-featured image dataset carries a huge overhead for effective utilization of storage resources on a cloud computing device. The whole issue is avoided by having the server generate a classifier from a Lαβ color matrix; a computing device handles numbers better than images in any case. This also makes our system easier to retrain over and over again. Using a remote computing unit instead of a data cluster also reduces the time taken for the whole script to run, with negligible loss in image quality. That being said, there are multiple other improvements that can be made to our current model or to the (Zhang et al., 2016) model, which we discuss in the next and last chapter.

CHAPTER 7

CONCLUSION AND FUTURE ENHANCEMENTS

7.1 Conclusion

Image colorization is a complex problem because it deals with missing information in the input images. By using a myriad of techniques, we can recover this lost information to produce colorization that is plausible to the human eye. Convolutional Neural Networks are an emerging tool for processing and colorizing images. Advances in the architecture of the CNNs used and optimization of the training methodologies will help improve the accuracy of the colorization produced. Furthermore, promising techniques such as Deep Learning Super-Sampling (DLSS), hardware-based super-sampling and intelligent color correction will help us produce images that are visually similar to the ground truth images.

7.2 Future Enhancement

An area of improvement common to all the models discussed in this report is the up-sampling of the encoded images. Traditional up-sampling of small images introduces noise and artifacts. Instead of the up-sampling methods of basic image processing, we can use DLSS. DLSS technology was introduced by Nvidia at the end of 2018 for up-sampling images with deep learning. Currently, DLSS is used by Nvidia in high-end video games and visual effects to create the illusion of detail where there is none. DLSS technology could be applied to colorization to improve the quality of the output image considerably. However, this requires training an entirely separate system for super-sampling images based on feature detection.
REFERENCES
1. Baldassarre, F., Morín, D. G., and Rodés-Guirao, L. (2017). “Deep koalarization: Im-
age colorization using cnns and inception-resnet-v2.” arXiv preprint arXiv:1712.03400.

2. Chen, B.-W., He, X., Ji, W., Rho, S., and Kung, S.-Y. (2016). “Large-scale image
colorization based on divide-and-conquer support vector machines.” The Journal of Su-
percomputing, 72(8), 2942–2961.

3. Chen, T., Wang, Y., Schillings, V., and Meinel, C. (2004). “Grayscale image matting and
colorization.” Proceedings of Asian Conference on Computer Vision, Vol. 6, Citeseer.

4. Daly, R. and Zhao, T. (2016). “Cnn assisted colorization of gray-scale images.” Stanford
University, Tech. Rep., 27.

5. Hautner, W. (2007). “Hand-colouring of photographs before the rise of automatic col-


orization - wikipedia. (Accessed on 04/27/2019).

6. Horiuchi, T. and Hirano, S. (2008). “Colorization algorithm for grayscale image by


propagating seed pixels.” 2003 Image Processing ICIP, IEEE, 450–457.

7. Hussein, A. A. and Yang, X. (2014). “Colorization using edge-preserving smoothing


filter.” Signal, Image and Video Processing, 8(8), 1681–1689.

8. Hwang, J. and Zhou, Y. (2016). “Image colorization with deep convolutional neural
networks.” Stanford University, Tech. Rep.

9. Kekre, H. B. and Thepade, S. D. (2008). “Color traits transfer to grayscale images.”


2008 First International Conference on Emerging Trends in Engineering and Technol-
ogy, IEEE, 82–85.

10. Koleini, M., Monadjemi, S. A., and Moallem, P. (2009). “Film colorization using tex-
ture feature coding and artificial neural networks..” Journal of Multimedia, 4(4).

11. Koleini, M., Monadjemi, S. A., and Moallem, P. (2010). “Automatic black and white
film colorization using texture features and artificial neural networks.” Journal of the
Chinese Institute of Engineers, 33(7), 1049–1057.

12. Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). “Imagenet classification with
deep convolutional neural networks.” Advances in neural information processing sys-
tems, 1097–1105.

13. Lavvafi, M. R., Monadjemi, S., and Moallem, P. (2010). “Film colorization, using
artificial neural networks and laws filters.” J Comput, 5, 1094–1099.

14. Nie, D., Ma, Q., Ma, L., and Xiao, S. (2007). “Optimization based grayscale image
colorization.” Pattern recognition letters, 28(12), 1445–1451.

15. Persch, J., Pierre, F., and Steidl, G. (2017). “Exemplar-based face colorization using
image morphing.” Journal of Imaging, 3(4), 48.

16. Pollard, M. (2015). “Colorizing black & white photos with neural networks. (Accessed
on 04/27/2019).

17. Python, F. (2016). “Python 3.7.3 documentation, api specification and tutorials. (Ac-
cessed on 04/27/2019).

18. Ranier, R. (2016). “Tensorflow api documentation and tutorials. (Accessed on


04/27/2019).

19. Segalos, B. (2014). “Convolutional neural network based image colorization using
opencv. (Accessed on 04/27/2019).

20. Swanepoel, F. (2012). “Advent of film colorization and the rise of computerized col-
orization - wikipedia. (Accessed on 04/27/2019).

21. Torralba, A., Murphy, K. P., Freeman, W. T., and Rubin, M. A. (2003). “Context-based
vision system for place and object recognition.

22. Vanderbolt, C. (2010). “Digital image processing using opencv, scikit-learn and python
- wikipedia. (Accessed on 04/27/2019).

23. Vedaldi, A. and Lenc, K. (2015). “Matconvnet: Convolutional neural networks for
matlab.” Proceedings of the 23rd ACM international conference on Multimedia, ACM,
689–692.

24. Walter, P. (2010). “Artificial neural networks and their architecture - wikipedia. (Ac-
cessed on 04/27/2019).

25. Welsh, T. and Ashikhmin, M. (2002). “Transferring color to grayscale images.” ACM
Transactions On Graphics (TOG), ACM, 277–285.

26. Yao, C., Yang, X., Wang, J., Li, S., and Zhai, G. (2010). “Patch-driven colorization.”
Optical Engineering, 49(1), 017001.

27. Zhang, R. (2016a). “Colorful image colorization - an analysis of neural networks (03).
(Accessed on 04/27/2019).

28. Zhang, R. (2016b). “Colorizing black and white photos – algorithmia.” (Accessed on 04/27/2019).

29. Zhang, R., Isola, P., and Efros, A. A. (2016). “Colorful image colorization.” European
conference on computer vision, Springer, 649–666.

30. Zomet, A. and Peleg, S. (2002). “Multi-sensor super-resolution.” Improving image


sensor accuracy by supersampling, IEEE, 27.

APPENDIX A

CODING AND TESTING

A.1 Read_Model.py

import cv2 as cv

# Specify the paths for the model files
protoFile = "./models/colorization_deploy_v2.prototxt"
weightsFile = "./models/colorization_release_v2.caffemodel"
# weightsFile = "./models/colorization_release_v2_norebal.caffemodel"

# Read the input image
frame = cv.imread("./dog-greyscale.png")

W_in = 224
H_in = 224

# Read the network into memory
net = cv.dnn.readNetFromCaffe(protoFile, weightsFile)

A.2 Batch_Dataset_Reader.py

"""
Code ideas from https://github.com/Newmu/dcgan and
tensorflow mnist dataset reader
"""
import numpy as np
from scipy import misc
from skimage import color

class BatchDatset:
files = []
images = []
image_options = {}
batch_offset = 0
epochs_completed = 0

def __init__(self, records_list, image_options={}):


"""
Intialize a generic file reader with batching for
list of files
:param records_list: list of files to read -
:param image_options: A dictionary of options for
modifying the output image
Available options:
resize = True/ False
resize_size = #size of output image
color=LAB, RGB, HSV
"""
print("Initializing Batch Dataset Reader...")
print(image_options)
self.files = records_list
self.image_options = image_options
self._read_images()

def _read_images(self):
self.images = np.array([self._transform(filename)
for filename in self.files])

36
print (self.images.shape)

def _transform(self, filename):


try:
image = misc.imread(filename)
if len(image.shape) < 3: # make sure images are
of shape(h,w,3)
image = np.array([image for i in range(3)])

if self.image_options.get("resize", False) and


self.image_options["resize"]:
resize_size = int(self.image_options["
resize_size"])
resize_image = misc.imresize(image,
[resize_size,
resize_size])
else:
resize_image = image

if self.image_options.get("color", False):
option = self.image_options[’color’]
if option == "LAB":
resize_image = color.rgb2lab(resize_image)
elif option == "HSV":
resize_image = color.rgb2hsv(resize_image)
except:
print ("Error reading file: %s of shape %s" % (
filename, str(image.shape)))
raise

return np.array(resize_image)

37
def get_records(self):
return self.images

def reset_batch_offset(self, offset=0):


self.batch_offset = offset

def next_batch(self, batch_size):


start = self.batch_offset
self.batch_offset += batch_size
if self.batch_offset > self.images.shape[0]:
# Finished epoch
self.epochs_completed += 1
print("****************** Epochs completed: " +
str(self.epochs_completed) + "
******************")
# Shuffle the data
perm = np.arange(self.images.shape[0])
np.random.shuffle(perm)
# Start next epoch
start = 0
self.batch_offset = batch_size

end = self.batch_offset
images = self.images[start:end]
return np.expand_dims(images[:, :, :, 0], axis=3),
images

def get_random_batch(self, batch_size):


indexes = np.random.randint(0, self.images.shape
[0], size=[batch_size]).tolist()
images = self.images[indexes]
return np.expand_dims(images[:, :, :, 0], axis=3),

38
images

A.3 Load_Bin_Centers.py

# Load the bin centers
pts_in_hull = np.load('./pts_in_hull.npy')

# Populate cluster centers as a 1x1 convolution kernel
pts_in_hull = pts_in_hull.transpose().reshape(2, 313, 1, 1)
net.getLayer(net.getLayerId('class8_ab')).blobs = [pts_in_hull.astype(np.float32)]
net.getLayer(net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)]

A.4 Image_Colorization.py

from __future__ import print_function

__author__ = "ankit"
'''
Tensorflow implementation of Image colorization using
Adversarial loss
'''
import tensorflow as tf
import numpy as np

import TensorflowUtils as utils
import read_LaMemDataset as lamem
# import read_FlowersDataset as flowers
import datetime
import BatchDatsetReader as dataset
from six.moves import xrange
import os

FLAGS = tf.flags.FLAGS
tf.flags.DEFINE_integer("batch_size", "16", "batch size for training")
tf.flags.DEFINE_string("logs_dir", "logs/", "path to logs directory")
tf.flags.DEFINE_string("data_dir", "Data_zoo/LaMem/", "path to dataset")
tf.flags.DEFINE_float("learning_rate", "1e-4", "Learning rate for Adam Optimizer")
tf.flags.DEFINE_float("beta1", "0.9", "Beta 1 value to use in Adam Optimizer")
tf.flags.DEFINE_string("model_dir", "Model_zoo/", "Path to vgg model mat")
tf.flags.DEFINE_bool('debug', "False", "Debug mode: True/ False")
tf.flags.DEFINE_string('mode', "train", "Mode train/ test")

MODEL_URL = 'http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat'

MAX_ITERATION = int(1e5 + 1)
IMAGE_SIZE = 128
ADVERSARIAL_LOSS_WEIGHT = 1e-3


def vgg_net(weights, image):
    layers = (
        # 'conv1_1', 'relu1_1',
        'conv1_2', 'relu1_2', 'pool1',

        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',

        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
        'relu3_3', 'conv3_4', 'relu3_4', 'pool3',

        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
        'relu4_3', 'conv4_4', 'relu4_4', 'pool4',

        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
        'relu5_3', 'conv5_4', 'relu5_4'
    )

    net = {}
    current = image
    for i, name in enumerate(layers):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i + 2][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = utils.get_variable(np.transpose(kernels, (1, 0, 2, 3)), name=name + "_w")
            bias = utils.get_variable(bias.reshape(-1), name=name + "_b")
            current = utils.conv2d_basic(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current, name=name)
            if FLAGS.debug:
                utils.add_activation_summary(current)
        elif kind == 'pool':
            current = utils.avg_pool_2x2(current)
        net[name] = current

    return net


def generator(images, train_phase):
    print("setting up vgg initialized conv layers ...")
    model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)

    weights = np.squeeze(model_data['layers'])

    with tf.variable_scope("generator") as scope:
        W0 = utils.weight_variable([3, 3, 1, 64], name="W0")
        b0 = utils.bias_variable([64], name="b0")
        conv0 = utils.conv2d_basic(images, W0, b0)
        hrelu0 = tf.nn.relu(conv0, name="relu")

        image_net = vgg_net(weights, hrelu0)
        vgg_final_layer = image_net["relu5_3"]

        pool5 = utils.max_pool_2x2(vgg_final_layer)

        # now to upscale to actual image size
        deconv_shape1 = image_net["pool4"].get_shape()
        W_t1 = utils.weight_variable([4, 4, deconv_shape1[3].value, pool5.get_shape()[3].value], name="W_t1")
        b_t1 = utils.bias_variable([deconv_shape1[3].value], name="b_t1")
        conv_t1 = utils.conv2d_transpose_strided(pool5, W_t1, b_t1, output_shape=tf.shape(image_net["pool4"]))
        fuse_1 = tf.add(conv_t1, image_net["pool4"], name="fuse_1")

        deconv_shape2 = image_net["pool3"].get_shape()
        W_t2 = utils.weight_variable([4, 4, deconv_shape2[3].value, deconv_shape1[3].value], name="W_t2")
        b_t2 = utils.bias_variable([deconv_shape2[3].value], name="b_t2")
        conv_t2 = utils.conv2d_transpose_strided(fuse_1, W_t2, b_t2, output_shape=tf.shape(image_net["pool3"]))
        fuse_2 = tf.add(conv_t2, image_net["pool3"], name="fuse_2")

        shape = tf.shape(images)
        deconv_shape3 = tf.stack([shape[0], shape[1], shape[2], 2])
        W_t3 = utils.weight_variable([16, 16, 2, deconv_shape2[3].value], name="W_t3")
        b_t3 = utils.bias_variable([2], name="b_t3")
        pred = utils.conv2d_transpose_strided(fuse_2, W_t3, b_t3, output_shape=deconv_shape3, stride=8)

    # return tf.concat(concat_dim=3, values=[images, pred], name="pred_image")
    return tf.concat([images, pred], 3, "pred_image")


def train(loss, var_list):
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate, beta1=FLAGS.beta1)
    grads = optimizer.compute_gradients(loss, var_list=var_list)
    for grad, var in grads:
        utils.add_gradient_summary(grad, var)
    return optimizer.apply_gradients(grads)


def main(argv=None):
    print("Setting up network...")
    train_phase = tf.placeholder(tf.bool, name="train_phase")
    images = tf.placeholder(tf.float32, shape=[None, None, None, 1], name='L_image')
    lab_images = tf.placeholder(tf.float32, shape=[None, None, None, 3], name="LAB_image")

    pred_image = generator(images, train_phase)

    gen_loss_mse = tf.reduce_mean(2 * tf.nn.l2_loss(pred_image - lab_images)) / (IMAGE_SIZE * IMAGE_SIZE * 100 * 100)
    # tf.scalar_summary("Generator_loss_MSE", gen_loss_mse)
    tf.summary.scalar("Generator_loss_MSE", gen_loss_mse)

    train_variables = tf.trainable_variables()
    for v in train_variables:
        utils.add_to_regularization_and_summary(var=v)

    train_op = train(gen_loss_mse, train_variables)

    print("Reading image dataset...")
    # train_images, testing_images, validation_images = flowers.read_dataset(FLAGS.data_dir)
    train_images = lamem.read_dataset(FLAGS.data_dir)
    image_options = {"resize": True, "resize_size": IMAGE_SIZE, "color": "LAB"}
    batch_reader = dataset.BatchDatset(train_images, image_options)

    print("Setting up session")
    sess = tf.Session()
    summary_op = tf.merge_all_summaries()
    saver = tf.train.Saver()
    summary_writer = tf.train.SummaryWriter(FLAGS.logs_dir, sess.graph)
    sess.run(tf.initialize_all_variables())

    ckpt = tf.train.get_checkpoint_state(FLAGS.logs_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Model restored...")

    if FLAGS.mode == 'train':
        for itr in xrange(MAX_ITERATION):
            l_image, color_images = batch_reader.next_batch(FLAGS.batch_size)
            feed_dict = {images: l_image, lab_images: color_images, train_phase: True}

            if itr % 10 == 0:
                mse, summary_str = sess.run([gen_loss_mse, summary_op], feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, itr)
                print("Step: %d, MSE: %g" % (itr, mse))

            if itr % 100 == 0:
                saver.save(sess, FLAGS.logs_dir + "model.ckpt", itr)
                pred = sess.run(pred_image, feed_dict=feed_dict)
                idx = np.random.randint(0, FLAGS.batch_size)
                save_dir = os.path.join(FLAGS.logs_dir, "image_checkpoints")
                utils.save_image(color_images[idx], save_dir, "gt" + str(itr // 100))
                utils.save_image(pred[idx].astype(np.float64), save_dir, "pred" + str(itr // 100))
                print("%s --> Model saved" % datetime.datetime.now())

            sess.run(train_op, feed_dict=feed_dict)

            if itr % 10000 == 0:
                FLAGS.learning_rate /= 2
    elif FLAGS.mode == "test":
        count = 10
        l_image, color_images = batch_reader.get_random_batch(count)
        feed_dict = {images: l_image, lab_images: color_images, train_phase: False}
        save_dir = os.path.join(FLAGS.logs_dir, "image_pred")
        pred = sess.run(pred_image, feed_dict=feed_dict)
        for itr in range(count):
            utils.save_image(color_images[itr], save_dir, "gt" + str(itr))
            utils.save_image(pred[itr].astype(np.float64), save_dir, "pred" + str(itr))
        print("--- Images saved on test run ---")


if __name__ == "__main__":
    tf.app.run()


Vishnuvardhan Kumar <[email protected]>

Fwd: Submission received by Imaging Science Journal (Submission ID: 196044879)


1 message

Minu R I <[email protected]> Fri, Apr 12, 2019 at 7:53 PM


To: [email protected]

From: [email protected]
To: "minu r" <[email protected]>
Sent: Friday, April 12, 2019 7:46:36 PM
Subject: Submission received by Imaging Science Journal (Submission ID: 196044879)

Dear Minu R I,

Thank you for your submission. Please see the details below.

Submission ID 196044879
Manuscript Title IMAGE COLORIZATION USING CNNs
Journal Imaging Science Journal

You can always check the progress of your submission here (we now offer multiple options to sign
in to your account. To log in with your ORCID please click on the 'with ORCID' box on the bottom
right of the log in area).

If you have any queries, please get in touch with [email protected].

Thank you for submitting your work to our journal.

Kind Regards,
Imaging Science Journal Editorial Office

Taylor & Francis is a trading name of Informa UK Limited, registered in England under no. 1072954.
Registered office: 5 Howick Place, London, SW1P 1W.

APPENDIX B

CONTRIBUTIONS

B.1 S Vishnuvardhan
• Made key contributions to the neural network architecture and suggested the use
of low pass filters for improving the accuracy of the model.

• Configured and provisioned the cloud environment using Linux, Docker and Ku-
bernetes.

• Designed test scenarios to assess the quality of the model.

• Pivotal in generating the published paper and associated documentation.

• Conducted the human Turing Test with fellow students and colleagues.

B.2 Ankit Pasayat


• Suggested the use of bleeding-edge cloud technology for training the model in-
stead of using a local machine.

• His method improved the training time of our neural network by 4.8 times (Chapter 3).

• Contacted digitalocean.com and negotiated prices for our student-licensed cloud


architecture.

• Studied and analyzed reference papers for our literature survey.

• Pivotal in generating the report and documentation for the project.
