A Research Survey Report on Deep Learning Concepts
Abstract: The area of "machine learning" is in its golden era, and "deep learning" has become the dominant approach in the domain. It uses numerous layers, each representing a level of data abstraction, to construct computational models, enabling algorithms such as "generative adversarial networks, convolutional neural networks, and model transfer" that have entirely changed our insight into how data is processed and information is attained. The attention behind it is extreme because the domain never previously offered such a multi-scope perspective: a deficiency in core understanding led to opaque, "black box" methods that restrained development at the most basic levels, and "deep learning" is still repeatedly perceived as a tentative block within machine learning. In this paper we present a thorough assessment of the past and present state of the art, covering visual, audio, and text processing alike.
Index Terms: Parallel algorithms, Distributed algorithms, Deep Learning, Machine learning theory, Neural networks, Deep Learning Networks, Convolutional Neural Networks
3.2 Online Learning
In the present-day era, streams of data with a huge time [...] fundamental algorithms are imparted over "deep learning", as the process adjusts its various parameters in an iterative manner over the training samples. The computational complexity of SGD is far smaller than that of the original gradient, in which the whole dataset is considered whenever the parameters are updated. In the learning progression the update velocity is controlled by the learning-rate hyper-parameter: lower learning rates eventually lead to an optimal state, although the process fluctuates due to decay or loss [27]. The idea of momentum is therefore introduced to determine a proper learning rate, while weight decay is implemented as a penalty coefficient in the cost function to reduce over-fitting and improve performance. The learning rate is further adapted whenever the parameters are updated by recording the generated squared gradients, which are always positive:

E[g²]_t = γ · E[g²]_{t-1} + (1 - γ) · g²_t        (2)

where E[g²]_t is the accumulated squared gradient at stage t and g²_t is the squared gradient at stage t. This is improved further by adding a decay fraction β1 to record the accumulation, and in Adam [27] the l2 norm is reinstated to make the algorithm stable.
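As a rough illustration of these update rules, the following minimal sketch combines momentum with the accumulated squared gradient of Eq. (2); it is not the exact procedure of [27], and the learning rate, decay factors, and toy gradient are assumed values:

import numpy as np

def adaptive_step(w, grad, v, acc, lr=0.01, mu=0.9, gamma=0.9, eps=1e-8):
    """One parameter update: momentum on a gradient rescaled by the
    running average of squared gradients from Eq. (2)."""
    acc = gamma * acc + (1.0 - gamma) * grad ** 2    # Eq. (2): E[g^2]_t
    v = mu * v - lr * grad / (np.sqrt(acc) + eps)    # momentum on scaled gradient
    return w + v, v, acc

w, v, acc = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.2, 0.3])                    # toy gradient for one step
w, v, acc = adaptive_step(w, grad, v, acc)

Because acc is always positive and grows with the observed gradient magnitudes, dimensions with consistently large gradients automatically receive smaller effective learning rates.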
3.4 Distributed System based Deep Learning
The competence of model training is accelerated in distributed "deep learning" techniques by applying data parallelism and model parallelism over the training process. In data parallelism the model is replicated over the computational nodes and each replica is trained on its assigned subset of the data, with the nodes synchronized after a certain period of time; whereas in model parallelism the data is processed by a single model in which each node is accountable for the inference of the parameters assigned to it. "Let W_{t,i} represent a parameter of the neural network on node i at a specific time t, with N slave nodes used for training under a master node":

W_{t+1} = (1/N) · Σ_{i=1..N} W_{t,i}        (3)

The scalability of model parallelism is inferior, as the framework that decides how to place the embedding representing each operation over the different devices still falls short when compared with human experts.
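A minimal sketch of the synchronous data-parallel update of Eq. (3) follows; the number of workers, parameter size, and the randomly simulated worker gradients are all assumed values standing in for real subsets of the training data:

import numpy as np

N, dim = 4, 10                                        # N slave nodes, toy parameter size
replicas = [np.random.randn(dim) for _ in range(N)]   # W_{t,i} held on each node i

def local_step(w, lr=0.01):
    """Each node updates its replica on its own data subset (simulated gradient)."""
    return w - lr * np.random.randn(*w.shape)

replicas = [local_step(w) for w in replicas]
w_next = sum(replicas) / N                    # Eq. (3): master averages the W_{t,i}
replicas = [w_next.copy() for _ in range(N)]  # synchronize all nodes with the master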
4 FRAMEWORKS OF DEEP LEARNING
Table 2 lists "popular deep learning frameworks for implementing architecture designs", indicating for each whether CNN, RNN, and DBN architectures are supported. As Table 2 shows, deep learning frameworks are "usually implemented using C++" to accelerate training speed, and they use the GPU to significantly speed up matrix evaluation through the interface presented by cuDNN [34]. Python has emerged as a preferable language for implementing deep learning architectures, as it is an efficient programming language that is simple to work with, due to which distributed calculation becomes easier; some of the latest frameworks, such as TensorFlow and MXNet, thereby improve processing speed and efficiency for "deep learning". TensorFlow also contains support for a dedicated "deep learning Application-Specific Integrated Circuit (ASIC) called the Tensor Processing Unit (TPU), to help increase efficiency and decrease power consumption". TensorFlow is instigated as a customized "deep learning" system that provides a sequence of internal functions to implement any deep-neural-network-oriented static processing graph [28]. "Keras started to support TensorFlow via" a high-level interface that is used to develop architectures without considering the internal design; the framework is implemented using parallel and distributed operations with fault tolerance, due to which most developers adopted TensorFlow as a popular "deep learning" framework.
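As an illustration of this high-level style, the following minimal sketch declares a network through the Keras interface on top of TensorFlow without touching the internal graph machinery; the layer sizes are arbitrary and the training data is assumed to exist:

import tensorflow as tf

# Architecture declared through Keras; TensorFlow builds and executes the graph.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5)   # training data assumed available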
Theano [29] and Neon [32] are frameworks developed in Python that perform code optimization with detailed utilization of the kernel level, due to which their training speeds typically outperform existing frameworks, as Python extensively supports parallelism and multi-GPU environments; the major disadvantage is that multi-node calculation is not designed into these frameworks. MXNet "supports several interfaces, including C++, Python, R, Scala, Perl, MATLAB, Javascript, Go, and Julia [30]; it supports both computation-graph declarations and imperative computation while performing architecture design, and it extensively supports data and model parallelism with distinct parameter-server schemes to support distributed calculations with the most comprehensive functionality. But the major disadvantage of this framework is that its performance is not optimized as that of other existing frameworks". Torch [31] has its "deep learning" features merged with "Facebook's deep learning CUDA library (fbcunn)" [35]; Torch can operate over model- and data-level computation on parallel systems, and it is built on a dynamic-graph notation instead of a static graph, where a dynamic graph allows the computational graph to be updated at runtime by defining functions that generate advanced graphs. Due to all of these advantages, Torch is considered to be among the most utilized frameworks. The Caffe framework is implemented by the Berkeley Vision and Learning Center, due to which it is considered a most extensively used framework [33], providing the most extensively used layers for CNNs and RNNs; its disadvantage is that it does not support the DBN family. The main advantage of Caffe is the structure of its computation graphs, which are based on convolutional layers, with pre-trained models available for different neural networks. Another limitation of Caffe is that it is a single-machine framework: it cannot support multi-node execution, the exception being multi-GPU calculations.
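The dynamic-graph style noted above for Torch can be illustrated with its Python successor PyTorch; this is an illustrative sketch rather than the original Lua Torch API, and the sizes and depth are arbitrary. The graph is defined by running ordinary code, so control flow can change it from one call to the next:

import torch

def forward(x, w, depth):
    # The computational graph is built by executing this code; changing
    # `depth` at runtime yields a different graph on every call.
    for _ in range(depth):
        x = torch.tanh(w @ x)
    return x.sum()

w = torch.randn(5, 5, requires_grad=True)
x = torch.randn(5)
loss = forward(x, w, depth=3)   # graph constructed on the fly
loss.backward()                 # gradients flow through whatever graph was run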
5 DEEP LEARNING APPLICATIONS
"Deep learning" applications span several areas: Natural Language Processing (NLP), where textual data is processed; visual tools for image data; speech for performing audio processing; and many other applications that make use of social networks to analyze social impact and health, where each application uses its own tools and methodology.

5.1 Natural Language Processing (NLP)
NLP is a collection of techniques and algorithms that are used to train computer machines to perform various tasks with human language as input, where the process includes various phases such as "document classification, translation, paraphrase identification, text similarity, summarization, and question answering", as shown in Table 3. The NLP process is considerably complex, with ambiguous structure, and is highly context-specific: a change in a single word can change the whole context. NLP follows these steps: (1) division of the input text into words using a tokenization process; (2) representation of the words as vectors or n-grams, where the major issue in this process is to calculate the word length.
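A minimal sketch of steps (1) and (2) follows, tokenizing raw text and mapping the words to index vectors and n-grams; the regular expression and the tiny example sentence are assumed for illustration:

import re

def tokenize(text):
    """Step (1): split raw input text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n=2):
    """Step (2): group tokens into n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("Deep learning changes how text is processed.")
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
vector = [vocab[w] for w in tokens]   # index vector for the sentence
bigrams = ngrams(tokens)              # word-level bigrams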
TABLE 3
"DEEP LEARNING APPLICATIONS REPRESENTATION"

NLP application           | Deep learning support | Description                                                                                                                  | Reference
Sentiment analysis        | Yes                   | "Convolutional neural networks for sentence classification", used to perform sentiment analysis and general classification  | Yoon Kim [36]
Translation               | Yes                   | "Neural machine translation by jointly learning to align and translate"                                                     | Dzmitry Bahdanau et al. [37]
Paraphrase identification | Yes                   | "Dynamic pooling and unfolding recursive auto encoders for paraphrase detection"                                            | Richard Socher et al. [38]
Summarization             | Yes                   | "Extractive summarization using continuous vector space models"                                                             | Mikael Kågebäck et al. [39]
Question & answer         | Yes                   | "Question answering over Freebase with multi-column convolutional neural networks"                                          | Li Dong et al. [40]
Sentiment analysis is a branch of NLP that performs text classification based on the input given by a writer, where sentiments are represented by natural phrases such as positive or negative, setting aside classification methods related to subjectivity. The "Recursive Neural Tensor Network (RNTN)" [41] represents word vectors and parses by constructing a tree of phrases that captures the interactions between the various elements in a recursive manner, attaining sentence-level classification over the grammar.
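The recursive composition at the heart of the RNTN [41] can be sketched as a single merge step; this is a simplified illustration with random parameters standing in for trained ones. Two child vectors from the parse tree are combined by a tensor term plus a linear term, and applying this bottom-up yields phrase and sentence vectors for classification:

import numpy as np

d = 4                                    # word-vector dimensionality
W = np.random.randn(d, 2 * d)            # linear composition weights
V = np.random.randn(d, 2 * d, 2 * d)     # tensor capturing element interactions

def rntn_merge(a, b):
    """Merge two child vectors of a parse tree into one parent vector."""
    ab = np.concatenate([a, b])
    tensor_term = np.array([ab @ V[k] @ ab for k in range(d)])
    return np.tanh(W @ ab + tensor_term)

parent = rntn_merge(np.random.randn(d), np.random.randn(d))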
Machine translation is performed in "deep learning" by improving conventional automatic translation methods. Cho et al. [13] suggested RNN-based encoding and decoding architectures for "Neural Machine Translation (NMT)", with "RNN Encoder-Decoder frameworks" used to map input sequences into fixed-length vectors. Bahdanau et al. [37] implemented a dynamic-length vector and translate text through a search operation over the source during prediction; such predictive translation processes are computationally expensive and inefficient while handling rare words. "Google's Neural Machine Translation (GNMT)" [42] proposed character-level models: it is a deep LSTM network with eight encoder and decoder layers connected through an attention-based mechanism.
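The attention step that Bahdanau et al. [37] attach between encoder and decoder can be sketched in a few lines; this is a simplified additive-attention computation with random vectors standing in for learned encoder and decoder states, and the dimensionality is arbitrary:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(decoder_state, encoder_states, Wa, Ua, va):
    """Additive attention: score each source position against the decoder
    state, then build a weighted context vector instead of squeezing the
    whole sentence into one fixed-length vector."""
    scores = np.array([va @ np.tanh(Wa @ decoder_state + Ua @ h)
                       for h in encoder_states])
    weights = softmax(scores)                  # alignment over source words
    return weights @ np.stack(encoder_states)  # context = weighted sum

d = 8
enc = [np.random.randn(d) for _ in range(5)]   # encoder states for 5 source words
ctx = attention_context(np.random.randn(d), enc,
                        np.random.randn(d, d), np.random.randn(d, d),
                        np.random.randn(d))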
Paraphrase identification analyzes two sentences and projects their similarity based on their underlying hidden semantics, a key feature that is advantageous for numerous NLP jobs like "plagiarism detection, answers to questions, context detection, summarization, and domain identification". Socher et al. [38] implemented the use of unfolding "Recursive Auto encoders (RAEs)" for measuring the comparison of two sentences, using syntactic trees to develop the feature space and measuring both phrase- and word-level matches. Though it is similar to an RvNN as a whole, the RAE plays a major key role in implementing unsupervised classification by computing a reconstruction error, instead of generating a supervised score, while performing the merge operation over two vectors that yields a compositional vector. This work also introduces the dynamic pooling layer, which is used to balance and categorize two sentences of distinct sizes, whether paragraphs or other such units.
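The dynamic pooling idea can be sketched directly; this is a simplified version in which a variable-size word-similarity matrix between two sentences is pooled down to a fixed grid before classification, and the matrix values and sizes are assumed for illustration:

import numpy as np

def dynamic_min_pool(sim, out=4):
    """Pool a variable m-by-n similarity matrix to a fixed out-by-out grid,
    so sentence pairs of distinct sizes yield equal-sized features."""
    m, n = sim.shape
    rows = np.array_split(np.arange(m), out)
    cols = np.array_split(np.arange(n), out)
    return np.array([[sim[np.ix_(r, c)].min() for c in cols] for r in rows])

sim = np.random.rand(7, 11)        # similarity of a 7-word vs. an 11-word sentence
features = dynamic_min_pool(sim)   # always 4x4, ready for a classifier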
5.2 Visual Data Processing
CNN techniques comprise issue-handling methods such as segmentation of images through classification, which attracts many data mining and "machine learning" research groups, with the major research performed in the computer vision communities. AlexNet [20] achieved image classification results over a very large dataset with a GPU implementation, using augmentation and dropout techniques to decrease over-fitting problems. VGGNet [43] proposed a 19-layer CNN with small spatial input filters, increasing the depth of the network, and achieves a 7.4% top-five error rate through simplicity and depth. Microsoft's deep residual network (ResNet) [44], evaluated on ILSVRC and COCO segmentation and detection tasks, attained a 4% top-five error rate with residual connections that counter vanishing gradients, resolving the degradation issue of saturated accuracy in deep networks. ResNeXt [45] revisits the original ResNet and utilizes roughly half the layers for image categorization on the ImageNet dataset within a definite period of time, building on existing supervised image classification techniques. Object detection and semantic segmentation are handled in complex systems with many lower-level features: the Region-based CNN (R-CNN) [46] performs object detection through image classification over selected regions, taking a large dataset of small, labeled objects to train large datasets over CNNs and ultra-deep networks. YOLO (You Only Look Once) [47] is an online image detection technique that implements bounding-box detection at 45 frames per second, compared with existing real-time systems, as it fully utilizes convolutional networks that share techniques such as object detection to speed up the process. The Single-Shot MultiBox Detector (SSD) [48] builds on YOLO, with accuracy comparable to region-based techniques, generating a set of fixed-size bounding boxes with corresponding object scores at the pixel level. Video processing is considered a challenging task because the process includes both spatial and temporal data; the CNN model of [49] uses multi-resolution architectures with local motion information alongside a context stream implemented over low-resolution image modeling techniques. Recurrent Convolution Networks (RCNs) [50] proposed video processing techniques that run CNNs over video frames and pass the visual features of the frames through transitional CNN layers with gated recurrent units, evaluated on datasets like the "YouTube2Text dataset". Progress on visual datasets fully depends on the improvement of novel learning algorithms that make use of powerful hardware systems for processing very large-scale datasets to train "deep learning" algorithms on influential datasets.
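Detectors such as YOLO and SSD score their fixed set of candidate boxes against ground truth using intersection-over-union; a minimal sketch of that computation follows, with boxes given as (x1, y1, x2, y2) and the example coordinates assumed:

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143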
6 CHALLENGES OF DEEP LEARNING
Most of the domains are yet to be researched, owing to their challenging nature and the lack of data available to the general public, which creates significant opportunities for future research. One example is the lingering black-box perception of DNNs, which make decisions without exposing domain knowledge [51], especially when the data has no physical manifestation: mapping the layers of a neural network onto a yeast cell's DNA, attained through microscopic nucleotides, follows how the process takes instructions from the DNA to generate proteins, through which the DNA view is updated. Google Brain [52] implements a unique technique over the synthetic brain of a DNN called "inceptionism", in which each neuron's estimated values are grouped with a technique called "deep dream" that maps the network's generated responses. Manning et al. [53] present similar methods with a semantic dataset comprising distinct network paths that are activated by various parts of the data; such methods are largely attributed to statisticians and "machine learning" professionals using "deep learning" to relate neural networks to physical or biological phenomena and to develop metaphor-level relationships between the DNN and the brain, simplifying interfaces with low processing overheads. The major issue in "machine learning" is that training samples are not sufficiently available with labels [54]: the data of the present era ranges from petabytes to zettabytes, generated on an hourly basis with huge exponential growth, due to which labeled data is an issue that needs to be resolved, for instance by implementing supervised learning for sentiment analysis by dividing huge datasets into smaller ones. Due to the huge increase in the size and complexity of data, unsupervised learning is a predominant solution, but it brings issues such as data scarcity; cleaning of data is another issue, as the data must be cleaned based on observations rather than approximated values, which leads to imparting "deep learning" methods. Maryam M. Najafabadi [55] implemented their methodology on 80 million low-resolution images and executed queries while reducing noisy labels, increasing the total number of applications with live streaming formats such as time series from social networks.
7 CONCLUSION
"Deep learning" is at present the most renowned topic in "machine learning"; it is defined by its many layers of nonlinear processing, through which multiple levels of representation are discovered as distinct patterns in data presented in raw form. "Machine learning and data mining techniques" tend to generate knowledge at a higher level than the streams of raw data found in most real-world applications. In this paper we have reviewed and presented optimization techniques and popular frameworks in this area, which is a major challenge to perform; we have taken 55 papers and research articles to illustrate the existing solutions and to give insight into the challenges, considering the major issues of the present era.
REFERENCES
[1] Li Deng. 2014. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing 3 (2014), 1–29.
[2] Yilin Yan, Min Chen, Saad Sadiq, and Mei-Ling Shyu. 2017. Efficient imbalanced multimedia concept retrieval by deep learning on spark clusters. International Journal of Multimedia Data Engineering and Management 8, 1 (2017), 1–20.
[3] Yilin Yan, Min Chen, Mei-Ling Shyu, and Shu-Ching Chen. 2015. Deep learning for imbalanced multimedia data classification. In The IEEE International Symposium on Multimedia. IEEE, 483–488.
[4] Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald, and Edin Muharemagic. 2015. Deep learning applications and challenges in big data analytics. Journal of Big Data 2, 1 (2015), 1–21.
[5] Warren S. McCulloch and Walter Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 4 (1943), 115–133.
[6] Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117.
[7] Yann LeCun and Yoshua Bengio. 1995. Convolutional networks for images, speech, and time series. Handbook of Brain Theory and Neural Networks 3361, 10 (1995), 255–257.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, 1097–1105.
[9] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry P. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In The 22nd ACM International Conference on Information and Knowledge Management. ACM, 2333–2338.
[10] Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, and Paris Smaragdis. 2014. Deep learning for monaural speech separation. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 1562–1566.
[11] Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Deep Boltzmann machines. In Artificial Intelligence and Statistics. PMLR, 448–455.
[12] Ruslan Salakhutdinov and Geoffrey Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neural Computation 24, 8 (2012), 1967–2006.
[13] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In The Conference on Empirical Methods in Natural Language Processing. 1724–1734.
[14] Xiangang Li and Xihong Wu. 2015. Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 4520–4524.
[15] Christoph Goller and Andreas Kuchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, Vol. 1. IEEE, 347–352.
[16] Richard Socher, Cliff C. Lin, Chris Manning, and Andrew Y. Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning. Omnipress, 129–136.
[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. Curran Associates, 2672–2680.
[18] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). Retrieved from http://arxiv.org/abs/1511.06434.
[19] Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational bayes. CoRR abs/1312.6114 (2013). Retrieved from http://arxiv.org/abs/1312.6114.
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, 1097–1105.
[21] Zhiwei Zhao and Youzheng Wu. 2016. Attention-based convolutional neural networks for sentence classification. In The 17th Annual Conference of the International Speech Communication Association. ISCA, 705–709.
[22] David H. Hubel and Torsten N. Wiesel. 1962. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160, 1 (1962), 106–154.
[23] Ruslan Salakhutdinov and Geoffrey Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neural Computation 24, 8 (2012), 1967–2006.
[24] Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. International Conference on Artificial Neural Networks 6354 (2010), 92–101.
[25] Quoc V. Le. 2013. Building high-level features using large scale unsupervised learning. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 8595–8598.
[26] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[27] Samira Pouyanfar and Shu-Ching Chen. 2017. T-LRA: Trend-based learning rate annealing for deep neural networks. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 50–57.
[28] Martín Abadi, Ashish Agarwal, Paul Barham, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR abs/1603.04467 (2016). Retrieved from http://arxiv.org/abs/1603.04467.
[29] Rami Al-Rfou, Guillaume Alain, Ying Zhang. 2016. Theano: A Python framework for fast computation of mathematical expressions. CoRR abs/1605.02688 (2016). Retrieved from http://arxiv.org/abs/1605.02688.
[30] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274 (2015). Retrieved from http://arxiv.org/abs/1512.01274.
[31] Ronan Collobert, Samy Bengio, and Johnny Mariéthoz. 2002. Torch: A Modular Machine Learning Software Library. Idiap Research Report Idiap-RR-46-2002. Idiap.
[32] Intel Nervana Systems. 2017. Neon deep learning framework. Retrieved from https://www.nervanasys.com/technology/neon. Accessed April 4, 2017.
[33] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia. ACM, 675–678.
[34] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. CoRR abs/1410.0759 (2014). Retrieved from http://arxiv.org/abs/1410.0759.
[35] Nicolas Vasilache, Jeff Johnson, Michaël Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun. 2014. Fast convolutional nets with fbfft: A GPU performance evaluation. CoRR abs/1412.7580 (2014). Retrieved from http://arxiv.org/abs/1412.7580.
[36] Yoon Kim. 2014. Convolutional neural networks for sentence classification. CoRR abs/1408.5882 (2014). Retrieved from http://arxiv.org/abs/1408.5882.
[37] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473 (2014). Retrieved from http://arxiv.org/abs/1409.0473.
[38] Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive auto encoders for paraphrase detection. In Advances in Neural Information Processing Systems, Vol. 24. Neural Information Processing Systems Foundation, 801–809.
[39] Mikael Kågebäck, Olof Mogren, Nina Tahmasebi, and Devdatt Dubhashi. 2014. Extractive summarization using continuous vector space models. In 2nd Workshop on Continuous Vector Space Models and their Compositionality. Association for Computational Linguistics, 31–39.
[40] Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question answering over Freebase with multi-column convolutional neural networks. In 53rd Annual Meeting of the Association for Computational Linguistics, Vol. 1. Association for Computational Linguistics, 260–269.
[41] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1631–1642.
[42] Yonghui Wu, Mike Schuster, Zhifeng Chen, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016). Retrieved from http://arxiv.org/abs/1609.08144.
[43] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). Retrieved from http://arxiv.org/abs/1409.1556.
[44] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 770–778.
[45] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2016. Aggregated residual transformations for deep neural networks. CoRR abs/1611.05431 (2016). Retrieved from http://arxiv.org/abs/1611.05431.
[46] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 580–587.
[47] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 779–788.
[48] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.