
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 02, FEBRUARY 2020 ISSN 2277-8616

A Research Survey Report on Deep Learning Concepts

P. Venkateshwarlu, Dr. B. Manjula

Abstract: The area of machine learning is in its golden era, and deep learning has become the ruler of the domain. Deep learning uses numerous layers, each representing a level of data abstraction, to construct computational models, and its algorithm families, such as generative adversarial networks, convolutional neural networks, and model transfer, have entirely changed our understanding of how data is processed and information is obtained. The interest behind it is extreme, because the domain never previously offered such a multi-scope perspective: a deficiency in core understanding left the field governed by black-box methods, which restrains development at the most basic levels, and deep learning is still repeatedly perceived as a set of tentative building blocks within machine learning. In this paper we present a thorough assessment of the past and present state of the art in deep learning, covering visual, audio, and text processing.

Index Terms: Parallel algorithms, Distributed algorithms, Deep Learning, Machine learning theory, Neural networks, Deep Learning Network, Convolutional Neural Networks
——————————  ——————————

1 INTRODUCTION

In the present era, machine learning has become popular in research and is incorporated in a large number of applications, including multimedia applications, that extensively use the class of machine-learning algorithms called deep learning, also known as representation learning [1]. The huge growth in the availability of data, together with remarkable advances in hardware technologies, has led to new studies in distributed and deep learning. Deep learning grew out of conventional neural networks and tends to outperform its predecessors: by applying graph technologies to neurons, it creates layered learning models that promise results in applications related to Natural Language Processing (NLP), visual data processing, speech and audio processing, and many other well-known domains [2, 3].

The competence of machine-learning algorithms depends on the representation of the input data; a bad representation yields lower performance than a good one. This made feature engineering the research trend in machine learning for a considerable time, applying raw data across many research domains at significant human effort. Deep learning algorithms instead perform feature extraction in an automated way, with limited domain knowledge and human effort [4]. These algorithms use a layered architecture for representing data, in which higher-level features are extracted from the outputs of the preceding layers in the network while the bottom layers obtain features directly from the data, an arrangement inspired by Artificial Intelligence's (AI's) view of the human brain. In a general scenario, a human brain extracts data from distinct scenes in distinct ways, processing them end to end into features that classify data objects; these are the main sources of information for deep learning, which in turn emulates the perceptual ability of the human brain. For this reason, deep learning is considered the preferred research area by most present-day researchers in almost all areas, and in this paper we provide a survey report on the machine-learning aspects of deep learning.

Developing a machine that can replicate the human brain has been a dream for many centuries. The roots of deep learning reach back to around 300 B.C., when Aristotle proposed associationism, starting the history of humanity's ambition to understand the brain, since such an idea requires scientists to understand the mechanisms of human recognition. The modern history of deep learning started in 1943, when the McCulloch-Pitts (MCP) model was introduced and became known as the prototype of artificial neural models [5]. McCulloch and Pitts created a computer model based on neural networks that functionally mimics the neocortex in the human brain [6]. The combination of algorithms and mathematics called threshold logic was used in their model to mimic the human thought process, but not to learn. Since then, deep learning has evolved steadily, with a few significant milestones in its development.
————————————————
• P. Venkateshwarlu, Research Scholar, Dept. of CS, Kakatiya University, Warangal, Telangana, India. Email: [email protected]
• Dr. B. Manjula, Assistant Professor, Dept. of CS, Kakatiya University, Warangal, Telangana, India. Email: [email protected]

2 DEEP LEARNING NETWORKS

Many popular deep learning networks exist; Table 1 summarizes the key points of, and references for, each network. Owing to the immense research activity in this area, new networks and architectures appear on an almost weekly basis.

TABLE 1
DEEP LEARNING NETWORK REPRESENTATION

Deep Learning Network | Description | Reference
Convolutional Neural Network (CNN) | Convolutional networks for images, speech, and time series | Yann LeCun, et al. [7]; Alex Krizhevsky, et al. [8]
Deep Belief Network (DBN) | Deep neural networks for acoustic modeling in (monaural) speech recognition; the shared views of several research groups | Po-Sen Huang, et al. [9, 10]
Deep Boltzmann Machine (DBM) | An efficient learning procedure for deep Boltzmann machines; Artificial Intelligence and Statistics | Ruslan Salakhutdinov, et al. [11, 12]
Recurrent Neural Network (RNN) | Long short-term memory based deep recurrent neural networks for large-vocabulary speech recognition; RNN encoder-decoder for statistical machine translation | Kyunghyun Cho, et al. [13]; Xiangang Li, et al. [14]
Recursive Neural Network (RvNN) | Learning task-dependent distributed representations by backpropagation through structure; parsing natural scenes and natural language with recursive neural networks | Christoph Goller, et al. [15]; Richard Socher, et al. [16]
Generative Adversarial Network (GAN) | Generative adversarial nets; unsupervised representation learning with deep convolutional generative adversarial networks | Ian Goodfellow, et al. [17]; Alec Radford, et al. [18]
Variational Auto-Encoder (VAE) | Auto-encoding variational Bayes; an unsupervised probabilistic graphical model | Diederik P. Kingma, et al. [19]
2.1 Recursive Neural Network (RvNN) [15]
The RvNN performs hierarchical predictions in a constructive manner, attaining classification outputs through compositional vectors. It is largely inspired by Recursive Auto-Associative Memory (RAAM) [15], an architecture designed to process objects of arbitrary structured shape, such as trees or graphs: the recursion exploits the iterative nature of data structures of varying size and produces representations of constant width. The network is trained with the Backpropagation Through Structure (BTS) learning scheme [15], a standard backpropagation algorithm adapted to support tree-like structures, in which an auto-association training set makes the network regenerate the intended input pattern at the output layer. Socher et al. [16] developed an RvNN architecture capable of handling inputs of distinct modalities and demonstrated it on two examples, using an RvNN to classify natural images and to generate natural-language sentences. An image is separated into segments of distinct interest, and a sentence is partitioned into well-formed words; the network then calculates a score for every pair of units that could merge, merges the pair with the maximum score into a compositional vector that represents a region, and assigns the new region a class label, repeating until the entire image or sentence is represented as a syntactic tree, as shown in Figure 1.
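To make the pair-scoring and merging procedure concrete, the following is a minimal sketch of the greedy merge loop in plain Python with NumPy. It is an illustration only: the composition weights W, b and the scoring vector v are random placeholders (in [16] they are learned with BTS), and compose and score are hypothetical helper names.

```python
import numpy as np

# Minimal sketch of the greedy RvNN merge loop described above.
rng = np.random.default_rng(0)
d = 8                                  # dimensionality of each node vector
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)
v = rng.standard_normal(d)

def compose(left, right):
    """Merge two child vectors into one compositional parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def score(parent):
    """Plausibility score of a merged node (higher = merge first)."""
    return float(v @ parent)

# Leaf vectors standing in for image segments or words.
nodes = [rng.standard_normal(d) for _ in range(5)]
while len(nodes) > 1:
    # Score every adjacent pair and greedily merge the best one.
    parents = [compose(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
    best = max(range(len(parents)), key=lambda i: score(parents[i]))
    nodes[best:best + 2] = [parents[best]]   # replace the pair by its parent

root = nodes[0]  # compositional vector for the whole image/sentence
```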
2.2 Recurrent Neural Network (RNN)
The RNN [13, 14] is one of the most popular deep learning algorithms for NLP and speech processing, because it exploits the sequential ordering of information in a network. Many applications use the structure embedded in a data series to generate the desired knowledge through time-bound memory units: for an input sequence, x denotes the input layer, s the hidden layer, and y the output layer. In [14], three deep RNN designs, with deep "Input-to-Hidden," "Hidden-to-Output," and "Hidden-to-Hidden" connections, demonstrate the advantages of deeper RNNs. Deep networks, however, suffer from a difficult learning process caused by vanishing and exploding gradients [16]: repeated multiplications by very small or very large derivatives during training make the network lose sensitivity over time relative to the earliest inputs. Long Short-Term Memory (LSTM) [14] resolves these issues by introducing memory blocks with recurrent connections, whose memory cells store the temporal state of the network in very deep (long) sequences.
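The time-bound memory unit described above can be sketched as a single recurrent step. This is a toy illustration using the paper's x/s/y naming; the weight matrices are untrained random placeholders, not any particular published model.

```python
import numpy as np

# Minimal sketch of one vanilla-RNN time step:
#   x = input layer, s = hidden layer, y = output layer.
rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 16, 3
W_xs = rng.standard_normal((n_hid, n_in)) * 0.1   # input  -> hidden
W_ss = rng.standard_normal((n_hid, n_hid)) * 0.1  # hidden -> hidden (recurrence)
W_sy = rng.standard_normal((n_out, n_hid)) * 0.1  # hidden -> output

def rnn_step(x_t, s_prev):
    """Advance the hidden state one step and emit an output."""
    s_t = np.tanh(W_xs @ x_t + W_ss @ s_prev)      # new hidden state
    y_t = W_sy @ s_t                               # output (logits)
    return s_t, y_t

s = np.zeros(n_hid)
for x_t in rng.standard_normal((10, n_in)):        # a toy input sequence
    s, y = rnn_step(x_t, s)
# Gradients flowing repeatedly through W_ss shrink or explode over long
# sequences -- the problem the LSTM's gated memory cells address.
```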
2.3 Convolutional Neural Network (CNN)
The CNN is one of the most widely used deep learning algorithms [7], with applications in NLP [21], speech processing [26], and computer vision [20]. CNNs are inspired by neurons simulating the visual cortex, following the complex arrangements of cells observed in the cat's brain [22]. Their major advantages are parameter sharing, sparse interactions, and equivariant representations of cells, in contrast to a fully connected network, and they operate naturally on multi-dimensional data. Layers are downsampled by pooling over connected regions. The input x is denoted in three dimensions, m × m × r, where m denotes the height and width and r represents the depth; the kernel K comprises various filters of size n × n × q, where n is smaller than the image size and q denotes the number of channels. With weights W and bias b, the feature map h^k of a convolutional layer is:
5884
IJSTR©2020
www.ijstr.org
INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY RESEARCH VOLUME 9, ISSUE 02, FEBRUARY 2020 ISSN 2277-8616

h^k = f (W^k ∗ x + b^k).    (1)
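As a concrete reading of Eq. (1), the sketch below computes one feature map with a plain NumPy loop. The image, filter, and bias are illustrative placeholders; f is chosen to be ReLU, and, as in most deep-learning libraries, the "convolution" is implemented as cross-correlation.

```python
import numpy as np

# Minimal sketch of Eq. (1): one feature map h^k = f(W^k * x + b^k),
# for a single-channel image x and a single small filter W_k.
rng = np.random.default_rng(2)
m, n = 8, 3                       # image size m x m, filter size n x n
x   = rng.standard_normal((m, m))
W_k = rng.standard_normal((n, n))
b_k = 0.1

def feature_map(x, W, b):
    """'Valid' 2-D cross-correlation followed by the nonlinearity f (ReLU)."""
    out = np.empty((m - n + 1, m - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + n, j:j + n]
            out[i, j] = np.sum(W * patch) + b     # (W^k * x) + b^k
    return np.maximum(out, 0.0)                   # f = ReLU

h_k = feature_map(x, W_k, b_k)    # one of the layer's feature maps
```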


2.4 Deep Generative Networks
Deep generative networks such as the DBN, the Deep Boltzmann Machine (DBM), the Generative Adversarial Network (GAN), and the Variational Auto-Encoder (VAE) learn representations in an unsupervised manner. The DBN is a hybrid probabilistic generative model built from RBMs: undirected connections are formed by the top two layers, while the lower layers hold directed connections that receive input from the lowest, visible layer, which represents the input state as a data vector. A greedy algorithm improves the generative model to form the DBN, allowing each sub-network to consecutively obtain a distinct representation of the data: the initial weights W0 are mapped with the transposed weight matrix W0^T to generate the higher-level "data" for the succeeding layer, and the log-probability of every input data vector improves, up to an approximation of its distribution, each time a new layer, i.e., a new RBM block, is appended to the DBN in the right manner. The DBM [11, 12] has the capability to learn complex internal representations and is regarded as a strong deep learning model for object and speech recognition tasks, since its approximate inference procedure allows the handling of ambiguous DBM inputs; this property distinguishes the DBM from the DBN, because the DBM's complete process is based on undirected models, the RBMs, rather than on directed belief networks. The GAN [17, 18] comprises a generative model G and a discriminative model D: G captures the distribution pg over the data by modeling samples, while D estimates whether a sample came from the training data rather than from G, and backpropagation is performed at every iteration so that G generates maximally sensible data to deceive the discriminator while D learns to identify the data produced by G, the two optimizing a shared value function V. The VAE [19] employs the log-likelihood of the data, deriving a lower-bound estimator in a directed graphical model with continuous latent variables: the Auto-Encoding Variational Bayes (AEVB) algorithm optimizes the parameters ϕ and θ of the probabilistic encoder qϕ(z|x) and the approximate generative model pθ(x, z), where z denotes the latent variable with a standard-normal prior N(0, I).
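To illustrate what the GAN value function measures, the sketch below forms a Monte-Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]. The generator and discriminator here are fixed closed-form placeholders rather than trained networks, so only the bookkeeping of V is shown, not adversarial training.

```python
import numpy as np

# Toy Monte-Carlo estimate of the GAN value function V(D, G).
rng = np.random.default_rng(3)

def G(z):
    """Hypothetical generator: shifts/scales noise toward the data."""
    return 1.5 * z + 0.5

def D(x):
    """Hypothetical discriminator: probability that x is real data."""
    return 1.0 / (1.0 + np.exp(-(x - 0.5)))        # a fixed sigmoid

x_real = rng.normal(loc=2.0, scale=1.0, size=10_000)   # "training data"
z      = rng.standard_normal(10_000)                    # latent noise

V = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
print(f"V(D, G) = {V:.3f}")   # D is trained to raise V, G to lower it
```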
3 DEEP LEARNING TECHNIQUES AND FRAMEWORKS
Deep learning algorithms improve the learning process, but the computation they require leads to long training times, so the choice of model remains a key issue for researchers; the techniques and frameworks below address training time and the classification accuracy obtained from the training data and parameters.

3.1 Unsupervised Transfer Learning
In the recent era, generative models such as GANs and VAEs have been the predominant techniques in unsupervised deep learning. GANs can be trained and then reused as fixed feature extractors for supervised tasks; because the networks are based on CNNs, they demonstrate their capability at unsupervised learning when performing visual data analysis, for instance with a sparse auto-encoder trained over a very-large-scale image dataset [25]. Unlabeled data passed through the network yields extracted features that can be further used for face detection, with which high-level objects can be detected; generating a stochastic network based on transition operators and reusing it in this way is called transfer learning.

3.2 Online Learning
In the present day, streams of data whose time complexity is a major concern arise over various network topologies in deep learning, whether time-static or time-invariant [26], and with this advancement online deep learning has become a mainstream research area. Conventionally, DNNs are trained with Stochastic Gradient Descent (SGD), which updates all the parameters individually from uniquely labeled samples, processing the stream sequentially or as batches drawn Independently and Identically Distributed (IID); this decreases computing resources and execution time drastically while coping with high-velocity, varying distributions, and it learns at a linear pace over every input sample.
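The per-sample update that distinguishes online SGD from batch processing can be sketched in a few lines. This is a toy linear model over a hypothetical IID stream; the data generator, learning rate, and dimensions are illustrative placeholders.

```python
import numpy as np

# Minimal sketch of online (per-sample) SGD for a linear model.
rng = np.random.default_rng(4)
w, b = np.zeros(3), 0.0
lr = 0.01

def stream():
    """Hypothetical IID stream: y = w_true . x + noise."""
    w_true = np.array([1.0, -2.0, 0.5])
    while True:
        x = rng.standard_normal(3)
        yield x, w_true @ x + 0.1 * rng.standard_normal()

for t, (x, y) in zip(range(5_000), stream()):
    err = (w @ x + b) - y          # prediction error on this one sample
    w -= lr * err * x              # gradient of 0.5*err^2 w.r.t. w
    b -= lr * err
# w now approximates w_true without ever storing the stream.
```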
3.3 Optimized Deep Learning
Optimization of a DNN is performed by finding the parameters of the network that reduce the loss function, and most of the fundamental optimization algorithms used in deep learning adjust the parameters in an iterative manner over the training samples. The computational complexity of SGD is tiny compared with that of the original (batch) gradient, where the whole dataset is considered whenever the parameters are updated. In the learning progression, the updating velocity is governed by a hyperparameter, the learning rate: a lower learning rate eventually leads to a state that is optimal, even though the loss fluctuates along the way [27]. The idea of momentum was introduced to help determine the proper learning rate, and weight decay is implemented as a penalty coefficient in the cost function to reduce overfitting and improve performance. Adaptive methods further scale the learning rate whenever the parameters are updated by recording the accumulated squares of the generated gradients, which are always positive:

E[g^2]_t = β1 E[g^2]_{t−1} + (1 − β1) g^2_t,    (2)

where E[g^2]_t is the accumulated squared gradient at stage t, g^2_t is the squared gradient at stage t, and the decay fraction β1 records the accumulation; in Adam [27], an l-2 norm over past gradients is reinstated to make the algorithm stable.
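The sketch below shows the accumulator of Eq. (2) inside an RMSProp-style parameter update, with weight decay applied as a penalty term. The loss, gradient, and hyperparameter values are illustrative placeholders, not settings from any cited paper.

```python
import numpy as np

# Minimal sketch of Eq. (2) driving a per-parameter adaptive step.
rng = np.random.default_rng(5)
w = rng.standard_normal(4)          # parameters
Eg2 = np.zeros_like(w)              # E[g^2]_t, the running accumulator
beta1, lr, eps, wd = 0.9, 0.01, 1e-8, 1e-4

def grad(w):
    """Hypothetical loss gradient: here, of 0.5*||w - target||^2."""
    return w - np.array([1.0, -1.0, 2.0, 0.0])

for t in range(2_000):
    g = grad(w) + wd * w                        # weight decay as a penalty term
    Eg2 = beta1 * Eg2 + (1.0 - beta1) * g * g   # Eq. (2)
    w -= lr * g / (np.sqrt(Eg2) + eps)          # adaptive, always-positive scale
# w converges near the target; the accumulator keeps the steps stable.
```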
3.4 Distributed System Based Deep Learning
The competence of the training model is accelerated in distributed deep learning by applying data parallelism and model parallelism to the training process. In data parallelism, the model is replicated over the computational nodes and each node trains on its assigned subset of the data, with the nodes synchronized after a certain period of time; in model parallelism, the data is processed with a single model in which each node is accountable for executing inference over a part of the model's parameters. Let W_{t,i} represent a parameter of the neural network at node i at a specific time t, with N slave nodes used for training under a master node; a synchronous data-parallel step then aggregates the workers' updates as

W_{t+1} = W_t − η (1/N) Σ_{i=1..N} ΔW_{t,i}.    (3)

The scalability of model parallelism is inferior, because the framework must decide the embedding, i.e., the placement of each operation on a different device, and such automatic placements compare poorly with those chosen by human experts.
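Equation (3) can be simulated in a single process: N workers each compute a gradient ΔW on their own data shard and the master averages the updates. The shards, learning rate, and least-squares objective are illustrative placeholders; a real system would communicate the updates over the network.

```python
import numpy as np

# Minimal single-process simulation of Eq. (3).
rng = np.random.default_rng(6)
N, lr = 4, 0.1
W = np.zeros(3)                                  # master copy of the parameters
shards = [rng.standard_normal((100, 3)) for _ in range(N)]
target = np.array([0.5, -1.0, 2.0])
labels = [X @ target for X in shards]

def worker_delta(W, X, y):
    """One worker's mean-squared-error gradient on its shard: ΔW_{t,i}."""
    return 2.0 * X.T @ (X @ W - y) / len(y)

for t in range(200):
    deltas = [worker_delta(W, X, y) for X, y in zip(shards, labels)]
    W -= lr * np.mean(deltas, axis=0)            # Eq. (3): synchronous averaging
# W converges to the target; every replica stays identical because the
# master applies the same averaged update everywhere.
```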
4 FRAMEWORKS OF DEEP LEARNING
Table 2 lists popular deep learning frameworks for implementing architecture designs, along with each framework's CNN & RNN and DBN support.

TABLE 2
DEEP LEARNING FRAMEWORK REPRESENTATION

Framework | Description | CNN & RNN Support | DBN Support | Reference
TensorFlow | Large-scale machine learning on heterogeneous distributed systems; the core language is C++ and the interfaces supported are Python & MATLAB | Yes | Yes | Martín Abadi, et al. [28]
Theano | A Python framework for fast computation of mathematical expressions, with a BSD license | Yes | Yes | Rami Al-Rfou, et al. [29]
MXNet | A flexible and efficient machine learning library for heterogeneous distributed systems; core languages are C++, Python, R, Scala, Perl | Yes | Yes | Tianqi Chen, et al. [30]
Torch | A modular machine learning software library implemented in C++ and Python | Yes | Yes | Ronan Collobert, et al. [31]
Neon | Neon deep learning framework implemented in Python | Yes | Yes | Intel Nervana Systems [32]
Caffe | Convolutional architecture for fast feature embedding, implemented in Python and MATLAB | Yes | No | Yangqing Jia, et al. [33]

As Table 2 shows, deep learning frameworks are usually implemented in C++, which accelerates training speed, and they use the GPU, which significantly speeds up matrix evaluation through the interface presented by cuDNN [34]. Python has emerged as the preferable language for implementing deep learning architectures, being an efficient programming language that is simple to work with; distributed calculation has therefore become easier in some of the latest frameworks, such as TensorFlow and MXNet, which improve processing speed and efficiency for deep learning. TensorFlow also contains support for a deep learning Application-Specific Integrated Circuit (ASIC) called the Tensor Processing Unit (TPU), which helps increase efficiency and decrease power consumption. TensorFlow is designed as a customized deep learning system that provides a sequence of internal functions with which any static processing graph oriented to deep neural networks can be implemented [28]. Keras started to support TensorFlow via a high-level interface used to develop an architecture without considering the internal design, and the framework executes with parallel and distributed operations and fault tolerance, due to which most developers have adopted TensorFlow as their deep learning framework.
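A minimal sketch of that high-level interface, assuming a standard Keras-on-TensorFlow installation: the architecture is declared layer by layer without touching the underlying graph. The layer sizes and toy data are placeholders.

```python
import numpy as np
import tensorflow as tf

# Declare a tiny feed-forward classifier through the Keras interface.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Toy data stand in for a real dataset.
X = np.random.rand(256, 10).astype("float32")
y = (X.sum(axis=1) > 5).astype("float32")
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
print(model.predict(X[:2]))
```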
Theano [29] and Neon [32] are frameworks developed in Python that perform code optimization in the developed system with detailed kernel-level utilization, due to which their training speeds typically outperform existing frameworks, and Python extensively supports parallelism and multi-GPU environments; their major disadvantage is that multi-node calculation is not designed into these frameworks. MXNet supports several interfaces, including C++, Python, R, Scala, Perl, MATLAB, JavaScript, Go, and Julia [30], and supports both computation-graph declarations and imperative computation when performing architecture design; MXNet extensively supports data and model parallelism, with distinct parameter-server schemes to support distributed calculation, and offers the most comprehensive functionality, but its major disadvantage is that its performance is not as optimized as that of other existing frameworks. Torch [31] has deep learning features merged with Facebook's deep learning CUDA library (fbcunn) [35]; Torch can operate with model- and data-level computation over parallel systems, and it is built on a dynamic graph notation instead of a static graph, which allows the computational graph to be updated at runtime by defining functions that generate advanced graphs. Due to all of these advantages, Torch is considered to be among the most utilized frameworks. The Caffe framework was implemented by the Berkeley Vision and Learning Center, due to which it is considered one of the most extensively used frameworks [33], with the most extensively used layers for CNNs and RNNs; its disadvantage is that it does not support the DBN family. The main advantage of Caffe is the structure of its computation graphs, based on convolutional layers, and its pre-trained models for different neural networks. Another limitation of Caffe is that it is a single-machine framework, so it cannot support multi-node execution; the exception is multi-GPU calculation.

5 DEEP LEARNING APPLICATIONS
Deep learning applications are implemented for Natural Language Processing (NLP), where textual data is processed; for visual data and speech, used for visual and audio processing; and for many other applications that make use of social networks to analyze social impact and perform health analysis, each application using its own tools and methodology.
5.1 Natural Language Processing (NLP)
NLP is a collection of techniques and algorithms used to train computer machines to perform various tasks that take human language as input; the process includes phases such as document classification, translation, paraphrase identification, text similarity, summarization, and question answering, as shown in Table 3.

TABLE 3
DEEP LEARNING APPLICATIONS REPRESENTATION

NLP Application | Description | Deep Learning Support | Reference
Sentiment Analysis | Convolutional neural networks for sentence classification, used to perform sentiment analysis and general classification | Yes | Yoon Kim, et al. [36]
Translation | Neural machine translation by jointly learning to align and translate | Yes | Dzmitry Bahdanau, et al. [37]
Paraphrase Identification | Dynamic pooling and unfolding recursive auto-encoders for paraphrase detection | Yes | Richard Socher, et al. [38]
Summarization | Extractive summarization using continuous vector space models | Yes | Mikael Kågebäck, et al. [39]
Question & Answer | Question answering over Freebase with multi-column convolutional neural networks | Yes | Li Dong, et al. [40]

NLP is considerably complex: language has an ambiguous structure and is highly context-specific, where a change in a single word can change the whole context. NLP follows two initial steps: (1) division of the input text into words through a tokenization process, and (2) re-encoding of the words into vectors or n-grams; a major issue in this process is dealing with variable word and sequence lengths.
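Both preprocessing steps can be sketched in a few lines of Python. The text, the regular-expression tokenizer, and the integer vocabulary are illustrative placeholders; production systems would substitute a trained tokenizer and dense word embeddings.

```python
import re

text = "Deep learning changes how deep networks process language."

# Step 1: tokenization -- split the input text into words.
tokens = re.findall(r"[a-z']+", text.lower())

# Step 2a: word n-grams (here bigrams) over the token sequence.
bigrams = list(zip(tokens, tokens[1:]))

# Step 2b: map each token to an integer index (a toy word-vector id);
# real systems would look up dense embeddings instead.
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
ids = [vocab[w] for w in tokens]

print(tokens)
print(bigrams[:3])
print(ids)   # sequences have variable length -- the issue noted above
```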
Sentiment Analysis is a branch of NLP that classifies text based on the writer's input: sentiment is represented with natural phrases such as positive or negative, after eliminating classifications related to subjectivity. The Recursive Neural Tensor Network (RNTN) [41] represents word vectors and parses by constructing a tree of phrases, capturing the interactions between the various elements in a recursive manner to attain sentence-level classification of the grammar. Machine Translation is performed in deep learning by improving on conventional automatic translation methods: Cho et al. [13] used RNN-based encoding and decoding architectures for Neural Machine Translation (NMT), with RNN encoder-decoder frameworks that map input sequences into fixed-length vectors. Bahdanau et al. [37] implemented dynamic-length vectors and translated text with a search procedure over predicted translations, which is computationally expensive and inefficient when handling rare words. Google's Neural Machine Translation (GNMT) [42] proposed character-level models via a deep LSTM network with eight encoder and decoder layers connected by an attention-based mechanism.

Paraphrase identification analyzes two sentences and projects their similarity from their fundamental hidden semantics; it is a key feature benefiting numerous NLP jobs such as plagiarism detection, answering questions, context detection, summarization, and domain identification. Socher et al. [38] implemented unfolding Recursive Auto-encoders (RAEs) to measure the similarity of two sentences, using syntactic trees to develop the feature space and measuring both phrase- and word-level matches. Although similar to an RvNN as a whole, the RAE plays a major role in unsupervised classification by computing a reconstruction error, instead of generating a supervised score, when merging two vectors into a compositional vector. The paper also introduces a dynamic pooling layer, used to compare and categorize two sentences of distinct sizes, such as a paragraph versus a shorter text.

5.2 Visual Data Processing
CNN techniques handle problems such as image segmentation and classification, and they have attracted most of the data mining and machine learning research groups, with major research performed in the computer vision community. AlexNet [20] achieved landmark image-classification results over a very large dataset with a GPU implementation, using augmentation and dropout techniques to decrease overfitting. VGGNet [43] proposed a 19-layer CNN with small spatial filter sizes, achieving greater depth while keeping the design simple, and attained a 7.4% top-five error rate through simplicity and depth. Microsoft's deep residual network (ResNet) [44] entered the ILSVRC and COCO segmentation and detection tasks and attained around a 4% top-five error rate: its residual connections address the vanishing-gradient and degradation issues that saturate accuracy in very deep networks. ResNeXt [45] extended the original ResNet, utilizing about half the layers on the ImageNet dataset to perform image categorization in a comparable period of time with existing supervised image classification techniques. Object detection and semantic segmentation require complex systems built on many lower-level features: the Region-based CNN (R-CNN) [46] performs object detection by image classification over selected regions, using a large dataset of small labeled objects to train large CNNs and ultra-deep networks. YOLO (You Only Look Once) [47] is an online (real-time) image detection technique that implements bounding-box detection at 45 frames per second, competitive with existing real-time systems; it fully utilizes convolutional networks with shared computation to speed up the process. The Single-Shot MultiBox Detector (SSD) [48] builds on YOLO, with accuracy comparable to region-based techniques, generating a set of fixed-size bounding boxes with corresponding object scores at pixel level (overlap between predicted and true boxes is typically scored with the IoU measure sketched below). Video processing is considered a challenging task because it involves both spatial and temporal data: the CNN model of [49] uses multi-resolution architectures, with a local motion-information stream alongside a context stream implemented over low-resolution image modeling. Recurrent Convolutional Networks (RCNs) [50] proposed video-processing techniques that feed video frames through transitional layers of CNNs with gated recurrent units, evaluated on datasets such as the YouTube2Text dataset. Progress on these visual tasks depends fully on the improvement of novel learning algorithms and on powerful hardware systems that can process very-large-scale, influential datasets to train deep learning algorithms.
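Detectors such as R-CNN, YOLO, and SSD score a predicted bounding box against the ground truth with Intersection over Union (IoU); the sketch below computes it for two axis-aligned boxes. The box coordinates are toy values.

```python
# Minimal IoU sketch for two boxes given as (x1, y1, x2, y2).
def iou(a, b):
    """Overlap area divided by union area of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

predicted    = (10, 10, 50, 50)
ground_truth = (20, 20, 60, 60)
print(f"IoU = {iou(predicted, ground_truth):.3f}")  # 0.5+ often counts as a hit
```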
6 CHALLENGES OF DEEP LEARNING
Many domains are yet to be researched because of their challenging nature and because the relevant data is not available to the general public, which creates significant opportunities for future research. One example is the lingering black-box perception of DNNs, which reach decisions without exposing the domain knowledge behind them [51], especially when the data has no physical manifestation. One line of work mapped the layers of a neural network onto a yeast cell, whose DNA, observed through its microscopic nucleotides, issues the instructions from which proteins are generated and is updated in the process. Google Brain [52] implemented a unique technique on a synthetic DNN brain called inceptionism, in which each neuron's estimated values are grouped, and a technique called Deep Dream is used to map the responses the network generates. Manning [53] presents similar methods on semantic datasets comprising distinct network paths that are activated by different parts of the data; statisticians and machine learning professionals using deep learning largely attribute such behavior to relating neural networks to physical or biological phenomena, developing metaphor-like relationships with the DNN "brain" to simplify interfaces with low processing overheads.

The major issue in machine learning is that labeled training samples are not sufficiently available [54]: data in the present era ranges from petabytes to zettabytes, generated on an hourly basis with huge exponential growth, so the scarcity of labeled data is an issue that needs to be resolved, for example by implementing supervised learning for sentiment analysis after dividing huge datasets into smaller ones. Owing to the huge increase in the size and complexity of data, unsupervised learning is a predominant solution to data scarcity; cleaning the data is another issue, since the data must be cleaned based on observations rather than approximated values, which again motivates deep learning methods. Torralba et al. [55] demonstrated this with a methodology built on 80 million low-resolution images, executing queries while reducing noisy labels; the number of such applications keeps increasing, including live streaming formats such as time series from social networks.

7 CONCLUSION
Deep learning is at present the most renowned topic in machine learning. It is defined by multiple layers of nonlinear processing that discover distinct patterns across multiple levels of representation, starting from raw data. Machine learning and data mining techniques generate knowledge at a higher level from the streams of raw data found in most real-world applications. In this paper we reviewed optimization techniques and the popular frameworks of this area, which remain a major challenge; we took 55 papers and research articles to illustrate the existing solutions and to give insight into the challenges, considering the main issues of the present era.

REFERENCES
[1] Li Deng. 2014. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing 3 (2014), 1–29.
[2] Yilin Yan, Min Chen, Saad Sadiq, and Mei-Ling Shyu. 2017. Efficient imbalanced multimedia concept retrieval by deep learning on spark clusters. International Journal of Multimedia Data Engineering and Management 8, 1 (2017), 1–20.
[3] Yilin Yan, Min Chen, Mei-Ling Shyu, and Shu-Ching Chen. 2015. Deep learning for imbalanced multimedia data classification. In The IEEE International Symposium on Multimedia. IEEE, 483–488.
[4] Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald, and Edin Muharemagic. 2015. Deep learning applications and challenges in big data analytics. Journal of Big Data 2, 1 (2015), 1–21.
[5] Warren S. McCulloch and Walter Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 4 (1943), 115–133.
[6] Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117.
[7] Yann LeCun and Yoshua Bengio. 1995. Convolutional networks for images, speech, and time series. Handbook of Brain Theory and Neural Networks 3361, 10 (1995), 255–257.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, 1097–1105.
[9] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry P. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In The 22nd ACM International Conference on Information and Knowledge Management. ACM, 2333–2338.
[10] Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, and Paris Smaragdis. 2014. Deep learning for monaural speech separation. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 1562–1566.
[11] Ruslan Salakhutdinov and Geoffrey Hinton. 2009. Deep Boltzmann machines. In Artificial Intelligence and Statistics. PMLR, 448–455.
[12] Ruslan Salakhutdinov and Geoffrey Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neural Computation 24, 8 (2012), 1967–2006.
[13] Kyunghyun Cho, Bart van Merrienboer, Çaglar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In The Conference on Empirical Methods in Natural Language Processing. 1724–1734.
[14] Xiangang Li and Xihong Wu. 2015. Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 4520–4524.
[15] Christoph Goller and Andreas Kuchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In IEEE International Conference on Neural Networks, Vol. 1. IEEE, 347–352.
[16] Richard Socher, Cliff C. Lin, Chris Manning, and Andrew Y. Ng. 2011. Parsing natural scenes and natural language with recursive neural networks. In International Conference on Machine Learning. Omnipress, 129–136.
[17] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. Curran Associates, 2672–2680.
[18] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR abs/1511.06434 (2015). Retrieved from http://arxiv.org/abs/1511.06434.
[19] Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. CoRR abs/1312.6114 (2013). Retrieved from http://arxiv.org/abs/1312.6114.
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.). Curran Associates, 1097–1105.
[21] Zhiwei Zhao and Youzheng Wu. 2016. Attention-based convolutional neural networks for sentence classification. In The 17th Annual Conference of the International Speech Communication Association. ISCA, 705–709.
[22] David H. Hubel and Torsten N. Wiesel. 1962. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160, 1 (1962), 106–154.
[23] Ruslan Salakhutdinov and Geoffrey Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neural Computation 24, 8 (2012), 1967–2006.
[24] Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. International Conference on Artificial Neural Networks 6354 (2010), 92–101.
[25] Quoc V. Le. 2013. Building high-level features using large scale unsupervised learning. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 8595–8598.
[26] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[27] Samira Pouyanfar and Shu-Ching Chen. 2017. T-LRA: Trend-based learning rate annealing for deep neural networks. In The 3rd IEEE International Conference on Multimedia Big Data. IEEE, 50–57.
[28] Martín Abadi, Ashish Agarwal, Paul Barham, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR abs/1603.04467 (2016). Retrieved from http://arxiv.org/abs/1603.04467.
[29] Rami Al-Rfou, Guillaume Alain, Ying Zhang, et al. 2016. Theano: A Python framework for fast computation of mathematical expressions. CoRR abs/1605.02688 (2016). Retrieved from http://arxiv.org/abs/1605.02688.
[30] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. CoRR abs/1512.01274 (2015). Retrieved from http://arxiv.org/abs/1512.01274.
[31] Ronan Collobert, Samy Bengio, and Johnny Mariéthoz. 2002. Torch: A Modular Machine Learning Software Library. Idiap-RR-46-2002. Idiap.
[32] Intel Nervana Systems. 2017. Neon deep learning framework. Retrieved from https://www.nervanasys.com/technology/neon. Accessed April 4, 2017.
[33] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia. ACM, 675–678.
[34] Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. CoRR abs/1410.0759 (2014). Retrieved from http://arxiv.org/abs/1410.0759.
[35] Nicolas Vasilache, Jeff Johnson, Michaël Mathieu, Soumith Chintala, Serkan Piantino, and Yann LeCun. 2014. Fast convolutional nets with fbfft: A GPU performance evaluation. CoRR abs/1412.7580 (2014). Retrieved from http://arxiv.org/abs/1412.7580.
[36] Yoon Kim. 2014. Convolutional neural networks for sentence classification. CoRR abs/1408.5882 (2014). Retrieved from http://arxiv.org/abs/1408.5882.
[37] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473 (2014). Retrieved from http://arxiv.org/abs/1409.0473.
[38] Richard Socher, Eric H. Huang, Jeffrey Pennington, Andrew Y. Ng, and Christopher D. Manning. 2011. Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In Advances in Neural Information Processing Systems, Vol. 24. Neural Information Processing Systems Foundation, 801–809.
[39] Mikael Kågebäck, Olof Mogren, Nina Tahmasebi, and Devdatt Dubhashi. 2014. Extractive summarization using continuous vector space models. In 2nd Workshop on Continuous Vector Space Models and their Compositionality. Association for Computational Linguistics, 31–39.
[40] Li Dong, Furu Wei, Ming Zhou, and Ke Xu. 2015. Question answering over Freebase with multi-column convolutional neural networks. In 53rd Annual Meeting of the Association for Computational Linguistics, Vol. 1. Association for Computational Linguistics, 260–269.
[41] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1631–1642.
[42] Yonghui Wu, Mike Schuster, Zhifeng Chen, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144 (2016). Retrieved from http://arxiv.org/abs/1609.08144.
[43] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2014). Retrieved from http://arxiv.org/abs/1409.1556.
[44] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 770–778.
[45] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2016. Aggregated residual transformations for deep neural networks. CoRR abs/1611.05431 (2016). Retrieved from http://arxiv.org/abs/1611.05431.
[46] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 580–587.
[47] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You only look once: Unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 779–788.
[48] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.

[49] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 1725–1732.
[50] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2625–2634.
[51] Hayit Greenspan, Bram van Ginneken, and Ronald M. Summers. 2016. Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging 35, 5 (2016), 1153–1159.
[52] Alexander Mordvintsev, Christopher Olah, and Mike Tyka. 2015. Inceptionism: Going deeper into neural networks. Google Research Blog. Retrieved from https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html. Accessed March 26, 2018.
[53] Christopher Manning. 2016. Understanding human language: Can NLP and deep learning help? In The 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1–1.
[54] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[55] Antonio Torralba, Rob Fergus, and William T. Freeman. 2008. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 11 (2008), 1958–1970.
