The Quiet Revolution in Machine Vision
Abstract
Over the past few years, what might not unreasonably be described as a true revolution has
taken place in the field of machine vision, radically altering the way many things had
previously been done and offering new and exciting opportunities for those able to quickly
embrace and master the new techniques. Rapid developments in machine learning, largely
enabled by faster GPU-equipped computing hardware, have facilitated an explosion of
machine vision applications into industrial tasks that were hitherto extremely challenging or,
in many cases, impossible to automate. Together with developments towards an internet of
things and the availability of big data, these form key components of what many consider to
be the fourth industrial revolution. This transformation has dramatically improved the
efficacy of some existing machine vision activities, such as in manufacturing (e.g. inspection
for quality control and quality assurance), security (e.g. facial biometrics) and in medicine
(e.g. detecting cancers), while in other cases it has opened up completely new areas of use, such
as in agriculture and construction (as well as in the existing domains of manufacturing and
medicine). Here we will explore the history and nature of this change, what underlies it, what
enables it, and the impact it has had - the latter by reviewing several recent indicative
applications described in the research literature. We will also consider the continuing role that
traditional or classical machine vision might still play. Finally, the key future challenges and
developing opportunities in machine vision will also be discussed.
1. Introduction – the purpose of this paper and what it brings that is new to the reader
The appropriate utilisation of visual data is the enabling component in many automated tasks,
from industrial quality control to driverless cars. Here we consider the recent history of
machine vision research, the huge changes that have taken, and are still taking, place, and the
dramatic impact this is having, both in terms of methodology and new applications. We
introduce, define, and explain key terminology for those not completely familiar with the
discipline, and comment on the most relevant and impactful developments by pinpointing
ideas in the literature which have radically changed the approaches and techniques now being
explored. Using insightful examples taken from representative applications, we will explain
how these ideas are being applied now and consider the future directions.
Figure 1 The relationship between the domains of AI, machine learning and deep
learning (deep learning is a subset of machine learning, which in turn is a subset of
artificial intelligence)
In practice, deep learning is realised using artificial neural networks (ANNs) to simulate
human-like decision making. Inspired by, and loosely simulating the functioning of the
human cortex, ANNs comprise a series of connected layers of nodes (Figure 2). The input
layer is where data enter the network (its number of nodes matches the dimension of the
input features). This is followed by one or more hidden layers that transform the data flowing
through the network towards the output layer (where the number of nodes matches the
number of classes to be predicted), which produces the network predictions. A deep
network simply refers to an ANN with more than three hidden layers and in some cases can
have millions of nodes (loosely analogous to neurons in the human brain). We should note
here that in fact there are two different kinds of output prediction - either regression (a
continuous quantity), or, as mentioned above and which is more often the case in many
machine vision applications, a classification (a discrete class label or category).
Figure 2 A multi-layer ANN, with inputs on the left and outputs on the right
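As an illustrative sketch only (not taken from any of the cited works), the layer structure just
described can be expressed in a few lines of Keras; the layer sizes and the three-class output
are arbitrary assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A small fully connected ANN: an input layer sized to the feature dimension,
# two hidden layers, and an output layer with one node per class.
model = models.Sequential([
    layers.Input(shape=(64,)),               # 64 input features (assumed)
    layers.Dense(32, activation="relu"),     # hidden layer 1
    layers.Dense(16, activation="relu"),     # hidden layer 2
    layers.Dense(3, activation="softmax"),   # 3 output classes (assumed)
])
# For a regression output, the final layer would instead be a single linear
# node, e.g. layers.Dense(1)
```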
Connections between the layer nodes have weights and biases, known as the layer parameters
(Figure 3), that control how the layers together transform data between the input and output
of the network during a process known as feedforward (forward propagation). It is these
parameters that represent the network model, and we shall see later how a feedback learning
process is used to first establish these parameters during training. ANNs have their origins as
far back as the 1950s, although for a time they fell out of favour. However, a series of
breakthroughs dating from the 1990s by Yann LeCun et al. [1, 2], culminating in [3], and then
later work from around 2010, most notably that of Alex Krizhevsky [4] and others in image
classification, led to
a particular form of ANN known as a deep convolutional neural network (CNN) becoming
established for many challenging computer vision tasks, particularly those involving
perception for object recognition and localisation in natural images.
Figure 3 A single network node, combining weighted inputs (Input 1 × Weight 1,
Input 2 × Weight 2) and a bias to produce an output
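A single node of the kind shown in Figure 3 can be sketched in a few lines of Python (an
illustrative example, not from the paper); the weight and bias values are arbitrary:

```python
import numpy as np

def node_output(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias...
    z = np.dot(inputs, weights) + bias
    # ...passed through a non-linear activation (here a ReLU).
    return max(0.0, z)

# Example: two inputs, two weights and a bias (values are arbitrary).
print(node_output(np.array([0.5, -1.2]), np.array([0.8, 0.3]), bias=0.1))
```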
3.1 The convolutional neural network (CNN) – a special kind of ANN for images
Figure 4 shows the principal components of a convolutional neural network (CNN). The main
difference between CNNs and regular ANNs, and the reason why they lend themselves so
well to image analysis, is the addition of a convolutional layer - an initial filtering stage,
implemented using conventional filter kernels that traverse the 2D image to detect local
spatial features, such as edges, corners, and other patterns, at different positions – somewhat
like processes in human vision. Outputs from the kernel are combined with a bias term to
form the input to a non-linear activation function layer, often a rectified linear unit (ReLU) (a
non-linear transformation that sets negative input values to zero while passing positive values
through unchanged). This non-linear characteristic of the ReLU is also similar to the firing
behaviour of neurons in the human brain and so is suited to visual tasks (or more formally it
offers a reduced sensitivity to the vanishing gradient problem – an issue that impedes CNN
training). A data reduction stage (necessary as images may contain millions of pixels and so
vast amounts of data – some potentially redundant to the task) is then performed by a pooling
layer. This usually applies either an average- or a max-pooling function to a defined
neighbourhood of pixel values, reducing the data dimensionality to leave only the most
significant pixels. Note here that the combined convolution and pooling layers may be
repeated, to detect a hierarchy of image features, before input to a fully connected
classification stage, giving a feature vector, and finally an output layer, for which a Softmax
activation function allows multinomial labelling (where the probability of all outputs sums to
1.0). Hence, the network output gives a probability distribution over the output labels
indicating which output case the network has most closely matched to the input. So, CNNs
are very well adapted to machine vision tasks, as they are specifically designed to process
large quantities of pixel data. The convolution and pooling layers act as feature extractors
from the input image, while the fully connected layers act as a classifier. In this way, CNNs
can usefully be divided into two parts: feature extraction and classification. Feature extraction
is achieved by the convolution plus ReLU together with pooling layers, and classification is
achieved by the fully connected layers.
Figure 4 The architecture of a convolutional neural network (CNN), here classifying an
image region as flower, weed or stone. The CNN can be divided into two parts: feature
extraction and linear classification
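To make the architecture concrete, the following minimal sketch (an illustration only, not
code from any of the cited works) shows convolution plus ReLU, pooling, and fully connected
classification layers in Keras; the filter counts, image size and the three output classes are
assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),              # RGB input image (size assumed)
    layers.Conv2D(16, (3, 3), activation="relu"),   # convolution + ReLU: local feature detection
    layers.MaxPooling2D((2, 2)),                    # pooling: data reduction
    layers.Conv2D(32, (3, 3), activation="relu"),   # repeated conv/pool for a feature hierarchy
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # feature vector
    layers.Dense(64, activation="relu"),            # fully connected classification stage
    layers.Dense(3, activation="softmax"),          # e.g. flower / weed / stone; probabilities sum to 1
])
```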
However, before all this can happen, the network must first be trained. Training enables the
network to output predictions by identifying patterns in a set of labelled training data, fed
through the network while the outputs are compared with the actual labels by an objective
loss function. During training the network’s parameters (the weight and bias of each neuron)
are tuned over a series of iterations or epochs, until the patterns identified by the network
result in good predictions for the training data. Thus, the parameters of the network (weights
and biases) are found during a supervised training phase by minimizing a loss function,
(calculated during a forward propagation), between prediction and ground truth labels at the
output layer. To drive this process, an optimisation algorithm (e.g. stochastic gradient descent
- SGD) updates the network weights and biases at each iteration using backpropagation until
convergence, often with additional regularisation constraints to help the model generalise.
The concept of backpropagation in training the network
is very important and the key part of how an ANN is made to work. It is therefore worth
spending a little time underlining what is going on. During backpropagation, within each
layer, the node weights are adjusted in proportion to the amount of error they contributed to
the loss function (the difference between predictions and labels) during forward propagation.
The more a given node contributes to the error, the more it is adjusted. When the loss
function reaches a convergence state, the current weights and biases are preserved as the
well-trained model. It is in this manner that the weights and biases of the network are
established via the learning process using real data to form the deep learnt model. This
trained model may later be applied (a process known as inferencing) to unknown data (known
as generalisation) in a practical application to identify various features of interest in images.
CNNs have become the default choice for multi-class detection and categorisation problems,
typical of many industrial visual tasks.
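The training loop described above can be sketched as follows (an illustrative example only,
assuming the Keras model defined earlier and a labelled dataset of images x_train with
one-hot labels y_train; names and hyperparameters are assumptions):

```python
import tensorflow as tf

# Loss between predictions and ground-truth labels, and an SGD optimiser.
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def train_step(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)   # forward propagation
        loss = loss_fn(y_batch, predictions)          # compare predictions with labels
    # Backpropagation: gradients of the loss with respect to every weight and bias...
    grads = tape.gradient(loss, model.trainable_variables)
    # ...are used to adjust each parameter in proportion to its contribution to the error.
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# In practice the same result is obtained with the higher-level API:
# model.compile(optimizer=optimizer, loss=loss_fn); model.fit(x_train, y_train, epochs=10)
```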
Next consider an entirely different class of industrial machine vision problem, for example
that of identifying weeds within crops in a field as part of an agricultural farming application
[7]. This is typical of many machine vision tasks that involve natural objects, such as plants
(e.g. in farming, and in food and timber processing), minerals (e.g. polished marble and
granite), animals and animal by-products (e.g. meat and leather products) or humans (e.g. in
medical and security applications), as opposed to synthetic man-made objects – such as
manufactured metal and plastic parts and assemblies. In our example case of automated
weeding, the aim is to automatically provide a targeted precision dose of herbicide only to the
weeds, and not the crop or surrounding area [8]. Such a visually informed system might not
only reduce waste but could offer significant environmental benefits by minimising levels of
toxic chemical runoff and so ground and river pollution. The machine vision system
specification required here will need to be such that the device can both recognise the weeds,
or even perhaps a range of differing weed species (dock, ragwort, etc), and determine their
individual spatial locations. All will need to be completed in real-time in an outdoor
environment, as a tractor-mounted vision-equipped computer-controlled spray boom
traverses the crop. These uncontrollable variables, such as different invading species, the
amount of weed and crop growth, uneven ground, changing lighting and occlusion from the
crop itself, make the task inherently difficult as it includes a great deal of random variation.
Until very recently such tasks were considered entirely beyond the capability of a hand-
crafted rule-based model approach for any sort of practical application. Even so, such a
visually complex task is representative of many that are now being successfully addressed by
developments in machine learning techniques. More specifically, by deploying a form of
specialised artificial neural network, known as a convolutional neural network (CNN), it is
possible for the rules of such a complex machine vision model to be automatically learnt
from the data by the system itself, in a so-called data-driven approach.
6.2 The need to manually label training data: Most applications of CNNs involve
supervised training with large quantities of data – thousands of images. This can pose two
problems. The first, as mentioned above, is the limited availability of such large datasets for
training; the second is the need to label the data (i.e. to say what each image contains),
usually a manual process. Both issues can to some extent be addressed using data
augmentation. This is where a limited labelled dataset is slightly altered using a range of
automated image transformations, such as rotations, reflections, height and width shifts,
zooming, horizontal-flipping, shear intensity changes, cropping, and adding noise, to
artificially create variation in the images and so a much larger training dataset. It is worth
noting that while such techniques can help improve training with a limited dataset, it is often
the case that use of transfer learning and data augmentation would not be expected to result in
as good a performance as training from scratch using a large dataset.
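A typical augmentation pipeline of the kind just described might look like the following
sketch (illustrative only; the transformation ranges are assumptions), here using the Keras
ImageDataGenerator:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each original labelled image is randomly transformed every epoch,
# artificially creating variation and so a much larger effective dataset.
augmenter = ImageDataGenerator(
    rotation_range=20,         # random rotations (degrees)
    width_shift_range=0.1,     # horizontal shifts
    height_shift_range=0.1,    # vertical shifts
    shear_range=0.1,           # shear intensity changes
    zoom_range=0.2,            # zooming in/out
    horizontal_flip=True,      # reflections
)
# train_generator = augmenter.flow(x_train, y_train, batch_size=32)
# model.fit(train_generator, epochs=10)
```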
6.3 Overfitting the training data: Another well-known problem in machine learning is that of
overfitting. This is where the model fits the training data well but does not generalise to
unseen data (the useful part). Overfitting is one of the most frustrating problems in machine
learning projects and most practitioners experience it in their work. There are thankfully a
range of tools that can help, including just simplifying the network by reducing the number of
hidden layers, regularisation (adding a cost to the loss function for large weights) and the use
of dropout layers.
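Two of these remedies, weight regularisation and dropout, can be added to a Keras layer
stack in a couple of lines (a sketch with assumed settings, not a prescription):

```python
from tensorflow.keras import layers, regularizers

classifier_head = [
    # L2 regularisation adds a cost to the loss function for large weights.
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    # Dropout randomly disables a fraction of nodes during training,
    # discouraging the network from memorising the training data.
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),
]
```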
6.4 The mysterious black box: It is worth noting that it can be difficult to know what the
ANN is actually doing - for example which parts of the image are being used? Consider an
object recognition task. Could the network be using something unexpected, such as a
timestamp in the corner of the image and no part of the object itself? This can be of most
concern in medical applications, where even if performance is good, this lack of
accountability could have important legal consequences. Thankfully, a prediction process
debugging tool exists in the form of the Grad-CAM method. This gives a heatmap of class
activation that indicates how strongly each image location contributes to the prediction of a
particular class. More specifically, the heatmap of class activation is built from the feature
map of the last convolutional layer for a given input image, with the feature map channels
weighted by the gradient of the class score calculated with respect to the feature map [15].
Figure 6 shows an example
heat map indicating the areas of a pig’s face used in a biometric recognition task [16]. This
shows in some detail which parts of the face the network was using for recognition and helps
us, to some extent, understand what the network was doing and the predictions it made.
Figure 6 Example heat map (right) indicating the areas of a pig’s face (left) used in a
biometric face recognition task where yellows and reds progressively show more
significant areas
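For readers who want to see the mechanics, the following is a minimal Grad-CAM sketch in
Keras (an illustration assuming a trained CNN model and the name of its last convolutional
layer; it is not the implementation used in [15] or [16]):

```python
import tensorflow as tf

def grad_cam_heatmap(model, image, last_conv_layer_name, class_index=None):
    """Compute a Grad-CAM heatmap for a single image (batch of one)."""
    # Model mapping the input to (last conv feature map, predictions).
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])

    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image)
        if class_index is None:
            class_index = tf.argmax(preds[0])   # use the predicted class
        class_score = preds[:, class_index]

    # Gradient of the class score with respect to the conv feature map.
    grads = tape.gradient(class_score, conv_out)
    # Channel weights: global average of the gradients per channel.
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of the feature map channels, then ReLU and normalisation.
    heatmap = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (heatmap / (tf.reduce_max(heatmap) + 1e-8)).numpy()
```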
7. Examples of machine vision tasks enabled by deep learning - a new class of machine
vision tasks
We next delve a little deeper into the details of how deep learning is currently being applied
to real industrial machine vision tasks via a range of representative, state-of-the-art,
real-world example applications. We explore applications across differing sectors,
considering their context, including the role of big data [17] and the internet of things [18],
the motivations, and the potential advantages. We also review the common architectures in
use today, and look to the future by considering the most significant remaining problematic
issues still to be addressed.
Williams et al. [19] present the design and evaluation of a kiwifruit harvesting robot intended
to operate autonomously in pergola style orchards. The technology is typical of many
agri-tech solutions currently under development, and the application is highly topical as the
work is in part motivated by the reduced availability of farm labour. A deep neural
network with stereo matching was used to detect and locate kiwifruit under real-world
conditions (with some added illumination). The supervised machine learning model utilised
here is typical of many similar applications and so is worth exploring in a little detail. Semantic
segmentation (where each image pixel is given a class label) was performed using a fully-
convolutional network (FCN) adapted from the VGG-16 (available via Github as FCN-8S).
The network was trained using an Nvidia GTX-1070 8GB graphics card on 63 (48 training
and 15 validation) 200x200 pixel hand-labelled images, collected across multiple orchards
under a range of typical conditions. Commercial orchard field trials gave an average pick
cycle time of 5.5s/fruit, of which 3.0s was for the detection step. Some 89.6% of the pickable
kiwifruit could be detected. This performance was reported as significantly better than other
comparable automated systems. Wang et al. [20] provide a general review of ground-based
machine vision for weed detection that included CNN deep learning-based approaches for
detection and segmentation in the field. Many of the key challenges indicative of agricultural
applications are identified in this work, such as occlusion and overlap of leaves, varying
lighting conditions and the effects of different growth stages. Palacios et al. [21] describe a
system for quantification of grapevine flowers using a field based mobile platform for early
crop yield forecasting. Here artificial illumination and night-time operation offered some
limited structuring. A deep fully-convolutional SegNet architecture (two deep neural
networks) with a VGG-19 network as the encoder was employed for semantic
segmentation. As with other examples, images were also pixel-wise manually labelled and
image augmentation (rotations and flipping) used to increase variability in the limited training
data. As discussed above, such data augmentation techniques are in practice very commonly
used in an effort to cope with, or avoid the need for, large quantities of hand-labelled training
images. Impressive F1 (F scores are a measure of
accuracy calculated from the precision and recall) score values of 0.93 and 0.73 were
obtained for multiscale flower segmentation. A strong correlation was reported between the
estimated number of flowers and the final produce yield. Kakani et al. [22] take a step back and
explain how agri-tech start-ups are choosing task-specific AI and vision solutions in the
context of improving yields and the goal of sustainable food supplies by 2050. Switching
from largely static crops to moving farm animals, Nasirahmadi et al. [23] demonstrate how
deep learning can be used to detect standing and lying behaviour of pigs to better than an
average precision of 0.93 and 0.95 respectively, for monitoring animal health and welfare
under varied real-world farm conditions. Use of Faster R-CNN (faster regions with
convolutional neural network features), R-FCN (region-based fully convolutional network)
and SSD (single shot multi-box detector) methods were all explored. In related work, Hansen
et al. [16] show how deep learning achieved better than 96% accuracy in biometrically
recognising pig faces on the farm and point to advantages of facial biometrics when
compared to the use of conventional RFID ear-tags. They use both a transfer learning
approach, centred on the pre-trained VGG-Face model (based on the VGG-Very-Deep-16
CNN and trained on the Labelled (human) Faces in the Wild dataset), and a CNN model,
trained from scratch with their own artificially augmented pig face data. Class Activated
Mapping using Grad-CAM was deployed to show the regions that the network used to
discriminate between pig faces (Figure 6). In an example of the benefits of transfer learning,
it is very interesting that the VGG-Face pre-trained model performed as well as it did (91%
accuracy), given that it was trained only on human and not pig faces. In addition to the
outdoor applications mentioned above, other rapidly developing areas include the use of
visual data from drones [24]. Here multispectral, thermal and visible cameras can be used for
land surveys (soil characteristics), inspection of crops for yield prediction, disease detection,
and when to irrigate (smart irrigation), or otherwise treat and harvest.
Rong et al. [25] present an approach for detecting foreign objects in walnuts, pointing to the
issue of irregular shapes and complex features making the manual design of suitable feature
detectors extremely problematic. Their aim was the detection of different sized natural
foreign objects (flesh leaf debris, dried leaf debris and gravel dust) and man-made foreign
objects (paper and plastic scraps, packing material and metal parts). Two different
convolutional neural network structures were used for segmenting the nuts and detecting
foreign bodies. The proposed method was able to correctly classify 95% of the foreign
objects in 277 validation images. Here, as with the production processing applications
described below, speed of operation is an important consideration. The combined
segmentation and detection processing time was less than 50ms. Similar issues were
described by Aslam et al. [26], who offer a survey of inspection methods for leather defect
detection with various deep learning architectures compared. Nasiri et al. [27] describe an
egg sorting system using a CNN to perform a classification task with an emphasis on highly
structured illumination. They classify unwashed egg images into three classes: intact, bloody,
or broken. A pre-trained CNN, VGG-16 (with weights pre-trained on the ImageNet dataset),
was modified by replacing the last fully connected layers with a classifier block. This block
included a global average pooling layer (to minimise over-fitting through decreasing the
number of parameters), two dense layers with 512 neurons and the ReLU function, batch
normalization (to keep all inputs to the layer within the same range), dropout (as a
regularization technique to prevent overfitting), and a final dense layer in the form of the
Softmax classifier. This final layer computed the normalized probability value of the three
classes. Once again, common augmentation techniques were used to increase the size of the
manually labelled training dataset. Performance was shown to outperform traditional machine
vision-based models, achieving an average overall accuracy of 94.84% (by 5-fold cross-
validation).
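The kind of classifier block described by Nasiri et al. [27] might be sketched as follows (an
illustrative reconstruction from the description above, not the authors' code; the input size,
dropout rate and exact layer ordering are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# VGG-16 convolutional base, pre-trained on ImageNet, used as a fixed feature extractor.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),        # fewer parameters, so less overfitting
    layers.Dense(512, activation="relu"),
    layers.BatchNormalization(),            # keeps layer inputs within a similar range
    layers.Dropout(0.5),                    # dropout rate is an assumption
    layers.Dense(512, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # intact / bloody / broken
])
```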
A study by Xie et al. [40] had the aim of combining a convolutional neural network with
conventional computer vision techniques to automatically recognise and locate bones in
Atlantic salmon and to explore the impact of different image quality (obtained using differing
levels of image compression) on performance, in what is considered a safety critical
application. A Faster-RCNN object detection algorithm was studied with three different
convolutional neural network models (Alexnet, VGG-16 and VGG-19). An image
compression ratio of 25% was identified as offering the best balance between detection
accuracy, equipment cost and detection speed. Jiang et al. [41] describe the use of
hyperspectral data for the detection of pesticide residue in post-harvest apples. They also fuse
deep learning in the form of an Alex-Net-CNN with conventional techniques by using an
Otsu segmentation and a Hough space transformation to first establish region of interest
masks, centred on the apples. Detection accuracy was better than 95% and compared
favourably with traditional k-nearest neighbour and support vector machine (SVM)
classification algorithms. Related work for online fruit sorting using deep learning to detect
internal mechanical damage of blueberries using hyperspectral data is presented by Wang et
al. [42]. Another example implementation that combines conventional machine vision
techniques with deep learning models for rapid defect identification in industrial process line
inspection was presented by Wang et al. [43]. Their factory setting allowed the use of a fixed
object pose and structured lighting to help reveal contamination defects in bottles. After
applying an image processing stage in the form of a Gaussian filter and (Canny) edge
detector, to generate an edge graph, a Hough transform was utilised to reduce the deep
learning defect detection task complexity by establishing regions of interest to which a
‘lightweight CNN’ was applied. This illustrates how the availability of big data in the form of
large numbers of example defects and defect free samples in many manufacturing
applications lends itself to a deep learning solution. Furthermore, as with many
manufacturing quality control applications of machine vision, the emphasis here was on
efficiency in terms of speed of operation in a trade off with acceptable detection accuracy.
Overall accuracy was reported to be as high as 99.6%, with a cycle time of just 47.6ms,
allowing inspection of 21 products per second. These results compared favourably both with
traditional feature based shallow machine learning, and other deep learning solutions. Li et al.
[44] present a hybrid of conventional techniques and deep learning for detecting a wide range
of defect types (scratches, floaters, light stains, and dark stains) on mobile phone screens
during production. Conventional techniques were deployed in a pre-examination stage and
regions of interest containing defect targets established using shape-based template matching.
A two-stage approach was adopted, in which several deep learning models (VGG, ResNet,
GoogLeNet, ResNeXt, SeNet, NasNet) were explored; all reached an impressive 99%
detection accuracy, with the simpler VGG model achieving cycle times of 4.56s and 2.47s on
a CPU and GPU, respectively. Tang et al. [45] present an electrical component recognition method
based on deep learning and conventional machine vision techniques. Conventional
pre-processing operators, such as grayscale conversion, mean filtering, pose correction and
other techniques (from the OpenCV library), were used. Component codes of differing types
and materials were then recognised by the CNN, with recognition results that compared favourably
with traditional techniques across a wide range of components. Jang et al. [46] also propose a
surface defect inspection solution but with an emphasis on a small training dataset in the
context of a lack of defect sample images - typical of some manufacturing applications. Their
proposed method for wafer/PCB defect inspection, a major application area for machine
vision, exploited conventional defect inspection techniques to estimate a defect probability
image. That, together with the original grey level image, formed the input to a CNN. Their
integration of conventional inspection techniques with a CNN model was shown to be highly
effective for conventional defect inspection problems.
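As a concrete illustration of this hybrid pattern (a sketch inspired by, but not taken from, the
pipeline of Wang et al. [43]; the filter settings and the roi_classifier model are assumptions),
conventional OpenCV operators can establish regions of interest that are then passed to a
lightweight CNN:

```python
import cv2
import numpy as np

def find_candidate_rois(image_bgr):
    """Conventional pre-processing: smooth, then locate circular regions of interest."""
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(grey, (5, 5), 0)          # Gaussian filter
    # Hough circle transform to locate circular structures (e.g. bottle openings);
    # it applies a Canny edge detector internally, with param1 as the upper threshold.
    circles = cv2.HoughCircles(smoothed, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=60, param1=150, param2=40,
                               minRadius=20, maxRadius=80)
    rois = []
    if circles is not None:
        for x, y, r in np.round(circles[0]).astype(int):
            rois.append(image_bgr[max(y - r, 0):y + r, max(x - r, 0):x + r])
    return rois

# Each ROI is then resized and classified by a small CNN (defect / no defect):
# for roi in find_candidate_rois(frame):
#     patch = cv2.resize(roi, (64, 64))[np.newaxis] / 255.0
#     score = roi_classifier.predict(patch)   # roi_classifier: assumed lightweight Keras CNN
```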
9. Future key challenges - promises and opportunities
The literature demonstrates that machine vision systems are being used across a plethora of
industries for a wide range of tasks. Of these, quality assurance and inspection form the
largest segment and this alone is expected to grow at a rate of 9.5% over the next four years
[47]. New markets are also developing, and given the new capabilities offered by deep
learning, there is particular growing demand for outdoor applications. These evolving applications
mean that the global machine vision system market is expected to rapidly grow from $8.6
Billion in 2020 to $17.7 Billion by 2027 [47], accelerating the demand for improved
performance and new capabilities.
We have seen how the application of deep ANNs in industrial machine vision tasks can use
biologically informed modelling of the vision system to achieve substantial leaps in
performance. Deep learning can also help automate much of the model creation during
development, and in application can be more robust and flexible, more adaptable to change,
and offer greater generality. However, it is also apparent that classical machine
vision still has an important role to play. While some tasks are not suited to deep learning
[48], for those that are, we have seen how classical techniques can often be combined with
deep learning to substantially improve performance.
However, it is also clear from the literature that there are challenges remaining, and it seems
appropriate to attempt to prioritise these as opportunities for future research. Firstly, deep
learning necessitates a large amount of, usually labelled, training data. This may not always
be accessible and there can be a scarcity of publicly available data for training CNNs. This
means, as we have seen, that in practice few researchers train deep CNNs from scratch.
Instead, many use networks that have been pre-trained on large-scale image data, where the
trained weights are applied as a feature extractor to a smaller dataset in the solution domain.
We have seen how this can be achieved by removing the last few layers of the pre-trained
CNN. This allows a system developed for one application domain to be relatively easily
transferred to another (e.g. human face to pig face biometric recognition [16]). However, this
prompts the question of whether future research should focus on devising new data
augmentation methods for expanding the often limited available data, on better ways of
acquiring real big data for each task, or on whether transfer learning will in itself be
sufficient. Secondly, datasets have to
be labelled, and in some applications (e.g. medical) by domain experts. We have also seen
how this problem can be partly solved by leveraging effective data augmentation (and to a
lesser extent automated annotation) techniques. Perhaps a more promising area of research,
aimed at addressing both limited training data and the need for manual labelling, could be to
explore methods for unsupervised learning, such as that already offered by generative
adversarial networks (GANs). As we have seen, GANs offer an unsupervised form of
learning, in which a generator network works in partnership with a discriminator network to,
for example, mimic a human expert. Unsupervised learning avoids the need for manual data
labelling by automatically discovering patterns in the data such that the network model can
generate new outputs that plausibly could have been drawn from the original real dataset. In
this regard, it is rather like generating augmented data. Impressive examples of the latter
exist, where GANs have been used to create very realistic, but completely artificial, fake
human faces [49]. Unsupervised learning and the ability to create realistic training data with
wide variation could have a huge impact across many industrial applications where training
data are limited.
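As a minimal sketch of the adversarial idea (illustrative only; the tiny fully connected
networks, image size and hyperparameters are assumptions, and real image-generation GANs
use much larger convolutional models), the generator and discriminator are trained against
each other as follows:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT = 64
IMG_DIM = 28 * 28  # assumes small greyscale images, flattened

# Generator: maps random noise to a synthetic image.
generator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(LATENT,)),
    layers.Dense(IMG_DIM, activation="sigmoid"),
])
# Discriminator: estimates the probability that an image is real.
discriminator = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(IMG_DIM,)),
    layers.Dense(1, activation="sigmoid"),
])
bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], LATENT])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_pred = discriminator(real_images, training=True)
        fake_pred = discriminator(fake_images, training=True)
        # Discriminator: label real images as 1 and generated images as 0.
        d_loss = bce(tf.ones_like(real_pred), real_pred) + \
                 bce(tf.zeros_like(fake_pred), fake_pred)
        # Generator: try to fool the discriminator into labelling fakes as real.
        g_loss = bce(tf.ones_like(fake_pred), fake_pred)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
```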
It is widely recognised that deep learning is now the state-of-the-art in machine learning for
machine vision and is being increasingly widely deployed across industrial applications. We
have seen how conventional machine vision techniques can support deep learning
applications and have highlighted some of the key challenges that will need to be addressed
in the design and development of future CNN based solutions.
References
1. Y. LeCun, O. Matan, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L.
D. Jackel and H. S. Baird: Handwritten Zip Code Recognition with Multilayer Networks, in
IAPR (Eds), Proc. of the International Conference on Pattern Recognition, II:35-40, IEEE,
Atlantic City, invited paper, 1990
2. Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series. The
handbook of brain theory and neural networks, 3361(10), 1995
3. Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-Based Learning Applied to
Document Recognition, Proceedings of the IEEE, 86(11), 1998
4. A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep
Convolutional Neural Networks, Advances in Neural Information Processing Systems 25(2),
DOI: 10.1145/3065386, 2012
5. A. Sioma, Automated Control of Surface Defects on Ceramic Tiles Using 3D Image
Analysis, Materials, 13(5), 2020
6. M. L. Smith, L. N. Smith, Dynamic Photometric Stereo - A New Technique for Moving
Surface Analysis, Image and Vision Computing, 23(9), 2005
7. B. L. Steward, J. Gai, L. Tang, The use of agricultural robots in weed management and
control, Agriculture and Engineering Biosystems Publications, Iowa State University,
DOI:10.19103/AS.2019.0056.13, 2019
8. L. N. Smith, A. Byrne, M. F. Hansen, W. Zhang, M. L. Smith, Weed classification in
grasslands using convolutional neural networks, Applications of Machine Learning, SPIE
Optics Photonics, DOI.org/10.1117/12.2530092, 2019
9. N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, J.
Liang, Convolutional neural networks for medical image analysis: Full training or fine
tuning?, IEEE Transactions on Medical Imaging, 35(5), 2016.
10. D. Erhan, Y. Bengio, A. Courville, P. A. Manzagol, P. Vincent, S. Bengio, Why does
unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11,
Online: http://dl.acm.org/citation.cfm?id=1756006.1756025, 2010
11. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image
recognition, arXiv 1409.1556, 09, 2014
12. K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016
13. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto,
H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications,
CoRR, abs/1704.04861, Online: http://arxiv.org/abs/1704.04861, 2017
14. F. Chollet, Xception: Deep learning with depthwise separable convolutions, IEEE
Conference on Computer Vision and Pattern Recognition (CVPR),
DOI.org/10.1109/CVPR.2017.195, 2017
15. A. Nasiri, A. Taheri-Garavand, Y. Zhang, Image-based deep learning automated sorting
of date fruit, Postharvest Biology and Technology, 153, 2019
16. M. F. Hansen, M. L. Smith, L. N. Smith, M. G. Salter, E. M. Baxter, M. Farish, B.
Grieve, Towards on-farm pig face recognition using convolutional neural networks,
Computers in Industry, 98, 2018
17. L. Li, K. Ota, M. Dong, Deep Learning for Smart Industry: Efficient Manufacture
Inspection System with Fog Computing, IEEE Transactions on Industrial Informatics, 14(10),
2018
18. H. Yang, S. Kumara, S. T.S. Bukkapatnam, F. Tsung, The internet of things for smart
manufacturing: A review, IISE Transactions, Vol. 51, Iss. 11, 2019
19. H. A. M. Williams, M. H. Jones, M. Nejati, M. J. Seabright, J. Bell, N. D. Penhall, J. J.
Barnett, M. D. Duke, A. J. Scarfe, H. S. Ahn, J. Y. Lim, B. Macdonald, Robotic kiwifruit
harvesting using machine vision, convolutional neural networks, and robotic arms,
Biosystems Engineering, 181, 2019
20. A. Wang, W. Zhang, X. Wei, A review on weed detection using ground-based machine
vision and image processing techniques, Computers and Electronics in Agriculture, 158, 2019
21. F. Palacios, G. Bueno, J. Salido, M. P. Diago, I. Hernández, J. Tardaguila,
Automated grapevine flower detection and quantification method based on computer vision
and deep learning from on-the-go imaging using a mobile sensing platform under field
conditions, Computers and Electronics in Agriculture, 178, 2020
22. V. Kakani, V. H. Nguyen, B. P. Kumar, H. Kim, V. R. Pasupuleti, A critical review on
computer vision and artificial intelligence in food industry, Journal of Agriculture and Food
Research, 2, 2020
23. A. Nasirahmadi, B. Sturm, S. Edwards, K-H. Jeppsson, A-C. Olsson, S. Müller, O.
Hensel, Deep Learning and Machine Vision Approaches for Posture Detection of Individual
Pigs, Sensors, 19(17), 2019.
24. D. Liuzz, G. Silano, F. Picariello, L. Iannelli, L. Glielmo, L. De Vito, P. Daponte, A
review on the use of drones for precision agriculture, IOP Conference Series Earth and
Environmental Science, 275, DOI: 10.1088/1755-1315/275/1/012022, October 2018
25. D. Rong, L. Xie, Y. Ying, Computer vision detection of foreign objects in walnuts using
deep learning, Computers and Electronics in Agriculture, 162, 2019
26. M. Aslam, T. M. Khan, S. S. Naqvi, G. Holmes, R. Naffa, On the Application of
Automated Machine Vision for Leather Defect Inspection and Grading: A Survey, IEEE
Access, DOI:10.1109/ACCESS.2019.2957427, 7, 2019
27. A. Nasiri, M. Omid, A. Taheri-Garavand, An automatic sorting system for unwashed eggs
using deep learning, Journal of Food Engineering, 283, 2020
28. H. Würschinger, M. Mühlbauer, M. Winter, M. Engelbrecht, N.
Hanenkamp, Implementation and potentials of a machine vision system in a series production
using deep learning and low-cost hardware, Procedia CIRP, 90, 2020
29. R. Ren, T. Hung, and K. C. Tan, A Generic Deep-Learning-Based Approach for
Automated Surface Inspection, IEEE Transactions on Cybernetics, 48(3), 2018
30. C. Zhihong, Z. Hebin, W. Yanbo, L. Binyan, L. Yu, A Vision-based Robotic Grasping
System Using Deep Learning for Garbage Sorting, Proceedings of the 36th Chinese Control
Conference, China, July 2017
31. M. Bahaghighat, F. Abedini, M. S’hoyan, A. Molnar, Vision Inspection of Bottle Caps in
Drink Factories Using Convolutional Neural Networks, IEEE 15th International Conference
on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania,
DOI:10.1109/ICCP48234.2019.8959737, 2019
32. J. Villalba-Diez, D. Schmidt, R. Gevers, J. Ordieres-Meré, M. Buchwitz, W. Wellbrock,
Deep Learning for Industrial Computer Vision Quality Control in the Printing Industry 4.0,
Sensors, 19, 3987, DOI:10.3390/s19183987, 2019
33. R. Moreno, E. Gorostegui-Colinas, P. López de Uralde, A. Muniategui, Towards
Automatic Crack Detection by Deep Learning and Active Thermography, International
Work-Conference on Artificial Neural Networks, IWANN, Advances in Computational
Intelligence, DOI: 10.1007/978-3-030-20518-8_13, 2019
34. E. Ayan and H. M. Ünver, Diagnosis of Pneumonia from Chest X-Ray Images Using
Deep Learning, 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering
and Computer Science (EBBT), Turkey, DOI: 10.1109/EBBT.2019.8741582,2019
35. G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A.W.M.
van der Laak, B. van Ginneken, C. I. Sánchez, A survey on deep learning in medical image
analysis, Medical Image Analysis, 42, 2017
36. M. Ruppel, R. Persad, A. Bahl, S. Dogramadzi, C. Melhuish, and L. N. Smith, NANCY:
Combining Adversarial Networks with Cycle-Consistency for Robust Multi-Modal Image
Registration, International Journal of Computer and Information Engineering, 14.8, 2020
End of Paper