
Proceedings of the 2nd International Conference on Electronics, Communication and Aerospace Technology (ICECA 2018)

IEEE Conference Record #42487; IEEE Xplore ISBN: 978-1-5386-0965-1

Transfer learning for image classification


Manali Shaha (1), Meenakshi Pawar (2)
Dept. of Electronics and Telecommunication
SVERI, Pandharpur
Maharashtra, India
(1) [email protected]
(2) [email protected]

Abstract—The convolutional neural network (CNN) has gained great attention for robust feature extraction and information mining. CNNs have been used for a variety of applications such as object recognition, image super-resolution and semantic segmentation, owing to their robust feature extraction and learning mechanisms. Keeping the baseline learning topology constant, various CNN architectures have been proposed to improve system performance. Among these, AlexNet, VGG16 and VGG19 are well-known CNN architectures introduced for the object recognition task. In this paper, we use transfer learning to fine-tune the pre-trained network (VGG19) parameters for the image classification task. Further, the performance of the VGG19 architecture is compared with AlexNet and VGG16. Along with the CNN architectures, we compare a hybrid learning approach comprising robust feature extraction by a CNN followed by a support vector machine (SVM) classifier. We use two state-of-the-art databases, GHIM10K and CalTech256, to study the effect of the CNN architecture on robust feature extraction. Performance evaluation is carried out using average recall, precision and F-score. The analysis shows that the fine-tuned VGG19 architecture outperforms the other CNN and hybrid learning approaches for the image classification task.

Index Terms—Image classification, AlexNet, VGG16, VGG19, CalTech256, GHIM10K

I. INTRODUCTION

Research in image classification has witnessed the evolution of computer vision algorithms from first-order moments, to hand-crafted features, to end-to-end machine learning approaches that improve classification accuracy. This evolution began with the extraction of textural information using first-order moments and grey-level dependency features. Haralick et al. [1] proposed a set of simple texture features based on grey-level dependency statistics for texture classification. Further, a structural approach was proposed by Haralick [2] to incorporate structural information into texture classification. Manjunath et al. [3] proposed first-order moments to extract texture information with an application to content-based image retrieval (CBIR). However, first-order moments are invariant to neither scale nor rotation. To achieve scale and rotation invariance, Han et al. [4] proposed rotation- and scale-invariant Gabor filters with application to texture classification. Further, supervised learning for texture classification was proposed by Talbar et al. [5].

However, first-order moments are not robust enough to classify textures of similar contrast, and they fail on complex textures. Ojala et al. [6] proposed local binary patterns (LBP) to extract local neighbourhood information. LBP has proved effective in almost all computer vision algorithms because of its simple and computationally efficient implementation. Because it is unable to incorporate directional information, variants of LBP were proposed according to the needs of the application. To extract illumination-invariant local features, Tan et al. [7] proposed local ternary patterns (LTP). Further, Murala et al. proposed a bank of local operators [8]–[16] with application to CBIR. Among these, local tetra patterns [14] extract twelve-directional information and obtain more robust features. The spherical symmetric 3D LTP proposed by Murala et al. [11] extracts spatio-temporal information. Not only CBIR but a variety of other applications [17], [18] make use of local feature extraction because of its ease of use. Further, a modified approach was proposed by Sadafale and Bonde [18] for CBIR using a combination of a local feature descriptor and artificial neural networks.

Even though these operators extract local information, they fail on complex scenes and cluttered backgrounds. Subsequently, interest point detection methods overcame the drawbacks of local operators and other hand-crafted features. Initially, Harris and Stephens [19] proposed a corner detection algorithm based on pure mathematical theory. However, corner detection fails under variation of scale. Further, Lowe [20] proposed the scale invariant feature transform (SIFT) to detect scale-invariant interest points (SIIP). SIIP followed by histograms of oriented gradients yields the robust SIFT features. Speeded-up robust features (SURF) [21] were introduced by Bay et al. to reduce the computational complexity of SIFT. Further, various researchers have integrated the interest points detected by SIFT/SURF with other feature descriptors to introduce different robust descriptors [22]–[24] for image classification and other computer vision tasks.

The feature descriptors discussed above witnessed the growth of digital image libraries and the evolution of large-scale databases. Hand-crafted features fail on large-scale databases because of the high intra- as well as inter-class variation in image categories. However, an effective and robust feature descriptor could improve the performance of image classification. In recent years, the convolutional neural network has had great success in almost all areas of machine learning and computer vision. Due to the robustness of CNN feature extraction, researchers use it in a wide variety of applications.
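As an aside on the local descriptors discussed above, the basic 8-neighbour, radius-1 LBP operator is simple enough to sketch in a few lines. This is a minimal NumPy sketch, not any cited implementation; it ignores the rotation-invariant and uniform-pattern variants:

```python
import numpy as np

def lbp_8_1(img: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour, radius-1 local binary pattern of a grayscale image."""
    img = img.astype(np.int32)
    center = img[1:-1, 1:-1]
    # Offsets of the 8 neighbours, enumerated clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        # Set the bit where the neighbour is at least as bright as the centre pixel.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes  # values in [0, 255]; histogram these to form the texture feature
```

A histogram of the returned codes over an image (or image block) is the usual LBP texture feature.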

978-1-5386-0965-1/18/$31.00 ©2018 IEEE 656



Fig. 1. Existing CNN architectures. (a) AlexNet [25] (b) VGG16 [27] (c) VGG19 [27]

Initially, Krizhevsky et al. [25] proposed a deep CNN architecture, named AlexNet, for the object recognition task. The major hurdle in training a CNN is the availability of a large-scale database; they used the ILSVRC [26] database along with augmentation to train their network. To push recognition accuracy further towards the human vision system, researchers proposed deeper CNN architectures. Simonyan and Zisserman [27] proposed the VGG16 architecture for the object recognition task. The improved VGG16 architecture, known as VGG19, overcomes drawbacks of AlexNet and increases system accuracy.

In this paper, we fine-tune the VGG19 architecture on two state-of-the-art databases, CalTech256 [28] and GHIM10K [29], and analyse the effect of a deeper network by comparing the results with the AlexNet and VGG16 architectures on the image classification task. We also analyse the effect of an SVM classifier applied to the features extracted from the CNN architectures. Performance evaluation is carried out using recall, precision and F-score.

II. EXISTING CNN ARCHITECTURES

In this section, we discuss three existing CNN architectures: AlexNet [25], VGG16 [27] and VGG19. AlexNet was proposed by Krizhevsky et al. [25] to solve the object recognition problem; it was the first attempt to learn network parameters for a recognition task over a very large-scale database. AlexNet consists of twenty-six layers, of which the last two are the softmax and output layers. The network architecture is divided into three parts; Fig. 1(a) shows the architecture. The first part consists of two units, each comprising {convolution, ReLU, normalization and pooling} layers. The second part consists of four units, each comprising {convolution and pooling} layers. The last part is the non-linear classification stage, composed of fully connected (FC), ReLU and drop-out layers; the drop-out layer avoids over-fitting during training. This repetitive structure adapts to the data characteristics and extracts robust features, with the initial filters learning low-level features.

The accuracy of a CNN architecture depends largely on three factors: a large-scale database, a high-end computational unit, and the network depth. Of these, the training-database requirement is met by the publicly available ILSVRC [26] database, and a GPU solves the second difficulty. The last parameter, however, carries uncertainty, because there is no measure that sets a limit on network depth; going deeper in the network extracts more complex and robust features.


Fig. 2. Proposed system block diagram. (a) CNN transfer learning (fine tuning) (b) CNN testing phase

Simonyan and Zisserman [27] proposed the VGG16 architecture for the object detection task; Fig. 1(b) shows the network architecture. Unlike AlexNet, VGG16 consists of a replicated structure of {convolution, ReLU and pooling} layers, with the number of such units increased to design a deeper network. However, [27] used a smaller receptive window for each convolutional filter than AlexNet. The non-linear classification stage is the same as that of AlexNet. Further, the still deeper network VGG19 was proposed for the same task (object detection). VGG19 comprises some extra convolution-ReLU units in the middle of the network compared to VGG16, and this minor architectural change translates into an accuracy enhancement for the object recognition task.

III. PROPOSED APPROACH

The ILSVRC database consists of 22,000 object categories, covering almost all objects known to a common human being. However, the existing networks are trained on 1,000 of the 22,000 categories. It is practically impossible to learn the parameters of such huge networks on small-scale datasets; these parameters can, however, be fine-tuned on small datasets as the application demands.

In this work, we fine-tune the network parameters of VGG19 on two databases, CalTech256 and GHIM10K. The proposed system is divided into two parts: (1) the CNN training phase and (2) the CNN testing phase. Fig. 2 shows the system flow. Fig. 2(a) illustrates the CNN training phase, in which the parameters of VGG19 are fine-tuned to obtain the trained VGG19. Fig. 2(b) shows the CNN testing phase, in which a test image is passed through the trained VGG19 to estimate the image class probability.

IV. EXPERIMENTAL RESULTS

We divide our analysis into three experiments, of which the first two were performed on two state-of-the-art databases, GHIM10K and CalTech256.

A. Experiment #1

We carried out this experiment on the publicly available GHIM10K database, which consists of 20 classes of 500 images each. Fig. 4 shows sample images from the GHIM10K dataset. In this experiment, we analysed the performance of the VGG19 architecture for the image classification task.


Fig. 3. Class-wise recall, precision and F-score on the GHIM10K database using the VGG19 network architecture.
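The class-wise and average {recall, precision and F-score} reported below are standard quantities; for instance, with scikit-learn on hypothetical ground-truth and predicted labels:

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical ground-truth and predicted class labels for a 3-class toy case.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Per-class scores, as plotted in Fig. 3 ...
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, zero_division=0)

# ... and their averages over classes, as reported in Tables I and II.
avg_prec, avg_rec, avg_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
```

Averaging per-class scores ("macro" averaging) weights every class equally, which matters on class-balanced databases such as GHIM10K.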

TABLE I
COMPARISON BETWEEN CNN ARCHITECTURES USING AVERAGE RECALL, PRECISION AND F-SCORE ON THE GHIM10K DATABASE

Method    Recall   Precision   F-score
AlexNet   96.88    96.56       96.72
VGG16     98.57    98.23       98.40
VGG19     99.38    99.23       99.30

TABLE II
COMPARISON BETWEEN CNN ARCHITECTURES USING AVERAGE RECALL, PRECISION AND F-SCORE ON THE CALTECH256 DATABASE

Method    Recall   Precision   F-score
AlexNet   87.08    87.31       87.09
VGG16     88.04    88.24       88.03
VGG19     88.63    88.88       88.65

Fig. 3 illustrates the class-wise average {recall, precision and F-score} over the GHIM10K database using the VGG19 architecture. Further, to analyse the robustness of the CNN features, we employed a support vector machine for image classification and compared its performance with VGG19. Along with VGG19, we analysed the performance of AlexNet and VGG16 on the GHIM10K database. Fig. 5 shows the comparison between the three CNN architectures and the hybrid approach (SVM) over the GHIM10K and CalTech256 databases. Table I shows the comparison between the CNN architectures using average recall, precision and F-score on the GHIM10K database; it evidences the improvement in accuracy due to the VGG19 architecture.

Fig. 4. Sample images from the GHIM10K database (one image from each class).

B. Experiment #2

This experiment uses the CalTech256 database for performance evaluation of the VGG19 architecture on the image classification task. CalTech256 consists of 256 categories, each class having a minimum of 80 images. Owing to the space limit, it is not possible to show the class-wise accuracy for 256 different categories; instead, we discuss the average {recall, precision and F-score} over the CalTech256 dataset. Further, to analyse the robustness of the CNN features, we employed a support vector machine for image classification and compared its performance with VGG19. Along with VGG19, we analysed the performance of AlexNet and VGG16 on the CalTech256 database. Table II shows the comparison between the CNN architectures using average recall, precision and F-score on CalTech256; it can be observed that VGG19 improves the system accuracy. Fig. 5 shows the overall comparison between the three CNN architectures over the GHIM10K and CalTech256 datasets.

V. CONCLUSION

In this paper, we fine-tuned the network weight parameters of AlexNet, VGG16 and VGG19 over two state-of-the-art databases, GHIM10K and CalTech256, for the image classification task.

Fig. 5. Overall comparison between the three CNN architectures and the hybrid approach (SVM) over the GHIM10K and CalTech256 databases.

We compared the performance of these network architectures using three parameters: recall, precision and F-score. Further, to analyse the robustness of the CNN features, we employed a support vector machine for image classification and compared its performance with the discussed CNN networks. The performance analysis evidences the improvement in average recall, precision and F-score on both databases using the VGG19 CNN architecture. In future, these fine-tuned network architectures can be applied to high-level tasks such as object detection and human action recognition.

REFERENCES

[1] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, no. 6, pp. 610-621, Nov 1973.
[2] R. M. Haralick, "Statistical and structural approaches to texture," Proceedings of the IEEE, vol. 67, no. 5, pp. 786-804, May 1979.
[3] B. S. Manjunath and W.-Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837-842, 1996.
[4] J. Han and K.-K. Ma, "Rotation-invariant and scale-invariant Gabor features for texture image retrieval," Image and Vision Computing, vol. 25, no. 9, pp. 1474-1481, 2007.
[5] S. N. Talbar, R. S. Holambe, and T. R. Sontakke, "Supervised texture classification using wavelet transform," in Proc. Fourth International Conference on Signal Processing (ICSP '98), vol. 2, 1998, pp. 1177-1180.
[6] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.
[7] X. Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1635-1650, 2010.
[8] S. Murala and Q. J. Wu, "Local mesh patterns versus local binary patterns: biomedical image indexing and retrieval," IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 3, pp. 929-938, 2014.
[9] S. Murala, R. Maheshwari, and R. Balasubramanian, "Directional binary wavelet patterns for biomedical image indexing and retrieval," Journal of Medical Systems, vol. 36, no. 5, pp. 2865-2879, 2012.
[10] S. Murala and Q. J. Wu, "Local ternary co-occurrence patterns: a new feature descriptor for MRI and CT image retrieval," Neurocomputing, vol. 119, pp. 399-412, 2013.
[11] M. Subrahmanyam and Q. J. Wu, "Spherical symmetric 3D local ternary patterns for natural, texture and biomedical image indexing and retrieval," Neurocomputing, vol. 149, pp. 1502-1514, 2015.
[12] S. Murala and Q. J. Wu, "MRI and CT image indexing and retrieval using local mesh peak valley edge patterns," Signal Processing: Image Communication, vol. 29, no. 3, pp. 400-409, 2014.
[13] S. Murala and Q. Wu, "Peak valley edge patterns: a new descriptor for biomedical image indexing and retrieval," in Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013, pp. 444-449.
[14] S. Murala, R. Maheshwari, and R. Balasubramanian, "Local tetra patterns: a new feature descriptor for content-based image retrieval," IEEE Transactions on Image Processing, vol. 21, no. 5, pp. 2874-2886, 2012.
[15] M. Subrahmanyam, R. Maheshwari, and R. Balasubramanian, "Local maximum edge binary patterns: a new descriptor for image retrieval and object tracking," Signal Processing, vol. 92, no. 6, pp. 1467-1479, 2012.
[16] S. Murala and Q. J. Wu, "Expert content-based image retrieval system using robust local patterns," Journal of Visual Communication and Image Representation, vol. 25, no. 6, pp. 1324-1334, 2014.
[17] A. Dudhane, G. Shingadkar, P. Sanghavi, B. Jankharia, and S. Talbar, "Interstitial lung disease classification using feed forward neural networks," in Advances in Intelligent Systems Research, ICCASP, vol. 137, 2017, pp. 515-521.
[18] M. Sadafale and S. V. Bonde, "Spatio-frequency local descriptor for content based image retrieval," in Proc. IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), Aug 2017, pp. 1-5.
[19] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. Alvey Vision Conference, 1988.
[20] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[21] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.
[22] E. Nowak, F. Jurie, and B. Triggs, "Sampling strategies for bag-of-features image classification," in Computer Vision - ECCV 2006. Berlin, Heidelberg: Springer, 2006, pp. 490-503.
[23] A. Verma, S. Banerji, and C. Liu, "A new color SIFT descriptor and methods for image category classification," in International Congress on Computer Applications and Computational Science, 2010, pp. 4-6.
[24] M. Brown and S. Süsstrunk, "Multi-spectral SIFT for scene category recognition," in CVPR 2011, June 2011, pp. 177-184.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097-1105.
[26] A. Berg, J. Deng, and L. Fei-Fei, "Large scale visual recognition challenge 2010," http://image-net.org/download, 2010 [Online; accessed 29-Jan-2018].
[27] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[28] G. Griffin, A. Holub, and P. Perona, "Caltech-256 object category dataset," 2007.
[29] J. Li and J. Z. Wang, "Automatic linguistic indexing of pictures by a statistical modeling approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 9, pp. 1075-1088, 2003.
