
Article
Computer Aided Breast Cancer Detection Using Ensembling of
Texture and Statistical Image Features
Soumya Deep Roy 1, Soham Das 1, Devroop Kar 2, Friedhelm Schwenker 3,* and Ram Sarkar 2
Citation: Roy, S.D.; Das, S.; Kar, D.;
Schwenker, F.; Sarkar, R. Computer
Aided Breast Cancer Detection Using
Ensembling of Texture and Statistical
Image Features. Sensors 2021, 21, 3628.
https://doi.org/10.3390/s21113628
Academic Editor: Sheryl Berlin
Brahnam
Received: 12 April 2021
Accepted: 14 May 2021
Published: 23 May 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and
conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
1 Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata 700032, India;
2 Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India;
3 Institute of Neural Information Processing, Ulm University, 89081 Ulm, Germany
Abstract:
Breast cancer, like most forms of cancer, is a fatal disease that claims more than half a
million lives every year. In 2020, breast cancer overtook lung cancer as the most commonly diagnosed
form of cancer. Although the disease is extremely deadly, survival rates and longevity increase
substantially with early detection and diagnosis. The treatment protocol also varies with the stage of breast cancer.
Diagnosis is typically done using histopathological slides from which it is possible to determine
whether the tissue is in the Ductal Carcinoma In Situ (DCIS) stage, in which the cancerous cells have
not spread into the encompassing breast tissue, or in the Invasive Ductal Carcinoma (IDC) stage,
wherein the cells have penetrated the neighboring tissues. IDC detection is extremely
time-consuming and challenging for physicians. Hence, it can be modeled as an image classification task
where pattern recognition and machine learning can be used to aid doctors and medical practitioners
in making such crucial decisions. In the present paper, we use an IDC Breast Cancer dataset
that contains 277,524 images (with 78,786 IDC positive images and 198,738 IDC negative images)
to classify the images into IDC(+) and IDC(-). To that end, we use feature extractors, including
textural features, such as SIFT, SURF and ORB, and statistical features, such as Haralick texture
features. These features are then combined to yield a dataset of 782 features. These features are
ensembled by stacking using various Machine Learning classifiers, such as Random Forest, Extra
Trees, XGBoost, AdaBoost, CatBoost and Multi Layer Perceptron followed by feature selection
using Pearson Correlation Coefficient to yield a dataset with four features that are then used for
classification. From our experimental results, we found that CatBoost yielded the highest accuracy
(92.55%), which is on par with other state-of-the-art results, most of which employ Deep Learning
architectures. The source code is available in the GitHub repository.
Keywords: breast cancer; IDC; machine learning; ensemble learning; feature selection
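The feature selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection criterion, so this example assumes one plausible reading: rank each stacked feature by the absolute value of its Pearson correlation with the class label and keep the top four. The column names (`clf_a` to `clf_f`), the toy data and the helper functions are hypothetical and are not taken from the authors' code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (norm_x * norm_y)

def select_top_k(columns, labels, k):
    """Names of the k columns with the largest |Pearson r| against the labels.

    Assumption: ranking by correlation with the label; the paper may use a
    different criterion (for example, dropping mutually correlated features).
    """
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def flip_every(labels, step):
    """Toy 'classifier output': copy the labels, flipping every step-th entry."""
    return [1 - y if i % step == 0 else y for i, y in enumerate(labels)]

# Toy stand-in for IDC(-) = 0 / IDC(+) = 1 labels (hypothetical data).
labels = [i % 2 for i in range(100)]

# Hypothetical stacked features: one column of predictions per base classifier.
columns = {
    "clf_a": flip_every(labels, 20),                       # few errors
    "clf_b": flip_every(labels, 10),
    "clf_c": flip_every(labels, 5),
    "clf_d": flip_every(labels, 4),
    "clf_e": flip_every(labels, 2),                        # always predicts 1
    "clf_f": [1 if i % 4 >= 2 else 0 for i in range(100)], # unrelated pattern
}

selected = select_top_k(columns, labels, k=4)
print(selected)
```

In this toy setting the two uninformative columns (`clf_e`, `clf_f`) are discarded and the four columns that track the label are retained; with real stacked predictions the ranking would depend on how well each base classifier performs.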
1. Introduction
With the widespread digitization of health records, computer aided disease detection
(CADD) systems that employ data mining and Machine Learning (ML) techniques have
become increasingly commonplace. Considering the severity of the disease, it comes
as little surprise that the earliest efforts in CADD [1] started with mammography for the
detection of breast cancer and were later extended to other types of cancer as well. Breast
cancer is a condition in which the cells in the breast proliferate uncontrollably.
According to global cancer statistics [2], breast cancer in women has overtaken lung
cancer as the most diagnosed cancer in the world, comprising almost 12% of the total
instances of cancer worldwide. As is the case in most types of cancer, the treatment of
breast cancer is greatly aided by early detection but involves surgical or intensive medical
procedures if not diagnosed early.
Sensors 2021, 21, 3628. https://doi.org/10.3390/s21113628 https://www.mdpi.com/journal/sensors