Thyroid Cancer Computer-Aided Diagnosis System Using MRI-Based Multi-Input CNN Model
Ahmed Naglah (a), Fahmi Khalifa (a), Reem Khaled (b), Ahmed Abdel Khalek Abdel Razek (b), Ayman El-Baz (a,∗)
(a) Department of Bioengineering, University of Louisville, Louisville, United States
(b) Radiology Department, Faculty of Medicine, Mansoura University, Mansoura, Egypt
∗ Corresponding author: [email protected]

ABSTRACT

Achieving early detection and classification of thyroid nodules contributes to the prediction of cancer burden and also steers appropriate clinical pathways for that medical condition. We propose a novel multimodal MRI-based computer-aided diagnosis (CAD) system that detects cancerous thyroid nodules using a deep-learning architecture. In particular, our system is built with a multi-input convolutional neural network (CNN) to perform fusion of two MRI modalities: the diffusion-weighted image (DWI) and the apparent diffusion coefficient (ADC) map. The contribution of our system is threefold: (1) it is the first system to fuse thyroid DWI and ADC using a CNN for classification purposes; (2) it enables an independent convolution process for each of the DWI and ADC images, which can increase the likelihood of detecting deep texture patterns in thyroid nodules; and (3) it allows extra channels to be added to each input, with the possibility of integrating additional MRI modalities and other imaging technologies. We compared our system to other fusion methods and also to other machine learning (ML) frameworks that use hand-crafted features. Our system achieved the highest performance among them, with a diagnostic accuracy of 0.88, precision of 0.82, and recall of 0.82.

Index Terms— Pattern recognition and classification, Thyroid, Diffusion weighted imaging

1. INTRODUCTION

In the United States, approximately 52,890 new cases of thyroid cancer and about 2,180 deaths were estimated in 2020 according to the American Cancer Society's most recent statistics [1]. The prevalence of thyroid nodules is approximately 5% in women and 1% in men [2]. Among cases of thyroid nodules, 7%–15% evolve into malignant tumors (cancerous tissue), and this rate depends on age, sex, radiation exposure history, family history, and other factors [2]. Malignant tumors can be classified into three major categories: differentiated thyroid cancer (DTC), medullary thyroid cancer, and anaplastic thyroid cancer. DTC accounts for the largest share of thyroid cancer, more than 90% of cases. DTC includes two main subcategories: papillary thyroid carcinoma (PTC) and follicular thyroid carcinoma (FTC). PTC accounts for more than 80% of all thyroid cancer [2].

The diagnostic workup of thyroid nodules involves different procedures, including physical examination, blood tests, ultrasound (US) imaging, magnetic resonance imaging (MRI), and biopsy. The detection of smaller nodules has become easier over time due to current advances in US and MRI. However, cancer diagnosis and early stratification of nodules are still challenging and mainly performed using biopsy. Although biopsy, either fine-needle aspiration or surgical excision of the nodule, is still the definitive method of clinical evaluation, this invasive procedure is costly and is not always accurate, with a false-negative rate that depends on the biopsy technique and the size of the nodule being aspirated [3–6].

The type of imaging technology used as input to AI algorithms can affect the accuracy of the desired computer-aided diagnosis (CAD) system. US imaging is currently used as a first-line evaluation of suspected thyroid nodules [2], and specific features of thyroid nodules in US imaging can be associated with a higher risk of malignancy. However, the appearance of those features in US images is operator-dependent, and multiple features need to be considered simultaneously during the evaluation in order to provide sufficient diagnostic power for malignancy [2]. These factors cause various limitations in AI-based systems that use US images for thyroid nodule classification [7–9]. Instead of US, our proposed system uses multimodal MRI with the capability to measure the apparent diffusion coefficient (ADC) in thyroid nodules. Studies suggest that statistical analysis of the ADC value together with the value from another corresponding MRI modality can model the texture of thyroid nodules and can therefore differentiate between malignant and benign nodules [10–12].

In this paper, we propose a novel CNN-based CAD system that fuses the ADC map and the diffusion-weighted image (DWI) using a multi-input CNN, in contrast to a recent study that uses a CNN-based system without ADC [13]. ADC images can be considered an indication of cell density in tissues [14] and can therefore be used to search for cancer biomarkers in cancerous nodules, which usually involve a high rate of cell proliferation. In distinction to a recent study that uses multiparametric MRI radiomics for prediction [15], we use a CNN-based structure instead of hand-crafted features, applying independent convolutions to the ADC and DWI inputs before fusing them in the dense fully-connected layers. This process increases the possibility of detecting deep texture patterns from each modality without losing the fusion capability. Our system also enables adding other channels to each input, which can integrate additional MRI modalities and other imaging technologies.

2. MATERIAL AND METHODS

Data were collected in this study from 49 patients with pathologically proven thyroid nodules. The age range was 25 to 70 years. Imaging of the thyroid gland was performed at Mansoura University, Egypt, with a 1.5 T Ingenia MR scanner (Philips Medical Systems, Best, Netherlands) using a head/neck circular polarization surface coil. All participants were fully informed about the aims of the study and provided their informed consent. The inclusion criteria for the study were untreated patients with thyroid nodules whose malignancy status was unclear from ultrasound examination.
Patients underwent thyroid core biopsy or surgery after MR imaging. Histopathologic diagnoses were provided by an experienced cytologist or pathologist. In total, 17 malignant nodules in 17 patients and 40 benign nodules in 32 patients were included in our study.
Two types of MRI images were used in our study: the DWI image and the ADC image. For the DWI image, a diffusion gradient was applied with a b-value of b = 1500 s/mm2, and we refer to this in our paper as the base image. For the ADC image, we calculate the apparent diffusion coefficient by combining the diffusion images acquired at b-values b = 1500 s/mm2 and b = 0 s/mm2 and substituting at the voxel level in the Stejskal–Tanner equation [16]. In each slice, we calculate the size of the nodule cross-section that appears in it. We then feed the corresponding base image and ADC image of the slice with the largest nodule footprint. After that, we extract the nodule in both images using a bounding rectangle that encloses the appearing nodule in each slice. We resize the generated images to a unified 48x48 size and normalize the voxel intensity to the 0-1 range.
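As an illustration of this preprocessing step, the sketch below computes a voxel-wise ADC map from the two b-value images via the Stejskal–Tanner relation, crops the nodule bounding rectangle, resizes it to 48x48, and normalizes intensities to the 0-1 range. It is a minimal sketch under our own assumptions; the array names (dwi_b0, dwi_b1500, bbox) and the NumPy/scikit-image tooling are illustrative, not the authors' pipeline.

```python
import numpy as np
from skimage.transform import resize  # assumed dependency for resampling

def compute_adc(dwi_b0, dwi_b1500, b=1500.0, eps=1e-6):
    """Voxel-wise ADC from two b-value images (Stejskal-Tanner):
    S_b = S_0 * exp(-b * ADC)  =>  ADC = ln(S_0 / S_b) / b."""
    ratio = (dwi_b0 + eps) / (dwi_b1500 + eps)
    return np.log(ratio) / b  # units: mm^2/s when b is given in s/mm^2

def extract_roi(image, bbox, out_size=(48, 48)):
    """Crop the nodule bounding rectangle, resize to 48x48,
    and min-max normalize intensities to the 0-1 range."""
    r0, c0, r1, c1 = bbox                              # bounding rectangle of the nodule
    roi = image[r0:r1, c0:c1].astype(np.float32)
    roi = resize(roi, out_size, preserve_range=True)
    roi -= roi.min()
    return roi / (roi.max() + 1e-6)

# Hypothetical usage with one slice and its nodule bounding box:
# adc_map = compute_adc(dwi_b0, dwi_b1500)
# x_base  = extract_roi(dwi_b1500, bbox)   # "base image" input
# x_adc   = extract_roi(adc_map, bbox)     # ADC input
```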
To build our diagnostic system, we propose a novel multi-input deep-learning network, which follows the feed-forward convolutional neural network (CNN) structure. The proposed architecture, shown in Fig. 1, consists of two branches with identical structure. The advantage of our network compared with others is that the generated kernels are governed by the fusion of the base images and ADC images of the training samples during the forward and backward propagation of the neural network. In addition, a 1 × 1 conv layer is added to the proposed design in order to compress the feature maps. The advantage of this addition is that the number of weights that need to be learned during the training phase is greatly reduced, thus providing fast learning and diagnosis. For the analysis, each of the base images and the ADC images is fed to its respective branch. The convolution layers are constructed from a 4 × 4 conv (32 filters, 4 × 4 kernel size), a 1 × 1 conv (16 filters, 1 × 1 kernel size), and a pooling block (2 × 2 pool size, maximum-value pooling). Each branch has two such convolution blocks before being concatenated into the dense fully-connected layers (2 layers): one hidden layer of 10 neurons with ReLU activation [17] and one output layer of 1 neuron with sigmoid activation [18]. The total number of parameters in our proposed network is 45,589.
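The following Keras sketch reconstructs this architecture under our own assumptions: single-channel 48x48 inputs, 'valid' padding, and ReLU activations on the convolution layers (the paper does not state the padding or the conv activations). With these choices the model has exactly 45,589 trainable parameters, matching the count reported above; it should still be read as a plausible reconstruction rather than the authors' code.

```python
from tensorflow.keras import Input, Model, layers

def conv_block(x, name):
    """4x4 conv (32 filters) -> 1x1 compression conv (16 filters) -> 2x2 max pooling."""
    x = layers.Conv2D(32, (4, 4), activation="relu", name=f"{name}_conv4x4")(x)
    x = layers.Conv2D(16, (1, 1), activation="relu", name=f"{name}_conv1x1")(x)
    return layers.MaxPooling2D((2, 2), name=f"{name}_pool")(x)

def build_multi_input_cnn(input_shape=(48, 48, 1)):
    in_base = Input(shape=input_shape, name="base_image")  # DWI at b = 1500 s/mm^2
    in_adc = Input(shape=input_shape, name="adc_map")       # ADC map

    # Two identical branches, each with two convolution blocks.
    x1, x2 = in_base, in_adc
    for i in (1, 2):
        x1 = conv_block(x1, f"base_block{i}")
        x2 = conv_block(x2, f"adc_block{i}")

    # Fuse the two branches in the dense fully-connected layers.
    merged = layers.Concatenate()([layers.Flatten()(x1), layers.Flatten()(x2)])
    hidden = layers.Dense(10, activation="relu")(merged)
    output = layers.Dense(1, activation="sigmoid")(hidden)
    return Model(inputs=[in_base, in_adc], outputs=output)

model = build_multi_input_cnn()
model.summary()  # Total params: 45,589 under the assumptions above
```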
The condition of unbalanced classes during the training phase was handled by configuring the weights in the mean-square error (MSE) loss function we use in the back-propagation of the network. The ratio of the weight of the malignant class to the weight of the benign class excludes the sample left out during leave-one-out validation. The loss function used is $\mathrm{Loss} = \frac{1}{N}\sum_{i=1}^{N} w_i (y - y_i)^2$, where N is the number of training samples, y is the output of the neural network observed during forward propagation, y_i is the label of the sample, and w_i is the weight of each training sample.
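A minimal sketch of this class-weighting scheme follows; the helper name and the use of Keras per-sample weights are our assumptions, not the authors' implementation. Each training sample is weighted inversely to the frequency of its class, computed on the training fold only (i.e., excluding the left-out sample), and the weights multiply the per-sample squared errors of the MSE loss above.

```python
import numpy as np

def class_balanced_sample_weights(y_train):
    """Per-sample weights inversely proportional to class frequency,
    computed on the training fold only (the left-out sample is excluded)."""
    y_train = np.asarray(y_train).astype(int)
    n = len(y_train)
    n_pos = y_train.sum()          # assumed label convention: 1 = malignant
    n_neg = n - n_pos              # 0 = benign
    w_pos, w_neg = n / (2.0 * n_pos), n / (2.0 * n_neg)
    return np.where(y_train == 1, w_pos, w_neg)

# Hypothetical usage: Keras multiplies each sample's squared error by its weight,
# which realizes the weighted MSE loss described above.
# model.compile(optimizer="adam", loss="mse")
# model.fit([x_base_train, x_adc_train], y_train,
#           sample_weight=class_balanced_sample_weights(y_train))
```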
We use the Adam stochastic optimization method to update the parameters of the network during learning [19]. The learning rate and other parameters of the optimizer were tuned and then kept constant during our evaluation. Additionally, we use a ratio of 1 to 3 of the samples as validation data during the learning phase.

3. RESULTS AND DISCUSSIONS

The evaluation of our system was performed using leave-one-out cross-validation. During the learning phase, validation accuracy was found to saturate before 100 epochs. We kept the network configuration fixed for our reported results, including the ablation study, as well as when comparing with other techniques. The proposed system evaluation is based on three metrics: accuracy, precision, and recall. Accuracy is the ratio between the number of correctly classified cases and the total number of cases, defined as $\text{Accuracy} = \frac{tp+tn}{tp+tn+fp+fn}$. Precision is defined as $\text{Precision} = \frac{tp}{tp+fp}$, and Recall is defined as $\text{Recall} = \frac{tp}{tp+fn}$. Here, tp (tn) is the number of correctly classified benign (malignant) cases, and fn (fp) is the number of incorrectly classified benign (malignant) cases.
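Below is a sketch of the leave-one-out evaluation loop with these three metrics. It assumes the hypothetical build_multi_input_cnn and class_balanced_sample_weights helpers from the earlier sketches, and the scikit-learn utilities are our choice rather than a detail reported in the paper.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate_loo(x_base, x_adc, y, epochs=100):
    """Leave-one-out cross-validation: retrain on all-but-one nodule,
    predict the held-out one, then score the pooled predictions."""
    preds = np.zeros_like(y, dtype=int)
    for train_idx, test_idx in LeaveOneOut().split(x_base):
        model = build_multi_input_cnn()            # hypothetical builder from the sketch above
        model.compile(optimizer="adam", loss="mse")
        model.fit([x_base[train_idx], x_adc[train_idx]], y[train_idx],
                  sample_weight=class_balanced_sample_weights(y[train_idx]),
                  validation_split=0.25, epochs=epochs, verbose=0)
        p = model.predict([x_base[test_idx], x_adc[test_idx]], verbose=0)
        preds[test_idx] = (p.ravel() >= 0.5).astype(int)
    return (accuracy_score(y, preds),
            precision_score(y, preds),
            recall_score(y, preds))
```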
3.1. Ablation Study

An ablation study was conducted to assess the accuracy of the proposed method. The study shows that the proposed fusion using the multi-input CNN outperformed the other compared frameworks.
Table 1. Ablation study results for the proposed system.

Method                                  Accuracy  Precision  Recall
Single-Input CNN (base-images + ADC)    0.82      0.76       0.72
Single-Input CNN (base-images only)     0.84      0.82       0.74
Single-Input CNN (ADC only)             0.82      0.82       0.70
Two-CNN voting (base-images + ADC)      0.86      0.82       0.78
Multi-Input CNN (Proposed Method)       0.88      0.82       0.82
In this study, a single-input CNN with the same structure is built and used for diagnosis with the base image only. Similarly, another experiment is conducted using the ADC image only.
In addition, we compared our proposed system to other fusion methods. The first fusion method performs probability voting between the base-image prediction and the ADC-image prediction output by the single-input networks; the resultant probability after voting is given by $P_v = \frac{1}{2}(P_{BI} + P_{ADC})$. The second fusion method uses a single-input CNN with the base image and the ADC image as channels of the same input. This structure limits the ability to use 1x1 conv blocks without distorting the images.
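For reference, the two baseline fusion schemes can be sketched as follows (our own minimal sketch; p_base and p_adc denote the sigmoid outputs of the two single-input CNNs, and the channel-stacked variant simply concatenates the two 48x48 images along the channel axis).

```python
import numpy as np

# Baseline 1: probability voting over the two single-input CNN outputs,
# P_v = (P_BI + P_ADC) / 2, thresholded at 0.5.
def vote(p_base, p_adc, threshold=0.5):
    p_v = 0.5 * (np.asarray(p_base) + np.asarray(p_adc))
    return (p_v >= threshold).astype(int)

# Baseline 2: early fusion by stacking the base image and the ADC map as
# two channels of a single 48x48x2 input to one single-input CNN.
def stack_channels(x_base, x_adc):
    return np.stack([x_base, x_adc], axis=-1)
```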
The comparison between our proposed method and the designs with reduced structures (i.e., single-input CNNs) is summarized in Table 1. It shows that the proposed fusion method achieves the highest performance compared to the other CNN designs.
Table 2. Comparative performance of the proposed multi-input CNN system and machine-learning techniques based on hand-crafted features.

Method                                      Accuracy  Precision  Recall
Hand-crafted features with DT classifier    0.70      0.60       0.70
Hand-crafted features with NB classifier    0.77      0.73       0.70
Hand-crafted features with RF classifier    0.77      0.73       0.77
Hand-crafted features with SVM classifier   0.57      0.40       0.60
Multi-Input CNN (Proposed Method)           0.88      0.82       0.82
5. COMPLIANCE WITH ETHICAL STANDARDS

This research study was conducted using data from human subjects who provided their consent to participate in the study.

6. ACKNOWLEDGEMENTS

No funding was received for conducting this study. The authors declare no competing interests.

7. REFERENCES

[1] American Cancer Society, "Cancer facts and figures 2020," Atlanta, GA: American Cancer Society, 2020.
[2] B. R. Haugen, E. K. Alexander, et al., "2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association guidelines task force on thyroid nodules and differentiated thyroid cancer," Thyroid, vol. 26, no. 1, pp. 1–133, 2016.
[3] I. L. Rojo, A. G. Valdazo, and J. G. Ramirez, "Current use of molecular profiling for indeterminate thyroid nodules," Cirugía Española (English Edition), vol. 96, no. 7, pp. 395–400, 2018.
[4] L. C. Pescatori, P. Torcia, et al., "Which needle in the treatment of thyroid nodules?," Gland Surgery, vol. 7, no. 2, p. 111, 2018.
[5] L. F. Alexander, N. J. Patel, et al., "Thyroid ultrasound: Diffuse and nodular disease," Radiologic Clinics, 2020.
[6] R. Mistry, C. Hillyar, et al., "Ultrasound classification of thyroid nodules: A systematic review," Cureus, vol. 12, no. 3, 2020.
[7] A. A. Ardakani, A. Gharbali, and A. Mohammadi, "Classification of benign and malignant thyroid nodules using wavelet texture analysis of sonograms," Journal of Ultrasound in Medicine, vol. 34, no. 11, pp. 1983–1989, 2015.
[8] F. Verburg and C. Reiners, "Sonographic diagnosis of thyroid cancer with support of AI," Nature Reviews Endocrinology, vol. 15, no. 6, pp. 319–321, 2019.
[9] F.-S. Ouyang, B.-L. Guo, et al., "Comparison between linear and nonlinear machine-learning algorithms for the classification of thyroid nodules," European Journal of Radiology, vol. 113, pp. 251–257, 2019.
[10] Y. Hao, C. Pan, et al., "Differentiation between malignant and benign thyroid nodules and stratification of papillary thyroid cancer with aggressive histological features: Whole-lesion diffusion-weighted imaging histogram analysis," Journal of Magnetic Resonance Imaging, vol. 44, no. 6, pp. 1546–1555, 2016.
[11] A. M. Brown, S. Nagala, et al., "Multi-institutional validation of a novel textural analysis tool for preoperative stratification of suspected thyroid tumors on diffusion-weighted MRI," Magnetic Resonance in Medicine, vol. 75, no. 4, pp. 1708–1716, 2016.
[12] S. Schob, H. J. Meyer, et al., "Histogram analysis of diffusion weighted imaging at 3T is useful for prediction of lymphatic metastatic spread, proliferative activity, and cellularity in thyroid cancer," International Journal of Molecular Sciences, vol. 18, no. 4, p. 821, 2017.
[13] R. Zhang, Q. Liu, et al., "Thyroid classification via new multi-channel feature association and learning from multi-modality MRI images," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 277–280.
[14] A. Surov and N. Garnov, "Proving of a mathematical model of cell calculation based on apparent diffusion coefficient," Translational Oncology, vol. 10, no. 5, pp. 828–830, 2017.
[15] H. Wang, B. Song, et al., "Machine learning-based multiparametric MRI radiomics for predicting the aggressiveness of papillary thyroid carcinoma," European Journal of Radiology, vol. 122, p. 108755, 2020.
[16] E. O. Stejskal and J. E. Tanner, "Spin diffusion measurements: spin echoes in the presence of a time-dependent field gradient," The Journal of Chemical Physics, vol. 42, no. 1, pp. 288–292, 1965.
[17] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," arXiv preprint arXiv:1803.08375, 2018.
[18] D. J. Finney, Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve, Cambridge University Press, Cambridge, 1952.
[19] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[20] C. Müller, Spherical Harmonics, vol. 17, Springer, 2006.
[21] M. A. Friedl and C. E. Brodley, "Decision tree classification of land cover from remotely sensed data," Remote Sensing of Environment, vol. 61, no. 3, pp. 399–409, 1997.
[22] A. Liaw, M. Wiener, et al., "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[23] A. McCallum, K. Nigam, et al., "A comparison of event models for naive Bayes text classification," in AAAI-98 Workshop on Learning for Text Categorization. Citeseer, 1998, vol. 752, pp. 41–48.
[24] J. A. Suykens and J. Vandewalle, "Least squares support vector machine classifiers," Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.