Accepted Manuscript

Boosted neural network ensemble classification for lung cancer disease diagnosis

Jafar A. ALzubi, Balasubramaniyan Bharathikannan, Sudeep Tanwar, Ramachandran Manikandan, Ashish Khanna, Chandrasekar Thaventhiran

PII: S1568-4946(19)30223-6
DOI: https://doi.org/10.1016/j.asoc.2019.04.031
Reference: ASOC 5461

To appear in: Applied Soft Computing Journal

Received date : 3 December 2018

Revised date : 19 April 2019
Accepted date : 27 April 2019

Please cite this article as: J.A. ALzubi, B. Bharathikannan, S. Tanwar et al., Boosted neural
network ensemble classification for lung cancer disease diagnosis, Applied Soft Computing Journal
(2019), https://doi.org/10.1016/j.asoc.2019.04.031

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.

Boosted Neural Network Ensemble Classification for Lung Cancer Disease Diagnosis

Jafar A. ALzubi1, Balasubramaniyan Bharathikannan2, Sudeep Tanwar3, Ramachandran Manikandan4, Ashish Khanna5, Chandrasekar Thaventhiran6

1 School of Engineering, Al-Balqa Applied University, Jordan
2 School of Computer Science and Engineering, Galgotias University, Greater Noida, India
3 Department of Computer Engineering, Institute of Technology, Nirma University, Ahmedabad, India
4, 6 School of Computing, SASTRA Deemed University, India
5 Department of Computer Engineering, Maharaja Agrasen Institute of Technology, GGSIP University, India


Abstract
Accurate diagnosis of Lung Cancer Disease (LCD) is essential for providing timely treatment to lung cancer patients. Artificial Neural Networks (ANNs) are a Machine Learning (ML) technique that can be applied to both large-scale and small datasets. In this paper, an ensemble of Weight Optimized Neural Networks with Maximum Likelihood Boosting (WONN-MLB) is analyzed for LCD diagnosis in big data. The proposed method is split into two stages: feature selection and ensemble classification. In the first stage, the essential attributes are selected with an integrated Newton-Raphson Maximum Likelihood and Minimum Redundancy (MLMR) preprocessing model to minimize the classification time. In the second stage, a Boosted Weighted Optimized Neural Network Ensemble Classification algorithm classifies each patient using the selected attributes, which improves cancer diagnosis accuracy and minimizes the false positive rate. Experimental results demonstrate that the proposed approach achieves a lower false positive rate, higher prediction accuracy, and lower delay than conventional techniques.

Keywords: Machine Learning, Lung Cancer Disease, Weighted Optimized, Neural Network,
Maximum Likelihood Boosting

1. Introduction
In the present era, lung cancer is one of the foremost causes of death in developing countries, and its incidence is increasing rapidly with the dramatic upsurge in cigarette smoking. According to a survey of big data research in Non-Small Cell Lung Cancer (NSCLC) [1], deep learning can improve diagnostic accuracy by supporting prediction and decision-making in medical systems. Artificial Intelligence techniques were also used for prediction and decision-making on NSCLC big data, with image and diagnostic parameters integrated via machine learning algorithms. Combining image and diagnostic parameters therefore gave doctors an efficient way to diagnose patients in a large-data (i.e., Healthcare 4.0) environment. However, the time required for diagnosis on big data was not addressed.
Zięba et al. [2] proposed the Boosted Support Vector Machine (SVM) for imbalanced data (BSI) to address the issues raised by imbalanced data. They combined the advantages of ensemble classifiers for imbalanced data with cost-sensitive support vector machines. Three steps were carried out on the input dataset. In the first step, the information gain criterion was used to select the effective and required features. In the second step, the problem of predicting postoperative life expectancy was analyzed according to the G-mean criterion, and decision rules were extracted. In the third step, the accuracy and coverage measures were evaluated for the extracted rules, which improved predictive accuracy. However, although prediction accuracy improved, less attention was paid to the error rate.
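The G-mean criterion mentioned above is the geometric mean of the true positive rate (sensitivity) and the true negative rate (specificity), so a classifier scores well only if it handles both the minority and the majority class. The following is a minimal Python sketch of that standard definition; the function name `g_mean`, the ±1 label convention, and the toy labels are illustrative assumptions, not part of the published BSI code.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy imbalanced labels: 2 positives, 8 negatives.
y_true = [1, -1, -1, -1, -1, -1, -1, -1, -1, 1]
# Always predicting the majority class is 80% accurate but has G-mean 0.
print(g_mean(y_true, [-1] * 10))  # 0.0
```

This is why G-mean suits imbalanced data such as postoperative survival: plain accuracy rewards the trivial majority-class predictor, while G-mean does not.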
Considering the aforementioned issues, this paper proposes a new combination approach for classifier ensembles that uses the Newton-Raphson MLMR preprocessing model, in which the essential features are extracted to reduce the time for lung cancer disease diagnosis. Newton-Raphson's Maximum Likelihood model is applied to the Maximum Relevance Minimum Redundancy (MRMR) attributes, and the first and second derivatives of the log-likelihood over these attributes are used to select the most relevant ones. To this end, we combine the MRMR model and Newton-Raphson's Maximum Likelihood within the combination process of the ensemble. A Boosted Weighted Optimized Neural Network Ensemble Classification algorithm is then proposed to minimize the error (i.e., the false positive rate) and improve diagnosis accuracy. The optimized weights attached to the decision of each ensemble classifier are defined dynamically, according to the classifier outputs and the relation among the outputs of all ensemble classifiers. To evaluate the feasibility of the proposed approach, we conduct an empirical analysis of ensemble performance on the Thoracic Surgery Dataset and compare it with ensemble classifiers based on traditional methods.
1.1 Motivation

Healthcare is one of the major sources of big data, and accurate analysis of healthcare data is in high demand for diagnosing diseases at an early stage. Many recent research works identify diseases in big data with high quality, but novel classification techniques are still required to increase diagnosis accuracy while reducing diagnosis time. ML algorithms have also been designed to increase prediction accuracy on big data; however, reducing the error rate has not been exploited to its full potential. This motivates the design of optimized machine learning algorithms that improve diagnosis accuracy with lower time and error.

1.2 Research Contributions
The contributions of this paper are as follows.
• To increase lung cancer diagnosis accuracy on big data compared with state-of-the-art works, the WONN-MLB method combines a Weight Optimized Neural Network with Maximum Likelihood Boosting for Lung Cancer Disease.
• To minimize the classification time for early lung cancer diagnosis, an integrated Newton-Raphson MLMR preprocessing model selects the relevant attributes, which also helps obtain higher diagnosis accuracy.
• To reduce the error (i.e., the false positive rate) and improve diagnosis accuracy with higher classification efficiency and lower classification time, a Boosted Weighted Optimized Neural Network Ensemble Classification algorithm is designed within the WONN-MLB method.

1.3 Organization

The rest of the paper is organized as follows. Section 2 reviews related work on lung cancer disease diagnosis. Section 3 presents the ensemble classification method together with the preprocessing model for lung cancer disease diagnosis, explores the maximum relevance method and the maximum likelihood function in detail, and studies the effect of the extracted relevant attributes on the ensemble classification performance of the WONN-MLB method. Section 4 compares the performance of the proposed approach with state-of-the-art approaches to demonstrate its effectiveness for lung cancer disease diagnosis, and Section 5 concludes the paper.

2. Related works

With the invention of the microarray technique, scientists and researchers gained an immense opportunity to evaluate the expression levels of thousands of genes concurrently in a single experiment. Ghorai et al. [4] proposed the Nonparallel Plane Proximal Classifier (NPPC) for cancer classification in a Computer Aided Diagnosis (CAD) framework to ensure high classification accuracy and minimize computation time. However, valvular heart disorders remained one of the most difficult classification problems. Das et al. [5] used three popular ensemble learning methods, namely bagging, boosting, and random subspaces, for the early detection of valvular heart disorders. Although these methods minimized classification time, the achievable accuracy remained unaddressed. Costa et al. [6] applied three Generalized Mixture (GM) functions with dynamic weights to improve the accuracy of a classification system. Although these functions handle single-label classification, the multi-label classification problem was not addressed.
Huda et al. [7] proposed a case study for brain tumor diagnosis using global-optimization-based hybrid wrapper-filter feature selection with ensemble classification. It increases classification accuracy, but the classification time was not minimized.
Approximately 40% of the world's population is affected by cancer. Hussein et al. [8] used a proportion SVM to categorize lung nodules efficiently, which improved diagnostic accuracy; however, the proportion SVM failed to minimize the error rate in disease categorization. Adetiba et al. [9] proposed another method for early lung cancer detection using a Radial Basis Function Neural Network with Affine Transforms, which achieved high classification accuracy and a low mean square error, but the performance of feature extraction was not improved. Jain et al. [10] reviewed feature selection and parallel classification systems for enhancing classification accuracy in disease prediction, but classification time was not minimized.
Dande et al. [11] carried out a critical assessment of ANNs, which showed an increase in the efficacy and specificity of diagnostic techniques, but did not minimize computational complexity. Pathological evaluation of tumor tissue is considered one of the most pivotal steps for early diagnosis in cancer patients, and automated image analysis methods have the potential to improve diagnostic accuracy and minimize human error. Khosravi et al. [12] proposed computational methods based on convolutional neural networks (CNNs), in which a stand-alone pipeline was constructed to classify histopathology images across different cancer types. However, it failed to minimize the computation cost while classifying the various types of cancer.
Sharma et al. [13] proposed a two-stage hybrid ensemble classification technique to increase the prediction accuracy of chronic kidney disease with ML. It improves disease diagnosis, but the multistage classification was not performed in minimum time. Early diagnosis of lung cancer and differentiation between tumor and non-tumor types are required to improve patient survival rates. Hosseinzadeh et al. [14] designed a diagnostic system that uses structural and physicochemical attributes of proteins via feature extraction, feature selection, and prediction models. ML models were then applied to both the original and newly created databases to predict lung tumor types, which improved accuracy. However, although the model reduces processing time, the false positive rate was not minimized. Podolsky et al. [15] evaluated ML algorithms for lung cancer diagnosis; their approach accurately predicts cancer susceptibility and minimizes the false positive rate, but classification time, which matters for early lung cancer detection, was not addressed.
Rabbani et al. [16] presented a narrative review of radiomic features for early-stage lung cancer diagnosis, in which ML algorithms were combined with artificial intelligence approaches. The objective of radiomics is to extract and analyze quantitative features from medical images, and the review focused on its promise for staging, diagnosis, and predicting the outcomes of cancer treatments. However, the feature extraction performed by the ML algorithms did not provide accurate results. To address these issues, Zhou et al. [17] proposed multi-modality and multi-classifier radiomics predictive models with a new reliable classifier fusion strategy. Modality-specific classifiers were trained first, and an analytic evidential reasoning (ER) rule then combined the output score of each modality into an optimal predictive model for disease diagnosis. This model failed to minimize disease prediction time.
Dubey et al. [18] conducted a systematic review of mortality and survival rates of lung cancer with evolutionary algorithms to identify better methods for early lung cancer diagnosis and to achieve higher accuracy with deep learning techniques; however, the error rate was not minimized. Liu et al. [19] proposed Multi-View Convolutional Neural Networks (MV-CNN) for efficient lung nodule classification to improve accuracy and reduce classification time, but accurate detection was not achieved with the features. Baz et al. [20] explored crucial challenges and methodologies of CAD systems for lung cancer. Their work improves the detection and diagnosis of lung nodules, but accurate feature selection was not performed to minimize detection time.
Wang et al. [21] developed a lung nodule classification method that fuses deep features with hand-crafted features, but classification performance was not accurate. Chen et al. [22] introduced CAD to enhance nodule candidate classification; however, classification time was not minimized. Kumar et al. [23] extracted deep features from CT images to classify lung nodules with higher accuracy, but the error rate remained unaddressed. Baranidharan et al. [24] developed an image-based feature selection method for classifying lung cancer images with higher accuracy, in which a novel fusion-based selection chose the features for classification. During feature selection, redundant features could not be removed, which introduced errors into the classification process. To overcome this problem, the proposed WONN-MLB method uses a Newton-Raphson Maximum Likelihood model, where MLMR is used to choose the most relevant attributes. A boosting classifier is then applied to the selected attributes for LCD diagnosis, which reduces the error rate in the classification process.
Data analysis of population statistics and data mining techniques were used in [25] to determine cancer morbidity and mortality from a regional cancer registry; however, the false positive rate was not minimized. Multiple aspects of large-scale knowledge mining for medical and disease examination were covered in [26]. A new image-based feature selection method was proposed in [27] to categorize lung computed tomography images with higher accuracy, but the feature selection rate was not improved.
Table 1 compares the proposed approach with state-of-the-art approaches. The main aim of this paper is to design an LCD diagnosis method based on an ensemble classification algorithm that reduces classification time and false positive rate compared with these approaches.

Table 1 Comparison of the proposed approach with the state-of-the-art approaches

| Author | Year | Approach | Objective | Pros | Cons |
|---|---|---|---|---|---|
| Das et al. [5] | 2010 | Ensemble learning methods | Classify valvular heart disease | Minimizes the classification time | Classification accuracy rate remained unsolved |
| Ghorai et al. [4] | 2011 | Nonparallel Plane Proximal Classifier (NPPC) | Perform cancer classification with higher accuracy in a Computer Aided Diagnosis framework | Better classification accuracy with less computation time | Classification of valvular heart disorders was difficult |
| Baz et al. [20] | 2012 | Computer-aided diagnosis (CAD) system | Lung cancer diagnosis | Better detection and diagnosis of lung nodules | Accurate feature selection was not performed |
| Hosseinzadeh et al. [14] | 2013 | Machine learning models | Predict and detect the type of lung tumors | More accurate results in lung tumor detection | The false positive rate was not minimized |
| Adetiba et al. [9] | 2015 | Radial Basis Function Neural Network with Affine Transforms | Classify lung cancer | Improved classification accuracy and low mean square error | Performance of feature extraction was not improved |
| Kumar et al. [18] | 2016 | Evolutionary algorithms | Lung cancer detection | Detects lung cancer accurately with minimum time | The error rate was not minimized |
| Huda et al. [7] | 2016 | Global optimization based hybrid wrapper-filter feature selection with ensemble classification | Tumor classification with imbalanced healthcare data | Improves classification of imbalanced healthcare data | Classification time was not minimized |
| Podolsky et al. [15] | 2016 | Machine learning algorithm | Lung cancer diagnosis | Increases the accuracy of predicting cancer susceptibility and minimizes false positives | Classification time remained unsolved |
| Zhou et al. [17] | 2017 | Multi-modality and multi-classifier radiomics predictive models | Extract quantitative features and predict disease | Increases disease prediction accuracy with the features | Failed to minimize the disease prediction time |
| Kang et al. [19] | 2017 | Multi-view convolutional neural networks (MV-CNN) | Lung nodule classification | Increases classification accuracy and minimizes time | Failed to attain accurate disease prediction with features |
| Dande et al. [11] | 2017 | Artificial Neural Network | Diagnosis and evaluation of medical conditions | Increases the efficacy and specificity of disease diagnosis | Failed to minimize the computational complexity |
| Costa et al. [6] | 2018 | Generalized mixture (GM) functions | Increase the classification accuracy of a classification system | Handles single-label classification problems | Multi-label classification problem remained unaddressed |
| Hussein et al. [8] | 2018 | Proportion Support Vector Machine (SVM) | Categorize lung nodules | Improves diagnostic accuracy | Failed to minimize the error rate |
| Jain et al. [10] | 2018 | Feature selection and parallel classification systems | Classification systems for effective disease prediction | Enhances the accuracy of classification systems | The classification time was not minimized |
| Khosravi et al. [12] | 2018 | Deep convolutional neural networks (CNN) | Classify various cancer tissues | Increases diagnostic precision and minimizes error | Computation cost was not minimized |
| Sharma et al. [13] | 2018 | Two-stage hybrid ensemble technique | Classify chronic kidney disease | Accurate diagnosis of the disease with a feature set | Multi-stage diagnosis was not performed in minimum time |
| Rabbani et al. [16] | 2018 | Machine learning (ML) methods combined with artificial intelligence approaches | Extract and analyze quantitative features from medical images | Improves diagnosis, treatment and outcomes | The ML algorithms used for feature extraction did not attain accurate results |
| Baranidharan et al. [24] | 2016 | Image-based feature selection method | Classify lung cancer images | Increases the true positive rate | Error rate was not effectively minimized |
| Proposed | - | Weight Optimized Neural Network with Maximum Likelihood Boosting (WONN-MLB) technique | Lung cancer disease diagnosis with big data | Increases diagnostic accuracy and minimizes the false positive rate and classification time | - |

3. Materials and Methods

In this paper, we propose the WONN-MLB method to improve LCD diagnosis performance. WONN-MLB combines the Newton-Raphson MLMR preprocessing model with the Boosted Weighted Optimized Neural Network Ensemble Classification algorithm. To validate the proposed method, the Thoracic Surgery Data Dataset from the Wroclaw Thoracic Surgery Centre is used [27]. The data describe patients who underwent major lung resections for primary lung cancer in the years 2007–2011. The centre is associated with the Department of Thoracic Surgery of the Medical University of Wroclaw and the Lower-Silesian Centre for Pulmonary Diseases, Poland. To conduct the experiments, different numbers of patient records, up to 10,000, are taken from the Thoracic Surgery Data Dataset. The dataset includes forced vital capacity, pain before surgery, haemoptysis before surgery, dyspnoea before surgery, cough before surgery, weakness before surgery, peripheral arterial diseases, smoking, asthma, age at surgery, and the 1-year survival period. Based on this information, the proposed approach performs LCD classification.

3.1 Proposed Approach

This section describes the proposed approach and the architecture of the WONN-MLB method for LCD diagnosis. Fig. 1 shows the phases needed to implement and use the approach: data acquisition (Thoracic Surgery Data Dataset, Zięba et al. [2]), feature selection or preprocessing (reducing the dimensionality of big data features), and ensemble classification (using WONN-MLB). Each phase is discussed in the following subsections.

[Fig. 1 diagram: Data acquisition (Thoracic Surgery Data Dataset) → Preprocessing (Maximum Likelihood Minimum Redundancy) → Optimal attributes → Ensemble Classification → Lung cancer disease diagnosis]

Fig. 1: Architecture of proposed approach for Lung Cancer Disease Diagnosis

3.1.1 Data acquisition

The data for this lung cancer classification problem come from the Thoracic Surgery Domain (TSD) archive of the Department of Thoracic Surgery of the Medical University of Wroclaw and the Lower-Silesian Centre for Pulmonary Diseases, Poland, available from the UCI Machine Learning Repository. The data were collected retrospectively at the Wroclaw Thoracic Surgery Centre for 1200 patients who underwent major lung resections for primary lung cancer in the years 2007–2011. We use these predictors, acquired from the online UCI repository as described by Zięba et al. [2], for lung cancer prediction.

3.1.2 Newton-Raphson Maximum Likelihood and Minimum Redundancy preprocessing
To overcome time-complexity and accuracy problems in big data classification, a preprocessing step is first needed to extract the relevant attributes. Conventional techniques extract relevant attributes without removing redundant ones, which produces misclassification in lung cancer disease diagnosis. Therefore, a Newton-Raphson Maximum Likelihood and Minimum Redundancy preprocessing technique is developed to extract relevant attributes while removing redundancy.
A large-scale ML classifier based on boosted classifiers [2] was used to classify biomedical lung cancer data, with an iterative process that updated the boosting coefficient to minimize the weighted error function. Despite minimizing the weighted error, that work paid little attention to the time consumed for lung cancer diagnosis. In this work, an integrated Newton-Raphson MLMR preprocessing model is applied to the Thoracic Surgery Data Dataset (Zięba et al. [2]) with the objective not only of reducing the weighted error but also of minimizing the time consumed. The proposed preprocessing model is based on Newton-Raphson's method with maximum likelihood, to obtain more robust results than other well-known algorithms such as SVMs (Zięba et al. [2]).
The MLMR preprocessing model finds the attributes that are most relevant to the class and least redundant with each other. First, the maximum relevance between the attributes and the class is identified using mutual information. Such results often contain the most relevant but also redundant attributes, so the MLMR preprocessing model additionally measures the minimum redundancy between attributes. Both conditions are equally important and are combined into a single criterion function; in the WONN-MLB method, an additive combination integrates maximum relevance and minimum redundancy. Finally, the resulting attributes are maximized using Newton-Raphson's Maximum Likelihood function, which minimizes the time required to diagnose lung cancer. Fig. 2 shows the flow diagram of the proposed MLMR preprocessing model.

[Fig. 2 diagram: Thoracic Surgery Data Dataset (instances) → Mutual Information (marginal and joint probabilities) → Maximum Relevance and Minimum Redundant criteria → Joint function → Newton-Raphson's Maximum Likelihood function → Maximum Likelihood Minimum Redundant attributes]
Fig. 2: Flow of MLMR preprocessing model

As shown in Fig. 2, consider a standard feature selection problem in which each instance pairs the attribute values of a sample with the value of its output class. Assume a training dataset of examples described by a set of attributes. The main objective of the MLMR preprocessing model is to identify the maximum dependency between the set of attributes and the class using mutual information. Mutual information is obtained from the marginal probabilities of a pair of variables and their joint probability, as given in Eq. 1.

(1)
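Eq. 1 itself was lost in extraction. Assuming it is the standard discrete mutual information, I(X;Y) = Σ p(x,y) · log(p(x,y) / (p(x) · p(y))), the following Python sketch estimates it from paired samples using empirical joint and marginal frequencies. The function name, the use of the natural logarithm (nats), and the toy attribute values are illustrative assumptions.

```python
from collections import Counter
import math

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts for the joint probability p(x, y)
    px = Counter(xs)              # counts for the marginal p(x)
    py = Counter(ys)              # counts for the marginal p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: an attribute that perfectly mirrors a balanced class.
smoking = [1, 1, 0, 0, 1, 0]
died = [1, 1, 0, 0, 1, 0]
print(round(mutual_information(smoking, died), 4))  # 0.6931 (= ln 2)
```

Independent variables give a mutual information of zero, while a perfect copy of a balanced binary class gives ln 2, the entropy of that class.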

However, with big data considered for lung cancer analysis, both maximum relevance and minimum redundancy are measured. Maximum relevance searches for attributes with a higher relevance factor and is formulated as follows:

(2)

With reference to Eq. 2, the maximum relevance between the attributes and the class is obtained from the mutual information factor. However, selecting attributes by the maximum relevance criterion alone results in a large amount of redundancy. To reduce it, the minimum redundancy criterion is used and is formulated as follows:

(3)

From Eq. 3, the minimum redundant attributes are obtained from the mutual information between pairs of attributes. Integrating and optimizing the maximum relevance of Eq. 2 and the minimum redundancy of Eq. 3 yields the maximum relevance minimum redundancy (MRMR) criterion, which is calculated as follows:

(4)
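Eqs. 2-4 are not recoverable from the extracted text. The text states that relevance and redundancy are combined additively, which matches the widely used mRMR difference form: relevance to the class minus mean redundancy with the already selected attributes. Under that assumption, the following Python sketch performs greedy forward mRMR selection on discrete attributes; the helper names, the toy data, and the greedy selection strategy are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
import math

def mutual_information(xs, ys):
    """Empirical mutual information (nats) between two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) rewritten with raw counts as count * n / (cx * cy)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def mrmr_select(columns, labels, k):
    """Greedily pick k attribute indices maximizing relevance - mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]  # first pick: maximum relevance only
            redundancy = sum(mutual_information(columns[j], columns[s])
                             for s in selected) / len(selected)
            return relevance[j] - redundancy  # additive (difference) combination
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

a = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical informative attribute
b = [0, 1, 0, 1, 0, 1, 0, 1]   # weaker attribute, independent of a
y = [0, 0, 0, 1, 1, 1, 1, 1]   # hypothetical class label
columns = [a, list(a), b]      # index 1 is a redundant copy of a
print(mrmr_select(columns, y, 2))  # [0, 2]: the redundant copy is skipped
```

Ranking by relevance alone would pick indices 0 and 1, the same information twice; the redundancy penalty is what pushes the weaker but complementary attribute into the selection.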

After the maximum relevance minimum redundancy attributes are obtained for lung cancer disease diagnosis, a Newton-Raphson Maximum Likelihood function is applied to the resulting attributes to minimize the time consumed. The log-likelihood function for Eq. 4 is formulated as follows:

(5)
The first and second derivatives of the log-likelihood function are formulated as follows:

(6)
(7)
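Eqs. 5-7 were lost in extraction, so the exact likelihood used by the authors is not recoverable. As an illustration of how first and second derivatives drive a Newton-Raphson maximum likelihood search, the following Python sketch maximizes a one-parameter Bernoulli log-likelihood written in terms of the log-odds θ, l(θ) = k·θ - n·log(1 + e^θ), where k of n samples are positive. This particular likelihood and the sample counts are assumptions chosen for illustration, not the paper's Eq. 5; its maximizer has the closed form θ = log(k / (n - k)), which makes the result easy to check.

```python
import math

def newton_raphson_max(grad, hess, theta0, tol=1e-10, max_iter=50):
    """Maximize a 1-D log-likelihood from its first (grad) and second (hess) derivatives."""
    theta = theta0
    for _ in range(max_iter):
        step = grad(theta) / hess(theta)  # Newton step; hess < 0 near a maximum
        theta -= step
        if abs(step) < tol:
            break
    return theta

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical data: k positive outcomes among n samples.
k, n = 7, 10

def dl(theta):
    """First derivative of l(theta) = k*theta - n*log(1 + e^theta)."""
    return k - n * sigmoid(theta)

def d2l(theta):
    """Second derivative of the same log-likelihood (always negative)."""
    return -n * sigmoid(theta) * (1.0 - sigmoid(theta))

theta_hat = newton_raphson_max(dl, d2l, theta0=0.0)
print(round(theta_hat, 6), round(math.log(k / (n - k)), 6))  # 0.847298 0.847298
```

Because this log-likelihood is concave, the second derivative is always negative and the Newton iteration converges in a handful of steps from θ = 0.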

The log-likelihood function is used to maximize over the maximum relevance and minimum redundant attributes. The most relevant attributes are then taken for the classification process, which effectively reduces the time required for lung cancer disease diagnosis. The pseudo-code of the proposed Maximum Likelihood Minimum Redundant preprocessing is given in Algorithm 1.

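Input: Training dataset (big data), set of attributes
Output: Maximum Likelihood Minimum Redundant attributes selected
1: Begin
2: For each attribute in the training dataset
3: Find the mutual information (Eq. 1)
4: Determine attribute relevance (Eq. 2)
5: Minimize attribute redundancy (Eq. 3)
6: Combine relevance and redundancy (Eq. 4)
7: Formulate the log-likelihood function (Eq. 5)
8: Obtain the first and second derivatives (Eqs. 6 and 7)
9: End for
10: End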

Algorithm 1: Maximum Likelihood Minimum Redundant preprocessing

Algorithm 1 describes the Maximum Likelihood Minimum Redundant preprocessing. Not every attribute of a training dataset (i.e., big data) is essential. For lung cancer diagnosis with big data as the input dataset, the maximum relevance and minimum redundant attributes are selected, and Newton-Raphson's Likelihood Estimation is then evaluated with respect to the first and second derivatives to minimize the time consumed for lung cancer diagnosis.
3.1.3 Weighted Optimized Neural Network with Maximum Likelihood Boosting

Once the Maximum Likelihood Minimum Redundant attributes are obtained, an ensemble classification model is used to improve lung cancer diagnosis accuracy for big data. In this work, the WONN-MLB ensemble is applied to the selected attributes to achieve accurate lung cancer diagnosis with minimum time and error.

The training data consist of the selected attributes: each example is a vector corresponding to an input sample with its input attributes, and the target variable holds one of two class labels. To start, the proposed model trains a weak classifier using a distribution over the training examples.
An artificial neuron has one synapse per input attribute, and each input attribute has a corresponding weight. The signal at each input is multiplied by its weight, and the weighted inputs are summed into a linear combination. A bias is added to this linear combination to obtain the weighted sum as follows:

(8)

A nonlinear activation function is then applied to the weighted sum, as given in Eq. 9, which produces the neuron's output:

(9)
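Assuming Eqs. 8 and 9 take the usual form, a weighted sum s = Σ w_i · x_i + b followed by an activation f(s), a single neuron can be sketched in Python as below. Using the logistic sigmoid as f, and the example inputs and weights, are illustrative assumptions, since the paper's activation function is not recoverable here.

```python
import math

def neuron_output(inputs, weights, bias):
    """Single artificial neuron: weighted sum plus bias, then sigmoid activation."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input attribute is required")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias  # Eq. 8 form
    return 1.0 / (1.0 + math.exp(-weighted_sum))                       # Eq. 9 form

# Hypothetical attribute vector and weights: 0.5*2.0 + (-1.0)*1.0 + 0.0 = 0.0
print(neuron_output([2.0, 1.0], [0.5, -1.0], bias=0.0))  # 0.5
```

A weighted sum of zero sits exactly on the sigmoid's decision boundary; positive sums push the output toward 1 and negative sums toward 0.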

Then, a weak classifier with low weighted error is selected, as formulated below:

(10)
(11)

From Eqs. 10 and 11, the low weighted error is obtained from the probability distribution over the training examples for the linear combination of weighted inputs (i.e., attributes). Finally, a new component based on the error function is calculated as follows:

(12)
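The text does not preserve Eqs. 10-12. Assuming they follow the standard AdaBoost formulation, the weighted error is ε = Σ D(i) over misclassified samples, the new component (classifier weight) is α = ½ · ln((1 - ε) / ε), and the distribution is re-weighted toward misclassified samples. The following Python sketch shows one such boosting round for labels in {-1, +1}; treating the paper's "new component" as the AdaBoost α, and the toy predictions, are assumptions.

```python
import math

def boosting_round(distribution, predictions, labels):
    """One AdaBoost-style round: weighted error, component alpha, updated distribution."""
    eps = sum(d for d, p, y in zip(distribution, predictions, labels) if p != y)
    eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0) and division by 0
    alpha = 0.5 * math.log((1 - eps) / eps)
    # p*y = -1 for misclassified samples, so their weight grows by e^alpha
    updated = [d * math.exp(-alpha * p * y)
               for d, p, y in zip(distribution, predictions, labels)]
    z = sum(updated)  # normalizer so the weights remain a distribution
    return eps, alpha, [u / z for u in updated]

labels = [1, 1, -1, -1]
preds = [1, 1, -1, 1]  # hypothetical weak classifier, wrong on the last sample
eps, alpha, dist = boosting_round([0.25] * 4, preds, labels)
print(eps, round(alpha, 4))         # 0.25 0.5493
print([round(d, 4) for d in dist])  # [0.1667, 0.1667, 0.1667, 0.5]
```

After the update, the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.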

After all boosting iterations are complete, the final ensemble classifier, whose weighted error is better than chance, is evaluated by combining all weak classifiers with optimal weights (Mana et al. [3]), as follows:

(13)
From Eq. 13, the final ensemble classifier is a weighted majority vote of the weak classifiers, where each classifier is assigned a weight. The pseudo-code of ensemble classification is given in Algorithm 2.
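Input: Maximum Likelihood Minimum Redundant attributes, training samples $(x_i, y_i)$, iterations $t = 1, \ldots, T$, optimal weight
Output: Improved lung cancer diagnosis accuracy
1: Procedure
2: Initialize the weights $w_j$ of the attributes
3: For each attribute and each iteration $t = 1, \ldots, T$
4: Measure the weighted sum $Y$ (Eq. 8)
5: If $Y$ is less than or equal to the optimal weight then
6: Compute the weak classifier $h_t$ with the lowest weighted error $\epsilon_t$ (Eqs. 10 and 11)
7: Obtain the new component $\alpha_t$ (Eq. 12)
8: Obtain the final ensemble classifier $H(x)$ (Eq. 13)
9: Else
10: Re-initialize the weights and go to step 4
11: End if
12: End for
13: End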

Algorithm 2: Boosted Weighted Optimized Neural Network Ensemble Classification algorithm

The Boosted Weighted Optimized Neural Network Ensemble Classification algorithm, given in Algorithm 2, is introduced to classify LCD with minimum error. In the first step, weights are initialized for each Maximum Likelihood Minimum Redundant attribute, and the weighted sum is obtained. A conditional check then determines whether the weighted sum is less than or equal to the optimal weight (Man et al. [3]). If the check fails, the weights are re-initialized and the weighted sum is recomputed. Otherwise, the process continues with boosting, which is carried out in three steps. First, the weak classifier with the lowest weighted error is selected. Second, a new component is obtained from the error function. Third, the final ensemble classifier combines the weak classifiers, each weighted by its component. Hence, lung cancer diagnosis accuracy is improved with a minimum error rate.
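The boosting part of Algorithm 2 can be sketched with discrete AdaBoost. This is a simplified illustration under stated assumptions, not the authors' Java/Weka implementation: one-level decision stumps stand in for the weighted neural units as weak classifiers, the optimal-weight check of step 5 is omitted, and the distribution update between rounds is the standard AdaBoost re-weighting, which the text does not spell out. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (attribute index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Weak classifier: +polarity above the threshold, -polarity otherwise."""
    j, threshold, polarity = stump
    return polarity if x[j] > threshold else -polarity


def best_stump(X, y, D) -> Tuple[Stump, float]:
    """Eqs. 10-11 (assumed form): stump with the lowest weighted error under D."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, threshold, polarity)
                err = sum(d for row, label, d in zip(X, y, D)
                          if stump_predict(stump, row) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def train(X, y, T: int = 10) -> List[Tuple[float, Stump]]:
    m = len(X)
    D = [1.0 / m] * m                      # uniform initial distribution D_1
    ensemble = []
    for _ in range(T):
        h, eps = best_stump(X, y, D)
        if eps >= 0.5:                     # no weak classifier better than chance
            break
        eps_c = max(eps, 1e-10)            # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1.0 - eps_c) / eps_c)  # Eq. 12 (assumed form)
        ensemble.append((alpha, h))
        if eps == 0.0:                     # perfect split: further rounds add nothing
            break
        # Standard AdaBoost re-weighting: misclassified samples gain weight.
        D = [d * math.exp(-alpha * label * stump_predict(h, row))
             for row, label, d in zip(X, y, D)]
        total = sum(D)
        D = [d / total for d in D]
    return ensemble


def predict(ensemble, x: Sequence[float]) -> int:
    """Eq. 13 (assumed form): sign of the alpha-weighted majority vote."""
    score = sum(alpha * stump_predict(h, x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Usage on a toy one-attribute dataset.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, 1, 1]
model = train(X, y)
print([predict(model, row) for row in X])
```

Stumps keep the sketch short and deterministic; substituting a neuron trained on the weighted samples would mainly change `stump_predict` and `best_stump`.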

4. Experimental Settings and Results Discussion

To evaluate the performance of the proposed WONN-MLB approach, the Thoracic Surgery Data Set [2] is used. The proposed WONN-MLB approach is implemented in Java using the Weka tool. The Thoracic Surgery dataset is dedicated to a classification problem related to post-operative life expectancy in lung cancer patients. The data were collected retrospectively at the Wroclaw Thoracic Surgery Centre for patients who underwent major lung resections for primary lung cancer in the years 2007-2011. The Centre is associated with the Department of Thoracic Surgery of the Medical University of Wroclaw and the Lower-Silesian Centre for Pulmonary Diseases, Poland. The research database constitutes a part of the National Lung Cancer Registry, which is administered by the Institute of Tuberculosis and Pulmonary Diseases in Warsaw, Poland. Preprocessing is first performed on the attributes of the Thoracic Surgery dataset using maximum relevance, minimum redundancy and maximum likelihood to obtain the relevant features. With the resulting Maximum Likelihood Minimum Redundant attributes, ensemble classification is then performed to improve diagnosing accuracy with minimum error and time.
The experiments with the proposed approach are performed for many instances with various numbers of patient records in order to analyze its performance. The effectiveness of the proposed approach is compared with big data research in Non-Small Cell Lung Cancer (NSCLC) by Wu et al. [1], the Boosted Support Vector Machine (BSVM) method by Zięba et al. [2], the Nonparallel Plane Proximal Classifier (NPPC) by Ghorai et al. [4], and Multi-View Convolutional Neural Networks (MV-CNN) by Liu et al. [19]. The results are discussed in terms of diagnosing accuracy, false positive rate (error rate), classification time, F1-score, space complexity and feature selection rate.

4.1 Scenario 1: Impact of diagnosing accuracy

Diagnosing accuracy is one of the most important parameters for early disease diagnosis. The higher the diagnosing accuracy, the earlier the disease can be diagnosed and the more efficient the method. It indicates how precisely a method recognizes the disease and informs subsequent treatment decisions by physicians or patients. It is given as follows:

$$DA = \frac{N_c}{n} \times 100 \tag{14}$$

From Eq. 14, the diagnosing accuracy $DA$ is the ratio of the number of patient records correctly diagnosed, $N_c$, to the total number of samples $n$ considered for experimentation. It is measured as a percentage. The values obtained through Eq. 14 for different numbers of patient records are shown in Fig. 3 for the proposed WONN-MLB approach and the NSCLC, BSVM, NPPC and MV-CNN approaches. The sample calculation of diagnosing accuracy for the five methods is given as follows:

Sample calculation:

• Proposed WONN-MLB: With ‘ ’ patient records considered for experimentation and ‘ ’ of them correctly diagnosed, the diagnosing accuracy is calculated as follows:

• NSCLC: With ‘ ’ patient records considered for experimentation and ‘ ’ of them correctly diagnosed, the diagnosing accuracy is calculated as follows:

• BSVM: With ‘ ’ patient records considered for experimentation and ‘ ’ of them correctly diagnosed, the diagnosing accuracy is calculated as follows:

• NPPC: With ‘ ’ patient records considered for experimentation and ‘ ’ of them correctly diagnosed, the diagnosing accuracy is calculated as follows:

• MV-CNN: With ‘ ’ patient records considered for experimentation and ‘ ’ of them correctly diagnosed, the diagnosing accuracy is calculated as follows:
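A minimal sketch of Eq. 14, computing the diagnosing accuracy either from counts or from predicted and actual labels; the function names are illustrative.

```python
from typing import Sequence


def diagnosing_accuracy(correct: int, total: int) -> float:
    """Eq. 14: percentage of the n samples that are correctly diagnosed."""
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


def diagnosing_accuracy_from_labels(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Count matching predictions, then apply Eq. 14."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual labels must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return diagnosing_accuracy(correct, len(actual))
```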

Fig. 3: Diagnosing accuracy with 10000 patient data

Fig. 3 compares the diagnosing accuracy of the proposed approach with that of the existing NSCLC, BSVM, NPPC and MV-CNN methods. The diagnosing accuracy of lung cancer is improved using WONN-MLB because ensemble classification selects the weak classifier with low weighted error and computes a new component based on the error function. The results show that the diagnosing accuracy increases for small numbers of patient records and then decreases as the number of patient records grows. This happens because a larger number of patient records also contains more irrelevant attributes; moreover, the preprocessing in the WONN-MLB method is not error-free, so some irrelevant attributes remain even after preprocessing. Nevertheless, WONN-MLB improves on the existing NSCLC, BSVM, NPPC and MV-CNN methods, because ensemble classification not only minimizes the error by updating the weak classifier but also minimizes the time by boosting the updated results. As a result, WONN-MLB improves diagnosing accuracy by 7%, 11%, 19% and 28% compared to NSCLC, BSVM, NPPC and MV-CNN, respectively.
4.2 Scenario 2: Impact of false positive rate
The second important parameter for measuring early lung cancer diagnosis is the false positive (error) rate. When multiple comparisons are conducted in a statistical framework, the false positive rate is the probability of falsely rejecting the null hypothesis for a specific test. Conventionally, the false positive rate is the ratio of the number of negative events (i.e., patients without lung cancer) wrongly categorized as positive (i.e., diagnosed with lung cancer) to the total number of actual negative events. In this evaluation, it is formulated as follows:

$$FPR = \frac{N_{fp}}{n} \times 100 \tag{15}$$

From Eq. 15, the false positive rate $FPR$ is the ratio of the number of patient records incorrectly diagnosed with the disease, $N_{fp}$, to the total number of samples $n$ considered for experimentation; that is, it is normalized by all samples rather than only by the actual negatives. It is measured as a percentage (%). The values obtained through Eq. 15 for different numbers of patient records are shown in Fig. 4 for the proposed WONN-MLB approach and the NSCLC, BSVM, NPPC and MV-CNN methods. The sample calculation of the false positive rate for the five methods is given as follows:

Sample calculation:

• Proposed WONN-MLB: With 1000 patient records considered as samples and 90 of them incorrectly diagnosed with lung cancer, the false positive rate is $FPR = \frac{90}{1000} \times 100 = 9\%$.

• NSCLC: With 1000 patient records considered as samples and 120 of them incorrectly diagnosed with lung cancer, the false positive rate is $FPR = \frac{120}{1000} \times 100 = 12\%$.

• BSVM: With 1000 patient records considered as samples and 140 of them incorrectly diagnosed with lung cancer, the false positive rate is $FPR = \frac{140}{1000} \times 100 = 14\%$.

• NPPC: With 1000 patient records considered as samples and 160 of them incorrectly diagnosed with lung cancer, the false positive rate is $FPR = \frac{160}{1000} \times 100 = 16\%$.

• MV-CNN: With 1000 patient records considered as samples and 170 of them incorrectly diagnosed with lung cancer, the false positive rate is $FPR = \frac{170}{1000} \times 100 = 17\%$.

Using Eq. 15, the false positive rate is measured for 1000 to 10000 patient records. The results of the experimental evaluation of the false positive rate are shown in Table 1. The false positive rate of the proposed WONN-MLB approach is lower than that of the state-of-the-art methods.

Fig. 4: Performance measure of false positive rate

Fig. 4 shows the false positive rate of disease diagnosis for big data. As illustrated in Fig. 4, when 1000 patient records are considered as samples, 90 records were incorrectly diagnosed with lung cancer using WONN-MLB, compared with 120 using NSCLC, 140 using BSVM, 160 using NPPC and 170 using MV-CNN. At this sample size, the false positive rate of WONN-MLB is therefore lower by 25%, 36%, 44% and 47% than that of NSCLC, BSVM, NPPC and MV-CNN, respectively. This result is attributed to the Newton-Raphson MLMR preprocessing model. The advantage of the MLMR preprocessing model is that, instead of using all attributes in the dataset, only the attributes with maximum likelihood and relevance are considered for disease diagnosis. With the log-likelihood function, the selected attributes also change, and this change is reflected in the maximum relevance minimum redundancy coefficient. This adaptive change in the coefficient in turn minimizes incorrect lung cancer diagnoses in the WONN-MLB method. The resulting attributes are then used to classify patients as having lung cancer or as normal, which overall reduces the false positive rate by 39%, 53%, 58% and 61% compared to NSCLC, BSVM, NPPC and MV-CNN, respectively.
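As a check on the counts quoted for 1000 samples, the sketch below applies Eq. 15 as used in the sample calculations (denominator = total number of samples) and recomputes the relative reductions of WONN-MLB. The dictionary simply transcribes the counts given above, and the names are illustrative.

```python
# Incorrect lung cancer diagnoses out of 1000 samples, as reported for Fig. 4.
N_SAMPLES = 1000
FALSE_POSITIVES = {"WONN-MLB": 90, "NSCLC": 120, "BSVM": 140, "NPPC": 160, "MV-CNN": 170}


def false_positive_rate(false_positives: int, total: int) -> float:
    """Eq. 15 as applied here: incorrect positive diagnoses over all samples, in %."""
    return 100.0 * false_positives / total


def relative_reduction(ours: float, theirs: float) -> float:
    """Percentage by which `ours` is lower than `theirs`."""
    return 100.0 * (theirs - ours) / theirs


fpr = {name: false_positive_rate(fp, N_SAMPLES) for name, fp in FALSE_POSITIVES.items()}
for name in ("NSCLC", "BSVM", "NPPC", "MV-CNN"):
    print(f"{name}: FPR {fpr[name]:.0f}%, WONN-MLB lower by "
          f"{relative_reduction(fpr['WONN-MLB'], fpr[name]):.0f}%")
```

It prints reductions of 25%, 36%, 44% and 47% for NSCLC, BSVM, NPPC and MV-CNN, respectively.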

4.3 Scenario 3: Classification time

The third parameter considered for early lung cancer diagnosis is the classification time, i.e., the time taken to classify patient data as diagnosed or not diagnosed with lung cancer. It is calculated as follows:

$$CT = n \times t_s \tag{16}$$

From Eq. 16, the classification time $CT$ is calculated from the number of samples $n$ and the time $t_s$ consumed to classify a single patient record by ensemble classification. The lower the classification time, the earlier lung cancer can be diagnosed. It is measured in milliseconds (ms). The values obtained through Eq. 16 are shown in Fig. 5 for the proposed WONN-MLB approach and the existing NSCLC, BSVM, NPPC and MV-CNN methods. The sample calculation of classification time for the five methods is given as follows:

Sample calculations:

• Proposed WONN-MLB: With the time taken to classify a single patient record being ‘ ’ and ‘ ’ patient records considered as samples, the classification time is calculated as follows:

• NSCLC: With the time taken to classify a single patient record being ‘ ’ and ‘ ’ patient records considered as samples, the classification time is given as follows:

• BSVM: With the time taken to classify a single patient record being ‘ ’ and ‘ ’ patient records considered as samples, the classification time is given as follows:

• NPPC: With the time taken to classify a single patient record being ‘ ’ and ‘ ’ patient records considered as samples, the classification time is given as follows:

• MV-CNN: With the time taken to classify a single patient record being ‘ ’ and ‘ ’ patient records considered as samples, the classification time is given as follows:

Fig.5: Performance measure of classification time

Fig. 5 shows the time taken to classify patient data as diseased or not. The proposed approach is implemented in Java, and 1000 to 10000 patient records are used, with the classification time of the proposed method compared against that of the existing NSCLC, BSVM, NPPC and MV-CNN methods. When 1000 patient records are considered, the proposed method takes 8.5 ms to classify them, whereas NSCLC, BSVM, NPPC and MV-CNN take 8.9 ms, 9.3 ms, 11.5 ms and 13.2 ms, respectively. Thus, the classification time of the proposed approach is lower than that of the existing methods. As the number of patient records and the number and size of the attributes increase, the classification time of all five methods also increases; across this range, the classification time of the proposed approach remains lower than that of the methods in [1], [2], [4] and [19]. This is because the Newton-Raphson Maximum Likelihood model, combined with the maximum relevance minimum redundancy factor, applies the first and second derivatives to extract the most relevant attributes. With these attributes, the proposed approach reduces the classification time by 34%, 51%, 56% and 59% compared to NSCLC by Wu et al. [1], BSVM by Zięba et al. [2], NPPC by Ghorai et al. [4] and MV-CNN by Liu et al. [19], respectively.
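Only the totals for 1000 records are stated for Fig. 5, so the sketch below inverts Eq. 16 to obtain the per-record time implied by each total, assuming Eq. 16 is the simple product $CT = n \times t_s$. The names are illustrative.

```python
N_SAMPLES = 1000
# Total classification times (ms) for 1000 patient records, as reported for Fig. 5.
TOTAL_MS = {"WONN-MLB": 8.5, "NSCLC": 8.9, "BSVM": 9.3, "NPPC": 11.5, "MV-CNN": 13.2}


def classification_time(n: int, per_record_ms: float) -> float:
    """Eq. 16: total time = number of samples x time to classify one record."""
    return n * per_record_ms


# Per-record time implied by each reported total (Eq. 16 solved for t_s).
per_record = {name: total / N_SAMPLES for name, total in TOTAL_MS.items()}
for name, t in per_record.items():
    print(f"{name}: {t * 1000:.1f} microseconds per record, "
          f"{classification_time(N_SAMPLES, t):.1f} ms in total")
```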

4.4 Scenario 4: F1-score

The fourth parameter considered for lung cancer diagnosis is the F1-score, a single measure of test performance for the positive class that combines both precision and recall. Precision is the number of correct positive results divided by the number of all positive results returned by the classifier, and recall is the number of correct positive results divided by the number of all relevant samples. The F1-score is calculated as follows:

$$F1 = \frac{2 \times P \times R}{P + R} \tag{17}$$

From Eq. 17, the F1-score $F1$ is the harmonic mean of precision $P$ and recall $R$. The higher the F1-score, the earlier lung cancer can be diagnosed. The sample calculation of the F1-score for the five methods is given as follows:

Sample calculations:

• Proposed WONN-MLB: With 1000 patient records considered for experimentation, a precision of 93% and a recall of 91%, the F1-score is $F1 = \frac{2 \times 93 \times 91}{93 + 91} \approx 92\%$.

• NSCLC: With 1000 patient records considered for experimentation, a precision of 89% and a recall of 87%, the F1-score is $F1 = \frac{2 \times 89 \times 87}{89 + 87} \approx 88\%$.

• BSVM: With 1000 patient records considered for experimentation, a precision of 86% and a recall of 85%, the F1-score is $F1 = \frac{2 \times 86 \times 85}{86 + 85} \approx 85.5\%$.

• NPPC: With 1000 patient records considered for experimentation, a precision of 80% and a recall of 82%, the F1-score is $F1 = \frac{2 \times 80 \times 82}{80 + 82} \approx 81\%$.

• MV-CNN: With 1000 patient records considered for experimentation, a precision of 74% and a recall of 78%, the F1-score is $F1 = \frac{2 \times 74 \times 78}{74 + 78} \approx 76\%$.
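The F1 sample calculations can be reproduced with the harmonic mean of Eq. 17; rounded to the nearest percent, the results match the 92%, 88%, 85%, 81% and 76% reported for Fig. 6. The names are illustrative.

```python
# Precision and recall (%) quoted in the sample calculations.
PRECISION_RECALL = {
    "WONN-MLB": (93, 91),
    "NSCLC": (89, 87),
    "BSVM": (86, 85),
    "NPPC": (80, 82),
    "MV-CNN": (74, 78),
}


def f1_score(precision: float, recall: float) -> float:
    """Eq. 17: harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r) in PRECISION_RECALL.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}%")
```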

Fig.6: Performance measure of F1-score

Fig. 6 illustrates the F1-score obtained when classifying the patient data. The experiments consider 1000 to 10000 patient records, and the F1-score of the proposed method is compared with that of the existing NSCLC, BSVM, NPPC and MV-CNN methods. For 1000 patient records, the proposed method achieves an F1-score of 92%, whereas NSCLC, BSVM, NPPC and MV-CNN achieve 88%, 85%, 81% and 76%, respectively. Thus, the F1-score of the proposed method is higher than that of the existing methods. As the number of patient records increases, the F1-score increases for all methods, and the F1-score of the proposed method remains higher than that of the methods in [1], [2], [4] and [19]. This is due to the weighted optimized neural network with maximum likelihood boosting, which classifies the patient data with higher accuracy. Therefore, the proposed WONN-MLB approach improves the F1-score by 7%, 11%, 19% and 26% compared to NSCLC by Wu et al. [1], BSVM by Zięba et al. [2], NPPC by Ghorai et al. [4] and MV-CNN by Liu et al. [19], respectively.

4.5 Scenario 5: Space complexity

Space complexity is defined as the amount of storage space required to store the patient data in big healthcare data analytics. It is measured in megabytes (MB) and is calculated as follows:

$$SC = n \times s \tag{18}$$

In Eq. 18, $SC$ denotes the space complexity, $n$ denotes the number of patient records and $s$ denotes the space required to store one patient record. The sample calculation of space complexity for the five methods is given as follows:

Sample calculations:

• Proposed WONN-MLB: With 1000 patient records considered for experimentation and 0.01 MB required to store one patient record, the space complexity is $SC = 1000 \times 0.01\ \text{MB} = 10\ \text{MB}$.

• NSCLC: With 1000 patient records considered for experimentation and 0.012 MB required to store one patient record, the space complexity is $SC = 1000 \times 0.012\ \text{MB} = 12\ \text{MB}$.

• BSVM: With 1000 patient records considered for experimentation and 0.015 MB required to store one patient record, the space complexity is $SC = 1000 \times 0.015\ \text{MB} = 15\ \text{MB}$.

• NPPC: With 1000 patient records considered for experimentation and 0.018 MB required to store one patient record, the space complexity is $SC = 1000 \times 0.018\ \text{MB} = 18\ \text{MB}$.

• MV-CNN: With 1000 patient records considered for experimentation and 0.021 MB required to store one patient record, the space complexity is $SC = 1000 \times 0.021\ \text{MB} = 21\ \text{MB}$.
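Eq. 18 and the space-complexity sample calculations can be reproduced with the sketch below, which multiplies the number of records by the per-record storage quoted above. The names are illustrative.

```python
N_RECORDS = 1000
# Storage per patient record (MB) quoted in the sample calculations.
MB_PER_RECORD = {"WONN-MLB": 0.010, "NSCLC": 0.012, "BSVM": 0.015, "NPPC": 0.018, "MV-CNN": 0.021}


def space_complexity(n: int, mb_per_record: float) -> float:
    """Eq. 18: total storage (MB) = number of records x storage per record."""
    return n * mb_per_record


for name, s in MB_PER_RECORD.items():
    print(f"{name}: {space_complexity(N_RECORDS, s):.0f} MB")
```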
Fig.7: Performance measure of space complexity

Fig. 7 shows the storage space required for the patient data. The experiments consider 1000 to 10000 patient records, and the space complexity of the WONN-MLB approach is compared with that of the existing NSCLC, BSVM, NPPC and MV-CNN methods. For 1000 patient records, the proposed WONN-MLB approach requires 10 MB, whereas NSCLC, BSVM, NPPC and MV-CNN require 12 MB, 15 MB, 18 MB and 21 MB, respectively. Thus, the space complexity of the proposed WONN-MLB approach is lower than that of the methods in [1], [2], [4] and [19]. This is attributed to the boosted weighted optimized neural network ensemble classification algorithm in the WONN-MLB approach, which classifies the patient data with higher accuracy before they are stored for diagnosing cancer. Therefore, the proposed WONN-MLB approach reduces space complexity by 13%, 24%, 31% and 36% compared to NSCLC by Wu et al. [1], BSVM by Zięba et al. [2], NPPC by Ghorai et al. [4] and MV-CNN by Liu et al. [19], respectively.
4.6 Scenario 6: Feature selection rate
Feature selection rate is defined as the ratio of the number of relevant features correctly selected to the total number of features. It is measured as a percentage (%) and is calculated as follows:

$$FSR = \frac{N_{cs}}{N_f} \times 100 \tag{19}$$

In Eq. 19, $FSR$ denotes the feature selection rate, $N_{cs}$ the number of correctly selected features and $N_f$ the total number of features. The sample calculation of the feature selection rate for the five methods is given as follows:

Sample calculations:

• Proposed WONN-MLB: With 20 features considered for experimentation and 18 features correctly selected, the feature selection rate is $FSR = \frac{18}{20} \times 100 = 90\%$.

• NSCLC: With 20 features considered for experimentation and 17 features correctly selected, the feature selection rate is $FSR = \frac{17}{20} \times 100 = 85\%$.

• BSVM: With 20 features considered for experimentation and 16 features correctly selected, the feature selection rate is $FSR = \frac{16}{20} \times 100 = 80\%$.

• NPPC: With 20 features considered for experimentation and 14 features correctly selected, the feature selection rate is $FSR = \frac{14}{20} \times 100 = 70\%$.

• MV-CNN: With 20 features considered for experimentation and 13 features correctly selected, the feature selection rate is $FSR = \frac{13}{20} \times 100 = 65\%$.
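A minimal sketch of Eq. 19 using the correctly selected feature counts quoted above, assuming 20 features in total as in the 20-feature case of Fig. 8. The names are illustrative.

```python
TOTAL_FEATURES = 20
# Correctly selected features quoted in the sample calculations.
CORRECTLY_SELECTED = {"WONN-MLB": 18, "NSCLC": 17, "BSVM": 16, "NPPC": 14, "MV-CNN": 13}


def feature_selection_rate(correct: int, total: int) -> float:
    """Eq. 19: share of features correctly selected, in %."""
    if total <= 0:
        raise ValueError("total number of features must be positive")
    return 100.0 * correct / total


for name, c in CORRECTLY_SELECTED.items():
    print(f"{name}: FSR = {feature_selection_rate(c, TOTAL_FEATURES):.0f}%")
```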
Fig.8: Performance measure of feature selection rate

Fig. 8 compares the feature selection rate of the proposed approach with that of the existing NSCLC, BSVM, NPPC and MV-CNN methods. The experiments consider 20 to 200 features. When 20 features are considered, the proposed WONN-MLB approach achieves a feature selection rate of 90%, whereas NSCLC, BSVM, NPPC and MV-CNN achieve 85%, 80%, 72% and 65%, respectively. Thus, the feature selection rate of the proposed WONN-MLB approach is higher than that of the methods in [1], [2], [4] and [19]. This is because preprocessing identifies the attributes with maximum relevance and removes redundant attributes, which helps select accurate features for cancer diagnosis. Therefore, the proposed WONN-MLB approach improves the feature selection rate by 10%, 18%, 28% and 41% compared to NSCLC by Wu et al. [1], BSVM by Zięba et al. [2], NPPC by Ghorai et al. [4] and MV-CNN by Liu et al. [19], respectively.
5. Conclusion
An effective Weight Optimized Neural Network with Maximum Likelihood Boosting (WONN-MLB) for LCD in big data has been investigated to improve LCD diagnosis accuracy and to minimize both the false positive rate and the classification time. To achieve this, the Newton-Raphson MLMR preprocessing model retrieves the relevant attributes and removes the irrelevant features, which minimizes the classification time. With the most relevant attributes, an ensemble classification model, the Weighted Optimized Neural Network with Boosting, is applied for early lung cancer diagnosis with a higher accuracy rate. Here, not only is the weighted sum function considered, but the optimal weight values are also obtained. The final ensemble technique finds the weak classifier with the lowest error and updates a new component based on the error function. This process attains higher disease diagnosing accuracy with a minimum false positive rate. The experimental evaluation is conducted with parameters such as disease diagnosing accuracy, false positive rate and classification time. The experimental results show that the proposed approach achieves more accurate results for big data processing than existing methods. The WONN-MLB approach has been tested on different datasets; however, a huge number of data points remain to be tested with the proposed approach in future work.

References
[1] Jia Wu, Yanlin Tan, Zhigang Chen, Ming Zhao, “Decision based on big data research for non-small cell lung
cancer in medical artificial system in developing country”, Computer Methods and Programs in Biomedicine,
Elsevier, Volume 159, Mar 2018, Pages 87-101 [Big data research in Non-Small Cell Lung Cancer – Big data
research in NSCLC]
[2] Maciej Zięba, Jakub M. Tomczak, Marek Lubicz, Jerzy Świątek, “Boosted SVM for extracting rules from
imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients”,
Applied Soft Computing, Elsevier, Volume 14, Part A, January 2014, Pages 99-108 [Boosted Support vector
machine (SVM) method for Imbalanced data (BSI)]
[3] Zhihong Man, Kevin Lee, Dianhui Wang, Zhenwei Cao, Suiyang Khoo, “An optimal weight learning machine for handwritten digit image recognition”, Signal Processing, Elsevier, Volume 93, Issue 6, June 2013, Pages 1624-1638
[4] Santanu Ghorai, Anirban Mukherjee, Sanghamitra Sengupta, Pranab K. Dutta, “Cancer Classification from Gene
Expression Data by NPPC Ensemble”, IEEE/ACM Transactions on Computational Biology and Bioinformatics,
Volume 8, Issue 3, May/June 2011, Pages 659-671
[5] Resul Das, Abdulkadir Sengur, “Evaluation of ensemble methods for diagnosing of valvular heart disease”,
Expert Systems with Applications, Elsevier, Volume 37, Issue 7, July 2010, Pages 5110-5115
[6] Valdigleis S. Costa, Antonio Diego S. Farias, Benjamín Bedregal, Regivan H. N. Santiago, Anne Magaly de P. Canuto, “Combining Multiple Algorithms in Classifier Ensembles using Generalized Mixture Functions”, Neurocomputing, Elsevier, Volume 313, November 2018, Pages 402-414
[7] Shamsul Huda, John Yearwood, Herbert F. Jelinek, Mohammad Mehedi Hassan, Giancarlo Fortino, Michael
Buckland, “A hybrid feature selection with ensemble classification for imbalanced healthcare data: A case study for
brain tumor diagnosis”, IEEE Access, Volume 4, May 2016, Pages 9145 - 9154
[8] Sarfaraz Hussein, Pujan Kandel, Candice W. Bolan, Michael B. Wallace, and Ulas Bagci, “Supervised and
Unsupervised Tumor Characterization in the Deep Learning Era”, IEEE Transactions on Medical Imaging, Volume
2, July 2018, Pages 1-11
[9] Emmanuel Adetiba, Oludayo O. Olugbara, “Improved Classification of Lung Cancer Using Radial Basis
Function Neural Network with Affine Transforms of Voss Representation”, PLOS ONE, Volume 10, Issue 12, 2015, Pages 1-25
[10] Divya Jain, Vijendra Singh, “Feature selection and classification systems for chronic disease prediction: A
review”, Egyptian Informatics Journal, Elsevier, Volume 19, Issue 3, November 2018, Pages 179-189
[11] Payal Dande, Purva Samant, “Acquaintance to Artificial Neural Networks and use of artificial intelligence as a
diagnostic tool for tuberculosis: A review”, Tuberculosis, Elsevier, Volume 108, January 2018, Pages 1-9
[12] Pegah Khosravi, Ehsan Kazemi, Marcin Imielinski, Olivier Elemento, Iman Hajirasouliha, “Deep
Convolutional Neural Networks Enable Discrimination of Heterogeneous Digital Pathology Images”,
EBioMedicine, Elsevier, Volume 27, Jan 2018, Pages 317-328
[13] Sahil Sharma, Vinod Sharma, Atul Sharma, “A Two Stage Hybrid Ensemble Classifier Based Diagnostic Tool
for Chronic Kidney Disease Diagnosis Using Optimally Selected Reduced Feature Set”, International Journal of
Intelligent Systems and Applications in Engineering, Volume 6, Issue 2, Apr 2018, Pages 113-122
[14] Faezeh Hosseinzadeh, Amir Hossein KayvanJoo, Mansuor Ebrahimi, Bahram Goliaei, “Prediction of lung
tumor types based on protein attributes by machine learning algorithms”, Springer Plus, Volume 2, Issue 238, Sep
2013, Pages 1-14
[15] Maxim D Podolsky, Anton A Barchuk, Vladimir I Kuznetcov, Natalia F Gusarova, Vadim S Gaidukov, Segrey
A Tarakanov, “Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on
Gene Expression Levels”, Asian Pacific Journal of Cancer Prevention, Volume 17, Issue 2, 2016, Pages 835-838
[16] Mohamad Rabbani, Jonathan Kanevsky, Kamran Kafi, Florent Chandelier, Francis J. Giles, “Role of artificial
intelligence in the care of patients with nonsmall cell lung cancer”, European Journal of Clinical Investigation,
Wiley Online Library, Volume 48, Issue 4, January 2018, Pages 1-7
[17] Zhiguo Zhou, Zhi-Jie Zhou, Hongxia Hao, Shulong Li, Xi Chen, You Zhang, Michael Folkert, and Jing Wang,
“Constructing multi-modality and multiclassifier radiomics predictive models through reliable classifier fusion”,
IEEE Computer Society, Jun 2017, Pages 1-13
[18] Ashutosh Kumar Dubey, Umesh Gupta, Sonal Jain, “Epidemiology of lung cancer and approaches for its
prediction: a systematic review and analysis”, Chinese Journal of Cancer, Volume 35, Issue 71, July 2016, Pages 1-
13
[19] Kui Liu, Guixia Kang, “Multiview Convolutional Neural Networks for Lung Nodule Classification”,
International journal of imaging systems and technology, Wiley online library,
Volume 27, Issue 1, March 2017, Pages 12-22
[20] Ayman El-Baz, Garth M. Beache, Georgy Gimel’farb, Kenji Suzuki, Kazunori Okada, Ahmed Elnakib, Ahmed
Soliman, and Behnoush Abdollahi, “Computer-Aided Diagnosis Systems for Lung Cancer: Challenges and
Methodologies”, International Journal of Biomedical Imaging, Hindawi Publishing Corporation, Volume 2013,
November 2012, Pages 1-46.
[21] Changmiao Wang, Ahmed Elazab, Jianhuang Wu, Qingmao Hu, “Lung nodule classification using deep feature
fusion in chest radiography”, Computerized Medical Imaging and Graphics, Elsevier, Volume 57, April 2017, Pages
10-18

[22] Sheng Chen, Kenji Suzuki, and Heber MacMahon, “Development and evaluation of a computer-aided
diagnostic scheme for lung nodule detection in chest radiographs by means of two-stage nodule enhancement with
support vector classification”, The international journal of medical physics research and practice, Vol. 38, No. 4,
2011, Pages 1844-1858

[23] Devinder Kumar, Alexander Wong, David A. Clausi, “Lung Nodule Classification Using Deep Features in CT
Images”, 2015 12th Conference on Computer and Robot Vision, 3-5 June 2015, Pages 133-138

[24] Thangavel Baranidharan, Thangavel Sumathi, Vadivelraj Chandra Shekar, “Weight Optimized Neural Network Using Metaheuristics for the Classification of Large Cell Carcinoma and Adenocarcinoma from Lung Imaging”, Current Signal Transduction Therapy, Volume 11, Issue 2, 2016, Pages 91-97.
[25] Varlamis, Apostolakis, Sifaki-Pistolla, Dey, Georgoulias, Lionis, “Application of data mining techniques and
data analysis methods to measure cancer morbidity and mortality data in a regional cancer registry: The case of the
island of Crete, Greece.”, Computer methods and programs in biomedicine, Volume 145, 2017, Pages 73-83.
[26] Md. Sarwar Kamal, Nilanjan Dey and Amira S. Ashour, “Large Scale Medical Data Mining for Accurate
Diagnosis: A Blueprint” In Handbook of Large-Scale Distributed Computing in Smart Healthcare, Springer, 2017,
Pages 157-176
[27] Thoracic Surgery Data Set: https://archive.ics.uci.edu/ml/datasets/Thoracic+Surgery+Data
Highlights

Boosted Neural Network Ensemble Classification for Lung Cancer Disease Diagnosis with Big Data
1. To increase lung cancer diagnosis accuracy for big data compared to state-of-the-art works
2. To minimize the classification time for early lung cancer disease diagnosis, an integrated Newton-Raphson Maximum Likelihood Minimum Redundancy (MLMR) preprocessing model is used
3. To reduce the error (i.e., false positive rate) and improve the disease diagnosis accuracy for big data, with higher classification efficiency and lower classification time than conventional methods
Declaration of Interest Statement

The authors declare no conflict of interest.
