
Sustainable Operations and Computers 3 (2022) 238–248


Analytics of machine learning-based algorithms for text classification


Sayar Ul Hassan a, Jameel Ahamed a,∗, Khaleel Ahmad a
a Department of Computer Science & Information Technology, Maulana Azad National Urdu University, Hyderabad, Telangana, India

Abstract

Text classification is the most vital area in natural language processing, in which text data is automatically sorted into a predefined set of classes. The applications of text classification are wide in commercial work, such as spam filtering, decision making, extracting information from raw data, and many other applications. Text classification is all the more significant for many enterprises since it eliminates the need for manual data classification, a more expensive and time-consuming mechanism. In this paper, a comparative analysis of text classification is done in which the efficiency of different machine learning algorithms on different datasets is analyzed and compared. Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Random Forest (RF) are the machine learning-based algorithms used in this work. Two different datasets are used to make a comparative analysis of these algorithms. This paper further analyzes the machine learning techniques employed for text classification on the basis of performance metrics, viz accuracy, precision, recall, and F1-score. The results reveal that Logistic Regression and Support Vector Machine outperform the other models on the IMDB dataset, and k-NN outperforms the other models on the SPAM dataset, as per the results obtained from the proposed system.

1. Introduction

Nowadays, industries benefit greatly from developing automatic systems for extracting usable structured data from unstructured text sources. Researchers and industry professionals could perform reasonably easy queries to retrieve all information related to industrial work using a structured resource [1]. We can use these machine learning classifiers in the field of the environment. Suppose the data related to sustainable development and climate change is collected from different sources. In that case, different machine learning techniques can be applied to that data so that domain knowledge can be extracted from it. This will help in different fields, such as making decisions about the future, and will also give an idea of how we should sustainably use available resources. We can also make people aware of climate change problems and publish the resulting data on different platforms to raise awareness of climate change and sustainable development.

Text analysis is one of the important aspects of extracting the desired information. Text classification is classifying text into different classes based on the text domain. It is a fundamental process in natural language processing for which tools are available for classifying textual data. Automatic text classification has always been a critical application and research topic since the inception of digital documents. Today, text classification is much required due to the massive amount of text documents generated daily worldwide [2]. Textual analytics translates text into numbers, giving structured data and making it easier to spot trends. The more structured the data, the better the analysis will be, and eventually, the better the decisions [3]. Machine learning (ML) is employed for this purpose; it is a branch of artificial intelligence (AI) that allows computers to operate and learn even when they are not explicitly programmed [4]. In this study, different selected machine learning techniques are used for text classification. Besides these techniques, there are various approaches for text classification, but most of them cannot classify text data as accurately as machine learning techniques, which give more effective results [3]. Even though several efficient text classification approaches have been developed, text classification remains a difficult subject with a lot of room for improvement in terms of efficiency [5]. However, organizations and enterprises use text documents to keep track of their industrial and government services [4,6]. In the text classification system, the classifier is the main part; the classifier's performance quality is directly related to the efficiency and effect of text classification. Most of the classifiers are based on methods from information retrieval and the machine learning algorithms that are introduced for text classification purposes [7]. A good text classifier, though, would work efficiently for large training data sets with several features [8]. Because of the high dimensionality and existence of noise in features, it is crucial to choose only the most critical features in the case of text categorization [9].

A comparative analysis is done in this paper, based on text classification employing machine learning techniques on different datasets. The problem is that the manual process of classifying text data is tedious and very time-consuming [6]. Therefore, it is very important to automate the process and enhance data-driven decisions [10,11]. In this research, the machine learning algorithms are applied and compared for best performance on different datasets [12]. The documents in the text classification model are passed through different steps, viz (i) convert the main document into plain text, lowercase the full document, remove stop words and other words which are not useful, and reduce different inflected words to a single root word using stemming and lemmatization, and (ii) select the data for training and testing, build the classifier, and then deploy the classifier on different datasets [2,13,27].

∗ Corresponding author.
E-mail address: [email protected] (J. Ahamed).

https://doi.org/10.1016/j.susoc.2022.03.001
Received 24 July 2021; Received in revised form 20 February 2022; Accepted 25 March 2022
Available online 1 April 2022
2666-4127/© 2022 The Author(s). Published by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Further, machine learning techniques can also be applied to classification problems to measure the perceptions about the COVID-19 pandemic and identify misconceptions among the population, which will help inform the public health organizations; better methods can then be created to educate the public and make them understand not to fall for these misconceptions [14]. Machine learning techniques can also be used to tackle the SARS-CoV-2 crisis in the fields of diagnosis, disease progression, and epidemiology [15].

Machine learning is one of the most important technical developments allowing Industry 4.0 to take hold in businesses and industries. The introduction of AI and machine learning to Industry 4.0 marks a significant shift for manufacturing organizations, potentially resulting in new business prospects and benefits such as increased productivity. Artificial Intelligence, Machine Learning, and Deep Learning have also been widely used in areas like healthcare, finance, and smart factories as part of Industry 4.0 [16-19].

2. Related work

Text classification is one of the main tasks in Natural Language Processing (NLP) [6,20]. Due to the fast growth of Internet applications, a huge increment in online texts leads to improved automated text mining classifiers that can automatically organize and classify documents. Many machine learning algorithms have been applied to build an automatic text classifier by training on a set of classified training documents [21,22,23]. Several text classification models have been built for Urdu, English, French, Chinese, and many other languages [24,25,26]. The support vector machine (SVM) is one of the supervised machine learning models that uses classification algorithms for two-group classification problems [28]. A number of text classifiers used in text mining are compared in this work [8]. Usually, supervised and unsupervised are the two categories of classifiers used for text classification. In text classification, where the task is to train on "unknown" NLP text [6], the most significant element of collecting information is automatically categorizing a batch of documents into categories (or classes, or subjects) from a specified set [29]. Different tools and methods are derived from the domain, which has several applications in text classification [30]. SVM has also been compared to other algorithms, and SVM outperforms the others in various studies [12,29]. Different classifiers are viewed and analyzed to determine which category a document belongs to. We can classify non-linear data using kernel functions in order to classify data with greater dimensions [7,31]. The Support Vector Machine gives high performance but lower recall, which is one of the limitations of using the Support Vector Machine [31].
Table 1
Systematic literature review.

Title: Efficient English Text Classification Using Selected Machine Learning Techniques [6]. Author (year): Xiaoyu Luo (2021). Methodologies: SVM, Naïve Bayes, Logistic Regression. Findings: Precision, recall, and F1-value are calculated for the evaluation of the classifiers; SVM outperforms the others on two datasets and Logistic Regression outperforms on one dataset.

Title: Restaurant Review Classification and Analysis [52]. Authors (year): Dhiraj Kumar, Gopesh, Avinash Choubey, Pratibha Singh (2020). Methodologies: Naïve Bayes, Multinomial Naïve Bayes, Logistic Regression. Findings: The Multinomial Naïve Bayes technique performs better than the other algorithms on the precision, recall, and F1-score evaluation metrics.

Title: Benchmark Performance of Machine Learning and Deep Learning Based Methodologies for Urdu Text Document Classification [53]. Authors (year): Muhammad Nabeel Asim, Muhammad Usman Ghani, Muhammad Ali Ibrahim, Waqar Mahmood, Sheraz Ahmad, Andreas Dengel (2020). Methodologies: Naïve Bayes, SVM, TF-IDF vector representation. Findings: SVM outperformed Naïve Bayes using TF-IDF.

Title: Text Classification Using Machine Learning Techniques [21]. Authors (year): Emmanouil K. Ikonomakis, Sotiris Kotsiantis, V. Tampakas (2019). Methodologies: Naïve Bayes, k-Nearest Neighbour, Support Vector Machine. Findings: Classification performance depends on the training text corpora; with a high-quality training corpus, performance will be better.

Title: Comparative Analysis of Machine Learning Algorithms on Different Datasets [4]. Authors (year): Kapil Sethi, Ankit Gupta, Gaurav Gupta, Varun Jaiswal (2017). Methodologies: Neural Network, k-Nearest Neighbor, Support Vector Machine. Findings: SVM outperforms the other algorithms, and the model is useful in medication, governmental issues, and other fields.

Title: A Study of Text Classification Natural Language Processing Algorithms for Indian Languages [54]. Authors (year): Jasleen Kaur, Jatinder Kumar R. Saini (2015). Methodologies: Naïve Bayes, SVM, Artificial Neural Network, N-gram. Findings: Supervised machine learning algorithms outperformed unsupervised ML algorithms for Indian languages.

Title: Urdu Word Sense Disambiguation Using Machine Learning Approach [55]. Authors (year): Muhammad Abid, Asad Habib, Jawab Shahid, Jawad Ashraf (2017). Methodologies: Bayes Net classifier, SVM, Decision Tree. Findings: Bayes Net outperforms the other algorithms.

Title: Text Classification Using Machine Learning Methods: A Survey [56]. Authors (year): Basant Agarwal, Namita Mithal (2016). Methodologies: Naïve Bayes, SVM, KNN, Decision Tree. Findings: SVM provides good performance for textual documents that belong to a particular category, but not for multiclass classification.


Fig. 1. Flow diagram of proposed model.

In k-Nearest Neighbor (k-NN), the majority voting method is used to classify the instance correctly [32,33,34]. Hence, this is a newly introduced method that uses less text data for testing and large text data for training, giving the best performance and effective results [33]. It is also evident that the main focus of the machine learning model is to learn automatically and improve the model's efficiency based on experience [35]. The proposed classification model comprises three major modules, viz preprocessing the raw data, employing machine learning, and a final model for classification [36]. This model learns from past data or experience to improve its performance [37]. With the introduction of digital documents, automatic text classification became an important area of research [38]. Among the machine learning techniques, the SVM classifier obtains better results in most classification applications, specifically for disease identification and face recognition [39]. Further, three machine learning algorithms, Random Forest, kNN, and Naïve Bayes, were applied to chronic kidney disease prediction, and the random forest was proven to have the best results [40,41]. Accurate predictions and better generalizations can be achieved using random sampling and ensemble strategies [11]. Machine learning techniques can also play a vital role in analyzing diseases from medical records in the medical sector. They also helped during the COVID-19 pandemic to detect different aspects such as perceptions and misconceptions among the general public [14]. Machine learning techniques are also used for drug discovery, which is increasing exponentially day by day; it is easy to analyze the previous data on drugs that have been utilized to predict the new requirements [42]. It is important for healthcare organizations and health officers to understand the view of the general public about what causes them anxiety, stress, and trauma, and then make better policies and better treatments based on the available data, using machine learning techniques [43]. Nowadays, any news or information spreads swiftly on many social networking sites, and there is no way of knowing whether it is authentic or true, even if we trust it. Most individuals are using this platform as a weapon to manipulate public opinion for political, religious, or other causes, but we can use machine learning algorithms to determine whether the news or information is authentic or deceptive propaganda [44,45]. Extracting the emotions, opinions, and attitudes from the text data available on social networking sites can also be done using machine learning algorithms [46,47]. Different data preprocessing techniques are also essential for models to give better results with good performance [48]. TF-IDF is a statistical measure for extracting meaningful information from text input, but it is ineffective for unbalanced distributions; however, an upgraded version of TF-IDF may be applied to improve the model's power and robustness [49]. There are a variety of additional ways of classifying text data, such as a caps-net based multitask learning architecture for text classification, but when compared to machine learning approaches on the same problem, the results are substantially better utilising machine learning techniques [50]. Following the release of new COVID-19 variants, the entire world is experiencing healthcare issues, making it harder to collect health care records to analyse current and future needs of the general public using various machine learning approaches [51]. Furthermore, the systematic literature review with regard to the effectiveness of machine learning for text classification is depicted in Table 1.

3. Proposed system and methodology

The methodologies used in this research work are based on the machine learning techniques viz Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression (LR), Multinomial Naïve Bayes (MNB), and Random Forest (RF). The ML-based classification models are compared on different datasets in terms of the accuracy of each model.


Table 2
Summary of machine learning algorithms.

Support Vector Machine (SVM). Strengths: higher-order data can be classified by using kernel functions. Weaknesses: not suitable for large datasets; difficult to choose a kernel function. Applications: handwriting recognition, text and hypertext categorization, classification of images.

k-Nearest Neighbor (KNN). Strengths: robust to noisy data; easy to implement; effective results for a large amount of training data; takes no time to learn. Weaknesses: computation cost is high; finding the value of K is difficult. Applications: health care, segmentation, customer service, fraud detection.

Multinomial Naïve Bayes (MNB). Strengths: easy to implement; better results obtained for most classes; small amount of training data required. Weaknesses: probabilities are not accurate; interaction between features cannot be captured. Applications: real-time prediction, spam and ham filtering, sentiment analysis.

Logistic Regression (LR). Strengths: quick to train; works well for categorical data; simple parameter estimation; better for linear data. Weaknesses: not suited to non-linear data; requires a large sample size. Applications: medicine, text editing, hotel booking, financial forecasting.

Random Forest (RF). Strengths: better for large datasets; experimental method for detecting variable interaction. Weaknesses: complex for multiple-valued and uncertain attributes; requires more computational power. Applications: banking sector, healthcare sector, customer intelligence, marketing data.

Before developing the classification model, different techniques are used for preprocessing the input data, and the preprocessed data is then used for training and testing purposes [57,58]. A portion of the data is taken for training and the remainder is used for testing; how the data is divided between training and testing depends on the technique used in training [36,59]. The flow of this machine learning-based text classification is shown in Fig. 1.
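To make the flow in Fig. 1 concrete, the following is a minimal sketch of the two stages described above. The paper does not name its toolkit, so scikit-learn is assumed here, and the tiny in-memory corpus and the choice of Logistic Regression are placeholders rather than the authors' actual setup.

```python
# Minimal sketch of the Fig. 1 flow, assuming scikit-learn; the toy corpus
# and the chosen classifier stand in for the real IMDB/SPAM data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = ["a great movie", "terrible plot", "wonderful acting", "boring film"]
labels = ["pos", "neg", "pos", "neg"]

# (i) preprocessing + vectorization: lowercasing and stop-word removal are
# handled by the vectorizer; stemming/lemmatization would be added upstream.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(texts)

# (ii) split the data, train the classifier, and evaluate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```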
3.1. Machine learning techniques

As we all know, text data is increasing exponentially. Hence, it is not easy to classify it manually, so it is desired to find feasible ways to classify a large amount of data in a short period. The data generated after classification is called information, and this information is then used for future planning of business and industrial applications. In this work, different machine learning algorithms are proposed to be used for text classification, as mentioned in Table 2. It is necessary to determine which machine learning algorithm will provide high accuracy on which dataset type. This comparative analysis will examine the efficiency of various machine learning algorithms and then determine which algorithm is better for which type of data, as we know that different machine learning algorithms classify text data differently. As a result, determining which approach is suitable for a specific dataset type is critical. The detailed definitions of all the applied machine learning techniques are given in the next section.

3.1.1. Support vector machine

The Support Vector Machine is a machine learning technique that can be used for both regression and classification, but it is best suited to classification problems [20,22]. It can classify linear data with the help of the Maximum Margin Hyperplane (MMH), in which the distance is maximum between the data points called support vectors. The two parallel lines separating the data are called the positive and negative hyperplanes, and several of them can be drawn [60]. For non-linear data, a kernel function can be used to form the multi-dimensional hyperplane for classification. Multiple kernel functions are available for classification purposes; researchers have used kernel functions like the String Subsequence Kernel (SSK) and Approximating Kernels (AK), and these two kernels produce a classifier that can classify text data with high accuracy [61]. The support vector machine is computationally effective, with some limitations which reduce its performance for small datasets [31]. There are two types of data classification using SVM: (i) linear data classification and (ii) non-linear data classification.

(i) Linear data classification. To classify linear data using SVM, the Maximum Margin Hyperplane (MMH) is used to separate the two classes of data points; many hyperplanes can be drawn, and it is desired to find the one having the largest distance between the vector points that will accurately classify the data points, as shown in Fig. 2. As depicted in Fig. 2, there are positive and negative hyperplanes, where the positive hyperplane is drawn on the positive data point side and the negative hyperplane is drawn on the negative data point side [62]. It is better to draw the hyperplanes in such a way as to get the maximum margin between the positive and negative hyperplanes.

(ii) Non-linear data classification. The Support Vector Machine (SVM) can also classify non-linear data with the help of a kernel function. It transforms the data to higher dimensions to make the classification, as shown in Fig. 3. There are different types of kernel functions available that can be used for classification purposes [29]. This method has to find the proper kernel function to classify the data points appropriately. When the kernel function is used to classify the data points, it transforms one class of data to a higher dimension, and the decision surface is obtained to classify the data points.
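As an illustration of the two cases, the sketch below fits a linear-kernel and an RBF-kernel SVM on synthetic two-dimensional data; scikit-learn is assumed, and the data is invented purely for demonstration, not taken from the paper.

```python
# Illustrative sketch of linear vs. kernel-based SVM classification,
# assuming scikit-learn; the two-feature synthetic data is a placeholder.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_linear = (X[:, 0] + X[:, 1] > 0).astype(int)                # separable by a line
y_nonlinear = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # circular boundary

# A linear kernel fits a maximum-margin hyperplane directly.
linear_svm = SVC(kernel="linear").fit(X, y_linear)

# An RBF kernel implicitly maps the points to a higher-dimensional space,
# where the circular boundary becomes approximately linearly separable.
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X, y_nonlinear)

print("linear-kernel accuracy:", linear_svm.score(X, y_linear))
print("RBF-kernel accuracy:", rbf_svm.score(X, y_nonlinear))
```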

3.1.2. K-nearest neighbors classifier (KNN)

The k-nearest neighbor algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems [57,63]. This algorithm finds the similarity between the available data and new data, and the new data is classified into the category with the most similarity [64]. The value of K is difficult to analyze, so the classification time of k-NN is longer [33]. It is also called a lazy learner algorithm because it does not learn from the training data immediately, but acts at the time of classification, as shown in Fig. 4 [27].
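A minimal sketch of this majority-voting behaviour follows, assuming scikit-learn; the toy messages and the value k = 3 are illustrative only and are not the paper's settings.

```python
# Sketch of k-NN majority voting over TF-IDF vectors, assuming scikit-learn;
# the tiny corpus and k value are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train_texts = ["win a free prize now", "meeting at noon",
               "free cash offer", "see you tomorrow"]
train_labels = ["spam", "ham", "spam", "ham"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_texts)

# With k=3, each new message receives the majority label of its
# three nearest training vectors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, train_labels)
print(knn.predict(vec.transform(["free prize waiting"])))
```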

241
S.U. Hassan, J. Ahamed and K. Ahmad Sustainable Operations and Computers 3 (2022) 238–248

Fig. 2. Linear data classification using SVM.

Fig. 3. Non-linear data classification using SVM.

242
S.U. Hassan, J. Ahamed and K. Ahmad Sustainable Operations and Computers 3 (2022) 238–248

Fig. 4. Working of k-NN.

3.1.3. Multinomial Naïve Bayes (MNB)

The MNB classification algorithm is used to classify discrete features (e.g., word frequencies for text classification) [65]. The multinomial distribution requires integer feature counts, but in practice fractional values such as TF-IDF may also work [66]. The bag-of-words approach is used, in which each word constitutes a feature and word order is not important. Naïve Bayes is based on Bayes' rule of conditional probability [67], and MNB is mathematically defined by the equation below:

P(h|x) = P(x|h) P(h) / P(x)

where h is the hypothesis and x is the attribute.
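A small sketch of this formulation, assuming scikit-learn; the example documents are placeholders, and MultinomialNB internally applies the Bayes rule stated above to bag-of-words counts.

```python
# Sketch of Multinomial Naive Bayes over bag-of-words counts, assuming
# scikit-learn; the example documents are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["cheap pills online", "project deadline tomorrow",
        "cheap offer online", "lunch with the team"]
labels = ["spam", "ham", "spam", "ham"]

# Word order is ignored: each document becomes a vector of word counts.
vec = CountVectorizer()
X = vec.fit_transform(docs)

# MultinomialNB applies Bayes' rule, P(h|x) proportional to P(x|h)P(h),
# with a multinomial likelihood over the word counts.
mnb = MultinomialNB().fit(X, labels)
print(mnb.predict(vec.transform(["cheap online deal"])))
print(mnb.predict_proba(vec.transform(["cheap online deal"])))
```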
3.1.4. Logistic regression (LR)

Logistic regression is a supervised machine learning algorithm used for classification purposes. It is used when the data is binary, i.e., 0 and 1, meaning that the class is from one category or another. We can use two functions for binary values, viz the logistic function and the sigmoid function [10]. Logistic regression, also termed a classification algorithm, is shown in Fig. 5 [64]. Logistic regression can be classified based on the number of categories, as given below.

(i) Binomial: Only two types of values are possible in the target variable: "0" or "1", which can represent "loss" versus "win," "fail" versus "pass," "alive" versus "dead," etc.
(ii) Multinomial: Three or more types are possible in target variables that are not ordered (i.e., the types have no quantitative significance), like "virus A" versus "virus B" versus "virus C."
(iii) Ordinal: Ordered categories in a target variable; for example, an assessment score can be categorized as "very good," "good," "poor," and "very poor." Here, each category can be given a score like 0, 1, 2, 3, or vice versa.
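The binomial and multinomial cases listed above can be sketched as follows, assuming scikit-learn; the synthetic features and targets are placeholders, not the paper's data.

```python
# Sketch of logistic regression for binary and multinomial targets,
# assuming scikit-learn; the synthetic data is a placeholder.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))

# Binomial case: a 0/1 target modelled through the sigmoid (logistic) function.
y_binary = (X[:, 0] - X[:, 1] > 0).astype(int)
binary_model = LogisticRegression().fit(X, y_binary)

# Multinomial case: three unordered classes (e.g. "virus A"/"virus B"/"virus C");
# scikit-learn handles the multi-class setting automatically.
y_multi = np.digitize(X[:, 2], bins=[-0.5, 0.5])  # classes 0, 1 or 2
multi_model = LogisticRegression(max_iter=1000).fit(X, y_multi)

print(binary_model.predict_proba(X[:2]))
print(multi_model.predict(X[:2]))
```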
3.1.5. Random forest

Several classification algorithms exist, but the random forest (Fig. 6) is one of the best classification algorithms in machine learning. It can also be used as a regression technique but is mainly used for classification because of its diversity and simplicity. It is a combination of learning models, which improves the final result [60,63,68]. Many trees are combined to make a random forest in this machine learning technique, and we obtain higher accuracy if the trees are more uncorrelated [10]. Missing values can be filled using random forest [11]. Further, decision tree classifiers are popular for their outstanding performance, and since a random forest is a collection of decision trees, it becomes more robust and more powerful. A simple decision tree used for classification problems gives good results with high accuracy [69].

4. Results

This section looks at the results of the distinct machine learning algorithms that were applied to two separate datasets. Each algorithm was applied separately to determine its efficiency using various performance indicators such as accuracy, precision, recall, and F1-score. We first need to understand the basic building blocks of these evaluation measures: true positive, true negative, false positive, and false negative.

True positive: a positive class is correctly predicted by the model or classifier. It is represented by TP.
True negative: the model or classifier correctly predicts a negative class. It is represented by TN.
False positive: the model or classifier incorrectly predicts a positive class. It is represented by FP.
False negative: the model or classifier incorrectly predicts a negative class. It is represented by FN.

Accuracy: it is one of the evaluation measures of a machine learning model, indicating how accurately the classifier classifies the data. We calculate accuracy using this formula:

Accuracy = (TP + TN) / (P + N)
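As a quick numeric check of this formula, the snippet below computes accuracy from hypothetical confusion-matrix counts; the numbers are illustrative and are not taken from the paper's experiments.

```python
# Worked example of the accuracy formula with hypothetical counts
# (illustrative numbers, not results from the paper).
tp, tn, fp, fn = 90, 85, 10, 15      # true/false positives and negatives
p, n = tp + fn, tn + fp              # total actual positives and negatives

accuracy = (tp + tn) / (p + n)
print(accuracy)                      # (90 + 85) / (105 + 95) = 0.875
```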


Fig. 5. Logistic regression.

Fig. 6. Classification using random forest.


Fig. 7. Graphical representation of sentiment attributes.

Precision: it tells us how exact our model is (how many of the identified positive classes were correct). We calculate precision using this formula:

Precision = TP / (TP + FP)

Recall: it tells us how complete our model is (how many actual positives were identified correctly). We calculate recall using this formula:

Recall = TP / (TP + FN)

F1-score: the harmonic mean of precision and recall gives a balanced result of precision and recall. We calculate the F1-score using this formula:

F1-score = 2 * (Precision * Recall) / (Precision + Recall)

Table 3
Comparative analysis of ML algorithms.

Algorithm (Dataset) | Accuracy | Precision | Recall | F1-score
SVM (IMDB)  | 85.5 | 85 | 87 | 86
SVM (SPAM)  | 95.5 | 96 | 96 | 95
kNN (IMDB)  | 50.8 | 50 | 72 | 59
kNN (SPAM)  | 98.5 | 99 | 99 | 98
MNB (IMDB)  | 84.4 | 85 | 87 | 86
MNB (SPAM)  | 97.4 | 98 | 97 | 97
RF (IMDB)   | 74.9 | 72 | 81 | 77
RF (SPAM)   | 96.5 | 97 | 97 | 96
LR (IMDB)   | 85.8 | 85 | 87 | 86
LR (SPAM)   | 91.9 | 93 | 92 | 90

4.1. Datasets used

We have used two datasets in this work, collected from online repositories. The datasets are analyzed with the different machine learning algorithms to evaluate the efficiency of each algorithm. The description of the datasets is given in the subsections below.

4.1.1. IMDB dataset

This dataset contains reviews of movies available on the Internet, with 50000 records and two attributes: one is a review and the other is a sentiment, as shown in Fig. 7. This dataset has an equal number of positive and negative sentiments, so it is also called a balanced dataset, which means the data is not skewed.

4.1.2. SPAM dataset

This dataset contains ordinary SMS messages with two labels, ham and spam. It has 50572 records with two attributes, label and message, as shown in Fig. 8. In this dataset, we do not have an equal number of spam and ham labels, so it is also called an unbalanced dataset, which means the data is skewed.

For the IMDB dataset, which has an equal number of positive and negative sentiments, we first cleaned the data using different preprocessing steps such as removal of punctuation, stop words, and frequent words, stemming, and lemmatization [70]. After the preprocessing, the text is converted into vectors using the bag-of-words model and the term frequency-inverse document frequency (TF-IDF) model, and finally the algorithms' efficiency is evaluated using the different metric evaluation methods. In the Spam dataset, on the other hand, the number of ham records is greater than the number of spam records, so for this dataset it is compulsory to evaluate the classifiers using precision, recall, and F1-score. Table 3 presents the accuracy, precision, recall, and F1-score of the algorithms on the IMDB and Spam datasets. A graphical representation of the performance of these algorithms gives a clear view of which machine learning algorithm outperforms the others. On the IMDB dataset, SVM and Logistic Regression have 85.5% and 85.8% accuracies, respectively, as shown in Fig. 9.


Fig. 8. Graphical representations of label attributes.
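A sketch of the cleaning and vectorization steps described above follows, assuming scikit-learn; the two sample reviews stand in for IMDB records, and the crude suffix stripper only hints at proper stemming or lemmatization.

```python
# Sketch of the preprocessing and vectorization pipeline, assuming
# scikit-learn; the sample texts and the toy "stemmer" are placeholders.
import re
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

def clean(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)                        # remove punctuation
    words = [w for w in text.split() if len(w) > 2]             # drop very short tokens
    words = [w[:-1] if w.endswith("s") else w for w in words]   # toy "stemming"
    return " ".join(words)

reviews = ["This movie was wonderful!", "Worst plot, boring scenes..."]
cleaned = [clean(r) for r in reviews]

bow = CountVectorizer(stop_words="english").fit_transform(cleaned)    # bag of words
tfidf = TfidfVectorizer(stop_words="english").fit_transform(cleaned)  # TF-IDF weights
print(bow.toarray())
print(tfidf.toarray())
```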

Fig. 9. Accuracy of selected ML algorithms on IMDB dataset.


Fig. 10. Accuracy of selected ML algorithms on Spam dataset.
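Charts like Figs. 9 and 10 can be reproduced from Table 3; the sketch below assumes matplotlib, which the paper does not name, and uses only the accuracy values listed in Table 3.

```python
# Sketch of a grouped bar chart like Figs. 9 and 10, using the accuracy
# values from Table 3; matplotlib is assumed.
import matplotlib.pyplot as plt
import numpy as np

algorithms = ["SVM", "kNN", "MNB", "RF", "LR"]
imdb_acc = [85.5, 50.8, 84.4, 74.9, 85.8]
spam_acc = [95.5, 98.5, 97.4, 96.5, 91.9]

x = np.arange(len(algorithms))
plt.bar(x - 0.2, imdb_acc, width=0.4, label="IMDB")
plt.bar(x + 0.2, spam_acc, width=0.4, label="SPAM")
plt.xticks(x, algorithms)
plt.ylabel("Accuracy (%)")
plt.legend()
plt.show()
```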

On the Spam dataset, the k-nearest neighbor outperforms the other classifiers with an accuracy of 98.5%, while the remaining algorithms have broadly similar accuracies: the support vector machine reaches 95.5%, multinomial naïve Bayes 97.4%, and random forest 96.5%. Further, Logistic Regression has 91.9% accuracy, which is the least among all classifiers, as shown in Fig. 10. The results on the various machine learning performance metrics are given in Table 3.

5. Limitations and future work

In the future, this research can be expanded to include more algorithms with hyperparameter tuning and ensemble approaches. To support effective information discovery, the models can also be implemented with novel strategies for parameter optimization. In the area of text classification, streaming data processing has been rather underexplored and needs to be examined closely. As a result, if used correctly, ensemble and calibrated approaches will benefit text classification.

6. Conclusion

The most significant part of natural language processing is text classification, which automatically categorizes text data into a set of desirable categories. Machine learning-based techniques are essential for text classification. Therefore, this study uses five algorithms, Support Vector Machine, k-Nearest Neighbor, Logistic Regression, Multinomial Naïve Bayes, and Random Forest, and two datasets, IMDB and Spam. The results reveal that, out of the developed models, the k-NN model outperforms the other models on the Spam dataset with an accuracy of 98.5%. In contrast, the LR model surpasses the other models on the IMDB dataset with an accuracy of 85.8% using the proposed system.

Declaration of Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Sayar Ul Hassan: Conceptualization, Methodology, Software, Visualization. Jameel Ahamed: Data curation, Writing – original draft, Supervision. Khaleel Ahmad: Writing – review & editing.

References

[1] S.P. Nayat, L. Marti, C.B. Garcia, Text classification techniques in oil industry applications, Adv. Intell. Syst. Comput. 239 (January) (2014) v–vi, doi:10.1007/978-3-319-01854-6.
[2] M. Ikonomakis, S. Kotsiantis, V. Tampakas, Text classification using machine learning techniques, WSEAS Trans. Comput. 4 (8) (2005) 966–974, doi:10.11499/sicejl1962.38.456.
[3] A. Wilkinson, N. Wenger, L.R. Shugarman, Literature review on advance directives, US Department of Health and Human Services, Washington, DC, 2007.
[4] E. Uysal, A. Ozturk, Comparison of machine learning algorithms on different datasets, in: 26th IEEE Signal Processing and Communications Applications Conference, SIU 2018, No. ICIC 2017, 2018, pp. 1–4, doi:10.1109/SIU.2018.8404193.
[5] J. Wang, Y. Li, J. Shan, J. Bao, C. Zong, L. Zhao, Large-scale text classification using scope-based convolutional neural network–A deep learning approach, IEEE Access 7 (2019) 171548–171558, doi:10.1109/ACCESS.2019.2955924.
[6] X. Luo, Efficient English text classification using selected machine learning techniques, Alex. Eng. J. 60 (3) (2021) 3401–3409, doi:10.1016/j.aej.2021.02.009.
[7] L. Wei, B. Wei, B. Wang, Text classification using support vector machine with mixture of kernel, Journal of Software Engineering and Applications 5 (2012) 55.
[8] C.N. Kamath, S.S. Bukhari, A. Dengel, Comparative study between traditional machine learning and deep learning approaches for text classification, in: Proceedings of the ACM Symposium on Document Engineering 2018, 2018, pp. 1–11.
[9] M. Trivedi, S. Sharma, N. Soni, S. Nair, Comparison of text classification algorithms, International Journal of Engineering Research & Technology (IJERT) 4 (02) (2015).
[10] A. Mohi, U. Din, K. Syed, T. Rabani, Q. Rayees, Machine learning based approaches for detecting COVID-19 using clinical text data, Int. J. Inf. Technol. 12 (3) (2020) 731–739, doi:10.1007/s41870-020-00495-9.
[11] D. Mahesh Matta, M. Kumar Saraf, S. Memeti, Prediction of COVID-19 using machine learning techniques, 2020.
[12] C.C. Aggarwal, C.X. Zhai, Mining Text Data, 2013.
[13] A. Sarkar, S. Chatterjee, W. Das, D. Datta, Text classification using support vector machine, International Journal of Engineering Science Invention 4 (11) (2015) 33–37.
[14] M. Gupta, A. Bansal, B. Jain, J. Rochelle, A. Oak, M.S. Jalali, Whether the weather will help us weather the COVID-19 pandemic–Using machine learning to measure Twitter users' perceptions, Int. J. Med. Inform. 145 (2021) 104340, doi:10.1016/j.ijmedinf.2020.104340.


[15] H.B. Syeda, M. Syed, K.W. Sexton, S. Syed, S. Begum, F. Syed, ..., F. Yu Jr., Role of machine learning techniques to tackle the COVID-19 crisis: systematic review, JMIR Medical Informatics 9 (1) (2021) e23811.
[16] D. Nagar, S. Raghav, A. Bhardwaj, R. Kumar, P. Lata Singh, R. Sindhwani, Machine learning–Best way to sustain the supply chain in the era of Industry 4.0, Mater. Today Proc. 47 (2021) 3676–3682, doi:10.1016/j.matpr.2021.01.267.
[17] M.A. Kadampur, S. Al Riyaee, Skin cancer detection–Applying a deep learning based model driven architecture in the cloud for classifying dermal cell images, Inform. Med. Unlocked 18 (2020) 100282, doi:10.1016/j.imu.2019.100282.
[18] N.F. Hordri, S.S. Yuhaniz, N.F.M. Azmi, S.M. Shamsuddin, Handling class imbalance in credit card fraud using resampling methods, Int. J. Adv. Comput. Sci. Appl. 9 (11) (2018) 390–396, doi:10.14569/ijacsa.2018.091155.
[19] K. Crowston, F. Bolici, Impacts of machine learning on work, Proc. Annu. Hawaii Int. Conf. Syst. Sci. 2019-January (2019) 5961–5970, doi:10.24251/hicss.2019.719.
[20] B.S. Singh, S.A. Nayyar, A review paper on algorithms used for text classification, International Journal of Application or Innovation in Engineering & Management (IJAIEM) 2 (3) (2013).
[21] M. Ikonomakis, S. Kotsiantis, V. Tampakas, Text classification using machine learning techniques, WSEAS Transactions on Computers 4 (8) (2005) 966–974.
[22] A.I. Anik, S. Yeaser, A.I. Hossain, A. Chakrabarty, Player's performance prediction in ODI cricket using machine learning algorithms, in: 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), IEEE, 2018, pp. 500–505.
[23] K. Nigam, A. McCallum, T.M. Mitchell, Semi-supervised text classification using EM, 2006.
[24] I. Rasheed, V. Gupta, H. Banka, C. Kumar, Urdu text classification: A comparative study using machine learning techniques, in: 2018 Thirteenth International Conference on Digital Information Management (ICDIM), IEEE, 2018, pp. 274–278.
[25] N. Aljedani, R. Alotaibi, M. Taileb, HMATC: Hierarchical multi-label Arabic text classification model using machine learning, Egyptian Informatics Journal 22 (3) (2021) 225–237.
[26] Y. Zhan, H. Chen, S.F. Zhang, M. Zheng, Chinese text categorization study based on feature weight learning, in: 2009 International Conference on Machine Learning and Cybernetics, 3, IEEE, 2009, pp. 1723–1726.
[27] J. Sreemathy, P.S. Balamurugan, An efficient text classification using KNN and Naive Bayesian, International Journal on Computer Science and Engineering 4 (3) (2012) 392.
[28] S. Mayor, B. Pant, Document classification using support vector machine, International Journal of Engineering Science and Technology 4 (4) (2012).
[29] F. Colas, P. Brazdil, Comparison of SVM and some older classification algorithms in text classification tasks, IFIP Int. Fed. Inf. Process. 217 (2006) 169–178, doi:10.1007/978-0-387-34747-9_18.
[30] S. Tong, D. Koller, Support vector machine active learning with applications to text classification, pp. 45–66, 2001.
[31] J. Shawe-Taylor, C. Watkins, Text classification using string kernels.
[32] B. Trstenjak, S. Mikac, D. Donko, KNN with TF-IDF based framework for text categorization, Procedia Eng. 69 (2014) 1356–1364, doi:10.1016/j.proeng.2014.03.129.
[33] L. Baoli, Y. Shiwen, L. Qin, An improved k-nearest neighbor algorithm, in: Proc. 20th Int. Conf. Comput. Process. Orient. Lang., 2003.
[34] E.M. Elnahrawy, Log-based chat room monitoring using text categorization–A comparative study, in: IASTED Int. Conf. Inf. Knowl. Shar. (IKS 2002), 2002.
[35] G. Khazal, A. Zamyatin, Feature engineering for Arabic text classification, J. Eng. Appl. Sci. 14 (7) (2019) 2292–2301, doi:10.36478/jeasci.2019.2292.2301.
[36] S. Vijayarani, M.N. Nithya, Efficient machine learning classifiers for automatic information classification, Int. J. Mod. Trends Eng. Res. (2015) 685–694.
[37] B. Agarwal, N. Mittal, Text classification using machine learning methods–A survey, in: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012, Springer, New Delhi, 2014, pp. 701–709.
[38] S.H. Jambukia, V.K. Dabhi, H.B. Prajapati, ECG beat classification using machine learning techniques, Int. J. Biomed. Eng. Technol. 26 (1) (2018) 32–53, doi:10.1504/IJBET.2018.089255.
[39] "Machine learning applications based on SVM classification."
[40] Parul Sinha, Poonam Sinha, Comparative study of chronic kidney disease prediction using KNN and SVM, Int. J. Eng. Res. V4 (12) (2015) 608–612, doi:10.17577/ijertv4is120622.
[41] I. Ibrahim, A. Abdulazeez, The role of machine learning algorithms for diagnosing diseases, J. Appl. Sci. Technol. Trends 2 (01) (2021) 10–19, doi:10.38094/jastt20179.
[42] M. Elbadawi, S. Gaisford, A.W. Basit, Advanced machine-learning techniques in drug discovery, Drug Discovery Today 26 (3) (2021) 769–777.
[43] S.V. Praveen, R. Ittamalla, G. Deepak, Analyzing Indian general public's perspective on anxiety, stress and trauma during Covid-19–A machine learning study of 840,000 tweets, Diabetes & Metabolic Syndrome: Clinical Research & Reviews 15 (3) (2021) 667–671.
[44] A.M.U.D. Khanday, Q.R. Khan, S.T. Rabani, Detecting textual propaganda using machine learning techniques, Baghdad Sci. J. 18 (1) (2021) 199–209.
[45] A.M.U.D. Khanday, Q.R. Khan, S.T. Rabani, Identifying propaganda from online social networks during COVID-19 using machine learning techniques, International Journal of Information Technology 13 (1) (2021) 115–122.
[46] N. Yadav, O. Kudale, A. Rao, S. Gupta, A. Shitole, Twitter sentiment analysis using supervised machine learning, Lect. Notes Data Eng. Commun. Technol. 57 (2021) 631–642, doi:10.1007/978-981-15-9509-7_51.
[47] A.A.A. Ahmed, A. Aljabouh, P.K. Donepudi, M.S. Choi, Detecting fake news using machine learning: A systematic literature review, arXiv preprint arXiv:2102.04458, 2021.
[48] Y. HaCohen-Kerner, D. Miller, Y. Yigal, The influence of preprocessing on text classification using a bag-of-words representation, PLoS One 15 (5) (2020), doi:10.1371/journal.pone.0232525.
[49] Z. Jiang, B. Gao, Y. He, Y. Han, P. Doyle, Q. Zhu, Text classification using novel term weighting scheme-based improved TF-IDF for Internet media reports, Mathematical Problems in Engineering 2021 (2021).
[50] I.J. Jacob, Performance evaluation of caps-net based multitask learning architecture for text classification, Journal of Artificial Intelligence 2 (01) (2020) 1–10.
[51] F. Rustam, M. Khalid, W. Aslam, V. Rupapara, A. Mehmood, G.S. Choi, A performance comparison of supervised machine learning models for Covid-19 tweets sentiment analysis, PLoS One 16 (2) (2021) e0245909.
[52] D. Kumar, A.C. Gopesh, M.P. Singh, Restaurant review classification and analysis.
[53] M. Nabeel Asim, M. Usman Ghani, M.A. Ibrahim, S. Ahmad, W. Mahmood, A. Dengel, Benchmark performance of machine and deep learning based methodologies for Urdu text document classification, arXiv e-prints, arXiv-2003, 2020.
[54] J. Kaur, J.R. Saini, A study of text classification natural language processing algorithms for Indian languages, VNSGU J. Sci. Technol. 4 (1) (2015) 162–167.
[55] M. Abid, A. Habib, J. Ashraf, A. Shahid, Urdu word sense disambiguation using machine learning approach, Cluster Computing 21 (1) (2018) 515–522.
[56] S.K. Singh, N. Katal, S.G. Modani, Multi-objective optimization of PID controller for coupled-tank liquid-level control system using genetic algorithm, in: Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012, Springer, New Delhi, 2014, pp. 59–66.
[57] B. Maram, G. Padmapriya, A.R. Satish, A framework for performance analysis on machine learning algorithms using Covid-19 dataset, Adv. Math.: Sci. J. 9 (10) (2020) 8207–8215.
[58] A.I. Kadhim, An evaluation of preprocessing techniques for text classification, Int. J. Comput. Sci. Inf. Secur. 16 (6) (2018) 22–32.
[59] M.A. Rosid, A.S. Fitrani, I.R.I. Astutik, N.I. Mulloh, H.A. Gozali, Improving text preprocessing for student complaint document classification using Sastrawi, in: IOP Conference Series: Materials Science and Engineering, 874, IOP Publishing, 2020.
[60] R.I. Kurnia, Y.D. Tangkuman, A.S. Girsang, Classification of user comment using word2vec and SVM classifier, Int. J. Adv. Trends Comput. Sci. Eng. 9 (1) (2020) 643–648, doi:10.30534/ijatcse/2020/90912020.
[61] A. Balinsky, H. Balinsky, S. Simske, Rapid change detection and text mining, in: Proceedings of the 2nd Conference on Mathematics in Defence (IMA), Defence Academy, UK, 2011.
[62] A.I. Kadhim, Survey on supervised machine learning techniques for automatic text classification, Artificial Intelligence Review 52 (1) (2019) 273–292.
[63] Y. Zheng, An exploration on text classification with classical machine learning algorithm, in: 2019 Int. Conf. Mach. Learn. Big Data Bus. Intell., 2019, pp. 81–85, doi:10.1109/MLBDBI48998.2019.00023.
[64] R. Jindal, Techniques for text classification–Literature review and current trends, Webology 12 (2) (2015) 1–28.
[65] B. Lopez, X. Sumba, IMDb sentiment analysis, 2019, pp. 2–6.
[66] M. Usman, S. Ayub, Urdu text classification using majority voting, vol. 7, no. 8, pp. 265–273, 2016.
[67] M. Bilal, H. Israr, M. Shahid, A. Khan, Sentiment classification of Roman-Urdu opinions using Naïve Bayesian, Decision Tree and KNN classification techniques, J. King Saud Univ. Inf. Sci. 28 (3) (2015) 330–344, doi:10.1016/j.jksuci.2015.11.003.
[68] A. Aggarwal, J. Singh, K. Gupta, A review of different text categorization techniques, 7, 2018, pp. 11–15.
[69] B. Charbuty, A. Abdulazeez, Classification based on decision tree algorithm for machine learning, J. Appl. Sci. Technol. Trends 2 (01) (2021) 20–28, doi:10.38094/jastt20165.
[70] B. Mathiak, S. Eckstein, Five steps to text mining in biomedical literature, in: Proceedings of the European Workshop on Data Mining and Text Mining for Bioinformatics, 2004, pp. 47–50.
