Mahatma Jyotiba Phule Rohilkhand University, Bareilly
Institute of Engineering and Technology

Department of Computer Science and Information Technology

(2023-24)

Project Report On
EMAIL SPAM DETECTION.

UNDER THE GUIDANCE OF –

Dr. BRAJESH KUMAR

Submitted By –

Aditya Gupta (220089020026)
Ramanand Kumar Gupt (220089020062)
Akshay Pratap Singh (220089020029)
Surendra (220089020068)

ACKNOWLEDGEMENT

We extend our sincere and heartfelt thanks to our esteemed guide, Dr. Brajesh Kumar, for
his exemplary guidance, monitoring, and constant encouragement throughout the course, and
for showing us the right way at crucial junctures.

We would like to thank our respected Head of the Division, [Link] Rishiwal, for
allowing us to use the facilities available. We would also like to thank the other faculty members.
Last but not least, we would like to thank our friends and family for the support and
encouragement they have given us during the course of our work.

We also wish to thank all the teaching and non-teaching staff members of the
Department of Computer Science and Information Technology, who helped in many ways
with the completion of the project.

Aditya Gupta (21CS23)
Ramanand Kumar Gupt (21CS28)
Akshay Pratap Singh (21CS30)
Surendra (21CS59)

INDEX

1. INTRODUCTION
1.1 CONTEXT
1.2 MOTIVATION
1.3 OBJECTIVE

2. LITERATURE REVIEW
2.1 INTRODUCTION
2.2 RELATED WORK

3. METHODOLOGY
3.1 METHODOLOGY

4. ALGORITHM
4.1 ALGORITHM

5. FLOW CHART
5.1 FLOWCHART

6. DESCRIPTION OF DATASET
6.1 SPAM DETECTION USING MACHINE LEARNING

7. RESULT

8. CONCLUSION

9. REFERENCES

ABSTRACT

Nowadays communication plays a major role in everything, be it professional or
personal. Email is used extensively because it is free or low-cost to use,
accessible, and popular. Email has one major security flaw: anyone can send an
email to anyone else just by obtaining their unique user ID. This flaw is
exploited by some businesses and ill-motivated persons for advertising, phishing,
other malicious purposes, and ultimately fraud. This produces a category of email
called spam.

Spam refers to any email that contains advertisements or that is unrelated and
frequent. Such emails are increasing in number day by day. Studies show that
around 55 percent of all emails are some kind of spam. Service providers put a
lot of effort into combating it, but spam keeps evolving by changing the obvious
markers used for detection. Moreover, the spam detection of service providers can
never be aggressive in its classification, because a misclassification may cause
loss of information.

To tackle this problem, we present a new and efficient method to detect spam
using machine learning and natural language processing: a tool that can detect
and classify spam. In addition, it provides information about the supplied text
in a quick-view format for user convenience.

1. INTRODUCTION

1.1 CONTEXT
Email spam detection is a crucial aspect of ensuring the security and efficiency of
email communication. Spam refers to unsolicited and often irrelevant or inappropriate
messages sent over the internet, typically to a large number of users, for advertising,
phishing, spreading malware, or other malicious purposes. Detecting and filtering out
spam emails is essential to protect users from potential security threats and to maintain
the integrity of email communication.

1.2 MOTIVATION
The motivation behind email spam detection projects lies in addressing
several important concerns and challenges associated with the
increasing volume of spam emails. The motivation for implementing and
continuously improving email spam detection systems stems from a
combination of user-centric, security, resource optimization,
compliance, and business continuity considerations. The ongoing
evolution of spamming techniques reinforces the need for continuous
improvement and innovation in spam detection technologies.

1.3 OBJECTIVE

The primary objectives of email spam detection projects revolve around
enhancing the security, efficiency, and user experience associated with
email communication. These objectives are multifaceted, encompassing
security, user experience, resource optimization, compliance, and
adaptability to emerging threats. They aim to create a secure and
efficient email communication environment for both individuals and
organizations.

2. LITERATURE REVIEW

2.1 Introduction

This chapter reviews the literature on the machine learning classifiers used in
previous research and projects. It is not merely information gathering; it
summarizes the prior research related to this project. It involves searching for,
reading, analyzing, summarizing, and evaluating reading materials relevant to the
project.

A lot of research has been done on spam detection using machine learning. However,
because spam keeps evolving and various technologies keep developing, the
proposed methods are not dependable. Natural language processing is one of
the lesser-known fields of machine learning, which is reflected in the
comparatively small body of work that applies it here.

2.2 Related work

Spam classification is a problem that is neither new nor simple. A lot of research has
been done and several effective methods have been proposed.

M. Raza, N. D. Jayasinghe, and M. M. A. Muslam analyzed various techniques
for spam classification and concluded that naïve Bayes and support vector machines
have higher accuracy than the rest, consistently around 91% [1].

S. Gadde, A. Lakshmanarao, and S. Satyanarayana, in their paper on spam detection,
concluded that the LSTM system achieved a higher accuracy of 98% [2].

P. Sethi, V. Bhandari, and B. Kohli concluded that machine learning algorithms perform
differently depending on the presence of different attributes [3].

H. Karamollaoglu, İ. A. Dogru, and M. Dorterler performed spam classification on
Turkish messages and emails using both naïve Bayes and support vector machine
classifiers and concluded that the accuracy of both models was around 90% [4].

P. Navaney, G. Dubey, and A. Rana compared the SVM, naïve Bayes, and entropy
methods and found that the SVM had the highest accuracy (97.5%) of the three
models [5].

S. Nandhini and J. Marseline K.S., in their paper on the best model for spam detection,
concluded that the random forest algorithm beats the others in accuracy and that KNN
beats them in building time [6].

S. O. Olatunji concluded that while SVM outperforms ELM in terms of accuracy,
ELM beats SVM in terms of speed [7].

M. Gupta, A. Bakliwal, S. Agarwal, and P. Mehndiratta studied classical machine learning
classifiers and concluded that a convolutional neural network outperforms the classical
machine learning methods by a small margin but takes more time for classification [8].

N. Kumar, S. Sonowal, and Nishant concluded in their paper that the naïve Bayes
algorithm performs best but has class-conditional limitations [9].

T. Toma, S. Hassan, and M. Arifuzzaman studied various types of naïve Bayes
algorithms and found that the multinomial naïve Bayes classification
algorithm has better accuracy than the rest, at 98% [10].

F. Hossain, M. N. Uddin, and R. K. Halder concluded in their study that
machine learning models outperform deep learning models when it comes to
spam classification, and that ensemble models outperform individual models in
terms of accuracy and precision [11].

3. METHODOLOGY

Data cleaning
Data cleaning, a part of data preprocessing (also called data wrangling), is a
crucial step in the machine learning pipeline. It involves identifying and
correcting errors, inconsistencies, and missing values in the dataset to ensure
that the data is suitable for training machine learning models. Common tasks
include removing duplicate rows, handling missing values, and correcting
inconsistent labels.
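
As a rough illustration of these tasks, the sketch below cleans a handful of labelled messages using only the Python standard library. The rows in `RAW_ROWS`, the helper `clean_rows`, and the 0/1 target encoding are all hypothetical and chosen for illustration; the report itself works with pandas on a Kaggle dataset.

```python
# Hypothetical raw rows: (label, text) pairs as they might come out of a CSV export.
RAW_ROWS = [
    ("ham", "Are we still meeting at 5?"),
    ("spam", "WINNER!! Claim your free prize now"),
    ("spam", "WINNER!! Claim your free prize now"),  # exact duplicate
    ("ham", ""),                                      # empty text
    ("ham", None),                                    # missing text
    ("SPAM ", "Lowest price loans, apply today"),     # untidy label
]


def clean_rows(rows):
    """Drop rows with missing text, normalise labels, and remove duplicates."""
    seen = set()
    cleaned = []
    for label, text in rows:
        if not text or not text.strip():
            continue  # skip missing or empty messages
        label = label.strip().lower()
        if label not in {"ham", "spam"}:
            continue  # skip rows whose label cannot be interpreted
        key = (label, text.strip())
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        # Encode the target as a number: 1 for spam, 0 for ham.
        cleaned.append({"text": text.strip(), "target": 1 if label == "spam" else 0})
    return cleaned


CLEANED = clean_rows(RAW_ROWS)
```

Of the six raw rows, three survive: the duplicate and the two rows without text are dropped, and the untidy label is normalised before encoding.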

Exploratory Data Analysis

EDA stands for Exploratory Data Analysis, and it is a critical step in the process
of understanding and preparing data for machine learning. EDA involves
examining and visualizing the dataset to gather insights, discover patterns, and
identify relationships among variables. This process helps in making informed
decisions about data preprocessing, feature engineering, and model selection.
Typical techniques include summary statistics, checks of the class balance, and
plots such as histograms.

Text preprocessing
Text preprocessing is a crucial step in preparing text data for machine learning
applications. It involves cleaning and transforming raw text into a format that is
suitable for analysis and model training. Common steps include converting text to
lowercase, tokenization, and removing punctuation and stop words.
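
A minimal sketch of such a preprocessing step is shown below, assuming whitespace tokenization and a tiny hand-written stop-word list. The function name `preprocess` and the contents of `STOP_WORDS` are illustrative assumptions; a real project would more likely use the tokenizer and stop-word list from NLTK.

```python
import string

# A tiny illustrative stop-word list (an assumption, not the list used in the report).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "your"}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stop words."""
    lowered = text.lower()
    # Remove every ASCII punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


TOKENS = preprocess("Congratulations! You are the WINNER of a FREE prize.")
```

For the sample sentence, `TOKENS` holds the four content words `congratulations`, `winner`, `free`, and `prize`.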

Model building
Building a machine learning model involves selecting an appropriate algorithm,
preparing the data, training the model, and evaluating its performance.

Evaluation
Evaluation in machine learning involves assessing the performance of a model on
a dataset. The goal is to understand how well the model generalizes to new, unseen
data.
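
One common way to estimate performance on unseen data is to hold out part of the dataset for testing. The sketch below is a standard-library version of such a split; the name `holdout_split`, the 20% test fraction, and the fixed seed are assumptions made for illustration and are not taken from the report.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


ROWS = list(range(10))
TRAIN, TEST = holdout_split(ROWS, test_fraction=0.2)
```

The model is then fitted on `TRAIN` only and scored on `TEST`, so the score reflects messages the model has not seen during training.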

Improvement
Improving machine learning models involves optimizing various aspects of the
model, the data, and the training process to enhance performance and
generalization.

Deployment
Deploying a machine learning model means making the trained model
available for use in a real-world environment, where it can make predictions on
new, unseen data. The deployment process can vary depending on the type of
application (web service, mobile app, edge device) and the specific requirements
of the project.
4. ALGORITHM

The following algorithms are used in combination for classification.

K-Nearest Neighbors
KNN is a supervised classification algorithm. All data points are assumed to lie
in an n-dimensional space, and the category of a new data point is determined by
the majority category among its nearest neighbors. Euclidean distance is used to
measure the distance between points.
The distance between two points (x_1, y_1) and (x_2, y_2) is calculated as

$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

The distances between the unknown point and all the other points are calculated.
Depending on the value of K provided, the K closest neighbors are determined. The
category to which the majority of these neighbors belong is selected as the
category of the unknown point.
If the data contains up to three features, the points can be plotted and
visualized. Prediction is fairly slow compared with algorithms such as SVM,
because KNN must compute the distance to every point to find the closest
neighbors of the given point.
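
The procedure described above can be sketched in a few lines of plain Python. The functions `euclidean` and `knn_predict`, the toy points, and the choice of K = 3 are illustrative assumptions, not the implementation used in the report.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Distance from the query to every training point (this is why KNN is slow).
    distances = sorted(
        ((euclidean(point, query), label)
         for point, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters in two dimensions.
POINTS = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
LABELS = ["ham", "ham", "ham", "spam", "spam", "spam"]
```

With this toy data, a query near the origin is labelled ham and a query near (5, 5) is labelled spam.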

Naïve Bayes Classifier

A naïve Bayes classifier is a supervised probabilistic machine learning model used
for classification tasks. The main principle behind this model is Bayes' theorem.

Bayes' Theorem:
Naïve Bayes is a classification technique based on Bayes' theorem with the
assumption that all the features that predict the target value are independent of each
other. It calculates the probability of each class and then picks the one with the highest
probability.

Although the independence assumption rarely holds in real-world data, the classifier
often works well in practice. This assumption is why the method is called "naïve"
[14].

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
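
To make the theorem concrete, the sketch below computes the probability that a message is spam given that it contains a particular word. The helper `posterior` and all the probabilities passed to it are made-up illustrative values, not measurements from the report's dataset.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Bayes' theorem for a binary event A and an observation B.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    """
    prior_not_a = 1.0 - prior_a
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * prior_not_a)
    return likelihood_b_given_a * prior_a / evidence


# Assumed values: 30% of mail is spam; the word "free" appears in 60% of spam
# messages and in 5% of ham messages.
P_SPAM_GIVEN_FREE = posterior(0.3, 0.6, 0.05)
```

With these assumed numbers the posterior is 0.18 / 0.215, or roughly 0.84, so seeing the word raises the spam probability from 30% to about 84%.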
Extra Trees Classifier (ETC)

Extra Trees is an ensemble learning method that is similar to Random Forests. It builds
multiple decision trees and merges their predictions. The main difference lies in the way
the trees are constructed.
In Extra Trees, each decision tree is built from the entire dataset using random thresholds
for feature splits. This randomness helps to make the algorithm more robust and less
prone to overfitting.

Random Forest Classifier

A random forest classifier is a supervised ensemble algorithm. A random forest consists
of multiple random decision trees. Two types of randomness are built into the trees.
First, each tree is built on a random sample from the original data. Second, at each tree
node, a subset of features is randomly selected to generate the best split [16].

Decision Tree:
The decision tree is a classification algorithm based entirely on features. The
tree repeatedly splits the data on the feature with the best information gain. This
process continues until further splits no longer improve the information gain.
Unknown data is then evaluated feature by feature until it is categorized. Tree
pruning techniques are used to improve accuracy and reduce overfitting.
In a random forest, several decision trees are created on subsets of the data,
and the result given by the majority of the trees is taken as the final result.
The number of trees to create is determined from accuracy and other metrics
through iterative methods. Random forest classifiers are mainly used on
condition-based data, but they also work for text if the text is converted into
numerical form.
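
The split criterion mentioned above, information gain, can be sketched as the drop in entropy after partitioning the labels by a feature. The functions `entropy` and `information_gain` and the toy features below are illustrative assumptions, not code from the report.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature_values):
    """Entropy of labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


LABELS = ["spam", "spam", "ham", "ham"]
CONTAINS_FREE = [1, 1, 0, 0]   # hypothetical feature that separates the classes
HAS_GREETING = [1, 0, 1, 0]    # hypothetical feature that carries no information
```

A tree would prefer to split on `CONTAINS_FREE`, whose gain is one full bit, over `HAS_GREETING`, whose gain is zero.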

Support Vector Machines (SVM)

SVM is a machine learning algorithm for classification. Decision boundaries are
drawn between the categories, and the category of a point is determined by which
side of the boundary it falls on.

AdaBoost
AdaBoost, short for Adaptive Boosting, is an ensemble learning algorithm used in
machine learning for classification and regression tasks. It was introduced by Yoav
Freund and Robert Schapire in 1996. The primary idea behind AdaBoost is to
combine the predictions of weak learners (usually simple models) to create a
strong learner that performs well on the overall task.

Logistic Regression
Logistic regression is a supervised machine learning algorithm that models the
probability of a certain class or event. It is used when the outcome is binary
(dichotomous) and works best when the classes are approximately linearly
separable [17]. The probabilities are calculated using a sigmoid function.
For example, consider a problem where the data has n features. We fit a linear
function to the data, represented by the equation
$z = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$
Here z is the logarithm of the odds (the log-odds), where the odds are calculated as
$\text{odds} = \frac{p(\text{event occurring})}{p(\text{event not occurring})}$
Applying the sigmoid function to z recovers the probability, $p = 1 / (1 + e^{-z})$.
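
The relationship between the linear score z, the sigmoid function, and the odds can be checked numerically. In the sketch below, z is treated as the logarithm of the odds, which is the standard reading of logistic regression; the weights, bias, and feature values are made-up illustrative numbers.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    # z = b_0 + b_1 x_1 + ... + b_n x_n
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def odds(p):
    """Odds of an event with probability p (p must be below 1)."""
    return p / (1.0 - p)


# Hypothetical model with two features: z = 0.5 + 2.0 * 1.0 - 1.0 * 0.5 = 2.0
P = predict_proba([2.0, -1.0], 0.5, [1.0, 0.5])
```

Here `math.log(odds(P))` returns the original score of 2.0 (up to rounding), which shows that the linear part of the model is the log of the odds.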

Gradient Boosting Decision Trees (GBDT)

Gradient Boosting Decision Trees (GBDT) is an ensemble learning algorithm used
for both classification and regression tasks. It is a popular and powerful machine
learning technique that builds a strong predictive model by combining the
predictions of multiple weak learners, typically decision trees.
5. FLOW CHART

5.1 FLOWCHART

[Figure: flowchart of the model]

6. DESCRIPTION OF DATASET

Modules:

NumPy
NumPy is a powerful, open-source library for the Python programming language that
provides support for large, multi-dimensional arrays and matrices of numerical data, as
well as a large collection of mathematical functions to operate on these arrays. It is
widely used in scientific computing, data analysis, machine learning, and other related
fields. One of the main features of NumPy is its n-dimensional array object, which is
used to store and manipulate large arrays of numerical data.

Pandas
Pandas allows us to analyze large datasets and draw conclusions based on statistical theory.
Pandas can clean messy datasets and make them readable and relevant.
Relevant data is very important in data science.

Scikit-Learn
Scikit-learn, also known as sklearn, is a Python library for machine learning
and statistical modelling. Through scikit-learn, we can implement various
machine learning models for regression, classification, and clustering, and use
statistical tools for analyzing these models.

Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. It can produce publication-quality plots and interactive
figures that can zoom, pan, and update.

Pyplot
pyplot is a collection of command-style functions that make Matplotlib work like
MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure,
creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot
with labels, etc.

NLTK (Natural Language Toolkit)

NLTK (Natural Language Toolkit) is the go-to API for NLP (Natural Language
Processing) with Python. It is a powerful tool for preprocessing text data for further
analysis, for instance with ML models. It helps prepare text so that it can be converted
into numbers, which the model can then easily work with.

6.1 SPAM DETECTION USING MACHINE LEARNING

• A dataset from Kaggle is used to train the algorithms, as shown below.

[Figure not recovered]
• The dataset has many fields, and some of its columns are not required. We
remove the columns that are not required and rename the remaining ones.

[Figure: … of dataset]

• NLTK (Natural Language Toolkit) is used for text processing, Matplotlib for
plotting graphs such as histograms and bar plots, Word Cloud for presenting text
data, pandas for data manipulation and analysis, and NumPy for mathematical and
scientific operations. The packages used in the proposed model are shown below.

[Figure not recovered]
• Split the data into training and testing sets as shown below. A
percentage of the dataset is used as the training set and the rest as the test set.

[Figure: … dataset]
• Reset the train and test indices, as shown in Fig. 6.

Fig.6. Reset train and test index

• We need to find the most frequently repeated words in the spam and ham
messages. The Word Cloud library is used for this.
• Whenever a message arrives, we must first preprocess it by converting all
input characters to lowercase.

• The text is then split into small pieces and the punctuation is removed.
Tokenization is the process used to split messages and remove punctuation.

• We need to find the probability of the word in spam and ham messages.

Fig.10. Ham and spam probability

• Plot the histogram.

[Figure: … graph]
[Figure: … graph]

• Exploratory data analysis (EDA)

[Link]
[Link] and Visualization
When a message arrives in the inbox, it is exported to the dataset and then
classified as spam or not spam.
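
The step of finding the probability of each word in spam and ham messages can be sketched as below. The function `word_probabilities`, the three toy messages, and the use of add-one (Laplace) smoothing are assumptions made for illustration; the report does not state how it handles words that appear in only one class.

```python
from collections import Counter


def word_probabilities(messages, labels, alpha=1.0):
    """Return {word: (P(word | spam), P(word | ham))} with additive smoothing."""
    spam_counts, ham_counts = Counter(), Counter()
    for tokens, label in zip(messages, labels):
        (spam_counts if label == "spam" else ham_counts).update(tokens)
    vocab = set(spam_counts) | set(ham_counts)
    # Smoothing adds alpha to every count so that no probability is exactly zero.
    spam_total = sum(spam_counts.values()) + alpha * len(vocab)
    ham_total = sum(ham_counts.values()) + alpha * len(vocab)
    return {
        word: ((spam_counts[word] + alpha) / spam_total,
               (ham_counts[word] + alpha) / ham_total)
        for word in vocab
    }


# Hypothetical tokenized messages (already lowercased, punctuation removed).
MESSAGES = [["free", "prize"], ["free", "offer"], ["meeting", "today"]]
LABELS = ["spam", "spam", "ham"]
PROBS = word_probabilities(MESSAGES, LABELS)
```

In this toy corpus the word `free` is more than twice as likely under spam (1/3) as under ham (1/7), which is the kind of signal a naïve Bayes classifier multiplies across the words of a message.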

Accuracy:

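Accuracy is a metric that measures how often a machine learning model
correctly predicts the outcome. It is the number of correct predictions
divided by the total number of predictions.

Precision:

Precision is one indicator of a machine learning model's performance: the
quality of the positive predictions made by the model. It is the number of true
positives divided by the total number of positive predictions.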
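
Both metrics follow directly from the counts of true and false predictions. The sketch below is a minimal standard-library version; the function names and the eight example labels (1 for spam, 0 for ham) are illustrative assumptions, not results from the report.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    """Share of positive (spam) predictions that are actually positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels: 1 = spam, 0 = ham.
Y_TRUE = [1, 0, 1, 1, 0, 0, 1, 0]
Y_PRED = [1, 0, 0, 1, 0, 1, 1, 0]
```

Precision matters particularly for spam filtering because a false positive means a legitimate email is hidden from the user.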
[Link]
6.1 Conclusion
From the results obtained we can conclude that an ensemble machine learning model is
more effective at detecting and classifying spam than any individual algorithm.
We can also conclude that the TF-IDF (term frequency-inverse document frequency)
text representation is more effective than the bag-of-words model for classifying spam
when combined with several algorithms. Finally, we can say that spam detection
improves when machine learning algorithms are combined and tuned to the task.
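
To illustrate the difference between the two representations, the sketch below computes plain bag-of-words counts and a textbook TF-IDF weighting for a toy corpus. The function names and the corpus are illustrative assumptions, and the unsmoothed formula tf × ln(N / df) is only one variant; libraries such as scikit-learn use smoothed versions that give slightly different numbers.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Raw term counts for one document."""
    return Counter(tokens)


def tf_idf(corpus):
    """Return one {word: weight} dict per document using tf * ln(N / df)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each word once per document
    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical tokenized corpus: "the" occurs in every document.
CORPUS = [["the", "free", "prize"], ["the", "meeting"], ["the", "free", "offer"]]
WEIGHTS = tf_idf(CORPUS)
```

Bag of words gives `the` the same count as any other word in a document, whereas TF-IDF assigns it a weight of zero because it appears in every document and therefore does not help separate spam from ham.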

6.2 Future work

Machine learning and natural language processing have numerous applications,
and when combined they can solve some of the most troubling problems concerning
text. This application can be scaled to take in text in bulk so that classification can be
done more effectively on public sites. Other categories, such as negative, phishing,
or malicious content, can be used to train the model to filter things such as public
comments on various social sites. This application can also be converted to an online
machine learning system that is easily updated with the latest trends in spam and other
mail, so that the system can adapt to new types of spam emails and texts.
9. REFERENCES

[1] T. Toma, S. Hassan, and M. Arifuzzaman, "An Analysis of Supervised Machine Learning
Algorithms for Spam Email Detection," in International Conference on Automation, Control
and Mechatronics for Industry 4.0 (ACMI), 2021.

[2] S. Nandhini and J. Marseline K.S., "Performance Evaluation of Machine Learning
Algorithms for Email Spam Detection," in International Conference on Emerging Trends
in Information Technology and Engineering (ic-ETITE), 2020.

[3] S. Gadde, A. Lakshmanarao, and S. Satyanarayana, "SMS Spam Detection using Machine
Learning and Deep Learning Techniques," in 7th International Conference on Advanced
Computing and Communication Systems (ICACCS), 2021.

[4] P. Sethi, V. Bhandari, and B. Kohli, "SMS spam detection and comparison of various
machine learning algorithms," in International Conference on Computing and Communication
Technologies for Smart Nation (IC3TSN), 2017.

[5] P. Navaney, G. Dubey, and A. Rana, "SMS Spam Filtering Using Supervised Machine
Learning Algorithms," in 8th International Conference on Cloud Computing, Data Science &
Engineering (Confluence), 2018.

[6] S. O. Olatunji, "Extreme Learning Machines and Support Vector Machines models
for email spam detection," in IEEE 30th Canadian Conference on Electrical and
Computer Engineering (CCECE), 2017.

[7] N. Kumar, S. Sonowal, and Nishant, "Email Spam Detection Using Machine Learning
Algorithms," in Second International Conference on Inventive Research in Computing
Applications (ICIRCA), 2020.

[8] R. Madan, "[Link]," [Online]. Available: [Link]
vidhya/tf-idf-term-frequency-technique-easiest-explanation-for-text-classification-in-nlp-with-code-8ca3912e58c3.

[9] M. Raza, N. D. Jayasinghe, and M. M. A. Muslam, "A Comprehensive Review on Email
Spam Classification using Machine Learning Algorithms," in International Conference on
Information Networking (ICOIN), 2021.

[10] M. Gupta, A. Bakliwal, S. Agarwal, and P. Mehndiratta, "A Comparative Study of Spam
SMS Detection Using Machine Learning Classifiers," in Eleventh International Conference on
Contemporary Computing (IC3), 2018.
