Fake Job Prediction with ML Algorithms


A

Major Project Report


On
FAKE JOB PREDICTION USING MACHINE LEARNING ALGORITHMS

Submitted in partial fulfilment of the


Requirement for the award of the degree of

BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE AND MACHINE LEARNING (AI & ML)

BY
G. NANDINI 21PT5A6602

Under the guidance of


Dr. N.V. RAMANA REDDY
M. Tech, Ph. D
Associate Professor

Department of Computer Science and Machine Learning (AI & ML)


AVANTHI’S SCIENTIFIC TECHNOLOGICAL & RESEARCH
ACADEMY
(Affiliated to JNTUH Approved by AICTE, Recognized by Govt of T.S)
Gunthapally(V), Abdullapurmet(M), R.R District-501512
2023-2024
AVANTHI’S SCIENTIFIC TECHNOLOGICAL & RESEARCH ACADEMY
(Affiliated to JNTUH Approved by AICTE, Recognized by Govt of T.S)
Gunthapally(V), Abdullapurmet(M), R.R District-501512

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING (AI & ML)

CERTIFICATE

This is to certify that the major project entitled “FAKE JOB PREDICTION USING MACHINE
LEARNING ALGORITHMS”, being submitted by G. NANDINI (21PT5A6602) in partial
fulfilment of the requirement for the award of the degree of B. Tech in Computer Science and
Engineering (AI & ML), Avanthi’s Scientific Technological and Research Academy, Hyderabad,
is a record of bonafide work carried out by the candidate under my guidance. The results presented
in this major project work have been verified and are found to be satisfactory. The results embodied
in this project work have not been submitted to any other University for the award of any other degree.

Dr. N.V. RAMANA REDDY
MCA, M.Tech, MBA, Ph.D
Internal Guide
Associate Professor
Department of CSE

Dr. N.V. RAMANA REDDY
MCA, M.Tech, MBA, Ph.D
Associate Professor
HEAD OF THE DEPARTMENT
Department of CSE

External Examiner

Dr. G. RAMACHANDRA REDDY
B.Tech, M.Tech, Ph.D., MISTE, FIE
PRINCIPAL

COMMITTED TO EXCELLENCE IN TECHNICAL EDUCATION


DECLARATION

I hereby declare that the work embodied in this dissertation entitled “FAKE JOB
PREDICTION USING MACHINE LEARNING ALGORITHMS” was carried out
by me during the year 2023-2024 in partial fulfilment of the award of B. Tech,
Computer Science and Engineering (AI & ML) from Avanthi’s Scientific
Technological and Research Academy. I have not submitted the same to any other
university or organization for the award of any other degree.

Signature of the Student

G. NANDINI 21PT5A6602



ACKNOWLEDGEMENT

This is an acknowledgement of the intensive drive and technical competence of the many individuals
who have contributed to the success of our major project work.

We are grateful to chairman, Avanthi Group of Institutions, Sri. M. SRINIVASA RAO for
granting us the permission for undergoing the practical training through development of this project
in college.

Our sincere thanks to the Principal, Dr. G. RAMACHANDRA REDDY, Avanthi’s Scientific
Technological and Research Academy and to all the faculty members.

We would like to express our gratitude to head of the department Dr. N. V. RAMANA REDDY,
C.S.E-HOD, Associate Professor, for his valuable suggestions during the course of our major project
work.

We are immensely thankful to our B.Tech Project Co-ordinator S. RAJENDER, Assistant Professor,
Department of Computer Science and Engineering, for the support extended during this work,
which helped us in completing this major project successfully.

We are immensely thankful to our internal guide Dr. N. V. RAMANA REDDY, Associate
Professor, Department of CSE, for his valuable guidance and suggestion in each and every stage
of this work, which helped us in completing this major project successfully.

We are thankful to one and all who co-operated with us to complete our major project work
successfully.

G. NANDINI 21PT5A6602
ABSTRACT

FAKE JOB PREDICTION USING MACHINE LEARNING ALGORITHMS

To help job seekers avoid fraudulent job posts on the internet, this report proposes an automated tool
based on machine learning classification techniques. Different classifiers are used to check for
fraudulent posts on the web, and their results are compared to identify the best employment scam
detection model. The tool helps in detecting fake job posts among an enormous number of posts. Two
major types of classifiers, single classifiers and ensemble classifiers, are considered for fraudulent
job post detection. Experimental results indicate that ensemble classifiers detect scams better than
single classifiers. Naive Bayes is a statistical classification method based on Bayes' Theorem that
assumes the impact of a specific feature on a class is unrelated to the impact of other features. It is
a quick, accurate, and dependable approach that performs well on large datasets. The SGD Classifier,
on the other hand, is an effective method for fitting linear classifiers and regressors under convex
loss functions, such as those used by linear Support Vector Machines and Logistic Regression. It has
gained significant attention in large-scale learning because it handles text categorization and
natural language processing problems well.
INDEX
CHAPTER CONTENTS
o CERTIFICATES
o ACKNOWLEDGEMENT
o DECLARATION
o ABSTRACT
1. INTRODUCTION
1.1 Problem Statement
1.2 Purpose
1.3 Scope
1.4 Objectives

2. LITERATURE SURVEY

3. SYSTEM ANALYSIS
3.1 Existing System
3.2 Disadvantages of Existing System
3.3 Proposed System
3.4 Advantages of Proposed System

4. SYSTEM REQUIREMENTS
4.1 Functional Requirements
4.2 Non-Functional Requirements
4.2.1 Hardware Requirements
4.2.2 Software Requirements

5. SOFTWARE ENVIRONMENT
5.1 Python

6. SYSTEM ARCHITECTURE
6.1 System Architecture
6.2 UML Diagrams
6.2.1 Use Case Diagram
6.2.2 Class Diagram
6.2.3 Sequence Diagram
6.2.4 Activity Diagram
6.2.5 Deployment Diagram
6.2.6 Data Flow Diagram

7. MODULES
7.1 Modules
7.2 Description of modules

8. IMPLEMENTATION
8.1 Source Code

9. SCREEN SHOTS

10. SYSTEM TESTING
10.1 Results/Discussion
10.1.1 Functional Testing
10.1.1.1 Unit Testing
10.1.1.2 Integration Testing
10.1.1.3 System Testing
10.1.1.4 Acceptance Testing
10.1.2 Non-Functional Testing
10.1.2.1 Security Testing
10.1.2.2 Performance Testing
10.1.2.3 Usability Testing

11. CONCLUSION
11.1 Conclusion
11.2 Future Scope

12. REFERENCES/BIBLIOGRAPHY

13. APPENDICES
Appendix-A
Appendix-B

14. PROGRAM OUTCOMES

15. PO ATTAINMENT
CHAPTER 1
INTRODUCTION

1. INTRODUCTION

Employment scams are among the serious issues addressed in recent times in the domain
of Online Recruitment Fraud (ORF). Many companies now prefer to post their vacancies
online so that job-seekers can access them easily and promptly. However, fraudsters can
exploit this practice by offering employment to job-seekers in exchange for money taken
from them.

Fraudulent job advertisements can also be posted in the name of a reputed company to
damage its credibility. The detection of fraudulent job posts has therefore drawn
considerable attention, with the aim of obtaining an automated tool that identifies fake
jobs and reports them so that people can avoid applying for such jobs.

For this purpose, a machine learning approach is applied which employs several
classification algorithms for recognizing fake posts. A classification tool isolates
fake job posts from a larger set of job advertisements and alerts the user. To address
the problem of identifying scams in job postings, supervised learning algorithms are
considered initially as classification techniques.

A classifier maps input variables to target classes by learning from training data.
The classifiers used in this work for separating fake job posts from the others are
described briefly below. Classifier-based prediction may be broadly categorized into
Single Classifier based Prediction and Ensemble Classifiers based Prediction.

A. Single Classifier based Prediction

Classifiers are trained for predicting unknown test cases. The following classifiers
are used for detecting fake job posts:

a) Naive Bayes Classifier-

The Naive Bayes classifier is a supervised classification tool that exploits the concept
of Bayes' Theorem of conditional probability. The decisions made by this classifier are
quite effective in practice even if its probability estimates are inaccurate. The classifier
obtains very promising results in two scenarios: when the features are independent, and
when the features are completely functionally dependent. Its accuracy is not directly
related to the degree of feature dependencies; rather, the amount of information about the
class that is lost because of the independence assumption is the better predictor of
accuracy.
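
The decision rule described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: it uses a multinomial word-count model with add-one (Laplace) smoothing, the function names train_naive_bayes and predict are hypothetical, and the four training posts are made up for the example rather than taken from the project dataset.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing; each word is treated as independent given the class
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up posts (hypothetical data, not from the project dataset)
train = [
    ("earn money fast no experience pay registration fee".split(), "fake"),
    ("work from home earn money send bank details".split(), "fake"),
    ("software engineer python experience required".split(), "real"),
    ("accountant position degree and experience required".split(), "real"),
]
model = train_naive_bayes(train)
print(predict(model, "earn money from home".split()))             # fake
print(predict(model, "python engineer degree required".split()))  # real
```

The per-word product inside predict is where the independence assumption enters: each word contributes its own factor regardless of which other words appear in the post.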

b) Multi-Layer Perceptron Classifier -

A multi-layer perceptron can be used as a supervised classification tool by incorporating
optimized training parameters. For a given problem, the number of hidden layers in a
multilayer perceptron and the number of nodes in each layer can differ. The choice of
these parameters depends on the training data and the network architecture.
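
To make the role of hidden layers and nodes concrete, the following sketch runs the forward pass of a tiny multilayer perceptron in plain Python. It is an illustration only: the weights are hand-picked rather than learned, a step activation is assumed for simplicity (trained networks normally use differentiable activations), and the names step, forward and xor_net are hypothetical.

```python
def step(x):
    """Threshold activation: fires (1) when the weighted input is positive."""
    return 1 if x > 0 else 0

def forward(layers, inputs):
    """layers: list of layers; each layer is a list of (weights, bias) pairs, one per node."""
    activations = inputs
    for layer in layers:
        activations = [
            step(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# Hand-picked weights (an assumption for illustration; in practice they are learned)
xor_net = [
    [([1, 1], -0.5), ([1, 1], -1.5)],  # hidden layer with two nodes: OR and AND
    [([1, -2], -0.5)],                 # output layer with one node: OR and not AND
]
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(xor_net, x))
```

Changing the number of inner lists changes the number of layers, and changing the number of (weights, bias) pairs in a layer changes its node count, which are the two architecture parameters mentioned above.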

c) K-nearest Neighbor Classifier-

K-Nearest Neighbour classifiers, often known as lazy learners, classify objects based
on the closest training examples in the feature space. The classifier considers the
k nearest objects when determining the class. The main challenge of this classification
technique lies in choosing an appropriate value of k.
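
The effect of k can be seen in a small sketch. The code below is a minimal plain-Python version of the technique, assuming Euclidean distance and a simple majority vote; the function name knn_predict and the two-dimensional feature values are hypothetical and chosen only so that the two values of k disagree.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (feature_vector, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors; one "fake" point sits inside the "real" cluster
train = [
    ((1.0, 1.0), "real"), ((2.0, 1.0), "real"), ((1.0, 2.0), "real"),
    ((2.0, 2.0), "fake"),
    ((5.0, 5.0), "fake"), ((6.0, 5.0), "fake"), ((5.0, 6.0), "fake"),
]
query = (2.2, 2.1)
print(knn_predict(train, query, k=1))  # fake: decided by the single closest point
print(knn_predict(train, query, k=3))  # real: two of the three nearest points are real
```

No model is built in advance; all the work happens at prediction time, which is why such classifiers are called lazy learners.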

d) Decision Tree Classifier-

A Decision Tree (DT) is a classifier that uses a tree-like structure to represent the
knowledge it has gained about classification. Each target class is denoted by a leaf node
of the DT, and each non-leaf node is a decision node that represents a certain test. The
outcomes of a test are represented by the branches of its decision node. A classification
result is obtained by starting at the root and following the branches until a leaf node is
reached. Decision tree learning has been applied to spam filtering, and a trained model
can be used to forecast the target based on given criteria.
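
The root-to-leaf traversal described above can be sketched as follows. The tree here is built by hand purely for illustration: the tests has_company_logo and asks_for_payment are hypothetical and are not claimed to be the splits learned in this project.

```python
# A hand-built tree (hypothetical tests, not learned from the project data)
tree = {
    "test": "has_company_logo",
    "branches": {
        True: {"leaf": "real"},
        False: {
            "test": "asks_for_payment",
            "branches": {True: {"leaf": "fake"}, False: {"leaf": "real"}},
        },
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, following the branch chosen by each test outcome."""
    while "leaf" not in node:
        outcome = sample[node["test"]]
        node = node["branches"][outcome]
    return node["leaf"]

print(classify(tree, {"has_company_logo": False, "asks_for_payment": True}))  # fake
print(classify(tree, {"has_company_logo": True, "asks_for_payment": True}))   # real
```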


1.1 Problem Statement

1. The problem statement involves predicting fake job listings using machine learning
algorithms. This is an essential task in the era of increasing online job scams and
fraudulent activities. Machine learning algorithms can be employed to analyze
patterns and identify anomalies in large datasets, making them suitable for this
application.

2. First, let’s discuss the data collection process. Data for this problem can be obtained
from various sources such as job listing websites, social media platforms, and
government databases. The data should include features like job title, company
name, location, salary range, description, and other relevant information. A
significant portion of the dataset should consist of genuine job listings to serve as a
baseline for comparison.

3. Next, we need to preprocess the data by cleaning it and transforming it into a format
suitable for machine learning models. This may involve removing irrelevant
features, handling missing values, and encoding categorical variables.

4. Once the data is preprocessed, we can apply various machine learning algorithms
to build our predictive model. Some popular choices include Naive Bayes
Classifier, Support Vector Machines (SVM), Decision Trees, Random Forests, and
Neural Networks. These algorithms can be trained on the labeled dataset to learn
patterns that distinguish fake job listings from genuine ones based on their features.
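
The preprocessing step in item 3 can be sketched in plain Python as shown below. This is a minimal illustration, not the project's actual pipeline: the function name preprocess and the field names (job_id, title, location, employment_type, fraudulent) are assumptions made for the example, and the two rows are made up.

```python
def preprocess(rows, drop_fields, categorical_fields):
    """Drop irrelevant fields, fill missing values, and one-hot encode categorical fields."""
    # Categories seen for each categorical field; a missing value becomes "unknown"
    categories = {
        f: sorted({(row.get(f) or "unknown") for row in rows}) for f in categorical_fields
    }
    cleaned = []
    for row in rows:
        out = {}
        for key, value in row.items():
            if key in drop_fields or key in categorical_fields:
                continue
            out[key] = value if value is not None else ""  # missing text -> empty string
        for f in categorical_fields:
            value = row.get(f) or "unknown"
            for cat in categories[f]:
                out[f"{f}={cat}"] = 1 if value == cat else 0
        cleaned.append(out)
    return cleaned

# Made-up rows with hypothetical field names
rows = [
    {"job_id": 1, "title": "Data Analyst", "location": "Hyderabad",
     "employment_type": "Full-time", "fraudulent": 0},
    {"job_id": 2, "title": "Home Typist", "location": None,
     "employment_type": None, "fraudulent": 1},
]
cleaned = preprocess(rows, drop_fields={"job_id"}, categorical_fields=["employment_type"])
for row in cleaned:
    print(row)
```

After this step every row has the same set of keys and no missing values, which is the form that the classifiers listed in item 4 expect as input.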


1.2 Purpose

1. Fake job prediction using machine learning algorithms serves the critical purpose
of detecting and preventing fraudulent activities in the recruitment process. By
leveraging advanced technologies like natural language processing
(NLP) and supervised learning algorithms, organizations can identify false job
postings and alert job seekers to avoid falling victim to scams.

2. Detection of Fraudulent Job Postings: One of the primary objectives of utilizing
machine learning algorithms for fake job prediction is to distinguish between
legitimate job openings and fraudulent listings. Scammers often exploit online
platforms to post fake job advertisements with the intention of deceiving
unsuspecting applicants. By developing sophisticated classifiers that can analyze
various features of job postings, such as text content, location, profile details, and
benefits, these algorithms can effectively flag suspicious listings.

3. Enhancing Information Security: Another key purpose of fake job prediction
through machine learning is to safeguard the integrity of information shared by job
seekers during the application process. Fraudulent job postings may request
sensitive personal data or financial information from applicants, putting them at risk
of identity theft or financial fraud. By accurately identifying and filtering out fake
job ads, these algorithms contribute to maintaining information security in the
recruitment ecosystem.

4. Mitigating Economic Impact: The prevalence of fake job postings not only
harms individual job seekers but also has broader economic implications. By
leveraging machine learning algorithms to predict and prevent employment fraud,
organizations can help reduce instances of financial loss, unemployment due to
deceptive practices, and economic stress caused by falling victim to scams. This
proactive approach can contribute to a more secure and trustworthy job market
environment.

5. Empowering Job Seekers: Ultimately, the purpose of fake job prediction using
machine learning algorithms is to empower job seekers with reliable tools and
insights that enable them to make informed decisions when applying for positions
online. By leveraging technology to identify false advertising and untrustworthy
employers, these algorithms play a crucial role in protecting individuals from
falling prey to fraudulent schemes while seeking employment opportunities.

1.3 Scope

1. Machine learning algorithms have shown significant promise in predicting job
outcomes and career paths for individuals based on various data points such as
academic performance, extracurricular activities, internships, and other relevant
factors. The scope for job prediction using machine learning algorithms is vast and
holds immense potential in revolutionizing the way career counseling and guidance
are provided to students and professionals alike.

2. Data Collection and Analysis: Machine learning algorithms can be utilized to
collect, analyze, and interpret large volumes of data related to individuals’ academic
achievements, work experiences, skills, interests, and preferences. By leveraging
this data, predictive models can be developed to forecast potential job opportunities
and career trajectories for individuals.

3. Predictive Modeling: Through the application of machine learning techniques
such as regression analysis, decision trees, neural networks, and natural language
processing, predictive models can be built to identify patterns and trends in the data
that correlate with successful employment outcomes. These models can then be
used to predict the likelihood of securing a job offer or advancing in a particular
career path.

4. Personalized Career Guidance: By incorporating individual-specific data into
the machine learning models, personalized career guidance can be offered to
students and job seekers. These models can provide tailored recommendations on
academic programs, skill development initiatives, networking opportunities, and
job search strategies based on an individual’s unique profile.

5. Continuous Learning and Improvement: One of the key advantages of using
machine learning algorithms for job prediction is their ability to continuously learn
from new data inputs and improve their predictive accuracy over time. By updating
the models with real-time information on market trends, industry demands, and
emerging job roles, they can adapt to changing dynamics in the job market.

6. Ethical Considerations: While the scope for job prediction using machine
learning algorithms is extensive, it is essential to address ethical considerations such
as data privacy, bias mitigation, transparency in decision-making processes, and
ensuring fair treatment of individuals from diverse backgrounds. Implementing
ethical guidelines in the development and deployment of these predictive models is
crucial to maintaining trust and credibility.

1.4 Objectives

1. Machine learning plays a crucial role in predicting employment outcomes by
utilizing advanced algorithms to analyze large datasets and extract meaningful
patterns that can forecast future job opportunities for individuals. In a study
conducted at Ohio University, machine learning models were able to predict job
offers with an accuracy rate of 87% based on factors such as GPA, co-curricular
activities, and internships. By leveraging techniques like linear regression and
random forests of decision trees, these models can identify relationships between
different variables and make predictions about students’ career paths.

2. Approaches in Machine Learning for Job Prediction

3. Linear Regression: This algorithm is used to establish a relationship between
independent variables (such as GPA, activities, internships) and a dependent
variable (job offer prediction). It assumes a linear relationship between the input
features and the output prediction.

4. Random Forest Regression: Unlike linear regression, random forest regression
utilizes decision trees to identify complex patterns within the data. By combining
the outputs of multiple decision trees, this approach can capture nonlinear
relationships and can provide more accurate predictions.

5. Data Preparation: Before applying machine learning algorithms for job
prediction, it is essential to clean and preprocess the data effectively. This involves
compiling relevant information such as GPA, activities, internships, and other
factors into a structured dataset that can be fed into the models for analysis.

6. Model Training: Once the data is prepared, machine learning models need to be
trained using historical datasets that contain information about past employment
outcomes. By running algorithms like random forest regression on this training
data, the models learn to detect patterns and make predictions based on new input.

7. Outcome Prediction: After the models are trained, they can be used to predict future
job outcomes for students based on their academic performance and extracurricular
involvement. These predictions can help career centers provide targeted support and
guidance to individuals seeking employment opportunities.

CHAPTER 2
LITERATURE SURVEY

2. LITERATURE SURVEY

TITLE: “An Intelligent Model for Online Recruitment Fraud Detection,”

ABSTRACT: This study research attempts to prohibit privacy and loss of money for
individuals and organization by creating a reliable model which can detect the fraud
exposure in the online recruitment environments. This research presents a major
contribution represented in a reliable detection model using ensemble approach based
on Random forest classifier to detect Online Recruitment Fraud (ORF). The detection
of Online Recruitment Fraud is characterized by other types of electronic fraud
detection by its modern and the scarcity of studies on this concept. The researcher
proposed the detection model to achieve the objectives of this study. For feature
selection, support vector machine method is used and for classification and detection,
ensemble classifier using Random Forest is employed. A freely available dataset called
Employment Scam Aegean Dataset (EMSCAD) is used to apply the model. A
pre-processing step had been applied before the selection and classification adoptions. The
results showed an obtained accuracy of 97.41%. Further, the findings presented the
main features and important factors in detection purpose include having a company
profile feature, having a company logo feature and an industry feature.

TITLE: An Empirical Study of the Naive Bayes Classifier,

ABSTRACT: The naive Bayes classifier greatly simplifies learning by assuming that
features are independent given class. Although independence is generally a poor
assumption, in practice naive Bayes often competes well with more sophisticated
classifiers. Our broad goal is to understand the data characteristics which affect the
performance of naive Bayes. Our approach uses Monte Carlo simulations that allow a
systematic study of classification accuracy for several classes of randomly generated
problems. We analyze the impact of the distribution entropy on the classification error,
showing that low-entropy feature distributions yield good performance of naive Bayes.
We also demonstrate that naive Bayes works well for certain nearly-functional feature
dependencies, thus reaching its best performance in two opposite cases: completely
independent features (as expected) and functionally dependent features (which is
surprising). Another surprising result is that the accuracy of naive Bayes is not directly
correlated with the degree of feature dependencies measured as the class-conditional
mutual information between the features. Instead, a better predictor of naive Bayes
accuracy is the amount of information about the class that is lost because of the
independence assumption.

TITLE: Bayes’s Theorem and the Analysis of Binomial Random Variables,

ABSTRACT: A very practical application of Bayes's theorem, for the analysis of
binomial random variables, is presented. Previous papers (Walters, 1985; Walters,
1986a) have already demonstrated the reliability of the technique for one, or two
random variables, and the extension of the approach to several random variables is
described. Two biometrical examples are used to illustrate the method.

TITLE: Multilayer perceptrons for classification and regression,


ABSTRACT: We review the theory and practice of the multilayer perceptron. We aim
at addressing a range of issues which are important from the point of view of applying
this approach to practical problems. A number of examples are given, illustrating how
the multilayer perceptron compares to alternative, conventional approaches. The
application fields of classification and regression are especially considered. Questions
of implementation, i.e. of multilayer perceptron architecture, dynamics, and related
aspects, are discussed. Recent studies, which are particularly relevant to the areas of
discriminant analysis, and function mapping, are cited.

TITLE: K -Nearest Neighbour Classifiers,

ABSTRACT: We analyze a Relational Neighbor (RN) classifier, a simple relational
predictive model that predicts only based on class labels of related neighbors, using no
learning and no inherent attributes. We show that it performs surprisingly well by
comparing it to more complex models such as Probabilistic Relational Models and
Relational Probability Trees on three data sets from published work.

TITLE: A Survey on Decision Tree Algorithms of Classification in Data Mining,

ABSTRACT: As computer technology and computer network technology are
developing, the amount of data in the information industry is getting higher and higher.
It is necessary to analyze this large amount of data and extract useful knowledge from
it. The process of extracting useful knowledge from a huge set of incomplete, noisy, fuzzy
and random data is called data mining. The decision tree classification technique is one of
the most popular data mining techniques. In a decision tree, the divide and conquer
technique is used as the basic learning strategy. A decision tree is a structure that includes
a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute,
each branch denotes the outcome of a test, and each leaf node holds a class label. The
topmost node in the tree is the root node. This paper focuses on the various algorithms of
Decision tree (ID3, C4.5, CART), their characteristics, challenges, advantages and
disadvantages.

TITLE: Machine learning for email spam filtering: review, approaches and open
research problems,

ABSTRACT: The upsurge in the volume of unwanted emails called spam has created
an intense need for the development of more dependable and robust antispam filters.
Machine learning methods of recent are being used to successfully detect and filter
spam emails. We present a systematic review of some of the popular machine learning
based email spam filtering approaches. Our review covers survey of the important
concepts, attempts, efficiency, and the research trend in spam filtering. The preliminary
discussion in the study background examines the applications of machine learning
techniques to the email spam filtering process of the leading internet service providers
(ISPs) like Gmail, Yahoo and Outlook emails spam filters. Discussion on general email
spam filtering process, and the various efforts by different researchers in combating
spam through the use of machine learning techniques was done. Our review compares the
strengths and drawbacks of existing machine learning approaches and the open research
problems in spam filtering. We recommended deep learning and deep adversarial
learning as the future techniques that can effectively handle the menace of spam emails.

TITLE: ST4_Method_Random_Forest,

ABSTRACT: Several machine-learning algorithms have been proposed for remote
sensing image classification during the past two decades. Among these machine
learning algorithms, Random Forest (RF) and Support Vector Machines (SVM) have
drawn attention to image classification in several remote sensing applications. This
paper reviews RF and SVM concepts relevant to remote sensing image classification
and applies a meta-analysis of 251 peer-reviewed journal papers. A database with more
than 40 quantitative and qualitative fields was constructed from these reviewed papers.
The meta-analysis mainly focuses on:

(1) the analysis regarding the general characteristics of the studies, such as geographical
distribution, frequency of the papers considering time, journals, application domains,
and remote sensing software packages used in the case studies,

(2) a comparative analysis regarding the performances of RF and SVM classification
against various parameters, such as data type, RS applications, spatial resolution, and
the number of extracted features in the feature engineering step.

The challenges, recommendations, and potential directions for future research are also
discussed in detail. Moreover, a summary of the results is provided to aid researchers
to customize their efforts in order to achieve the most accurate results based on their
thematic applications.

TITLE: Bagging classifiers for fighting poisoning attacks in adversarial classification
tasks,

ABSTRACT: Pattern recognition systems have been widely used in adversarial
classification tasks like spam filtering and intrusion detection in computer networks. In
these applications a malicious adversary may successfully mislead a classifier by
“poisoning” its training data with carefully designed attacks. Bagging is a well-known
ensemble construction method, where each classifier in the ensemble is trained on a
different bootstrap replicate of the training set. Recent work has shown that bagging
can reduce the influence of outliers in training data, especially if the most outlying
observations are resampled with a lower probability. In this work we argue that
poisoning attacks can be viewed as a particular category of outliers, and, thus, bagging
ensembles may be effectively exploited against them. We experimentally assess the
effectiveness of bagging on a real, widely used spam filter, and on a web-based
intrusion detection system. Our preliminary results suggest that bagging ensembles can
be a very promising defence strategy against poisoning attacks, and give us valuable
insights for future research work.
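
The bagging construction mentioned in this abstract, in which each ensemble member is trained on its own bootstrap replicate and the members vote, can be sketched as follows. This is a generic illustration and not the method of the cited paper: the base learner is an assumed one-dimensional nearest-centroid classifier, the data are made up, all function names are hypothetical, and no poisoning attack or weighted resampling is modelled.

```python
import random
from collections import Counter

def train_centroids(sample):
    """Base learner: mean feature value per class (1-D nearest-centroid classifier)."""
    sums, counts = Counter(), Counter()
    for x, label in sample:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}

def predict_centroids(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

def bagging_fit(data, n_models, rng):
    """Train each base learner on a bootstrap replicate (sampling with replacement)."""
    models = []
    for _ in range(n_models):
        replicate = [rng.choice(data) for _ in data]
        models.append(train_centroids(replicate))
    return models

def bagging_predict(models, x):
    """Majority vote over the predictions of all ensemble members."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy 1-D data with a hypothetical "spam score" feature
data = [(0.1, "ham"), (0.2, "ham"), (0.3, "ham"),
        (0.8, "spam"), (0.9, "spam"), (1.0, "spam")]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
print(bagging_predict(models, 0.85))
```

Because every replicate is drawn with replacement, any single training point is absent from a share of the replicates, which is the property the abstract relies on when it treats poisoned samples as outliers.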

CHAPTER 3
SYSTEM ANALYSIS

3. SYSTEM ANALYSIS

In the realm of cybersecurity and fraud detection, the use of machine learning
algorithms has gained significant traction in predicting and preventing fake job
postings. By analyzing various features and patterns within job listings, machine
learning models can be trained to distinguish between legitimate and fraudulent job
advertisements.

3.1 Existing System

According to several studies, review spam detection, email spam detection, and
fake news detection have drawn special attention in the domain of online fraud
detection.

A. Review Spam Detection

People often post reviews on online forums about the products they purchase. These
reviews may guide other purchasers when choosing products. Spammers can manipulate
reviews for profit, so techniques are required to detect spam reviews. This can be
implemented by extracting features from the reviews using Natural Language Processing
(NLP) and then applying machine learning techniques to these features. Lexicon-based
approaches, which use a dictionary or corpus to eliminate spam reviews, are one
alternative to machine learning techniques.
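A lexicon-based filter of the kind mentioned above can be sketched as follows. The phrase list, the function names, and the `min_hits` parameter are hypothetical choices made for illustration; a practical lexicon would be far larger and built from a corpus.

```python
import re

# Hypothetical lexicon: phrases assumed to be typical of spam reviews.
SPAM_LEXICON = {"best ever", "buy now", "100% guaranteed", "click here"}


def lexicon_score(review):
    """Count how many lexicon phrases occur in the lower-cased review."""
    text = re.sub(r"\s+", " ", review.lower())
    return sum(1 for phrase in SPAM_LEXICON if phrase in text)


def is_spam(review, min_hits=2):
    """Flag a review when it matches at least `min_hits` lexicon phrases."""
    return lexicon_score(review) >= min_hits


print(is_spam("Best ever!!! Buy now, 100% guaranteed"))  # True
print(is_spam("The strap broke after a week of use."))   # False
```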

B. Email Spam Detection

Unwanted bulk mails, which belong to the category of spam emails, often arrive in users'
mailboxes. This may lead to storage shortages as well as unnecessary bandwidth
consumption. To address this problem, providers such as Gmail, Yahoo Mail and Outlook
incorporate spam filters that use neural networks. Approaches considered for email spam
detection include content-based filtering, case-based filtering, heuristic-based
filtering, memory- or instance-based filtering, and adaptive spam filtering.

C. Fake News Detection

Fake news on social media is characterized by malicious user accounts and echo chamber
effects. The fundamental study of fake news detection relies on three perspectives: how
fake news is written, how fake news spreads, and how a user is related to fake news.
Features related to news content and social context are extracted, and machine learning
models are applied to recognize fake news.

3.2 Disadvantages of Existing System

1. Dominant classes: Machine learning algorithms, especially Naive Bayes and
SGD Classifier, may perform well on the majority class (real jobs) because of its
high prior probability. This can lead to poor performance in identifying
fraudulent jobs, which form a smaller proportion of the dataset.

2. Unbalanced dataset: The job posting dataset is highly unbalanced, with 9868
real jobs and only 725 fraudulent jobs. This imbalance can negatively impact
the performance of machine learning models, as they may not be able to
effectively learn patterns from the minority class (fraudulent jobs).

3. Extra monitoring and entry-level jobs: Fake job postings often target entry-
level positions or require extra monitoring activities, making it challenging for
machine learning models to accurately identify these types of fraudulent
postings. Younger individuals are also more susceptible to falling victim to
these scams.

4. Text data pre-processing: Pre-processing textual data for machine learning
models can be time-consuming and resource-intensive. Techniques such as
stopword removal, lemmatization, and character count normalization are
necessary but add complexity to the model development process.


5. SMOTE synthetic minority class samples: To address the imbalance issue
in the dataset, techniques like SMOTE (Synthetic Minority Over-sampling
Technique) can be used to generate synthetic minority class samples. However,
this approach may introduce noise into the dataset and could potentially worsen
model performance if not implemented carefully.

6. Disadvantages of IBM LinkedIn or Data client leader: No disadvantages
specific to IBM LinkedIn or Data client leader were identified for fake job
prediction with machine learning algorithms. However, relying solely on a single
platform or data source for job posting data could limit the scope and accuracy
of predictions made by machine learning models.
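The interpolation step at the heart of SMOTE, mentioned in item 5, can be sketched as below. This is a simplified illustration rather than a full SMOTE implementation: the two-dimensional feature vectors, the function name, and the choice of `k` are assumptions made for the example.

```python
import random


def smote_like_sample(minority, k, rng):
    """Create one synthetic point between a minority sample and one of its
    k nearest minority neighbours (the core interpolation step of SMOTE)."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # Sort the remaining minority samples by squared Euclidean distance to base.
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # position along the segment base -> neighbour, in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(base, neighbour))


rng = random.Random(42)
# Hypothetical 2-D feature vectors for fraudulent postings (the minority class).
fraud = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
synthetic = [smote_like_sample(fraud, k=2, rng=rng) for _ in range(3)]
print(synthetic)
```

Each synthetic point lies on a line segment between two real minority samples, which is also why noisy or mislabelled minority samples can propagate into the generated data.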

3.3 Proposed System

The target of this study is to detect whether a job post is fraudulent or not. Identifying
and eliminating these fake job advertisements will help the job seekers to concentrate
on legitimate job posts only. In this context, a dataset from Kaggle is employed that
provides information regarding a job that may or may not be suspicious.

A. Implementation of Classifiers

In this framework, classifiers are trained using appropriate parameters. Default
parameters may not be sufficient to maximize the performance of these models.
Adjusting the parameters improves the reliability of the model, and the tuned model
can be regarded as the optimised one for identifying fake job posts and isolating
them from job seekers.
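Parameter adjustment of this kind is often done by searching over a grid of candidate values and keeping the one that scores best on held-out data. The sketch below tunes a single decision threshold as a stand-in for whatever parameters the chosen classifier exposes; the validation scores, labels, and candidate values are hypothetical.

```python
def f1_for_threshold(scores, labels, threshold):
    """F1 score on the fraudulent class (label 1) for a given cut-off."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(scores, labels, candidates):
    """Return the candidate threshold with the highest validation F1."""
    return max(candidates, key=lambda t: f1_for_threshold(scores, labels, t))


# Hypothetical validation data: model scores and true labels (1 = fraudulent).
val_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.90]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]
best = grid_search(val_scores, val_labels, [0.1, 0.3, 0.5, 0.7])
print(best)  # 0.3
```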

B. Performance Evaluation Metrics

While evaluating the performance of a model, it is necessary to employ metrics that
justify the evaluation. The following metrics are taken into consideration in order to
identify the most relevant problem-solving approach. Accuracy is the ratio of correct
predictions to the total number of instances considered. However, accuracy alone may
not be a sufficient metric for evaluating a model's performance, since it does not
distinguish between the types of wrong predictions. If a fake post is treated as a
genuine one, it creates a significant problem. Hence, it is necessary to consider the
false positive and false negative cases that contribute to misclassification. Precision
(the fraction of posts flagged as fake that are actually fake) and recall (the fraction
of fake posts that are flagged) are therefore also considered.
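The limitation of accuracy can be made concrete with the class counts quoted in Section 3.2 (9868 real and 725 fraudulent postings). The sketch below assumes, purely for illustration, a classifier that labels every posting as real and is evaluated on the whole dataset; the function name is hypothetical.

```python
def evaluate(tp, fp, tn, fn):
    """Return accuracy, precision and recall for the fraudulent class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Every posting predicted "real": no true or false positives,
# 9868 true negatives and 725 false negatives.
accuracy, precision, recall = evaluate(tp=0, fp=0, tn=9868, fn=725)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.932 precision=0.000 recall=0.000
```

An accuracy of about 93% is reached without detecting a single fraudulent posting, which is why precision and recall on the fraudulent class are reported alongside accuracy.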

3.4 Advantages of Proposed System

Machine learning algorithms offer several advantages when it comes to
predicting fake job postings. Here are some key benefits:

1. Improved Accuracy: Machine learning algorithms can analyze large amounts
of data to identify patterns and anomalies that may indicate a fake job posting.
By leveraging historical data and training models on labeled datasets, these
algorithms can achieve high levels of accuracy in detecting fraudulent job
listings.

2. Efficiency: Machine learning algorithms can automate the process of
screening job postings for authenticity, saving time and resources for
organizations. This efficiency allows for real-time monitoring of job listings
and quick identification of suspicious activities.

3. Scalability: Machine learning models can scale to handle large volumes of job
postings, making them suitable for platforms with a high frequency of new
listings. This scalability ensures that all incoming job postings are screened
effectively without overwhelming human resources.

4. Adaptability: Machine learning algorithms can adapt to new trends and
techniques used by scammers to create fake job postings. By continuously
learning from new data and adjusting their detection methods, these
algorithms can stay ahead of evolving fraudulent practices.

5. Reduced False Positives: Through continuous learning and optimization,
machine learning algorithms can minimize false positives, ensuring that
legitimate job postings are not mistakenly flagged as fraudulent. This helps
maintain a positive user experience for both job seekers and employers.

6. Customization: Organizations can customize machine learning models to suit
their specific needs and preferences when it comes to detecting fake job
postings. This flexibility allows for the fine-tuning of algorithms based on the
organization’s unique requirements and risk tolerance levels.

7. Cost-Effectiveness: While initial setup and training may require investment,
in the long run, using machine learning algorithms for fake job prediction can
be cost-effective due to the reduction in manual effort, increased accuracy, and
prevention of potential financial losses associated with fraudulent activities.

CHAPTER 4
SYSTEM REQUIREMENTS

4. SYSTEM REQUIREMENTS

4.1 Functional Requirements

Fake job listings have become a significant issue in the job market, leading to increased interest
in using machine learning algorithms to predict and prevent such fraudulent activities.
Functional requirements for building a system that can accurately predict fake job listings using
machine learning algorithms can be outlined as follows:

1. Data Collection: The first requirement is to gather a large and diverse
dataset of job listings, both real and fake. This data can be collected from
various sources such as popular job boards, company websites, social media
platforms, and even dark web forums. It is essential to ensure the data’s
authenticity and accuracy by cross-referencing it with reliable sources.

2. Data Preprocessing: Once the data has been collected, it needs to be
preprocessed to remove irrelevant information, correct errors, and format it
in a way that can be used by machine learning algorithms. This may include
tasks such as text cleaning, tokenization, stemming, and stopword removal.

3. Feature Extraction: The next step is to extract relevant features from the
preprocessed data that can be used as inputs for machine learning
algorithms. These features may include the presence of certain
keywords or phrases that are commonly associated with fake job listings
(e.g., “work from home,” “no experience required,” “instant hire”), the use
of specific email domains or phone numbers, or inconsistencies in the
listing’s formatting or content.

4. Model Selection: Several machine learning algorithms can be used for
predicting fake job listings based on the extracted features. Some popular
choices include Naive Bayes classifiers, Support Vector Machines (SVM),
Random Forests, and Neural Networks. The choice of algorithm will depend
on factors such as the size and complexity of the dataset, the desired level
of accuracy, and the computational resources available.

5. Model Training: Once a suitable machine learning algorithm has been
selected, it needs to be trained on the preprocessed dataset using labeled
examples (i.e., real vs. fake job listings). The model will learn to identify
patterns and relationships between the input features and output labels (fake
or real) through this training process.

6. Model Evaluation: After training the model, it needs to be evaluated to
determine its accuracy and effectiveness in predicting fake job listings. This
may involve testing it on a separate dataset of unseen examples and using
metrics such as precision, recall, F1 score, or area under the ROC curve
(AUC-ROC). If necessary, adjustments may need to be made to improve the
model’s performance based on these evaluation results.

7. Real-time Processing: To effectively detect fake job listings in real time
as they are posted online, the system must be able to process new data
quickly and efficiently without significant delay. This may
involve optimizing model performance through techniques such as parallel
processing or distributed computing if necessary.
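The preprocessing and feature extraction requirements (items 2 and 3) can be sketched with the standard library alone. The stopword list is a small illustrative subset, the function and feature names are hypothetical, stemming is omitted, and the suspicious phrases are the three examples quoted in item 3.

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "to", "is", "no", "from", "for", "of"}
# Phrases cited in item 3 as commonly associated with fake listings.
SUSPICIOUS_PHRASES = ("work from home", "no experience required", "instant hire")


def clean(text):
    """Lower-case the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def tokenize(text):
    """Split cleaned text into tokens and drop stopwords."""
    return [tok for tok in clean(text).split() if tok not in STOPWORDS]


def extract_features(text):
    """Binary phrase indicators plus a token count, as a feature dict."""
    cleaned = clean(text)
    features = {f"has_{p.replace(' ', '_')}": int(p in cleaned)
                for p in SUSPICIOUS_PHRASES}
    features["n_tokens"] = len(tokenize(text))
    return features


listing = "Work from HOME! Instant hire, no experience required."
print(tokenize(listing))
print(extract_features(listing))
```

Phrase matching is done on the cleaned text before stopword removal, because phrases such as "no experience required" contain stopwords themselves.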

4.3 Non-Functional Requirements

1. Performance:

• Scalability: The system should be able to handle a large volume of data
efficiently as the dataset grows.

• Throughput: The model should be able to produce predictions in real
time to provide timely results.

• Accuracy: The accuracy of the predictions should meet a predefined
threshold to ensure reliable results.

2. Reliability:

• Availability: The system should be available whenever needed to make
predictions.

• Fault Tolerance: The model should be resilient to failures and errors,
ensuring continuous operation.

• Consistency: The predictions should be consistent across different runs
with the same input data.

3. Security:

• Data Privacy: Ensure that sensitive information used for training and
prediction is protected from unauthorized access.

• Model Protection: Implement measures to prevent tampering or
unauthorized modifications to the model.

• Secure Communication: Ensure secure communication channels
between components of the system.

4. Maintainability:

• Modifiability: The system should be designed in a way that allows for
easy updates and modifications as needed.

• Documentation: Comprehensive documentation should be provided to
facilitate understanding and maintenance of the system.

• Monitoring and Logging: Implement monitoring tools to track the
performance of the model and log relevant information for
troubleshooting.

5. Usability:

• User Interface: Provide an intuitive interface for users to interact with
the predictive model easily.

• Interpretability: Ensure that the predictions made by the model are
interpretable and understandable by users.

• Training Requirements: Consider the ease of training new users on
how to use and interpret the results of the model.

6. Scalability:

• Resource Utilization: Optimize resource usage to ensure efficient
performance without unnecessary overhead.

• Adaptability: Design the system in a way that allows for easy
adaptation to changing requirements or environments.
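The monitoring and logging requirement can be illustrated with Python's standard logging module. This is only a sketch: the logger name, the wrapper function, and the stand-in model are hypothetical and do not describe the project's actual implementation.

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("fake_job_monitor")  # hypothetical logger name


def predict_with_logging(predict, listing_text):
    """Call `predict`, log its result and latency, and return the prediction."""
    start = time.perf_counter()
    label = predict(listing_text)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("prediction=%s latency_ms=%.2f chars=%d",
                label, elapsed_ms, len(listing_text))
    return label


def dummy_model(text):
    """Stand-in for the trained classifier: flags one hard-coded phrase."""
    return "fake" if "instant hire" in text.lower() else "real"


print(predict_with_logging(dummy_model, "Instant hire, apply today"))
```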


4.2 Hardware Requirements

4.4 Software Requirements

CHAPTER 5
SOFTWARE
ENVIRONMENT

5. SOFTWARE ENVIRONMENT

5.1 PYTHON

What is Python programming language?

Python is a high-level, general-purpose, interpreted programming language.

1) High-level

Python is a high-level programming language, which makes it easy to learn. Python
doesn’t require you to understand the details of the computer in order to develop
programs efficiently.

2) General-purpose

Python is a general-purpose language. It means that you can use Python in various
domains including:

• Web applications
• Big data applications
• Testing
• Automation
• Data science, machine learning, and AI
• Desktop software
• Mobile apps

In contrast, a targeted (domain-specific) language such as SQL serves one purpose,
namely querying data from relational databases.

3) Interpreted

Python is an interpreted language. To develop a Python program, you write Python code
into a file called source code.

To execute the source code, it must be translated into a form that the computer can
run. The Python interpreter does this when the program executes: the reference
implementation (CPython) first compiles the source code into bytecode and then
executes that bytecode, one instruction at a time, in its virtual machine. By
contrast, compiled languages like Java and C# use a separate compiler that compiles
the whole source code before the program executes.
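The translation step can be observed from within Python itself. The sketch below uses the built-in compile() and exec() functions and the standard dis module; it reflects the behaviour of the CPython implementation, and the exact bytecode listing differs between Python versions.

```python
import dis

source = "total = 1 + 2\nprint(total)\n"

# compile() translates the whole source string into a code object (bytecode).
code = compile(source, "<example>", "exec")
print(type(code).__name__)              # code
print(isinstance(code.co_code, bytes))  # True

# dis prints the bytecode instructions; the listing varies by Python version.
dis.dis(code)

# exec() runs the compiled bytecode in the given namespace and prints 3.
namespace = {}
exec(code, namespace)
```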

5.2 Why Python

• Python increases your productivity. Python allows you to solve complex problems in
less time and with fewer lines of code. It’s quick to make a prototype in Python.
• Python is used as a solution in many areas across industries, from web applications
to data science and machine learning.
• Python is quite easy to learn in comparison with other programming languages.
Python syntax is clear and beautiful.
• Python has a large ecosystem that includes lots of libraries and frameworks.
• Python is cross-platform. Python programs can run on Windows, Linux, and macOS.
• Python has a huge community. Whenever you get stuck, you can get help from an
active community.
• Python developers are in high demand.

5.3 History of Python

• Python was created by Guido van Rossum.
• The design began in the late 1980s, and Python was first released in February 1991.

5.3.1 Why the name Python?

No, it wasn't named after a dangerous snake. Van Rossum was a fan of the BBC comedy
series "Monty Python's Flying Circus", first broadcast in 1969, and the name "Python"
was adopted from that series.

5.3.2 Python Version History

Implementation started: December 1989
Internal releases: 1990


FIG 5.1 HISTORY OF PYTHON

5.4 Install Python on Windows

First, download the latest version of Python from the download page.

Second, double-click the installer file to launch the setup wizard.

In the setup window, you need to check the Add Python 3.8 to PATH and click Install
Now to begin the installation.

FIG 5.2 Install Python on Windows


It’ll take a few minutes to complete the setup.

Fig 5.2.1 Install Python on Windows

Once the setup completes, you’ll see the following window:

FIG 5.2.2 Install Python on Windows


To verify the installation, you open the Run window and type cmd and press Enter:

FIG 5.3.1 Verify the installation

In the Command Prompt, type python command as follows:

FIG 5.3.1 Verify the installation


If you see the output like the above screenshot, you’ve successfully installed Python on
your computer.

To exit the program, you type Ctrl-Z and press Enter.

If you see the following output from the Command Prompt after typing
the python command:

'python' is not recognized as an internal or external command,
operable program or batch file.

then you most likely didn’t check the Add Python 3.8 to PATH checkbox when you
installed Python.

5.4.2 Install Python on macOS

It’s recommended to install Python on macOS using an official installer. Here are the
steps:

• First, download a Python release for macOS.
• Second, run the installer by double-clicking the installer file.
• Third, follow the instructions on the screen and click the Next button until the
installer completes.

5.4.3 Install Python on Linux

Before installing Python 3 on your Linux distribution, you check whether Python 3 was
already installed by running the following command from the terminal:

python3 --version


If you see a response with the version of Python, then your computer already has
Python 3 installed. Otherwise, you can install Python 3 using a package management
system.

For example, you can install Python 3.10 on Ubuntu using apt:

sudo apt install python3.10

To install the newer version, you replace 3.10 with that version.

A quick introduction to the Visual Studio Code

Visual Studio Code is a lightweight source code editor, often called VS Code. VS Code
runs on your desktop and is available for Windows, macOS, and Linux. VS Code comes
with many features such as IntelliSense, code editing, and extensions that allow you
to edit Python source code effectively. The best part is that VS Code is open-source
and free. Besides the desktop version, VS Code also has a browser version that you
can use directly in your web browser without installing it.

Setting up Visual Studio Code

To set up the VS Code, you follow these steps:

First, navigate to the VS Code official website and download the VS code based on
your platform (Windows, macOS, or Linux).

Second, launch the setup wizard and follow the steps.

Once the installation completes, you can launch the VS code application:


FIG 5.4 Visual Studio Code

5.4.4 Install Python Extension

To make VS Code work with Python, you need to install the Python extension
from the Visual Studio Marketplace.

The following picture illustrates the steps:

FIG 5.5 Install Python Extension

• First, click the Extensions tab.


• Second, type the python extension pack keyword on the search input.
• Third, click the Python extension pack. It’ll show detailed information on the
right pane.
• Finally, click the Install button to install the Python extension.


Now, you’re ready to develop the first program in Python.

Creating a new Python project

First, create a new folder called helloworld.

Second, launch the VS code and open the helloworld folder.

Third, create a new app.py file and enter the following code and save the file:

print('Hello, World!')

The print() is a built-in function that displays a message on the screen. In this
example, it’ll show the message 'Hello, World!'.

What is a function

When you sum two numbers, that’s a function. And when you multiply two numbers,
that’s also a function.

Each function takes your inputs, applies some rules, and returns a result.

In the above example, the print() is a function. It accepts a string and shows it on the
screen.

Python has many built-in functions, like the print() function, that you can use out of
the box in your program.

In addition, Python allows you to define your own functions, which you’ll learn how to
do later.
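As a brief illustration of the idea that a function takes inputs, applies a rule, and returns a result, the sketch below defines two small functions. The names add and multiply are chosen for the example only.

```python
def add(a, b):
    """Return the sum of two numbers."""
    return a + b


def multiply(a, b):
    """Return the product of two numbers."""
    return a * b


print(add(2, 3))       # 5
print(multiply(2, 3))  # 6
print(len("Hello"))    # 5, using the built-in len() function
```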

Executing the Python Hello World program


To execute the app.py file, you first launch the Command Prompt on Windows or
Terminal on macOS or Linux.

Then, navigate to the helloworld folder.

After that, type the following command to execute the app.py file:

python app.py

If you use macOS or Linux, use the python3 command instead:

python3 app.py

If everything is fine, you’ll see the following message on the screen:

Hello, World!

If you use VS Code, you can also launch the Terminal within the VS code by:

• Accessing the menu Terminal > New Terminal
• Or using the keyboard shortcut Ctrl+Shift+`.

Typically, the backtick key (`) is located under the Esc key on the keyboard.

5.5 Python IDLE

Python IDLE is the Python Integrated Development and Learning Environment, which
comes with the Python distribution by default.

Python IDLE also provides an interactive interpreter (the Python Shell). It has many
features such as:

• Code editing with syntax highlighting
• Smart indenting
• Auto-completion

In short, the Python IDLE helps you experiment with Python quickly in a trial-and-
error manner.

The following shows you step by step how to launch the Python IDLE and use it to
execute the Python code:

First, launch the Python IDLE program:

FIG 5.6 Python IDLE

A new Python Shell window will display as follows:

FIG 5.7 Python SHELL WINDOW

Now, you can enter the Python code after the cursor >>> and press Enter to execute it.


For example, if you type the code print('Hello, World!') and press Enter, you’ll see
the message Hello, World! immediately on the screen:

FIG 5.7.1 Python SHELL WINDOW

5.6 Python Syntax

Whitespace and indentation

If you’ve been working in other programming languages such as Java, C#, or C/C++,
you know that these languages use semicolons (;) to separate the statements.

However, Python uses whitespace and indentation to construct the code structure.

The following shows a snippet of Python code:

# define main function to print out something
def main():
    i = 1
    max = 10
    while (i < max):
        print(i)
        i = i + 1


# call function main
main()

The meaning of the code isn’t important to you now. Please pay attention to the code
structure instead.

At the end of each line, you don’t see any semicolon to terminate the statement. And
the code uses indentation to format the code.
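Because indentation defines the code structure, leaving it out is a syntax error rather than a style issue. The short sketch below shows this with the built-in compile() function; the two source strings are made up for the example.

```python
well_formed = "if True:\n    print('indented block')\n"
badly_formed = "if True:\nprint('missing indentation')\n"

# The correctly indented snippet compiles without error.
compile(well_formed, "<ok>", "exec")

# The snippet without indentation is rejected before it can run.
try:
    compile(badly_formed, "<bad>", "exec")
except IndentationError as exc:
    error_name = type(exc).__name__
    print(error_name)  # IndentationError
```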

CHAPTER 6
SYSTEM ARCHITECTURE

6. SYSTEM ARCHITECTURE

6.1 SYSTEM ARCHITECTURE :

FIG 6.1.1 : SYSTEM ARCHITECTURE

6.2.1 UML Diagrams


UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The
standard is managed, and was created by, the Object Management Group. The goal is
for UML to become a common language for creating models of object-oriented
computer software. In its current form, UML comprises two major components: a
meta-model and a notation. In the future, some form of method or process may also be
added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of software systems, as well as for
business modeling and other non-software systems. The UML represents a collection of
best engineering practices that have proven successful in the modeling of large and
complex systems. The UML is a very important part of developing object-oriented
software and of the software development process. The UML uses mostly graphical
notations to express the design of software projects.

GOALS:

The primary goals in the design of the UML are as follows:

1. Provide users with a ready-to-use, expressive visual modeling language so that they
can develop and exchange meaningful models.

2. Provide extensibility and specialization mechanisms to extend the core concepts.

3. Be independent of particular programming languages and development processes.

4. Provide a formal basis for understanding the modeling language.

5. Encourage the growth of the OO tools market.

6. Support higher-level development concepts such as collaborations, frameworks,
patterns and components.

7. Integrate best practices.

6.2.2 Use case Diagram:

A use case diagram in the Unified Modeling Language (UML) is a type of
behavioral diagram defined by and created from a use-case analysis. Its purpose is to
present a graphical overview of the functionality provided by a system in terms of
actors, their goals (represented as use cases), and any dependencies between those use
cases. The main purpose of a use case diagram is to show what system functions are
performed for which actor. Roles of the actors in the system can be depicted.

FIG 6.2.1 : USECASE DIAGRAM BY ADMIN


FIG 6.2.1 : USECASE DIAGRAM BY ADMIN

6.2.3 Class Diagram:

In software engineering, a class diagram in the Unified Modeling Language
(UML) is a type of static structure diagram that describes the structure of a system by
showing the system's classes, their attributes, operations (or methods), and the
relationships among the classes. It explains which class contains information.
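The elements of a class diagram map directly onto code. The sketch below is purely illustrative: the JobPost and Employer classes, their attributes, and their methods are hypothetical and are not taken from the project's class diagram.

```python
from dataclasses import dataclass


@dataclass
class JobPost:  # hypothetical class, not from the report
    title: str               # attribute
    description: str         # attribute
    fraudulent: bool = False

    def word_count(self) -> int:  # operation (method)
        return len(self.description.split())


@dataclass
class Employer:  # hypothetical class, not from the report
    name: str
    posts: list  # relationship: one employer is associated with many job posts

    def publish(self, post: JobPost) -> None:
        self.posts.append(post)


employer = Employer("Acme", [])
employer.publish(JobPost("Clerk", "Data entry from home"))
print(len(employer.posts), employer.posts[0].word_count())  # 1 4
```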


FIG 6.2.2 : CLASS DIAGRAM

6.2.4 Sequence Diagram:

A sequence diagram in the Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what
order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes
called event diagrams, event scenarios, and timing diagrams.


FIG 6.2.3 : SEQUENCE DIAGRAM BY ADMIN


FIG 6.2.3 : SEQUENCE DIAGRAM BY USER

6.2.5 Activity diagram:

Activity diagrams are graphical representations of workflows of stepwise
activities and actions with support for choice, iteration and concurrency. In the Unified
Modeling Language, activity diagrams can be used to describe the business and
operational step-by-step workflows of components in a system. An activity diagram
shows the overall flow of control.


FIG 6.2.4 : ACTIVITY DIAGRAM BY ADMIN


FIG 6.2.4 : ACTIVITY DIAGRAM BY USER

6.2.6 Deployment Diagram:

A deployment diagram is a type of diagram that specifies the physical hardware on
which the software system will execute. It also determines how the software is deployed
on the underlying hardware. It maps the software pieces of a system to the devices that
will execute them.


FIG 6.2.5 : DEPLOYMENT DIAGRAM

The deployment diagram maps the software architecture created in design to the
physical system architecture that executes it. In distributed systems, it models the
distribution of the software across the physical nodes. The software is represented by
artifacts, which are mapped to the execution environment, such as nodes, that will run
them. A deployment diagram usually involves several nodes, and the relations between
them are represented using communication paths.


6.2.7 Data Flow Diagram:

• The DFD is also called a bubble chart. It is a simple graphical formalism that
can be used to represent a system in terms of the input data to the system, the
various processing carried out on this data, and the output data generated by
the system.
• The data flow diagram (DFD) is one of the most important modelling tools. It is
used to model the system components. These components are the system
processes, the data used by the processes, the external entities that interact
with the system, and the information flows in the system.
• A DFD shows how information moves through the system and how it is
modified by a series of transformations. It is a graphical technique that depicts
information flow and the transformations that are applied as data moves from
input to output.
• A DFD may be used to represent a system at any level of abstraction. It may
be partitioned into levels that represent increasing information flow and
functional detail.

FIG 6.2.6 : DATAFLOW DIAGRAM

CHAPTER 7
MODULES

7. MODULES

Fake job listings have become a significant problem in the job market, leading to an
increased interest in using machine learning algorithms to predict and prevent such
fraudulent activities. In this context, several modules can be employed to build an
effective fake job prediction system.

7.1 MODULES

There are 2 modules:


1. Admin
2. User or Candidate
Admin:-
❖ Login
❖ User Management
❖ Pending Users
❖ All Users
❖ Fake job
❖ Upload Dataset
❖ View Dataset
❖ Algorithm
❖ SVM Algorithm
❖ Decision Tree Algorithm
❖ Naïve Bayes Algorithm
❖ K-NN Algorithm
❖ Random Forest Algorithm
❖ Graph Analysis
❖ Comparison Graph

User:-
❖ Register
❖ Login
❖ Predict


7.2 Description of Modules

The admin module provides a central hub for managing the system and its users. Here's
a breakdown of its functionalities:

User Management:

• User List: View a list of all registered users, including usernames, emails, and
potentially registration dates.
• User Details: Access detailed information about specific users, allowing for
further investigation if needed.
• User Management Actions: This might include functionalities like user
account activation/deactivation, or even deletion in case of suspicious activity.

System Management:

• Model Management: Depending on the system design, the admin might be
able to view and manage different machine learning models used for fake job
prediction. This could involve:
o Uploading new models for testing.
o Selecting the currently active model for prediction.
o Monitoring model performance metrics (accuracy, precision, recall) to
identify potential issues.
• Data Management:
o The admin might have access to view and manage the system's training
data. This could involve:
▪ Uploading new datasets for training or retraining models.
▪ Monitoring data quality for potential biases or inconsistencies.
• Settings: The admin might be able to configure various settings for the system,
such as:
o Specifying thresholds for flagging suspicious job postings.

o Enabling/disabling specific features.
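
The threshold setting mentioned above could work roughly as in the following sketch. It assumes the classifier returns a fraud probability between 0 and 1 for each posting; the function name `flag_postings`, the `(job_id, probability)` input shape, and the default threshold of 0.5 are illustrative assumptions, not part of the implemented system.

```python
from typing import List, Tuple


def flag_postings(
    scores: List[Tuple[int, float]], threshold: float = 0.5
) -> List[Tuple[int, float]]:
    """Return the (job_id, probability) pairs at or above the threshold.

    `scores` is assumed to hold one fraud probability per job posting.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return [(job_id, p) for job_id, p in scores if p >= threshold]


# Example: a stricter threshold flags fewer postings.
scores = [(101, 0.20), (102, 0.80), (103, 0.50)]
flagged_default = flag_postings(scores)        # postings 102 and 103
flagged_strict = flag_postings(scores, 0.75)   # posting 102 only
```

Lowering the threshold flags more postings at the cost of more false alarms, while raising it does the opposite, so the value is a policy choice for the admin rather than a fixed constant.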

Additional Considerations:

• Security: The admin module should have robust security measures in place to
prevent unauthorized access. This might involve features like:
o Secure login with strong password requirements.
o User roles with different access levels.
o Activity logs to track user actions.
• Reporting: The admin might have access to generate reports on various aspects
of the system, such as:
o Number of flagged job postings.
o User activity statistics.
o Model performance over time.

CHAPTER 8
IMPLEMENTATION

8.1 SOURCE CODE

from django.shortcuts import render, redirect
from django.contrib import messages
from userapp.models import *
from mainapp.models import *

# Create your views here.


def user_dashboard(request):
    return render(request, 'user/user-dashboard.html')

def user_predict(request):
    user_id = request.session['user_id']
    user = UserModel.objects.get(user_id=user_id)

    if request.method == 'POST':
        # Read the job-posting fields submitted from the prediction form.
        title = request.POST.get("jobtitle")
        location = request.POST.get("location")
        department = request.POST.get("department")
        salary_range = request.POST.get("salary_range")
        company_profile = request.POST.get("Company_Profile")
        description = request.POST.get("decription")
        requirements = request.POST.get("requirements")
        benefits = request.POST.get("benefits")
        req_experience = request.POST.get("required_experience")
        req_education = request.POST.get("required_education")
        industry = request.POST.get("industry")
        function = request.POST.get("function")
        emp_type = request.POST.get("employment_type")

        # Store the posting; the prediction itself is made in user_result.
        job = JobModel.objects.create(
            job_title=title,
            job_location=location,
            job_dept=department,
            job_com_profile=company_profile,
            job_description=description,
            job_requirement=requirements,
            job_benefits=benefits,
            job_req_experience=req_experience,
            job_req_education=req_education,
            job_industry=industry,
            job_function=function,
            job_salary_range=salary_range,
            job_emp_type=emp_type,
            user_url=user,
        )

        if job:
            messages.success(request, 'successfully entered jobdata')
            return redirect('user_result', id=job.job_id)
        else:
            messages.error(request, 'Invalid data')
            return redirect('user_predict')

    return render(request, 'user/user-predict.html')

def user_profile(req):
    user_id = req.session['user_id']
    user = UserModel.objects.get(user_id=user_id)
    if req.method == 'POST':
        username = req.POST.get("user_username")
        email = req.POST.get("user_email")  # read from the form but not saved below
        contact = req.POST.get("user_contact")
        password = req.POST.get("user_password")

        user.user_username = username
        user.user_contact = contact
        user.user_password = password
        # Replace the profile image only when a new file was uploaded.
        if len(req.FILES) != 0:
            user.user_image = req.FILES["image"]
        user.save()
        messages.success(req, 'Updated Successfully')
        return redirect('user_profile')
    return render(req, 'user/user-profile.html', {'user': user})

def user_result(request, id):
    import joblib

    user_id = request.session['user_id']
    user = UserModel.objects.get(user_id=user_id)

    predict = JobModel.objects.get(pk=id)

    # Build one text document from the stored fields. The fields are joined
    # with spaces so that the last word of one field does not fuse with the
    # first word of the next.
    X_test = [" ".join([
        predict.job_title, predict.job_location, predict.job_dept,
        predict.job_com_profile, predict.job_description,
        predict.job_requirement, predict.job_benefits,
        predict.job_req_education, predict.job_req_experience,
        predict.job_industry, predict.job_function,
    ])]

    # Load the fitted TF-IDF vectorizer and the Random Forest model that
    # were saved by the admin-side RandomForest view.
    vc = joblib.load('job_vc_rf.pkl')
    rfmodel = joblib.load('job_rf.pkl')

    X_test1 = vc.transform(X_test)
    y_pred = rfmodel.predict(X_test1)

    # Store the predicted label on the posting and show the result page.
    predict.job_status = y_pred[0]
    predict.save()
    messages.success(request, 'Predicted Successfully')

    return render(request, 'user/user-result.html', {'job': predict})
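
The prediction view combines several text fields into a single document before vectorizing it. A minimal sketch of that step is given below, assuming that empty or missing fields should simply be skipped; the helper name `build_document` is hypothetical and does not appear in the project code.

```python
from typing import Iterable, Optional


def build_document(fields: Iterable[Optional[str]]) -> str:
    """Join the non-empty fields with single spaces.

    Missing (None) and whitespace-only fields are skipped so that they
    neither raise an error nor add stray separators.
    """
    parts = [f.strip() for f in fields if f and f.strip()]
    return " ".join(parts)


fields = ["Data Engineer", None, "   ", "New York"]

# Joining with a space keeps the words of neighbouring fields apart.
joined = build_document(fields)

# Plain concatenation fuses them into a token the vectorizer has never seen.
fused = "".join(f for f in fields if f and f.strip())
```

Here `joined` is `"Data Engineer New York"`, whereas `fused` contains the run-together token `EngineerNew`, which a TF-IDF vocabulary built from normal text would not recognise.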

8.2 SAMPLE SOURCE CODE

from django.shortcuts import render, redirect, get_object_or_404
from django.contrib import messages
from adminapp.models import *
from mainapp.models import *
from userapp.models import *

# Importing libraries
import re
import string
import random
import numpy as np
import pandas as pd
import missingno
import matplotlib.pyplot as plt
import seaborn as sns
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
from wordcloud import WordCloud
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.base import TransformerMixin
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Create your views here.


def admin_dash(request):
    # Counts shown on the admin dashboard.
    dataset = Dataset.objects.all().count()
    user = UserModel.objects.all().count()
    test = JobModel.objects.all().count()
    return render(request, 'admin/admin-dash.html',
                  {'Dataset': dataset, 'user': user, 'test': test})

def admin_algocomp(request):
    # (field prefix, label stored in the matching <prefix>_algo column)
    algorithms = [
        ('dt', 'DecisionTreeClassifier'),
        ('lr', 'Logistic Regression'),
        ('nb', 'Naive-Bayes'),
        ('rf', 'RandomForestClassifier'),
    ]
    try:
        context = {}
        for prefix, label in algorithms:
            row = Dataset.objects.filter(**{f'{prefix}_algo': label}).first()
            # Scores are stored as fractions; convert them to percentages.
            context[f'{prefix}_ac'] = getattr(row, f'{prefix}_Accuracy') * 100
            context[f'{prefix}_pr'] = getattr(row, f'{prefix}_Precision') * 100
            context[f'{prefix}_re'] = getattr(row, f'{prefix}_Recall') * 100
            context[f'{prefix}_fs'] = getattr(row, f'{prefix}_F1_Score') * 100
        return render(request, 'admin/admin-algocomp.html', context)
    except (AttributeError, TypeError):
        # Raised when a row is missing or a score has not been computed yet.
        messages.warning(request, 'Run all 4 algorithms to compare values')
        return redirect('admin_view')

def admin_allusers(request):
    user = UserModel.objects.filter(user_status='accepted').order_by('user_id')
    return render(request, 'admin/admin-allusers.html', {'user': user})


def admin_dectree(request):
    data = Dataset.objects.all().order_by('-data_id').first()
    return render(request, 'admin/admin-dectree.html', {'data': data})


def admin_lr(request):
    data = Dataset.objects.all().order_by('-data_id').first()
    return render(request, 'admin/admin-lr.html', {'data': data})


def admin_nb(request):
    data = Dataset.objects.all().order_by('-data_id').first()
    return render(request, 'admin/admin-nb.html', {'data': data})


def admin_pendingusers(request):
    items = UserModel.objects.filter(user_status='pending').order_by('-user_id')
    return render(request, 'admin/admin-pendingusers.html', {'items': items})


def admin_randfor(request):
    data = Dataset.objects.all().order_by('-data_id').first()
    return render(request, 'admin/admin-randfor.html', {'data': data})

def admin_upload(request):
    if request.method == 'POST':
        # Save the uploaded CSV file as a new Dataset row.
        dataset = request.FILES['dataset']
        Dataset.objects.create(data_set=dataset)
        return redirect('admin_view')

    return render(request, 'admin/admin-upload.html')

def admin_view(request):
    # Show the most recently uploaded dataset as an HTML table.
    data = Dataset.objects.all().order_by('-data_id').first()
    file = str(data.data_set)
    df = pd.read_csv(f'./media/{file}')
    table = df.to_html(table_id='data_table')

    return render(request, 'admin/admin-view.html', {'i': data, 't': table})


def accept_user(request, id):
    accept = get_object_or_404(UserModel, user_id=id)
    accept.user_status = "accepted"
    accept.save(update_fields=["user_status"])

    return redirect('admin_pendingusers')


def decline_user(request, id):
    decline = get_object_or_404(UserModel, user_id=id)
    decline.user_status = "declined"
    decline.save(update_fields=["user_status"])

    return redirect('admin_pendingusers')

def RandomForest(request):
    import joblib
    from sklearn.ensemble import RandomForestClassifier

    data = Dataset.objects.all().order_by('-data_id').first()
    file = str(data.data_set)
    df = pd.read_csv(f'./media/{file}')
    x = df['title']
    y = df['fraudulent']
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=0)

    # Converting text into numbers: fit TF-IDF on the training split only.
    tf = TfidfVectorizer()
    x_train1 = tf.fit_transform(x_train)
    x_test1 = tf.transform(x_test)
    # Save the fitted vectorizer for use at prediction time.
    joblib.dump(tf, 'job_vc_rf.pkl')

    # Machine learning
    model_name = RandomForestClassifier()
    model_name.fit(x_train1, y_train)
    prediction = model_name.predict(x_test1)
    joblib.dump(model_name, 'job_rf.pkl')

    # scikit-learn metrics take the true labels first, then the predictions.
    Accuracy = accuracy_score(y_test, prediction)
    Precision = precision_score(y_test, prediction, average='macro')
    Recall = recall_score(y_test, prediction, average='macro')
    F1_Score = f1_score(y_test, prediction, average='macro')

    data.rf_Accuracy = Accuracy
    data.rf_Precision = Precision
    data.rf_Recall = Recall
    data.rf_F1_Score = F1_Score
    data.save()
    data = Dataset.objects.filter(
        rf_algo='RandomForestClassifier').order_by('-data_id').first()
    return render(request, 'admin/admin-randfor.html', {'data': data})

def LogisticRegression(request):
    # Imported under an alias because this view shares the classifier's name.
    from sklearn.linear_model import LogisticRegression as LogisticRegressionClassifier

    data = Dataset.objects.all().order_by('-data_id').first()
    file = str(data.data_set)
    df = pd.read_csv(f'./media/{file}')
    x = df['title']
    y = df['fraudulent']
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=0)

    # Converting text into numbers: fit TF-IDF on the training split only.
    tf = TfidfVectorizer()
    x_train1 = tf.fit_transform(x_train)
    x_test1 = tf.transform(x_test)

    # Machine learning
    model_name = LogisticRegressionClassifier()
    model_name.fit(x_train1, y_train)
    prediction = model_name.predict(x_test1)

    # scikit-learn metrics take the true labels first, then the predictions.
    Accuracy = accuracy_score(y_test, prediction)
    Precision = precision_score(y_test, prediction, average='macro')
    Recall = recall_score(y_test, prediction, average='macro')
    F1_Score = f1_score(y_test, prediction, average='macro')

    data.lr_Accuracy = Accuracy
    data.lr_Precision = Precision
    data.lr_Recall = Recall
    data.lr_F1_Score = F1_Score
    data.save()

    data = Dataset.objects.filter(
        lr_algo='Logistic Regression').order_by('-data_id').first()
    return render(request, 'admin/admin-lr.html', {'data': data})

def navie_bayes(request):
    from sklearn.naive_bayes import MultinomialNB

    data = Dataset.objects.all().order_by('-data_id').first()
    file = str(data.data_set)
    df = pd.read_csv(f'./media/{file}')
    x = df['title']
    y = df['fraudulent']
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=0)

    # Converting text into numbers: fit TF-IDF on the training split only.
    tf = TfidfVectorizer()
    x_train1 = tf.fit_transform(x_train)
    x_test1 = tf.transform(x_test)

    # Machine learning
    model_name = MultinomialNB()
    model_name.fit(x_train1, y_train)
    prediction = model_name.predict(x_test1)

    # scikit-learn metrics take the true labels first, then the predictions.
    Accuracy = accuracy_score(y_test, prediction)
    Precision = precision_score(y_test, prediction, average='macro')
    Recall = recall_score(y_test, prediction, average='macro')
    F1_Score = f1_score(y_test, prediction, average='macro')

    data.nb_Accuracy = Accuracy
    data.nb_Precision = Precision
    data.nb_Recall = Recall
    data.nb_F1_Score = F1_Score
    data.save()

    data = Dataset.objects.filter(nb_algo='Naive-Bayes').order_by('-data_id').first()
    return render(request, 'admin/admin-nb.html', {'data': data})

def DecisionTree(request):
    from sklearn.tree import DecisionTreeClassifier

    data = Dataset.objects.all().order_by('-data_id').first()
    file = str(data.data_set)
    df = pd.read_csv(f'./media/{file}')
    x = df['title']
    y = df['fraudulent']
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, random_state=0)

    # Converting text into numbers: fit TF-IDF on the training split only.
    tf = TfidfVectorizer()
    x_train1 = tf.fit_transform(x_train)
    x_test1 = tf.transform(x_test)

    # Machine learning
    model_name = DecisionTreeClassifier()
    model_name.fit(x_train1, y_train)
    prediction = model_name.predict(x_test1)

    # scikit-learn metrics take the true labels first, then the predictions.
    Accuracy = accuracy_score(y_test, prediction)
    Precision = precision_score(y_test, prediction, average='macro')
    Recall = recall_score(y_test, prediction, average='macro')
    F1_Score = f1_score(y_test, prediction, average='macro')

    data.dt_Accuracy = Accuracy
    data.dt_Precision = Precision
    data.dt_Recall = Recall
    data.dt_F1_Score = F1_Score
    data.save()

    data = Dataset.objects.filter(
        dt_algo='DecisionTreeClassifier').order_by('-data_id').first()
    return render(request, 'admin/admin-dectree.html', {'data': data})

def button(request, id):
    import joblib

    predict = JobModel.objects.get(pk=id)

    # Join the stored fields with spaces to form one text document.
    X_test = [" ".join([
        predict.job_title, predict.job_location, predict.job_dept,
        predict.job_com_profile, predict.job_description,
        predict.job_requirement, predict.job_benefits,
        predict.job_req_education, predict.job_req_experience,
        predict.job_industry, predict.job_function,
    ])]

    # Load the saved TF-IDF vectorizer and Random Forest model.
    vc = joblib.load('job_vc_rf.pkl')
    rfmodel = joblib.load('job_rf.pkl')

    X_test1 = vc.transform(X_test)
    y_pred = rfmodel.predict(X_test1)
    print(y_pred[0])
    return redirect('user_result', id=id)
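
The training views report macro-averaged precision and recall. These two metrics are not symmetric in their arguments, so the order in which the true labels and the predictions are passed matters. The sketch below uses plain Python with hypothetical helper names (`macro_precision`, `macro_recall`) to make this visible; it is an illustration of the definitions, not the scikit-learn implementation.

```python
from typing import Sequence


def macro_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average, over all labels, of correct predictions / total predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    total = 0.0
    for label in labels:
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        # A label that is never predicted contributes 0 to the average.
        total += correct / predicted if predicted else 0.0
    return total / len(labels)


def macro_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall is precision with the roles of truth and prediction exchanged."""
    return macro_precision(y_pred, y_true)


# Toy labels: 0 = genuine, 1 = fraudulent (assumed encoding).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]

precision = macro_precision(y_true, y_pred)  # (3/3 + 2/3) / 2
recall = macro_recall(y_true, y_pred)        # (3/4 + 2/2) / 2
swapped = macro_precision(y_pred, y_true)    # equals recall, not precision
```

On this toy data the precision is about 0.833 and the recall is 0.875, and passing the arguments in the wrong order silently exchanges the two values. Accuracy and macro F1 are unaffected by the order, which is why a swap can go unnoticed.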

CHAPTER 9
SCREEN SHOTS

9. SCREEN SHOTS

Figure 9.1 : Website Page

Figure 9.2 User Projects


Fig 9.3 : User login page

Figure 9.4 : Contact details


Fig 9.5 : Decision tree algorithm

Fig 9.6 : uploading data set


Fig 9.7 : job details

CHAPTER 10
SYSTEM TESTING

10. SYSTEM TESTING


10.1 Results / Discussion
Software testing comprises several types, such as functional testing, non-functional
testing, automation testing, and agile testing, together with their sub-types. Each type
of testing has its own features, advantages, and disadvantages. This chapter describes
the types of software testing that are most commonly used in day-to-day testing.
Different Types of Software Testing

FIG10 : types of software testing


10.1 Functional Testing


There are four main types of functional testing.
10.1.1 Unit Testing
Unit testing is a type of software testing which is done on an individual unit or
component to test its correctness. Typically, unit testing is done by the developer during
the application development phase. Each unit in unit testing can be viewed as a method,
function, procedure, or object. Developers often use test automation tools such as
NUnit, xUnit, and JUnit for test execution. Unit testing is important because many
defects can be found at the unit test level. For example, consider a simple calculator
application. The developer can write a unit test to check that the user can enter two
numbers and get the correct sum from the addition functionality.
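
The calculator example can be sketched with Python's built-in `unittest` module as follows. The `add` function and the test class name are hypothetical and serve only to illustrate the idea of testing one unit in isolation.

```python
import unittest


def add(a, b):
    """Unit under test: the calculator's addition function."""
    return a + b


class AddTests(unittest.TestCase):
    def test_adds_two_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_adds_negative_number(self):
        self.assertEqual(add(-4, 1), -3)

    def test_adds_floats(self):
        # Floating-point sums are compared approximately.
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes the three tests and returns a result whose `wasSuccessful()` method reports whether all of them passed.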

a) White Box Testing


White box testing is a test technique in which the internal structure or code of an
application is visible and accessible to the tester. In this technique, it is easy to find
loopholes in the design of an application or fault in business logic. Statement coverage
and decision coverage/branch coverage are examples of white box test techniques.
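
A small sketch of branch coverage is shown below. The function `normalise_status` is a hypothetical example with three possible paths, and the mapping of 1 to "fraudulent" and 0 to "genuine" is an assumption made for illustration. The test inputs are chosen by reading the code so that every branch is executed at least once.

```python
def normalise_status(raw):
    """Map a raw prediction to a readable label."""
    if raw == 1:
        return "fraudulent"   # branch 1
    if raw == 0:
        return "genuine"      # branch 2
    return "unknown"          # branch 3: any other value


# One input per branch gives full branch coverage of this function.
branch_cases = {
    1: "fraudulent",
    0: "genuine",
    None: "unknown",
}

results = {raw: normalise_status(raw) for raw in branch_cases}
all_branches_pass = results == branch_cases
```

A black box tester might only try 0 and 1; knowledge of the code is what reveals the third branch and the need for an input such as `None`.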

b) Gorilla Testing
Gorilla testing is a test technique in which the tester and/or developer tests one module
of the application thoroughly in all aspects. Gorilla testing is done to check how robust
the application is. For example, a tester is testing a pet insurance company's website,
which provides the services of buying an insurance policy, a tag for the pet, and
lifetime membership. The tester can focus on any one module, say the insurance
policy module, and test it thoroughly with positive and negative test scenarios.

10.1.2 Integration Testing


Integration testing is a type of software testing where two or more modules of an
application are logically grouped together and tested as a whole. The focus of this type
of testing is to find defects in the interfaces, communication, and data flow among
modules. A top-down or bottom-up approach is used while integrating modules into the
whole system. This type of testing is done when integrating modules of a system or
when integrating separate systems. For example, a user is buying a flight ticket from an
airline website. Users can see flight details and payment information while buying a
ticket, but flight details and payment processing are two different systems. Integration
testing should be done while integrating the airline website and the payment processing
system.

a) Gray box testing


As the name suggests, gray box testing is a combination of white-box testing and black-
box testing. Testers have partial knowledge of the internal structure or code of an
application.

10.1.3 System Testing


System testing is a type of testing where the tester evaluates the whole system against
the specified requirements.

a) End to End Testing


It involves testing a complete application environment in a situation that mimics real-
world use, such as interacting with a database, using network communications, or
interacting with other hardware, applications, or systems if appropriate. For example, a
tester is testing a pet insurance website. End to End testing involves testing of buying
an insurance policy, LPM, tag, adding another pet, updating credit card information on
users’ accounts, updating user address information, receiving order confirmation emails
and policy documents.

b) Black Box Testing


Black box testing is a software testing technique in which testing is performed without
knowing the internal structure, design, or code of the system under test. Testers should
focus only on the input and output of test objects.

c) Smoke Testing
Smoke testing is performed to verify that the basic and critical functionality of the
system under test is working at a very high level. Whenever a new build is provided by
the development team, the software testing team validates the build and ensures that
no major issue exists. Once the testing team has confirmed that the build is stable, a
more detailed level of testing is carried out. For example, a tester is testing a pet
insurance website. Buying an insurance policy, adding another pet, and providing quotes
are all basic and critical functionality of the application. Smoke testing for this website
verifies that all these functionalities are working before any in-depth testing is done.

d) Sanity Testing
Sanity testing is performed on a system to verify that newly added functionality or bug
fixes are working correctly. Sanity testing is done on a stable build. It is a subset of
regression testing. For example, a tester is testing a pet insurance website. There is a
change in the discount for buying a policy for a second pet. Sanity testing is then
performed only on the module for buying an insurance policy.

e) Happy path Testing


The objective of Happy Path Testing is to test an application successfully on a positive
flow. It does not look for negative or error conditions. The focus is only on valid and
positive inputs through which the application generates the expected output.

f) Monkey Testing
Monkey testing is carried out by a tester on the assumption that a monkey is using the
application: random inputs and values are entered without any knowledge or
understanding of the application. The objective of monkey testing is to check whether
an application or system crashes when it is given random input values or data. Monkey
testing is performed randomly, no test cases are scripted, and it is not necessary to be
aware of the full functionality of the system.
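
Monkey testing can be approximated in code by feeding random strings to a function and recording any unexpected exception. In the sketch below, `parse_salary_range` is a hypothetical function under test, and the `low-high` input format is an assumption made for illustration; neither is taken from the project code.

```python
import random
import string


def parse_salary_range(text):
    """Parse 'low-high' into a pair of ints; return None if malformed."""
    parts = text.split("-")
    if len(parts) != 2:
        return None
    try:
        low, high = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    return (low, high) if low <= high else None


def monkey_test(func, runs=1000, seed=0):
    """Call func with random printable strings and collect any crashes."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 20)
        text = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            func(text)
        except Exception as exc:  # any exception counts as a crash here
            crashes.append((text, exc))
    return crashes


crashes = monkey_test(parse_salary_range)
```

An empty `crashes` list means that none of the random inputs caused an unhandled exception. It does not show that the function returns correct results, which is the job of scripted tests.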

10.1.4 Acceptance Testing


Acceptance testing is a type of testing where the client, business, or customer tests the
software with real-time business scenarios. The client accepts the software only when
all the features and functionalities work as expected. This is the last phase of testing,
after which the software goes into production. This is also called User Acceptance
Testing (UAT).

a) Alpha Testing
Alpha testing is a type of acceptance testing performed by the team in an organization
to find as many defects as possible before releasing software to customers. For
example, the pet insurance website is under UAT. The UAT team will run real-time
scenarios like buying an insurance policy, buying annual membership, changing the
address, and transferring ownership of the pet in the same way a user uses the real
website. The team can use test credit card information to process payment-related
scenarios.

b) Beta Testing
Beta Testing is a type of software testing which is carried out by the clients/customers.
It is performed in the real environment before releasing the product to the market for
the actual end-users. Beta Testing is carried out to ensure that there are no major failures
in the software or product, and that it satisfies the business requirements from an end-user
perspective. Beta Testing is successful when the customer accepts the software.
This testing is typically done by the end-users and is the final testing done
before releasing the application for commercial purposes. Usually, the Beta version of
the software or product is released only to a certain number of users in a specific
area. The end-users use the software and share feedback with the company. The
company then takes the necessary action before releasing the software worldwide.

c) Operational acceptance testing (OAT)


Operational acceptance testing of the system is performed by operations or system
administration staff in the production environment. The purpose of operational
acceptance testing is to make sure that the system administrators can keep the system
working properly for the users in a real-time environment.
The focus of the OAT is on the following points:
❖ Testing of backup and restore.
❖ Installing, uninstalling, upgrading software.
❖ The recovery process in case of natural disaster.
❖ User management.
❖ Maintenance of the software.

10.1.2 Non-Functional Testing


This section covers three main types of non-functional testing: security, performance,
and usability testing.

10.1.2.1 Security Testing


Security testing is performed by a specialized team, since a system can be attacked
through many different hacking methods. It checks how well the software, application, or
website is protected from internal and/or external threats. This includes how secure the
software is against malicious programs and viruses, and how secure and strong the
authorization and authentication processes are. It also checks how the software behaves
under a hacker’s attack or malicious programs, and how data security is maintained
after such an attack.

a) Penetration Testing
Penetration Testing, or Pen testing, is a type of security testing performed as an
authorized cyberattack on the system to find its weak points in terms
of security. Pen testing is performed by outside contractors, generally known as ethical
hackers, which is why it is also known as ethical hacking. Contractors perform different
operations such as SQL injection, URL manipulation, privilege elevation, and session
expiry, and provide reports to the organization.
Note: Do not perform pen testing on your own laptop/computer. Always obtain
written permission before conducting pen tests.
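
As an illustration of one operation named above, SQL injection, the following sketch uses Python’s standard `sqlite3` module with a throwaway in-memory database. The table, the user record, and the two login functions are hypothetical and exist only for this demonstration; nothing here targets a real system.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    conn.commit()
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Unsafe: user input is concatenated directly into the SQL text.
    query = (
        f"SELECT 1 FROM users WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone() is not None


def login_parameterized(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Safe: placeholders keep user input as data, never as SQL.
    query = "SELECT 1 FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone() is not None


if __name__ == "__main__":
    conn = build_demo_db()
    payload = "' OR '1'='1"  # classic injection string
    print("vulnerable login accepted:", login_vulnerable(conn, "alice", payload))
    print("parameterized login accepted:", login_parameterized(conn, "alice", payload))
```

A pen tester would report the first function as a finding; the second shows the usual remedy of parameterized queries.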

10.1.2.2 Performance Testing


Performance testing is testing of an application’s stability and response time by
applying load. Stability means the ability of the application to withstand
load. Response time is how quickly the application responds to users.
Performance testing is done with the help of tools such as Loader.IO, JMeter, and
LoadRunner.

a) Load testing
Load testing is testing of an application’s stability and response time by applying a load
that is equal to or less than the designed number of users for the application. For
example, if your application handles 100 users at a time with a response time of 3 seconds,
load testing can be done by applying a load of 100 users or fewer.
The goal is to verify that the application responds within 3 seconds for
all the users.
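
The load test in this example can be sketched with Python’s standard library. In practice a tool such as JMeter would drive the real application; here `handle_request` is a hypothetical stand-in that only sleeps briefly, and the figures of 100 users and 3 seconds are taken from the example above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(user_id: int) -> str:
    """Hypothetical stand-in for one user request to the application."""
    time.sleep(0.01)  # simulated processing time
    return f"ok-{user_id}"


def timed_request(user_id: int) -> float:
    """Return the response time of one request in seconds."""
    start = time.perf_counter()
    handle_request(user_id)
    return time.perf_counter() - start


def run_load_test(concurrent_users: int) -> list:
    """Issue one request per simulated user, all at the same time."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        return list(pool.map(timed_request, range(concurrent_users)))


if __name__ == "__main__":
    DESIGNED_USERS = 100   # designed capacity from the example
    MAX_RESPONSE_S = 3.0   # required response time from the example
    durations = run_load_test(DESIGNED_USERS)
    slowest = max(durations)
    print(f"{len(durations)} requests, slowest {slowest:.3f} s")
    assert slowest <= MAX_RESPONSE_S, "response time exceeds the requirement"
```

Calling `run_load_test` with more users than the designed capacity turns the same sketch into a simple stress test.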

b) Stress Testing
Stress testing is testing an application’s stability and response time by applying a load
that is more than the designed number of users for the application. For example, if your
application handles 1000 users at a time with a response time of 4 seconds, stress
testing can be done by applying a load of more than 1000 users. Test the application
with 1100, 1200, and 1300 users and note the response time. The goal is to verify the
stability of the application under stress.

c) Scalability Testing
Scalability testing is testing an application’s stability and response time by
gradually increasing the load beyond the designed number of users until the
application fails. For example, if your application handles 1000 users at a time with a
response time of 2 seconds, scalability testing can be done by applying a load of more
than 1000 users and gradually increasing the number of users to find out exactly where
the application crashes. Suppose the application gives the following response times:

❖ 1000 users: 2 sec
❖ 1400 users: 2 sec
❖ 4000 users: 3 sec
❖ 5000 users: 45 sec
❖ 5150 users: crash. This is the point that scalability testing needs to identify.
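
The search for the breaking point can be sketched as below, using the response times quoted in this example. The function names and the 3-second limit passed to `first_slow_point` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (concurrent users, response time in seconds); None marks a crash.
# The values are the ones quoted in the example above.
MEASUREMENTS: List[Tuple[int, Optional[float]]] = [
    (1000, 2.0),
    (1400, 2.0),
    (4000, 3.0),
    (5000, 45.0),
    (5150, None),
]


def crash_point(measurements) -> Optional[int]:
    """Return the smallest user count at which the application crashed."""
    for users, response_s in sorted(measurements, key=lambda m: m[0]):
        if response_s is None:
            return users
    return None


def first_slow_point(measurements, limit_s: float) -> Optional[int]:
    """Return the smallest user count that crashed or exceeded limit_s."""
    for users, response_s in sorted(measurements, key=lambda m: m[0]):
        if response_s is None or response_s > limit_s:
            return users
    return None


if __name__ == "__main__":
    print("crash point:", crash_point(MEASUREMENTS))
    print("first load slower than 3 s:", first_slow_point(MEASUREMENTS, 3.0))
```

With these measurements the crash point is 5150 users, while the response time already degrades badly at 5000 users.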

d) Volume testing (flood testing)


Volume testing is testing an application’s stability and response time by transferring a
large volume of data to the database. Basically, it tests the capacity of the database to
handle the data.

e) Endurance Testing (Soak Testing)


Endurance testing is testing an application’s stability and response time by applying
load continuously for a longer period to verify that the application is working fine. For
example, car companies perform soak testing to verify that users can drive cars
continuously for hours without any problem.

10.1.2.3 Usability Testing


Usability testing is testing an application from the user’s perspective to check the look
and feel and user-friendliness. For example, there is a mobile app for stock trading, and
a tester is performing usability testing. The tester can check scenarios such as whether
the app is easy to operate with one hand, whether the scroll bar is vertical, whether the
background colour of the app is black, and whether the price of a stock is displayed in
red or green. The main idea of usability testing for this kind of app is that as soon as
the user opens the app, the user should get a glance at the market.

a) Exploratory testing
Exploratory Testing is informal testing performed by the testing team. The objective of
this testing is to explore the application and look for defects that exist in the
application. Testers use the knowledge of the business domain to test the application.
Test charters are used to guide the exploratory testing.

b) Cross browser testing


Cross browser testing is testing an application on different browsers, operating systems,
and mobile devices to check its look and feel and its performance. Why do we need
cross-browser testing?
The answer is that different users use different operating systems, different browsers,
and different mobile devices. The goal of the company is to deliver a good user experience
regardless of those devices. BrowserStack provides access to many browser versions
and mobile devices to test the application. For learning purposes, it is good to take
the free trial offered by BrowserStack for a few days.

c) Accessibility Testing
The aim of Accessibility Testing is to determine whether the software or application
is accessible to people with disabilities. Here, disability includes deafness, colour
blindness, blindness, cognitive disabilities, age-related impairments, and other
conditions. Various checks are performed, such as font size for visually impaired users
and colour and contrast for users with colour blindness.
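
One automated form of the colour and contrast check is the contrast ratio defined in the Web Content Accessibility Guidelines (WCAG) 2.x. The sketch below assumes 8-bit sRGB colours; the function names are illustrative, and the choice of WCAG as the yardstick is an assumption, since the report does not name a specific standard.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to its linear value (WCAG 2.x formula)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) colour with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background) -> float:
    """Contrast ratio between two colours, from 1.0 (equal) up to 21.0."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background) -> bool:
    """WCAG level AA asks for at least 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


if __name__ == "__main__":
    black, white = (0, 0, 0), (255, 255, 255)
    print(f"black on white: {contrast_ratio(black, white):.1f}:1")
    print("passes AA:", passes_aa_normal_text(black, white))
```

Such a check covers only contrast; font size and the other accessibility checks still need their own tests or manual review.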

CHAPTER 11
CONCLUSION

11. CONCLUSION

11.1 CONCLUSION

Employment scam detection helps job-seekers receive only legitimate offers from
companies. To tackle employment scam detection, several machine learning
algorithms are proposed as countermeasures in this project. A supervised approach is
used to demonstrate the use of several classifiers for employment scam detection.
Experimental results indicate that the Random Forest classifier outperforms its peer
classifiers. The proposed approach achieved an accuracy of 98.27%, which is much
higher than that of the existing methods.

Fake job prediction using machine learning algorithms is a crucial application that can
significantly impact the detection and prevention of employment fraud. Through the
utilization of advanced techniques such as Natural Language Processing (NLP) and
classification algorithms like Naive Bayes and Stochastic Gradient Descent (SGD)
classifiers, it is possible to develop effective models that can distinguish between
legitimate job postings and fraudulent ones. By analyzing various features of job
postings such as text content, title, location, profile information, and character count,
these machine learning models can accurately predict the likelihood of a job
advertisement being fake.
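
To make the idea concrete, the following is a minimal sketch of a multinomial Naive Bayes text classifier written with the Python standard library only. It is not the project’s model: the class name, the tokenizer, and the four toy postings are assumptions made for illustration, and a real system would train on a labelled dataset with a library such as scikit-learn.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter()            # label -> number of postings
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def _log_score(self, tokens, label) -> float:
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)  # log prior
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + self.alpha * len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # words never seen in training are skipped
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denominator)
        return score

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Toy, hand-written postings used only to exercise the code.
TRAIN_TEXTS = [
    "Earn money fast from home, no experience needed, send registration fee",
    "Work from home and earn thousands weekly, pay a small fee to start",
    "Software engineer needed to build backend services in Python",
    "Hiring accountant with three years of experience in auditing",
]
TRAIN_LABELS = ["fake", "fake", "real", "real"]

if __name__ == "__main__":
    model = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(model.predict("Earn thousands from home, just pay the registration fee"))
    print(model.predict("Python engineer with experience in backend services"))
```

The classifier scores each class by its log prior plus the smoothed log likelihood of every known word, and picks the class with the higher score.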

The combination of NLP and classification algorithms in the final model provides a
robust framework for identifying false job postings and alerting applicants to potential
scams, thereby enhancing overall job market security and trustworthiness.

11.2 FUTURE SCOPE

Current State of Fake Job Prediction: Fake job postings have been a persistent issue
in the online recruitment space, leading to various fraudulent activities and scams
targeting job seekers. The use of machine learning algorithms, such as Naive Bayes and
Stochastic Gradient Descent (SGD) classifiers, has shown promise in detecting and
predicting fake job postings by analyzing textual data from job listings.

Advancements in Machine Learning Techniques: As machine learning techniques
continue to evolve, there is a significant potential for enhancing the accuracy and
efficiency of fake job prediction models. Future advancements may include the
integration of more sophisticated algorithms, such as deep learning models like
recurrent neural networks (RNNs) or transformers, to capture complex patterns in
textual data related to job postings.

Incorporating Multimodal Data Analysis: The future scope for fake job prediction
using machine learning algorithms could involve incorporating multimodal data
analysis, which combines textual information with other modalities like images or
videos associated with job listings. By leveraging a combination of text and visual data,
machine learning models can gain a more comprehensive understanding of the context
and authenticity of job postings.

Enhanced Feature Engineering and Model Interpretability: Future research may
focus on developing advanced feature engineering techniques tailored specifically for
identifying fake job postings. Additionally, improving the interpretability of machine
learning models used for fake job prediction can help stakeholders understand the
underlying factors contributing to the classification decisions, thereby increasing trust
in the predictive outcomes.

Integration of Real-Time Data Streams: Another area of future exploration could
involve integrating real-time data streams from various online platforms where job
postings are published. By continuously updating the training data with new
information, machine learning models can adapt to emerging trends and evolving
strategies employed by fraudsters, enhancing their predictive capabilities over time.

Collaborative Efforts and Industry Partnerships: To further advance the field of
fake job prediction using machine learning algorithms, collaborative efforts between
academia, industry partners, and regulatory bodies will be crucial. Sharing insights,
datasets, and best practices can foster innovation and drive the development of more
robust solutions to combat employment fraud effectively.

Ethical Considerations and Bias Mitigation: As with any application of artificial
intelligence, addressing ethical considerations and mitigating biases in fake job
prediction models will be essential. Future research should prioritize fairness,
transparency, and accountability in algorithmic decision-making processes to ensure
equitable outcomes for all stakeholders involved.

13. APPENDICES
APPENDIX-A

LIST OF FIGURES

S No   Figure No   Figure Name           Page No
1      6.1.1       System Architecture   36
2      6.2.1       Use Case Diagram      37
3      6.2.2       Class Diagram         39
4      6.2.3       Sequence Diagram      40
5      6.2.4       Activity Diagram      42
6      6.2.5       Deployment Diagram    44
7      6.2.6       Data Flow Diagram     46

LIST OF ABBREVIATIONS

DFD    Data Flow Diagram
UML    Unified Modeling Language
KNN    K Nearest Neighbour
UAT    User Acceptance Testing
UI     User Interface

APPENDIX-B

S No   Figure No   Figure Name               Page No
1      9.1         Website Page              62
2      9.2         User Projects             62
3      9.3         User Login                63
4      9.4         Contact Details           63
5      9.5         Decision tree algorithm   64
6      9.6         Uploading Dataset         64
7      9.7         Job details               65


14. PROGRAM OUTCOMES

PROGRAM OUTCOMES:
Program Outcomes (POs) describe what students are expected to know and be able to
do by the time of graduation to accomplish Program Educational Objectives (PEOs).
The Program Outcomes for Computer Science and Engineering graduates are:

PO1: Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of complex
engineering problems.

PO2: Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first principles
of mathematics, natural sciences, and engineering sciences.

PO3: Design/development of solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs
with appropriate consideration for the public health and safety, and the cultural, societal,
and environmental considerations.

PO4: Conduct investigations of complex problems: Use research-based knowledge
and research methods including design of experiments, analysis and interpretation of
data, and synthesis of the information to provide valid conclusions.

PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.

PO6: The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional engineering practice.

PO7: Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.

PO8: Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

PO9: Individual and team work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.

PO10: Communication: Communicate effectively on complex engineering activities
with the engineering community and with society at large, such as, being able to
comprehend and write effective reports and design documentation, make effective
presentations, and give and receive clear instructions.

PO11: Project management and finance: Demonstrate knowledge and
understanding of the engineering and management principles and apply these to one’s
own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments.

PO12: Life-long learning: Recognize the need for, and have the preparation and ability
to engage in independent and life-long learning in the broadest context of technological
change.

PROGRAM SPECIFIC OUTCOMES:

PSO1: Design, implement, test and evaluate a computer system, or algorithm to meet
desired needs and to solve a computational problem.

PSO2: Ability to analyze, design and implement hardware and software components.
15. PO ATTAINMENT
AVANTHIS SCIENTIFIC TECHNOLOGICAL AND RESEARCH
ACADEMY

(Affiliated to JNTUH, Approved by AICTE, Recognized by Govt of T.S 501512)


PROJECT TITLE: FAKE JOB PREDICTION USING MACHINE LEARNING
ALGORITHMS

INTERNAL GUIDE: DR.N.V. RAMANAREDDY M. Tech, M.B.A, Ph.D

TEAM MEMBERS: B. KEERTHANA (21PT5A6601)
              CH. DINESH (20PT1A6601)
              T. HITISH VARDHAN (20PT1A6602)
              K. RUSHIKESH (20PT1A6604)
              G. NANDHINI (21PT5A6602)
              J. MADHU (21PT5A6604)

PO ATTAINMENT
COURSE KNOWLEDGE               PAGE NO   PO ATTAINMENT              DESCRIPTION
SOFTWARE ENGINEERING           14        PO.5, PO.8                 Development phases
PYTHON                         15        PO.1, PO.3                 In source code
COMPUTER VISION                42        PO.3, PO.4                 Explore the feature extraction methods by AWS
MACHINE LEARNING               17        PO.2, PO.3, PO.4, PO.5     Build custom Helmet Detection models from libraries like TensorFlow or PyTorch
DEEP LEARNING                  40        PO.4                       Explore advanced deep learning architecture like YOLO or Single Shot MultiBox Detector for real-time Helmet Detection
NATURAL LANGUAGE PROCESSING    63        PO.2, PO.3, PO.4, PO.10    Integrate voice-based alerts by training an NLP model to convert detected helmet into spoken notification
REINFORCEMENT LEARNING         70        PO.2, PO.4                 RL algorithms to optimize their movement and helmet detection strategies
DATABASE MANAGEMENT SYSTEMS    30        PSO1, PSO2, PO.5           DBMS used to store, retrieve, and run queries on data.
