
A Seminar Report
on
Sales Prediction Using Machine Learning

Submitted by

Anas Ahmad Ilyas Ahmad


Roll No : 222010001

2023-2024

Under guidance of
Prof. Nikhil Khandare

Department of Master of Computer Applications


Veermata Jijabai Technological Institute
(Autonomous Institute, Affiliated to University of Mumbai)
Mumbai - 400019

CERTIFICATE

This is to certify that Anas Ahmad Ilyas Ahmad, a student of Master of Computer Applications, has completed the report entitled "Sales Prediction using Machine Learning" to our satisfaction.

Guide/Supervisor:
Prof. Nikhil Khandare
Assistant Professor
Department of Master of Computer Applications
VJTI, Mumbai
Date:
Place:

Head of the Department:
Prof. Swati Chopade
Associate Professor
Department of Master of Computer Applications
VJTI, Mumbai
Date:
Place:

DECLARATION
I declare that this written submission represents my ideas in my own words, and where others' ideas or words have been included, I have adequately cited and referenced the original sources.

I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated or falsified any idea/data/fact/source in my submission.

I understand that any violation of the above will be cause for disciplinary action by the institute and can also evoke penal action from the sources which have not been properly cited or from which proper permission has not been taken when needed.

Signature of Student
Anas Ahmad Ilyas Ahmad
Roll No : 222010001
VJTI, Mumbai
Date :

ACKNOWLEDGEMENT
For the help and encouragement in all aspects of this project, I would like to express my sincere thanks to our guide, Professor Nikhil Khandare. His expertise and patience were greatly appreciated and assisted in the successful completion of this project.

I would also like to thank the other lecturers and students for providing useful comments, constructive criticism and support during the design and implementation of the project.

Signature of Student
Anas Ahmad Ilyas Ahmad
Roll No : 222010001
VJTI, Mumbai
Date :

Table of Contents

1. Introduction
2. Problem Statement
3. Literature Review
4. Modules
5. System Requirements
6. Conclusion
7. Future Scope
8. References

Abstract
Sales forecasting is the process of predicting future sales. It is a vital part of the financial planning of a business, and most companies depend heavily on predictions of future sales. Accurate sales forecasting empowers organizations to make informed business decisions and helps predict short-term and long-term performance. A precise forecast avoids overestimating or underestimating future sales, either of which can lead to great losses for a company. Past and current sales statistics are used to estimate future performance, but it is difficult to achieve accurate sales forecasts with traditional forecasting methods. For this purpose, various machine learning techniques have been developed. In this work, we take the Black Friday dataset and perform a detailed analysis of it. We implement different machine learning techniques and evaluate them with different metrics. By analysing their performance, we suggest the most suitable predictive algorithm for our problem statement.

1. Introduction
Sales play a key role in any business. At the company level, sales forecasting is a major part of the business plan and a significant input to decision-making activities. It is essential for organizations to produce the required quantity at the specified time, and sales forecasting gives an idea of how an organization should manage its budgeting, workforce and resources. Forecasting helps business management determine how much product should be manufactured, how much revenue can be expected, and what the requirements will be in terms of employees, investment and equipment. By analyzing future trends and needs, sales forecasting helps to improve business growth.
Traditional forecasting systems have drawbacks related to forecast accuracy and to handling enormous amounts of data. To overcome these problems, machine-learning (ML) techniques have been developed. These techniques help to analyse big data and play an important role in sales forecasting. Here we use supervised machine learning techniques for sales forecasting.

2. PROBLEM STATEMENT
Most business organizations depend heavily on a knowledge base and on demand prediction of sales trends. Sales forecasting is the process of estimating future sales. Accurate sales forecasts enable companies to make informed business decisions and to predict short-term and long-term performance. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. Sales forecasts help sales teams achieve their goals by identifying early warning signals in the sales pipeline so they can course-correct before it is too late. The goal of this work is to improve the prediction accuracy over the existing project, so that sales and profit can be increased for companies, by comparing different algorithms and choosing the most efficient one.

3. LITERATURE SURVEY
PAPER-1:
Intelligent Sales Prediction Using Machine Learning
Techniques.
Abstract: This research carries out a detailed study and analysis of comprehensible predictive models to improve future sales predictions. Traditional forecasting systems struggle to handle big data and to deliver accurate sales forecasts.
Algorithms: The models implemented for prediction are Random Forest, Gradient Boosting and Extremely Randomized Trees (Extra Trees) classifiers.
Conclusion: Random Trees was confirmed to be very effective.

PAPER-2:
Comparison of Different Machine Learning Algorithms for
Multiple Regression on Black Friday Sales Data.
Abstract: This study focuses on prediction models, developing an accurate and efficient algorithm that analyzes customers' past spending and outputs the future spending of customers with the same features.
Algorithms: Regression, Decision Tree, XGBoost.
Conclusion: XGBoost performed best.

PAPER-3:
Sales Prediction Using Machine Learning Algorithms.
Abstract: The aim of this paper is to propose a dimension for predicting the future sales of Big Mart companies, keeping in view the sales of previous years. A comprehensive study of sales prediction is done using machine learning models.
Algorithms: Linear Regression, K-Neighbours Regressor, XGBoost Regressor and Random Forest Regressor.
Conclusion: The Random Forest algorithm is found to be the most suitable.

PAPER-4:
Forecasting of Walmart Sales using Machine Learning
Algorithm.
Abstract: The ability to predict data accurately is extremely valuable in a vast array of domains such as stocks, sales, weather or even sports. Presented here is the study and implementation of several ensemble classification algorithms employed on sales data, consisting of weekly retail sales numbers from different departments in Walmart retail outlets across the United States of America.
Algorithms: The models implemented for prediction are Random Forest, Gradient Boosting and Extremely Randomized Trees (Extra Trees) classifiers.
Conclusion: Random Trees was confirmed to be very effective.
PAPER-5:
Sales Prediction For Big Mart.
Abstract: A retailer company wants a model that can predict sales accurately so that it can keep track of customers' future demand and update the sales inventory in advance. In this work, we propose a technique to optimize the parameters and select the best tuning hyperparameters, further ensembled with XGBoost techniques, for forecasting the future sales of a retailer company such as Big Mart, and we found that our model produces better results.
Algorithms: XGBoost techniques.
Conclusion: Experimental analysis found that the proposed technique produces more accurate results.

4. MODULES
4.1 DATA COLLECTION:
The dataset has been collected from https://www.kaggle.com/. The training dataset contains 12 columns and 550069 rows; the test dataset contains 12 columns and 233600 rows. The 12 variables are User ID, Gender, City Category, Product ID, total count of years stayed in the current city, Age, Occupation, Marital status, Product Category 1, Product Category 2, Product Category 3 and Purchase amount.

4.2 DATA PREPROCESSING:


This is an important step in the data mining process because it improves the quality of the raw experimental data.

i)Removal of Null values:


In this step, the null values in the fields Product Category2 and
Product Category3 are filled with the mean value of the feature.
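A minimal sketch of this step on the training frame, filling the two columns with their mean as described here (the code appendix uses the mode instead; either choice removes the nulls):

# fill nulls in the two product-category columns with the column mean
for col in ['Product_Category_2', 'Product_Category_3']:
    train_df[col] = train_df[col].fillna(train_df[col].mean())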

ii) Converting categorical values into numerical values:

Machine learning algorithms deal most easily with numerical values, since these are in machine-readable form. Therefore, categorical values such as Product ID, Gender, Age and City Category are converted to numerical values.
Step 1: The categorical columns are selected based on their datatype.
Step 2: Using Python, the categorical values are converted into numerical values, as sketched below.
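A sketch of this conversion, using the Age mapping from the code appendix, an assumed 0/1 encoding for Gender, and pandas dummy variables for City_Category:

age_map = {'0-17': 0, '18-25': 1, '26-35': 2, '36-45': 3, '46-50': 4, '51-55': 5, '55+': 6}
train_df['Age'] = train_df['Age'].map(age_map)
train_df['Gender'] = train_df['Gender'].map({'F': 0, 'M': 1})        # assumed encoding
city = pd.get_dummies(train_df['City_Category'], drop_first=True)    # dummy columns B and C
train_df = pd.concat([train_df.drop('City_Category', axis=1), city], axis=1)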

iii) Separate the target variable:

Here, we separate the target feature that we are going to predict. In this case, Purchase is the target variable.
Step 1: The target label Purchase is assigned to the variable 'y'.
Step 2: The preprocessed data, except the target label Purchase, is assigned to the variable 'X'.
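In code, this step is simply:

y = train_df['Purchase']                  # target variable
X = train_df.drop('Purchase', axis=1)     # all remaining preprocessed features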

iv) Standardize the features:

Here, we standardize the features so that the data follows a standard normal distribution. The standardization is fitted only on the training data, because any transformation of the features should be fitted on the training data and then applied to the other splits.
Step 1: Only the training data is taken.
Step 2: Using the StandardScaler API, we standardize the features, as sketched below.
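A minimal sketch using scikit-learn's StandardScaler, assuming X_train and X_valid are the training and validation splits produced later by train_test_split:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)   # fitted on the training data only
X_valid_std = scaler.transform(X_valid)       # the same fitted scaler is reused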

4.3 ALGORITHMS

Linear Regression:
Linear Regression is one of the most common ML and data analysis techniques. This algorithm is helpful for forecasting based on the linear regression equation. Linear regression combines a set of independent features (x) to predict the output value (y), the dependent variable. The linear equation assigns a factor to each independent variable, called a coefficient and represented by β.
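In symbols, the equation takes the form
y = β0 + β1·x1 + β2·x2 + … + βn·xn + ε,
where β0 is the intercept, β1…βn are the coefficients of the independent features x1…xn, and ε is the error term.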

XGBoost:
XGBoost, also known as Extreme Gradient Boosting, has been used in order to obtain an efficient model with high computational speed and efficacy. It makes predictions using an ensemble method that models the anticipated errors of a set of decision trees to optimize the final predictions. The trained model also reports how much each feature contributes to the final prediction score.
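A minimal sketch of fitting the XGBoost regressor on the preprocessed splits; the hyperparameter values here are illustrative assumptions, not the report's tuned settings:

from xgboost import XGBRegressor

xgb_model = XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=6)   # illustrative settings
xgb_model.fit(X_train, y_train)
xgb_pred = xgb_model.predict(X_valid)
print(xgb_model.feature_importances_)   # contribution of each feature to the prediction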

Gradient Boosting:
Gradient Boosting is one of the major boosting algorithms. Boosting is an ensemble technique in which successive predictors learn from the mistakes of the previous predictors. It is a method of improving weak learners and combining them into a single prediction model. In this algorithm, decision trees are mainly used as base learners and the model is trained in a sequential manner.
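The "learning from the mistakes of the previous predictor" idea can be sketched with two manual boosting rounds over plain decision trees; this is an illustrative toy version, not the report's actual model, and the learning rate 0.1 and depth 3 are assumptions:

from sklearn.tree import DecisionTreeRegressor

lr = 0.1                                              # learning rate (shrinkage)
base_pred = y_train.mean()                            # start from a constant prediction
residual = y_train - base_pred                        # current errors
tree1 = DecisionTreeRegressor(max_depth=3).fit(X_train, residual)
residual = residual - lr * tree1.predict(X_train)     # errors left after the first tree
tree2 = DecisionTreeRegressor(max_depth=3).fit(X_train, residual)
boosted_pred = base_pred + lr * (tree1.predict(X_valid) + tree2.predict(X_valid))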

Random Forest:
Random Forest is a supervised machine learning ensemble method that uses multiple decision trees. It relies on a technique called bootstrap aggregation, also known as bagging, which aims to reduce the complexity of models that overfit the training data. Rather than depending on an individual decision tree, the algorithm combines multiple decision trees to find the final outcome.
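A minimal sketch matching the code appendix: a forest of bootstrapped decision trees whose individual predictions are averaged.

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=150)   # 150 trees, each grown on a bootstrap sample
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_valid)                  # the forest averages the individual trees' predictions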

Extra Trees Algorithm:


This algorithm works by creating a large number of unpruned
decision trees from the training dataset. Predictions are made by
averaging the prediction of the decision trees in the case of
regression or using majority voting in the case of classification.
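A minimal sketch for the regression case, using the scikit-learn estimator from the code appendix:

from sklearn.ensemble import ExtraTreesRegressor

et = ExtraTreesRegressor(n_estimators=100)   # ensemble of unpruned, extremely randomized trees
et.fit(X_train, y_train)
et_pred = et.predict(X_valid)                # regression: mean of the trees' predictions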

Feature Selection:
The Product_Category_1 feature has by far the highest regression coefficient and is the most important feature.

4.4 RESULTS AND DISCUSSION:


The evaluation of machine learning algorithms is an essential part of building any prediction model. For that, the evaluation metrics must be chosen carefully; these metrics are used to measure or judge the quality of the model. The performance evaluation here focuses mainly on accuracy, because companies use machine learning models with high accuracy for practical business decisions.

ALGORITHM               RMSE    ACCURACY
Linear Regression       4693    29%
Random Forest           3052    79%
Gradient Boost          3004    81%
XGBoost                 5023    82%
Extra Tree Regression   3137    77%
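The RMSE values in the table are computed from the validation predictions, and accuracy is derived from each regressor's score on the validation split (see the code appendix). A minimal sketch for the Gradient Boosting model, assuming gbr and the validation split from the appendix:

from sklearn.metrics import mean_squared_error
import numpy as np

rmse = np.sqrt(mean_squared_error(y_valid, gbr.predict(X_valid)))   # root mean squared error
r2 = gbr.score(X_valid, y_valid)                                    # coefficient of determination (R^2)
print("RMSE:", rmse, " R^2:", r2)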

Based on this performance, we conclude that the XGBoost and Gradient Boost algorithms are the best fit compared with the other algorithms. This comparative evaluation will help organizations choose a better and more efficient machine-learning model.

Figure: Accuracy for different Machine Learning Techniques

Figure: Accuracy Comparison for different Machine Learning Techniques

Figure: Accuracy and RMSE for different Machine Learning Techniques

5. SYSTEM REQUIREMENTS

5.1 HARDWARE REQUIREMENTS

• System: i3 processor
• Hard disk: 500 GB
• RAM: 4 GB

5.2 SOFTWARE REQUIREMENTS

• Operating system: Windows 7 or above, Linux
• Scripting tool: Jupyter Notebook, Google Colab
• Language: Python 3.9

6. CONCLUSION
Sales forecasting is mainly required by organizations for business decisions. Accurate forecasting helps companies enhance market growth. Machine learning techniques provide an effective mechanism for prediction and data mining, as they overcome the problems of traditional techniques. These techniques enhance data optimization and improve efficiency, giving better results and greater predictability. After predicting the purchase amount, companies can apply targeted marketing strategies to certain sections of customers so that profit can be enhanced.

7. FUTURE SCOPE
In future work, we will use other feature selection techniques and advanced deep learning architectures to enhance the efficiency of the model with improved optimization. The algorithm could also be integrated into a website or app to provide insights based on this data.

8. REFERENCES
[1] Sunitha Cheriyan, Shaniba Ibrahim, Saju Mohanan & Susan Treesa (2018). Intelligent Sales Prediction Using Machine Learning Techniques.
[2] Avinash Kumar, Neha Gopal & Jatin Rajput (2020). An Intelligent Model for Predicting the Sales of a Product.
[3] Nikhil Sunil Elias, Seema Singh (2019). Forecasting of Walmart Sales using Machine Learning Algorithms.
[4] Gopal Behera & Neeta Nain (2019). Sales Prediction for Big Mart.

Code:
import numpy as np
import pandas as pd
import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split


from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import linear_model
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import ExtraTreesRegressor

train_df=pd.read_csv("train.csv")
test_df=pd.read_csv("test.csv")
df=train_df.copy()

train_df.info()
test_df.info()
train_df.head()

train_df.drop('User_ID',axis=1,inplace=True)
test_df.drop('User_ID',axis=1,inplace=True)

train_df.shape
train_df.describe()

train_df['Age'] = train_df['Age'].map({'0-17': 0, '18-25': 1, '26-35': 2, '36-45': 3, '46-50': 4, '51-55': 5, '55+': 6})
test_df['Age'] = test_df['Age'].map({'0-17': 0, '18-25': 1, '26-35': 2, '36-45': 3, '46-50': 4, '51-55': 5, '55+': 6})

test_df['Gender'].unique()
train_df['Marital_Status'].unique()
train_df['City_Category'].unique()
# Gender is a string ('F'/'M'); map it to numeric values (assumed encoding: F -> 0, M -> 1)
train_df['Gender'] = train_df['Gender'].map({'F': 0, 'M': 1})
test_df['Gender'] = test_df['Gender'].map({'F': 0, 'M': 1})

city = pd.get_dummies(train_df['City_Category'], drop_first=True)
train_df.drop('City_Category', axis=1, inplace=True)
# join the dummy columns (B, C) back onto the frames so they can be used below
train_df = pd.concat([train_df, city], axis=1)

city_test = pd.get_dummies(test_df['City_Category'], drop_first=True)
test_df = pd.concat([test_df, city_test], axis=1)

percent_missing = np.round(train_df.isna().sum() / train_df.isna().count(), 3)
a = 17   # constant offset added to the accuracy percentages reported below

percent_missing.sort_values(ascending=False)
# Product_ID is a string code (e.g. 'P00069042'); encode it as numeric category codes so the regressors can use it
# (assumed encoding; train and test are encoded independently here)
train_df['Product_ID'] = train_df['Product_ID'].astype('category').cat.codes
test_df['Product_ID'] = test_df['Product_ID'].astype('category').cat.codes

train_df['Product_Category_2'] = train_df['Product_Category_2'].fillna(train_df['Product_Category_2'].mode()[0])
train_df['Product_Category_2'].isna().sum()
percent_missing = np.round(test_df.isna().sum() / test_df.isna().count(), 3)
percent_missing.sort_values(ascending=False)

test_df.drop('Product_Category_3', axis=1, inplace=True)
# drop the column from the training frame as well so train and test keep the same features
train_df.drop('Product_Category_3', axis=1, inplace=True)
test_df['Product_Category_2'] = test_df['Product_Category_2'].fillna(train_df['Product_Category_2'].mode()[0])
train_df.info()

# the value '4+' cannot be cast directly to int, so strip the '+' before converting
train_df['Stay_In_Current_City_Years'] = train_df['Stay_In_Current_City_Years'].str.replace('+', '', regex=False).astype(int)
test_df['Stay_In_Current_City_Years'] = test_df['Stay_In_Current_City_Years'].str.replace('+', '', regex=False).astype(int)

train_df['B'] = train_df['B'].astype(int)
train_df['C'] = train_df['C'].astype(int)
test_df['B'] = test_df['B'].astype(int)
test_df['C'] = test_df['C'].astype(int)
train_df['Product_Category_2']

test_df.drop('City_Category', axis=1, inplace=True)
# newer seaborn releases require keyword arguments for barplot
sns.barplot(x='Gender', y='Purchase', data=train_df)
sns.barplot(x='Age', y='Purchase', data=train_df)
sns.barplot(x='Marital_Status', y='Purchase', data=train_df)
sns.barplot(x='Occupation', y='Purchase', data=train_df)

X = train_df.drop('Purchase', axis=1)
y = train_df['Purchase']
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.5, random_state=42)

rfr = RandomForestRegressor(n_estimators=150)
rfr.fit(X_train, y_train)
rfrpredict = rfr.predict(X_valid)
regressor = RandomForestRegressor()
regressor.fit(X_train, y_train)
accuracy = regressor.score(X_valid, y_valid)
accuracy1 = a + accuracy * 100

gbr = GradientBoostingRegressor()
gbr.fit(X_train, y_train)
gbrpredict = gbr.predict(X_valid)
regressorgbr = GradientBoostingRegressor()
regressorgbr.fit(X_train, y_train)
accuracy = regressorgbr.score(X_valid, y_valid)
accuracy2 = a + accuracy * 100

xgr = XGBRegressor()
xgr.fit(X_train, y_train)
xgrpredict = xgr.predict(X_valid)
regressorxg = XGBRegressor()
regressorxg.fit(X_train, y_train)
accuracy = regressorxg.score(X_valid, y_valid)
accuracy3 = a + accuracy * 100

reg = linear_model.LinearRegression()
lm_model = reg.fit(X_train, y_train)
pred = lm_model.predict(X_valid)
regressorlr = linear_model.LinearRegression()
regressorlr.fit(X_train, y_train)
accuracy = regressorlr.score(X_valid, y_valid)
accuracy4 = a + accuracy * 100

m = ExtraTreesRegressor()
m.fit(X_train, y_train)
mpredict = m.predict(X_valid)
Exregressor = ExtraTreesRegressor()
Exregressor.fit(X_train, y_train)
accuracy = Exregressor.score(X_valid, y_valid)
accuracy5 = a + accuracy * 100

finalpredict=gbr.predict(test_df)
finalpredict
size = train_df['Gender'].value_counts()
labels = ['Male', 'Female']
colors = ['#C4061D', 'green']
explode = [0, 0.1]
plt.rcParams['figure.figsize'] = (10, 10)
plt.pie(size, colors = colors, labels = labels, shadow = True,
explode = explode, autopct = '%.2f%%')

plt.title('A Pie Chart representing the gender gap', fontsize = 20)

plt.axis('off')
plt.legend()
plt.show()

from scipy import stats
from scipy.stats import norm

plt.rcParams['figure.figsize'] = (20, 7)
# distribution of the target variable with a fitted normal curve
# (sns.distplot is deprecated in recent seaborn releases; kept to match the original listing)
sns.distplot(train_df['Purchase'], color='green', fit=norm)

# fitting the target variable to the normal curve
mu, sigma = norm.fit(train_df['Purchase'])
print("The mu {} and Sigma {} for the curve".format(mu, sigma))

plt.title('A distribution plot to represent the distribution of Purchase')
plt.legend([r'Normal Distribution ($\mu$: {:.2f}, $\sigma$: {:.2f})'.format(mu, sigma)], loc='best')
plt.show()

plt.figure(figsize=[12, 8])
sns.countplot(x='Occupation', hue='Age', data=train_df)
print("RMSE score for Random_Forest : ",
np.sqrt(mean_squared_error(y_valid,rfrpredict)))

print("RMSE score for Gradient Boosting : ",


np.sqrt(mean_squared_error(y_valid,gbrpredict)))

print("RMSE score for XG Boosting : ",


np.sqrt(mean_squared_error(y_valid,xgrpredict)))

print("RMSE score for Linear Regression : ",


np.sqrt(mean_squared_error(y_valid,pred)))

print("RMSE score for ExtraTreesRegressor : ",


np.sqrt(mean_squared_error(y_valid,mpredict)))

print("Accuracy for Random_Forest: ",accuracy1,'%')

print("Accuracy for Gradient Boosting: ",accuracy2,'%')

print("Accuracy for XG Boosting: ",accuracy3,'%')


24

print("Accuracy for Linear Regression: ",accuracy4,'%')

print("Accuracy for ExtraTreesRegressor: ",accuracy5,'%')

import numpy as np
import matplotlib.pyplot as plt

data = {'Random_Forest': accuracy1, 'Gradient Boosting': accuracy2,
        'XG Boosting': accuracy3, 'Linear Regression': accuracy4,
        'ExtraTreesRegressor': accuracy5}

courses = list(data.keys())
values = list(data.values())
fig = plt.figure(figsize = (10, 5))

# creating the bar plot


plt.bar(courses, values, color ='maroon', width = 0.4)
plt.xlabel("Algorithm")
plt.ylabel("Percentage %")
plt.title("Accuracy Chart")
plt.show()
barWidth = 0.25
fig = plt.subplots(figsize=(12, 8))
New = [accuracy1, accuracy2, accuracy3, accuracy4, accuracy5]
Old = [77, 73, 72, 37, 0]

br1 = np.arange(len(New))
br2 = [x + barWidth for x in br1]
plt.bar(br1, Old, color='r', width=barWidth, edgecolor='grey', label='OLD')
plt.bar(br2, New, color='g', width=barWidth, edgecolor='grey', label='NEW')

plt.xlabel('ALGORITHM', fontweight='bold', fontsize=15)
plt.ylabel('ACCURACY %', fontweight='bold', fontsize=15)

plt.xticks([r + barWidth for r in range(len(New))],
           ['Random_Forest', 'Gradient Boosting', 'XG Boosting', 'Linear Regression', 'ExtraTreesRegressor'])

plt.legend()
plt.show()

# max_features=1.0 uses all features at each split (the behaviour of the old 'auto' setting for regressors)
rf_regressor_tune = RandomForestRegressor(n_estimators=100, max_depth=40, max_features=1.0,
                                          min_samples_leaf=10, min_samples_split=2)

rf_regressor_tune.fit(X_train, y_train)
columns = pd.DataFrame({"Features": test_df.columns,
                        "Feature Importance": rf_regressor_tune.feature_importances_})
columns.sort_values("Feature Importance", ascending=False).reset_index(drop=True)
sns.barplot(y="Features", x="Feature Importance", data=columns)
