HPC Mini Project Report

This document describes using classification algorithms like logistic regression and random forest in SPSS Modeler to analyze gravitational wave strain data. It aims to predict gravitational wave events using attributes like strain value and type. The dataset will be split into 70% for training models and 30% for testing. Both algorithms will be used to classify the testing data and compare their accuracy, with the most accurate model being selected.

Uploaded by

Ketan Ingale


“CLASSIFICATION ALGORITHMS USING

SPSS MODELER”

A Mini Project

Submitted by

Rakshitha Shettigar (BC058)

Nishant Dalvi (BC051)

Ketan Ingale (BC045)

Farhan Ansari (BC007)

FOURTH YEAR COMPUTER ENGINEERING

Department of Computer Engineering

Hope Foundation's
International Institute of Information Technology

Hinjawadi, Pune – 411057

AY 2018-2019
Semester-1
Classification algorithms using SPSS Modeler

TABLE OF CONTENTS

1. PROBLEM STATEMENT
2. ABSTRACT
3. INTRODUCTION
4. OBJECTIVE
5. METHODOLOGY
6. MATHEMATICAL MODEL
7. ALGORITHM
8. FLOWCHART
9. RESULT
10. CONCLUSION
11. REFERENCES

2 Department of Computer Engineering I2IT, Pune

1. PROBLEM STATEMENT

Apply a Logistic Regression classifier and a Random Forest classifier to CBC data using
the SPSS Modeler tool.

Dataset used: gravitational wave strain for H1 and L1.

2. ABSTRACT

Gravitational waves are disturbances in the curvature of space-time, generated by accelerated
masses, that propagate as waves outward from their source at the speed of light. As a
gravitational wave passes an observer, that observer will find space-time distorted by the
effects of strain.

The Laser Interferometer Gravitational-Wave Observatory (LIGO) and the Virgo detector are
large-scale physics experiments designed to directly detect gravitational waves. The LIGO
Scientific Collaboration (LSC) and the Virgo Collaboration pursue gravitational wave
science with these detectors, along with partner collaborations around the world. These
gravitational strain waves are represented in the form of events.

To predict an event from the strain type and strain value using a supervised machine learning
algorithm, we train the model on 70% of the data. Testing is done on the remaining 30%,
where the strain value and strain type are taken as input and the model predicts the event.

3. INTRODUCTION

Data mining is a technique used in various domains to give meaning to the available data.
Classification is a data mining (machine learning) technique used to predict group
membership for data instances: data are categorized into a given number of classes, and the
main goal of a classification problem is to identify the category/class into which a new data
instance falls.
Classification determines the group to which each data instance in a given dataset belongs,
assigning data to different classes according to some constraints. Several major kinds of
classification algorithms are in use, including C4.5, ID3, the k-nearest neighbor classifier,
Naive Bayes, SVM, and ANN. Generally, a classification technique follows one of three
approaches: Statistical, Machine Learning, or Neural Network.
Classification is a two-step process. In the first step, a model is created by applying a
classification algorithm to a training data set. In the second step, the extracted model is
tested against a predefined test data set to measure the trained model's performance and
accuracy. Classification is therefore the process of assigning class labels to data whose
class label is unknown.

SPSS Modeler

IBM SPSS Modeler is a data mining and text analytics software application from IBM.
It is used to build predictive models and conduct other analytic tasks. It has a visual
interface which allows users to leverage statistical and data mining algorithms without
programming.
One of its main aims from the outset was to get rid of unnecessary complexity in data
transformations, and to make complex predictive models very easy to use. The first
version incorporated decision trees (ID3), and neural networks (backprop), which could
both be trained without underlying knowledge of how those techniques worked.
IBM SPSS Modeler was originally named Clementine by its creators, Integral
Solutions Limited. This name continued for a while after SPSS's acquisition of the
product. SPSS later changed the name to SPSS Clementine, and then later to PASW
Modeler. Following IBM's 2009 acquisition of SPSS, the product was renamed IBM
SPSS Modeler.

Applications:

a. Customer analytics and Customer relationship management (CRM)

b. Fraud detection and prevention
c. Optimizing insurance claims
d. Risk management
e. Manufacturing quality improvement
f. Healthcare quality improvement
g. Forecasting demand or sales
h. Law enforcement and border security
i. Education
j. Telecommunications
k. Entertainment: e.g., predicting movie box office receipts

Classification algorithms:
• Logistic Regression

Logistic regression is the appropriate regression analysis to conduct when the
dependent variable is dichotomous (binary). Like all regression analyses, logistic
regression is a predictive analysis. It is used to describe data and to explain the
relationship between one dependent binary variable and one or more nominal, ordinal,
interval or ratio-level independent variables.
Logistic regressions are sometimes difficult to interpret; the Intellectus Statistics tool
lets you conduct the analysis and then interprets the output in plain English.

• Random Forest Classifier

Random forest, as its name implies, consists of a large number of individual decision
trees that operate as an ensemble. Each individual tree in the random forest outputs a
class prediction, and the class with the most votes becomes the model's prediction.

[Figure: Visualization of a Random Forest Model Making a Prediction]

The fundamental concept behind random forest is a simple but powerful one: the
wisdom of crowds. In data science terms, the reason the random forest model works so
well is:

A large number of relatively uncorrelated models (trees) operating as a committee will
outperform any of the individual constituent models.

The low correlation between models is the key. Just like how investments with low
correlations (like stocks and bonds) come together to form a portfolio that is greater than the
sum of its parts, uncorrelated models can produce ensemble predictions that are more
accurate than any of the individual predictions. The reason for this wonderful effect is that
the trees protect each other from their individual errors (as long as they do not
constantly all err in the same direction). While some trees may be wrong, many other trees
will be right, so as a group the trees are able to move in the correct direction. Therefore, the
prerequisites for random forest to perform well are:

1. There needs to be some actual signal in our features so that models built using those
features do better than random guessing.

2. The predictions (and therefore the errors) made by the individual trees need to have low
correlations with each other.
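
The two prerequisites above can be illustrated with a small calculation. The sketch below is an idealization and not part of the project: it assumes a binary problem in which every tree is correct with the same probability and the trees err completely independently (real trees are only partly uncorrelated). The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that a strict majority of n_trees independent voters is right.

    Idealized assumptions: binary classification, every tree is correct with
    probability p_correct, and the trees' errors are fully independent.
    """
    needed = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


if __name__ == "__main__":
    # A single tree versus committees of increasing (odd) size.
    for n in (1, 3, 11, 101):
        print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

With trees that are only slightly better than guessing, the committee's accuracy grows with its size; with trees that are no better than guessing (probability 0.5), the committee stays at 0.5, which is why some actual signal in the features is required.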

4. OBJECTIVE
• To perform supervised machine learning on the gravitational wave strain dataset.
• To apply multiple classification algorithms and compare their efficiency.
• To find out which classification algorithm has the highest accuracy and correctly
predicts the event.

5. METHODOLOGY
• The gravitational wave strain data for H1 and L1 has 3 attributes: strain value,
strain type and event.
• The dataset is split into a training dataset (70%) and a testing dataset (30%).
• The training dataset is fed to the classification algorithm to train the model to
correctly predict the event.
• The model is tested on the testing dataset, where the predicted event is the final
output.
• The accuracy of every tested model is compared and the model with the best
accuracy is selected.
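
The project carries out these steps in SPSS Modeler. Purely as an illustration, the split and the accuracy comparison can be sketched in plain Python as follows; the function names, the fixed random seed and the use of shuffling are assumptions, not details taken from the report.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true event labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    train, test = train_test_split(range(100))
    print(len(train), len(test))
```

Each candidate model would be trained on the first part, asked to predict the event for the second part, and ranked by the value `accuracy` returns.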

6. MATHEMATICAL MODEL

• Logistic Regression:

b0 = regression constant (intercept).
b1 = steepness of the curve.
p = probability of a class.
x = predictor variable.

Logistic regression can handle any number of numerical and/or categorical variables.

b0 = regression constant (intercept).
b1, b2, ..., bn = steepness of the curve along each variable.
p = probability of a class.
x1, x2, ..., xn = predictor variables (numerical or categorical).
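
The formulas that these symbols belong to are not present in the text. Assuming the standard logistic model is what was intended, they would read as follows, first for one variable and then for several:

```latex
p = \frac{1}{1 + e^{-(b_0 + b_1 x)}}
\qquad\text{and}\qquad
p = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)}}
```

In both cases p lies between 0 and 1, so it can be read as the probability of a class.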

• Random Forest:

It is made up of multiple decision trees. In decision analysis, a decision tree can
be used to visually and explicitly represent decisions and decision making. In data mining, a
decision tree describes data (but the resulting classification tree can be an input for decision
making).

In a decision tree, the major challenge is identifying the attribute for the root node at
each level. This process is known as attribute selection. There are three popular attribute
selection measures:
1. Information Gain
2. Gini Index
3. Gain Ratio

Information Gain
When we use a node in a decision tree to partition the training instances into smaller subsets,
the entropy changes. Information gain is a measure of this change in entropy.

Entropy
Entropy is a measure of the uncertainty of a random variable; it characterizes the impurity of
an arbitrary collection of examples. The higher the entropy, the greater the information content.
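
As a minimal sketch of these two measures, the code below uses the usual definitions: entropy is the negative sum of p_i * log2(p_i) over the class proportions p_i, and information gain is the parent's entropy minus the size-weighted entropy of the subsets produced by a split. The function names and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent_labels, subsets):
    """Drop in entropy when parent_labels is partitioned into subsets."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted


if __name__ == "__main__":
    parent = ["a", "a", "b", "b"]
    print(entropy(parent))                                     # evenly mixed labels
    print(information_gain(parent, [["a", "a"], ["b", "b"]]))  # split into pure subsets
```

A split that separates the classes perfectly removes all of the parent's entropy, so its information gain equals the parent's entropy.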

7. ALGORITHM

1) Split the dataset into a training dataset (70%) and a testing dataset (30%).
2) Train the model on the training dataset using one of the classification
algorithms.
3) Compare the accuracy of every classification algorithm.

Random Forest Algorithm:

a. Take the test features and use the rules of each randomly created decision tree to
predict the outcome, and store each predicted outcome (target).
b. Calculate the votes for each predicted target.
c. Take the predicted target with the most votes as the final prediction of the random
forest algorithm.
d. To make a prediction with the trained random forest, the test features must be passed
through the rules of each randomly created tree. Suppose we built 100 random
decision trees to form the random forest.
e. Each tree may predict a different target (outcome) for the same test features.
The votes are then counted for each predicted target. Suppose the 100 random
decision trees predict 3 unique targets x, y and z; the votes for x are then simply
the number of trees, out of 100, whose prediction is x.

The same holds for the other 2 targets (y, z). Suppose x gets the most votes: say 60 of
the 100 random decision trees predict that the target will be x. The random forest then
returns x as the final predicted target.

This concept of voting is known as majority voting.
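
The voting step described above can be sketched in a few lines of Python. The function name is hypothetical, and the vote counts for y and z (25 and 15) are made up for the example; the text only specifies 60 votes for x out of 100 trees.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the target with the most votes and the full vote tally."""
    votes = Counter(tree_predictions)
    winner, _ = votes.most_common(1)[0]
    return winner, votes


if __name__ == "__main__":
    # 100 trees: 60 predict x; the y/z counts are illustrative only.
    predictions = ["x"] * 60 + ["y"] * 25 + ["z"] * 15
    winner, votes = majority_vote(predictions)
    print(winner, dict(votes))
```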

8. FLOWCHART

[Figure: flowchart]

9. RESULT
Logistic Regression

Random Forest Classifier

Logistic Regression:

                                 Frequency Count   Percentage Accuracy
Correctly Classified Records           8,591,864                60.44%
Incorrectly Classified Records         5,623,374                39.56%
Total                                 14,215,238

Random Forest Classifier:

                                 Frequency Count   Percentage Accuracy
Correctly Classified Records          12,897,318                90.68%
Incorrectly Classified Records         1,326,117                 9.32%
Total                                 14,223,435
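
The percentages in both tables follow directly from the record counts. The short sketch below recomputes them; the function and variable names are illustrative, and only the counts come from the tables.

```python
def percentage(part, total):
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


# (correctly classified, incorrectly classified) record counts from the tables
RESULTS = {
    "Logistic Regression": (8_591_864, 5_623_374),
    "Random Forest Classifier": (12_897_318, 1_326_117),
}


def summarize(results):
    """Map each model to its (total records, % correct, % incorrect)."""
    summary = {}
    for name, (correct, incorrect) in results.items():
        total = correct + incorrect
        summary[name] = (
            total,
            percentage(correct, total),
            percentage(incorrect, total),
        )
    return summary


if __name__ == "__main__":
    for name, row in summarize(RESULTS).items():
        print(name, row)
```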

10. CONCLUSION

Thus we applied two different classification algorithms (Logistic Regression and Random
Forest Classifier) to the gravitational wave strain dataset. The Random Forest Classifier
is substantially more accurate (90.68%) than Logistic Regression (60.44%).

11. REFERENCES

• https://stackabuse.com/decision-trees-in-python-with-scikit-learn/
• https://stackabuse.com/k-nearest-neighbors-algorithm-in-python-and-scikit-learn/
• https://stackabuse.com/the-naive-bayes-algorithm-in-python-with-scikit-learn/
