Final PPT File Cluster Analysis

The project titled 'World Development Measurement' aims to create clusters based on a dataset containing 2708 observations and 25 variables related to economic and development metrics across various countries. The analysis involves exploratory data analysis, feature engineering, and model building, with a focus on clustering techniques such as K-Means and Hierarchical Clustering. The project ultimately seeks to evaluate the performance of these models to understand global development trends.

Uploaded by

reddykarishma840

World Development Measurement

PROJECT TITLE: "WORLD DEVELOPMENT MEASUREMENT"
MENTOR NAME: Ms. Snehal Shinde
GROUP NO.: 4
START DATE: 6 NOV 2023
Group details
Mr. Chintha Gangadhara, 8688620739
Mrs. Swapna Akella, 8008994791
Mr. Nikhil Gowda K, 8073285770
Mr. Sriharsha Noorbhasha, 9959905437
Mr. Rohan Vasant Thorat, 8793574733
Mr. Akshay Haridas Rautray, 9637637670
Mrs. Jonnada Bindu, 9505539842

Business Objective:

⮚ Create clusters on the global development measurement dataset.

⮚ The dataset contains important economic and development metrics for countries across the globe.
Business Problem

⮚ Develop a cluster model for the world development measurement dataset.

⮚ Evaluate the performance of the model.

This is a cluster analysis problem: the goal of clustering is to find distinct groups, or "clusters", within a data set.
Project Architecture / Project Flow
Understanding the Business Problem

Dataset Understanding

EDA process

Feature Engineering

Model Building

Model Evaluation and Feedback

Deployment
Data Set Details:
The data file contains 2708 observations with 25 variables and has information about important economic and development metrics related to various countries across the globe.
The variables, or features, include the following:

Birth Rate, Business tax, CO2emissions, Country, Days to start a business, Ease of business, energy usage, GDP, healthexp%GDP, healthexp/capita, hours to do tax, infant mortality, internet usage, lending rate, life expectancy female, etc.

Data Set
Exploratory Data Analysis (EDA) and
Feature Engineering
⮚ The observations in each variable are stored as floating-point values.
⮚ Nearly every feature contains null values; Country and Population Total are the exceptions.
⮚ There are no duplicate observations in the dataset.
⮚ There are a total of 12203 null values in the dataset.
⮚ Data distribution: skewness.
Visualizing the null values for each attribute with a seaborn (SNS) heatmap.

• Ease of business has a large number of missing values.
• Population urban has few missing values.
• There are no missing values in country and population total.
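The checks listed above (data types, null counts, duplicates, skewness) can be sketched with pandas as follows. This is a minimal illustration on a made-up toy frame, not the project's actual code; the column names only imitate the slides and the values are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "BirthRate": [0.020, np.nan, 0.035, 0.012, 0.041],
    "GDP": [1.2e9, 3.4e10, np.nan, np.nan, 5.0e11],
})

dtypes = df.dtypes                            # data type of each column
nulls_per_column = df.isnull().sum()          # missing values per feature
total_nulls = int(nulls_per_column.sum())     # missing values overall
n_duplicates = int(df.duplicated().sum())     # fully duplicated rows
skewness = df.select_dtypes("number").skew()  # sample skewness per numeric column

print(dtypes)
print(nulls_per_column)
print("total nulls:", total_nulls, "| duplicate rows:", n_duplicates)
print(skewness)
```

The boolean frame returned by `df.isnull()` is also the kind of input a null-value heatmap is usually drawn from.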
VISUALIZATION
Distribution plots for each column
• The 'BusinessTaxRate', 'EaseofBusiness', 'HealthExpGDP', 'HourstodoTax' and 'Population0to14' columns are approximately normally distributed, so their missing values are replaced by the mean.
• For the remaining columns, which are skewed, missing values are replaced by the median.
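The mean/median rule above could look like this in pandas. It is a sketch on invented numbers: 'BusinessTaxRate' is one of the columns named on the slide, while the 'GDP' values and the choice of which toy column counts as skewed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values).
df = pd.DataFrame({
    "BusinessTaxRate": [40.0, np.nan, 50.0, 45.0],  # treated as roughly normal
    "GDP": [1.0, 2.0, np.nan, 100.0],               # treated as skewed
})

mean_cols = ["BusinessTaxRate"]   # roughly symmetric columns: fill with the mean
median_cols = ["GDP"]             # skewed columns: fill with the median

# fillna with a Series fills each column with the value at its own label.
df[mean_cols] = df[mean_cols].fillna(df[mean_cols].mean())
df[median_cols] = df[median_cols].fillna(df[median_cols].median())

print(df)
```

The median is preferred for skewed columns because a few extreme values pull the mean away from the bulk of the data.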
OUTLIER DETECTION
• Some columns, such as "Population Total", "Tourism in bound" and "Tourism out bound", have a large number of outliers.
• Columns such as "Population Urban" and "Population 0 to 14" have fewer outliers.
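The slides do not say how outliers were identified. One common convention, assumed here purely for illustration, is the 1.5 × IQR rule that box plots use; `count_iqr_outliers` is a hypothetical helper name.

```python
import pandas as pd

def count_iqr_outliers(values: pd.Series, k: float = 1.5) -> int:
    """Count points lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return int(is_outlier.sum())

# Hypothetical values: one extreme observation among otherwise small numbers.
with_extreme = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])
without_extreme = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

print(count_iqr_outliers(with_extreme))
print(count_iqr_outliers(without_extreme))
```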
TOP 10 CO2 EMISSION COUNTRIES
• China and the United States are the highest CO2 emission countries.
• The Russian Federation, India, and Japan are medium CO2 emission countries.
• Germany, Canada, the UK, Korea, and Iran are the lowest CO2 emission countries among the top 10.
TOP 10 HIGHEST BIRTH RATE COUNTRIES
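A top-N ranking of this kind could be computed as below. How the project aggregated multiple observations per country is not stated; averaging per country is an assumption here, and the data are invented.

```python
import pandas as pd

# Toy data (hypothetical): two observations per country.
df = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C", "C"],
    "CO2Emissions": [10.0, 12.0, 50.0, 54.0, 1.0, 3.0],
})

# Average per country, then keep the N largest (N=2 here, 10 on the slide).
top_emitters = df.groupby("Country")["CO2Emissions"].mean().nlargest(2)
print(top_emitters)
```

The same pattern applies to the birth-rate ranking by swapping the column.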
Visualizing Relation between variables

• 'Population 0 to 14' and 'Birth Rate' have a strong relation (0.94).
• 'Population 15 to 64' and 'Birth Rate' are reported with a coefficient of 0.90, which by magnitude is also a strong relation rather than a weak one.
Scatter plot for CO2 emissions and energy usage
Scatter plots for GDP-related sources
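Pairwise relations like these are typically read off a Pearson correlation matrix. The sketch below uses invented numbers, so its coefficients (including their signs) say nothing about the real dataset; the column names imitate the slides.

```python
import pandas as pd

# Toy data (hypothetical values).
df = pd.DataFrame({
    "BirthRate":        [0.010, 0.015, 0.020, 0.030, 0.040],
    "Population0to14":  [0.15, 0.18, 0.25, 0.35, 0.45],
    "Population15to64": [0.68, 0.67, 0.63, 0.58, 0.52],
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

# Strength is judged by the absolute value; the sign gives the direction.
strong_pairs = corr.abs() > 0.8
print(strong_pairs)
```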
SCALING
• Scaling is a technique to bring the independent features in the data onto a common scale, so that no feature dominates distance calculations simply because of its units.
• Here we use Standard Scaler, which rescales each feature to zero mean and unit variance.
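A minimal sketch of standardization with scikit-learn's `StandardScaler`, which applies z = (x - mean) / standard deviation to each column. The input matrix is made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])

X_scaled = StandardScaler().fit_transform(X)

print(X_scaled)
print("column means:", X_scaled.mean(axis=0))
print("column std devs:", X_scaled.std(axis=0))
```

After scaling, both columns contribute equally to Euclidean distances, which matters for K-Means, hierarchical clustering, and DBSCAN alike.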

Feature Engineering
PCA Technique:
• The cumulative explained-variance graph shows how many components are needed to retain a chosen percentage of the variance.
• Here, 15 principal components are retained because together they explain more than 95% of the variance in the data.
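Choosing the number of components from cumulative explained variance might be sketched like this. The data are synthetic (10 correlated features generated from 3 latent factors), so the resulting component count is unrelated to the 15 reported above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 observed features driven by 3 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))
X_scaled = StandardScaler().fit_transform(X)

# Fit with all components, then find the smallest count reaching 95%.
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

X_pca = PCA(n_components=n_components).fit_transform(X_scaled)
print("components kept:", n_components)
print("variance explained:", cumulative[n_components - 1])
```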
Model Building
Hierarchical Clustering with scaled data
• Visualized with a dendrogram and a scatter plot.
• Calculated the silhouette score for the labels.
• Silhouette score on scaled data: 0.4069002367119094

Hierarchical Clustering with complete linkage
• Calculated the silhouette score for complete linkage.
• Silhouette score on scaled data: 0.44947005579625016

Agglomerative Clustering on PCA data
• Calculated the silhouette score for agglomerative clustering on PCA data.
• Silhouette score on PCA data: 0.4892175454522696
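Comparing linkage choices by silhouette score could be done along these lines. The slides do not name the linkage used in the first run; Ward linkage (scikit-learn's default) and three clusters are assumptions, and the data are synthetic blobs rather than the project dataset.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled data.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for linkage in ("ward", "complete"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X_scaled)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    scores[linkage] = silhouette_score(X_scaled, labels)

for linkage, score in scores.items():
    print(f"{linkage}: {score:.4f}")
```

Running the same loop on the PCA-transformed matrix instead of `X_scaled` gives the PCA variant.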
K-Means clustering on PCA Data

• Based on the elbow curve, 3 clusters were considered.
• K-Means with 4 clusters: silhouette score 0.38589504828738286
• K-Means (number of clusters not stated): silhouette score 0.4592576382823015

K-Means with PCA Data: 3 clusters

• Based on the elbow curve, 3 clusters were considered.
• K-Means with PCA data, 2 clusters: silhouette score 0.4349454775483713
• K-Means with PCA data, 4 clusters: silhouette score 0.3016847742272087
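An elbow curve and per-k silhouette scores can be produced together, as in this sketch on synthetic blobs. The range of k and the `n_init` and `random_state` settings are assumptions; the numbers it prints are unrelated to the scores reported above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the (scaled or PCA-transformed) data.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_                        # within-cluster sum of squares (elbow curve)
    silhouettes[k] = silhouette_score(X, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.4f}")
print("k with the highest silhouette:", best_k)
```

Inertia always falls as k grows, so the elbow is read as the point where the drop flattens; the silhouette score gives a second opinion that can peak at a different k.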
DBSCAN using original and PCA data
• Calculated neighbor distances using the nearest neighbors method.
• DBSCAN with original data: silhouette score 0.230808
• DBSCAN with PCA data: silhouette score 0.291135
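Nearest-neighbor distances are commonly used to pick DBSCAN's `eps`, usually by looking for a knee in the sorted k-distance curve. The sketch below assumes that is what the slide refers to. It replaces the visual knee with a crude 90th-percentile cut, uses `min_samples=5`, and runs on synthetic data; all three are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
X = StandardScaler().fit_transform(X)

min_samples = 5
# Querying the fitted points returns each point as its own first neighbor,
# so the last column is the distance to the (min_samples - 1)-th other point.
distances, _ = NearestNeighbors(n_neighbors=min_samples).fit(X).kneighbors(X)
k_distances = np.sort(distances[:, -1])
eps = float(np.percentile(k_distances, 90))  # stand-in for reading the knee off a plot

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)

# Label -1 marks noise; score only the clustered points, and only if
# at least two clusters were found.
clustered = labels != -1
n_clusters = len(set(labels[clustered]))
score = silhouette_score(X[clustered], labels[clustered]) if n_clusters >= 2 else None

print("eps:", round(eps, 3), "| clusters:", n_clusters, "| noise points:", int((~clustered).sum()))
print("silhouette:", score)
```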
Training and testing model accuracy using a random forest classifier
• The classifier reached an accuracy of 0.96.
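The slides do not state what the classifier was trained to predict. A common pattern after clustering, assumed here, is to use the cluster labels as targets so that new observations can be assigned to a cluster at deployment time. The sketch uses synthetic data and K-Means labels.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic features; cluster labels serve as the classification target (assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, cluster_labels, test_size=0.3, random_state=42, stratify=cluster_labels
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print("test accuracy:", accuracy)
```

Accuracy here measures how well the classifier reproduces the cluster assignments, not whether the clusters themselves are meaningful; the silhouette score covers that question.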
Deployment Model
Developed country
Developing Country
References
1. https://pandas.pydata.org/
2. https://numpy.org/doc/
3. https://matplotlib.org/
4. https://seaborn.pydata.org/
5. https://scikit-learn.org/
6. https://www.statsmodels.org/
Thank You
