Com747-Statistical Modelling and Data Mining: Data Model On Suicide Rates

The document analyzes global suicide rate data from 1985-2015 using techniques like data cleaning, exploration, visualization, and linear regression to build a predictive model. Key findings include: peak global suicide rate in 1995 of 15.3 per 100k declining 25% to 11.5 in 2015; rates highest in Europe but declining 40% since 1995; male suicide rates are 3.5x higher than females globally; and suicide rates increase with age but are declining for ages 15 and over. Analysis of trends by country found rates decreasing in 32 countries and increasing most sharply in South Korea and Guyana.


COM747-STATISTICAL MODELLING AND DATA MINING

DATA MODEL ON SUICIDE RATES

Sobin Siby
[email protected]
Introduction
• This work is based on the global suicide rate dataset compiled by
the World Health Organisation.
• The dataset contains the number of deaths, year, population,
gender and so on.
• I used data cleaning, exploratory data analysis (EDA), data
visualisation and linear regression to build a data model.
Data Cleaning
Data cleansing or data cleaning is the process of detecting and correcting
corrupt or inaccurate records in a record set, table, or database. It means
identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and
then replacing, modifying or removing them.
Tasks done:
• Seven countries with 3 or fewer years of data in total were removed.
• 2016 data was removed because few countries had any, and those that did
often had data missing.
• HDI was removed because about 2/3 of its values were missing.
• Continent was added to the dataset using the countrycode package.
• Note: Africa has very few countries providing suicide data.
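The cleaning steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for this work: the field names (country, year, hdi), the sample rows and the order of the steps are all assumptions, and the continent lookup is left out.

```python
from collections import defaultdict

def clean(rows):
    """Apply three of the cleaning steps to a list of row dicts (hypothetical schema)."""
    # Step 1: drop 2016, which few countries reported and often incompletely.
    rows = [r for r in rows if r["year"] != 2016]

    # Step 2: count distinct years per country and drop countries with 3 or fewer.
    years = defaultdict(set)
    for r in rows:
        years[r["country"]].add(r["year"])
    rows = [r for r in rows if len(years[r["country"]]) > 3]

    # Step 3: drop the HDI field, which is mostly missing.
    return [{k: v for k, v in r.items() if k != "hdi"} for r in rows]

# Made-up rows: country "A" has 2012-2016, country "B" has only 2014-2016.
sample = [{"country": "A", "year": y, "hdi": None} for y in range(2012, 2017)]
sample += [{"country": "B", "year": y, "hdi": 0.8} for y in range(2014, 2017)]

cleaned = clean(sample)
print(sorted({r["country"] for r in cleaned}))  # only "A" survives
```

Country "B" is dropped because only two years remain once 2016 is removed.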
Linear Regression
Linear regression is a linear approach to modelling the
relationship between a scalar response (or dependent variable) and one
or more explanatory variables (or independent variables).
Task Done:
I used linear regression in two places: to test whether richer countries
have a higher rate of suicide, and to fit a trend over time for each country,
using the p-value of each fit to find the countries with a significant trend.
Global Analysis
Looking at the data globally gives some insights. The graph below shows
the global suicide rate over time. The global average from 1985 to 2015
was 13.1 deaths (per 100k, per year).
Global Analysis – Cont.
The graph gives the following insights:
• Peak suicide rate was 15.3 deaths per 100k in 1995
• Decreased steadily, to 11.5 per 100k in 2015 (~25% decrease)
• Rates are only now returning to their pre-90’s rates
• Limited data in the 1980’s, so it’s hard to say if rate then was truly
representative of the global population
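The figures above are rates per 100k and a percentage change between two rates. A small sketch of that arithmetic is given here; the function names are hypothetical, and the death and population counts in the first call are made up purely to illustrate the unit conversion.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 people."""
    return deaths / population * 100_000

def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100

# Made-up counts, chosen only to show the conversion to a per-100k rate.
example_rate = rate_per_100k(131, 1_000_000)

# The peak (1995) and 2015 rates quoted in the list above.
drop = percent_change(15.3, 11.5)
print(f"{drop:.1f}%")  # -24.8%, i.e. the ~25% decrease
```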
By Continent
Plotting the data by continent gives some further findings. The graph
obtained is shown below.
By Continent – Cont.
The graph gives the following insights:
• European rate highest overall, but has steadily decreased ~40%
since 1995
• The European rate for 2015 similar to Asia & Oceania
• The trendline for Africa is due to poor data quality - just 3 countries
have provided data
• Oceania & Americas trends are more concerning
By sex
The graph below shows the data plotted by sex.
By sex-Cont.
We can conclude as follows:
• Globally, the rate of suicide has been ~3.5x higher for men than for women
• Both male & female suicide rates peaked in 1995, declining since
• This ratio of 3.5 : 1 (male : female) has remained relatively constant
since the mid 90’s
• However, during the 80’s this ratio was as low as 2.7 : 1 (male :
female)
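The male : female ratio quoted above is a ratio of two per-100k rates. A minimal sketch of how such a ratio could be computed is shown here; the function name and the counts are hypothetical and chosen only so that the result comes out at 3.5.

```python
def rate_ratio(male_deaths, male_pop, female_deaths, female_pop):
    """Ratio of the male suicide rate to the female suicide rate."""
    male_rate = male_deaths / male_pop * 100_000
    female_rate = female_deaths / female_pop * 100_000
    return male_rate / female_rate

# Made-up counts: 21 vs 6 deaths per 100k.
ratio = rate_ratio(2_100, 10_000_000, 600, 10_000_000)
print(f"{ratio:.1f} : 1")  # 3.5 : 1
```

Comparing rates rather than raw death counts matters when the male and female populations differ in size.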
By Age
The graph below shows the data plotted by age.
By Age-Cont.
Looking at age, we can conclude as follows:
• Globally, the likelihood of suicide increases with age
• Since 1995, suicide rate for everyone aged >= 15 has been linearly
decreasing
• The suicide rate of those aged 75+ has dropped by more than 50%
since 1990
• Suicide rate in the ‘5-14’ category remains roughly static and small
(< 1 per 100k per year)
By Country
The charts below show the countries ranked by suicide rate and a
geographical heat map of suicide rates over the timeframe of this
analysis.
By Country-Cont.
From the output we can draw the following insights:
• Lithuania’s rate has been highest by a large margin: > 41 suicides per
100k (per year)
• Large overrepresentation of European countries with high rates, few
with low rates
By Country (Linear Regression)
• Instead of visualizing all 93 countries' rates across time, I fit a
simple linear regression of suicide rate on year to each country's
data and kept the countries with a ‘year’ p-value of < 0.05.
• The output obtained is shown below.
By Country (Linear Regression)-Cont.
We can conclude as follows:
• For ~1/2 of all countries (48 of 93), the suicide rate is changing
linearly as time progresses.
• 32 (2/3) of these 48 countries are decreasing
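The per-country regression described above can be sketched as follows. This is an illustration under assumptions rather than the code used here: it fits ordinary least squares by hand, approximates the two-sided p-value of the slope by numerically integrating the Student t density, and runs on made-up rates for two hypothetical countries "A" and "B".

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the density over [0, |t|] (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def fit_trend(years, rates):
    """Least-squares slope of rate on year, with the p-value of that slope."""
    n = len(years)
    mx, my = sum(years) / n, sum(rates) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, rates))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, rates))
    if sse == 0:  # exact fit: no residual noise to test against
        return slope, (0.0 if slope else 1.0)
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, two_sided_p(slope / std_err, n - 2)

# Made-up rates per 100k: "A" falls by about 1 per year, "B" is flat with noise.
years = list(range(2000, 2010))
data = {
    "A": [20.2, 18.8, 18.2, 16.8, 16.2, 14.8, 14.2, 12.8, 12.2, 10.8],
    "B": [10.0, 10.4, 9.7, 10.2, 9.9, 10.3, 9.8, 10.1, 10.2, 9.9],
}

trends = {c: fit_trend(years, rates) for c, rates in data.items()}
# Keep only the countries whose 'year' p-value is below 0.05.
significant = {c: sp for c, sp in trends.items() if sp[1] < 0.05}
print(sorted(significant))  # ['A']
```

The slope is the change in rate per year, so a negative slope among the significant countries marks a decreasing trend.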
By Country (Linear Regression)-Cont.
The graph below shows the 12 countries with the steepest increasing
trends (p < 0.05).
By Country (Linear Regression)-Cont.
Hence, we can conclude as follows:
• South Korea shows the most concerning trend - an increase in
suicide of 0.931 people (per 100k, per year) - the steepest increase
globally
• Guyana is similar, at +0.925 people (per 100k, per year)
• Between 1998 and 1999 (5.3 to 24.8), Guyana’s rate increased by
~365%
By Country (Linear Regression)-Cont.
The graph below shows the 12 countries with the steepest decreasing
trends (p < 0.05).
By Country (Linear Regression)-Cont.
• Estonia shows the most positive trend - every year, ~1.31 fewer people
(per 100k) commit suicide - the steepest decrease globally
• Between 1995 and 2015, Estonia's rate drops from 43.8 to 15.7 per 100k
(per year) - a 64% decrease
• The Russian Federation trend is interesting: it only began to drop in
2002, and has decreased by ~50% since then.
Gender differences, by Continent
The graph below shows the gender differences by continent:
Gender differences, by Continent- Cont
Hence we can conclude as follows:
• European men were at the highest risk between 1985 - 2015, at ~ 30
suicides (per 100k, per year)
• Asia had the smallest overrepresentation of male suicide - the rate was
~2.5x as high for men
• Comparatively, Europe’s rate was ~3.9x as high for men
Gender differences, by Country
The following shows the gender disparity by country and continent:
Gender differences, by Country
The chart below shows the proportion of male and female suicides in each
country.
Hence, we can conclude as follows:
• The overrepresentation of men in suicide deaths appears to be universal,
and can be observed to differing extents in every country
• Whilst women are more likely to suffer from depression and suicidal
thoughts, men are more likely to die from suicide
Age differences, by Continent
The graph below shows the age differences by continent. We can
conclude as follows:
• For the Americas, Asia & Europe
(which make up most of the
dataset), suicide rate increases
with age
• Oceania & Africa’s rates are
highest for those aged 25 - 34
As a country gets richer, does its suicide rate decrease?

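• It depends on the country.
• For almost every country, there is a high correlation between year
and GDP per capita: as time goes on, GDP per capita increases
roughly linearly.
• The mean correlation was 0.878, indicating a very strong
positive linear relationship.
• Asking whether a country's suicide rate falls as it gets richer is
therefore much the same as asking whether its rate falls over time.
• Some countries' rates are increasing with time, most are decreasing.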
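The mean correlation above is the average of one year-versus-GDP correlation per country. A minimal sketch of that calculation is shown here; the Pearson coefficient is an assumption about which correlation was used, and the GDP per capita figures for the two hypothetical countries are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up GDP per capita by year for two hypothetical countries.
gdp_by_country = {
    "A": {2000: 10_000, 2001: 11_000, 2002: 12_000, 2003: 13_000},
    "B": {2000: 5_000, 2001: 5_200, 2002: 5_100, 2003: 5_600},
}

# One correlation per country, then the mean across countries.
corrs = {
    country: pearson(list(series.keys()), list(series.values()))
    for country, series in gdp_by_country.items()
}
mean_corr = sum(corrs.values()) / len(corrs)
print(f"mean correlation: {mean_corr:.3f}")
```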
Do richer countries have a higher rate of
suicide?
• Instead of looking at trends within countries,
here I take every country and calculate its
mean GDP (per capita) across all the years in
which data is available. I then measure how this
relates to the country's suicide rate across all
those years.
• The end result is one data point per country,
intended to give a general idea of the wealth of a
country and its suicide rate
• The correlation between GDP per capita and
suicide per 100k is shown below
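Reducing the data to one point per country, as described above, could look roughly like this. The field names and rows are hypothetical. Pooling deaths and population over all years to get the overall rate is an assumption about how the rate was aggregated, and the sketch assumes one row per country and year.

```python
from collections import defaultdict

def one_point_per_country(rows):
    """Return {country: (mean GDP per capita, suicides per 100k over all years)}."""
    acc = defaultdict(lambda: {"gdp": [], "deaths": 0, "pop": 0})
    for r in rows:
        a = acc[r["country"]]
        a["gdp"].append(r["gdp_per_capita"])
        a["deaths"] += r["suicides"]
        a["pop"] += r["population"]
    return {
        country: (sum(a["gdp"]) / len(a["gdp"]), a["deaths"] / a["pop"] * 100_000)
        for country, a in acc.items()
    }

# Made-up rows for two hypothetical countries.
rows = [
    {"country": "A", "year": 2000, "gdp_per_capita": 10_000,
     "suicides": 100, "population": 1_000_000},
    {"country": "A", "year": 2001, "gdp_per_capita": 12_000,
     "suicides": 140, "population": 1_000_000},
    {"country": "B", "year": 2000, "gdp_per_capita": 30_000,
     "suicides": 300, "population": 2_000_000},
]

points = one_point_per_country(rows)
for country, (gdp, rate) in sorted(points.items()):
    print(country, round(gdp), round(rate, 1))
```

Each resulting pair is one point in the scatter of GDP per capita against suicides per 100k.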
Do richer countries have a higher rate of
suicide? – Cont.
• The p-value of the model is 0.0288 <
0.05. This means we can reject the
hypothesis that a country's GDP
(per capita) has no association with
its rate of suicide (per 100k).
• There is a weak but significant
positive linear relationship - richer
countries are associated with higher
rates of suicide, but this is a weak
relationship which can be seen from
the graph below.
