Outlier Analysis

Uploaded by

gtoworld00

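import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("C:\\Users\\KIIT0001\\Desktop\\BDA_Documents\\StudentPerformance2.csv")

# ndf is used below but is never defined in the source;
# it is assumed here to be a working copy of df
ndf = df.copy()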

# Finding outliers using a box plot

import seaborn as sns

# Points drawn beyond the whiskers are candidate outliers
sns.boxplot(x=ndf["Math"])

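# Outlier removal

def outlier_removal(df, column, threshold):
    # Keep only the rows whose value is at or below the threshold
    removed = df[df[column] <= threshold]
    # Box plot of the remaining values
    sns.boxplot(x=removed[column])
    return removed

thresholdValue = 60
nooutlier = outlier_removal(ndf, "Math", thresholdValue)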
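
The cutoff of 60 above is picked by eye from the plot. As a rough sketch of an alternative, the fences that a box plot uses by default (1.5 times the interquartile range beyond the quartiles) can be computed directly. The helper name `iqr_bounds` and the sample marks are hypothetical and are not taken from StudentPerformance2.csv.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical marks, for illustration only
marks = pd.Series([55, 58, 60, 62, 65, 67, 70, 72, 75, 5])
lower, upper = iqr_bounds(marks)

# Values inside the fences are kept; values outside are flagged
inliers = marks[(marks >= lower) & (marks <= upper)]
outliers = marks[(marks < lower) | (marks > upper)]
print(lower, upper, outliers.tolist())
```

Deriving the fences from the data avoids hard-coding a threshold, although whether a flagged value is really an error still needs a look at the data itself.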

# Using a scatter plot

columns = ["Math", "Reading Score", "Placement Score"]
# Points lying far from the main cloud are candidate outliers
ax = df.plot.scatter(x="Math", y="Reading Score", c="red")

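# Removal of outliers

# np.where returns row positions of the rows with Math <= 60
outlierIndices = np.where(ndf["Math"] <= 60)
# Map positions to index labels before dropping, since drop() works on labels
ndf.drop(ndf.index[outlierIndices[0]], inplace=True)

# Plot after removal

ax = ndf.plot.scatter(x="Math", y="Reading Score", c="red")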
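
One caveat with the `np.where` approach: it returns row positions, while `DataFrame.drop` expects index labels. The two agree only while the frame still has its default 0..n-1 index. A minimal sketch with a hypothetical frame (the values and the name `scores` are made up) shows a boolean mask doing the same removal without that assumption:

```python
import pandas as pd

# Hypothetical data with a non-default index, e.g. left over from earlier filtering
scores = pd.DataFrame(
    {"Math": [45, 72, 88, 59, 91], "Reading Score": [50, 70, 85, 61, 90]},
    index=[10, 11, 12, 13, 14],
)

# Boolean mask: True for the rows to keep (Math above 60)
keep = scores["Math"] > 60
filtered = scores[keep]
print(filtered)
```

The mask also returns a new frame rather than modifying the original in place, which makes it easier to compare the data before and after removal.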

# Using the z-score

The z-score, also known as the standard score, is a statistical measure that describes a
data point's position relative to the mean of a group of data points. It is measured in
terms of standard deviations from the mean. The formula for calculating a z-score is:

z = (X − μ)/σ

where:

• X is the value of the data point.
• μ is the mean of the data set.
• σ is the standard deviation of the data set.

Interpretation of Z-Scores:
• A z-score of 0 indicates that the data point is exactly at the mean.
• A positive z-score indicates that the data point is above the mean.
• A negative z-score indicates that the data point is below the mean.
• The magnitude of the z-score indicates how many standard deviations the data point is from the mean.

Example:
Suppose you have a data set with a mean (μ) of 100 and a standard deviation (σ) of
15. If you want to find the z-score of a data point X=130:

z=(130−100)/15=30/15=2

This means the data point 130 is 2 standard deviations above the mean.
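
The same arithmetic can be checked in a few lines of Python. This is only an illustrative sketch; the function name `z_score` is hypothetical, and the second call (X = 85) is an extra value added to show a negative result.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

z = z_score(130, 100, 15)      # (130 - 100) / 15
below = z_score(85, 100, 15)   # (85 - 100) / 15

print(z)      # 2.0
print(below)  # -1.0
```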

Application of Z-Scores:
• Detecting outliers: data points whose z-score lies beyond ±3 are typically considered outliers; a stricter cutoff such as ±2 is also used.

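# Example

import numpy as np
from scipy import stats

# Absolute z-score of every Math value
z = np.abs(stats.zscore(df["Math"]))
print(z)

threshold_z = 2

# Row positions whose |z| exceeds the threshold
outlier_indices = np.where(z > threshold_z)[0]

# Drop every flagged row (not only the first) by mapping positions to index labels
no_outliers = df.drop(df.index[outlier_indices], axis=0)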
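
For reference, `stats.zscore` uses the population standard deviation (`ddof=0`) by default, whereas pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), so hand-computed z-scores can differ slightly from the SciPy ones. The sketch below reproduces the filter with NumPy alone on hypothetical marks; the array `marks` is illustrative and not taken from the CSV.

```python
import numpy as np

# Hypothetical marks, for illustration only
marks = np.array([62, 65, 58, 70, 66, 61, 64, 68, 63, 5], dtype=float)

# Population z-scores (ddof=0), matching the scipy.stats.zscore default
z = np.abs((marks - marks.mean()) / marks.std(ddof=0))

threshold_z = 2
mask = z > threshold_z        # True where the value is flagged
flagged = marks[mask]
no_outliers = marks[~mask]    # keep everything that is not flagged
print(flagged, len(no_outliers))
```

In this sample the low mark has |z| of about 2.95, so it is flagged at a cutoff of 2 but would pass at a cutoff of 3; on small data sets the choice of cutoff matters.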
