Outlier Analysis

Uploaded by

gtoworld00


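import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("C:\\Users\\KIIT0001\\Desktop\\BDA_Documents\\StudentPerformance2.csv")
# ndf is used below but never defined in these notes; a working copy of df is assumed here.
ndf = df.copy()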

# Finding outliers using a box plot

import seaborn as sns

sns.boxplot(x=ndf["Math"])
plt.show()
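The points a box plot draws individually are, with the default whisker setting of 1.5, the values lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. A minimal sketch of that rule follows; the scores are made up for illustration and are not taken from StudentPerformance2.csv, and the names `lower_fence` and `upper_fence` are hypothetical.

```python
import pandas as pd

# Made-up scores for illustration only; not from StudentPerformance2.csv.
scores = pd.Series([35, 40, 42, 45, 47, 50, 52, 55, 58, 95], name="Math")

q1 = scores.quantile(0.25)   # first quartile
q3 = scores.quantile(0.75)   # third quartile
iqr = q3 - q1                # interquartile range

# Values outside these fences are what the box plot marks as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

flagged = scores[(scores < lower_fence) | (scores > upper_fence)]
print(lower_fence, upper_fence)
print(flagged.tolist())
```

Computing the fences this way gives a cutoff derived from the data rather than one read off the plot by eye.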

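# Outlier removal

def outlier_removal(df, column, threshold):
    # Keep rows at or below the threshold; larger values are treated as outliers.
    kept = df[df[column] <= threshold]
    sns.boxplot(x=kept[column])  # box plot of the remaining values
    return kept

thresholdValue = 60
nooutlier = outlier_removal(ndf, "Math", thresholdValue)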

# Using a scatter plot

columns = ["Math", "Reading Score", "Placement Score"]
ax = df.plot.scatter(x="Math", y="Reading Score", c="red")

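# Removal of outliers

# Here the rows with Math <= 60 are the ones dropped.
# np.where returns positions, so map them to index labels before calling drop().
outlierIndices = np.where(ndf["Math"] <= 60)
ndf.drop(ndf.index[outlierIndices[0]], inplace=True)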
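One detail worth checking in the removal step: `np.where` yields row positions, while `DataFrame.drop` expects index labels. The two coincide only when the frame still has its default 0..n-1 index. The sketch below uses a small made-up frame (not the student data) whose index no longer starts at 0, as can happen after earlier filtering.

```python
import numpy as np
import pandas as pd

# Made-up frame with a non-default index, e.g. left behind by earlier filtering.
frame = pd.DataFrame({"Math": [55, 70, 40, 90]}, index=[10, 11, 12, 13])

positions = np.where(frame["Math"] <= 60)[0]  # row positions 0 and 2
labels = frame.index[positions]               # matching index labels 10 and 12

dropped = frame.drop(labels)                  # drop() works on labels
masked = frame[frame["Math"] > 60]            # same result with a boolean mask

print(dropped.index.tolist())
```

The boolean-mask form avoids the position-to-label translation entirely and does not modify the original frame.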

# Plot after removal

ax = ndf.plot.scatter(x="Math", y="Reading Score", c="red")

# Using z-scores

The z-score, also known as the standard score, is a statistical measure that describes a
data point's position relative to the mean of a group of data points. It is measured in
terms of standard deviations from the mean. The formula for calculating a z-score is:

z = (X − μ) / σ

where:

• X is the value of the data point.
• μ is the mean of the data set.
• σ is the standard deviation of the data set.

Interpretation of Z-Scores:
• A z-score of 0 indicates that the data point is exactly at the mean.
• A positive z-score indicates that the data point is above the mean.
• A negative z-score indicates that the data point is below the mean.
• The magnitude of the z-score indicates how many standard deviations the data point is from the mean.

Example:
Suppose you have a data set with a mean (μ) of 100 and a standard deviation (σ) of
15. If you want to find the z-score of a data point X=130:

z=(130−100)/15=30/15=2

This means the data point 130 is 2 standard deviations above the mean.
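The same arithmetic can be checked in a couple of lines. The helper name `z_score` is hypothetical and only wraps the formula given above.

```python
def z_score(x, mean, std):
    """Return how many standard deviations x lies from mean."""
    return (x - mean) / std

z = z_score(130, mean=100, std=15)
print(z)  # 2.0
```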

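Application of Z-Scores:
• Detecting outliers: data points with z-scores beyond ±3 are commonly treated as outliers, since about 99.7% of normally distributed values lie within three standard deviations of the mean. A lower cutoff such as ±2 flags more points.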

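# Example

import numpy as np
from scipy import stats

# Absolute z-score of every Math value
z = np.abs(stats.zscore(df["Math"]))
print(z)

threshold_z = 2

# Positions of the rows whose |z| exceeds the threshold
outlier_indices = np.where(z > threshold_z)[0]

# Drop every flagged row; positions are mapped to index labels for drop()
no_outliers = df.drop(df.index[outlier_indices], axis=0)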
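For comparison, here is a self-contained sketch of the same z-score filter written with a boolean mask, so no index bookkeeping is needed. The data is made up for illustration and the cutoff of 3 is one common choice, not a value prescribed by these notes. The z-scores are computed with pandas using `ddof=0`, which matches the default of `scipy.stats.zscore`; pandas' own `std()` defaults to `ddof=1`.

```python
import pandas as pd

# Made-up data: 19 scores near 50 plus one extreme value at the end.
data = pd.DataFrame({"Math": [48, 50, 52, 49, 51] * 4})
data.loc[19, "Math"] = 150

col = data["Math"]
# Population standard deviation (ddof=0), as scipy.stats.zscore uses by default.
z = (col - col.mean()) / col.std(ddof=0)

mask = z.abs() > 3        # True for rows flagged as outliers
outliers = data[mask]
cleaned = data[~mask]     # original frame is left untouched

print(outliers)
print(len(cleaned))
```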
