
DESIGN AND ANALYSIS OF ALGORITHM

Computer Science & Engineering (Computer Science Eng.)


Govt. Engineering College, Ajmer

(Session 2024-25)

SUBMITTED TO: Ms. Sakshi Jain

SUBMITTED BY: Suryakant Acharya
CB 2
23CS138D

Department of Computer Science & Engineering (CSE)


Govt. Engineering College, Ajmer
What is ML?
Machine Learning (ML) is the field of study that gives computers the
capability to learn without being explicitly programmed. It is one of the
most exciting technologies one could come across. As the name suggests,
it gives computers the ability that makes them more similar to humans:
the ability to learn.

What is Exploratory Data Analysis (EDA)?


Exploratory Data Analysis (EDA) is a crucial initial step in data science
projects. It refers to the method of studying and exploring datasets, by
analyzing and visualizing them, to understand their key characteristics,
uncover patterns, locate outliers, and identify relationships between
variables. EDA is normally carried out as a preliminary step before
undertaking more formal statistical analyses or modeling.
Key aspects of EDA include:
• Distribution of Data: Examining the distribution of data points to
understand their range, central tendencies (mean, median), and dispersion
(variance, standard deviation).
• Graphical Representations: Utilizing charts such as histograms, box plots,
scatter plots, and bar charts to visualize relationships within the data and
distributions of variables.
• Outlier Detection: Identifying unusual values that deviate from other data
points. Outliers can influence statistical analyses and might indicate data
entry errors or unique cases.
• Correlation Analysis: Checking the relationships between variables to
understand how they might affect each other. This includes computing
correlation coefficients and creating correlation matrices (a short sketch of
this check, together with outlier detection, appears after this list).
• Handling Missing Values: Detecting and deciding how to address missing
data points, whether by imputation or removal, depending on their impact
and the amount of missing data.
• Summary Statistics: Calculating key statistics that provide insight into data
trends and nuances.
• Testing Assumptions: Many statistical tests and models assume the data
meet certain conditions (like normality or homoscedasticity). EDA helps
verify these assumptions.
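The following is a minimal Python sketch of two of these checks, correlation
analysis and IQR-based outlier detection. It loads seaborn's bundled copy of
the Iris dataset as a stand-in for the actual file, and the column name
'petal_width' is illustrative rather than taken from the original notebook.

import seaborn as sns

# Stand-in load: seaborn ships a copy of the Iris dataset
df = sns.load_dataset('iris')

# Correlation matrix for the numeric columns
print(df.select_dtypes(include='number').corr())

# IQR-based outlier check for one feature (column name is illustrative)
q1 = df['petal_width'].quantile(0.25)
q3 = df['petal_width'].quantile(0.75)
iqr = q3 - q1
print(df[(df['petal_width'] < q1 - 1.5 * iqr) | (df['petal_width'] > q3 + 1.5 * iqr)])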
IMPLEMENTATION:

• Libraries such as “pandas” and “matplotlib” are imported to use their
built-in functions to work on the dataset.
• Using the drive.mount() function in Google Colab allows code in the
notebook to access files stored in Google Drive.
• The dataset is then read and printed, as sketched below.
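A minimal sketch of these loading steps is shown below; the Drive path and
file name ('/content/drive/MyDrive/Iris.csv') are illustrative assumptions,
not the actual paths used in the notebook.

import pandas as pd
from google.colab import drive

# Mount Google Drive so the notebook can access files stored there
drive.mount('/content/drive')

# Read the dataset from Drive (path and file name are hypothetical) and print it
df = pd.read_csv('/content/drive/MyDrive/Iris.csv')
print(df)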

• df.head(): this method returns the first 5 rows of the DataFrame by
default.
• df.shape: this attribute shows how many observations (rows) and features
(columns) there are in the dataset.
• df.info(): summarizes the data, showing the number of non-null records in
each column, each column’s data type, and the dataset’s memory usage. A
sketch of these calls follows.
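For example, assuming df is the DataFrame loaded in the sketch above:

# First 5 rows of the DataFrame
print(df.head())

# (rows, columns) of the dataset
print(df.shape)

# Non-null counts, column data types, and memory usage
df.info()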

• df.describe(): gives the count, mean, standard deviation, minimum,
quartiles, and maximum for each numerical column, briefly summarizing the
dataset’s central tendencies and spread.
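Continuing the same sketch with the df loaded earlier:

# Count, mean, std, min, quartiles, and max for each numerical column
print(df.describe())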

• df.columns.tolist() converts the column names of the DataFrame ‘df’ into
a Python list, providing a convenient way to access and manipulate column
names.
• df.isnull().sum() checks for missing values in each column of the
DataFrame ‘df’ and returns the number of null values for each column.
• df.nunique() determines how many unique values there are in each column
of the DataFrame ‘df’, offering information about the variety of data that
makes up each feature. These checks are sketched below.
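A sketch of these column-level checks, again using the df from the loading
sketch above:

# Column names as a Python list
print(df.columns.tolist())

# Number of missing values in each column
print(df.isnull().sum())

# Number of unique values in each column
print(df.nunique())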
• The count plot shows the number of observations for each species.
• The kernel density plots show the skewness of each feature. A feature with
skewness of exactly 0 has a symmetrical distribution, while a skewness of 1
or above indicates a positively (right) skewed distribution; in a right-skewed
distribution the tail extends further to the right, indicating some extremely
high values.
• The swarm plot of ‘Petal width’ against ‘Species’ shows where observations
concentrate: regions of higher point density indicate where the majority of
data points cluster, while isolated points far from the clusters are potential
outliers. A plotting sketch for these three figures follows.
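A minimal plotting sketch for the three figures described above, using seaborn
and matplotlib with the df loaded earlier. The column names 'species' and
'petal_width' follow seaborn's bundled Iris dataset and are assumptions; the
actual CSV may use different names (e.g. 'Species', 'PetalWidthCm').

import matplotlib.pyplot as plt
import seaborn as sns

# Count plot: number of observations per species
sns.countplot(x='species', data=df)
plt.show()

# Kernel density plot for each numeric feature, to inspect skewness
for col in df.select_dtypes(include='number').columns:
    sns.kdeplot(data=df, x=col)
    plt.show()

# Swarm plot of petal width per species; isolated points suggest outliers
sns.swarmplot(x='species', y='petal_width', data=df)
plt.show()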
