Data Science Lab Manual

The document outlines the objectives and exercises for a Data Science Laboratory course, focusing on Python libraries, statistical measures, and data visualization. It includes a list of exercises that cover installation and usage of libraries like NumPy, SciPy, and Pandas, as well as practical applications on various data sets. The course aims to equip students with skills in data analytics and visualization, culminating in a practical examination.

Uploaded by

c.muthupriya


CS3361 DATA SCIENCE LABORATORY    L T P C: 0 0 4 2

COURSE OBJECTIVES:
• To understand the Python libraries for data science.
• To understand the basic statistical and probability measures for data science.
• To learn descriptive analytics on the benchmark data sets.
• To apply correlation and regression analytics on standard data sets.
• To present and interpret data using visualization packages in Python.

LIST OF EXERCISES:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a) Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b) Bivariate analysis: Linear and logistic regression modeling
c) Multiple Regression analysis
d) Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a) Normal curves
b) Density and contour plots
c) Correlation and scatter plots
d) Histograms
e) Three dimensional plotting
7. Visualizing Geographic Data with Basemap

LIST OF EQUIPMENT: (30 Students per Batch)

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

TOTAL: 60 PERIODS
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Make use of the python libraries for data science
CO2: Make use of the basic Statistical and Probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets
CO5: Present and interpret data using visualization packages in Python
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
AVADI – I.A.F. MUTHAPUDUPET, CHENNAI – 600 055.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CS3361-DATA SCIENCE LABORATORY

NAME:

DEGREE:

BRANCH:

YEAR:

SEMESTER:
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
AVADI – I.A.F. MUTHAPUDUPET, CHENNAI – 600 055.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Bonafide Certificate

CS3361-DATA SCIENCE LABORATORY

Certified that this is a bonafide record of work done by _________________________ of II year in the III semester of B.E. Computer Science and Engineering in the CS3361 – Data Science Laboratory during the Academic Year 2022-2023.

Signature of Faculty-in charge Signature of HOD

Submitted for the Anna University Practical Examination held on_____________.

INTERNAL EXAMINER EXTERNAL EXAMINER

EX. NO.   DATE   NAME OF THE EXERCISE   PAGE NO.   FACULTY SIGN

1   Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages

2   Working with Numpy arrays

3   Working with Pandas data frames

4   Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the Iris data set.

5   Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
    a) Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
    b) Bivariate analysis: Linear and logistic regression modeling
    c) Multiple Regression analysis
    d) Also compare the results of the above analysis for the two data sets

6   Apply and explore various plotting functions on UCI data sets.
    a) Normal curves
    b) Density and contour plots
    c) Correlation and scatter plots
    d) Histograms
    e) Three dimensional plotting

7   Visualizing Geographic Data with Basemap
Ex. No: 1 Download, Install and Explore the Features of NumPy, SciPy, Jupyter, Statsmodels and Pandas Packages
Date:

AIM:
To install and explore the features of the NumPy, SciPy, Statsmodels, Jupyter and Pandas
packages in Python.
NUMPY
NumPy stands for Numerical Python and is a Python library used for working with
arrays. It provides an efficient interface to store and operate on dense data buffers.
Compared with built-in Python lists, NumPy arrays provide much more efficient storage
and data operations as the arrays grow larger in size. NumPy arrays form the core of
nearly the entire ecosystem of data science tools in Python. The library also has
functions for working in the domains of linear algebra, Fourier transforms and matrices.
The array object in NumPy is called ndarray.
INSTALL PIP

sudo apt install python-pip

INSTALL NUMPY

pip install numpy

CHECK IF NUMPY IS INSTALLED

pip show numpy

Name: numpy
Version: 1.16.6
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: None
License: BSD
Location: /home/cc1-48/local/lib/python2.7/site-packages

SCIPY
SciPy is an open-source library used for solving mathematical, scientific, engineering,
and technical problems. It allows users to manipulate the data and visualize the data
using a wide range of high-level Python commands. SciPy is built on the Python
NumPy extension.
INSTALL SCIPY

pip install scipy

CHECK IF SCIPY INSTALLED

pip show scipy

Name: scipy
Version: 1.2.3
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: numpy

STATSMODELS
Statsmodels is a Python library built specifically for statistics. It is built on top of
NumPy, SciPy and Matplotlib, and it adds more advanced functions for statistical testing
and modeling: descriptive statistics, statistical tests, result statistics and plotting
functions. Matplotlib is used to power its graphical functions.
INSTALL STATSMODELS

pip install statsmodels

CHECK IF STATSMODELS IS INSTALLED

pip show statsmodels

Name: statsmodels
Version: 0.11.0
Summary: Statistical computations and models for Python
Home-page: https://www.statsmodels.org/
Author: None
Author-email: None
License: BSD License
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: scipy, pandas, numpy, patsy
Required-by:

PANDAS
Pandas is a widely used Python library. It is used in multiple stages of data analytics,
from data manipulation to data analysis.
INSTALL PANDAS

pip install pandas

CHECK IF PANDAS IS INSTALLED

pip show pandas

Name: pandas
Version: 0.24.2
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: http://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: numpy, python-dateutil, pytz
Required-by: statsmodels

JUPYTER NOTEBOOK
The Jupyter Notebook is a powerful tool for interactively developing and presenting
data science projects. A notebook is a single document that can run code, display the
output, and hold explanations, formulas and charts, which makes the work more
transparent, understandable, repeatable and shareable. Jupyter Notebook combines code,
comments, multimedia and visualizations in one interactive document that can be
shared, re-used and re-worked. As Jupyter Notebook runs in a web browser, the
notebook can be hosted on the local machine or on a remote server.
INSTALL JUPYTER

pip install jupyter notebook

TO LAUNCH
jupyter notebook

CHECK IF JUPYTER INSTALLED

pip show jupyter

Name: jupyter
Version: 1.0.0
Summary: Jupyter metapackage. Install all the Jupyter components in one go.
Home-page: http://jupyter.org
Author: Jupyter Development Team
Author-email: [email protected]
License: BSD
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: qtconsole, ipykernel, ipywidgets, jupyter-console, nbconvert, notebook
Required-by:

RESULT:
The features of packages like NumPy, SciPy, statsmodels, jupyter notebook and Pandas
were explored.
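
The installation checks above can also be done from inside Python. The following is a minimal sketch, assuming Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_versions is made up for this illustration and is not part of any package.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


# The five packages explored in this exercise
report = installed_versions(["numpy", "scipy", "statsmodels", "pandas", "jupyter"])
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```

A package reported as "not installed" can be added with the matching pip install command shown earlier.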
Ex. No: 2 Working with numpy array
Date:

2.1. Create a NumPy ndarray object by using array() function.

Aim:
To create a NumPy ndarray object by using the array() function.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: print array
step 5: stop

Program:
import numpy as np
a=np.array([1,2,3,4,5])
print(a)

Output:

Result:
The program to create a NumPy ndarray object by using the array() function was executed
and the output verified.
2.2. Use tuples to create a numpy array.

Aim:
To use tuples to create a numpy array.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as tuple
step 4: print array
step 5: stop

program:
import numpy as np
a=np.array((1,2,3,4,5))
print(a)

output:

Result:
The program to use tuples to create a numpy array was executed and the output
verified.
2.3. Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6.

Aim:
To create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an 2-d array of values 1,2,3 & 4,5,6
step 4: print array
step 5: stop

Program:
import numpy as np
arr=np.array([[1,2,3],[4,5,6]])
print(arr)

Output:

Result:
The program to create a 2-D array was executed and the output verified.
2.4. Create a 3-D array.

Aim:
To create a 3-D array.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an 3-d array
step 4: print array
step 5: stop

Program:
import numpy as np
a=np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
print(a)

Output:

Result:
The program to create a 3-D array was executed and the output verified.
2.5. Displaying dimensions of array from 0 to 3.

Aim:
To display the dimensions of array from 0 to 3.

Algorithm:

step 1: start
step 2: import numpy module
step 3: declare arrays of dimensions from 0 to 3
step 4: print dimensions using ndim function
step 5: stop

program:
import numpy as np
a=np.array(42)
b=np.array([1,2,3,4,5])
c=np.array([[1,2,3],[4,5,6]])
d=np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

output:

Result:
The program to display dimensions of array was executed and the output verified.
2.6. Accessing array elements by indexing and adding it.

Aim:
To access array elements by indexing and adding it.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: access the elements at index 2 and index 3 (the 3rd and 4th elements)
step 5: print a[2]+a[3]
step 6: stop

Program:
import numpy as np
a=np.array([1,2,3,4])
print(a[2]+a[3])

Output:

Result:
The program to access array elements by indexing and adding it was executed and the
output verified.
2.7. Access the element on the 2nd row 5th column.

Aim:
To access the element on the 2nd row 5th column.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare a 2-d array
step 4: print 5th element on 2nd row by accessing the index of the array
step 5: stop

Program:
import numpy as np
a=np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("5th element on 2nd row.....",a[1,4])

output:

Result:
The program to access the element on the 2nd row 5th column was executed and the
output verified.
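
As a small extension of the indexing above, the sketch below shows that a negative index counts from the end and that a whole row or column can be selected at once. The variable names are chosen only for this illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Rows and columns are numbered from 0, so the 2nd row is index 1
last_of_second_row = a[1, -1]   # -1 means the last column, same element as a[1, 4]
second_row = a[1]               # the whole 2nd row
fifth_column = a[:, 4]          # the 5th column taken from every row

print(last_of_second_row)
print(second_row)
print(fifth_column)
```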
2.8. Slice elements from index 1 to 5.

Aim:
To slice elements from index 1 to 5.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: print the slice a[1:5], which covers index 1 up to, but not including, index 5
step 5: stop

program:
import numpy as np
a=np.array([1,2,3,4,5])
print(a[1:5])

output:

Result:
The program to slice elements from index 1 to 5 was executed and the output verified.
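
A slice written as start:stop includes the start index and excludes the stop index. The sketch below, using an array chosen only for illustration, shows this together with the optional step value.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

middle = a[1:5]         # indexes 1, 2, 3 and 4; index 5 is not included
every_other = a[1:5:2]  # same range, taking every second element

print(middle)
print(every_other)
```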
2.9. Slice elements from index 4 to the end of array.

Aim:
To slice elements from index 4 to the end of array.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: print slicing the array from 4
step 5: stop

program:
import numpy as np
a=np.array([1,2,3,4,5])
print(a[4:])

output:

Result:
The program to slice elements from index 4 to the end of array was executed and the
output verified.
2.10. Slice elements from index 3 from the end to index 1 from end.

Aim:
To slice elements from index 3 from the end to index 1 from end.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: print slicing the array by negative indexing from -3 to -1
step 5: stop

Program:
import numpy as np
a=np.array([1,2,3,4,5,6,7,8,9,10])
print(a[-3:-1])

Output:

Result:
The program to slice elements from index 3 from the end to index 1 from end was
executed and the output verified.
2.11. Print the shape of an array.

Aim:
To print the shape of an array.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare a 2-d array
step 4: print the shape of an array using arr.shape
step 5: stop

Program:
import numpy as np
a=np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(a.shape)

Output:

Result:
The program to print the shape of an array was executed and the output verified.
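
The shape can also be changed without changing the data. The following is a short sketch using reshape(); passing -1 asks NumPy to work out that dimension from the array size.

```python
import numpy as np

a = np.arange(1, 11)     # the values 1 to 10 in a 1-D array
b = a.reshape(2, 5)      # 2 rows and 5 columns
c = a.reshape(5, -1)     # 5 rows; NumPy infers 2 columns

print(a.shape)
print(b.shape)
print(c.shape)
```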
2.12. Iterate on the element of 1-D array.

Aim:
To iterate on the element of 1-D array.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: using a for loop, iterate over the elements of the array
step 5: print each element i
step 6: stop

program:
import numpy as np
a=np.array([1,2,3,4,5])
for i in a:
    print(i)

Output:

Result:
The program to iterate on the element of 1-D array was executed and the output verified.
2.13. Split the array in 3 parts.

Aim:
To split the array in 3 parts.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable newarr with array_split function to split the array into 3 parts
step 5: print newarr
step 6: stop

program:
import numpy as np
a=np.array([1,2,3,4,5,6])
newarr=np.array_split(a,3)
print(newarr)

Output:

Result:
The program to split the array in 3 parts was executed and the output verified.
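
array_split() also works when the array cannot be divided into equal parts, in which case the earlier parts receive the extra elements. The 7-element array below is chosen only to illustrate this.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

# 7 elements cannot be divided evenly by 3, so the first part gets the extra element
parts = np.array_split(a, 3)
sizes = [len(p) for p in parts]

for p in parts:
    print(p)
print(sizes)
```

np.split() with the same arguments would raise an error, because it requires an equal division.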
2.14. Find the indexes where the value is 4.

Aim:
To write a program to find the indexes where the value is 4.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: create a variable x using the function 'where' to find the indexes which have the value 4
step 5: print x
step 6: stop
program:
import numpy as np
a=np.array([1,2,3,4,5,4,4])
x=np.where(a==4)
print(x)

output:

Result:
The program to find the indexes where the value is 4 was executed and the output
verified.
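
np.where() with only a condition returns a tuple holding one index array per dimension, which is why the printed result is wrapped in parentheses. A minimal sketch of unpacking it:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 4, 4])

result = np.where(a == 4)   # a tuple with one index array for a 1-D input
indexes = result[0]         # the index array itself
count = indexes.size        # how many times the value 4 occurs

print(indexes)
print(count)
```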
2.15. Find the indexes where the value is even.

Aim:
To write a program to find the indexes where the value is even.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable x using the function 'where' find the indexes where the value is even
step 5: print x
step 6: stop

program:
import numpy as np
a=np.array([1,2,3,4,5,6,7,8])
x=np.where(a%2==0)
print(x)

output:

Result:
The program to find the indexes where the value is even was executed and the output
verified.
2.16. Find the indexes where the value is odd.

Aim:
To write a program to find the indexes where the value is odd.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable x using the function 'where' find the indexes where the value is odd
step 5: print x
step 6: stop

program:
import numpy as np
a=np.array([1,2,3,4,5,6,7,8])
x=np.where(a%2==1)
print(x)

Output:

Result:
The program to find the indexes where the value is odd was executed and the output verified.
2.17. Sort the array.

Aim:
To write a program to sort the given array.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: print the sorted array using sort function
step 5: stop

program:
import numpy as np
a=np.array([3,2,0,1])
print(np.sort(a))

Output:

Result:
The program to sort the given array was executed and the output verified.
2.18. Sort the array alphabetically.

Aim:
To write a program to sort the given array alphabetically.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a which consists of strings
step 4: print the sorted array using sort function
step 5: stop

program:
import numpy as np
a=np.array(['banana','cherry','apple'])
print(np.sort(a))

Output:

Result:
The program to sort the given array alphabetically was executed and the output verified.
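
np.sort() returns a sorted copy and leaves the original array unchanged. The sketch below also shows argsort(), which gives the indexes that would sort the array, and one way to get descending order.

```python
import numpy as np

a = np.array([3, 2, 0, 1])

sorted_copy = np.sort(a)        # ascending copy; a itself is not modified
order = np.argsort(a)           # indexes of a in ascending order of value
descending = np.sort(a)[::-1]   # reverse the sorted copy for descending order

print(sorted_copy)
print(order)
print(descending)
print(a)
```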
2.19. Create an array from the elements on index 0 and 2.

Aim:
To create an array from the elements on index 0 and 2.

Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create another array as x which has boolean values (True & False)
step 5: assign the variable newarr to a[x]
step 6: print newarr
step 7: stop

program:
import numpy as np
a=np.array([41,42,43,44])
x=[True,False,True,False]
newarr=a[x]
print(newarr)

Output:

Result:
The program to create an array from the elements on index 0 and 2 was executed and the
output verified.
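
Instead of typing the True and False values by hand, the boolean array is usually built from a condition on the array itself. A short sketch:

```python
import numpy as np

a = np.array([41, 42, 43, 44])

mask = a > 42            # compares every element and gives a boolean array
greater = a[mask]        # keeps only the elements where the mask is True
even = a[a % 2 == 0]     # the condition can also be written directly inside the brackets

print(mask)
print(greater)
print(even)
```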
Ex. No: 3 Working with pandas data frame
Date:

3(a). Create a data frame using a list of elements

Aim:
To write a python program to create a data frame using a list of elements.

Algorithm:
step 1: start
step 2: import pandas as pd
step 3: declare a list consisting of the elements A to G.
step 4: assign df to pd.DataFrame(lst)
step 5: print df
step 6: stop

Program:
import pandas as pd
lst=['A','B','C','D','E','F','G']
df=pd.DataFrame(lst)
print(df)

Output

Result:
The program to create a data frame using a list of elements was executed and the output was
verified.
3(b). Create a data frame using the dictionary.

Aim:
To write a python program to create a data frame using the dictionary.

Algorithm:

step 1: start
step 2: import pandas as pd
step 3: assign a variable data to a dictionary which consists of the keys name and age.
step 4: assign df to pd.DataFrame(data)
step 5: print df
step 6: stop

Program:
import pandas as pd
data={'name':['tom','nick','krish','jack'],'age':[20,21,19,18]}
df=pd.DataFrame(data)
print(df)

Output:

Result:
The program to create a data frame using the dictionary was executed and the output was
verified.
3(c). select a column from data frame

Aim:
To write a python program to select a column from data frame.

Algorithm:
step 1: start
step 2: import pandas as pd
step 3: assign a variable data to a dictionary which consists of the keys name, age,
address and qualification.
step 4: assign df to pd.DataFrame(data)
step 5: print df
step 6: print the name and qualification columns of df
step 7: stop

Program:
import pandas as pd
# every column needs the same number of values; 'MSc' is an assumed fourth qualification
data={'name':['jai','princy','gaurav','anuj'],'age':[27,24,22,32],
'address':['delhi','kanpur','allahabad','kannauj'],
'qualification':['MSc','MA','MCA','Phd']}
df=pd.DataFrame(data)
print(df)
print(df[['name','qualification']])

output:

Result:
The program to select a column from data frame was executed and the output was verified.
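
Selecting with a single column name gives a Series, while selecting with a list of names gives a data frame, even when the list has only one name. The sketch below reuses the name and age values from exercise 3(b).

```python
import pandas as pd

data = {'name': ['tom', 'nick', 'krish', 'jack'], 'age': [20, 21, 19, 18]}
df = pd.DataFrame(data)

ages = df['age']          # single brackets with one name: a Series
age_frame = df[['age']]   # a list of names inside the brackets: a DataFrame

print(type(ages).__name__)
print(type(age_frame).__name__)
print(age_frame.shape)
```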
3(d). Checking for missing values using isnull() and notnull()

Aim:
To write a python program to check for missing values using isnull() and notnull().

Algorithm:
step 1: start
step 2: import pandas as pd
step 3: import numpy as np
step 4: assign a variable dic to a dictionary consisting of the keys first, second and third.
step 5: assign df to pd.DataFrame(dic)
step 6: print(df.isnull())
step 7: print(df.notnull())
step 8: stop

program:

import pandas as pd
import numpy as np
dic={'first':[100,90,np.nan,95],'second':[30,45,56,np.nan],'third':[np.nan,40,80,98]}
df=pd.DataFrame(dic)
# True marks a missing (NaN) value
print(df.isnull())
# True marks a value that is present
print(df.notnull())

output:

Result:
The program to check for missing values using isnull() and notnull() was executed.
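
Once the missing values are located they are usually counted and then dropped or filled. The following is a minimal sketch on the same dictionary; filling with the column mean is only one possible choice and is used here as an assumption for illustration.

```python
import numpy as np
import pandas as pd

dic = {'first': [100, 90, np.nan, 95],
       'second': [30, 45, 56, np.nan],
       'third': [np.nan, 40, 80, 98]}
df = pd.DataFrame(dic)

missing_per_column = df.isnull().sum()   # number of NaN values in each column
complete_rows = df.dropna()              # keep only the rows without any NaN
filled = df.fillna(df.mean())            # replace each NaN with its column mean

print(missing_per_column)
print(complete_rows)
print(filled)
```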
Ex. No: 4 Reading Data from Text Files, Excel and the Web and Exploring Various Commands for Doing Descriptive Analytics on the Iris Data Set
Date:

Aim
To read data from text files, Excel and the web, and to explore various commands for
doing descriptive analytics on the Iris data set.
READING DATA FROM EXCEL FILE AND EXPLORING DESCRIPTIVE
ANALYTICS ON THE IRIS DATA SET
Algorithm
Step 1: Start
Step 2: Import pandas as pd
Step 3: Use the read_csv() function to read the csv file
Step 4: Print the df with to_string()
Step 5: Print the descriptive statistics with describe()
Step 6: Stop
Requirement:
The openpyxl module is needed to write data to an xlsx file (pip install openpyxl).
pandas may also show a user warning that it requires version 0.98 or newer of
"xlsxwriter" (version 0.96 currently installed).

Program:
Read data from a csv file
import pandas as pd
# reading the csv file; change the path to match your system
df=pd.read_csv("/home/cc1-43/ex4/Iris.csv")
# print data from csv
print(df.to_string())
# print summary statistics of the numeric columns
print(df.describe())
Output:
READING DATA FROM TEXT FILE
Algorithm:
Step 1: Start
Step 2: Importing pandas as pd
Step 3: Using read_csv() function to read file
Step 4: Printing the df
Step 5: Stop
Requirement:
The text file "texttest.txt" must be present in the file location.

Program
# read text files with pandas using read_csv()
# importing pandas
import pandas as pd
# read text file into pandas data frame
df=pd.read_csv("texttest.txt",sep=" ")
# display data frame
print(df)

Output:

READING DATA FROM WEB

Algorithm
Step 1: Start
Step 2: Import pandas as pd
Step 3: Inserting URL
Step 4: Reading the URL content
Step 5: Printing Version, Release date
Step 6: Stop
Requirement
# requires: pip install lxml html5lib beautifulsoup4
Program
import pandas as pd
url='https://en.wikipedia.org/wiki/History_of_Python'
dfs=pd.read_html(url)
print(len(dfs))
#=================STEP 1===========================
print(dfs[0])
# column names are case sensitive
print(dfs[0]['Version'])
print(dfs[0]['Release date'])
#==================================================
#=================STEP 2===========================
df=dfs[0]
df2=df[['Version','Release date']]
print(df2)
#=================STEP 3===========================
# write the data to a file
# needs the openpyxl module: pip install openpyxl
#df2.to_excel('python.xlsx')

print(df2.describe())

Output:
Result
The program to read data from text files, Excel and the web and to explore various
commands for doing descriptive analytics on the Iris data set was successfully executed.
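
When the Iris.csv file is not at hand, the same descriptive commands can be tried on the copy of the Iris data set that ships with scikit-learn. This is a sketch under that assumption (scikit-learn 0.23 or newer for as_frame=True); the column names are the ones scikit-learn uses and may differ from those in a downloaded csv file.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as a pandas data frame (4 measurements + target class)
iris = load_iris(as_frame=True)
df = iris.frame

print(df.shape)                      # rows and columns
print(df.head())                     # first five rows
print(df.describe())                 # count, mean, std, min, quartiles and max
print(df.groupby("target").mean())   # mean of each measurement per species code
print(df["target"].value_counts())   # number of samples per species code
```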
Ex. No: 5 Use the diabetes data set from UCI and the Pima Indians Diabetes data set
Date:

5 a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.

Aim
To analyse the Pima Indians Diabetes data set with univariate measures such as frequency,
mean, median, mode, variance, standard deviation, skewness and kurtosis.

Algorithm
Step 1: Start
Step 2: importing pandas as pd
Step 3: importing numpy as np
Step 4: importing statistics as st
Step 5: reading the csv file using the read_csv function
Step 6: printing the information, shape, mean, median, mode, standard deviation, variance,
skewness and kurtosis of the data set
Step 7: importing matplotlib.pyplot as plt
Step 8: representing the data in histograms
Step 9: Stop

Program
import pandas as pd
import numpy as np
import statistics as st
df=pd.read_csv("pima.csv")
print(df.shape)
print(df.info())
print('MEAN:\n',df.mean())
print('MEDIAN:\n:',df.median())
print('MODE:\n:',df.mode())
print('STANDARD DEVIATION:\n:',df.std())
print('VARIANCE:\n:',df.var())
print('SKEWNESS:\n:',df.skew())
print('KURTOSIS:\n:',df.kurtosis())
df.describe()
Data_X=df.copy(deep=True)
Data_X=Data_X.drop(['Outcome'],axis=1)
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize']=[40,40]
Data_X.hist(bins=40)
Output
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column   Non-Null Count   Dtype
---  ------   --------------   -----
 0   Pregnancies No. Of pregnancies   768 non-null   int64
 1   Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test   768 non-null   int64
 2   Diastolic blood pressure (mm Hg)   768 non-null   int64
 3   SkinThickness\nTriceps skin fold thickness (mm)   768 non-null   int64
 4   Insulin\n2-Hour serum insulin (mu U/ml)   768 non-null   int64
 5   Body mass index (weight in kg/(height in m)^2)   768 non-null   float64
 6   DiabetesPedigreeFunction   768 non-null   float64
 7   Age   768 non-null   int64
 8   Outcome   768 non-null   int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
MEAN:
Pregnancies No. Of pregnancies
3.845052
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test
120.894531
Diastolic blood pressure (mm Hg)
69.105469
SkinThickness\nTriceps skin fold thickness (mm)
20.536458
Insulin\n2-Hour serum insulin (mu U/ml)
79.799479
Body mass index (weight in kg/(height in m)^2)
31.992578
DiabetesPedigreeFunction
0.471876
Age
33.240885
Outcome
0.348958
dtype: float64
MEDIAN:
: Pregnancies No. Of pregnancies
3.0000
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test
117.0000
Diastolic blood pressure (mm Hg)
72.0000
SkinThickness\nTriceps skin fold thickness (mm)
23.0000
Insulin\n2-Hour serum insulin (mu U/ml)
30.5000
Body mass index (weight in kg/(height in m)^2)
32.0000
DiabetesPedigreeFunction
0.3725
Age
29.0000
Outcome
0.0000
dtype: float64
MODE:
: Pregnancies No. Of pregnancies \
0 1.0
1 NaN
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test \
0 99
1 100
Diastolic blood pressure (mm Hg) \
0 70.0
1 NaN
SkinThickness\nTriceps skin fold thickness (mm) \
0 0.0
1 NaN
Insulin\n2-Hour serum insulin (mu U/ml) \
0 0.0
1 NaN
Body mass index (weight in kg/(height in m)^2) DiabetesPedigreeFunction \
0 32.0 0.254
1 NaN 0.258
Age Outcome
0 22.0 0.0
1 NaN NaN
STANDARD DEVIATION:
: Pregnancies No. Of pregnancies
3.369578
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test
31.972618
Diastolic blood pressure (mm Hg)
19.355807
SkinThickness\nTriceps skin fold thickness (mm)
15.952218
Insulin\n2-Hour serum insulin (mu U/ml)
115.244002
Body mass index (weight in kg/(height in m)^2)
7.884160
DiabetesPedigreeFunction
0.331329
Age
11.760232
Outcome
0.476951
dtype: float64
VARIANCE:
: Pregnancies No. Of pregnancies
11.354056
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test
1022.248314
Diastolic blood pressure (mm Hg)
374.647271
SkinThickness\nTriceps skin fold thickness (mm)
254.473245
Insulin\n2-Hour serum insulin (mu U/ml)
13281.180078
Body mass index (weight in kg/(height in m)^2)
62.159984
DiabetesPedigreeFunction
0.109779
Age
138.303046
Outcome
0.227483
dtype: float64
SKEWNESS:
: Pregnancies No. Of pregnancies
0.901674
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test
0.173754
Diastolic blood pressure (mm Hg)
-1.843608
SkinThickness\nTriceps skin fold thickness (mm)
0.109372
Insulin\n2-Hour serum insulin (mu U/ml)
2.272251
Body mass index (weight in kg/(height in m)^2)
-0.428982
DiabetesPedigreeFunction
1.919911
Age
1.129597
Outcome
0.635017
dtype: float64
KURTOSIS:
: Pregnancies No. Of pregnancies
0.159220
Glucose\nPlasma glucose concentration a 2 hours in an oral glucose tolerance test
0.640780
Diastolic blood pressure (mm Hg)
5.180157
SkinThickness\nTriceps skin fold thickness (mm)
-0.520072
Insulin\n2-Hour serum insulin (mu U/ml)
7.214260
Body mass index (weight in kg/(height in m)^2)
3.290443
DiabetesPedigreeFunction
5.594954
Age
0.643159
Outcome
-1.600930
dtype: float64
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9e4bdbe0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9dfb26a0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9e05ad90>],
[<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9df19490>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9df22e20>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9da9b5e0>],
[<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9da9b9a0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9c1dccd0>,
<matplotlib.axes._subplots.AxesSubplot object at 0x7f4f9c1453a0>]],
dtype=object)

Result
The program for univariate analysis (Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis) was successfully executed.
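
The frequency of each value in a single column can be obtained with value_counts(). The sketch below uses a small made-up series (the values are illustrative, not taken from the Pima data) so that the measures are easy to check by hand.

```python
import pandas as pd

# Made-up values: mostly small numbers with one large value on the right
s = pd.Series([1, 2, 2, 3, 3, 3, 10], name="value")

frequency = s.value_counts().sort_index()   # how often each value occurs
mode_values = s.mode().tolist()             # the most frequent value(s)
median_value = s.median()
skewness = s.skew()        # positive when the long tail is on the right
kurtosis = s.kurtosis()    # excess kurtosis, 0 for a normal distribution

print(frequency)
print(mode_values, median_value)
print(skewness, kurtosis)
```

Because the single large value pulls the mean above the median, the skewness of this series comes out positive.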
5 b. Bivariate analysis: Linear regression modeling

Aim
To perform bivariate analysis with linear regression modeling on the diabetes data set.

Algorithm
Step 1: Start
Step 2: Import matplotlib.pyplot as plt
Step 3: Import numpy as np
Step 4: Import pandas as pd
Step 5: Load the diabetes data set using the sklearn module
Step 6: Split the data into training and test sets and fit the linear regression model
Step 7: Calculate the cross-validated rmse, the r^2 score, the mse and the rmse
Step 8: Print the rmse, the weights and the intercept
Step 9: Stop

Program
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets,linear_model
from sklearn.metrics import mean_squared_error
diabetes=datasets.load_diabetes()
diabetes.keys()
df=pd.DataFrame(diabetes['data'],columns=diabetes['feature_names'])
x=df
y=diabetes['target']
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=101)
from sklearn import linear_model
model=linear_model.LinearRegression()
model.fit(x_train,y_train)
y_pre=model.predict(x_test)
from sklearn.model_selection import cross_val_score
scores=cross_val_score(model,x,y,scoring="neg_mean_squared_error",cv=10)
rmse_scores=np.sqrt(-scores).mean()
print('Cross validation:',rmse_scores)
from sklearn.metrics import r2_score
print('r^2:',r2_score(y_test,y_pre))
mse=mean_squared_error(y_test,y_pre)
rmse=np.sqrt(mse)
print('RMSE:',rmse)
print("Weights:",model.coef_)
print("\nIntercept",model.intercept_)
Output
Cross validation: 54.40461553640237
r^2: 0.45767674177195583
RMSE: 58.009275047551995
Weights: [ -8.02566358 -308.83945001 583.63074324 299.9976184 -360.68940198
95.14235214 -93.03306818 118.15005596 662.12887711 26.07401648]
Intercept 153.72029738615726
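
The program above fits all ten features at once. For a strictly bivariate view (one predictor and one response), the model can be fitted on a single column. The sketch below assumes scikit-learn 0.23 or newer and picks the bmi column purely as an example.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# Load the diabetes data as pandas objects
diabetes = datasets.load_diabetes(as_frame=True)
x = diabetes.data[["bmi"]]   # one predictor, kept 2-D as scikit-learn expects
y = diabetes.target          # disease progression measure

model = LinearRegression().fit(x, y)
slope = model.coef_[0]
intercept = model.intercept_
r2 = model.score(x, y)       # r^2 on the data used for fitting

print("Slope:", slope)
print("Intercept:", intercept)
print("r^2:", r2)
```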

5 b. Bivariate analysis: Logistic regression modeling

Aim
To perform bivariate analysis with logistic regression modeling on the diabetes data set.

Program
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets#,linear_model
from sklearn.metrics import mean_squared_error
diabetes = datasets.load_diabetes()
diabetes.keys()
df=pd.DataFrame(diabetes['data'],columns=diabetes['feature_names'])
x=df
y=diabetes['target']
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=101)
# import the logistic regression model from sklearn
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
model.fit(x_train,y_train)
y_pre= model.predict(x_test)
from sklearn.metrics import r2_score
print('r^2:',r2_score(y_test,y_pre))
mse=mean_squared_error(y_test,y_pre)
rmse=np.sqrt(mse)
print('RMSE:',rmse)
Output
r^2: -0.44401265478624397
RMSE: 94.65723681369009

Result
The program for bivariate analysis with linear regression and logistic regression was
successfully executed.
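
Logistic regression is a classification method, so it is normally applied to a categorical target and judged by accuracy rather than r^2. The target of load_diabetes() is continuous, which may explain the negative r^2 above. The following is only a sketch of one alternative: the target is turned into two classes by comparing it with its median, a threshold chosen here purely as an assumption for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

x, y = datasets.load_diabetes(return_X_y=True)

# Assumed rule for illustration: class 1 if the value is above the median, else class 0
y_binary = (y > np.median(y)).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y_binary, test_size=0.3, random_state=101)

clf = LogisticRegression()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

accuracy = accuracy_score(y_test, y_pred)   # fraction of correct class predictions
print("Accuracy:", accuracy)
```

The Pima Indians Diabetes data set already has a 0/1 Outcome column, so it can be used with logistic regression without this extra step.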
5 c. Multiple regression analysis

Aim
To perform multiple regression analysis on the Pima Indians Diabetes data set.

Algorithm
step 1: Start
step 2: import matplotlib.pyplot as plt
step 3: import numpy as np
step 4: import sklearn
step 5: import pandas as pd
step 6: read the csv file
step 7: select the predictor and target columns from the data set
step 8: split the data, fit the linear regression model and predict on the test set
step 9: calculate the r^2 score, the mse and the rmse
step 10: print the coefficients, the r^2 score and the rmse
step 11: Stop

Program
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model, metrics
df=pd.read_csv('pima-indians-diabetes.csv')
# predictor columns and the target column
data=df[['Age','Glucose','Bmi','Blood pressure','Pregnancies']]
target = df[['Outcome']]
print(data)
print(target)
X=data
Y=target
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train ,Y_test = train_test_split(X,Y,test_size = 0.3,random_state=101)
# fit the model on the training set and predict on the test set
reg=linear_model.LinearRegression()
reg.fit(X_train ,Y_train )
Y_predict=reg.predict(X_test)
print('Coeffients',reg.coef_)
print('Variance Scores : {}'.format(reg.score(X_test,Y_test)))
from sklearn.metrics import r2_score
print('r^2 :',r2_score(Y_test,Y_predict))
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(Y_test,Y_predict)
rmse=np.sqrt(mse)
print('RMSE:',rmse)

Output
     Age  Glucose   Bmi  Blood pressure  Pregnancies
0     50      148  33.6              72            6
1     31       85  26.6              66            1
2     32      183  23.3              64            8
3     21       89  28.1              66            1
4     33      137  43.1              40            0
..   ...      ...   ...             ...          ...
763   63      101  32.9              76           10
764   27      122  36.8              70            2
765   30      121  26.2              72            5
766   47      126  30.1              60            1
767   23       93  30.4              70            1

[768 rows x 5 columns]
     Outcome
0          1
1          0
2          1
3          0
4          1
..       ...
763        0
764        0
765        0
766        1
767        0

[768 rows x 1 columns]
Coeffients [[ 0.00362921 0.0057603 0.01359201 -0.0022797 0.01903324]]
Variance Scores : 0.3119613858813981
r^2 : 0.3119613858813981
RMSE: 0.3958061749043919

Result:
The program for multiple regression analysis was executed successfully.
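
Multiple regression fits one coefficient per predictor plus an intercept by least squares. As a rough illustration of the kind of fit LinearRegression produces, the sketch below solves a least-squares problem with np.linalg.lstsq on synthetic data. The data, the true coefficients and all variable names are invented for the example and are not the Pima data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors and a target built from known coefficients plus noise
X = rng.normal(size=(n, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = true_intercept + X @ true_coef + rng.normal(scale=0.1, size=n)

# Add a column of ones so the first fitted value is the intercept
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = beta[0], beta[1:]

# r^2 of the fitted values on the same data
y_fit = A @ beta
r2 = 1.0 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

print('Intercept:', intercept)
print('Coefficients:', coef)
print('r^2:', r2)
```

Each coefficient is the expected change in the target for a one-unit change in that predictor while the other predictors are held fixed.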
Ex. No: 6 Apply and Explore Various Plotting Functions on UCI Data Sets
Date:

6.a. Normal curves

Aim
To apply and explore various plotting functions on UCI data sets for normal curves

Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib.pyplot as plt
Step 4: import scipy.stats
Step 5: import statistics
Step 6: creating np array with x, y dimensions
Step 7: using matplotlib showing the graph
Step 8: stop

Program
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
x_axis = np.arange(-20, 20, 0.01)
# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()

Output
Program
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
# Creating the distribution
data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )
#Visualizing the distribution
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')

OUTPUT:
Text(0, 0.5, 'Probability Density')

Result

The program to apply and explore various plotting functions on UCI data sets for
normal curves was executed and the output was obtained.
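
The curve drawn by norm.pdf is the normal density exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)). The sketch below evaluates that formula directly with NumPy for a mean of 5.3 and a standard deviation of 1 on a grid from 1 to 10. The helper name normal_pdf is hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal probability density evaluated from its formula."""
    z = (np.asarray(x, dtype=float) - mean) / sd
    return np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

step = 0.01
x = np.arange(1, 10, step)
pdf = normal_pdf(x, mean=5.3, sd=1.0)

# A density should enclose an area close to 1 (simple Riemann sum)
area = np.sum(pdf) * step
# The curve peaks at the mean
peak_x = x[np.argmax(pdf)]

print('Area under the curve:', area)
print('Peak location:', peak_x)
print('Peak height:', pdf.max())
```

The area falls very slightly short of 1 because the grid stops a few standard deviations away from the mean on each side.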
6.b. Density and contour plots

Aim
To apply and explore various plotting functions on UCI data sets for density and contour
plots

Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib.pyplot as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop

Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');

Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, 20, cmap='RdGy');

Output

Result
The program to apply and explore various plotting functions on UCI data sets for
density and contour plots was executed and the output was obtained.
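
The contour programs above evaluate f on a grid built by np.meshgrid. The short sketch below only inspects that grid, without plotting, to show how the 50 x values and 40 y values become two-dimensional arrays. The index values i and j are arbitrary choices for the check.

```python
import numpy as np

def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

x = np.linspace(0, 5, 50)   # 50 points along the x axis
y = np.linspace(0, 5, 40)   # 40 points along the y axis
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Rows follow y and columns follow x, so every array is 40 x 50
print(X.shape, Y.shape, Z.shape)

# Z[i, j] is the function value at the point (x[j], y[i])
i, j = 7, 12
print(Z[i, j], f(x[j], y[i]))
```

This row and column layout is the one plt.contour expects when it is given X, Y and Z together.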
6.c. Correlation and scatter plots
Aim

To apply and explore various plotting functions on UCI data sets for correlation and scatter
plots

Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib.pyplot as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop

Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
x = np.linspace(0,10,30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
Output

Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker, label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);
Output

Program
%matplotlib inline
import matplotlib.pyplot as plt
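plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
# Same x and y as in the first program of this exercise
x = np.linspace(0, 10, 30)
y = np.sin(x)
plt.plot(x, y, '-ok');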

Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in Matplotlib 3.6 and later
import numpy as np
rng=np.random.RandomState(0)
x=rng.randn(100)
y=rng.randn(100)
colors=rng.rand(100)
sizes=1000 * rng.rand(100)
plt.scatter(x,y,c=colors,s=sizes,alpha=0.3,cmap='viridis')
plt.colorbar();#show color scale

Output

Result
The program to apply and explore various plotting functions on UCI data sets for
correlation and scatter plots was executed and the output was obtained.
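
A scatter plot shows a relationship visually, and the Pearson correlation coefficient puts a number on it. The sketch below computes that coefficient with np.corrcoef for one related pair and one unrelated pair of random samples. The variable names and the 2.0 and 0.5 factors are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)

# y_related depends on x plus some noise; y_unrelated is drawn independently
y_related = 2.0 * x + 0.5 * rng.randn(100)
y_unrelated = rng.randn(100)

# np.corrcoef returns a 2 x 2 matrix; the off-diagonal entry is Pearson's r
r_related = np.corrcoef(x, y_related)[0, 1]
r_unrelated = np.corrcoef(x, y_unrelated)[0, 1]

# The same value from the definition: covariance over the product of the spreads
dx = x - x.mean()
dy = y_related - y_related.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print('r (related):', r_related)
print('r (unrelated):', r_unrelated)
print('r (from the definition):', r_manual)
```

Values of r near 1 or -1 correspond to points lying close to a straight line in the scatter plot, and values near 0 correspond to a cloud with no linear trend.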
6.d. Histograms
Aim
To apply and explore various plotting functions on UCI data sets for histograms

Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib.pyplot as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop

Program
import numpy as np
import matplotlib.pyplot as plt
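plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
data = np.random.randn(1000)
plt.hist(data);
# density=True replaces the normed=True argument, which newer Matplotlib releases no longer accept
plt.hist(data, bins=30, density=True, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none');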
Output
Two-dimensional histogram and binnings
Program
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')  # named 'seaborn-v0_8-white' in Matplotlib 3.6 and later
mean =[0,0]
cov=[[1,1],[1,2]]
x,y=np.random.multivariate_normal(mean,cov,10000).T
plt.hist2d(x,y,bins=30,cmap='Blues')
cb=plt.colorbar()
cb.set_label('counts in bin')
plt.hexbin(x,y,gridsize=30,cmap='Blues')
cb=plt.colorbar(label='count in bin')
Output

Result
The program to apply and explore various plotting functions on UCI data sets for
histograms was executed and the output was obtained.
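
The numbers behind a histogram can be inspected without drawing it. The sketch below uses np.histogram on 1000 random values to show the difference between raw counts and a normalized histogram whose bars enclose a total area of 1. The seed and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# Raw counts per bin and the 31 edges that delimit the 30 bins
counts, edges = np.histogram(data, bins=30)

# Normalized heights: count / (number of samples * bin width)
density, _ = np.histogram(data, bins=30, density=True)
widths = np.diff(edges)

print('Total count:', counts.sum())
print('Area of the normalized histogram:', np.sum(density * widths))
```

Normalizing this way makes histograms of samples with different sizes directly comparable on the same axes.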
6.e. Three-dimensional plotting
Aim

To apply and explore three-dimensional plotting functions on UCI data sets

Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib.pyplot as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop

Program
from mpl_toolkits import mplot3d
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig=plt.figure()
ax=plt.axes(projection='3d')
Output

Program

from mpl_toolkits import mplot3d

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
ax=plt.axes(projection='3d')
zline = np.linspace(0,15,1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline,yline,zline,'gray');
zdata = 15 * np.random.random(100)
xdata = np.sin(zdata) + 0.1 * np.random.random(100)
ydata = np.cos(zdata) + 0.1 * np.random.random(100)
ax.scatter3D(xdata,ydata,zdata, c=zdata, cmap='Greens');

Output

Program

from mpl_toolkits import mplot3d

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
def f(x, y):
    return np.sin(np.sqrt(x ** 2 + y ** 2))
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');

Output

Program

from mpl_toolkits import mplot3d

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Reuses fig and ax from the previous program: elevation 60 degrees, azimuth 35 degrees
ax.view_init(60, 35)
fig
OUTPUT:

Result
The program to apply and explore various plotting functions on UCI data sets for
three-dimensional plotting was executed and the output was obtained.
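
The same kind of grid can also be drawn as a filled surface. The sketch below is a variation on the programs above that uses plot_surface together with view_init. It assumes a script run outside Jupyter, so it selects the non-interactive Agg backend, and the viridis colour map is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display is available, so draw off-screen
import matplotlib.pyplot as plt
import numpy as np

# Grid and heights, as in the contour3D program
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

# Filled surface coloured by height
ax.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

# Look at the surface from 60 degrees above, rotated by 35 degrees
ax.view_init(elev=60, azim=35)
plt.close(fig)
```

In a notebook, the backend line and plt.close(fig) can be dropped so that the figure is displayed inline.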
Ex. No: 7 Visualizing Geographic Data with Basemap
Date:

Aim
To visualize geographic data with Basemap
Requirement
!apt install proj-bin libproj-dev libgeos-dev
!pip install https://github.com/matplotlib/1.0.tar.gz
!pip install Basemap
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib.pyplot as plt
Step 4: import Basemap
Step 5: using matplotlib showing the graph
Step 6: stop
Program
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
import matplotlib.cbook
# Hide Matplotlib deprecation warnings
warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation)
# IPython help for the Basemap class
Basemap?
fig = plt.figure(num=None, figsize=(12, 8))
# Mercator map of the world between 80 degrees south and 80 degrees north
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
plt.title("Mercator Projection")
plt.show()
OUTPUT:

PROGRAM:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
import matplotlib.cbook
# Hide Matplotlib deprecation warnings
warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation)
# IPython help for the Basemap class
Basemap?
fig = plt.figure(num=None, figsize=(12, 8))
m = Basemap(projection='merc', llcrnrlat=-80, urcrnrlat=80, llcrnrlon=-180, urcrnrlon=180, resolution='c')
m.drawcoastlines()
# Colour the land and the lakes
m.fillcontinents(color='tan', lake_color='lightblue')
# Latitude lines every 30 degrees and longitude lines every 60 degrees
m.drawparallels(np.arange(-90., 91., 30.), labels=[True, True, False, False], dashes=[2, 2])
m.drawmeridians(np.arange(-180., 181., 60.), labels=[False, False, False, True], dashes=[2, 2])
# Colour the oceans
m.drawmapboundary(fill_color='lightblue')
plt.title("Mercator Projection")

Output

Result
Thus, visualizing geographic data with basemap is implemented.
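
A Mercator map stretches distances more and more towards the poles, which is a likely reason the programs above limit the map to latitudes between -80 and 80 degrees. The sketch below evaluates the spherical form of the projection, y = ln(tan(pi/4 + latitude/2)), for a unit-radius sphere. The helper name mercator_y is hypothetical, and Basemap's own implementation may differ in detail.

```python
import numpy as np

def mercator_y(lat_deg):
    """Vertical map coordinate of a latitude on a unit-radius sphere."""
    lat = np.radians(lat_deg)
    return np.log(np.tan(np.pi / 4 + lat / 2))

lats = np.array([0.0, 30.0, 60.0, 80.0, 89.0])
ys = mercator_y(lats)

for lat, y in zip(lats, ys):
    print(f"latitude {lat:5.1f} deg -> y = {y:.3f}")
```

The y value grows without bound as the latitude approaches 90 degrees, so a Mercator map must be cut off somewhere short of the poles.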
