Data Science Lab Manual
COURSE OBJECTIVES:
To understand the Python libraries for data science.
To understand the basic statistical and probability measures for data science.
To learn descriptive analytics on the benchmark data sets.
To apply correlation and regression analytics on standard data sets.
To present and interpret data using visualization packages in Python.
LIST OF EXERCISES:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a) Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b) Bivariate analysis: Linear and logistic regression modeling
c) Multiple Regression analysis
d) Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a) Normal curves
b) Density and contour plots
c) Correlation and scatter plots
d) Histograms
e) Three dimensional plotting
7. Visualizing Geographic Data with Basemap
TOTAL: 60 PERIODS
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Make use of the Python libraries for data science.
CO2: Make use of the basic statistical and probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets.
CO5: Present and interpret data using visualization packages in Python.
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
AVADI – I.A.F. MUTHAPUDUPET, CHENNAI – 600 055.
NAME:
REGISTER NUMBER:
DEGREE:
BRANCH:
YEAR:
SEMESTER:
Bonafide Certificate
REGISTER NUMBER:
Ex. No: 1 Install and Explore the Features of NumPy, SciPy, Jupyter, Statsmodels and Pandas
Date:
AIM:
To install and explore the features of the NumPy, SciPy, Statsmodels, Jupyter and Pandas
packages in Python.
NUMPY
NumPy (Numerical Python) is a Python library for working with arrays. It provides an
efficient interface to store and operate on dense data buffers, and its advantage over
built-in Python lists in storage and speed grows as the arrays grow larger. NumPy arrays
form the core of nearly the entire ecosystem of data science tools in Python. The library
also has functions for linear algebra, Fourier transforms and matrices. The array object
in NumPy is called ndarray.
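The difference between an ndarray and a plain Python list can be seen in a few lines. This is a minimal sketch; the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values = [1, 2, 3, 4, 5]
arr = np.array(values)

# Arithmetic on an ndarray applies to every element at once.
doubled = arr * 2

# The same operation on a list needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values]

print(type(arr).__name__)   # class name of the array object
print(arr.dtype)            # element type shared by all items
print(doubled)
print(doubled_list)
```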
INSTALL PIP
INSTALL NUMPY
Name: numpy
Version: 1.16.6
Summary: NumPy is the fundamental package for array computing with Python.
Home-page: https://www.numpy.org
Author: Travis E. Oliphant et al.
Author-email: None
License: BSD
Location: /home/cc1-48/local/lib/python2.7/site-packages
SCIPY
SciPy is an open-source library used for solving mathematical, scientific, engineering,
and technical problems. It allows users to manipulate the data and visualize the data
using a wide range of high-level Python commands. SciPy is built on the Python
NumPy extension.
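Because SciPy builds on NumPy, its routines take and return NumPy arrays. The sketch below assumes a small hypothetical system of two linear equations and solves it with scipy.linalg.solve.

```python
import numpy as np
from scipy import linalg

# Hypothetical system: 3x + y = 9 and x + 2y = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# scipy.linalg.solve takes NumPy arrays and returns a NumPy array.
solution = linalg.solve(A, b)
print(solution)
```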
INSTALL SCIPY
Name: scipy
Version: 1.2.3
Summary: SciPy: Scientific Library for Python
Home-page: https://www.scipy.org
Author: None
Author-email: None
License: BSD
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: numpy
STATSMODELS
Statsmodels is a Python library built specifically for statistics. It is built on top of
NumPy, SciPy and Matplotlib, and adds more advanced functions for statistical testing and
modeling: descriptive statistics, statistical tests, result statistics and plotting
functions. Matplotlib powers its graphical functions.
INSTALL STATSMODELS
Name: statsmodels
Version: 0.11.0
Summary: Statistical computations and models for Python
Home-page: https://www.statsmodels.org/
Author: None
Author-email: None
License: BSD License
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: scipy, pandas, numpy, patsy
Required-by:
PANDAS
Pandas is a widely used Python library. It is used in multiple stages of data analytics,
from data manipulation to data analysis.
INSTALL PANDAS
JUPYTER NOTEBOOK
The Jupyter Notebook is a powerful tool for interactively developing and presenting
data science projects. A notebook is a single document that can run code, display the
output, and hold explanations, formulas and charts, which makes the work more transparent,
understandable, repeatable and shareable. It combines code, comments, multimedia and
visualizations in one interactive document that can be shared, re-used and re-worked.
As Jupyter Notebook runs via a web browser, the notebook can be hosted on the local
machine or on a remote server.
INSTALL JUPYTER
TO LAUNCH
jupyter notebook
Name: jupyter
Version: 1.0.0
Summary: Jupyter metapackage. Install all the Jupyter components in one go.
Home-page: http://jupyter.org
Author: Jupyter Development Team
Author-email: [email protected]
License: BSD
Location: /home/cc1-48/.local/lib/python2.7/site-packages
Requires: qtconsole, ipykernel, ipywidgets, jupyter-console, nbconvert, notebook
Required-by:
RESULT:
The features of the NumPy, SciPy, Statsmodels, Jupyter Notebook and Pandas packages
were explored.
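The installed versions can also be checked from inside Python. The sketch below assumes Python 3.8 or newer (importlib.metadata is not available in the Python 2.7 environment shown in the listings above); the helper name installed_versions is hypothetical.

```python
from importlib import metadata

# Package names as they appear in the "Name:" field of the listings above.
PACKAGES = ["numpy", "scipy", "statsmodels", "pandas", "jupyter"]

def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(PACKAGES)
for name, version in report.items():
    print(name, version if version else "not installed")
```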
Ex. No: 2 Working with numpy array
Date:
Aim:
To create a NumPy ndarray object by using the array() function.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: print array
step 5: stop
Program:
import numpy as np
a=np.array([1,2,3,4,5])
print(a)
Output:
Result:
The program to create a NumPy ndarray object by using the array() function was executed
and the output verified.
2.2. Use tuples to create a numpy array.
Aim:
To use tuples to create a numpy array.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as tuple
step 4: print array
step 5: stop
program:
import numpy as np
a=np.array((1,2,3,4,5))
print(a)
output:
Result:
The program to use tuples to create a numpy array was executed and the output
verified.
2.3. Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6.
Aim:
To create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an 2-d array of values 1,2,3 & 4,5,6
step 4: print array
step 5: stop
Program:
import numpy as np
arr=np.array([[1,2,3],[4,5,6]])
print(arr)
Output:
Result:
The program to create a 2-D array was executed and the output verified.
2.4. Create a 3-D array.
Aim:
To create a 3-D array.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an 3-d array
step 4: print array
step 5: stop
Program:
import numpy as np
a=np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
print(a)
Output:
Result:
The program to create a 3-D array was executed and the output verified.
2.5. Displaying dimensions of array from 0 to 3.
Aim:
To display the dimensions of array from 0 to 3.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare arrays of dimensions from 0 to 3
step 4: print dimensions using ndim function
step 5: stop
program:
import numpy as np
a=np.array(42)
b=np.array([1,2,3,4,5])
c=np.array([[1,2,3],[4,5,6]])
d=np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)
output:
Result:
The program to display dimensions of array was executed and the output verified.
2.6. Accessing array elements by indexing and adding it.
Aim:
To access array elements by indexing and adding it.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: access the elements at index 2 and index 3 (the 3rd and 4th elements)
step 5: print a[2]+a[3]
step 6: stop
Program:
import numpy as np
a=np.array([1,2,3,4])
print(a[2]+a[3])
Output:
Result:
The program to access array elements by indexing and adding it was executed and the
output verified.
2.7. Access the element on the 2nd row 5th column.
Aim:
To access the element on the 2nd row 5th column.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare a 2-d array
step 4: print 5th element on 2nd row by accessing the index of the array
step 5: stop
Program:
import numpy as np
a=np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("5th element on 2nd row.....",a[1,4])
output:
Result:
The program to access the element on the 2nd row, 5th column was executed and the
output verified.
2.8. Slice elements from index 1 to 5.
Aim:
To slice elements from index 1 to 5.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: print the slice a[1:5], which covers index 1 up to but not including index 5
step 5: stop
program:
import numpy as np
a=np.array([1,2,3,4,5])
print(a[1:5])
output:
Result:
The program to slice elements from index 1 to 5 was executed and the output verified.
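A slice a[start:stop] includes start and excludes stop. The following sketch, using hypothetical values, shows the excluded end index together with a step and a negative start.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50, 60, 70])

middle = a[1:5]       # indexes 1, 2, 3, 4 (index 5 is excluded)
every_other = a[::2]  # a step of 2 keeps indexes 0, 2, 4, 6
last_two = a[-2:]     # negative indexes count from the end

print(middle)
print(every_other)
print(last_two)
```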
2.9. Slice elements from index 4 to the end of array.
Aim:
To slice elements from index 4 to the end of array.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: print slicing the array from 4
step 5: stop
program:
import numpy as np
a=np.array([1,2,3,4,5])
print(a[4:])
output:
Result:
The program to slice elements from index 4 to the end of array was executed and the
output verified.
2.10. Slice elements from index 3 from the end to index 1 from end.
Aim:
To slice elements from index 3 from the end to index 1 from end.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as arr
step 4: print the slice from index -3 to index -1 using negative indexing
step 5: stop
Program:
import numpy as np
a=np.array([1,2,3,4,5,6,7,8,9,10])
print(a[-3:-1])
Output:
Result:
The program to slice elements from index 3 from the end to index 1 from end was
executed and the output verified.
2.11. Print the shape of an array.
Aim:
To print the shape of an array.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare a 2-d array
step 4: print the shape of an array using arr.shape
step 5: stop
Program:
import numpy as np
a=np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(a.shape)
Output:
Result:
The program to print the shape of an array was executed and the output verified.
2.12. Iterate on the element of 1-D array.
Aim:
To iterate on the element of 1-D array.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: use a for loop to visit each element of a
step 5: print each element i
step 6: stop
program:
import numpy as np
a=np.array([1,2,3,4,5])
for i in a:
    print(i)
Output:
Result:
The program to iterate on the element of 1-D array was executed and the output verified.
2.13. Split the array in 3 parts.
Aim:
To split the array in 3 parts.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable newarr with array_split function to split the array into 3 parts
step 5: print newarr
step 6: stop
program:
import numpy as np
a=np.array([1,2,3,4,5,6])
newarr=np.array_split(a,3)
print(newarr)
Output:
Result:
The program to split the array in 3 parts was executed and the output verified.
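array_split also works when the array cannot be divided evenly. A minimal sketch with a hypothetical seven-element array:

```python
import numpy as np

# Seven elements cannot be divided evenly into three parts.
a = np.array([1, 2, 3, 4, 5, 6, 7])

# array_split accepts uneven splits; np.split would raise an error here.
parts = np.array_split(a, 3)

for part in parts:
    print(part)
```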
2.14. Find the indexes where the value is 4.
Aim:
To write a program to find the indexes where the value is 4.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable x, using the function 'where' to find the indexes which have the value 4
step 5: print x
step 6: stop
program:
import numpy as np
a=np.array([1,2,3,4,5,4,4])
x=np.where(a==4)
print(x)
output:
Result:
The program to find the indexes where the value is 4 was executed and the output
verified.
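np.where called with only a condition returns a tuple, not a plain array. The sketch below shows how the index array can be taken out of that tuple.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 4, 4])

# np.where with one argument returns a tuple holding one index array per dimension.
result = np.where(a == 4)
indexes = result[0]       # the index array for the only dimension

print(type(result).__name__)
print(indexes)
```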
2.15. Find the indexes where the value is even.
Aim:
To write a program to find the indexes where the value is even.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable x using the function 'where' find the indexes where the value is even
step 5: print x
step 6: stop
program:
import numpy as np
a=np.array([1,2,3,4,5,6,7,8])
x=np.where(a%2==0)
print(x)
output:
Result:
The program to find the indexes where the value is even was executed and the output
verified.
2.16. Find the indexes where the value is odd.
Aim:
To write a program to find the indexes where the value is odd.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create a variable x using the function 'where' find the indexes where the value is odd
step 5: print x
step 6: stop
program:
import numpy as np
a=np.array([1,2,3,4,5,6,7,8])
x=np.where(a%2==1)
print(x)
Output:
Result:
The program to find the indexes where the value is odd was executed and the output verified.
2.17. Sort the array.
Aim:
To write a program to sort the given array.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: print the sorted array using sort function
step 5: stop
program:
import numpy as np
a=np.array([3,2,0,1])
print(np.sort(a))
Output:
Result:
The program to sort the given array was executed and the output verified.
2.18. Sort the array alphabetically.
Aim:
To write a program to sort the given array alphabetically.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a which consists of strings
step 4: print the sorted array using sort function
step 5: stop
program:
import numpy as np
a=np.array(['banana','cherry','apple'])
print(np.sort(a))
Output:
Result:
The program to sort the given array alphabetically was executed and the output verified.
2.19. Create an array from the elements on index 0 and 2.
Aim:
To create an array from the elements on index 0 and 2.
Algorithm:
step 1: start
step 2: import numpy module
step 3: declare an array as a
step 4: create another array x which holds boolean values (True and False)
step 5: assign the variable newarr to a[x]
step 6: print newarr
step 7: stop
program:
import numpy as np
a=np.array([41,42,43,44])
x=[True,False,True,False]
newarr=a[x]
print(newarr)
Output:
Result:
The program to create an array from the elements on index 0 and 2 was executed and the
output verified.
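The boolean list does not have to be typed by hand; a comparison on the array produces it. A minimal sketch, assuming the filter condition "greater than 42" purely for illustration:

```python
import numpy as np

a = np.array([41, 42, 43, 44])

# A comparison on an array yields a boolean array of the same length.
mask = a > 42

# Indexing with the mask keeps only the positions where it is True.
filtered = a[mask]

print(mask)
print(filtered)
```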
Ex. No: 3 Working with Pandas data frames
Date:
Aim:
To write a python program to create a data frame using a list of elements.
Algorithm:
step 1: start
step 2: import pandas as pd
step 3: declare a list lst consisting of the elements A to G.
step 4: assign df to pd.DataFrame(lst)
step 5: print df
step 6: stop
Program:
import pandas as pd
lst=['A','B','C','D','E','F','G']
df=pd.DataFrame(lst)
print(df)
Output
Result:
The program to create a data frame using a list of elements was executed and the output was
verified.
3(b). Create a data frame using the dictionary.
Aim:
To write a python program to create a data frame using the dictionary.
Algorithm:
step 1: start
step 2: import pandas as pd
step 3: assign a variable data to a dictionary which consists of the keys name and age.
step 4: assign df to pd.DataFrame(data)
step 5: print df
step 6: stop
Program:
import pandas as pd
data={'name':['tom','nick','krish','jack'],'age':[20,21,19,18]}
df=pd.DataFrame(data)
print(df)
Output:
Result:
The program to create a data frame using the dictionary was executed and the output was
verified.
3(c). select a column from data frame
Aim:
To write a python program to select a column from data frame.
Algorithm:
step 1: start
step 2: import pandas as pd
step 3: assign a variable data to a dictionary which consists of the keys name, age, address and
qualification.
step 4: assign df to pd.DataFrame(data)
step 5: print df
step 6: print df with column name
step 7: stop
Program:
import pandas as pd
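# every list in the dictionary must have the same length (four entries here);
# 'MSc' is an assumed value added so that qualification also has four entries
data={'name':['jai','princy','gaurav','anuj'],
      'age':[27,24,22,32],
      'address':['delhi','kanpur','allahabad','kannauj'],
      'qualification':['MSc','MA','MCA','Phd']}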
df=pd.DataFrame(data)
print(df)
print(df[['name','qualification']])
output:
Result:
The program to select a column from data frame was executed and the output was verified.
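The kind of bracket used decides what comes back from a selection. The sketch below reuses the name and age data from 3(b); the age threshold of 20 is a hypothetical filter.

```python
import pandas as pd

data = {"name": ["tom", "nick", "krish", "jack"], "age": [20, 21, 19, 18]}
df = pd.DataFrame(data)

one_column = df["name"]        # single label   -> Series
sub_frame = df[["name"]]       # list of labels -> DataFrame
adults = df.loc[df["age"] >= 20, "name"]   # rows filtered by a condition

print(type(one_column).__name__)
print(type(sub_frame).__name__)
print(adults.tolist())
```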
3(d). Checking for missing values using isnull() and notnull()
Aim:
To write a python program to check for missing values using isnull() and notnull().
Algorithm:
Step 1: start
step 2: import pandas as pd
step 3: import numpy as np
step 4: assign a variable dic to a dictionary consists of keys : first, second and third.
Step 5: assign df to pd.DataFrame(dic)
step 6: print df.isnull() and df.notnull()
step 7: stop
program:
import pandas as pd
import numpy as np
dic={'first':[100,90,np.nan,95],'second':[30,45,56,np.nan],'third':[np.nan,40,80,98]}
df=pd.DataFrame(dic)
print(df.isnull())
print(df.notnull())
output:
Result:
The program to check for missing values using isnull() and notnull() was executed.
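Once missing values are located they are usually counted, dropped or filled. A minimal sketch on the same dictionary; filling with 0 is an assumption made only for illustration.

```python
import numpy as np
import pandas as pd

dic = {"first": [100, 90, np.nan, 95],
       "second": [30, 45, 56, np.nan],
       "third": [np.nan, 40, 80, 98]}
df = pd.DataFrame(dic)

missing_per_column = df.isnull().sum()   # True counts as 1 when summed
complete_rows = df.dropna()              # keep rows without any missing value
filled = df.fillna(0)                    # replace missing values with 0

print(missing_per_column)
print(complete_rows)
print(filled)
```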
Ex. No: 4 Reading Data from Text Files, Excel and the Web and Exploring Various Commands for Doing Descriptive Analytics on the Iris Data Set
Date:
Aim
To read data from text files, Excel and the web, and to explore various commands for
doing descriptive analytics on the Iris data set.
READING DATA FROM EXCEL FILE AND EXPLORING DESCRIPTIVE
ANALYTICS ON THE IRIS DATA SET
Algorithm
Step 1: Start
Step 2: Importing pandas as pd
Step 3: using read_csv() function to read the csv file
Step 4: Printing the df with to_string()
Step 5: Stop
Requirement:
The openpyxl module is needed to write data to an xlsx file. pandas may also raise a UserWarning such as:
Pandas requires version 0.98 or newer of "xlsxwriter" (version 0.96 currently installed)
Program:
# read data from a csv file
import pandas as pd
# reading the csv file (change the path to match the location of Iris.csv)
df=pd.read_csv("/home/cc1-43/ex4/Iris.csv")
# print data from csv
print(df.to_string())
# summary statistics for the numeric columns
print(df.describe())
Output:
READING DATA FROM TEXT FILE
Algorithm:
Step 1: Start
Step 2: Importing pandas as pd
Step 3: Using read_csv() function to read file
Step 4: Printing the df
Step 5: Stop
Requirement:
The text file "texttest.txt" must be present in the file location.
Program
# read text files with pandas using read_csv()
# importing pandas
import pandas as pd
# read text file into pandas data frame
df=pd.read_csv("texttest.txt",sep=" ")
# display data frame
print(df)
Output:
print(df.describe())
Output:
Result
The program to read data from text files, Excel and the web, and to explore various
commands for descriptive analytics on the Iris data set, was successfully executed.
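pandas.read_csv accepts a local path, a URL or any file-like object, and pandas.read_excel plays the same role for xlsx files. The sketch below keeps everything in memory so it runs without a file or network access; the rows and column names are hypothetical stand-ins in the layout of the Iris data set, not the real file.

```python
import io
import pandas as pd

# Hypothetical rows in the layout of the Iris data set; not the real file.
text = """sepal_length sepal_width species
5.1 3.5 setosa
4.9 3.0 setosa
7.0 3.2 versicolor
6.4 3.2 versicolor
"""

# read_csv accepts a path, a URL or any file-like object; sep sets the delimiter.
df = pd.read_csv(io.StringIO(text), sep=" ")

print(df.shape)
print(df["species"].value_counts())
print(df.groupby("species")["sepal_length"].mean())
```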
Ex. No: 5 Use the diabetes data set from UCI and PIMA Indian diabetes
Date:
Aim
To analyse the Pima Indians Diabetes data set with univariate measures such as frequency,
mean, median, mode, variance, standard deviation, skewness and kurtosis.
Algorithm
Step 1: Start
Step 2: importing pandas as pd
Step 3: importing numpy as np
Step 4: importing statistics as st
Step 5: reading csv file using read_csv function
Step 6: printing information, shape, mean, median, mode, standard deviation, variance,
skewness and kurtosis of the data set
Step 7: importing matplotlib.pyplot as plt
Step 8: representing the data as histograms
Step 9: Stop
Program
import pandas as pd
import numpy as np
import statistics as st
df=pd.read_csv("pima.csv")
print(df.shape)
print(df.info())
print('MEAN:\n',df.mean())
print('MEDIAN:\n',df.median())
print('MODE:\n',df.mode())
print('STANDARD DEVIATION:\n',df.std())
print('VARIANCE:\n',df.var())
print('SKEWNESS:\n',df.skew())
print('KURTOSIS:\n',df.kurtosis())
print(df.describe())
Data_X=df.copy(deep=True)
Data_X=Data_X.drop(['Outcome'],axis=1)
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize']=[40,40]
Data_X.hist(bins=40)
Output
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype   Description
---  ------       --------------  -----   -----------
 0   Pregnancies  768 non-null    int64   No. of pregnancies
 1   Glucose      768 non-null    int64   Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Result
The program for univariate analysis (frequency, mean, median, mode, variance, standard
deviation, skewness and kurtosis) was successfully executed.
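Frequency is among the univariate measures named in this exercise; in pandas it can be obtained with value_counts. The sketch below computes each measure on a single column, using a short hypothetical series instead of the Pima file so that it runs on its own.

```python
import pandas as pd

# Hypothetical glucose-like readings, used only to keep the example small.
s = pd.Series([85, 89, 89, 137, 148, 183])

print("Frequency:\n", s.value_counts())   # how often each value occurs
print("Mean:", s.mean())
print("Median:", s.median())
print("Mode:", s.mode().tolist())         # mode() can return several values
print("Variance:", s.var())               # sample variance (ddof=1)
print("Std dev:", s.std())
print("Skewness:", s.skew())
print("Kurtosis:", s.kurt())
```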
5 b. Bivariate analysis: Linear regression modeling
Aim
To perform bivariate analysis using linear regression modeling.
Algorithm
Step 1: Start
Step 2: Import matplotlib.pyplot as plt
Step 3: Import numpy as np
Step 4: Import pandas as pd
Step 5: load the data set using the sklearn module
Step 6: split the data into training and test sets and fit the linear regression model
Step 7: calculate the cross-validation score, r^2, MSE and RMSE
Step 8: print the values of r^2 and RMSE, the weights and the intercept
Step 9: stop
Program
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets,linear_model
from sklearn.metrics import mean_squared_error
diabetes=datasets.load_diabetes()
diabetes.keys()
df=pd.DataFrame(diabetes['data'],columns=diabetes['feature_names'])
x=df
y=diabetes['target']
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=101)
from sklearn import linear_model
model=linear_model.LinearRegression()
model.fit(x_train,y_train)
y_pre=model.predict(x_test)
from sklearn.model_selection import cross_val_score
scores=cross_val_score(model,x,y,scoring="neg_mean_squared_error",cv=10)
rmse_scores=np.sqrt(-scores).mean()
print('Cross validation:',rmse_scores)
from sklearn.metrics import r2_score
print('r^2:',r2_score(y_test,y_pre))
mse=mean_squared_error(y_test,y_pre)
rmse=np.sqrt(mse)
print('RMSE:',rmse)
print("Weights:",model.coef_)
print("\nIntercept",model.intercept_)
Output
Cross validation: 54.40461553640237
r^2: 0.45767674177195583
RMSE: 58.009275047551995
Weights: [ -8.02566358 -308.83945001 583.63074324 299.9976184 -360.68940198
95.14235214 -93.03306818 118.15005596 662.12887711 26.07401648]
Intercept 153.72029738615726
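The program above fits all ten features of the data set at once. A bivariate fit relates one predictor to one response; the following is a minimal sketch of that case, assuming 'bmi' as the predictor (an arbitrary choice for illustration, not taken from the manual) and fitting on the full data without a train/test split.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score

diabetes = datasets.load_diabetes()

# Keep one predictor only; 'bmi' is an arbitrary choice for illustration.
bmi_index = diabetes.feature_names.index("bmi")
x = diabetes.data[:, [bmi_index]]     # 2-D array with a single column
y = diabetes.target

model = linear_model.LinearRegression()
model.fit(x, y)

r2 = r2_score(y, model.predict(x))
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("r^2:", r2)
```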
Aim
To perform bivariate analysis using logistic regression modeling.
Algorithm
Step 1: Start
Step 2: Import matplotlib.pyplot as plt
Step 3: Import numpy as np
Step 4: Import pandas as pd
Step 5: load the data set using the sklearn module
Step 6: split the data into training and test sets and fit the logistic regression model
Step 7: calculate r^2, MSE and RMSE
Step 8: print the values of r^2 and RMSE
Step 9: stop
Program
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.metrics import mean_squared_error
diabetes = datasets.load_diabetes()
diabetes.keys()
df=pd.DataFrame(diabetes['data'],columns=diabetes['feature_names'])
x=df
y=diabetes['target']
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=101)
# import the model
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
model.fit(x_train,y_train)
y_pre= model.predict(x_test)
from sklearn.metrics import r2_score
print('r^2:',r2_score(y_test,y_pre))
mse=mean_squared_error(y_test,y_pre)
rmse=np.sqrt(mse)
print('RMSE:',rmse)
Output
r^2: -0.44401265478624397
RMSE: 94.65723681369009
Result
The program for bivariate analysis using linear regression and logistic regression was
successfully executed.
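LogisticRegression is a classifier, and the target returned by load_diabetes is a continuous measurement, so every distinct target value is treated as its own class. A sketch of a two-class variant follows; the rule "target above the median" is an assumption introduced here only to obtain a binary label, and accuracy replaces r^2 and RMSE as the score.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
x = diabetes.data

# Assumption: label a record 1 when its target is above the median, else 0.
y = (diabetes.target > np.median(diabetes.target)).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=101)

model = LogisticRegression()
model.fit(x_train, y_train)
y_pre = model.predict(x_test)

accuracy = accuracy_score(y_test, y_pre)
print("Accuracy:", accuracy)
```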
5 c. Multiple regression analysis
Aim
To perform multiple regression analysis.
Algorithm
step 1: Start
step 2: importing matplotlib.pyplot as plt
step 3: importing numpy as np
step 4: importing sklearn
step 5: importing pandas as pd
step 6: reading csv file
step 7: fetching data from data set
step 8: splitting the data into training and test sets and fitting the model
step 9: calculating r^2, mse and rmse
step 10: printing r^2 and rmse
step 11: Stop
Program
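import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.model_selection import train_test_split
df=pd.read_csv('pima-indians-diabetes.csv')
# the column names must match the header row of the csv file
data=df[['Age','Glucose','Bmi','Blood pressure','Pregnancies']]
target=df[['Outcome']]
print(data)
print(target)
X=data
Y=target
# split into training and test sets; test_size and random_state are assumed (same as 5 b)
X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=101)
reg=linear_model.LinearRegression()
reg.fit(X_train,Y_train)
Y_predict=reg.predict(X_test)
print('Coefficients',reg.coef_)
print('r^2 :',r2_score(Y_test,Y_predict))
mse=mean_squared_error(Y_test,Y_predict)
rmse=np.sqrt(mse)
print('RMSE:',rmse)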
Output
     Age  Glucose   Bmi  Blood pressure  Pregnancies
0     50      148  33.6              72            6
1     31       85  26.6              66            1
2     32      183  23.3              64            8
3     21       89  28.1              66            1
4     33      137  43.1              40            0
..   ...      ...   ...             ...          ...
767   23       93  30.4              70            1
Outcome
0 1
1 0
2 1
3 0
4 1
.. ...
763 0
764 0
765 0
766 1
767 0
r^2 : 0.3119613858813981
RMSE: 0.3958061749043919
Result:
The program for multiple regression analysis was successfully executed.
Ex. No: 6 Apply and Explore Various Plotting Functions on UCI Data Sets
Date:
Aim
To apply and explore various plotting functions on UCI data sets for the normal curve.
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib as plt
Step 4: import scipy.stats
Step 5: import statistics
Step 6: creating np array with x, y dimensions
Step 7: using matplotlib showing the graph
Step 8: stop
Program
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
x_axis = np.arange(-20, 20, 0.01)
# Calculating mean and standard deviation
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.show()
Output
Program
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
# Creating the distribution
data = np.arange(1,10,0.01)
pdf = norm.pdf(data , loc = 5.3 , scale = 1 )
#Visualizing the distribution
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
OUTPUT:
Text(0, 0.5, 'Probability Density')
Result
The program to apply and explore various plotting functions on UCI data sets for the
normal curve was executed and the output obtained.
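The curve drawn by norm.pdf is the normal density f(x) = exp(-(x - mean)^2 / (2 sd^2)) / (sd sqrt(2 pi)). The sketch below writes that formula out with NumPy and compares it with the SciPy result, reusing the loc and scale values of the seaborn program.

```python
import numpy as np
from scipy.stats import norm

mean, sd = 5.3, 1.0            # same loc and scale as the seaborn example
x = np.arange(1, 10, 0.01)

# Density of the normal distribution written out from its formula.
manual_pdf = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
library_pdf = norm.pdf(x, loc=mean, scale=sd)

print(np.allclose(manual_pdf, library_pdf))
print(x[np.argmax(library_pdf)])   # the curve peaks at the mean
```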
6b. Density and contour plots
Aim
To apply and explore various plotting functions on UCI data sets for density and contour
plots.
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, colors='black');
Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, 20, cmap='RdGy');
Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
import numpy as np
def f(x, y):
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contour(X, Y, Z, 20, cmap='RdGy')
plt.colorbar();
Output
Result
The program to apply and explore various plotting functions on UCI data sets for
density and contour plots was executed and the output obtained.
6.c Correlation and scatter plots
Aim
To apply and explore various plotting functions on UCI data sets for correlation and scatter
plots.
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
x = np.linspace(0,10,30)
y = np.sin(x)
plt.plot(x, y, 'o', color='black');
Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
rng = np.random.RandomState(0)
for marker in ['o', '.', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']:
    plt.plot(rng.rand(5), rng.rand(5), marker, label="marker='{0}'".format(marker))
plt.legend(numpoints=1)
plt.xlim(0, 1.8);
Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
# x and y as defined in the first scatter program
x = np.linspace(0,10,30)
y = np.sin(x)
plt.plot(x,y,'-ok');
Output
Program
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import numpy as np
rng=np.random.RandomState(0)
x=rng.randn(100)
y=rng.randn(100)
colors=rng.rand(100)
sizes=1000 * rng.rand(100)
plt.scatter(x,y,c=colors,s=sizes,alpha=0.3,cmap='viridis')
plt.colorbar();#show color scale
Output
Result
The program to apply and explore various plotting functions on UCI data sets for
correlation and scatter plots was executed and the output obtained.
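A scatter plot shows a relationship visually; the correlation coefficient puts a number on it. A minimal sketch with hypothetical variables, where y is an exact linear function of x and z is unrelated noise:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
x = rng.randn(100)
noise = rng.randn(100)

# Hypothetical variables: y depends on x, z is independent noise.
df = pd.DataFrame({"x": x, "y": 3 * x + 1, "z": noise})

corr = df.corr()                   # Pearson correlation by default
print(corr)
print(np.corrcoef(x, 3 * x + 1)[0, 1])
```

The resulting matrix can be passed to a heat-map function to obtain a correlation plot.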
6.d. Histograms
Aim
To apply and explore various plotting functions on UCI data sets for histograms.
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop
Program
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
data=np.random.randn(1000)
plt.hist(data);
# density=True replaces the normed=True argument of older matplotlib versions
plt.hist(data,bins=30,density=True,alpha=0.5,histtype='stepfilled',color='steelblue',edgecolor='none');
Output
Two-dimensional histogram and binnings
Program
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-white')
mean =[0,0]
cov=[[1,1],[1,2]]
x,y=np.random.multivariate_normal(mean,cov,10000).T
plt.hist2d(x,y,bins=30,cmap='Blues')
cb=plt.colorbar()
cb.set_label('counts in bin')
plt.hexbin(x,y,gridsize=30,cmap='Blues')
cb=plt.colorbar(label='count in bin')
Output
Result
The program to apply and explore various plotting functions on UCI data sets for
histograms was executed and the output obtained.
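When only the bin counts are needed and not the picture, np.histogram computes them directly. A minimal sketch; the seed 0 is an assumption made so that the run is repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.randn(1000)

# np.histogram returns the counts and the bin edges without drawing anything.
counts, edges = np.histogram(data, bins=30)

print(counts)
print(len(edges))        # one more edge than there are bins
```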
6.e. Three-dimensional plotting
Aim
To apply and explore various plotting functions on UCI data sets for three-dimensional plotting.
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib as plt
Step 4: creating np array with x, y dimensions
Step 5: using matplotlib showing the graph
Step 6: stop
Program
from mpl_toolkits import mplot3d
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
fig=plt.figure()
ax=plt.axes(projection='3d')
Output
Result
The program to apply and explore various plotting functions on UCI data sets for
three-dimensional plotting was executed and the output obtained.
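Once a 3-D axes object exists, lines and points can be drawn on it. The following is a sketch only: the spiral data are hypothetical, and the Agg backend line is an assumption for running without a display (in a notebook, use %matplotlib inline instead).

```python
import matplotlib
matplotlib.use("Agg")          # assumption: no display; drop this line in a notebook
from mpl_toolkits import mplot3d   # registers the 3d projection on older matplotlib
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = plt.axes(projection="3d")

# Hypothetical data: a spiral that climbs along the z axis.
z = np.linspace(0, 15, 200)
x = np.sin(z)
y = np.cos(z)

ax.plot3D(x, y, z, "gray")                                          # 3-D line
ax.scatter3D(x[::10], y[::10], z[::10], c=z[::10], cmap="Greens")   # 3-D points
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```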
Ex. No: 7 Visualizing Geographic Data with Basemap
Date:
Aim
To visualize geographic data with Basemap.
Requirement
!apt install proj-bin libproj-dev libgeos-dev
!pip install https://github.com/matplotlib/1.0.tar.gz
!pip install Basemap
Algorithm
Step 1: start
Step 2: importing numpy as np
Step 3: import matplotlib as plt
Step 4: import Basemap
Step 5: using matplotlib showing the graph
Step 6: stop
Program
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
import matplotlib.cbook
warnings.filterwarnings("ignore",category=matplotlib.cbook.mplDeprecation)
Basemap?
fig = plt.figure(num=None,figsize=(12,8))
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines()
plt.title("Mercator Projection")
plt.show()
OUTPUT:
PROGRAM:
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
import matplotlib.cbook
warnings.filterwarnings("ignore",category=matplotlib.cbook.mplDeprecation)
Basemap?
fig = plt.figure(num=None,figsize=(12,8))
m = Basemap(projection='merc',llcrnrlat=-80,urcrnrlat=80,llcrnrlon=-180,urcrnrlon=180,resolution='c')
m.drawcoastlines()
m.fillcontinents(color='tan',lake_color='lightblue')
m.drawparallels(np.arange(-90.,91.,30.),labels=[True,True,False,False],dashes=[2,2])
m.drawmeridians(np.arange(-180.,181.,60.),labels=[False,False,False,True],dashes=[2,2])
m.drawmapboundary(fill_color='lightblue')
plt.title("Mercator Projection")
Output
Result
Thus, visualizing geographic data with basemap is implemented.