Vanshika Goyal Gec Practicals

The document outlines a series of practical exercises for a Data Visualization using Python course. It includes tasks involving the numpy and pandas libraries for statistical analysis, data manipulation, and visualization techniques. The exercises cover various topics such as computing statistics, creating and reshaping arrays, handling missing values, and performing data merges and visualizations with the Iris dataset.

Uploaded by

vanshikagoyal726
Copyright: © All Rights Reserved

GEC - PRACTICALS

NAME : VANSHIKA GOYAL


ROLL NO. : 24504061
COURSE : BCOM [HONS.]
SUBJECT : DATA VISUALISATION USING PYTHON
1. Write programs in Python using the NumPy library to do the following:

a. Compute the mean, standard deviation, and variance of a two-dimensional random integer
array along the second axis.
In[1] import numpy as np
# 3 x 4 array of random integers drawn from [2, 20)
array1=np.random.randint(2,20,size=(3,4))
print(array1)
# axis=1 is the second axis, so each statistic is computed per row
print('mean of random array along second axis:',np.mean(array1,axis=1))
print('standard deviation of random array along second axis:',np.std(array1,axis=1))
print('variance of random array along second axis:',np.var(array1,axis=1))

Out[1] [[15 6 7 16]
[ 6 14 10 6]
[14 7 5 2]]
mean of random array along second axis: [11. 9. 7.]
standard deviation of random array along second axis: [4.52769257 3.31662479 4.41588043]
variance of random array along second axis: [20.5 11. 19.5]
b. Create a 2-dimensional array of size m x n with integer elements. Print the shape, type
and data type of the array, then reshape it into an n x m array, where n and m are user
inputs given at run time.
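No listing accompanies this item, so the following is only a minimal sketch of one way to approach it. The function name make_and_reshape and the value range 0 to 99 are assumptions made for illustration.

```python
import numpy as np

def make_and_reshape(m, n):
    """Build an m x n random integer array, report its properties, reshape to n x m."""
    arr = np.random.randint(0, 100, size=(m, n))  # value range is an assumption
    print("shape:", arr.shape)
    print("type:", type(arr))
    print("data type:", arr.dtype)
    reshaped = arr.reshape(n, m)  # same elements in row-major order, new shape
    return arr, reshaped

# Fixed sizes are used here so the sketch runs unattended.
arr, reshaped = make_and_reshape(2, 3)
print(arr)
print(reshaped)
```

To take the sizes at run time, m and n can be read with int(input(...)) and passed to the function.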
c. Test whether the elements of a given 1D array are zero, non-zero or NaN. Record the
indices of these elements in separate arrays.
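A possible approach to this item, shown as a sketch with an assumed sample array. NaN compares as not equal to zero, so it has to be excluded explicitly from the non-zero group.

```python
import numpy as np

# Sample array chosen for illustration (an assumption, not from the practical file)
arr = np.array([1, 0, np.nan, 3, 0, np.nan, 7])

is_nan = np.isnan(arr)
zero_idx = np.where(arr == 0)[0]                  # positions holding 0
nan_idx = np.where(is_nan)[0]                     # positions holding NaN
nonzero_idx = np.where((arr != 0) & ~is_nan)[0]   # non-zero and not NaN

print("zero indices:", zero_idx)
print("non-zero indices:", nonzero_idx)
print("NaN indices:", nan_idx)
```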
d. Create three random arrays of the same size: Array1, Array2, Array3. Subtract Array2 from
Array3 and store the result in Array4. Create another array, Array5, having two times the values in Array1.
Find the covariance and correlation of Array1 with Array4 and Array5 respectively.
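One way this item could be worked, as a sketch only. The array size of 10, the value range and the seed are assumptions. The item is read here as covariance of Array1 with Array4 and correlation of Array1 with Array5.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed is an assumption, for repeatability
array1 = rng.integers(1, 50, size=10)
array2 = rng.integers(1, 50, size=10)
array3 = rng.integers(1, 50, size=10)

array4 = array3 - array2                  # Array2 subtracted from Array3
array5 = 2 * array1                       # two times the values in Array1

# np.cov and np.corrcoef return 2 x 2 matrices; the off-diagonal entry
# is the value for the pair of arrays.
cov_1_4 = np.cov(array1, array4)[0, 1]
corr_1_5 = np.corrcoef(array1, array5)[0, 1]

print("covariance of array1 with array4:", cov_1_4)
print("correlation of array1 with array5:", corr_1_5)
```

Because Array5 is an exact positive multiple of Array1, their correlation comes out as 1 up to floating-point rounding.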
e. Create 2 random arrays of the same size 10: array1, array2. Find the sum of the first half
of both the arrays and the product of the second half of both the arrays.
In[1] import numpy as np
arr1=np.random.random(size=10)
arr2=np.random.random(size=10)
print(arr1)
print(arr2)
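# element-wise sum of the first halves (indices 0 to 4)
arr3=arr1[0:5]+arr2[0:5]
# element-wise product of the second halves (indices 5 to 9)
arr4=arr1[5:10]*arr2[5:10]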
print(arr3)
print(arr4)

Out[1] [0.63618375 0.86171874 0.56897631 0.37959409 0.34805725 0.91758604
0.17253892 0.77094538 0.95741841 0.95282946]
[0.60511907 0.51368738 0.73009941 0.87229216 0.11689907 0.57703957
0.81210364 0.70982319 0.33538714 0.4209075 ]
[1.24130282 1.37540612 1.29907573 1.25188625 0.46495632]
[0.52948345 0.14011949 0.54723491 0.32110582 0.40105307]
2. Do the following using Pandas series:

a. Create a series with 5 elements. Display the series sorted on index and also sorted on values separately.
In[1] import pandas as pd
s1=pd.Series([9,5,0,8,6],index=['a','b','c','d','e'])
x=s1.sort_index()
y=s1.sort_values()
print(x)
print(y)
Out[1] a 9
b 5
c 0
d 8
e 6
dtype: int64
c 0
b 5
e 6
d 8
a 9
dtype: int64
b. Create a series with N elements with some duplicate values. Find the minimum and
maximum ranks assigned to the values using ‘first’ and ‘max’ method.
In[1] import pandas as pd
s2=pd.Series([8,5,4,3,1,2],index=['a','b','c','d','e','f'])
x=s2.rank(method='first')
y=s2.rank(method='max')
print(x)
print(y)

Out[1] a 6.0
b 5.0
c 4.0
d 3.0
e 1.0
f 2.0
dtype: float64
a 6.0
b 5.0
c 4.0
d 3.0
e 1.0
f 2.0
dtype: float64
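The series in the listing above contains no repeated values, so both methods return identical ranks. A sketch with duplicates shows where they differ. The values below are assumed for illustration.

```python
import pandas as pd

# 5 and 3 each appear twice (sample values are an assumption)
s = pd.Series([8, 5, 5, 3, 1, 3], index=['a', 'b', 'c', 'd', 'e', 'f'])

# 'first': ties are ranked in the order they appear in the series
rank_first = s.rank(method='first')
# 'max': every tied value receives the highest rank in its group
rank_max = s.rank(method='max')

print(rank_first)
print(rank_max)
```

With 'first' the two 5s receive ranks 4 and 5, while with 'max' both receive 5.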
c. Display the index value of the minimum and maximum elements of a series.
In[] import pandas as pd
s=pd.Series([123,564,181,345,65,4567,41,5])
print(s)
print("index value of the maximum element is:",s.idxmax())
print("index value of the minimum element is:",s.idxmin())

Out[] 0 123
1 564
2 181
3 345
4 65
5 4567
6 41
7 5
dtype: int64
index value of the maximum element is: 5
index value of the minimum element is: 7
3. Create a data frame having at least 3 columns and 50 rows to store numerical data generated
using a random function. Replace 10% of the values with null values whose index positions are
generated using a random function.
a. Identify and count missing values in a data frame.
b. Drop the column having more than 5 null values.
c. Identify the row label having maximum of the sum of all values in a row and drop that row.
d. Sort the data on the basis of the first column.
e. Remove all the duplicates from the first column.
f. Find the correlation between first and second column and covariance between second and
third column.
g. Discretize the second column and create 5 bins.
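No listing is given for this exercise. The sketch below is one possible reading of items a to g. The column names A, B, C, the seed and the value range are assumptions. Item f is computed on the original frame because item b may drop one of the columns involved.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)                  # seed is an assumption
data = rng.integers(1, 100, size=(50, 3)).astype(float)

# Replace 10% of the 150 cells with NaN at randomly chosen flat positions
n_missing = int(0.10 * data.size)
flat_idx = rng.choice(data.size, size=n_missing, replace=False)
data.flat[flat_idx] = np.nan
df = pd.DataFrame(data, columns=["A", "B", "C"])

# a. count missing values per column and in total
missing_per_column = df.isna().sum()
total_missing = int(missing_per_column.sum())

# b. keep only columns with at most 5 nulls
df_b = df.loc[:, missing_per_column <= 5]

# c. drop the row whose values have the largest sum (NaN is skipped by sum)
max_row = df_b.sum(axis=1).idxmax()
df_c = df_b.drop(index=max_row)

# d. sort on the first remaining column (NaN rows go last by default)
first_col = df_c.columns[0]
df_d = df_c.sort_values(by=first_col)

# e. remove duplicates in the first column, keeping the first occurrence
df_e = df_d.drop_duplicates(subset=first_col)

# f. correlation of columns 1 and 2, covariance of columns 2 and 3
corr_ab = df["A"].corr(df["B"])
cov_bc = df["B"].cov(df["C"])

# g. discretize the second column into 5 equal-width bins
bins = pd.cut(df["B"], bins=5)

print(missing_per_column, total_missing, sep="\n")
print(bins.value_counts().sort_index())
```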
4. Consider 2 Excel files containing the attendance of two workshops. Each file has 3 fields, ‘name’, ‘date’ and
‘duration’ (in minutes), where names are unique within a file. Note that the duration may take one of
three values (30, 40, 50) only. Import the data into two data frames and do the following:

a. Merge the two data frames to find the names of the students who
attended both workshops.
b. Find names of all students who have attended a single workshop only.

c. Merge two data frames row wise and find the total number of records in the data frame.
d. Merge two data frames row wise and use two columns viz. names and dates as multi-row
indexes. Generate descriptive statistics for the hierarchical data frame.
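The Excel files themselves are not part of this document, so the sketch below substitutes two small in-memory data frames. All names, dates and durations in it are hypothetical. With real files, pd.read_excel could load each sheet into a frame instead.

```python
import pandas as pd

# Hypothetical stand-ins for the two attendance files
ws1 = pd.DataFrame({
    "name": ["Asha", "Bala", "Chitra", "Dev"],
    "date": pd.to_datetime(["2024-01-10"] * 4),
    "duration": [30, 40, 50, 30],
})
ws2 = pd.DataFrame({
    "name": ["Bala", "Dev", "Esha"],
    "date": pd.to_datetime(["2024-02-15"] * 3),
    "duration": [40, 50, 30],
})

# a. inner merge on name keeps only students present in both frames
both = pd.merge(ws1, ws2, on="name", suffixes=("_ws1", "_ws2"))
both_names = both["name"].tolist()

# b. outer merge with an indicator column; rows not marked 'both'
#    belong to students who attended a single workshop
outer = pd.merge(ws1, ws2, on="name", how="outer", indicator=True)
single_names = outer.loc[outer["_merge"] != "both", "name"].tolist()

# c. stack the frames row-wise and count the records
stacked = pd.concat([ws1, ws2], ignore_index=True)
total_records = len(stacked)

# d. use name and date as a two-level row index, then summarize
hier = stacked.set_index(["name", "date"])
stats = hier.describe()

print(both_names, single_names, total_records)
print(stats)
```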
5. Using the Iris data, plot the following with proper legend and axis labels (download the Iris data from
https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn datasets):
a. Plot bar chart to show the frequency of each class label in the Data.
b. Draw a scatter plot for petal width vs. sepal width and fit a regression line.
c. Plot density distribution for feature petal length.
d. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
e. Draw heatmap for the four numeric attributes.
g. Compute correlation coefficients between each pair of features and plot heatmap.
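A rough sketch of items a, b, c, d and g, assuming scikit-learn, matplotlib and scipy are installed and taking the data from sklearn as the exercise permits. It uses plain matplotlib and pandas plotting rather than any particular charting library, and the figure layout and colours are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()                     # four feature columns plus 'target'
features = list(iris.feature_names)
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

fig, axes = plt.subplots(2, 2, figsize=(11, 9))

# a. bar chart of class frequencies
class_counts = df["species"].value_counts()
axes[0, 0].bar(class_counts.index, class_counts.values, label="samples per class")
axes[0, 0].set_xlabel("Class label")
axes[0, 0].set_ylabel("Frequency")
axes[0, 0].legend()

# b. scatter of petal width vs. sepal width with a least-squares line
x = df["sepal width (cm)"]
y = df["petal width (cm)"]
slope, intercept = np.polyfit(x, y, 1)
xs = np.linspace(x.min(), x.max(), 50)
axes[0, 1].scatter(x, y, s=12, label="observations")
axes[0, 1].plot(xs, slope * xs + intercept, color="red", label="regression line")
axes[0, 1].set_xlabel("Sepal width (cm)")
axes[0, 1].set_ylabel("Petal width (cm)")
axes[0, 1].legend()

# c. density of petal length (pandas' kde plot relies on scipy)
df["petal length (cm)"].plot.kde(ax=axes[1, 0], label="petal length")
axes[1, 0].set_xlabel("Petal length (cm)")
axes[1, 0].set_ylabel("Density")
axes[1, 0].legend()

# g. correlation coefficients between features, drawn as a heatmap
corr = df[features].corr()
im = axes[1, 1].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[1, 1].set_xticks(range(len(features)))
axes[1, 1].set_xticklabels(features, rotation=45, ha="right")
axes[1, 1].set_yticks(range(len(features)))
axes[1, 1].set_yticklabels(features)
fig.colorbar(im, ax=axes[1, 1], label="Pearson correlation")
fig.tight_layout()

# d. pairwise plots of the four features in a separate figure
pair_axes = pd.plotting.scatter_matrix(df[features], figsize=(8, 8), diagonal="kde")
```

In an interactive session, plt.show() displays both figures.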
6. Consider the following data frame containing a family name, gender of the family member
and his/her monthly income in each record.
NAME     GENDER    MONTHLY INCOME (RS.)
Shah     Male      11400.00
Vats     Male      65000.00
Vats     Female    43150.00
Kumar    Female    69500.00
Vats     Female    155000.00
Kumar    Male      103000.00
Shah     Male      55000.00
Shah     Female    112400.00
Kumar    Female    81030.00
Vats     Male      71900.00

Write a program in Python using Pandas to perform the following:

a. Calculate and display family wise gross monthly income.


b. Calculate and display the member with highest monthly income.

c. Calculate and display monthly income of all members with income greater than Rs. 60000.00.

d. Calculate and display the average monthly income of female members.
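The four items can be sketched as follows. The column names Name, Gender and MonthlyIncome are labels chosen for this example, while the data are taken from the table above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Shah", "Vats", "Vats", "Kumar", "Vats",
             "Kumar", "Shah", "Shah", "Kumar", "Vats"],
    "Gender": ["Male", "Male", "Female", "Female", "Female",
               "Male", "Male", "Female", "Female", "Male"],
    "MonthlyIncome": [11400.0, 65000.0, 43150.0, 69500.0, 155000.0,
                      103000.0, 55000.0, 112400.0, 81030.0, 71900.0],
})

# a. family-wise gross monthly income
family_income = df.groupby("Name")["MonthlyIncome"].sum()

# b. the member (row) with the highest monthly income
top_member = df.loc[df["MonthlyIncome"].idxmax()]

# c. members earning more than Rs. 60000.00
above_60k = df[df["MonthlyIncome"] > 60000]

# d. average monthly income of female members
female_avg = df.loc[df["Gender"] == "Female", "MonthlyIncome"].mean()

print(family_income)
print(top_member)
print(above_60k)
print("Average monthly income of female members:", female_avg)
```

For this table the family totals are 178800 for Shah, 335050 for Vats and 253530 for Kumar, and the female average is 92216.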
