
Data Science Python Notebook (1)

This document is a Jupyter notebook focused on data visualization using Python, specifically exploring weather data from Jaipur. It introduces essential Python libraries such as Pandas, NumPy, and Matplotlib, and guides users through data manipulation techniques including reading CSV files, exploring datasets, and sorting values. The notebook provides practical examples and additional resources to enhance understanding of data visualization in data science.

Uploaded by shivankwadhwa


29/04/2023, 09:39 2023 Data Science Python Notebook.ipynb - Colaboratory

Data Sciences

Introduction
Data visualization is part of data exploration, which is a critical step in the AI cycle. You will use this technique to gain understanding of and insights into the data you have gathered, and to decide whether the data is ready for further processing or whether you need to collect more data or clean it.

You will also use this technique to present your results.

In this notebook, we will explore Python packages crucial for data science. Packages like Pandas, NumPy and Matplotlib are used throughout the process.

About the Notebook

This Jupyter notebook focuses on data visualisation in Python. To help you understand it as well as possible, many additional resources have been provided in the notebook as links. You can follow those links to explore the subject further.

Context
We will be working with Jaipur weather data obtained from Kaggle, a platform where data enthusiasts gather, share knowledge and compete for prizes!

The data has been cleaned and simplified so that we can focus on data visualization instead of data cleaning. Our data is stored in a file named mydata.csv. This file contains weather information for Jaipur and is saved in the same folder as the notebook.

What do you do next?

Side note: What is csv?

A CSV (Comma-Separated Values) file stores a table of data as plain text: each line is one row, and the values within a row are separated by commas.
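To see what "separated by commas" looks like in practice, here is a minimal sketch that uses Python's built-in csv module on a few lines laid out like mydata.csv. The sample text is typed in for illustration (its values are copied from the first rows printed later in this notebook); it is not read from the real file.

```python
import csv
import io

# Hypothetical sample laid out like mydata.csv: the first line holds the column names
sample = """date,mean_temperature,max_temperature,min_temperature
2016-05-04,34,41,27
2016-05-05,31,38,24
"""

# Each line becomes one list; commas split it into separate values
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

print(header)   # the column names
print(data[0])  # the first data row; the csv module keeps every value as a string
```

The csv module returns every value as text; pandas, used below, also converts numeric columns to numbers for you.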

We usually open these files with spreadsheet applications such as Excel or Google Sheets. Do you know how this is done?

Today, we will learn how to use Python to open CSV files.

Use Python to open CSV files

We will use the pandas library to work with our dataset. Pandas is a popular Python library for data science. It offers powerful and flexible data structures that make data manipulation and analysis easier.

Import Pandas

import pandas as pd #import pandas as pd means we can type "pd" to call the pandas library

Now that we have imported pandas, let's start by reading the CSV file.

# Read the CSV file into a DataFrame and store it in a variable called dataframe
dataframe = pd.read_csv("mydata.csv")
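read_csv() accepts a file path or any file-like object. If you want to try it without mydata.csv at hand, a minimal sketch can read CSV text from memory with io.StringIO. The two rows below are a made-up stand-in in the same layout, with values copied from the rows printed later in this notebook.

```python
import io

import pandas as pd

# Stand-in for mydata.csv: same column layout, only two rows
csv_text = """date,mean_temperature,max_temperature,min_temperature,rainfall
2016-05-04,34,41,27,0.0
2016-05-06,28,34,21,5.0
"""

# read_csv works the same on a path or on a file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # column names come from the first line
```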

Exploring our data

Great! We now have a variable that holds our weather data. Let's explore it. Use the .head() method to see the first few rows of data.

#dataframe.head() means we are getting the first 5 rows of data

# try running it to see what data is in the jaipur csv file
print (dataframe.head())

         date  mean_temperature  max_temperature  min_temperature  \
0  2016-05-04                34               41               27
1  2016-05-05                31               38               24
2  2016-05-06                28               34               21
3  2016-05-07                30               38               23
4  2016-05-08                34               41               26

   Mean_dew_pt  mean_pressure  max_humidity  min_humidity  max_dew_pt_1  \
0            6        1006.00            27             5            12
1            7        1005.65            29             6            13
2           11        1007.94            61            13            16
3           13        1008.39            69            18            17
4           10        1007.62            50             8            14

   max_dew_pt_2  min_dew_pt_1  min_dew_pt_2  max_pressure_1  max_pressure_2  \
0            10            -2            -2            1009            1008
1            12             0            -2            1008            1009
2            13             6             0            1011            1008
3            16             9             6            1011            1011
4            17             6             9            1010            1011

   min_pressure_1  min_pressure_2  rainfall
0            1000            1001       0.0
1            1001            1000       0.0
2            1003            1001       5.0
3            1004            1003       0.0
4            1002            1004       0.0
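head() returns the first n rows (5 by default) as a new DataFrame, and tail() does the same from the end. A small sketch on a made-up DataFrame built from the dates and mean temperatures printed above:

```python
import pandas as pd

# Hypothetical mini dataset: the first seven dates and mean temperatures shown above
temps = pd.DataFrame({
    "date": ["2016-05-04", "2016-05-05", "2016-05-06", "2016-05-07",
             "2016-05-08", "2016-05-09", "2016-05-10"],
    "mean_temperature": [34, 31, 28, 30, 34, 34, 34],
})

first_five = temps.head()     # n defaults to 5
first_three = temps.head(3)   # pass n to choose how many rows
last_two = temps.tail(2)      # tail() returns rows from the end instead

print(len(first_five), len(first_three), len(last_two))  # 5 3 2
```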

Display the first 10 rows of data by modifying the function above

print (dataframe.head(10))

         date  mean_temperature  max_temperature  min_temperature  \
0  2016-05-04                34               41               27
1  2016-05-05                31               38               24
2  2016-05-06                28               34               21
3  2016-05-07                30               38               23
4  2016-05-08                34               41               26
5  2016-05-09                34               42               27
6  2016-05-10                34               41               27
7  2016-05-11                32               40               25
8  2016-05-12                34               42               27
9  2016-05-13                34               42               26

   Mean_dew_pt  mean_pressure  max_humidity  min_humidity  max_dew_pt_1  \
0            6        1006.00            27             5            12
1            7        1005.65            29             6            13
2           11        1007.94            61            13            16
3           13        1008.39            69            18            17
4           10        1007.62            50             8            14
5            8        1006.73            32             7            12
6           11        1005.75            45             7            16
7           16        1007.10            51            12            18
8           16        1006.78            66            16            22
9           13        1003.83            58             9            20

   max_dew_pt_2  min_dew_pt_1  min_dew_pt_2  max_pressure_1  max_pressure_2  \
0            10            -2            -2            1009            1008
1            12             0            -2            1008            1009
2            13             6             0            1011            1008
3            16             9             6            1011            1011
4            17             6             9            1010            1011
5            14             6             6            1010            1010
6            12             7             6            1008            1010
7            16            13             7            1010            1008
8            18            10            13            1011            1010
9            22            10            10            1007            1011

   min_pressure_1  min_pressure_2  rainfall
0            1000            1001       0.0
1            1001            1000       0.0
2            1003            1001       5.0
3            1004            1003       0.0
4            1002            1004       0.0
5            1002            1002       0.0
6            1000            1002       0.3
7            1002            1000       0.8
8            1001            1002       2.0
9             998            1001       0.3

Find out your data type

You can use the dtypes attribute to find out the type of data in each column (e.g. string, float, integer). Pandas reports text columns, such as date, as object.

dataframe.dtypes

date object
mean_temperature int64
max_temperature int64
min_temperature int64
Mean_dew_pt int64
mean_pressure float64

max_humidity int64
min_humidity int64
max_dew_pt_1 int64
max_dew_pt_2 int64
min_dew_pt_1 int64
min_dew_pt_2 int64
max_pressure_1 int64
max_pressure_2 int64
min_pressure_1 int64
min_pressure_2 int64
rainfall float64
dtype: object
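Notice that date has the type object: pandas is storing the dates as text. If you ever need real dates (for example, to pick out months or to let Matplotlib space the x-axis by time), pd.to_datetime() can convert such a column. A minimal sketch on a few made-up rows in the same date format; this notebook itself keeps date as text.

```python
import pandas as pd

# Hypothetical rows using the same YYYY-MM-DD text format as mydata.csv
df = pd.DataFrame({
    "date": ["2016-05-04", "2016-05-05", "2016-05-06"],
    "mean_temperature": [34, 31, 28],
})
print(df.dtypes["date"])  # a text dtype: the dates are plain strings

# Parse the strings into datetime values using an explicit format
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
print(df.dtypes["date"])             # now a datetime64 dtype
print(df["date"].dt.month.tolist())  # [5, 5, 5]
```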

Remove unwanted columns

Looks like there are 17 columns in this dataset, and we don't need all of them for this activity. One way to handle this is to drop the columns we don't need. Pandas provides an easy way to do this with the .drop() method.

dataframe = dataframe.drop(["max_dew_pt_2"], axis=1) # no output is shown; the column is removed
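By default, drop() does not change the DataFrame in place; it returns a new DataFrame, which is why the result is assigned back to dataframe. A short sketch on a made-up two-row DataFrame (values copied from the first rows above) also shows the columns= keyword, an equivalent alternative to axis=1:

```python
import pandas as pd

df = pd.DataFrame({
    "max_dew_pt_1": [12, 13],
    "max_dew_pt_2": [10, 12],
    "rainfall": [0.0, 0.0],
})

trimmed = df.drop(["max_dew_pt_2"], axis=1)   # axis=1 means "drop columns"
same = df.drop(columns=["max_dew_pt_2"])      # equivalent keyword form

print(list(df.columns))       # the original still has all three columns
print(list(trimmed.columns))  # ['max_dew_pt_1', 'rainfall']
```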

Let's check that the column has been dropped by printing head() or dtypes.

dataframe.dtypes

date object
mean_temperature int64
max_temperature int64
min_temperature int64
Mean_dew_pt int64
mean_pressure float64
max_humidity int64
min_humidity int64
max_dew_pt_1 int64
min_dew_pt_1 int64
min_dew_pt_2 int64
max_pressure_1 int64
max_pressure_2 int64
min_pressure_1 int64
min_pressure_2 int64
rainfall float64
dtype: object

Drop the following columns: (min_dew_pt_2, max_pressure_2, min_pressure_2)

dataframe = dataframe.drop(["min_dew_pt_2", "max_pressure_2", "min_pressure_2"], axis=1)

Now check again if these columns have been dropped

dataframe.dtypes

date object
mean_temperature int64
max_temperature int64
min_temperature int64
Mean_dew_pt int64
mean_pressure float64
max_humidity int64
min_humidity int64
max_dew_pt_1 int64
min_dew_pt_1 int64
max_pressure_1 int64
min_pressure_1 int64
rainfall float64
dtype: object

Great! We can now focus on this set of data!

Sorting values using pandas

Often you want a sense of the range of your data to help you understand it better. Pandas DataFrames can also sort values, using the sort_values() method.

jaipur_weather = dataframe.sort_values(by='date',ascending = False)

print(jaipur_weather.head(5))

           date  mean_temperature  max_temperature  min_temperature  \
675  2018-03-11                26               34               18
674  2018-03-10                26               34               19
673  2018-03-09                26               33               19
672  2018-03-08                24               32               15
671  2018-03-07                24               32               15

     Mean_dew_pt  mean_pressure  max_humidity  min_humidity  max_dew_pt_1  \
675            4        1013.76            38             6             8
674            3        1014.16            37             8             6
673            1        1014.41            42             7             5
672            2        1014.07            55             5             8
671            4        1015.39            48             6             9

     min_dew_pt_1  max_pressure_1  min_pressure_1  rainfall
675             0            1017            1009       0.0
674            -1            1017            1009       0.0
673            -5            1017            1011       0.0
672            -6            1017            1011       0.0
671            -3            1018            1012       0.0
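Sorting by date works here even though the date column holds text, because the dates are written as YYYY-MM-DD: for strings in that fixed-width format, alphabetical order is the same as chronological order. A small sketch with three dates from this dataset, also written in a hypothetical DD/MM/YYYY form, shows where text sorting would go wrong:

```python
import pandas as pd

# The same three dates, written two ways (the DD/MM/YYYY version is hypothetical)
iso = pd.DataFrame({"date": ["2018-03-07", "2016-05-04", "2017-01-11"]})
dmy = pd.DataFrame({"date": ["07/03/2018", "04/05/2016", "11/01/2017"]})

# YYYY-MM-DD text: alphabetical order is also chronological order
iso_order = iso.sort_values(by="date")["date"].tolist()

# DD/MM/YYYY text: alphabetical order puts 2018 before 2017
dmy_order = dmy.sort_values(by="date")["date"].tolist()

print(iso_order)  # ['2016-05-04', '2017-01-11', '2018-03-07']
print(dmy_order)  # ['04/05/2016', '07/03/2018', '11/01/2017']
```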

What do you notice about the index numbers? Look at the dates. Can you see how the function helps us sort the data by date?

Sort the values in ascending order of mean temperature and print the first 5 rows

jaipur_weather = dataframe.sort_values(by='mean_temperature',ascending = True)

print(jaipur_weather.head(5))

           date  mean_temperature  max_temperature  min_temperature  \
252  2017-01-11                10               18                3
253  2017-01-12                12               19                4
254  2017-01-13                12               20                4
255  2017-01-14                12               20                5
258  2017-01-17                12               20                5

     Mean_dew_pt  mean_pressure  max_humidity  min_humidity  max_dew_pt_1  \
252            3        1017.00            94            17             9
253           -3        1017.54            70            13             2
254           -5        1017.24            75             4             2
255           -1        1017.75            70            10             1
258            3        1017.35            74            15             7

     min_dew_pt_1  max_pressure_1  min_pressure_1  rainfall
252            -5            1019            1015       0.0
253            -7            1020            1015       0.0
254           -93            1020            1015       0.0
255            -8            1020            1016       0.0
258            -2            1019            1015       0.0

Look at the max and min temperatures! See how much the temperature can vary within a single day.

Sort the values in descending order of mean temperature and print the first 5 rows

jaipur_weather = dataframe.sort_values(by='mean_temperature',ascending = False)

print(jaipur_weather.head(5))

          date  mean_temperature  max_temperature  min_temperature  \
32  2016-06-05                38               45               31
15  2016-05-19                38               46               29
31  2016-06-04                38               44               31
34  2016-06-07                38               45               30
35  2016-06-08                38               44               31

    Mean_dew_pt  mean_pressure  max_humidity  min_humidity  max_dew_pt_1  \
32            5        1004.67            27             4            18
15           11         999.88            45             5            17
31           13        1004.93            34            10            18
34           13        1003.29            51             5            21
35           12        1002.83            47             4            22

    min_dew_pt_1  max_pressure_1  min_pressure_1  rainfall
32             2            1007             999       0.0
15             6            1002             994       0.0
31             7            1008             999       0.0
34             5            1007             997       0.0
35             2            1006             996       0.0
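All five rows above share a mean temperature of 38, so their relative order is not meaningful: pandas' default single-column sort does not guarantee any particular order for tied rows. Passing lists to by= and ascending= adds tie-breakers. A minimal sketch using just three columns of the five rows printed above (keeping their original index):

```python
import pandas as pd

# The five hottest days shown above, with their original index labels
hot = pd.DataFrame(
    {
        "date": ["2016-06-05", "2016-05-19", "2016-06-04", "2016-06-07", "2016-06-08"],
        "mean_temperature": [38, 38, 38, 38, 38],
        "max_temperature": [45, 46, 44, 45, 44],
    },
    index=[32, 15, 31, 34, 35],
)

# Sort by mean temperature, then break ties by max temperature, then by date
ranked = hot.sort_values(
    by=["mean_temperature", "max_temperature", "date"],
    ascending=[False, False, True],
)
print(ranked["date"].tolist())
```

Each entry in ascending= applies to the column at the same position in by=.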

pd.read_csv() can also load just part of a file: nrows limits how many rows are read, and usecols keeps only the listed columns.
dframe1=pd.read_csv("mydata.csv",nrows=3)
print(dframe1)

         date  mean_temperature  max_temperature  min_temperature  \
0  2016-05-04                34               41               27
1  2016-05-05                31               38               24
2  2016-05-06                28               34               21

   Mean_dew_pt  mean_pressure  max_humidity  min_humidity  max_dew_pt_1  \
0            6        1006.00            27             5            12
1            7        1005.65            29             6            13
2           11        1007.94            61            13            16

   max_dew_pt_2  min_dew_pt_1  min_dew_pt_2  max_pressure_1  max_pressure_2  \
0            10            -2            -2            1009            1008
1            12             0            -2            1008            1009
2            13             6             0            1011            1008

   min_pressure_1  min_pressure_2  rainfall
0            1000            1001         0
1            1001            1000         0
2            1003            1001         5

dframe2=pd.read_csv("mydata.csv", usecols=['date','mean_temperature'])
print(dframe2.head(10))

         date  mean_temperature
0  2016-05-04                34
1  2016-05-05                31
2  2016-05-06                28
3  2016-05-07                30
4  2016-05-08                34
5  2016-05-09                34
6  2016-05-10                34
7  2016-05-11                32
8  2016-05-12                34
9  2016-05-13                34
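The nrows and usecols parameters can also be combined in one call. A sketch on in-memory CSV text (a made-up stand-in for mydata.csv, with values copied from the rows above):

```python
import io

import pandas as pd

csv_text = """date,mean_temperature,max_temperature,rainfall
2016-05-04,34,41,0.0
2016-05-05,31,38,0.0
2016-05-06,28,34,5.0
2016-05-07,30,38,0.0
"""

# Read only the first 2 data rows, and only two of the four columns
small = pd.read_csv(io.StringIO(csv_text), nrows=2,
                    usecols=["date", "mean_temperature"])
print(small)
```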

Now we have a clearer picture of our dataset. Using these functions, we can analyze our data and gain insights from it.

However, we want to get an even better picture. We want to learn how to explore these data visually.

Let's now use the matplotlib library to help us with data visualization in Python.

Importing matplotlib
Matplotlib is a Python 2D plotting library that we can use to produce high-quality data visualizations. It is easy to use (as you will soon find out): you can create simple and complex graphs with just a few lines of code!

Now let's load matplotlib to start plotting some graphs

import matplotlib.pyplot as plt

import numpy as np

Scatter plot
Scatter plots use a collection of points on a graph to display the values of two variables. This allows us to see whether there is any relationship or correlation between them.

Let's see how mean temperature changes over the years!

x = dataframe.date
y = dataframe.mean_temperature

plt.scatter(x,y)
plt.show()


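Do you see that the x-axis labels have merged into a thick black line? Because the date column holds text (dtype object), Matplotlib treats every date as a separate category and draws a label for each one, so the labels overlap and we cannot read them or analyze the data.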

Let's try to modify this scatter plot so that we can see the ticks!

Choose only several ticks

The first thing we are going to do is reduce the number of ticks on the x-axis. We do this with the np.arange function, as below:

plt.scatter(x,y)
plt.xticks(np.arange(0, 731, 180)) #numpy.arange(start, stop, step)
plt.show()
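Because the dates are text, Matplotlib places them at positions 0, 1, 2, … along the x-axis, one position per date in the order they appear. So the numbers passed to plt.xticks() are row positions, not dates. np.arange(start, stop, step) generates those positions, and stop itself is excluded:

```python
import numpy as np

# Tick positions used in the cell above: start at 0, step 180, stop before 731
positions = np.arange(0, 731, 180)
print(positions)  # [  0 180 360 540 720]

# A step of 60, as used in the next cells, gives many more ticks
print(len(np.arange(0, 731, 60)))  # 13
```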

Do you notice that we now have only a few ticks? Try other step sizes: which interval shows as many dates as possible while keeping the labels readable?

Let's try to rotate our ticks. See the example on Stackoverflow!

Note: Stack Overflow is a site where programmers ask and answer technical questions. You can search it for your problem and see whether someone has already solved it!

Rotate the x-tick labels so that more ticks fit and stay readable

plt.scatter(x,y)
plt.xticks(np.arange(0, 731, 60))
plt.xticks (rotation=90)
plt.show()
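An alternative to choosing tick positions by hand is to convert the date strings to real dates first. Matplotlib then treats the x-axis as a time axis and can pick readable date ticks itself. A minimal sketch, assuming five rows copied from the outputs above and the non-interactive Agg backend so it also runs outside a notebook:

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen; not needed inside Colab/Jupyter

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# A few (date, mean temperature) pairs taken from the outputs above
df = pd.DataFrame({
    "date": ["2016-05-04", "2016-06-05", "2017-01-11", "2017-01-12", "2018-03-11"],
    "mean_temperature": [34, 38, 10, 12, 26],
})
dates = pd.to_datetime(df["date"])  # text -> datetime values

fig, ax = plt.subplots()
ax.scatter(dates, df["mean_temperature"])

# Let Matplotlib choose the date ticks and show them as year-month
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

In a notebook you would finish with plt.show() as in the cells above.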


Now we can see the x-ticks clearly.

Notice how the temperature changes with the time of year. Compare it with this website. Does it tell you the best time to plant your crops?

Labelling the x and y axes

You can also label the x and y axes. This makes your data easier to understand and share.

plt.scatter(x,y)
plt.xticks(np.arange(0, 731, 60))
plt.xticks (rotation=30)

# Add x and y labels and set a font size

plt.xlabel ("Date", fontsize = 14)
plt.ylabel ("Mean Temperature", fontsize = 14)

plt.show()

Looks good!

Now, let's add a title.

See how to do it here.

plt.scatter(x,y)
plt.xticks(np.arange(0, 731, 60))
plt.xticks (rotation=30)

# Add x and y labels and set a font size

plt.xlabel ("Date", fontsize = 14)
plt.ylabel ("Mean Temperature", fontsize = 14)
plt.title('Mean Temperature at Jaipur')

plt.show()
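plt.xlabel(), plt.ylabel() and plt.title() act on the current plot. Matplotlib also offers an object-oriented style in which you call the equivalent set_… methods on an Axes object, which makes it explicit which plot you are labelling. A sketch of the same labels and title on three rows from the data (the Agg backend line is only there so it runs as a script):

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for running outside a notebook

import matplotlib.pyplot as plt

dates = ["2016-05-04", "2016-05-05", "2016-05-06"]
mean_temps = [34, 31, 28]

fig, ax = plt.subplots()
ax.scatter(dates, mean_temps)
ax.set_xlabel("Date", fontsize=14)
ax.set_ylabel("Mean Temperature", fontsize=14)
ax.set_title("Mean Temperature at Jaipur", fontsize=20)
```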


Task 11: Change the title size to be bigger than the x and y labels!

plt.scatter(x,y)
plt.xticks(np.arange(0, 731, 60))
plt.xticks (rotation=30)

# Add x and y labels, title and set a font size

plt.xlabel ("Date", fontsize = 14)
plt.ylabel ("Mean Temperature", fontsize = 14)
plt.title('Mean Temperature at Jaipur', fontsize = 20)

plt.show()

Change your marker shape!

# Change the default figure size (default figure size is 6.4 for the width and 4.8 for the height (in inches))
plt.figure(figsize=(10,10)) #figure(figsize=(WIDTH_SIZE,HEIGHT_SIZE))

plt.scatter(x,y, marker='*') #https://matplotlib.org/stable/api/markers_api.html check the website for more marker styles
plt.xticks(np.arange(0, 731, 60))
plt.xticks (rotation=30)

# Add x and y labels, title and set a font size

plt.xlabel ("Date", fontsize = 24)
plt.ylabel ("Mean Temperature", fontsize = 24)
plt.title('Mean Temperature at Jaipur', fontsize = 30)

# Set the font size of the number labels on the axes

plt.xticks (fontsize = 12)
plt.yticks (fontsize = 12)

plt.xticks (rotation=30, horizontalalignment='right')


plt.show()

Changing color
You can also change the marker color. The code below shows how!

# Change the default figure size

plt.figure(figsize=(10,10))

plt.scatter(x,y, c='green', marker='*')

plt.xticks(np.arange(0, 731, 60))
plt.xticks (rotation=30)

# Add x and y labels, title and set a font size

plt.xlabel ("Date", fontsize = 24)
plt.ylabel ("Mean Temperature", fontsize = 24)
plt.title('Mean Temperature at Jaipur', fontsize = 30)

# Set the font size of the number labels on the axes

plt.xticks (fontsize = 12)
plt.yticks (fontsize = 12)

plt.xticks (rotation=30, horizontalalignment='right')

plt.show()


Saving plot
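You can use plt.savefig("figurename.png") to save a figure. Put it in the same cell as the plotting commands, before plt.show(). In the cell below, plt.savefig() runs on its own after the plot has already been shown, so it saves a new, empty figure; the output "<Figure size 640x480 with 0 Axes>" shows that it contains no plot.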

plt.savefig("graph1.png")

<Figure size 640x480 with 0 Axes>
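Here is the save-before-show pattern in a single cell: draw, save, then show. This sketch uses the Agg backend and a throwaway file name, graph_sketch.png (an assumption), so it also runs outside a notebook; in Colab you would call plt.show() where this sketch calls plt.close().

```python
import os

import matplotlib

matplotlib.use("Agg")  # off-screen backend for running as a script

import matplotlib.pyplot as plt

dates = ["2016-05-04", "2016-05-05", "2016-05-06"]
mean_temps = [34, 31, 28]

plt.figure(figsize=(10, 10))
plt.scatter(dates, mean_temps, c="green", marker="*")
plt.title("Mean Temperature at Jaipur", fontsize=30)

# Save while the figure still holds the plot, i.e. before plt.show()
plt.savefig("graph_sketch.png")
plt.close()  # in a notebook, plt.show() would go here instead

print(os.path.getsize("graph_sketch.png") > 0)  # True: a non-empty file was written
```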

Line Plots

Besides scatter plots, time-series data like this can also be shown with a line plot. Let's see how this is done!

plt.figure(figsize=(10,10))
y = dataframe.mean_temperature

plt.plot(x,y, "o:r") #the points are marked with circle and connected via dotted lines in red color ; refer to https://www.w3s
plt.ylabel("Mean Temperature")
plt.xlabel("Time")

plt.xticks(np.arange(0, 731, 60) , rotation=30)

plt.xticks()

plt.show()
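"o:r" is a shorthand format string: marker "o" (circles), line style ":" (dotted) and color "r" (red). The same line can be requested with explicit keyword arguments, which can be easier to read. A sketch comparing the two on three values from the data:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for running as a script

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

dates = ["2016-05-04", "2016-05-05", "2016-05-06"]
mean_temps = [34, 31, 28]

fig, ax = plt.subplots()
(short,) = ax.plot(dates, mean_temps, "o:r")  # format-string shorthand
(explicit,) = ax.plot(dates, mean_temps, marker="o", linestyle=":", color="r")

# Both lines have the same marker, line style and color
print(short.get_marker(), short.get_linestyle())  # o :
print(mcolors.to_rgba(short.get_color()) == mcolors.to_rgba(explicit.get_color()))  # True
```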


Change the labels and add a title so that the graph is clearer and easier to show to others

Drawing multiple lines in a plot

x = dataframe.date
y_1 = dataframe.max_temperature
y_2 = dataframe.min_temperature

plt.plot(x,y_1, label = "Max temp")

plt.plot(x,y_2, label = "Min temp")

plt.xticks(np.arange(0, 731, 60))

plt.xticks (rotation=30)

plt.legend()
plt.show()


Draw at least 3 line graphs in one plot!

x = dataframe.date
y_1 = dataframe.max_temperature
y_2 = dataframe.min_temperature
y_3 = dataframe.mean_temperature

z = y_1-y_2

plt.plot(x,y_1, label = "Max temp")

plt.plot(x,y_2, label = "Min temp")
plt.plot(x,y_3, label = "Mean temp")
plt.plot(x,z, label = "range")

plt.xticks(np.arange(0, 731, 60))

plt.xticks (rotation=30)

plt.legend()
plt.show()
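The "range" line above comes from z = y_1-y_2. Subtracting two columns works row by row, giving each day's temperature range. A minimal sketch with the max and min temperatures of the first three days in the data, storing the result as a new column (the column name temp_range is made up):

```python
import pandas as pd

days = pd.DataFrame({
    "date": ["2016-05-04", "2016-05-05", "2016-05-06"],
    "max_temperature": [41, 38, 34],
    "min_temperature": [27, 24, 21],
})

# Element-wise subtraction: one result per row
days["temp_range"] = days["max_temperature"] - days["min_temperature"]
print(days["temp_range"].tolist())  # [14, 14, 13]
```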

Bar Charts

import matplotlib.pyplot as plt

import numpy as np

plt.figure(figsize=(10,10))

plt.bar(x,y, align='center')

plt.show()
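With one bar per date, the chart above packs hundreds of very thin bars together. One option is to average by month first and draw one bar per month. A sketch, assuming date holds YYYY-MM-DD text as in mydata.csv, so its first seven characters give the month; the sample rows are copied from the outputs shown earlier:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for running as a script

import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame({
    "date": ["2016-05-04", "2016-05-05", "2016-06-04",
             "2016-06-05", "2017-01-11", "2017-01-12"],
    "mean_temperature": [34, 31, 38, 38, 10, 12],
})

# "YYYY-MM-DD"[:7] -> "YYYY-MM"; group the days by month and average each group
month = sample["date"].str[:7]
monthly_mean = sample.groupby(month)["mean_temperature"].mean()
print(monthly_mean.to_dict())  # {'2016-05': 32.5, '2016-06': 38.0, '2017-01': 11.0}

plt.figure(figsize=(8, 4))
plt.bar(monthly_mean.index, monthly_mean.values, align="center")
plt.ylabel("Mean Temperature")
```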


Great! You have now gained the ability to visualize data using matplotlib.

#creating pie chart

z=[12,23,34,45,56]
plt.pie(z)
plt.show()

#By default the plotting of the first wedge starts from the x-axis and moves counterclockwise
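The starting point and direction can be changed with the startangle and counterclock parameters of plt.pie(), and autopct writes each wedge's share of the total on the chart. Each wedge's share is its value divided by the sum of all values (170 here). A sketch with the same values:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for running as a script

import matplotlib.pyplot as plt

z = [12, 23, 34, 45, 56]  # total = 170

# Start at the top (90 degrees), go clockwise, and label wedges with percentages
wedges, texts, autotexts = plt.pie(
    z, startangle=90, counterclock=False, autopct="%1.1f%%"
)
print([t.get_text() for t in autotexts])  # ['7.1%', '13.5%', '20.0%', '26.5%', '32.9%']
```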

z=[23,34,45,56]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]

plt.pie(z, labels = mylabels, explode = myexplode) #The explode parameter, if specified and not None, must be an array with one value for each wedge
#Each value represents how far from the center each wedge is displayed
plt.show()


z=[23,34,45,56]
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
myexplode = [0.2, 0, 0, 0]
mycolors = ["black", "pink", "b", "#4CAF50"]

plt.pie(z, labels = mylabels, explode = myexplode,colors = mycolors)

plt.legend()

plt.show()

#creating histogram
#A histogram is a graph showing frequency distributions.
#It is a graph showing the number of observations within each given interval.

plt.hist(y)
plt.show()
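plt.hist() also returns what it drew: n holds the number of observations in each bin and bins holds the bin edges. A sketch with the ten mean temperatures printed earlier and five equal-width bins (bins=5 is an arbitrary choice; Matplotlib's default is 10):

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for running as a script

import matplotlib.pyplot as plt

# The first ten mean temperatures from the head(10) output above
mean_temps = [34, 31, 28, 30, 34, 34, 34, 32, 34, 34]

n, bins, patches = plt.hist(mean_temps, bins=5)
print(n)     # observations per bin
print(bins)  # 6 edges define 5 bins, from 28 to 34
print(n.sum() == len(mean_temps))  # True: every value falls in exactly one bin
```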
