
Lab Exercise 3

This document outlines a lab exercise for a computer science course focused on data correlation using the Pandas module in Python. It explains the use of the corr() method to calculate relationships between data columns, provides examples of interpreting correlation values, and includes tasks for students to perform using Python code. Additionally, it prompts students to identify good and bad correlations and to explore different correlation methods.

Uploaded by

Haez
Copyright
© All Rights Reserved

COMPUTER SCIENCE DEPARTMENT

CS0009
FUNDAMENTAL OF ANALYTICS

LAB EXERCISE

3
DATA CORRELATION

Name of Student Name of Professor


Date Performed Date Submitted
Finding Relationships
A great aspect of the Pandas module is the corr() method.

The corr() method calculates the correlation between every pair of columns in your data set.

The examples on this page use a CSV file called 'data.csv'.


Example
Show the relationship between the columns:

df.corr()

Result:

           Duration     Pulse  Maxpulse  Calories
Duration   1.000000 -0.155408  0.009403  0.922721
Pulse     -0.155408  1.000000  0.786535  0.025120
Maxpulse   0.009403  0.786535  1.000000  0.203814
Calories   0.922721  0.025120  0.203814  1.000000

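Note: Older versions of pandas silently ignored non-numeric columns in corr(). Since pandas 2.0, call df.corr(numeric_only=True) to skip them; otherwise a non-numeric column raises an error.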
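As a minimal sketch of how a non-numeric column can be handled explicitly, the snippet below builds a small DataFrame in memory instead of reading a file. The sample values and the "Activity" column are made up for illustration and are not part of data.csv.

```python
import pandas as pd

# Hypothetical sample data: three numeric columns plus one text column.
df = pd.DataFrame({
    "Duration": [60, 60, 45, 30, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
    "Activity": ["run", "run", "walk", "walk", "bike"],  # not numeric
})

# numeric_only=True restricts the calculation to the numeric columns,
# so "Activity" does not appear in the result.
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix)
```

The resulting table has one row and one column for each numeric column only.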

Result Explained:
The result of the corr() method is a table of numbers, each of which represents how strong the
relationship is between two columns.

The number varies from -1 to 1.

1 means that there is a 1 to 1 relationship (a perfect correlation): for this data set, each time a
value went up in the first column, the value in the other column went up as well.

0.9 is also a good relationship, and if you increase one value, the other will probably increase as
well.

-0.9 would be just as strong a relationship as 0.9, but if you increase one value, the other will
probably go down.

0.2 means NOT a good relationship: if one value goes up, that does not mean that the
other will.
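The values described above can be reproduced with a few small series. This is only an illustrative sketch: the numbers are invented, and the names up, down, and mixed are hypothetical.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6])

up = 2 * x + 10                         # rises exactly in step with x
down = 100 - 3 * x                      # falls exactly in step with x
mixed = pd.Series([3, 1, 4, 1, 5, 2])   # no clear pattern relative to x

r_up = x.corr(up)        # perfect positive correlation, close to 1
r_down = x.corr(down)    # perfect negative correlation, close to -1
r_mixed = x.corr(mixed)  # weak correlation, close to 0

print(r_up, r_down, r_mixed)
```

The sign tells you the direction of the relationship, and the distance from 0 tells you how strong it is.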

What is a good correlation? It depends on the use, but I think it is safe to say you have to have at
least 0.6 (or -0.6) to call it a good correlation.
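One possible way to apply the 0.6 rule of thumb is to loop over the correlation table and keep only the pairs whose absolute value reaches the threshold. The sketch below rebuilds the result table from this exercise by hand so that it runs without a CSV file; THRESHOLD and good_pairs are names chosen here for illustration.

```python
import pandas as pd

# Correlation values copied from the result table shown in this exercise.
columns = ["Duration", "Pulse", "Maxpulse", "Calories"]
corr_matrix = pd.DataFrame(
    [
        [1.000000, -0.155408, 0.009403, 0.922721],
        [-0.155408, 1.000000, 0.786535, 0.025120],
        [0.009403, 0.786535, 1.000000, 0.203814],
        [0.922721, 0.025120, 0.203814, 1.000000],
    ],
    index=columns,
    columns=columns,
)

THRESHOLD = 0.6  # rule of thumb from the text, not a fixed standard

good_pairs = []
for i, first in enumerate(columns):
    for second in columns[i + 1:]:  # skip the diagonal and mirrored pairs
        r = corr_matrix.loc[first, second]
        if abs(r) >= THRESHOLD:     # abs() so that -0.6 also counts
            good_pairs.append((first, second, r))

for first, second, r in good_pairs:
    print(f"{first} vs {second}: {r:.3f}")
```

With the table above, only Duration vs Calories and Pulse vs Maxpulse pass the threshold.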

Perfect Correlation:
We can see that "Duration" and "Duration" got the number 1.000000. This makes sense, because each
column always has a perfect relationship with itself.

TASK:

Open any tool for writing Python programs.

Follow the given steps in finding data correlation using Pandas in Python:

NOTE: Take a screenshot of Steps A-B.

A. To show the relationship between columns, type the code below:

import pandas as pd

# Load the data set into a DataFrame
df = pd.read_csv('mydataset.csv')

# Show the relationship between all of the columns
print(df.corr())

# Show the relationship between two specific columns
print(df["Duration"].corr(df["Calories"]))
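The two calls in Step A are related: the single number for two columns should match the corresponding cell in the full table. The sketch below checks this on a small made-up DataFrame (the values are hypothetical, not taken from mydataset.csv).

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file.
df = pd.DataFrame({
    "Duration": [60, 60, 60, 45, 45, 60],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0, 300.0],
})

# Correlation between two specific columns (a single number).
pair_value = df["Duration"].corr(df["Calories"])

# The same pair looked up in the full correlation table.
matrix_value = df.corr().loc["Duration", "Calories"]

print(pair_value, matrix_value)
```

Both approaches give the same value, so the two-column form is convenient when you only care about one pair.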

B. Pass the method parameter to the corr() function.

method:
    pearson: standard correlation coefficient
    kendall: Kendall Tau correlation coefficient
    spearman: Spearman rank correlation

For Example:

import pandas as pd

df = pd.read_csv('mydataset.csv')

print(df.corr(method='pearson'))
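To see why the choice of method can matter, the sketch below compares pearson and spearman on made-up data in which y always rises with x, but not along a straight line. The column names x and y are hypothetical. The value 'kendall' can be passed in the same way, although it may require SciPy to be installed.

```python
import pandas as pd

# Hypothetical data: y grows with x, but much faster than a straight line.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6]})
df["y"] = df["x"] ** 3

# Pearson measures how close the points are to a straight line.
pearson = df.corr(method="pearson").loc["x", "y"]

# Spearman compares the rank order of the values instead.
spearman = df.corr(method="spearman").loc["x", "y"]

print("pearson: ", pearson)
print("spearman:", spearman)
```

Here spearman reports a perfect relationship because the ordering of x and y is identical, while pearson is somewhat lower because the relationship is not linear.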
Question and Answer:

1. Find and identify data with good correlation.

Source Code:

Screenshot:

Explanation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2. Find and identify data with bad correlation.

Source Code:

Screenshot:

Explanation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3. What do you notice when you use a different method in the corr() function?

Source Code:

Screenshot:

Explanation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________

References:
https://www.w3schools.com/python/pandas/pandas_correlations.asp
