Lab Exercise 3
CS0009
FUNDAMENTALS OF ANALYTICS
LAB EXERCISE 3
DATA CORRELATION
The corr() method calculates the pairwise correlation between the numeric columns in your data set.
Example
Show the relationship between the columns:
df.corr()
Result:
Result Explained:
The result of the corr() method is a table of numbers, each between -1 and 1, that shows how
strong the relationship is between two columns.
1 means a perfect positive correlation: in this data set, every time the value in one column
went up, the value in the other column went up as well.
0.9 is also a strong relationship: if you increase one value, the other will probably increase as
well.
-0.9 is just as strong a relationship as 0.9, but in the opposite direction: if you increase one
value, the other will probably go down.
0.2 is NOT a good relationship: one value going up tells you little about whether the other
will go up too.
What is a good correlation? It depends on the use, but as a rule of thumb the value should be at
least 0.6 (or at most -0.6) to call it a good correlation.
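The value ranges above can be tried out on a tiny table. The sketch below is only an illustration: the column names and numbers are made up, and the describe() helper is a hypothetical function that applies the 0.6 rule of thumb from the text.

```python
import pandas as pd

# Made-up values, chosen only to illustrate the sign and size of the coefficient.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "rises_with_x": [3, 5, 7, 9, 11, 13],  # exactly 2*x + 1
    "falls_with_x": [10, 8, 6, 4, 2, 0],   # exactly 12 - 2*x
    "unrelated": [4, 1, 5, 2, 5, 3],       # no clear pattern
})

corr = df.corr()  # Pearson correlation is the default method


def describe(r, threshold=0.6):
    """Label a coefficient using the 0.6 rule of thumb (hypothetical helper)."""
    if abs(r) < threshold:
        return "weak"
    return "strong positive" if r > 0 else "strong negative"


# Compare column "x" against each of the other columns.
for col in ["rises_with_x", "falls_with_x", "unrelated"]:
    r = corr.loc["x", col]
    print(f"x vs {col}: r = {r:.2f} ({describe(r)})")
```

The first two columns are exact straight-line functions of x, so their coefficients sit at the two ends of the scale, while the third column falls well below the 0.6 threshold.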
Perfect Correlation:
In the result table, "Duration" and "Duration" got the number 1.000000, which makes sense: each
column always has a perfect relationship with itself.
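This property can be checked directly. The sketch below uses a small invented table (the column names follow the "Duration" example, but the values are not from the original data set) and confirms that the diagonal of the correlation table is all ones and that the table is symmetric.

```python
import numpy as np
import pandas as pd

# Hypothetical workout log; the numbers are invented for illustration.
df = pd.DataFrame({
    "Duration": [60, 45, 30, 60, 45],
    "Pulse": [110, 117, 103, 109, 117],
    "Calories": [409.1, 282.4, 195.1, 374.0, 282.0],
})

corr = df.corr()
values = corr.to_numpy()

# Every column correlates perfectly with itself, so the diagonal is all ones.
diagonal_is_one = np.allclose(np.diag(values), 1.0)

# The correlation of A with B equals that of B with A, so the table is symmetric.
is_symmetric = np.allclose(values, values.T)

print(corr)
print("Diagonal all ones:", diagonal_is_one)
print("Symmetric:", is_symmetric)
```

Because of the symmetry, only the values above (or below) the diagonal need to be read when looking for good or bad correlations.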
TASK:
Follow the steps below to find data correlations using pandas in Python:
import pandas as pd
df = pd.read_csv('mydataset.csv')
For Example:
import pandas as pd
# Load the data set into a DataFrame
df = pd.read_csv('mydataset.csv')
# Print the correlation table using the Pearson method
print(df.corr('pearson'))
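If the data set contains text columns, corr() may fail or quietly skip them, depending on the pandas version. One way to avoid this, sketched below, is to keep only the numeric columns first. The CSV content here is an invented stand-in for 'mydataset.csv', read from a string so that the example runs without a file.

```python
import io

import pandas as pd

# Stand-in for 'mydataset.csv'; the columns and values are invented.
csv_text = """Name,Duration,Pulse,Calories
Ana,60,110,409.1
Ben,45,117,282.4
Cara,30,103,195.1
Dan,60,109,374.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the numeric columns so the text column "Name" is left out.
numeric_df = df.select_dtypes(include="number")

# Correlation table for the remaining columns, using the Pearson method.
result = numeric_df.corr(method="pearson")
print(result)
```

With a real file, replace io.StringIO(csv_text) with the file name, as in the example above.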
Question and Answer:
Source Code:
Screenshot:
Explanation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2. Find and identify data with bad correlation.
Source Code:
Screenshot:
Explanation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3. What do you notice when you use a different method in the corr() function?
Source Code:
Screenshot:
Explanation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
References:
https://www.w3schools.com/python/pandas/pandas_correlations.asp