Solar Data
0.0.1 Two DataFrames have been created: one with null values (data) and one without them (dropped_data)
0.0.2 Overview of the initial DataFrame
[4]: data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2784 entries, 0 to 2783
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 2784 non-null object
1 Time 2784 non-null object
2 Actual 1121 non-null float64
3 GHI 1086 non-null float64
4 GTI 1086 non-null float64
dtypes: float64(3), object(2)
memory usage: 108.9+ KB
[5]: dropped_data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1082 entries, 28 to 2755
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 1082 non-null object
1 Time 1082 non-null object
2 Actual 1082 non-null float64
3 GHI 1082 non-null float64
4 GTI 1082 non-null float64
dtypes: float64(3), object(2)
memory usage: 50.7+ KB
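The cell that builds `dropped_data` is not shown in this section. A minimal sketch, assuming it was created with `DataFrame.dropna()` on a frame shaped like `data`; the toy values are hypothetical. Dropping a row whenever any column is null keeps the original index labels (which fits `Index: 1082 entries, 28 to 2755` rather than a `RangeIndex`), and the surviving row count can never exceed the smallest per-column non-null count (1082 is at most the 1086 non-null GHI/GTI rows above).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for `data`: same five columns, with gaps
# in the measurement columns (e.g. night-time rows).
data = pd.DataFrame({
    "Date": ["01-05-2023"] * 4,
    "Time": ["06:45", "07:00", "07:15", "07:30"],
    "Actual": [np.nan, 12.5, np.nan, 40.1],
    "GHI": [np.nan, 30.2, 55.0, 80.4],
    "GTI": [np.nan, 33.9, 60.1, 88.7],
})

# Assumption: the null-free frame is made by dropping every row that has a
# null in any column. The original index labels are kept.
dropped_data = data.dropna()

# Non-null counts per column: the same numbers that info() prints.
print(data.notna().sum().to_dict())
print(dropped_data.notna().sum().to_dict())
print(dropped_data.index.tolist())
```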
[6]: data.describe()
[7]: dropped_data.describe()
[8]: data.corr(numeric_only=True)
[9]: dropped_data.corr(numeric_only=True)
[10]: # --> GHI and GTI are highly correlated with the Actual power generated
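The correlation claim can be read straight off the `Actual` column of the matrix. A minimal sketch with hypothetical values (the notebook's `corr()` output is not reproduced in this section); `DataFrame.corr` uses the Pearson coefficient by default, and `numeric_only=True` skips the text `Date`/`Time` columns.

```python
import pandas as pd

# Hypothetical readings: power output roughly tracks irradiance.
df = pd.DataFrame({
    "Actual": [10.0, 55.0, 120.0, 200.0, 260.0],
    "GHI": [20.0, 110.0, 250.0, 410.0, 540.0],
    "GTI": [22.0, 118.0, 270.0, 445.0, 590.0],
})

# Pearson correlation of every numeric column with Actual,
# dropping Actual's trivial self-correlation of 1.0.
corr_with_actual = df.corr(numeric_only=True)["Actual"].drop("Actual")
print(corr_with_actual.round(3))
```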
[14]: # Distribution of the Actual power generated, shown as a histogram
sns.histplot(data=data, x='Actual', bins=25, kde=True)
[16]: # Distribution of GTI, shown as a histogram
sns.histplot(data=data, x='GTI', bins=20, kde=True)
0.0.6 Variation of GHI and GTI over DateTime using scatter plots
plt.xticks(rotation=45)
plt.show()
[18]: sns.scatterplot(data=data.dropna(), x='DateTime', y='GTI', label='Global Tilted Irradiance (GTI)')
plt.xticks(rotation=45)
plt.show()
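The scatter and line plots use a `DateTime` column that does not appear in `data.info()`, so it was presumably built from the `Date` and `Time` text columns in a cell not shown here. A minimal sketch of one way to do that, assuming strings such as `2023-05-01` and `07:00`; the real formats are not visible in this section, so the `format` argument is a guess to adjust.

```python
import pandas as pd

# Hypothetical rows; the real Date/Time string formats are not shown in this
# section, so the format string below is an assumption.
data = pd.DataFrame({
    "Date": ["2023-05-01", "2023-05-01", "2023-05-02"],
    "Time": ["07:00", "12:15", "17:15"],
    "GTI": [35.2, 910.4, 120.8],
})

# Combine the two text columns into one datetime64 column so seaborn can
# place points on a time axis and the .dt accessor becomes available.
data["DateTime"] = pd.to_datetime(
    data["Date"] + " " + data["Time"], format="%Y-%m-%d %H:%M"
)
print(data.dtypes["DateTime"])
```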
[19]: # --> The scatter plots suggest that GHI and GTI values are spread uniformly over the month and follow the same pattern
plt.xticks(rotation=45)
plt.show()
[21]: sns.lineplot(data=data.dropna(), x='DateTime', y='GTI', label='Global Tilted Irradiance (GTI)')
plt.xticks(rotation=45)
plt.show()
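The visual impression that GHI and GTI follow the same pattern over the month can also be checked numerically. A minimal sketch, assuming a datetime64 `DateTime` column: resample both series to daily means and compare the GTI/GHI ratio day by day. The three-day toy series is hypothetical, with GTI built as 1.1 times GHI, so its ratio is flat by construction. On the real data, a roughly steady ratio would support the observation.

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute irradiance samples over three days.
idx = pd.date_range("2023-05-01", periods=3 * 96, freq="15min")
hours = idx.hour.to_numpy() + idx.minute.to_numpy() / 60
# Toy daylight curve: zero at night, peak at noon.
ghi = np.clip(np.sin((hours - 6) / 12 * np.pi), 0, None) * 800
df = pd.DataFrame({"DateTime": idx, "GHI": ghi, "GTI": ghi * 1.1})

# Daily means of GHI and GTI; a GTI/GHI ratio that stays steady from day to
# day indicates the two series move together.
daily = df.set_index("DateTime")[["GHI", "GTI"]].resample("D").mean()
print((daily["GTI"] / daily["GHI"]).round(3))
```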
0.0.8 Box-plot analysis to extract the median and interquartile range and to check for outliers
[31]: plt.figure(figsize=(5, 3))
sns.boxplot(x='Actual', data=data.dropna())
plt.title('Box plot of Actual Power Generated')
plt.show()
'''
Inference drawn from the box plot: the median power generated is around 150
'''
max_value = max(dropped_data['Actual'])
min_value = min(dropped_data['Actual'])
print('Maximum Value-->', max_value)
print('Minimum Value-->', min_value)
'''
Inference drawn from the box plot: the median value is around 470
'''
Maximum Value--> 891.21
Minimum Value--> 2.07
'''
Inference drawn from the box plot: the median GTI is around 530
'''
max_value = max(dropped_data['GTI'])
min_value = min(dropped_data['GTI'])
print('Maximum Value-->', max_value)
print('Minimum Value-->', min_value)
Maximum Value--> 1012.71
Minimum Value--> 4.21
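A box plot's centre line is the median, the box spans the interquartile range (IQR), and with seaborn's default `whis=1.5` any point more than 1.5 × IQR beyond the box is drawn individually as an outlier. So a maximum printed above is an outlier only if it lies past the upper fence. A minimal sketch of those numbers with pandas; `box_stats` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def box_stats(values: pd.Series) -> dict:
    """Median, quartiles, IQR and the 1.5 * IQR fences used by a default box plot."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Points beyond either fence are the ones drawn as outlier markers.
    outliers = values[(values < lower) | (values > upper)]
    return {"median": median, "q1": q1, "q3": q3, "iqr": iqr,
            "lower_fence": lower, "upper_fence": upper,
            "n_outliers": len(outliers)}


# Hypothetical sample; on the notebook's data this would be called as
# box_stats(dropped_data['Actual']).
sample = pd.Series([2.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 95.0])
print(box_stats(sample))
```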
[26]: print(data[data['Actual'].isna()]['DateTime'].dt.time.unique())
'''
This command shows that, in general, no data were collected from 00:00 to 06:45
or from 17:15 to 23:59 due to the absence of sunlight (some daytime slots are also missing)
'''
datetime.time(22, 15) datetime.time(22, 30) datetime.time(22, 45)
datetime.time(23, 0) datetime.time(23, 15) datetime.time(23, 30)
datetime.time(23, 45) datetime.time(7, 0) datetime.time(7, 15)
datetime.time(7, 30) datetime.time(7, 45) datetime.time(8, 0)
datetime.time(8, 15) datetime.time(8, 30) datetime.time(8, 45)
datetime.time(9, 0) datetime.time(9, 15) datetime.time(9, 30)
datetime.time(9, 45) datetime.time(10, 0) datetime.time(10, 15)
datetime.time(10, 30) datetime.time(10, 45) datetime.time(11, 0)
datetime.time(15, 15) datetime.time(15, 30)]
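To check the "in general" part of that conclusion, counting missing `Actual` readings per clock hour is more compact than listing unique times. A minimal sketch on hypothetical 15-minute data for a single day: an hour that is entirely missing shows a count of 4, while a daytime gap shows a smaller count. On multi-day data the full-hour count would scale with the number of days.

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute rows for one day: Actual is missing at night and in
# one stray daytime slot.
idx = pd.date_range("2023-05-01", periods=96, freq="15min")
actual = np.where((idx.hour >= 7) & (idx.hour < 17), 100.0, np.nan)
actual[40] = np.nan  # 10:00, a daytime gap
data = pd.DataFrame({"DateTime": idx, "Actual": actual})

# Count missing Actual readings per clock hour, ordered by hour.
missing_by_hour = (
    data.loc[data["Actual"].isna(), "DateTime"].dt.hour.value_counts().sort_index()
)
print(missing_by_hour.to_dict())
```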
0.0.10 Code to generate a Pandas Profiling (ydata-profiling) report for a concise overview of the data
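[27]: from ydata_profiling import ProfileReport

# One report for the full data (with nulls), one for the null-free data
cd = ProfileReport(data)
pd_profile_report = ProfileReport(dropped_data)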
[28]: cd.to_file('solar.html')
[29]: pd_profile_report.to_file('solar_dropped.html')