Solar Data

Uploaded by

Rajole Deepak

solar-data

April 27, 2024

[1]: import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from ydata_profiling import ProfileReport

[2]: # Loading data
data = pd.read_excel("solar Actual, GHI and GTI.xlsx")

[3]: # Dropping rows that contain null values
dropped_data = data.dropna(axis=0)
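
Before dropping rows, it can help to see how many values are missing in each column. The following is a minimal sketch on a small made-up frame (the values and the frame name `toy` are illustrative assumptions, not taken from the spreadsheet) showing what `dropna(axis=0)` removes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same numeric columns as the notebook's data
toy = pd.DataFrame({
    "Actual": [10.0, np.nan, 30.0, 40.0],
    "GHI": [100.0, 200.0, np.nan, 400.0],
    "GTI": [110.0, 210.0, np.nan, 410.0],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna(axis=0) keeps only the rows in which every column has a value
toy_dropped = toy.dropna(axis=0)
print(toy_dropped.index.tolist())
```

Note that `dropna` keeps the original row labels, so the index of the result is no longer a contiguous range.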

0.0.1 Two DataFrames have been created: one with null values and one without
0.0.2 An overview of the initial DataFrame is shown below

[4]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2784 entries, 0 to 2783
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Date    2784 non-null   object
 1   Time    2784 non-null   object
 2   Actual  1121 non-null   float64
 3   GHI     1086 non-null   float64
 4   GTI     1086 non-null   float64
dtypes: float64(3), object(2)
memory usage: 108.9+ KB

[5]: dropped_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1082 entries, 28 to 2755
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Date    1082 non-null   object
 1   Time    1082 non-null   object
 2   Actual  1082 non-null   float64
 3   GHI     1082 non-null   float64
 4   GTI     1082 non-null   float64
dtypes: float64(3), object(2)
memory usage: 50.7+ KB
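
The two `info()` outputs can be turned into shares of the full table. The short calculation below is only a sketch that copies the counts printed above; the variable names are illustrative.

```python
# Counts copied from the two info() outputs
total_rows = 2784
complete_rows = 1082
non_null = {"Actual": 1121, "GHI": 1086, "GTI": 1086}

# Share of rows that survive dropna(axis=0)
retained_share = complete_rows / total_rows
print(f"Rows kept after dropna: {retained_share:.1%}")

# Share of rows in which each column has a value
for column, count in non_null.items():
    print(f"{column}: {count / total_rows:.1%} of rows have a value")
```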

0.0.3 Statistical Representation of Data

[6]: data.describe()

[6]:             Actual          GHI          GTI
     count  1121.000000  1086.000000  1086.000000
     mean    134.684335   458.477127   519.267716
     std      77.546991   250.283042   290.885248
     min       0.630000     2.070000     4.210000
     25%      60.890000   240.112500   264.360000
     50%     143.180000   476.080000   531.575000
     75%     203.390000   685.267500   778.825000
     max     566.650000   891.210000  1012.710000

[7]: dropped_data.describe()

[7]:             Actual          GHI          GTI
     count  1082.000000  1082.000000  1082.000000
     mean    138.364427   459.203355   519.917468
     std      76.385456   250.163863   290.804078
     min       0.630000     2.070000     4.210000
     25%      70.985000   240.362500   266.460000
     50%     147.995000   476.780000   531.575000
     75%     205.510000   686.077500   779.162500
     max     566.650000   891.210000  1012.710000

[8]: data.corr(numeric_only=True)

[8]:           Actual       GHI       GTI
     Actual  1.000000  0.956921  0.948010
     GHI     0.956921  1.000000  0.987347
     GTI     0.948010  0.987347  1.000000

[9]: dropped_data.corr(numeric_only=True)

[9]:           Actual       GHI       GTI
     Actual  1.000000  0.956921  0.948010
     GHI     0.956921  1.000000  0.987368
     GTI     0.948010  0.987368  1.000000

[10]: # --> GHI and GTI are both highly correlated with the Actual power generated (r ≈ 0.96 and r ≈ 0.95 respectively)
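
The two correlation matrices agree on every coefficient involving Actual but differ slightly for GHI vs GTI (0.987347 vs 0.987368). A likely explanation, assuming GHI and GTI are missing in the same rows, is that `DataFrame.corr` excludes missing values pair by pair, whereas `dropna()` removes any row with a gap before correlating. The sketch below reproduces this effect on made-up numbers; the frame `toy` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only 'Actual' has a missing value (row 2)
toy = pd.DataFrame({
    "Actual": [1.0, 2.0, np.nan, 4.0, 5.0],
    "GHI": [2.0, 4.1, 6.0, 8.2, 9.9],
    "GTI": [2.5, 4.4, 7.5, 8.0, 10.5],
})

# corr() drops missing values separately for each pair of columns
pairwise = toy.corr(numeric_only=True)
# dropna() first removes every row that has any missing value
listwise = toy.dropna().corr(numeric_only=True)

# Pairs involving 'Actual' use the same four rows in both versions
print(np.isclose(pairwise.loc["Actual", "GHI"], listwise.loc["Actual", "GHI"]))
# GHI vs GTI uses five rows pairwise but only four rows after dropna()
print(pairwise.loc["GHI", "GTI"], listwise.loc["GHI", "GTI"])
```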

0.0.4 Feature Engineering

[11]: data['DateTime'] = pd.to_datetime(data['Date'] + ' ' + data['Time'])

[12]: data.drop(['Date', 'Time'], axis=1, inplace=True)
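
As a rough illustration of this step, the sketch below combines separate date and time strings into one datetime column on a tiny made-up frame. The string formats and the extra `Hour` column are assumptions for the example; the `astype(str)` calls are a defensive choice in case a reader's spreadsheet yields date or time objects rather than strings.

```python
import pandas as pd

# Hypothetical rows; the string formats are assumptions, not the spreadsheet's
toy = pd.DataFrame({
    "Date": ["2024-04-01", "2024-04-01"],
    "Time": ["06:45:00", "07:00:00"],
    "Actual": [None, 12.5],
})

# astype(str) guards against cells that were read as date/time objects
toy["DateTime"] = pd.to_datetime(
    toy["Date"].astype(str) + " " + toy["Time"].astype(str)
)
toy = toy.drop(columns=["Date", "Time"])

# With a real datetime column, parts such as the hour are available via .dt
toy["Hour"] = toy["DateTime"].dt.hour
print(toy.dtypes)
```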

0.0.5 Graphical Interpretation

[13]: # Pair plot of all numeric columns
sns.pairplot(data=data.dropna(), diag_kind='kde')

[13]: <seaborn.axisgrid.PairGrid at 0x188485cbb20>

[14]: # Spread of Actual power generated represented through histogram
sns.histplot(data=data, x='Actual', bins=25, kde=True)

[14]: <Axes: xlabel='Actual', ylabel='Count'>

[15]: # Spread of GHI represented through histogram
sns.histplot(data=data, x='GHI ', bins=20, kde=True)

[15]: <Axes: xlabel='GHI ', ylabel='Count'>
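
The axis label `'GHI '` shows that this column name carries a trailing space, which is why the notebook has to write `'GHI '` everywhere. One optional cleanup, sketched here on a hypothetical frame, is to strip whitespace from the headers right after loading. If this were applied to the notebook's `data`, the other cells would need `'GHI'` instead of `'GHI '`.

```python
import pandas as pd

# Hypothetical frame whose second header has a trailing space, like 'GHI '
toy = pd.DataFrame({"Actual": [1.0, 2.0], "GHI ": [10.0, 20.0], "GTI": [11.0, 21.0]})

# Strip leading and trailing whitespace from every column name
cleaned = toy.copy()
cleaned.columns = cleaned.columns.str.strip()
print(list(cleaned.columns))
```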

[16]: # Spread of GTI represented through histogram
sns.histplot(data=data, x='GTI', bins=20, kde=True)

[16]: <Axes: xlabel='GTI', ylabel='Count'>

0.0.6 Correlation of GHI and GTI with DateTime using ScatterPlots

[17]: sns.scatterplot(
    data=data.dropna(), x='DateTime', y='GHI ',
    label='Global Horizontal Irradiance (GHI)',
)

plt.xticks(rotation=45)
plt.show()

[18]: sns.scatterplot(
    data=data.dropna(), x='DateTime', y='GTI',
    label='Global Tilted Irradiance (GTI)',
)

plt.xticks(rotation=45)
plt.show()

[19]: # --> The scatter plots suggest that GHI and GTI follow the same, fairly uniform, distribution over the month

0.0.7 Line plot analysis, analogous to the scatter plots above

[20]: sns.lineplot(
    data=data.dropna(), x='DateTime', y='GHI ',
    label='Global Horizontal Irradiance (GHI)',
)

plt.xticks(rotation=45)
plt.show()

[21]: sns.lineplot(
    data=data.dropna(), x='DateTime', y='GTI',
    label='Global Tilted Irradiance (GTI)',
)

plt.xticks(rotation=45)
plt.show()

0.0.8 Box plot analysis to extract the median and interquartile range and to check for outliers
[31]: plt.figure(figsize=(5, 3))
sns.boxplot(x='Actual', data=data.dropna())
plt.title('Box plot of Actual Power Generated')
plt.show()

# Inferences from the box plot:
# - the median power generated is around 150
# - the 25th percentile is around 70 and the 75th percentile is around 210
# - there are a few outliers (up to about 570), but most values lie between 0 and 300

# The maximum of 'Actual' is the most extreme outlier
outlier_value = dropped_data['Actual'].max()
min_value = dropped_data['Actual'].min()
print('Outlier Value-->', outlier_value)
print('Minimum Value-->', min_value)

Outlier Value--> 566.65
Minimum Value--> 0.63
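
Seaborn's box plot by default draws whiskers out to 1.5 times the interquartile range (IQR) and marks points beyond them as outliers. The worked calculation below applies that rule to the quartiles of Actual reported by `dropped_data.describe()`; it is a sketch of the rule rather than a reproduction of the plot's internals.

```python
# Quartiles and maximum of 'Actual' copied from dropped_data.describe()
q1, q3 = 70.985, 205.51
maximum = 566.65

# Interquartile range and the 1.5 * IQR fences
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.3f}")
print(f"fences = ({lower_fence:.4f}, {upper_fence:.4f})")
print("maximum is an outlier:", maximum > upper_fence)
```

Running the same arithmetic on the GHI and GTI quartiles gives upper fences well above their maxima (891.21 and 1012.71), which is consistent with no outliers appearing in those two box plots.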

[32]: plt.figure(figsize=(5, 3))
sns.boxplot(x='GHI ', data=data.dropna())
plt.title('Box plot of GHI')
plt.show()

# Inferences from the box plot:
# - the median GHI is around 480
# - the 25th percentile is around 240 and the 75th percentile is around 690
# - no outliers can be observed

max_value = dropped_data['GHI '].max()
min_value = dropped_data['GHI '].min()
print('Maximum Value-->', max_value)
print('Minimum Value-->', min_value)

Maximum Value--> 891.21
Minimum Value--> 2.07

[33]: plt.figure(figsize=(5, 3))
sns.boxplot(x='GTI', data=data.dropna())
plt.title('Box plot of GTI')
plt.show()

# Inferences from the box plot:
# - the median GTI is around 530
# - the 25th percentile is around 270 and the 75th percentile is around 780
# - no outliers are observed

max_value = dropped_data['GTI'].max()
min_value = dropped_data['GTI'].min()
print('Maximum Value-->', max_value)
print('Minimum Value-->', min_value)

Maximum Value--> 1012.71
Minimum Value--> 4.21

0.0.9 Bar Plot of GHI and GTI distribution by Date

[25]: # barplot shows the mean of each variable per date (seaborn's default estimator);
# the GHI bars are drawn on top of the GTI bars on the same axes
plt.figure(figsize=(10, 6))
sns.barplot(data=dropped_data, x='Date', y='GTI', color='red', label='GTI')
sns.barplot(data=dropped_data, x='Date', y='GHI ', color='orange', label='GHI')
plt.xticks(rotation=90)
plt.xlabel('Date')
plt.ylabel('Value')
plt.title('Bar Plot of GHI and GTI by Date')
plt.legend(title='Variable')
plt.show()
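
Because `sns.barplot` aggregates with the mean by default, the bar heights correspond to daily means. A minimal sketch of the same aggregation as a table, on hypothetical values and with a cleaned `GHI` header (an assumption for readability), could look like this:

```python
import pandas as pd

# Hypothetical readings for two dates
toy = pd.DataFrame({
    "Date": ["2024-04-01", "2024-04-01", "2024-04-02", "2024-04-02"],
    "GHI": [100.0, 300.0, 200.0, 600.0],
    "GTI": [120.0, 360.0, 240.0, 700.0],
})

# Mean GHI and GTI per date, i.e. the quantity a default barplot would draw
daily_mean = toy.groupby("Date")[["GHI", "GTI"]].mean()
print(daily_mean)
```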

[26]: # Times of day at which 'Actual' is missing
print(data[data['Actual'].isna()]['DateTime'].dt.time.unique())
# In general, no data was collected from 00:00 to 06:45 and from 17:15 to 23:45,
# when there is no sunlight. The output also lists some daytime slots
# (07:00 to 11:00, 15:15 and 15:30) at which a reading is missing at least once.

[datetime.time(0, 0) datetime.time(0, 15) datetime.time(0, 30)
datetime.time(0, 45) datetime.time(1, 0) datetime.time(1, 15)
datetime.time(1, 30) datetime.time(1, 45) datetime.time(2, 0)
datetime.time(2, 15) datetime.time(2, 30) datetime.time(2, 45)
datetime.time(3, 0) datetime.time(3, 15) datetime.time(3, 30)
datetime.time(3, 45) datetime.time(4, 0) datetime.time(4, 15)
datetime.time(4, 30) datetime.time(4, 45) datetime.time(5, 0)
datetime.time(5, 15) datetime.time(5, 30) datetime.time(5, 45)
datetime.time(6, 0) datetime.time(6, 15) datetime.time(6, 30)
datetime.time(6, 45) datetime.time(17, 15) datetime.time(17, 30)
datetime.time(17, 45) datetime.time(18, 0) datetime.time(18, 15)
datetime.time(18, 30) datetime.time(18, 45) datetime.time(19, 0)
datetime.time(19, 15) datetime.time(19, 30) datetime.time(19, 45)
datetime.time(20, 0) datetime.time(20, 15) datetime.time(20, 30)
datetime.time(20, 45) datetime.time(21, 0) datetime.time(21, 15)
datetime.time(21, 30) datetime.time(21, 45) datetime.time(22, 0)
datetime.time(22, 15) datetime.time(22, 30) datetime.time(22, 45)
datetime.time(23, 0) datetime.time(23, 15) datetime.time(23, 30)
datetime.time(23, 45) datetime.time(7, 0) datetime.time(7, 15)
datetime.time(7, 30) datetime.time(7, 45) datetime.time(8, 0)
datetime.time(8, 15) datetime.time(8, 30) datetime.time(8, 45)
datetime.time(9, 0) datetime.time(9, 15) datetime.time(9, 30)
datetime.time(9, 45) datetime.time(10, 0) datetime.time(10, 15)
datetime.time(10, 30) datetime.time(10, 45) datetime.time(11, 0)
datetime.time(15, 15) datetime.time(15, 30)]
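
The list of unique times shows which slots are ever missing, but not how often. One way to quantify this, sketched here on synthetic 15-minute timestamps (the dates and the 07:00 to 17:00 window are assumptions for the demo, not the notebook's data), is to count missing readings per hour of the day.

```python
import numpy as np
import pandas as pd

# Hypothetical two days of 15-minute timestamps (2 * 96 = 192 rows)
toy = pd.DataFrame(
    {"DateTime": pd.date_range("2024-04-01", periods=192, freq="15min")}
)

# Pretend readings exist only between 07:00 and 17:00 (an assumption for the demo)
hour = toy["DateTime"].dt.hour
toy["Actual"] = np.where((hour >= 7) & (hour < 17), 100.0, np.nan)

# Number of missing readings for each hour of the day
missing_by_hour = toy["Actual"].isna().groupby(hour).sum()
print(missing_by_hour[missing_by_hour > 0].index.tolist())
```

Hours that are missing on every day point to night-time gaps, while hours with only occasional gaps point to isolated data losses during the day.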

0.0.10 Code to generate a pandas-profiling (ydata-profiling) report for a concise overall view of the data
[27]: cd = ProfileReport(data)
pd_profile_report = ProfileReport(dropped_data)

[28]: cd.to_file('solar.html')


[29]: pd_profile_report.to_file('solar_dropped.html')
