Class07.ipynb - Colab
Measurement_info.csv contains hour-by-hour data on the concentration of air pollutants and on the status of the measuring instruments.
import pandas as pd
# Print every row and column of a DataFrame (slow and very long on large frames)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
data = pd.read_csv(r"Desktop/NUCOT/NumPy and Pandas/Measurement_info.csv")
data.head()  # display the first five rows
Measurement date Station code Item code Average value Instrument status
data.shape
(3885066, 5)
Item code Item name Unit of measurement Good(Blue) Normal(Green) Bad(Yellow) Very bad(Red)
item.shape
(6, 7)
0 101 Jongno-gu 19, Jong-ro 35ga-gil, Jongno-gu, Seoul, Republ... 37.572016 127.005008
1 102 Jung-gu 15, Deoksugung-gil, Jung-gu, Seoul, Republic o... 37.564263 126.974676
station.shape
(25, 5)
https://colab.research.google.com/drive/1Dl8Ht_z3Pn_IWAFQEzFaxrFspI6Jkya2#printMode=true 1/13
17/03/2025, 09:19 Class07.ipynb - Colab
0 2017-01-01 00:00 101 1 0.004 0 SO2 ppm 0.02 0.05 0.15 1.0
1 2017-01-01 00:00 101 3 0.059 0 NO2 ppm 0.03 0.06 0.20 2.0
2 2017-01-01 00:00 101 5 1.200 0 CO ppm 2.00 9.00 15.00 50.0
3 2017-01-01 00:00 101 6 0.002 0 O3 ppm 0.03 0.09 0.15 0.5
4 2017-01-01 00:00 101 8 73.000 0 PM10 Mircrogram/m3 30.00 80.00 150.00 600.0
data.shape
(3885066, 11)
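The jump from 5 to 11 columns is consistent with merging the measurements with the `item` lookup table on their shared `Item code` column (5 + 7 - 1 = 11, since the key appears only once). The cell that did this is not shown, so the following is a minimal sketch on a tiny hypothetical sample, not the notebook's own code:

```python
import pandas as pd

# Hypothetical two-row sample shaped like Measurement_info.csv (5 columns)
measurements = pd.DataFrame({
    "Measurement date": ["2017-01-01 00:00", "2017-01-01 00:00"],
    "Station code": [101, 101],
    "Item code": [1, 3],
    "Average value": [0.004, 0.059],
    "Instrument status": [0, 0],
})

# Hypothetical item lookup with the 7 columns shown for `item`
item = pd.DataFrame({
    "Item code": [1, 3],
    "Item name": ["SO2", "NO2"],
    "Unit of measurement": ["ppm", "ppm"],
    "Good(Blue)": [0.02, 0.03],
    "Normal(Green)": [0.05, 0.06],
    "Bad(Yellow)": [0.15, 0.20],
    "Very bad(Red)": [1.0, 2.0],
})

# Left join keeps every measurement row; the join key is not duplicated
merged = measurements.merge(item, on="Item code", how="left")
print(merged.shape)  # (2, 11)
```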
0 2017-01-01 00:00 101 1 0.004 0 SO2 ppm 0.02 0.05 0.15 1.0
1 2017-01-01 00:00 101 3 0.059 0 NO2 ppm 0.03 0.06 0.20 2.0
2 2017-01-01 00:00 101 5 1.200 0 CO ppm 2.00 9.00 15.00 50.0
3 2017-01-01 00:00 101 6 0.002 0 O3 ppm 0.03 0.09 0.15 0.5
4 2017-01-01 00:00 101 8 73.000 0 PM10 Mircrogram/m3 30.00 80.00 150.00 600.0
data.shape
(3885066, 15)
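Likewise, 11 + 5 - 1 = 15 columns matches a merge with the 25-row `station` table on `Station code`. A sketch on hypothetical rows (addresses shortened); `validate="many_to_one"` is an optional safeguard, not something the notebook is known to use, that raises an error if a station code were repeated in the lookup table:

```python
import pandas as pd

# Hypothetical sample with the 11 columns present after the item merge
data = pd.DataFrame({
    "Measurement date": ["2017-01-01 00:00"] * 2,
    "Station code": [101, 102],
    "Item code": [1, 1],
    "Average value": [0.004, 0.005],
    "Instrument status": [0, 0],
    "Item name": ["SO2", "SO2"],
    "Unit of measurement": ["ppm", "ppm"],
    "Good(Blue)": [0.02, 0.02],
    "Normal(Green)": [0.05, 0.05],
    "Bad(Yellow)": [0.15, 0.15],
    "Very bad(Red)": [1.0, 1.0],
})

# Hypothetical station lookup with the 5 columns shown for `station`
station = pd.DataFrame({
    "Station code": [101, 102],
    "Station name(district)": ["Jongno-gu", "Jung-gu"],
    "Address": ["19, Jong-ro 35ga-gil", "15, Deoksugung-gil"],
    "Latitude": [37.572016, 37.564263],
    "Longitude": [127.005008, 126.974676],
})

# Many measurements map to one station; pandas raises MergeError otherwise
data = data.merge(station, on="Station code", how="left", validate="many_to_one")
print(data.shape)  # (2, 15)
```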
data.isnull().sum()
Measurement date 0
Station code 0
Item code 0
Average value 0
Instrument status 0
Item name 0
Unit of measurement 0
Good(Blue) 0
Normal(Green) 0
Bad(Yellow) 0
Very bad(Red) 0
Station name(district) 0
Address 0
Latitude 0
Longitude 0
dtype: int64
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3885066 entries, 0 to 3885065
Data columns (total 15 columns):
# Column Dtype
--- ------ -----
0 Measurement date object
1 Station code int64
2 Item code int64
3 Average value float64
4 Instrument status int64
5 Item name object
6 Unit of measurement object
7 Good(Blue) float64
8 Normal(Green) float64
9 Bad(Yellow) float64
10 Very bad(Red) float64
11 Station name(district) object
12 Address object
13 Latitude float64
14 Longitude float64
dtypes: float64(7), int64(3), object(5)
memory usage: 444.6+ MB
data['Instrument status'].value_counts()
Instrument status
0 3775778
8 32341
1 29717
4 22752
9 20490
2 3988
Name: count, dtype: int64
Instrument status codes:
0 = Normal
1 = Need Calibration
2 = Abnormal
4 = Power Cut Off
8 = Under Repair
9 = Abnormal Data
0 0 Normal
1 1 Need Calibration
2 2 Abnormal
3 4 Power Cut Off
4 8 Under Repair
5 9 Abnormal Data
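The notebook attaches these labels as a `Status` column (it appears in the next `info()` output). One way to do that without adding extra key columns is `Series.map` with a dictionary. This is a sketch on a hypothetical sample, not the notebook's own cell; the label for code 4 follows the "Power Cut Off" status that appears in the pivot table later on:

```python
import pandas as pd

# Code-to-label mapping; 4 -> "Power Cut Off" follows the label used in the later pivot table
status_labels = {
    0: "Normal",
    1: "Need Calibration",
    2: "Abnormal",
    4: "Power Cut Off",
    8: "Under Repair",
    9: "Abnormal Data",
}

# Hypothetical sample of the 'Instrument status' column
data = pd.DataFrame({"Instrument status": [0, 8, 1, 4, 9, 2, 0]})

# Each code is looked up in the dict; a code missing from the dict would become NaN
data["Status"] = data["Instrument status"].map(status_labels)
print(data["Status"].value_counts())
```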
0 2017-01-01 00:00 101 1 0.004 0 SO2 ppm 0.02 0.05 0.15 1.0
1 2017-01-01 00:00 101 3 0.059 0 NO2 ppm 0.03 0.06 0.20 2.0
2 2017-01-01 00:00 101 5 1.200 0 CO ppm 2.00 9.00 15.00 50.0
3 2017-01-01 00:00 101 6 0.002 0 O3 ppm 0.03 0.09 0.15 0.5
4 2017-01-01 00:00 101 8 73.000 0 PM10 Mircrogram/m3 30.00 80.00 150.00 600.0
data.columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3885066 entries, 0 to 3885065
Data columns (total 10 columns):
# Column Dtype
--- ------ -----
0 Measurement date object
1 Average value float64
2 Item name object
3 Unit of measurement object
4 Good(Blue) float64
5 Normal(Green) float64
6 Bad(Yellow) float64
7 Very bad(Red) float64
8 Station name(district) object
9 Status object
dtypes: float64(5), object(5)
memory usage: 296.4+ MB
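Going from 15 columns to the 10 listed here, with memory use falling from about 444.6 MB to 296.4 MB, is consistent with dropping the code and location columns once their readable labels are attached. A sketch on a single hypothetical row (the exact drop list is inferred from the before/after column lists, not taken from the notebook):

```python
import pandas as pd

# One hypothetical row carrying the 15 merged columns plus the new 'Status' label
row = {
    "Measurement date": "2017-01-01 00:00", "Station code": 101, "Item code": 1,
    "Average value": 0.004, "Instrument status": 0, "Item name": "SO2",
    "Unit of measurement": "ppm", "Good(Blue)": 0.02, "Normal(Green)": 0.05,
    "Bad(Yellow)": 0.15, "Very bad(Red)": 1.0, "Station name(district)": "Jongno-gu",
    "Address": "19, Jong-ro 35ga-gil", "Latitude": 37.572016, "Longitude": 127.005008,
    "Status": "Normal",
}
data = pd.DataFrame([row])

# Codes are redundant once names/labels exist; location columns are not used in this class
data = data.drop(columns=["Station code", "Item code", "Instrument status",
                          "Address", "Latitude", "Longitude"])
print(list(data.columns))
```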
0 2017-01-01 00:00 0.004 SO2 ppm 0.02 0.05 0.15 1.0 Jongno-gu Normal 2
1 2017-01-01 00:00 0.059 NO2 ppm 0.03 0.06 0.20 2.0 Jongno-gu Normal 2
2 2017-01-01 00:00 1.200 CO ppm 2.00 9.00 15.00 50.0 Jongno-gu Normal 2
3 2017-01-01 00:00 0.002 O3 ppm 0.03 0.09 0.15 0.5 Jongno-gu Normal 2
4 2017-01-01 00:00 73.000 PM10 Mircrogram/m3 30.00 80.00 150.00 600.0 Jongno-gu Normal 2
data.tail()
3885061 2019-12-31 23:00 13.000 PM2.5 Mircrogram/m3 15.00 35.00 75.0 500.0 Gangnam-gu No
3885062 2019-12-31 23:00 24.000 PM2.5 Mircrogram/m3 15.00 35.00 75.0 500.0 Geumcheon-gu No
3885063 2019-12-31 23:00 19.000 PM10 Mircrogram/m3 30.00 80.00 150.0 600.0 Seodaemun-gu No
3885064 2019-12-31 23:00 0.037 NO2 ppm 0.03 0.06 0.2 2.0 Gangdong-gu No
3885065 2019-12-31 23:00 0.030 NO2 ppm 0.03 0.06 0.2 2.0 Gwangjin-gu No
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3885066 entries, 0 to 3885065
Data columns (total 13 columns):
# Column Dtype
--- ------ -----
0 Average value float64
1 Item name object
2 Unit of measurement object
3 Good(Blue) float64
4 Normal(Green) float64
5 Bad(Yellow) float64
6 Very bad(Red) float64
7 Station name(district) object
8 Status object
9 Year int32
10 Month int32
11 Date int32
12 Hour int32
dtypes: float64(5), int32(4), object(4)
memory usage: 326.0+ MB
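The new `Year`, `Month`, `Date` and `Hour` columns replace `Measurement date`. A sketch of how such columns can be derived with `pd.to_datetime` and the `.dt` accessor, on hypothetical rows; the date format string is an assumption based on the `2017-01-01 00:00` values above:

```python
import pandas as pd

# Hypothetical sample of the original 'Measurement date' strings
data = pd.DataFrame({"Measurement date": ["2017-01-01 00:00", "2019-12-31 23:00"],
                     "Average value": [0.004, 0.030]})

# Parse the strings once, then extract calendar parts with the .dt accessor
stamp = pd.to_datetime(data["Measurement date"], format="%Y-%m-%d %H:%M")
data["Year"] = stamp.dt.year
data["Month"] = stamp.dt.month
data["Date"] = stamp.dt.day    # day of the month, named 'Date' as in the notebook
data["Hour"] = stamp.dt.hour
data = data.drop(columns="Measurement date")
print(data)
```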
data.columns
# Reorder the columns: identifiers first, then the reading and its thresholds
data = data[['Station name(district)', 'Status', 'Item name', 'Year', 'Month', 'Date', 'Hour',
             'Average value', 'Good(Blue)', 'Normal(Green)', 'Bad(Yellow)', 'Very bad(Red)']]
data.head()
Year Month Date Hour Average value Good(Blue) Normal(Green) Bad(Yellow) Very bad(Red)
count 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06
mean 2.017985e+03 6.549422e+00 1.577164e+01 1.150238e+01 1.161132e+01 7.846667e+00 2.070000e+01 4.008333e+01 1.922500e+02
std 8.133678e-01 3.452316e+00 8.829713e+00 6.919020e+00 3.816098e+01 1.125153e+01 2.925484e+01 5.584211e+01 2.551944e+02
min 2.017000e+03 1.000000e+00 1.000000e+00 0.000000e+00 -1.000000e+00 2.000000e-02 5.000000e-02 1.500000e-01 5.000000e-01
25% 2.017000e+03 4.000000e+00 8.000000e+00 6.000000e+00 1.200000e-02 3.000000e-02 6.000000e-02 1.500000e-01 1.000000e+00
50% 2.018000e+03 7.000000e+00 1.600000e+01 1.200000e+01 7.000000e-02 1.015000e+00 4.545000e+00 7.600000e+00 2.600000e+01
75% 2.019000e+03 1.000000e+01 2.300000e+01 1.700000e+01 1.500000e+01 1.500000e+01 3.500000e+01 7.500000e+01 5.000000e+02
max 2.019000e+03 1.200000e+01 3.100000e+01 2.300000e+01 6.256000e+03 3.000000e+01 8.000000e+01 1.500000e+02 6.000000e+02
data.describe(include = 'all').T
count unique top freq mean std min 25% 50% 75% max
Station name(district) 3885066 25 Gangseo-gu 155436 NaN NaN NaN NaN NaN NaN NaN
Status 3885066 6 Normal 3775778 NaN NaN NaN NaN NaN NaN NaN
Item name 3885066 6 SO2 647511 NaN NaN NaN NaN NaN NaN NaN
Year 3885066.0 NaN NaN NaN 2017.985345 0.813368 2017.0 2017.0 2018.0 2019.0 2019.0
Month 3885066.0 NaN NaN NaN 6.549422 3.452316 1.0 4.0 7.0 10.0 12.0
Date 3885066.0 NaN NaN NaN 15.771636 8.829713 1.0 8.0 16.0 23.0 31.0
Hour 3885066.0 NaN NaN NaN 11.502379 6.91902 0.0 6.0 12.0 17.0 23.0
Average value 3885066.0 NaN NaN NaN 11.611324 38.160981 -1.0 0.012 0.07 15.0 6256.0
Good(Blue) 3885066.0 NaN NaN NaN 7.846667 11.251528 0.02 0.03 1.015 15.0 30.0
Normal(Green) 3885066.0 NaN NaN NaN 20.7 29.254844 0.05 0.06 4.545 35.0 80.0
Bad(Yellow) 3885066.0 NaN NaN NaN 40.083333 55.842111 0.15 0.15 7.6 75.0 150.0
Very bad(Red) 3885066.0 NaN NaN NaN 192.25 255.194362 0.5 1.0 26.0 500.0 600.0
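`describe()` reports a minimum `Average value` of -1.0, the per-item mean for SO2 further down is negative, and the pivot table at the end shows means of exactly -1 for "Power Cut Off" rows at some stations. That suggests -1 is a placeholder rather than a real reading (an inference from these outputs, not something documented here). A sketch of checking for such rows and averaging only normal readings, on hypothetical data:

```python
import pandas as pd

# Hypothetical sample: a -1.0 reading recorded while the power was cut off
data = pd.DataFrame({
    "Item name": ["SO2", "SO2", "SO2", "PM10"],
    "Status": ["Normal", "Power Cut Off", "Normal", "Normal"],
    "Average value": [0.004, -1.0, 0.006, 73.0],
})

# How many negative readings occur under each status label
print(data[data["Average value"] < 0].groupby("Status").size())

# Keep only rows recorded while the instrument reported a normal state
normal = data[data["Status"] == "Normal"]
print(normal.groupby("Item name")["Average value"].mean())
```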
# Group By Functions
data.groupby(['Year'])['Average value'].mean()
Year
2017 11.586200
2018 11.072096
2019 12.201237
Name: Average value, dtype: float64
data.groupby(['Item name'])['Average value'].mean()
Item name
CO 0.509197
NO2 0.022519
O3 0.017979
PM10 43.708051
PM2.5 25.411995
SO2 -0.001795
Name: Average value, dtype: float64
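The pollutants are measured in different units (ppm for SO2, NO2, CO and O3; Mircrogram/m3 for PM10 and PM2.5), so a mean over all items, such as the per-year mean above, is dominated by the particulate readings. Grouping by year and item together keeps the units separate. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical sample mixing a ppm item (CO) and a microgram/m3 item (PM10)
data = pd.DataFrame({
    "Year": [2017, 2017, 2018, 2018],
    "Item name": ["CO", "PM10", "CO", "PM10"],
    "Average value": [0.5, 40.0, 0.6, 46.0],
})

# One row per year, one column per pollutant, so no mean mixes units
by_year_item = (data.groupby(["Year", "Item name"])["Average value"]
                    .mean()
                    .unstack("Item name"))
print(by_year_item)
```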
data.groupby(['Station name(district)'])['Average value'].mean()
Station name(district)
Dobong-gu 11.607329
Dongdaemun-gu 10.241732
Dongjak-gu 11.091573
Eunpyeong-gu 11.292960
Gangbuk-gu 10.178419
Gangdong-gu 11.800125
Gangnam-gu 10.705099
Gangseo-gu 13.080428
Geumcheon-gu 10.899941
Guro-gu 13.797258
Gwanak-gu 12.185327
Gwangjin-gu 12.622543
Jongno-gu 10.242859
Jung-gu 10.233584
Jungnang-gu 10.069908
Mapo-gu 12.853491
Nowon-gu 10.874978
Seocho-gu 14.027262
Seodaemun-gu 10.805200
Seongbuk-gu 12.064813
Seongdong-gu 12.648037
Songpa-gu 11.749635
Yangcheon-gu 11.498710
Yeongdeungpo-gu 13.767154
Yongsan-gu 9.946050
Name: Average value, dtype: float64
data.groupby(['Station name(district)'])['Good(Blue)'].mean()
Station name(district)
Dobong-gu 7.846667
Dongdaemun-gu 7.846667
Dongjak-gu 7.846667
Eunpyeong-gu 7.846667
Gangbuk-gu 7.846667
Gangdong-gu 7.846667
Gangnam-gu 7.846667
Gangseo-gu 7.846667
Geumcheon-gu 7.846667
Guro-gu 7.846667
Gwanak-gu 7.846667
Gwangjin-gu 7.846667
Jongno-gu 7.846667
Jung-gu 7.846667
Jungnang-gu 7.846667
Mapo-gu 7.846667
Nowon-gu 7.846667
Seocho-gu 7.846667
Seodaemun-gu 7.846667
Seongbuk-gu 7.846667
Seongdong-gu 7.846667
Songpa-gu 7.846667
Yangcheon-gu 7.846667
Yeongdeungpo-gu 7.846667
Yongsan-gu 7.846667
Name: Good(Blue), dtype: float64
data.groupby(['Station name(district)'])['Good(Blue)'].median()
Station name(district)
Dobong-gu 1.015
Dongdaemun-gu 1.015
Dongjak-gu 1.015
Eunpyeong-gu 1.015
Gangbuk-gu 1.015
Gangdong-gu 1.015
Gangnam-gu 1.015
Gangseo-gu 1.015
Geumcheon-gu 1.015
Guro-gu 1.015
Gwanak-gu 1.015
Gwangjin-gu 1.015
Jongno-gu 1.015
Jung-gu 1.015
Jungnang-gu 1.015
Mapo-gu 1.015
Nowon-gu 1.015
Seocho-gu 1.015
Seodaemun-gu 1.015
Seongbuk-gu 1.015
Seongdong-gu 1.015
Songpa-gu 1.015
Yangcheon-gu 1.015
Yeongdeungpo-gu 1.015
Yongsan-gu 1.015
Name: Good(Blue), dtype: float64
data.groupby(['Station name(district)'])['Normal(Green)'].mean()
Station name(district)
Dobong-gu 20.7
Dongdaemun-gu 20.7
Dongjak-gu 20.7
Eunpyeong-gu 20.7
Gangbuk-gu 20.7
Gangdong-gu 20.7
Gangnam-gu 20.7
Gangseo-gu 20.7
Geumcheon-gu 20.7
Guro-gu 20.7
Gwanak-gu 20.7
Gwangjin-gu 20.7
Jongno-gu 20.7
Jung-gu 20.7
Jungnang-gu 20.7
Mapo-gu 20.7
Nowon-gu 20.7
Seocho-gu 20.7
Seodaemun-gu 20.7
Seongbuk-gu 20.7
Seongdong-gu 20.7
Songpa-gu 20.7
Yangcheon-gu 20.7
Yeongdeungpo-gu 20.7
Yongsan-gu 20.7
Name: Normal(Green), dtype: float64
data.groupby(['Station name(district)'])['Bad(Yellow)'].mean()
Station name(district)
Dobong-gu 40.083333
Dongdaemun-gu 40.083333
Dongjak-gu 40.083333
Eunpyeong-gu 40.083333
Gangbuk-gu 40.083333
Gangdong-gu 40.083333
Gangnam-gu 40.083333
Gangseo-gu 40.083333
Geumcheon-gu 40.083333
Guro-gu 40.083333
Gwanak-gu 40.083333
Gwangjin-gu 40.083333
Jongno-gu 40.083333
Jung-gu 40.083333
Jungnang-gu 40.083333
Mapo-gu 40.083333
Nowon-gu 40.083333
Seocho-gu 40.083333
Seodaemun-gu 40.083333
Seongbuk-gu 40.083333
Seongdong-gu 40.083333
Songpa-gu 40.083333
Yangcheon-gu 40.083333
Yeongdeungpo-gu 40.083333
Yongsan-gu 40.083333
Name: Bad(Yellow), dtype: float64
data.groupby(['Station name(district)'])['Very bad(Red)'].mean()
Station name(district)
Dobong-gu 192.25
Dongdaemun-gu 192.25
Dongjak-gu 192.25
Eunpyeong-gu 192.25
Gangbuk-gu 192.25
Gangdong-gu 192.25
Gangnam-gu 192.25
Gangseo-gu 192.25
Geumcheon-gu 192.25
Guro-gu 192.25
Gwanak-gu 192.25
Gwangjin-gu 192.25
Jongno-gu 192.25
Jung-gu 192.25
Jungnang-gu 192.25
Mapo-gu 192.25
Nowon-gu 192.25
Seocho-gu 192.25
Seodaemun-gu 192.25
Seongbuk-gu 192.25
Seongdong-gu 192.25
Songpa-gu 192.25
Yangcheon-gu 192.25
Yeongdeungpo-gu 192.25
Yongsan-gu 192.25
Name: Very bad(Red), dtype: float64
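Every station shows the same mean and median for each threshold column because those columns come from the item lookup table: they depend only on the pollutant, and each station presumably contributes a similar mix of pollutants. A quick check on hypothetical rows that the thresholds are constant per item, and that grouping by item recovers the actual values:

```python
import pandas as pd

# Hypothetical rows: thresholds repeat with the item, not with the station
data = pd.DataFrame({
    "Station name(district)": ["Jongno-gu", "Jongno-gu", "Jung-gu", "Jung-gu"],
    "Item name": ["SO2", "PM10", "SO2", "PM10"],
    "Good(Blue)": [0.02, 30.0, 0.02, 30.0],
    "Very bad(Red)": [1.0, 600.0, 1.0, 600.0],
})

thresholds = ["Good(Blue)", "Very bad(Red)"]

# Distinct values per item: 1 everywhere means each threshold is a per-item constant
print(data.groupby("Item name")[thresholds].nunique())

# Grouping by item (not station) gives the thresholds themselves
print(data.groupby("Item name")[thresholds].first())
```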
data.groupby(['Year','Month'])['Average value'].mean()
Year Month
2017 1 14.337679
2 12.489017
3 16.619704
4 13.785433
5 14.591266
6 10.687240
7 9.339461
8 5.769704
9 8.523488
10 7.367980
11 10.714108
12 14.811703
2018 1 14.942190
2 13.753487
3 14.093738
4 13.291203
5 11.618625
6 11.320648
7 7.856529
8 6.994282
9 5.760708
10 7.579595
11 13.760277
12 12.148338
2019 1 18.397707
2 16.660576
3 22.281463
4 11.211278
5 14.403760
6 9.440856
7 8.515863
8 9.054132
9 6.947658
10 7.703279
11 12.081643
12 12.388990
Name: Average value, dtype: float64
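A two-level result like this is often easier to read as a grid with one row per year and one column per month. A sketch using `unstack`, on a few values rounded from the output above:

```python
import pandas as pd

# A few monthly means indexed by (Year, Month), rounded from the output above
monthly = pd.Series(
    [14.34, 12.49, 14.94, 13.75],
    index=pd.MultiIndex.from_tuples([(2017, 1), (2017, 2), (2018, 1), (2018, 2)],
                                    names=["Year", "Month"]),
    name="Average value",
)

# Move the Month level into the columns
grid = monthly.unstack("Month")
print(grid)
```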
9 8.377735
10 9.076809
11 10.676297
12 12.724758
Dongdaemun-gu 2017 1 15.280303
2 12.952813
3 15.940782
4 12.119957
5 12.777500
6 9.134105
7 7.868023
8 5.612222
9 8.890782
10 7.533812
11 10.209900
12 15.060177
2018 1 14.198175
2 13.533617
3 13.983695
4 11.790508
5 9.683964
6 8.785286
7 6.381974
8 5.236286
9 4.277463
data.pivot_table(index = ['Status'] , columns = ['Station name(district)','Year'], values = ['Average value'])
Average value
Station name(district) Dobong-gu Dongdaemun-gu Dongjak-gu Eunpyeong-gu
Year 2017 2018 2019 2017 2018 2019 2017 2018 2019 2017 2018
Status
Abnormal 48.590909 0.000000 NaN NaN 0.004000 0.920727 48.357143 -0.825222 58.333333 -1.000000 985.0
Abnormal Data 33.016874 22.594080 568.946159 37.741935 12.437877 27.287980 63.462469 32.073552 37.463228 20.649643 29.0
Need Calibration 14.275168 9.224388 7.610652 9.307926 4.197164 8.112720 21.588804 1.710099 2.684361 6.236259 17.0
Normal 11.224980 11.117148 10.011704 11.145147 10.016295 10.782746 11.261688 10.450340 11.638033 11.261054 10.9
Power Cut Off -1.000000 -1.000000 -1.000000 NaN 0.000000 -0.779910 NaN -1.000000 -0.163886 -0.503601 0.0
Under Repair 1.312428 0.598308 NaN 0.845784 0.208913 0.126879 11.057540 0.001000 NaN 3.051658 4.4
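`pivot_table` aggregates with the mean unless `aggfunc` says otherwise, so the table above is the same result as a `groupby` over all three keys followed by `unstack`; combinations with no rows show up as NaN. A sketch comparing the two on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows with the three keys used in the pivot above
data = pd.DataFrame({
    "Status": ["Normal", "Normal", "Under Repair", "Normal"],
    "Station name(district)": ["Dobong-gu", "Dobong-gu", "Dobong-gu", "Jung-gu"],
    "Year": [2017, 2017, 2017, 2018],
    "Average value": [10.0, 12.0, 1.0, 9.0],
})

# aggfunc="mean" is written out here for clarity; it is also the default
pivot = data.pivot_table(index="Status", columns=["Station name(district)", "Year"],
                         values="Average value", aggfunc="mean")

# The same table built from groupby + unstack
grouped = (data.groupby(["Status", "Station name(district)", "Year"])["Average value"]
               .mean()
               .unstack(["Station name(district)", "Year"]))
print(pivot)
```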