Class07

The document outlines the air quality data analysis for Seoul, which is divided into three CSV files: info.csv, item_info, and station_info. It includes detailed descriptions of the data structure, merging datasets, and handling instrument status. The dataset contains over 3.8 million entries with various pollutants measured at different stations across the city.

Uploaded by

ahmedfaraz1102
17/03/2025, 09:19 Class07.ipynb - Colab

Seoul Data Analysis:


The air quality data for this segment is split across three CSV files, loaded below as Measurement_info.csv, Measurement_item_info.csv and Measurement_station_info.csv.

info.csv holds hour-by-hour data on the concentration of pollutants in the air and the status of the instruments.

item_info holds the measured items and the concentration thresholds for each level.

station_info holds the data on the measuring stations.

You can download the dataset from Kaggle: https://www.kaggle.com/bappekim/air-pollution-in-seoul

import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
data = pd.read_csv(r"Desktop/NUCOT/NumPy and Pandas/Measurement_info.csv")
data.head() # displays 1st five rows

   Measurement date  Station code  Item code  Average value  Instrument status
0  2017-01-01 00:00           101          1          0.004                  0
1  2017-01-01 00:00           101          3          0.059                  0
2  2017-01-01 00:00           101          5          1.200                  0
3  2017-01-01 00:00           101          6          0.002                  0
4  2017-01-01 00:00           101          8         73.000                  0

data.shape

(3885066, 5)

item = pd.read_csv(r"Desktop/NUCOT/NumPy and Pandas/Measurement_item_info.csv")


item.head()

   Item code  Item name  Unit of measurement  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)
0          1        SO2                  ppm        0.02           0.05         0.15            1.0
1          3        NO2                  ppm        0.03           0.06         0.20            2.0
2          5         CO                  ppm        2.00           9.00        15.00           50.0
3          6         O3                  ppm        0.03           0.09         0.15            0.5
4          8       PM10        Mircrogram/m3       30.00          80.00       150.00          600.0

item.shape

(6, 7)

station = pd.read_csv(r"Desktop/NUCOT/NumPy and Pandas/Measurement_station_info.csv")


station.head()

   Station code  Station name(district)  Address                                             Latitude   Longitude
0           101  Jongno-gu               19, Jong-ro 35ga-gil, Jongno-gu, Seoul, Republ...  37.572016  127.005008
1           102  Jung-gu                 15, Deoksugung-gil, Jung-gu, Seoul, Republic o...  37.564263  126.974676
2           103  Yongsan-gu              136, Hannam-daero, Yongsan-gu, Seoul, Republic...  37.540033  127.004850
3           104  Eunpyeong-gu            215, Jinheung-ro, Eunpyeong-gu, Seoul, Republi...  37.609823  126.934848
4           105  Seodaemun-gu            32, Segeomjeong-ro 4-gil, Seodaemun-gu, Seoul,...  37.593742  126.949679

station.shape

(25, 5)

data = data.merge(item,on = ['Item code'], how = 'left')


data.head()


   Measurement date  Station code  Item code  Average value  Instrument status  Item name  Unit of measurement  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)
0  2017-01-01 00:00           101          1          0.004                  0        SO2                  ppm        0.02           0.05         0.15            1.0
1  2017-01-01 00:00           101          3          0.059                  0        NO2                  ppm        0.03           0.06         0.20            2.0
2  2017-01-01 00:00           101          5          1.200                  0         CO                  ppm        2.00           9.00        15.00           50.0
3  2017-01-01 00:00           101          6          0.002                  0         O3                  ppm        0.03           0.09         0.15            0.5
4  2017-01-01 00:00           101          8         73.000                  0       PM10        Mircrogram/m3       30.00          80.00       150.00          600.0

data.shape

(3885066, 11)
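
The row count is unchanged after the left merge, which suggests that every `Item code` matched at most one row in `item`. The following is a minimal sketch, on toy frames rather than the real CSV files, of how that could be checked explicitly with the `validate` and `indicator` parameters of `DataFrame.merge`. The frames `measurements` and `items` and their values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-ins for the measurement and item tables (values are illustrative only)
measurements = pd.DataFrame({
    "Item code": [1, 3, 5, 1],
    "Average value": [0.004, 0.059, 1.2, 0.005],
})
items = pd.DataFrame({
    "Item code": [1, 3, 5],
    "Item name": ["SO2", "NO2", "CO"],
})

# validate="m:1" raises MergeError if an item code appears more than once in `items`;
# indicator=True adds a `_merge` column showing whether each row found a match
merged = measurements.merge(items, on="Item code", how="left",
                            validate="m:1", indicator=True)

# Rows that found no partner in `items` are tagged "left_only"
unmatched = (merged["_merge"] == "left_only").sum()
print(len(measurements), len(merged), unmatched)
```

With a many-to-one check in place, a duplicated key in the lookup table fails loudly instead of silently multiplying rows.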

data = data.merge(station, on = ['Station code'], how = 'left')


data.head()

   Measurement date  Station code  Item code  Average value  Instrument status  Item name  Unit of measurement  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)
0  2017-01-01 00:00           101          1          0.004                  0        SO2                  ppm        0.02           0.05         0.15            1.0
1  2017-01-01 00:00           101          3          0.059                  0        NO2                  ppm        0.03           0.06         0.20            2.0
2  2017-01-01 00:00           101          5          1.200                  0         CO                  ppm        2.00           9.00        15.00           50.0
3  2017-01-01 00:00           101          6          0.002                  0         O3                  ppm        0.03           0.09         0.15            0.5
4  2017-01-01 00:00           101          8         73.000                  0       PM10        Mircrogram/m3       30.00          80.00       150.00          600.0

data.shape

(3885066, 15)

data.isnull().sum()

Measurement date 0
Station code 0
Item code 0
Average value 0
Instrument status 0
Item name 0
Unit of measurement 0
Good(Blue) 0
Normal(Green) 0
Bad(Yellow) 0
Very bad(Red) 0
Station name(district) 0
Address 0
Latitude 0
Longitude 0
dtype: int64

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3885066 entries, 0 to 3885065
Data columns (total 15 columns):
# Column Dtype
--- ------ -----
0 Measurement date object
1 Station code int64
2 Item code int64
3 Average value float64
4 Instrument status int64
5 Item name object
6 Unit of measurement object
7 Good(Blue) float64
8 Normal(Green) float64
9 Bad(Yellow) float64
10 Very bad(Red) float64
11 Station name(district) object
12 Address object
13 Latitude float64
14 Longitude float64
dtypes: float64(7), int64(3), object(5)
memory usage: 444.6+ MB
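
Several of the object columns repeat a small set of strings millions of times, so they are candidates for the `category` dtype. The sketch below uses a synthetic frame (the row count and repetition pattern are assumptions, not the real data) to compare memory before and after the conversion.

```python
import pandas as pd

# Synthetic frame: a few distinct strings repeated many times, as in the merged data
repeats = 10_000
df = pd.DataFrame({
    "Item name": ["SO2", "NO2", "CO", "O3", "PM10", "PM2.5"] * repeats,
    "Unit of measurement": ["ppm", "ppm", "ppm", "ppm",
                            "Mircrogram/m3", "Mircrogram/m3"] * repeats,
})

# deep=True counts the bytes held by the strings themselves
before = df.memory_usage(deep=True).sum()

# Convert low-cardinality text columns to the category dtype
for col in ["Item name", "Unit of measurement"]:
    df[col] = df[col].astype("category")

after = df.memory_usage(deep=True).sum()
print(f"before: {before / 1e6:.2f} MB, after: {after / 1e6:.2f} MB")
```

The same conversion could be tried on `Item name`, `Unit of measurement` and `Station name(district)` in the notebook's `data` frame; the saving there is not measured here.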

data['Instrument status'].value_counts()

Instrument status
0 3775778
8 32341
1 29717
4 22752
9 20490
2 3988
Name: count, dtype: int64

0 = Normal

1= Need Calibration

2 = Abnormal

4 = Power Cut Off

8 = Under Repair

9 = Abnormal Data

Status = {'Instrument status': [0,1,2,4,8,9],'Status':['Normal','Need Calibration','Abnormal','Power Cut Off','Under Repair','Abnormal Data']}


Status = pd.DataFrame(Status)
Status

   Instrument status  Status
0                  0  Normal
1                  1  Need Calibration
2                  2  Abnormal
3                  4  Power Cut Off
4                  8  Under Repair
5                  9  Abnormal Data

data = data.merge(Status , on = ['Instrument status'], how = 'left')


data.head()


   Measurement date  Station code  Item code  Average value  Instrument status  Item name  Unit of measurement  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)
0  2017-01-01 00:00           101          1          0.004                  0        SO2                  ppm        0.02           0.05         0.15            1.0
1  2017-01-01 00:00           101          3          0.059                  0        NO2                  ppm        0.03           0.06         0.20            2.0
2  2017-01-01 00:00           101          5          1.200                  0         CO                  ppm        2.00           9.00        15.00           50.0
3  2017-01-01 00:00           101          6          0.002                  0         O3                  ppm        0.03           0.09         0.15            0.5
4  2017-01-01 00:00           101          8         73.000                  0       PM10        Mircrogram/m3       30.00          80.00       150.00          600.0
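
Merging with a small lookup DataFrame, as done for the status labels, works; the same labelling could also be sketched with `Series.map` and a plain dictionary. The example below uses a toy column of status codes (illustrative values) and the code-to-label legend from the notebook.

```python
import pandas as pd

# Code-to-label mapping taken from the legend in the notebook
status_labels = {
    0: "Normal",
    1: "Need Calibration",
    2: "Abnormal",
    4: "Power Cut Off",
    8: "Under Repair",
    9: "Abnormal Data",
}

# Toy column of instrument status codes (illustrative values)
toy = pd.DataFrame({"Instrument status": [0, 8, 1, 0, 9]})

# map() looks each code up in the dict; codes missing from the dict become NaN
toy["Status"] = toy["Instrument status"].map(status_labels)
print(toy)
```

This avoids building an intermediate DataFrame, at the cost of keeping the legend in code rather than in a table.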

data.columns

Index(['Measurement date', 'Station code', 'Item code', 'Average value',
       'Instrument status', 'Item name', 'Unit of measurement', 'Good(Blue)',
       'Normal(Green)', 'Bad(Yellow)', 'Very bad(Red)',
       'Station name(district)', 'Address', 'Latitude', 'Longitude', 'Status'],
      dtype='object')

data = data.drop(['Instrument status','Station code','Item code','Address', 'Latitude','Longitude'], axis = 1)


data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3885066 entries, 0 to 3885065
Data columns (total 10 columns):
# Column Dtype
--- ------ -----
0 Measurement date object
1 Average value float64
2 Item name object
3 Unit of measurement object
4 Good(Blue) float64
5 Normal(Green) float64
6 Bad(Yellow) float64
7 Very bad(Red) float64
8 Station name(district) object
9 Status object
dtypes: float64(5), object(5)
memory usage: 296.4+ MB

data['Year'] = pd.DatetimeIndex(data['Measurement date']).year


data['Month'] = pd.DatetimeIndex(data['Measurement date']).month
data['Date'] = pd.DatetimeIndex(data['Measurement date']).day
data['Hour'] = pd.DatetimeIndex(data['Measurement date']).hour
data.head()


   Measurement date  Average value  Item name  Unit of measurement  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)  Station name(district)  Status  Y
0  2017-01-01 00:00          0.004        SO2                  ppm        0.02           0.05         0.15            1.0               Jongno-gu  Normal  2
1  2017-01-01 00:00          0.059        NO2                  ppm        0.03           0.06         0.20            2.0               Jongno-gu  Normal  2
2  2017-01-01 00:00          1.200         CO                  ppm        2.00           9.00        15.00           50.0               Jongno-gu  Normal  2
3  2017-01-01 00:00          0.002         O3                  ppm        0.03           0.09         0.15            0.5               Jongno-gu  Normal  2
4  2017-01-01 00:00         73.000       PM10        Mircrogram/m3       30.00          80.00       150.00          600.0               Jongno-gu  Normal  2
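
The four `pd.DatetimeIndex(...)` calls each parse the full `Measurement date` column again. A possible alternative, sketched here on three toy timestamps, is to parse once with `pd.to_datetime` and read the components through the `.dt` accessor. The explicit `format` string is an assumption based on the values shown in the output.

```python
import pandas as pd

# Toy timestamps in the same "YYYY-MM-DD HH:MM" layout as the Measurement date column
toy = pd.DataFrame({
    "Measurement date": ["2017-01-01 00:00", "2018-07-15 13:00", "2019-12-31 23:00"]
})

# Parse the strings once, then read the components through the .dt accessor
stamp = pd.to_datetime(toy["Measurement date"], format="%Y-%m-%d %H:%M")
toy["Year"] = stamp.dt.year
toy["Month"] = stamp.dt.month
toy["Date"] = stamp.dt.day    # day of month, named "Date" to match the notebook
toy["Hour"] = stamp.dt.hour
print(toy)
```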

data.tail()

         Measurement date  Average value  Item name  Unit of measurement  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)  Station name(district)  Sta
3885061  2019-12-31 23:00         13.000      PM2.5        Mircrogram/m3       15.00          35.00         75.0          500.0              Gangnam-gu   No
3885062  2019-12-31 23:00         24.000      PM2.5        Mircrogram/m3       15.00          35.00         75.0          500.0            Geumcheon-gu   No
3885063  2019-12-31 23:00         19.000       PM10        Mircrogram/m3       30.00          80.00        150.0          600.0            Seodaemun-gu   No
3885064  2019-12-31 23:00          0.037        NO2                  ppm        0.03           0.06          0.2            2.0             Gangdong-gu   No
3885065  2019-12-31 23:00          0.030        NO2                  ppm        0.03           0.06          0.2            2.0             Gwangjin-gu   No

data = data.drop(['Measurement date'], axis = 1)


data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3885066 entries, 0 to 3885065
Data columns (total 13 columns):
# Column Dtype
--- ------ -----
0 Average value float64
1 Item name object
2 Unit of measurement object
3 Good(Blue) float64
4 Normal(Green) float64
5 Bad(Yellow) float64
6 Very bad(Red) float64
7 Station name(district) object
8 Status object
9 Year int32
10 Month int32
11 Date int32
12 Hour int32
dtypes: float64(5), int32(4), object(4)
memory usage: 326.0+ MB

data.columns

Index(['Average value', 'Item name', 'Unit of measurement', 'Good(Blue)',
       'Normal(Green)', 'Bad(Yellow)', 'Very bad(Red)',
       'Station name(district)', 'Status', 'Year', 'Month', 'Date', 'Hour'],
      dtype='object')

data = data[['Station name(district)', 'Status', 'Item name','Year', 'Month', 'Date', 'Hour','Average value','Good(Blue)',
             'Normal(Green)', 'Bad(Yellow)', 'Very bad(Red)']]
data.head()


   Station name(district)  Status  Item name  Year  Month  Date  Hour  Average value  Good(Blue)  Normal(Green)  Bad(Yellow)  Very bad(Red)
0               Jongno-gu  Normal        SO2  2017      1     1     0          0.004        0.02           0.05         0.15            1.0
1               Jongno-gu  Normal        NO2  2017      1     1     0          0.059        0.03           0.06         0.20            2.0
2               Jongno-gu  Normal         CO  2017      1     1     0          1.200        2.00           9.00        15.00           50.0
3               Jongno-gu  Normal         O3  2017      1     1     0          0.002        0.03           0.09         0.15            0.5
4               Jongno-gu  Normal       PM10  2017      1     1     0         73.000       30.00          80.00       150.00          600.0

data.describe() # summary statistics (central tendency and spread) of the numeric columns

Year Month Date Hour Average value Good(Blue) Normal(Green) Bad(Yellow) Very bad(Red)

count 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e+06 3.885066e

mean 2.017985e+03 6.549422e+00 1.577164e+01 1.150238e+01 1.161132e+01 7.846667e+00 2.070000e+01 4.008333e+01 1.922500e

std 8.133678e-01 3.452316e+00 8.829713e+00 6.919020e+00 3.816098e+01 1.125153e+01 2.925484e+01 5.584211e+01 2.551944e

min 2.017000e+03 1.000000e+00 1.000000e+00 0.000000e+00 -1.000000e+00 2.000000e-02 5.000000e-02 1.500000e-01 5.000000

25% 2.017000e+03 4.000000e+00 8.000000e+00 6.000000e+00 1.200000e-02 3.000000e-02 6.000000e-02 1.500000e-01 1.000000e

50% 2.018000e+03 7.000000e+00 1.600000e+01 1.200000e+01 7.000000e-02 1.015000e+00 4.545000e+00 7.600000e+00 2.600000e

75% 2.019000e+03 1.000000e+01 2.300000e+01 1.700000e+01 1.500000e+01 1.500000e+01 3.500000e+01 7.500000e+01 5.000000e

max 2.019000e+03 1.200000e+01 3.100000e+01 2.300000e+01 6.256000e+03 3.000000e+01 8.000000e+01 1.500000e+02 6.000000e

data.describe(include = 'all').T

count unique top freq mean std min 25% 50% 75% max

Station name(district) 3885066 25 Gangseo-gu 155436 NaN NaN NaN NaN NaN NaN NaN

Status 3885066 6 Normal 3775778 NaN NaN NaN NaN NaN NaN NaN

Item name 3885066 6 SO2 647511 NaN NaN NaN NaN NaN NaN NaN

Year 3885066.0 NaN NaN NaN 2017.985345 0.813368 2017.0 2017.0 2018.0 2019.0 2019.0

Month 3885066.0 NaN NaN NaN 6.549422 3.452316 1.0 4.0 7.0 10.0 12.0

Date 3885066.0 NaN NaN NaN 15.771636 8.829713 1.0 8.0 16.0 23.0 31.0

Hour 3885066.0 NaN NaN NaN 11.502379 6.91902 0.0 6.0 12.0 17.0 23.0

Average value 3885066.0 NaN NaN NaN 11.611324 38.160981 -1.0 0.012 0.07 15.0 6256.0

Good(Blue) 3885066.0 NaN NaN NaN 7.846667 11.251528 0.02 0.03 1.015 15.0 30.0

Normal(Green) 3885066.0 NaN NaN NaN 20.7 29.254844 0.05 0.06 4.545 35.0 80.0

Bad(Yellow) 3885066.0 NaN NaN NaN 40.083333 55.842111 0.15 0.15 7.6 75.0 150.0

Very bad(Red) 3885066.0 NaN NaN NaN 192.25 255.194362 0.5 1.0 26.0 500.0 600.0

# Group By Functions

data.groupby(['Year'])['Average value'].mean()

Year
2017 11.586200
2018 11.072096
2019 12.201237
Name: Average value, dtype: float64

data.groupby(['Item name'])['Average value'].mean()

Item name
CO 0.509197
NO2 0.022519
O3 0.017979
PM10 43.708051
PM2.5 25.411995
SO2 -0.001795
Name: Average value, dtype: float64
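
The negative mean for SO2, together with the minimum of -1.0 reported by `describe()`, suggests that some rows hold placeholder readings rather than real concentrations. One way to keep them out of the averages, sketched here on toy rows (the values and the reading of -1.0 as a placeholder are assumptions), is to filter on `Status` before grouping.

```python
import pandas as pd

# Toy rows: -1.0 stands in for a placeholder reading logged during an outage
toy = pd.DataFrame({
    "Item name": ["SO2", "SO2", "SO2", "CO", "CO"],
    "Status": ["Normal", "Power Cut Off", "Normal", "Normal", "Power Cut Off"],
    "Average value": [0.004, -1.0, 0.006, 0.5, -1.0],
})

# Mean over all rows, placeholders included
mean_all = toy.groupby("Item name")["Average value"].mean()

# Mean over rows whose instrument status is Normal
normal = toy[toy["Status"] == "Normal"]
mean_normal = normal.groupby("Item name")["Average value"].mean()

print(mean_all)
print(mean_normal)
```

In the toy data the unfiltered SO2 mean is negative while the filtered one is not, which mirrors the pattern in the notebook's output.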

data.groupby(['Station name(district)'])['Average value'].mean()

Station name(district)
Dobong-gu 11.607329
Dongdaemun-gu 10.241732
Dongjak-gu 11.091573
Eunpyeong-gu 11.292960
Gangbuk-gu 10.178419
Gangdong-gu 11.800125
Gangnam-gu 10.705099
Gangseo-gu 13.080428
Geumcheon-gu 10.899941
Guro-gu 13.797258
Gwanak-gu 12.185327
Gwangjin-gu 12.622543
Jongno-gu 10.242859
Jung-gu 10.233584
Jungnang-gu 10.069908
Mapo-gu 12.853491
Nowon-gu 10.874978
Seocho-gu 14.027262
Seodaemun-gu 10.805200
Seongbuk-gu 12.064813
Seongdong-gu 12.648037
Songpa-gu 11.749635
Yangcheon-gu 11.498710
Yeongdeungpo-gu 13.767154
Yongsan-gu 9.946050
Name: Average value, dtype: float64

data.groupby(['Station name(district)'])['Good(Blue)'].mean()

Station name(district)
Dobong-gu 7.846667
Dongdaemun-gu 7.846667
Dongjak-gu 7.846667
Eunpyeong-gu 7.846667
Gangbuk-gu 7.846667
Gangdong-gu 7.846667
Gangnam-gu 7.846667
Gangseo-gu 7.846667
Geumcheon-gu 7.846667
Guro-gu 7.846667
Gwanak-gu 7.846667
Gwangjin-gu 7.846667
Jongno-gu 7.846667
Jung-gu 7.846667
Jungnang-gu 7.846667
Mapo-gu 7.846667
Nowon-gu 7.846667
Seocho-gu 7.846667
Seodaemun-gu 7.846667
Seongbuk-gu 7.846667
Seongdong-gu 7.846667
Songpa-gu 7.846667
Yangcheon-gu 7.846667
Yeongdeungpo-gu 7.846667
Yongsan-gu 7.846667
Name: Good(Blue), dtype: float64

data.groupby(['Station name(district)'])['Good(Blue)'].median()

Station name(district)
Dobong-gu 1.015
Dongdaemun-gu 1.015
Dongjak-gu 1.015
Eunpyeong-gu 1.015
Gangbuk-gu 1.015
Gangdong-gu 1.015
Gangnam-gu 1.015
Gangseo-gu 1.015
Geumcheon-gu 1.015
Guro-gu 1.015
Gwanak-gu 1.015
Gwangjin-gu 1.015
Jongno-gu 1.015
Jung-gu 1.015
Jungnang-gu 1.015
Mapo-gu 1.015
Nowon-gu 1.015
Seocho-gu 1.015
Seodaemun-gu 1.015
Seongbuk-gu 1.015
Seongdong-gu 1.015
Songpa-gu 1.015
Yangcheon-gu 1.015
Yeongdeungpo-gu 1.015
Yongsan-gu 1.015
Name: Good(Blue), dtype: float64

data.groupby(['Station name(district)'])['Normal(Green)'].mean()

Station name(district)
Dobong-gu 20.7
Dongdaemun-gu 20.7
Dongjak-gu 20.7
Eunpyeong-gu 20.7
Gangbuk-gu 20.7
Gangdong-gu 20.7
Gangnam-gu 20.7
Gangseo-gu 20.7
Geumcheon-gu 20.7
Guro-gu 20.7
Gwanak-gu 20.7
Gwangjin-gu 20.7
Jongno-gu 20.7
Jung-gu 20.7
Jungnang-gu 20.7
Mapo-gu 20.7
Nowon-gu 20.7
Seocho-gu 20.7
Seodaemun-gu 20.7
Seongbuk-gu 20.7
Seongdong-gu 20.7
Songpa-gu 20.7
Yangcheon-gu 20.7
Yeongdeungpo-gu 20.7
Yongsan-gu 20.7
Name: Normal(Green), dtype: float64

data.groupby(['Station name(district)'])['Bad(Yellow)'].mean()

Station name(district)
Dobong-gu 40.083333
Dongdaemun-gu 40.083333
Dongjak-gu 40.083333
Eunpyeong-gu 40.083333
Gangbuk-gu 40.083333
Gangdong-gu 40.083333
Gangnam-gu 40.083333
Gangseo-gu 40.083333
Geumcheon-gu 40.083333
Guro-gu 40.083333
Gwanak-gu 40.083333
Gwangjin-gu 40.083333
Jongno-gu 40.083333
Jung-gu 40.083333
Jungnang-gu 40.083333
Mapo-gu 40.083333
Nowon-gu 40.083333
Seocho-gu 40.083333
Seodaemun-gu 40.083333
Seongbuk-gu 40.083333
Seongdong-gu 40.083333
Songpa-gu 40.083333
Yangcheon-gu 40.083333
Yeongdeungpo-gu 40.083333
Yongsan-gu 40.083333
Name: Bad(Yellow), dtype: float64

data.groupby(['Station name(district)'])['Very bad(Red)'].mean()

Station name(district)
Dobong-gu 192.25
Dongdaemun-gu 192.25
Dongjak-gu 192.25
Eunpyeong-gu 192.25
Gangbuk-gu 192.25
Gangdong-gu 192.25
Gangnam-gu 192.25
Gangseo-gu 192.25
Geumcheon-gu 192.25
Guro-gu 192.25
Gwanak-gu 192.25
Gwangjin-gu 192.25
Jongno-gu 192.25
Jung-gu 192.25
Jungnang-gu 192.25
Mapo-gu 192.25
Nowon-gu 192.25
Seocho-gu 192.25
Seodaemun-gu 192.25
Seongbuk-gu 192.25
Seongdong-gu 192.25
Songpa-gu 192.25
Yangcheon-gu 192.25
Yeongdeungpo-gu 192.25
Yongsan-gu 192.25
Name: Very bad(Red), dtype: float64
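
The threshold columns are constants attached to each item, so their mean is identical for every station. A more informative use, sketched below on toy rows, is to compare each reading with its own thresholds using `numpy.select`. Treating each threshold as the inclusive upper bound of its band is an assumption; the notebook does not state how the bands are delimited.

```python
import numpy as np
import pandas as pd

# Toy readings with the PM10 and SO2 thresholds shown in the item table
toy = pd.DataFrame({
    "Item name": ["PM10", "PM10", "PM10", "SO2"],
    "Average value": [25.0, 73.0, 200.0, 0.004],
    "Good(Blue)": [30.0, 30.0, 30.0, 0.02],
    "Normal(Green)": [80.0, 80.0, 80.0, 0.05],
    "Bad(Yellow)": [150.0, 150.0, 150.0, 0.15],
    "Very bad(Red)": [600.0, 600.0, 600.0, 1.0],
})

value = toy["Average value"]
# Conditions are checked in order; the first one that holds decides the label.
# Assumption: each threshold is the inclusive upper bound of its band.
conditions = [
    value <= toy["Good(Blue)"],
    value <= toy["Normal(Green)"],
    value <= toy["Bad(Yellow)"],
]
labels = ["Good", "Normal", "Bad"]
toy["Level"] = np.select(conditions, labels, default="Very bad")
print(toy[["Item name", "Average value", "Level"]])
```

Placeholder readings such as -1.0 would fall into the "Good" band under this rule, so they would need to be filtered out first.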

data.groupby(['Year','Month'])['Average value'].mean()

Year Month
2017 1 14.337679
2 12.489017
3 16.619704
4 13.785433
5 14.591266
6 10.687240
7 9.339461
8 5.769704
9 8.523488
10 7.367980
11 10.714108
12 14.811703
2018 1 14.942190
2 13.753487
3 14.093738
4 13.291203
5 11.618625
6 11.320648
7 7.856529
8 6.994282
9 5.760708
10 7.579595
11 13.760277
12 12.148338
2019 1 18.397707
2 16.660576
3 22.281463
4 11.211278
5 14.403760
6 9.440856
7 8.515863
8 9.054132
9 6.947658
10 7.703279
11 12.081643
12 12.388990
Name: Average value, dtype: float64
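
These monthly means pool pollutants measured in ppm with pollutants measured in micrograms per cubic metre, so the larger PM10 and PM2.5 values dominate. A sketch of one alternative, on toy rows with made-up values, is to add `Item name` to the grouping keys and unstack it into columns.

```python
import pandas as pd

# Toy rows mixing a ppm pollutant with a microgram/m3 pollutant
toy = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2017, 2018, 2018],
    "Month": [1, 1, 1, 1, 1, 1],
    "Item name": ["CO", "CO", "PM10", "PM10", "CO", "PM10"],
    "Average value": [0.4, 0.6, 70.0, 80.0, 0.5, 60.0],
})

# Grouping by item as well keeps each pollutant in its own unit
monthly = (
    toy.groupby(["Year", "Month", "Item name"])["Average value"]
       .mean()
       .unstack("Item name")   # one column per pollutant
)
print(monthly)
```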

data.groupby(['Station name(district)','Year','Month'])['Average value'].mean()

Station name(district) Year Month


Dobong-gu 2017 1 14.163661
2 12.117790
3 15.274379
4 13.405665
5 15.129161
6 10.941122
7 9.150097
8 5.685269
9 7.992110
10 7.105030
11 9.830824
12 13.826626
2018 1 14.365763
2 14.425309
3 14.515254
4 13.637658
5 10.096776
6 9.873897
7 6.640097
8 6.120289
9 5.060646
10 7.283574
11 14.649538
12 13.345104
2019 1 19.420026
2 15.462532
3 21.773700
4 10.318269
5 17.464176
6 11.321872
7 9.304642
8 10.625705
9 8.377735
10 9.076809
11 10.676297
12 12.724758
Dongdaemun-gu 2017 1 15.280303
2 12.952813
3 15.940782
4 12.119957
5 12.777500
6 9.134105
7 7.868023
8 5.612222
9 8.890782
10 7.533812
11 10.209900
12 15.060177
2018 1 14.198175
2 13.533617
3 13.983695
4 11.790508
5 9.683964
6 8.785286
7 6.381974
8 5.236286
9 4.277463

data.pivot_table(index = ['Status'] , columns = ['Station name(district)','Year'], values = ['Average value'])

Average value
Station name(district)  Dobong-gu  Dongdaemun-gu  Dongjak-gu  Eunpyeong-gu
Year  2017 2018 2019  2017 2018 2019  2017 2018 2019  2017 2018
Status

Abnormal 48.590909 0.000000 NaN NaN 0.004000 0.920727 48.357143 -0.825222 58.333333 -1.000000 985.0

Abnormal Data 33.016874 22.594080 568.946159 37.741935 12.437877 27.287980 63.462469 32.073552 37.463228 20.649643 29.0

Need Calibration 14.275168 9.224388 7.610652 9.307926 4.197164 8.112720 21.588804 1.710099 2.684361 6.236259 17.0

Normal 11.224980 11.117148 10.011704 11.145147 10.016295 10.782746 11.261688 10.450340 11.638033 11.261054 10.9

Power Cut Off -1.000000 -1.000000 -1.000000 NaN 0.000000 -0.779910 NaN -1.000000 -0.163886 -0.503601 0.0

Under Repair 1.312428 0.598308 NaN 0.845784 0.208913 0.126879 11.057540 0.001000 NaN 3.051658 4.4

data.pivot_table(index = ['Status'] , columns = ['Station name(district)','Year'], values = ['Average value']).T


Status  Abnormal  Abnormal Data  Need Calibration  Normal  Power Cut Off  Under Repair
               Station name(district)  Year
Average value  Dobong-gu  2017 48.590909 33.016874 14.275168 11.224980 -1.000000 1.312428

2018 0.000000 22.594080 9.224388 11.117148 -1.000000 0.598308

2019 NaN 568.946159 7.610652 10.011704 -1.000000 NaN

Dongdaemun-gu 2017 NaN 37.741935 9.307926 11.145147 NaN 0.845784

2018 0.004000 12.437877 4.197164 10.016295 0.000000 0.208913

2019 0.920727 27.287980 8.112720 10.782746 -0.779910 0.126879

Dongjak-gu 2017 48.357143 63.462469 21.588804 11.261688 NaN 11.057540

2018 -0.825222 32.073552 1.710099 10.450340 -1.000000 0.001000

2019 58.333333 37.463228 2.684361 11.638033 -0.163886 NaN

Eunpyeong-gu 2017 -1.000000 20.649643 6.236259 11.261054 -0.503601 3.051658

2018 985.000000 29.065445 17.010359 10.986868 0.000000 4.417200

2019 135.523077 31.345793 4.041211 11.162641 -0.998247 -1.000000

Gangbuk-gu 2017 NaN 5.882641 19.976316 9.902053 NaN 0.808267

2018 NaN 19.699577 3.152871 9.153018 NaN NaN

2019 NaN 35.526079 0.525987 11.660219 -0.750000 NaN

Gangdong-gu 2017 0.004000 41.828778 6.681843 12.093712 -0.906727 0.677956

2018 3.670000 33.474940 26.974808 11.329084 10.739705 0.020397

2019 23.614370 30.863651 38.103776 11.317047 -0.732143 NaN

Gangnam-gu 2017 8.618658 28.559846 13.546572 11.346464 -1.000000 19.015797

2018 60.666667 32.408574 21.561715 9.551760 1.199437 NaN

2019 22.492236 39.159623 8.103081 11.485999 -0.894350 0.209211

Gangseo-gu 2017 35.136364 30.009618 13.051222 11.684362 NaN 71.612560

2018 NaN 1276.033138 13.322847 10.199640 NaN NaN

2019 0.356000 33.033797 2.399893 12.059901 -0.264579 NaN

Geumcheon-gu 2017 NaN 15.610723 7.744808 11.250420 0.361765 10.533500

2018 0.000000 41.789786 26.721175 10.428037 3.000000 NaN

2019 NaN 41.785582 50.465668 10.560088 5.933333 NaN

Guro-gu 2017 -1.000000 36.908083 15.561682 11.767185 NaN 0.728972

2018 30.348315 323.860317 18.831494 10.797984 -0.500000 5.948672

2019 686.071429 363.365020 107.636299 10.670323 -0.857143 56.142857

Gwanak-gu 2017 0.990615 44.653805 0.664774 11.527457 0.000000 0.005259

2018 0.052850 176.978533 33.600459 11.851936 NaN 0.031512

2019 851.444136 39.525312 45.361680 12.200562 -0.811741 0.052819

Gwangjin-gu 2017 12.238095 3.793985 10.891158 11.310548 4.291953 11.000000

2018 0.407000 63.569006 49.686069 10.518076 0.494525 3.271569

2019 985.000000 155.738854 106.824375 11.255021 -0.948107 1.760000

Jongno-gu 2017 NaN 30.366409 20.319359 10.753219 -1.000000 1.496383

2018 37.433333 45.937327 13.922225 9.471369 -1.000000 -0.028738

2019 14.246429 34.352941 46.443580 10.100247 1.018583 NaN

Jung-gu 2017 NaN 30.370647 15.013770 10.900571 -1.000000 NaN

2018 -0.257000 43.712846 2.262388 9.708067 -1.000000 NaN

2019 NaN 44.305130 1.949978 10.114266 0.300000 NaN

Jungnang-gu 2017 NaN 24.710845 10.545353 11.638309 NaN 0.876160

2018 0.069069 12.111542 5.280196 9.389546 NaN 0.495436

2019 17.454545 24.153489 10.769362 10.292692 -0.902666 0.087852

Mapo-gu 2017 NaN 13.507082 3.471733 11.561097 0.872958 4.021468

2018 71.243902 277.364171 7.934948 11.501300 -0.978261 0.001011

2019 860.333333 32.266232 42.768801 12.372240 -0.924358 0.114386

Nowon-gu 2017 NaN 22.359850 21.152725 11.354312 NaN 0.865385

2018 20.655902 4.626649 4.166894 10.276375 NaN 0.005423

2019 NaN 48.480000 1.804926 11.033216 -0.888889 0.000654

Seocho-gu 2017 NaN 28.146489 10.057098 11.828383 NaN 0.731882

2018 1037.757009 498.453391 3.877194 10.555635 NaN 5.844039

2019 1105.337500 41.115277 74.668615 11.278722 -0.964286 NaN

Seodaemun-gu 2017 1.882759 22.923758 10.089192 11.794005 -0.378947 0.256582

2018 NaN 10.894273 8.190010 10.839053 -0.812781 0.695815

2019 -1.000000 131.570215 5.278555 10.676471 -0.542500 NaN

Seongbuk-gu 2017 55.417280 24.534495 11.692913 12.089933 NaN 2.710803

2018 0.002119 28.072244 13.162340 10.431501 NaN 31.876290

2019 940.181818 188.727341 60.425807 10.892656 -0.552632 NaN

Seongdong-gu 2017 NaN 74.924624 21.337824 11.568283 0.594342 27.937081

2018 0.182623 29.179528 14.416531 10.914839 -0.714286 5.918474

2019 696.428729 34.989390 70.441988 11.710722 -0.878882 NaN

Songpa-gu 2017 3.000000 39.523537 5.660233 11.416196 2.944444 3.174074

2018 NaN 141.643519 10.238895 10.615511 0.236342 66.205128

2019 NaN 393.692042 71.493105 10.555947 -1.000000 NaN

Yangcheon-gu 2017 0.532609 47.372742 22.051409 12.075341 -0.233333 0.000131

2018 0.000545 6.928410 3.616822 10.161909 NaN 0.021216

2019 NaN 49.227287 1.184988 12.376423 0.000827 0.000000

Yeongdeungpo-gu 2017 0.007309 22.341982 13.280724 12.350276 -0.750000 3.043808

2018 35.215821 190.210017 51.219182 12.610844 -0.500000 4.922618

2019 985.000000 74.842739 36.219209 12.016179 0.000500 NaN

Yongsan-gu 2017 2.040276 32.931914 0.194561 10.683261 NaN 0.032000

2018 0.012273 79.665263 3.469231 9.517789 -0.500000 6.124031

2019 29.677318 31.547504 4.098531 9.569089 -1.000000 NaN
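
Cells such as 985.000000 or NaN may rest on very few rows. Passing a list of aggregation functions to `pivot_table` puts the row count next to the mean, which helps judge how much weight a cell deserves. The sketch below uses toy rows with illustrative values.

```python
import pandas as pd

# Toy rows: one status has a single extreme reading
toy = pd.DataFrame({
    "Status": ["Normal", "Normal", "Normal", "Abnormal"],
    "Year": [2018, 2018, 2019, 2018],
    "Average value": [10.0, 12.0, 11.0, 985.0],
})

# A list of aggregation functions yields one block of columns per function
summary = toy.pivot_table(index="Status", columns="Year",
                          values="Average value", aggfunc=["mean", "count"])
print(summary)
```

Here the Abnormal mean for 2018 comes from a single row, and the Abnormal cell for 2019 is NaN because no rows exist for that combination.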


grouped_data = data.pivot_table(index = ['Status'] , columns = ['Station name(district)','Year'], values = ['Average val


grouped_data.to_csv('grouped_data.csv',index = False)
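
Note that `index = False` leaves the row labels of the pivot table out of the file. The sketch below, on a toy pivot table written to in-memory buffers instead of disk, shows the difference between the two settings; the toy values are illustrative.

```python
import io
import pandas as pd

# Toy pivot table with Status as the index
toy = pd.DataFrame({
    "Status": ["Normal", "Normal", "Abnormal"],
    "Year": [2018, 2019, 2018],
    "Average value": [10.0, 11.0, 985.0],
})
pivot = toy.pivot_table(index="Status", columns="Year", values="Average value")

# Write to in-memory buffers so the two outputs can be compared
without_index = io.StringIO()
pivot.to_csv(without_index, index=False)

with_index = io.StringIO()
pivot.to_csv(with_index)  # index=True is the default

print(without_index.getvalue())
print(with_index.getvalue())
```

If the row labels are needed when the CSV is read back, keeping the default `index=True` may be the safer choice.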

df = data.pivot_table(index='Status',columns=['Item name'],values=['Average value'],aggfunc='median')


df

Average value

Item name CO NO2 O3 PM10 PM2.5 SO2

Status

Abnormal 0.1 0.000 0.001 985.0 21.0 0.001

Abnormal Data 0.9 0.004 0.021 29.0 27.0 0.003

Need Calibration 0.3 0.012 0.013 9.0 4.0 0.003

Normal 0.5 0.025 0.021 35.0 19.0 0.004

Power Cut Off -1.0 -1.000 -1.000 -1.0 -1.0 -1.000

Under Repair 0.1 0.000 0.000 0.0 0.0 0.000

import matplotlib.pyplot as plt
import seaborn as sns


sns.heatmap(df, annot = True, cmap='YlGnBu')
plt.show()

df1 = data.pivot_table(index='Status',columns=['Item name'],values=['Average value'],aggfunc='mean')


sns.heatmap(df1, annot = True,cmap='coolwarm')
plt.show()
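
Because PM10 and PM2.5 sit on a much larger numeric scale than the gases, a single color scale lets those two columns dominate the heatmap. One possible remedy, sketched below, is to min-max scale each column before plotting. The small frame copies a few medians from the Status by Item name table shown earlier; only the scaling step is illustrated, not the plot.

```python
import pandas as pd

# A few medians copied from the Status x Item name table in the notebook
medians = pd.DataFrame(
    {"CO": [0.9, 0.3, 0.5], "PM10": [29.0, 9.0, 35.0]},
    index=["Abnormal Data", "Need Calibration", "Normal"],
)

# Min-max scale each column to [0, 1] so every pollutant uses the full color range
scaled = (medians - medians.min()) / (medians.max() - medians.min())
print(scaled)
```

The scaled frame could then be passed to `sns.heatmap` in place of the raw table, with the raw values supplied through `annot` so the cells still show the original numbers.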
