This project investigates how air pollution, specifically PM2.5 and ozone levels, is associated with respiratory health outcomes across regions of the United States. We aim to integrate environmental and public health data to explore spatial patterns and correlations.
- Source: CDC BRFSS
- Format: Fixed-width ASCII (.ASC), converted to CSV
- Key Features: Health responses including asthma prevalence, demographics, and location data
- Key Column: `_STATE` (mapped to full state name)
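
The fixed-width conversion and the `_STATE` mapping could look roughly like the sketch below, which uses `pandas.read_fwf`. The column offsets, the choice of `ASTHMA3` as the asthma variable, and the three-entry FIPS lookup are illustrative assumptions; the real positions come from the CDC layout file and are not reproduced here.

```python
import io

import pandas as pd

# ASSUMPTION: illustrative (start, end) character offsets, not the CDC layout.
COLSPECS = [(0, 2), (2, 3)]
NAMES = ["_STATE", "ASTHMA3"]

# Partial FIPS-code lookup, for illustration only; a full table covers every state.
STATE_NAMES = {6: "California", 36: "New York", 48: "Texas"}


def read_brfss(source):
    """Parse fixed-width records and attach a full state name."""
    df = pd.read_fwf(source, colspecs=COLSPECS, names=NAMES, header=None, dtype=str)
    df["_STATE"] = df["_STATE"].astype(int)           # "06" -> 6
    df["State Name"] = df["_STATE"].map(STATE_NAMES)  # unmapped codes become NaN
    return df


# Made-up records standing in for the .ASC file: 2-char state code + 1-char answer.
sample = io.StringIO("061\n362\n481\n")
brfss = read_brfss(sample)
csv_text = brfss.to_csv(index=False)  # the CSV form used downstream
```

Reading every field as a string first keeps leading zeros intact until each column is converted on purpose.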
- Files used:
  - `annual_conc_by_monitor_2023.csv`: Contains pollutant levels by monitoring station
  - `aqs_sites.csv`: Metadata for monitoring site locations
- Pollutants of Interest:
  - PM2.5 - Local Conditions
  - Ozone
- Key Columns: `State Code`, `County Code`, `Site Num`, `Parameter Name`, `Arithmetic Mean`
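
As a minimal sketch, restricting the annual concentration file to the two pollutants and the key columns listed above might look like this. The sample rows are made up for illustration, and the `Year` column is only there to show that extra columns are dropped.

```python
import pandas as pd

POLLUTANTS = ["PM2.5 - Local Conditions", "Ozone"]
KEY_COLUMNS = ["State Code", "County Code", "Site Num", "Parameter Name", "Arithmetic Mean"]


def select_pollutants(annual):
    """Keep only the pollutants of interest and the key columns."""
    mask = annual["Parameter Name"].isin(POLLUTANTS)
    return annual.loc[mask, KEY_COLUMNS].reset_index(drop=True)


# Made-up rows standing in for annual_conc_by_monitor_2023.csv.
annual = pd.DataFrame(
    {
        "State Code": [6, 6, 48],
        "County Code": [37, 37, 201],
        "Site Num": [1, 1, 1],
        "Parameter Name": ["PM2.5 - Local Conditions", "Carbon monoxide", "Ozone"],
        "Arithmetic Mean": [12.0, 0.3, 0.04],
        "Year": [2023, 2023, 2023],
    }
)
selected = select_pollutants(annual)
```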
- Convert fixed-width BRFSS ASCII data to CSV using column specifications from the CDC-provided layout and codebook.
- Decode categorical variables using SAS `FORMAT23.sas` mappings.
- Join AQS pollutant data with monitoring site metadata to extract `State Name` and `County Name`.
- Aggregate pollutant data by state and county.
- Merge BRFSS and AQS data on `State Name` to begin initial analysis (next step: improve precision using county-level identifiers).
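
The join, aggregation, and merge steps above can be sketched in pandas as follows. This is one possible arrangement, not necessarily the notebook's exact code. It assumes the site metadata shares the three key column names with the pollutant data (rename first if they differ), and all sample values are made up.

```python
import pandas as pd

SITE_KEYS = ["State Code", "County Code", "Site Num"]


def pollutant_by_area(conc, sites):
    """Attach state/county names to monitor rows, then average per area."""
    names = sites[SITE_KEYS + ["State Name", "County Name"]]
    joined = conc.merge(names, on=SITE_KEYS, how="left")
    county = joined.groupby(
        ["State Name", "County Name", "Parameter Name"], as_index=False
    )["Arithmetic Mean"].mean()
    # Unweighted mean over monitors, not a mean of county means.
    state = joined.groupby(
        ["State Name", "Parameter Name"], as_index=False
    )["Arithmetic Mean"].mean()
    return county, state


def merge_with_brfss(brfss, state_levels):
    """One column per pollutant, merged onto survey rows by state name."""
    wide = state_levels.pivot(
        index="State Name", columns="Parameter Name", values="Arithmetic Mean"
    ).reset_index()
    return brfss.merge(wide, on="State Name", how="left")


# Made-up monitor readings and site metadata.
conc = pd.DataFrame(
    {
        "State Code": [6, 6, 6, 48],
        "County Code": [37, 37, 73, 201],
        "Site Num": [1, 2, 1, 1],
        "Parameter Name": ["PM2.5 - Local Conditions"] * 3 + ["Ozone"],
        "Arithmetic Mean": [12.0, 10.0, 8.0, 0.04],
    }
)
sites = pd.DataFrame(
    {
        "State Code": [6, 6, 6, 48],
        "County Code": [37, 37, 73, 201],
        "Site Num": [1, 2, 1, 1],
        "State Name": ["California", "California", "California", "Texas"],
        "County Name": ["Los Angeles", "Los Angeles", "San Diego", "Harris"],
    }
)
brfss = pd.DataFrame({"State Name": ["California", "Texas"], "ASTHMA3": [1, 2]})

county_levels, state_levels = pollutant_by_area(conc, sites)
merged = merge_with_brfss(brfss, state_levels)
```

A left merge keeps every survey row even when a state has no reading for a pollutant; those cells come back as `NaN` rather than being dropped.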
- Enhance BRFSS records with county-level location data if available.
- Conduct statistical analysis to explore correlations between air quality and asthma prevalence.
- Generate maps and visualizations to illustrate geographic disparities.
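
As a rough sketch of the planned correlation step, the snippet below computes an unweighted asthma prevalence per state and its Pearson correlation with a state-level PM2.5 mean. It assumes the asthma answer is coded 1 = yes and 2 = no, with other codes treated as missing, and it ignores BRFSS survey weights, which a real analysis would likely need. The column name `ASTHMA3`, the state labels, and all values are illustrative.

```python
import pandas as pd


def asthma_prevalence(brfss, answer_col="ASTHMA3"):
    """Share of 'yes' answers per state among yes/no responses (unweighted)."""
    answered = brfss[brfss[answer_col].isin([1, 2])]
    share = answered.groupby("State Name")[answer_col].apply(lambda s: (s == 1).mean())
    return share.rename("asthma_prevalence")


# Made-up survey answers; code 9 is dropped as a non-answer.
brfss = pd.DataFrame(
    {
        "State Name": ["State A"] * 5 + ["State B"] * 4 + ["State C"] * 4,
        "ASTHMA3": [1, 2, 2, 2, 9, 1, 1, 2, 2, 1, 1, 1, 2],
    }
)
# Made-up state-level PM2.5 means.
pm25 = pd.Series({"State A": 6.0, "State B": 9.0, "State C": 12.0}, name="pm25_mean")

table = pd.concat([asthma_prevalence(brfss), pm25], axis=1)
r = table["pm25_mean"].corr(table["asthma_prevalence"])  # Pearson by default
```

With only a few dozen state-level points, a coefficient like this describes association only and says nothing about causation.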
- Python (pandas, regex, Altair, Matplotlib pyplot, Folium)
- EPA AirData and CDC BRFSS public data
- Jupyter Notebook environment
- Sachin Murthy
- Heba Jaber
- Dharshana Somasunderam