Technical note: Investigating sub-city gradients of air quality: lessons learned with low-cost PM2.5 and AOD monitors and machine learning
Abstract. Accurate sub-city fine particulate matter (PM2.5) estimates could improve epidemiological and health-impact studies in cities with heterogeneous distributions of PM2.5, yet most cities globally lack the monitoring density necessary for sub-city-scale estimates. To estimate spatiotemporal variability in PM2.5, we use machine learning (Random Forests; RFs) and concurrent PM2.5 and aerosol optical depth (AOD) measurements from the Citizen Enabled Aerosol Measurements for Satellites (CEAMS) low-cost sensor network, as well as PM2.5 measurements from the Environmental Protection Agency’s (EPA) reference monitors, during wintertime in Denver, CO, USA. The RFs predicted PM2.5 in a 5-fold cross validation (CV) with relatively high skill (95% confidence interval R²=0.74–0.84 for CEAMS; R²=0.68–0.75 for EPA), though the models were aided by the spatiotemporal autocorrelation of the PM2.5 measurements. We found that the most important predictors of PM2.5 were factors associated with pooling of pollution in wintertime, such as low planetary boundary layer heights (PBLH), stagnant wind conditions, and, to a lesser degree, elevation. In general, spatial predictors were less important than spatiotemporal predictors because temporal variability exceeded spatial variability in our dataset. Finally, although concurrent AOD was an important predictor in our RF model for hourly PM2.5, it did not improve model performance with high statistical significance. Regardless, we found that low-cost PM2.5 measurements incorporated into an RF model were useful in interpreting meteorological and geographic drivers of PM2.5 over Denver in wintertime. We also explored how the RF model performance and interpretation change with different model configurations and data processing.
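The remark that cross-validated skill was aided by spatiotemporal autocorrelation can be illustrated with a minimal sketch of two ways to build CV folds. This is not the authors' code. The function names (`r_squared`, `random_folds`, `site_folds`) are hypothetical, and the site-grouped split is an assumption, shown only as one common way to test whether random folds flatter a model by letting neighbouring measurements from the same monitor appear in both the training and test sets.

```python
import random
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def random_folds(n_samples, k=5, seed=0):
    """Plain k-fold CV: shuffle sample indices and deal them into k folds.

    Measurements from one monitor can land in both train and test sets.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def site_folds(site_ids, k=5, seed=0):
    """Site-grouped k-fold CV (assumed variant, not from the source).

    Whole monitors are assigned to folds, so a test fold never contains
    a site that the model also saw during training.
    """
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    fold_of_site = {site: i % k for i, site in enumerate(sites)}
    folds = [[] for _ in range(k)]
    for index, site in enumerate(site_ids):
        folds[fold_of_site[site]].append(index)
    return folds
```

Training the same model on both kinds of folds and comparing the resulting R² values gives a rough sense of how much of the reported skill depends on nearby measurements in space and time; a large drop under the site-grouped split would point to autocorrelation rather than to the predictors themselves.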