This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: Investigating sub-city gradients of air quality: lessons learned with low-cost PM2.5 and AOD monitors and machine learning
Abstract. Accurate sub-city fine particulate matter (PM2.5) estimates could improve epidemiological and health-impact studies in cities with heterogeneous distributions of PM2.5, yet most cities globally lack the monitoring density necessary for sub-city-scale estimates. To estimate spatiotemporal variability in PM2.5, we use machine learning (Random Forests, RFs) and concurrent PM2.5 and aerosol optical depth (AOD) measurements from the Citizen Enabled Aerosol Measurements for Satellites (CEAMS) low-cost sensor network, as well as PM2.5 measurements from the Environmental Protection Agency's (EPA) reference monitors, during wintertime in Denver, CO, USA. The RFs predicted PM2.5 in a 5-fold cross-validation (CV) with relatively high skill (95% confidence interval R2 = 0.74–0.84 for CEAMS; R2 = 0.68–0.75 for EPA), though the models were aided by the spatiotemporal autocorrelation of the PM2.5 measurements. We found that the most important predictors of PM2.5 were factors associated with the pooling of pollution in wintertime, such as low planetary boundary layer height (PBLH), stagnant wind conditions, and, to a lesser degree, elevation. In general, spatial predictors were less important than spatiotemporal predictors because temporal variability exceeded spatial variability in our dataset. Finally, although concurrent AOD was an important predictor in our RF model for hourly PM2.5, it did not improve model performance with high statistical significance. Regardless, we found that low-cost PM2.5 measurements incorporated into an RF model were useful for interpreting the meteorological and geographic drivers of PM2.5 over wintertime Denver. We also explored how the RF model performance and interpretation change based on different model configurations and data processing.
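The workflow described in the abstract can be pictured with a minimal sketch (not the authors' pipeline): a Random Forest regressor evaluated with 5-fold cross-validation in scikit-learn. The file name and predictor columns below are hypothetical placeholders.

```python
# Minimal sketch of an RF + 5-fold CV workflow of the kind described in the
# abstract; an illustration only, not the study's actual pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical table of hourly co-located PM2.5 observations and predictors
df = pd.read_csv("ceams_hourly.csv")  # placeholder file name
features = ["aod", "pblh", "wind_speed", "temperature", "rh", "elevation"]
X, y = df[features].values, df["pm25"].values

rf = RandomForestRegressor(n_estimators=500, random_state=0, n_jobs=-1)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# R^2 on each of the five held-out folds
r2_scores = cross_val_score(rf, X, y, cv=cv, scoring="r2")
print(f"5-fold CV R^2: {r2_scores.mean():.2f} +/- {r2_scores.std():.2f}")
```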
Status: closed
RC1: 'Comment on acp-2021-751', Anonymous Referee #1, 09 Jan 2022
General comments:
This paper uses a machine learning (ML) method to investigate the spatiotemporal variability of PM2.5 in winter over Denver. Although this is an interesting attempt, I find that the methodology of this study has not been clearly stated, so I cannot confirm that the results are scientifically sound in their current form. My main concerns are as follows:
1) Section 2.1.3. Question on the spatial resolution of the meteorological inputs. The meteorological inputs for the RF model are derived from the GEOS-FP data, which have a coarse resolution of 25 km. According to Fig. 1, almost two-thirds of the sites are located in one grid cell. You mention that the data were interpolated spatially to the CEAMS and EPA sites, but no detailed information is given. What method did you use to downscale the gridded data to point locations? How did you check the accuracy of the interpolation results?
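For context, one common way to bring a ~25 km gridded field to point locations is bilinear interpolation. The sketch below is only an illustration with hypothetical file, variable, and coordinate names, and assumes the grid uses coordinates called lat/lon; it is not the authors' method.

```python
# Hedged sketch: bilinear interpolation of a gridded meteorological field
# (e.g. GEOS-FP PBLH) to monitor coordinates; placeholder names throughout.
import xarray as xr

ds = xr.open_dataset("geosfp_pblh.nc")  # hypothetical gridded PBLH file
site_lat = xr.DataArray([39.75, 39.91], dims="site")      # example monitor latitudes
site_lon = xr.DataArray([-105.00, -104.83], dims="site")  # example monitor longitudes

# Bilinear interpolation to the monitor points (assumes coords named lat/lon);
# comparing against method="nearest" is one simple check of sensitivity.
pblh_at_sites = ds["PBLH"].interp(lat=site_lat, lon=site_lon, method="linear")
print(pblh_at_sites.values)
```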
2) Section 2.2.3 and Figure 6. Question on the validation process of the ML model. As far as I understand, you use the whole dataset to tune the RF model with the k-fold cross-validation method, and then use the same dataset to validate model performance with k-fold cross-validation and bootstrapping. In my opinion, to give an unbiased evaluation of the robustness of an ML model, the validation dataset should never be used in the training process. Otherwise, the accuracy of the ML model is certain to be high, since the model has already learned the pattern. Please clarify if my understanding is wrong.
Lines 464-466: “However, even though we do not have confidence that our CEAMS model would have predictive skill for new time periods, we do have more confidence that our interpretation of the top meteorological and geographical relationships is valid under the conditions of the CEAMS campaign.”
I do not agree with this statement. A well-developed ML model should be able to work on new datasets; this is why we test a model's ability on new datasets during validation. If the model only works well on the training dataset, it may have an overfitting problem.
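For reference, the separation this concern calls for is usually made explicit with nested cross-validation: hyperparameters are tuned only on inner folds, and skill is reported on outer folds the tuning never saw. The sketch below uses synthetic placeholder data and is not the authors' setup.

```python
# Sketch of nested CV: the inner loop tunes hyperparameters, the outer loop
# reports skill on data the tuning never touched. Synthetic placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))            # placeholder predictors
y = 2 * X[:, 0] + rng.normal(size=500)   # placeholder PM2.5 target

param_grid = {"max_depth": [10, 15, 20], "min_samples_leaf": [1, 2, 5]}
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)

tuned_rf = GridSearchCV(
    RandomForestRegressor(n_estimators=300, random_state=0),
    param_grid, cv=inner_cv, scoring="r2",
)
# Outer-loop scores come only from folds excluded from the inner tuning.
nested_r2 = cross_val_score(tuned_rf, X, y, cv=outer_cv, scoring="r2")
print("Nested CV R^2:", nested_r2.mean())
```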
3) Question on the temporal resolution of the inputs and outputs. In the 24-hour RF model case, the model inputs and outputs do not have the same temporal resolution. The output/prediction is 24-hour PM2.5, but the meteorology inputs, separated into a daytime (11am-3pm) group and a nighttime (11pm-3am) group, only cover eight hours of the day. This method is valid only if you can show that these eight hours are enough to represent the whole day.
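The bookkeeping being questioned here can be pictured with a short pandas sketch; the file and column names are hypothetical and the assignment of overnight hours is deliberately simplified.

```python
# Sketch of daytime (11:00-15:00) and nighttime (23:00-03:00) window means
# built from hourly meteorology; placeholder file and column names only.
import pandas as pd

met = pd.read_csv("met_hourly.csv", parse_dates=["time"]).set_index("time")

day = met.between_time("11:00", "15:00")
night = met.between_time("23:00", "03:00")  # selection wraps across midnight

# Note: with a plain daily resample, the 23:00 hour is counted with the
# previous calendar day; a real implementation would have to decide how the
# overnight hours map onto each 24 h PM2.5 averaging period.
day_mean = day.resample("D").mean().add_suffix("_day")
night_mean = night.resample("D").mean().add_suffix("_night")
daily_predictors = day_mean.join(night_mean)
```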
Since these comments concern the fundamental methodology of the study, I cannot recommend this study for publication before these questions are addressed.
Specific comments:
1) The target of this study is to “investigate the potential drivers of fine-scale PM2.5 spatiotemporal variability in wintertime Denver…” (line 93). However, you pay a lot of attention to testing the importance of including co-located AOD measurements in the RF model. What is the reason for picking this specific variable out of all the factors that could contribute to the spatiotemporal variation of PM2.5? The motivation sounds weak, especially since your conclusion is that adding co-located AOD data makes very little improvement to model prediction (lines 508-509 and 514-515).
2) Lines 149-150: “In this study, the Plantower PM2.5 data were not corrected using the time-integrated filter measurements of PM2.5 taken by the AMODs as in Ford et al. (2019)”.
Did you compare the real-time measurements with the time-integrated filter measurements? Are they in good agreement? The word “corrected” implies that the real-time measurements are not as reliable as the filter measurements. Please rewrite this.
3) Lines 372-373: “We also found that the RF models were better at capturing temporal variability than spatial variability during the CEAMS deployment.”
Is Figure S13 the average of the results for all available monitoring sites? If so, I can only see the model's ability to capture temporal variability, not spatial variability. Please explain this finding in more detail.
Citation: https://doi.org/10.5194/acp-2021-751-RC1
RC2: 'Comment on acp-2021-751', Anonymous Referee #2, 04 Apr 2022
The authors propose a Random Forest model to predict sub-city-scale PM2.5 concentrations. The studied case is wintertime in Denver, captured by CEAMS’ low-cost sensor network on the one hand and the EPA’s reference monitors on the other. A permutation metric is applied to assess predictor importance, with a special interest in AOD.
While this is an interesting approach to quantify the influence of various drivers, I would like to point out some insufficiently discussed choices in applying the methods that might compromise the results.
Main concerns:
- From line 283 I conclude that the model was trained and tested on the same dataset that was used to tune the hyperparameters beforehand. Therefore, the test data cannot strictly be considered unseen. The extent to which this limits the detection of overfitting, and therefore the validity of the results, should at least be discussed. Potential overfitting is also implied by the authors’ lack of confidence in the predictive skill of their model for new data (lines 464-466).
- Caveats in the analysis of predictor importance. A citation introducing and discussing the permutation metric seems to be missing. To my knowledge, the current gold standard for deducing predictor importance is Shapley-value-based methods, owing to their favorable theoretical properties, so it would be good to justify the choice (presumably computational cost?). In particular, the presence of a competitor like RH, which apparently got an unfair advantage from the correction factor (lines 428-430), seems to call for a metric in which subsets of predictors are left out of the training. It is also questionable how well models trained on highly autocorrelated data are suited to the importance analysis, as stated in lines 314-316. Further justification is needed.
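For reference, a minimal sketch of the permutation-importance calculation under discussion, using scikit-learn's implementation on synthetic data with hypothetical predictor names; Shapley-value methods (e.g. the shap package) would be the alternative mentioned above.

```python
# Sketch of permutation importance: the drop in held-out R^2 when one
# predictor is shuffled. Synthetic data and hypothetical predictor names.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=1000)
names = ["pblh", "wind_speed", "rh", "aod"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
for name, imp in sorted(zip(names, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>10s}: {imp:.3f}")
```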
Minor concerns:
- Further investigation of the impact of interpolating the data could be insightful.
- To me, the main purpose of the paper is partly unclear. While transparency about the training and tuning process is important, the extensive explanation of Random Forests, cross validation, and parameter tuning seems a bit convoluted for a paper whose foremost goal is to investigate the impact of different factors on the spatiotemporal variability of PM2.5, and not necessarily to serve as a guide to applying RF models.
Technical notes:
- Line 160: consistency in use of special characters in “Angstrom.”
- Line 262: missing hyphen in “over- or underfitting”
- Line 278: “depth of 15, 2 samples needed” – as far as I know, starting a clause with a symbol is considered bad style and also interrupts the reading flow here
- Table 2: The explanation for min_samples_leaf seems misleading, since leaf nodes aren’t split. Do you mean the minimum samples stored in a leaf?
- Line 289: “This process was repeated until a distribution of each error statistic was created” makes it sound as if there were an absolute threshold on how often to repeat a process before you can apply statistics. Maybe rather something like: “…repeated to create a distribution…”?
- It seems counterintuitive that the shuffled folds entail more autocorrelation than the consecutive ones. A very brief explanation or some numbers in the supplementary material could be helpful. On a positive note, I appreciate the topic is addressed at all.
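One simple way to probe the autocorrelation question in this note is to compare shuffled folds with folds grouped by calendar day, so that hours from the same day never straddle the train/test split. The sketch below uses synthetic placeholder data and is not tied to the study's dataset.

```python
# Sketch comparing shuffled k-fold CV with day-grouped CV; neighbouring hours
# can leak across a shuffled split and inflate R^2. Synthetic placeholder data.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
times = pd.date_range("2021-01-01", periods=1000, freq="h")
X = rng.normal(size=(1000, 4))
y = np.sin(np.arange(1000) / 24) + X[:, 0] + rng.normal(scale=0.3, size=1000)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
shuffled = cross_val_score(rf, X, y, cv=KFold(5, shuffle=True, random_state=0))
by_day = cross_val_score(rf, X, y, cv=GroupKFold(5), groups=times.date)

print("shuffled KFold R^2:   ", shuffled.mean())  # typically optimistic
print("GroupKFold-by-day R^2:", by_day.mean())    # sterner, less-autocorrelated test
```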
Citation: https://doi.org/10.5194/acp-2021-751-RC2
- AC1: 'Comment on acp-2021-751', Michael Cheeseman, 30 Jul 2022
Viewed
- HTML: 1,225
- PDF: 391
- XML: 51
- Total: 1,667
- Supplement: 113
- BibTeX: 45
- EndNote: 46