This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: Investigating sub-city gradients of air quality: lessons learned with low-cost PM2.5 and AOD monitors and machine learning
Abstract. Accurate sub-city fine particulate matter (PM2.5) estimates could improve epidemiological and health-impact studies in cities with heterogeneous distributions of PM2.5, yet most cities globally lack the monitoring density necessary for sub-city-scale estimates. To estimate spatiotemporal variability in PM2.5, we use machine learning (Random Forests; RFs) and concurrent PM2.5 and AOD measurements from the Citizen Enabled Aerosol Measurements for Satellites (CEAMS) low-cost sensor network as well as PM2.5 measurements from the Environmental Protection Agency’s (EPA) reference monitors during wintertime in Denver, CO, USA. The RFs predicted PM2.5 in a 5-fold cross validation (CV) with relatively high skill (95% confidence interval R2=0.74–0.84 for CEAMS; R2=0.68–0.75 for EPA) though the models were aided by the spatiotemporal autocorrelation of the PM2.5 measurements. We found that the most important predictors of PM2.5 were factors associated with pooling of pollution in wintertime, such as low planetary boundary layer heights (PBLH), stagnant wind conditions, and, to a lesser degree, elevation. In general, spatial predictors were less important than spatiotemporal predictors because temporal variability exceeded spatial variability in our dataset. Finally, although concurrent AOD was an important predictor in our RF model for hourly PM2.5, it did not improve model performance with high statistical significance. Regardless, we found that low-cost PM2.5 measurements incorporated into an RF model were useful in interpreting meteorological and geographic drivers of PM2.5 over wintertime Denver. We also explored how the RF model performance and interpretation changes based on different model configurations and data processing.
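As context for the modelling approach described in the abstract, the following is a minimal sketch of a Random Forest evaluated with 5-fold cross validation and a bootstrapped 95% confidence interval on R2. It assumes scikit-learn and uses synthetic placeholder arrays in place of the CEAMS predictors and PM2.5 observations; it illustrates the general technique, not the authors' exact pipeline.

```python
# Minimal sketch: Random Forest regression evaluated with 5-fold cross
# validation and a bootstrapped confidence interval on R^2. The feature and
# target arrays are synthetic stand-ins, not the CEAMS/EPA data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # e.g. PBLH, wind speed, elevation, AOD, ...
y = 2 * X[:, 0] + rng.normal(size=500)   # synthetic PM2.5 target

rf = RandomForestRegressor(n_estimators=500, random_state=0)
y_pred = cross_val_predict(rf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Bootstrap the paired (observation, prediction) samples to build a
# distribution of R^2 and read off a 95% confidence interval.
r2_samples = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))
    r2_samples.append(r2_score(y[idx], y_pred[idx]))
lo, hi = np.percentile(r2_samples, [2.5, 97.5])
print(f"5-fold CV R^2 = {r2_score(y, y_pred):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```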
Status: closed
RC1: 'Comment on acp-2021-751', Anonymous Referee #1, 09 Jan 2022
General comments:
This paper uses a machine learning (ML) method to investigate the spatiotemporal variability of PM2.5 in winter over Denver. Although this is an interesting attempt, I find that the methodology has not been stated clearly enough for me to confirm that the results are scientifically sound in their current form. My main concerns are as follows:
1) Section 2.1.3. Question on the spatial resolution of the meteorological inputs. The meteorological inputs for the RF model are derived from GEOS-FP data with a coarse resolution of 25 km. According to Fig. 1, almost two-thirds of the sites are located in a single grid cell. You mention that the data were interpolated spatially to the CEAMS and EPA sites, but no detailed information is given. What method did you use to downscale the gridded data to point locations? How did you check the accuracy of the interpolated values?
2) Section 2.2.3 and Figure 6. Question on the validation process of the ML model. As far as I understand, you use the whole dataset to tune the RF model with k-fold cross validation, and then use the same dataset to evaluate model performance with k-fold cross validation and bootstrapping. In my opinion, to give an unbiased evaluation of the robustness of the ML model, the validation data should never be used in the training process; otherwise, the accuracy of the model is bound to be high, since it has already learned the pattern. Please clarify if my understanding is wrong (see the nested cross-validation sketch after these general comments).
Line 464-466: “However, even though we do not have confidence that our CEAMS model would have predictive skill for new time periods, we do have more confidence that our interpretation of the top meteorological and geographical relationships is valid under the conditions of the CEAMS campaign.”
I do not agree with this sentence. A well-developed ML model should be able to work on new datasets; that is why we test a model's ability on new data during validation. If the model only works well on the training dataset, it may be overfitting.
3) Question on the temporal resolution of the inputs and outputs. In the 24-hour RF model case, the model inputs and outputs do not share the same temporal resolution. The output/prediction is 24-hour PM2.5, but the meteorological inputs, separated into a daytime (11am-3pm) group and a nighttime (11pm-3am) group, only cover 8 hours of the day. This approach is valid only if you can show that these 8 hours are sufficient to represent the whole day.
Since these comments concern the fundamental methodology of the study, I cannot recommend this study for publication before these questions are addressed.
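For reference, the evaluation scheme the reviewer describes in point 2, in which hyperparameters are tuned only inside each training split so the held-out fold remains unseen, is commonly implemented as nested cross validation. A minimal sketch with scikit-learn follows; the predictor matrix and target are synthetic placeholders, not the paper's data.

```python
# Sketch of nested cross validation: hyperparameters are tuned inside each
# training fold (inner CV) and skill is reported on the corresponding outer
# test fold, which the tuning never sees. X and y are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))                  # placeholder predictors
y = X[:, 0] + rng.normal(scale=0.5, size=400)  # placeholder target

param_grid = {"max_depth": [5, 10, 15], "min_samples_leaf": [1, 2, 5]}
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

tuned_rf = GridSearchCV(
    RandomForestRegressor(n_estimators=200, random_state=0),
    param_grid, cv=inner_cv, scoring="r2",
)
outer_scores = cross_val_score(tuned_rf, X, y, cv=outer_cv, scoring="r2")
print(f"nested CV R^2: {outer_scores.mean():.2f} +/- {outer_scores.std():.2f}")
```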
Specific comments:
1) The aim of this study is to “investigate the potential drivers of fine-scale PM2.5 spatiotemporal variability in wintertime Denver…” (line 93). However, you devote a lot of attention to testing the importance of including co-located AOD measurements in the RF model. What is the reason for singling out this variable among all factors that could contribute to the spatiotemporal variation of PM2.5? The motivation seems weak, especially given your conclusion that adding co-located AOD data yields very little improvement in model prediction (lines 508-509 and 514-515).
2) Line 149-150: “In this study, the Plantower PM2.5 data were not corrected using the time-integrated filter measurements of PM2.5 taken by the AMODs as in Ford et al., (2019)”.
Did you compare the real-time measurements with the time-integrated filter measurements? Are they in good agreement? The word “corrected” suggests that the real-time measurements are not as reliable as the filter measurements. Please rewrite this.
3) Line 372-373: “We also found that the RF models were better at capturing temporal variability than spatial variability during the CEAMS deployment.”
Is Figure S13 the average result over all available monitoring sites? If so, I can only see the model's ability to capture temporal variability, not spatial variability. Please explain this finding in more detail.
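One way to make the spatial-versus-temporal distinction explicit is to score predictions separately on campaign-mean values per site (spatial skill) and on deviations from each site's own mean (temporal skill). The sketch below assumes a pandas DataFrame with hypothetical `site`, `obs`, and `pred` columns; it illustrates the decomposition only and is not taken from the paper.

```python
# Sketch of separating spatial and temporal skill from paired observations
# and cross-validated predictions. The DataFrame is a synthetic placeholder
# with columns: site, obs (measured PM2.5), pred (modelled PM2.5).
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "site": np.repeat([f"site{i}" for i in range(10)], 50),
    "obs": rng.gamma(shape=2.0, scale=5.0, size=500),
})
df["pred"] = df["obs"] + rng.normal(scale=2.0, size=500)  # fake predictions

# Spatial skill: compare campaign-mean PM2.5 across sites
site_means = df.groupby("site")[["obs", "pred"]].mean()
spatial_r2 = r2_score(site_means["obs"], site_means["pred"])

# Temporal skill: compare deviations from each site's own mean
anom = df[["obs", "pred"]] - df.groupby("site")[["obs", "pred"]].transform("mean")
temporal_r2 = r2_score(anom["obs"], anom["pred"])
print(f"spatial R^2 = {spatial_r2:.2f}, temporal R^2 = {temporal_r2:.2f}")
```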
Citation: https://doi.org/10.5194/acp-2021-751-RC1
RC2: 'Comment on acp-2021-751', Anonymous Referee #2, 04 Apr 2022
The authors propose a Random Forest model to predict sub-city-scale PM2.5 concentrations. The case studied is wintertime in Denver, captured by the CEAMS low-cost sensor network on the one hand and the EPA's reference monitors on the other. A permutation metric is applied to infer predictor importance, with a special interest in AOD.
While this is an interesting approach to quantifying the influence of various drivers, I would like to point out some insufficiently discussed methodological choices that might compromise the results.
Main concerns:
- From line 283 I conclude that the model was trained and tested on the same dataset that was used to tune the hyperparameters beforehand. Therefore, the test data cannot strictly be considered unseen. The extent to which this limits the detection of overfitting, and therefore the validity of the results, should at least be discussed. Potential overfitting is also implied by the authors' lack of confidence in the predictive skill of their model on new data (lines 464-466).
- Caveats in the analysis of predictor importance. A citation introducing and discussing the permutation metric seems to be missing. To my knowledge, the current gold standard for deducing predictor importance is Shapley-value-based methods, owing to their favorable theoretical properties; it would therefore be good to justify the choice (presumably computational cost?). In particular, the presence of a competitor like RH, which apparently gained an unfair advantage through the correction factor (lines 428-430), seems to call for a metric in which subsets of predictors are left out of the training. It is also questionable how well models trained on highly autocorrelated data are suited to the importance analysis, as stated in lines 314-316. Further justification is needed.
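For context on the two approaches contrasted here, a minimal sketch computing permutation importance with scikit-learn is shown below, with a Shapley-value alternative indicated in comments (it would require the optional third-party `shap` package). The data and model are synthetic placeholders, not the paper's setup.

```python
# Sketch contrasting permutation importance (shuffle one predictor at a time
# in held-out data and measure the drop in skill) with Shapley-value
# attributions. X, y, and the model are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

perm = permutation_importance(rf, X_te, y_te, n_repeats=20, random_state=0)
print("permutation importances:", perm.importances_mean)

# Shapley-value alternative (requires the optional `shap` package):
# import shap
# shap_values = shap.TreeExplainer(rf).shap_values(X_te)
# print("mean |SHAP|:", np.abs(shap_values).mean(axis=0))
```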
Minor concerns:
- Further investigation of the impact of interpolating the data could be insightful.
- To me, the main purpose of the paper is partly unclear. While transparency about the training and tuning process is important, the extensive explanation of Random Forests, cross validation, and parameter tuning seems somewhat convoluted for a paper whose foremost goal is to investigate the impact of different factors on the spatiotemporal variability of PM2.5, not to serve as a guide to applying RF models.
Technical notes:
- Line 160: consistency in use of special characters in “Angstrom.”
- Line 262: missing hyphen in “over- or underfitting”
- Line 278: “depth of 15, 2 samples needed” – as far as I know, starting a clause with a symbol is considered bad style and also interrupts the reading flow here
- Table 2: The explanation for min_samples_leaf seems misleading, since leaf nodes aren’t split. Do you mean the minimum samples stored in a leaf?
- Line 289: “This process was repeated until a distribution of each error statistic was created” makes it sound as if there were an absolute threshold for how often a process must be repeated before statistics can be applied. Perhaps rather something like: “…repeated to create a distribution…”?
- It seems counterintuitive that the shuffled folds entail more autocorrelation than the consecutive ones. A very brief explanation or some numbers in the supplementary material could be helpful. On a positive note, I appreciate that the topic is addressed at all.
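To make the fold-construction point concrete, the sketch below compares randomly shuffled folds with consecutive (blocked-in-time) folds on a synthetic hourly index: with shuffling, every test sample has training neighbours only an hour or two away, so temporal autocorrelation can leak skill across the split. This is an illustration under assumed placeholder data, not a computation from the paper.

```python
# Sketch of why shuffled folds interact with temporal autocorrelation:
# with shuffling, each test sample has training neighbours only hours away,
# whereas consecutive (blocked) folds keep train and test separated in time.
# The timestamps are placeholders, not the CEAMS data.
import numpy as np
from sklearn.model_selection import KFold

hours = np.arange(500)  # hourly time index

def mean_train_test_gap(cv, t):
    gaps = []
    for train_idx, test_idx in cv.split(t):
        # For each test hour, distance (in hours) to the nearest training hour
        nearest = np.min(np.abs(t[test_idx][:, None] - t[train_idx][None, :]), axis=1)
        gaps.append(nearest.mean())
    return np.mean(gaps)

shuffled = KFold(n_splits=5, shuffle=True, random_state=0)
blocked = KFold(n_splits=5, shuffle=False)
print("mean gap, shuffled folds:   ", mean_train_test_gap(shuffled, hours))
print("mean gap, consecutive folds:", mean_train_test_gap(blocked, hours))
```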
Citation: https://doi.org/10.5194/acp-2021-751-RC2
- AC1: 'Comment on acp-2021-751', Michael Cheeseman, 30 Jul 2022