The revised version has been substantially improved, but I must say I am still somewhat confused about the methodology implemented for training and evaluating the machine learning models.
First, the authors do not perform any tuning of their models, even though tuning may substantially improve the performance of the predictions; instead, they use the default hyper-parameters. Why so? This does not follow good practice in the field. Is this choice made for computational reasons?
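For illustration, below is a minimal sketch of what such tuning could look like (the model family, parameter grid, and split scheme are assumptions on my side, not the authors' actual setup):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Illustrative stand-in for the hourly features/target at one station.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = rng.normal(size=500)

# Hypothetical grid; the relevant hyper-parameters depend on the model family.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [2, 4],
    "learning_rate": [0.05, 0.1],
}

# Time-ordered splits avoid leaking future data into the training folds.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```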
Secondly, cross-validation can be used for two different purposes: (1) tuning the ML model, and/or (2) estimating the performance of the final model. Given that no tuning is performed, I understand the authors are using cross-validation here only to estimate the performance of their models. Regarding their strategy, at each station the authors train 8 different models (model M1 trained on subsets X2-X8 and tested on X1, model M2 trained on X1 and X3-X8 and tested on X2, etc.), which should give them 8 values of RMSE (computed on X1, X2, etc., respectively), or of any other statistical metric they are interested in. A simple and relatively robust way to estimate the (test) performance of their predictions would be to compute the corresponding average RMSE (ideally also providing the standard deviation). Which average RMSEs are obtained following this simple approach? Alternatively, one could first gather all the test subsets on which predictions are made (X1, X2, etc.) and compute the overall RMSE; both options are sketched below. Either approach would provide an estimate of the performance of the predictions.

Then, in a second step, in order to obtain the best possible final ML model, a last model (to be used to make predictions in 2020) could be trained on the entire 2018-2019 dataset, so as to benefit from the largest possible dataset during the training phase. The performance previously estimated could be used as a conservative estimate of the performance of this final model ("conservative" because this final model may perform slightly better than the 8 models previously evaluated, given that it has been trained on a slightly larger dataset).
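To make the two evaluation options concrete, here is a minimal sketch with illustrative data (the model family and data shapes are assumptions, not the authors' exact setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

# Illustrative stand-in for the hourly 2018-2019 record at one station.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 6))
y = rng.normal(size=800)

fold_rmse, all_obs, all_pred = [], [], []
# 8 contiguous (non-shuffled) blocks, i.e. the subsets X1..X8 above.
for train_idx, test_idx in KFold(n_splits=8, shuffle=False).split(X):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_rmse.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))
    all_obs.append(y[test_idx])
    all_pred.append(pred)

# Option 1: average of the 8 fold-wise RMSEs, with its standard deviation.
print(np.mean(fold_rmse), np.std(fold_rmse))
# Option 2: a single overall RMSE over all gathered test predictions.
res = np.concatenate(all_obs) - np.concatenate(all_pred)
print(np.sqrt(np.mean(res ** 2)))
```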
Instead, for a reason I do not really understand, the authors finally consider a new model that is the average of the 8 models initially trained ("Once trained, the final model prediction at each location consists of the average prediction of the eight models."), which sounds strange to me. Then, in order to estimate the performance of this final model, the authors "[omit] the center week of each training segment from the 8-fold cross validation and use it for testing only". Why one week? This part of the methodology seems a bit "baroque" to me, both for evaluating an ML model and for taking the auto-correlation into account. Regarding the auto-correlation, an 8-fold cross-validation is already an improvement over the random splitting proposed in the first version of the manuscript, but I do not really understand why the authors then need to set aside only one week for testing.
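For reference, a more standard way to handle temporal auto-correlation would be blocked cross-validation with a buffer (gap) removed around each test block. A minimal sketch follows (the one-week buffer width is only an example on my side, not a value from the manuscript):

```python
import numpy as np

def blocked_cv_with_gap(n_samples, n_folds=8, gap=7 * 24):
    """Yield (train_idx, test_idx) for contiguous blocks, dropping a
    `gap`-wide buffer (here one week of hourly data) on each side of
    the test block from the training indices."""
    blocks = np.array_split(np.arange(n_samples), n_folds)
    for test_idx in blocks:
        lo, hi = test_idx[0] - gap, test_idx[-1] + gap
        train_mask = np.ones(n_samples, dtype=bool)
        train_mask[max(lo, 0): min(hi + 1, n_samples)] = False
        yield np.flatnonzero(train_mask), test_idx

for train_idx, test_idx in blocked_cv_with_gap(800, n_folds=8, gap=24):
    pass  # fit and evaluate one model per fold, as in the sketch above
```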
These different aspects of the methodology should be clarified and, where necessary, corrected. The choices made need to be comprehensively described and justified, ideally following good practice in the field of machine learning.
About the estimation of the uncertainties (Section 2.3.4), the authors compute the uncertainties as the standard deviation of the model-observation residuals. One potential issue I see here is that this implicitly assumes the individual ML models have no bias, which is roughly true when averaging all models over all stations, but not at individual stations, where the NMB ranges roughly between -20 and +10% (Fig. 3). As an illustration, consider a hypothetical ML model that reproduces the observations perfectly except for a systematic 1 ppbv bias. In this case, the residuals are all equal to 1 ppbv, and the corresponding standard deviation is therefore zero. This model would thus be considered perfect although it is not.
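This can be checked in a few lines (illustrative values):

```python
import numpy as np

# Perfect model except for a +1 ppbv systematic bias: constant residuals.
residuals = np.full(1000, 1.0)
print(np.std(residuals))               # 0.0 -> "zero uncertainty", misleading
print(np.sqrt(np.mean(residuals**2)))  # 1.0 -> RMSE retains the bias
```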
Another aspect is how to translate uncertainties estimated for hourly predictions at a given individual station into uncertainties over a longer period (7 days, for instance) and over an entire country. While it is likely reasonable to consider that uncertainties on longer time scales are reduced by error compensation, this might not always and fully hold along the spatial dimension, where model-observation errors may be at least partly correlated with each other. Consider, for instance, two stations located close to each other. The concentrations observed at these stations might be quite well correlated given the short distance separating them, and so might the ML predictions, given that the input variables are taken from a geophysical model at 25x25 km resolution. Therefore, as I understand it, the model-observation residuals at these two stations might not fully compensate each other, while the authors implicitly assume they do. Under that assumption, the uncertainty on the combination of these two stations is reduced by a factor of 1.4 (= 2^0.5), which might be overly optimistic, as might be the near-zero uncertainties mechanically obtained in countries with numerous stations, as shown in Fig. 5. I think this should be further discussed, and the assumptions used to estimate the uncertainties should be clarified.
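To illustrate: for two stations with error standard deviation sigma and error correlation rho, the standard deviation of the two-station mean is sigma * ((1 + rho) / 2)^0.5, so the 1/2^0.5 reduction only holds when rho = 0. A quick numerical check (the sigma and rho values are arbitrary):

```python
import numpy as np

sigma, n = 1.0, 100_000
rng = np.random.default_rng(0)
for rho in (0.0, 0.5, 1.0):
    # Draw correlated residuals at two nearby stations.
    cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    errors = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # Empirical spread of the two-station mean vs. the analytical value.
    print(rho, errors.mean(axis=1).std(), sigma * np.sqrt((1 + rho) / 2))
# rho = 0 -> ~0.71 * sigma (the assumed 1/2^0.5 reduction)
# rho = 1 -> sigma (no reduction at all)
```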