Meteorology-driven variability of air pollution (PM₁) revealed with explainable machine learning

Stirnberg, Roland; Cermak, Jan; Kotthaus, Simone; Haeffelin, Martial; Andersen, Hendrik; Fuchs, Julia; Kim, Miae; Petit, Jean-Eudes; Favez, Olivier

doi:https://doi.org/10.5194/acp-21-3919-2021

Articles | Volume 21, issue 5

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 17 Mar 2021

Final revised paper (published on 17 Mar 2021)
Preprint (discussion started on 27 Jul 2020)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

EC1: 'Comments from the previous-round of review that yet to be addressed', Leiming Zhang, 04 Aug 2020
- AC1: 'Author response to Intital Comment', Roland Stirnberg, 30 Nov 2020
RC1: 'Review', Anonymous Referee #1, 24 Aug 2020
- AC2: 'Author response to Referee#1', Roland Stirnberg, 30 Nov 2020
RC2: 'review of Stirnberg et al', Anonymous Referee #3, 16 Sep 2020
- AC3: 'Author response to Referee#3', Roland Stirnberg, 30 Nov 2020

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

AR by Roland Stirnberg on behalf of the Authors (07 Dec 2020) Author's response Manuscript

ED: Referee Nomination & Report Request started (08 Dec 2020) by Leiming Zhang

RR by Anonymous Referee #4 (17 Dec 2020)

Suggestions for revision or reasons for rejection

The paper presents a machine learning approach to assess the impact of several meteorological features on air quality in the Paris metropolitan area. A tree-based machine learning algorithm is used for modelling, and SHapley Additive exPlanations (SHAP) are applied to interpret the resulting models. This is quite an interesting study; however, it requires major revisions before possible publication.
In the next version, the authors must address all of the following points:
1) The abstract should be improved. On the one hand, it is too long. On the other hand, important information is missing, such as the accuracy/performance of the models.
2) Even if the approach is interesting, it is quite a local study (Paris area). I would like the authors to provide further discussion of the general impact of their work. In other words, you should discuss to what extent the study has implications for other urban areas worldwide, including cities with more complex terrain than Paris.
3) Lines 163-164. The sentence “Note that PM1 data is not normally distributed, i.e. there is more data available for mid-range concentrations” is awkward. Is it not a characteristic of a normal distribution to have more data in mid-range? Please, clarify this sentence.
4) Section 4.1. It is not clear what the ten models you refer to are. More details must be provided regarding what each model represents, and the acronyms BCwb, BCff, etc. have to be defined.
5) Section 4.2. Why did you focus only on temperature, MLH and wind direction? Given that NO3 fraction and wind speed are also strong drivers, why did you skip an in-depth interpretation of the effects of these variables as well?
6) Line 247. Change “Fig. 6” to “Fig. 5-7”.
7) Lines 282-285. You noticed that north/north-eastern winds increase air pollution and you conclude that this pollution should come from Paris, which is located north-east of SIRTA. Did you confirm this assumption by analysing wind data from Charles de Gaulle Airport? If the hypothesis is true, polluted air should come from the south/south-west in this case. Right?
8) Section 4.2.4. It is not clear which species you are interested in for the interaction analysis. Is it PM1 only? Please be more specific.
9) Figure 8. How do you explain the red cluster in the top-right corner of the right panel? In other words, how do you explain that high wind speed and high MLH tend to increase the SHAP values?
10) Section 4.4, 1st paragraph. This paragraph should be reorganized. You give several details about Figs. 11-16 that are irrelevant here (lines 361-363). On the other hand, this information is missing from the captions of the respective figures.
11) Lines 393-395. You explain the high pollution in terms of weak “north-north-easterly winds, i.e. a regime of low ventilation”. However, it could also be a weak wind that brings pollution from Paris. Please comment on this point.
12) Figures 11-14. The quality of these figures must be improved. First, the legend is too small. Second, panel labels a)-h) are missing in Fig. 11. Third, it is not straightforward to match the bar/scatter plots to the right/left Y-axes. Finally, you do not describe in the caption how the predicted vs. the observed PM1 are represented. So, the caption needs to be improved, based on my comment 10 as well.
13) Conclusion. We understand that your models do a better job in winter and summer than in spring. So, what about fall? Why do you not present data for this period? Is it also more difficult to make a good prediction in this season? If yes, can we conclude that the approach is less suitable for the mid-seasons, maybe because the meteorological conditions are less “extreme” (e.g., average temperature)?
14) Lines 474-475. What evidence supports this quite strong statement? More arguments are expected, especially to address my comment 2).
15) Conclusion, last paragraph. This paragraph is very redundant. We understood from the first sentence that a meteorological prediction is important if we want to use your approach. However, it seems that you repeat the same idea again and again, as shown by the fact that the word “expected” appears three times in the subsequent sentences. This last paragraph must be improved by reorganizing its structure.

ED: Reconsider after major revisions (17 Dec 2020) by Leiming Zhang

AR by Roland Stirnberg on behalf of the Authors (04 Feb 2021) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (05 Feb 2021) by Leiming Zhang

RR by Yves Rybarczyk (08 Feb 2021)

ED: Publish as is (09 Feb 2021) by Leiming Zhang

AR by Roland Stirnberg on behalf of the Authors (14 Feb 2021) Author's response Manuscript

Short summary

Air pollution endangers human health and poses a problem particularly in densely populated areas. Here, an explainable machine learning approach is used to analyse periods of high particle concentrations at a suburban site southwest of Paris to better understand their atmospheric drivers. Air pollution is particularly exacerbated by low temperatures and low mixed layer heights, but processes vary substantially between and within seasons.
