The paper presents a machine learning approach to assess the impact of several meteorological features on air quality in the Paris metropolitan area. A tree-based machine learning algorithm is used for modelling, and SHapley Additive exPlanations (SHAP) are applied to interpret the resulting models. This is quite an interesting study; however, it requires major revisions before possible publication.
In the next version, the authors must address all of the following points:
1) The abstract should be improved. On the one hand, it is too long. On the other hand, important information is missing, such as the accuracy/performance of the models.
2) Although the approach is interesting, this is a rather local study (Paris area). I would like the authors to provide a further discussion of the general impact of their work. In other words, you should discuss to what extent the study has implications for other urban areas worldwide, including cities with more complex terrain than Paris.
3) Lines 163-164. The sentence “Note that PM1 data is not normally distributed, i.e. there is more data available for mid-range concentrations” is confusing. Is it not characteristic of a normal distribution to have more data in the mid-range? Please clarify this sentence.
4) Section 4.1. It is not clear what the ten models you refer to are. More details must be provided regarding what each model represents, and the acronyms BCwb, BCff, etc. have to be defined.
5) Section 4.2. Why did you focus only on temperature, MLH and wind direction? Given that NO3 fraction and wind speed are also strong drivers, why did you not provide an in-depth interpretation of the effect of these variables as well?
6) Line 247. Change “Fig. 6” to “Fig. 5-7”.
7) Lines 282-285. You noticed that north/north-easterly winds increase air pollution and you conclude that this pollution should come from Paris, which is located north-east of SIRTA. Did you confirm this assumption by analysing wind data from Charles de Gaulle Airport? If the hypothesis is true, polluted air should come from the south/south-west in that case. Right?
8) Section 4.2.4. It is not clear which species you are interested in for the interaction analysis. Is it PM1 only? Please be more specific.
9) Figure 8. How do you explain the red cluster in the top-right corner of the right panel? In other words, how do you explain that high wind speed and high MLH tend to increase the SHAP values?
10) Section 4.4, 1st paragraph. This paragraph should be reorganized. You give several details about Figs 11-16 that are irrelevant here (lines 361-363). On the other hand, this information is missing from the captions of the respective figures.
11) Lines 393-395. You explain the high pollution in terms of weak “north-north-easterly winds, i.e. a regime of low ventilation”. However, a weak wind could also be bringing pollution from Paris. Please comment on this point.
12) Figures 11-14. The quality of these figures must be improved. First, the legend is too small. Second, the panel labels a)-h) are missing in Fig. 11. Third, it is not straightforward to understand which of the bar/scatter plots correspond to the left or right y-axis. Finally, the caption does not describe how the predicted and observed PM1 are represented. The captions therefore need to be improved, also taking my comment 10 into account.
13) Conclusion. I understand that your models perform better in winter and summer than in spring. So, what about autumn? Why do you not present data for this period? Is it also more difficult to obtain a good prediction in this season? If yes, can we conclude that the approach is less suitable for the transitional seasons, maybe because the meteorological conditions are less “extreme” (e.g., average temperature)?
14) Lines 474-475. What evidence supports this rather strong statement? More arguments are expected, especially to address my comment 2).
15) Conclusion, last paragraph. This paragraph is very redundant. It is clear from the first sentence that a meteorological prediction is important if one wants to use your approach. However, the same idea seems to be repeated again and again, as shown by the fact that the word “expected” appears three times in the following sentences. This last paragraph must be improved by reorganizing its structure.