18 Nov 2022
Status: this preprint is currently under review for the journal ACP.

Technical note: Improving the European air quality forecast of Copernicus Atmosphere Monitoring Service using machine learning techniques

Jean-Maxime Bertrand, Frédérik Meleux, Anthony Ung, Gaël Descombes, and Augustin Colette
  • Institut National de l'Environnement Industriel et des Risques (INERIS), Parc Alata, BP2, 60550 Verneuil-en-Halatte, France

Abstract. Model Output Statistics (MOS) approaches relying on machine learning algorithms were applied to downscale regional air quality forecasts produced by CAMS (Copernicus Atmosphere Monitoring Service) at hundreds of monitoring sites across Europe. Besides the CAMS forecast, the predictors in the MOS typically include meteorological variables as well as ancillary data. We first explored a “local” approach in which a specific model is trained at each site. An alternative “global” approach, in which a single model is trained with data from the whole geographical domain, was also investigated. In both cases, local predictors are used for a given station in predictive mode. Because of its global nature, the latter approach can capture a variety of meteorological situations within a very short training period and is therefore better suited to the operational constraints related to the training of the MOS (frequent upgrades of the modelling system, addition of new monitoring sites). Both approaches were implemented using a variety of machine learning algorithms: random forest, gradient boosting, and standard and regularized multi-linear models. The quality of the MOS predictions is evaluated in this work for four key pollutants, namely particulate matter PM10 and PM2.5, ozone O3 and nitrogen dioxide NO2, according to scores based on the predictive errors and on the detection of pollution peaks (exceedances of the regulatory thresholds). Both the local and the global approaches significantly improve on the performance of the raw Ensemble forecast. The most important result of this study is that the global approach competes with, and in some cases can even outperform, the local approach. The global approach gives the best RMSE scores when relying on a random forest model, for the prediction of daily mean, daily max and hourly concentrations. By contrast, the gradient boosting model is better suited for the detection of exceedances of the European Union regulated threshold values for O3 and PM10.
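The two families of scores mentioned in the abstract (error-based scores and scores for the detection of threshold exceedances) can be illustrated with a minimal sketch. The abstract does not give the exact score definitions, so the probability of detection and false alarm ratio used here are an assumption, as are the function names and the made-up concentration values. The threshold of 50 µg/m³ corresponds to the EU daily limit value for PM10 and is used only as an example.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def exceedance_counts(observed, predicted, threshold):
    """Count hits, misses and false alarms for values above a threshold."""
    hits = misses = false_alarms = 0
    for o, p in zip(observed, predicted):
        obs_exceeds = o > threshold
        pred_exceeds = p > threshold
        if obs_exceeds and pred_exceeds:
            hits += 1
        elif obs_exceeds:
            misses += 1
        elif pred_exceeds:
            false_alarms += 1
    return hits, misses, false_alarms


def detection_scores(observed, predicted, threshold):
    """Probability of detection and false alarm ratio (assumed score choice)."""
    hits, misses, false_alarms = exceedance_counts(observed, predicted, threshold)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return {"pod": pod, "far": far}


# Hypothetical daily mean PM10 values in µg/m³ (illustrative, not from the paper).
observed = [30, 55, 62, 40, 48]
raw_forecast = [25, 45, 52, 38, 51]
mos_forecast = [31, 53, 58, 42, 47]
threshold = 50  # EU daily limit value for PM10, used here as an example

for name, forecast in [("raw", raw_forecast), ("MOS", mos_forecast)]:
    scores = detection_scores(observed, forecast, threshold)
    print(name, round(rmse(observed, forecast), 2), scores)
```

Keeping the two score families separate reflects the point made in the abstract: a model that minimizes RMSE is not necessarily the one that best detects exceedances.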

Status: open (until 30 Dec 2022)

Short summary
Post-processing methods based on machine learning algorithms were applied to refine the concentration forecasts of four key pollutants at monitoring sites across Europe. Performance improves significantly compared with the raw outputs of the deterministic model. Taking advantage of the large extent of the modelling domain, an innovative “global” approach is proposed to drastically reduce the period necessary to train the models and thus facilitate implementation in an operational context.
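The difference between the “local” and “global” training strategies can be sketched as follows. This is an illustration under strong assumptions, not the authors' implementation: a one-predictor least-squares fit stands in for the random forest, gradient boosting and multi-linear models used in the study, the raw forecast is the only predictor, and all station names and values are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_local(records):
    """Local approach: one model per station, trained on that station's data only."""
    by_station = defaultdict(list)
    for station, forecast, observed in records:
        by_station[station].append((forecast, observed))
    return {
        station: fit_line([f for f, _ in pairs], [o for _, o in pairs])
        for station, pairs in by_station.items()
    }


def train_global(records):
    """Global approach: a single model trained on data pooled over all stations."""
    return fit_line([f for _, f, _ in records], [o for _, _, o in records])


def predict(model, forecast):
    """Apply a fitted (slope, intercept) model to a raw forecast value."""
    slope, intercept = model
    return slope * forecast + intercept


# Hypothetical training records: (station, raw forecast, observation).
records = [
    ("A", 20.0, 27.0),
    ("A", 30.0, 39.0),
    ("B", 10.0, 15.0),
    ("B", 40.0, 51.0),
]

local_models = train_local(records)
global_model = train_global(records)

# A station with no training history (here "C") has no local model,
# but the global model can still be applied to its raw forecast.
print("C" in local_models, predict(global_model, 25.0))
```

With two days of data, each local model sees two samples while the global model sees four, which is the mechanism by which pooling over the domain shortens the required training period.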