Articles | Volume 21, issue 5
Research article
17 Mar 2021

Meteorology-driven variability of air pollution (PM1) revealed with explainable machine learning

Roland Stirnberg, Jan Cermak, Simone Kotthaus, Martial Haeffelin, Hendrik Andersen, Julia Fuchs, Miae Kim, Jean-Eudes Petit, and Olivier Favez

Air pollution, in particular high concentrations of particulate matter smaller than 1 µm in diameter (PM1), continues to be a major health problem, and meteorology is known to substantially influence atmospheric PM concentrations. However, the scientific understanding of the ways in which complex interactions of meteorological factors lead to high-pollution episodes is inconclusive. In this study, a novel, data-driven approach based on empirical relationships is used to characterize and better understand the meteorology-driven component of PM1 variability. A tree-based machine learning model is set up to reproduce concentrations of speciated PM1 at a suburban site southwest of Paris, France, using meteorological variables as input features. The model is able to capture the majority of the variance of mean afternoon total PM1 concentrations (coefficient of determination (R2) of 0.58), with model performance depending on the individual PM1 species predicted. Based on the models, individual, season-specific meteorological influences are isolated and quantified for process understanding at the measurement site using SHapley Additive exPlanation (SHAP) regression values. Model results suggest that winter pollution episodes are often driven by a combination of shallow mixed layer heights (MLHs), low temperatures, low wind speeds, or inflow from northeastern wind directions. Contributions of MLHs to the winter pollution episodes are quantified to be on average ∼5 µg/m3 for MLHs below 500 m a.g.l. Temperatures below freezing initiate formation processes and increase local emissions related to residential heating, amounting to a contribution to predicted PM1 concentrations of as much as ∼9 µg/m3. Northeasterly winds are found to contribute ∼5 µg/m3 to predicted PM1 concentrations (combined effects of u- and v-wind components), by advecting particles from source regions, e.g. central Europe or the Paris region.
Meteorological drivers of unusually high PM1 concentrations in summer are temperatures above ∼25 °C (contributions of up to ∼2.5 µg/m3), dry spells of several days (maximum contributions of ∼1.5 µg/m3), and wind speeds below ∼2 m/s (maximum contributions of ∼3 µg/m3), which cause a lack of dispersion. High-resolution case studies are conducted showing a large variability of processes that can lead to high-pollution episodes. The identification of these meteorological conditions that increase air pollution could help policy makers to adapt policy measures, issue warnings to the public, or assess the effectiveness of air pollution measures.

1 Introduction

Air pollution has serious implications for human well-being, including deleterious effects on the cardiovascular system and the lungs (Hennig et al., 2018; Lelieveld et al., 2019) and an increased number of asthma seizures (Hughes et al., 2018). This includes particles smaller than 1 µm in diameter (PM1), which are associated with fits of coughing (Yang et al., 2018) and an increase in emergency hospital visits (Chen et al., 2017). The adverse health effects lead to an increase in mortality of people exposed to high particle concentrations (Samoli et al., 2008, 2013; Lelieveld et al., 2015). People living in urban areas are particularly affected by poor air quality, and with increasing urbanization their number is projected to grow (Baklanov et al., 2016; Li et al., 2019). These developments have motivated several countermeasures to improve air quality. Proposed efforts to reduce anthropogenic particle emissions include partial traffic bans (Su et al., 2015; Dey et al., 2018) and the reduction of solid fuel use for domestic heating (Chafe et al., 2014). Although emissions play an important role for PM concentrations in the atmosphere, meteorological conditions related to large-scale circulation patterns as well as local-scale boundary layer processes and interactions with the land surface are major drivers of PM variability as well (Cermak and Knutti, 2009; Bressi et al., 2013; Megaritis et al., 2014; Dupont et al., 2016; Petäjä et al., 2016; Yang et al., 2016; Li et al., 2017). Wind speed and direction generally have a strong influence on air quality as they determine the advection of pollutants (Petetin et al., 2014; Petit et al., 2015; Srivastava et al., 2018). Limiting the vertical exchange of air masses, the mixed layer height (MLH) governs the volume of air in which particles are typically dispersed.
Although some authors indicate that mixed layer height cannot be related directly to concentrations of pollutants and that other meteorological parameters and local sources need to be considered (Geiß et al., 2017), a lower MLH can increase PM concentrations as particles are not mixed into higher atmospheric levels and accumulate near the ground (Gupta and Christopher, 2009; Schäfer et al., 2012; Stirnberg et al., 2020).

Higher MLHs in combination with high wind speeds increase atmospheric ventilation processes, thus decreasing near-surface particle concentrations (Sujatha et al., 2016; Wang et al., 2018). Air temperature can influence PM concentrations in multiple ways, e.g. by modifying the emission of secondary PM precursors such as volatile organic compounds (VOCs) during summer (Fowler et al., 2009; Megaritis et al., 2013; Churkina et al., 2017), and through the condensation of high saturation vapour pressure compounds such as nitric acid and sulfuric acid (Hueglin et al., 2005; Pay et al., 2012; Bressi et al., 2013; Megaritis et al., 2014). The wet removal of particles by precipitation is known to be an efficient atmospheric aerosol sink (Radke et al., 1980; Bressi et al., 2013), while moisture in the atmosphere can stimulate secondary particle formation processes (Ervens et al., 2011). Although all these atmospheric conditions and processes have been identified as drivers of local air quality, it is usually a complex combination of meteorological and chemical processes that leads to the formation of high-pollution events (Petit et al., 2015; Dupont et al., 2016; Stirnberg et al., 2020).

The metropolitan area of Paris is one of the most densely populated and industrialized areas in Europe. Thus, air quality is a recurring issue and has been the focus of many studies in recent years (Bressi et al., 2014; Petetin et al., 2014; Petit et al., 2015, 2017; Dupont et al., 2016; Srivastava et al., 2018). Results indicate that the Paris metropolitan region is often affected by mid-range to long-range transport of pollutants, as the city’s flat orography frequently allows an efficient horizontal exchange of air masses (Bressi et al., 2013; Petit et al., 2015). High-pollution events commonly occur in late autumn, winter, and early spring. Often, these episodes are characterized by stagnant atmospheric conditions and a combination of local contributions, e.g. traffic or residential emissions, and regionally transported particles, such as ammonium nitrates from manure spreading or sulfates from point sources (Petetin et al., 2014; Petit et al., 2014, 2015; Srivastava et al., 2018). High-pressure conditions with air masses originating from continental Europe (Belgium, Netherlands, western Germany) are generally associated with an increase in particle concentrations, especially of secondary inorganic aerosols (SIAs; Bressi et al., 2013; Srivastava et al., 2018). The regional contribution has been found to be approximately 70 % for background concentrations in Paris of particles with a diameter smaller than 2.5 µm (Petetin et al., 2014). Hence the variability between high-pollution episodes in terms of timing, sources, and meteorological boundary conditions is considerable (Petit et al., 2017). Previous approaches to determine meteorological drivers of air pollution included, for example, the use of chemical transport models (CTMs), which, however, require comprehensive knowledge on emission sources and secondary particle formation pathways and are associated with considerable uncertainties (Sciare et al., 2010; Petetin et al., 2014; Kiesewetter et al., 2015).
Further methods rely on data exploration, e.g. the statistical analysis of time series (Dupont et al., 2016), which can be coupled with positive matrix factorization (PMF; Paatero and Tapper, 1994) to derive PM sources (Petit et al., 2014; Srivastava et al., 2018). To take into account the interconnected nature of PM drivers, multivariate statistical approaches such as principal component analysis (PCA) have been applied (Chen et al., 2014; Leung et al., 2017). In recent years, machine learning techniques have been increasingly used to expand the analysis of PM concentrations with respect to meteorology, allowing general patterns to be retraced (Hu et al., 2017; Grange et al., 2018).

Here, the multivariate and highly interconnected nature of meteorology-dependent atmospheric processes influencing local PM1 concentrations at a suburban site southwest of Paris is analysed in a data-driven way. To this end, a state-of-the-art explainable machine learning model is set up to reproduce the variability of PM1 concentrations, thereby capturing empirical relationships between PM1 concentrations and meteorological parameters. The goal is to separate and quantify influences of the meteorological variables on PM1 concentrations to advance the process understanding of the complex mechanisms that govern pollution concentrations at the measurement site. Localized (i.e. situation-based) and individualized attributions of feature contributions are performed using SHapley Additive exPlanation (SHAP) regression values (Lundberg and Lee, 2017; Lundberg et al., 2019, 2020), allowing the meteorology-dependent processes driving PM concentrations at high temporal resolution to be inferred. Typical situations that lead to high PM1 concentrations are identified, serving as a decision support to policymakers to issue preventative warnings to the public if these situations are to be expected.
In addition, by directly accounting for meteorological effects on PM1 concentrations, such a machine-learning-based framework could help in assessing the effectiveness of measures towards better air quality. Furthermore, the proposed ML framework can be viewed as a first step towards a data-driven, prognostic tool in operational air quality forecasting, complementary to CTM approaches.

2 Data sets

Seven years (2012–2018) of meteorological and air quality data from the Site Instrumental de Recherche par Télédétection Atmosphérique (SIRTA; Haeffelin et al., 2005) supersite are the basis of this study. The SIRTA Atmospheric Observatory is located about 25 km southwest of Paris (48.713° N, 2.208° E; Fig. 1). This study focuses on day-to-day variations of total and speciated PM1, a highly health-relevant fraction of PM including small particles that can penetrate deep into the lungs (Yang et al., 2018; Chen et al., 2017). To separate diurnal effects, e.g. the development of the boundary layer during morning hours (Petit et al., 2014; Dupont et al., 2016; Kotthaus and Grimmond, 2018a), from day-to-day variations of PM1, mean concentrations of total and speciated PM1 for the afternoon period 12:00–15:00 UTC are considered, when the boundary layer is fully developed. In Sects. 2.1 and 2.2, the PM1 and meteorological data and preprocessing steps before setting up the machine learning model are described. The applied machine learning model and data analysis techniques are presented in Sects. 3.1 and 3.2.

Figure 1. Location of the SIRTA supersite southwest of Paris. © OpenStreetMap contributors 2020. Distributed under a Creative Commons BY-SA License.

2.1 Submicron particle measurements

Aerosol chemical speciation monitor (ACSM; Ng et al., 2011) measurements are conducted at SIRTA in the framework of the ACTRIS project. The ACSM provides continuous and near-real-time measurements of the major chemical composition of non-refractory submicron aerosols, i.e. organics (Org), ammonium (NH₄⁺), sulfate (SO₄²⁻), nitrate (NO₃⁻), and chloride (Cl). A detailed description of its functionality can be found in Ng et al. (2011). The data processing and validation protocol can be found in Petit et al. (2015) and Zhang et al. (2019). In addition, black carbon (BC) has been monitored by a seven-wavelength Magee Scientific Aethalometer AE31 from 2011 to mid-2013, and a dual-spot AE33 (Drinovec et al., 2015) from mid-2013 onwards. The consistency of both instruments has been checked in Petit et al. (2014). Using the multispectral information, a differentiation into fossil-fuel-based BC (BCff) and BC from wood burning (BCwb) is achieved (Sciare et al., 2010; Healy et al., 2012; Petit et al., 2014; Zhang et al., 2019). Here, the sum of all measured species is assumed to represent the total PM1 content (see Petit et al., 2014, 2015). The consistency of ACSM and Aethalometer measurements is checked by comparing the sum of all monitored species with measurements of a nearby Tapered Element Oscillating Microbalance equipped with a Filter Dynamic Measurement System (TEOM-FDMS). PM1 measurements are representative of suburban background pollution levels of the region of Paris (Petit et al., 2015). As an additional input to the machine learning model, the average fraction of NO₃⁻ of the previous day is added (NO3_frac). Pollution events dominated by NO₃⁻ are often linked to regional-scale events, which depend on anthropogenically influenced processes in the source regions of NO₃⁻ precursors (Petit et al., 2017).
This is approximated by the inclusion of the average fraction of NO₃⁻ of the previous day, assuming that a high fraction of NO₃⁻ indicates the occurrence of such an anthropogenically influenced regime.

2.2 Meteorological data

Following the objective of this study, a set of meteorological variables is chosen as inputs for the ML model. These variables either influence PM concentrations directly via dilution (MLH, wind speed (ws)), wet scavenging of particles (precipitation), and particle transport (wind direction as u, v components; air pressure, AirPres), or serve as a proxy for emissions (e.g. from residential heating: temperature at a height of 2 m (T)) or for transformation processes (total incoming solar radiation (TISR), relative humidity (RH), T). Data are taken from the quality-controlled and 1 h averaged re-analysed observation (ReObs) data set. Further information on the instrumentation used for the acquisition of these variables is provided in Chiriaco et al. (2018). MLH is derived from automatic lidar and ceilometer (ALC) measurements of a Vaisala CL31 ceilometer using the CABAM algorithm (Characterising the Atmospheric Boundary layer based on ALC Measurements; Kotthaus and Grimmond, 2018a, b). Due to an instrument failure, during the period July to mid-November 2016, SIRTA ALC measurements had to be replaced with measurements conducted at the Paris Charles de Gaulle Airport, located northeast of Paris. A comparison of measured MLHs at SIRTA and Charles de Gaulle Airport for the available measurements in 2016 (Appendix A) shows generally good agreement, which is why only minor uncertainties are expected due to the replacement.

Meteorological factors are chosen as input features for the statistical model based on findings of previous studies (see Sect. 1). Meteorological observations are converted to suitable input information for the statistical model (see Sect. 3.1). Wind speed (ws) is derived from the ReObs u and v components [m/s], and the maximum wind speed of the afternoon period (12:00–15:00 UTC) is included in the model. The u and v wind components are then normalized to values between 0 and 1, so that they only carry the direction information. To reduce the impact of short-term fluctuations in wind direction, the 3 d running mean is calculated based on the normalized u and v wind components (umean and vmean). Hours since the last precipitation event (Tprec) are counted and used as input to capture the particle accumulation effect between precipitation events (Rost et al., 2009; Petit et al., 2017).
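
The preprocessing steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the normalization is assumed to be a division by wind speed followed by a linear rescaling from [-1, 1] to [0, 1] (the text only states that the result lies between 0 and 1), and the running mean is assumed to be a trailing window over daily values.

```python
import math

def wind_speed(u, v):
    """Wind speed [m/s] from the u and v components."""
    return math.hypot(u, v)

def direction_only(u, v):
    """Assumed normalization: unit vector rescaled from [-1, 1] to [0, 1]."""
    ws = math.hypot(u, v)
    if ws == 0.0:
        return 0.5, 0.5  # calm conditions carry no direction information
    return (u / ws + 1.0) / 2.0, (v / ws + 1.0) / 2.0

def running_mean(values, window):
    """Trailing running mean; early entries use the values available so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def hours_since_precip(hourly_precip):
    """Hours since the last hour with precipitation (None before the first event)."""
    out, counter = [], None
    for amount in hourly_precip:
        if amount > 0:
            counter = 0
        elif counter is not None:
            counter += 1
        out.append(counter)
    return out

# Hypothetical values for illustration only
u_norm, v_norm = direction_only(3.0, 4.0)
u_mean = running_mean([1.0, 2.0, 3.0, 4.0], window=3)
t_prec = hours_since_precip([0.0, 1.2, 0.0, 0.0, 0.4, 0.0])
```

With daily afternoon values as input, `window=3` corresponds to the 3 d running mean used for umean and vmean.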

3 Methods

3.1 Machine learning model: technique and application

Gradient boosted regression trees (GBRTs, used here in a Python 3.6.4 environment with the scikit-learn module; Friedman, 2002; Pedregosa et al., 2012) are applied to predict daily total and speciated PM1 concentrations. As a tree-based method, GBRTs use a tree regressor, which sets up decision trees based on a training data set. The trees split the training data along decision nodes, creating homogeneous subsamples of the data by minimizing the variance of each subsample. For each subsample, regression trees fit the mean response of the model to the observations (Elith et al., 2008). To increase confidence in the model outputs, decision trees are combined to form an ensemble prediction. Trees are sequentially added to the ensemble (Elith et al., 2008; Rybarczyk and Zalakeviciute, 2018), and each new tree is fitted to the residual error of its predecessors using gradient descent (Friedman, 2002). This is an advantage of GBRT over standard ensemble tree methods (e.g. random forests (RF); Just et al., 2018) as trees are built systematically and fewer iterations are required (Elith et al., 2008). Characteristics of the meteorological training data set with respect to observed total and speciated PM1 concentrations are conveyed to the statistical model. The learned relationships are then used for model interpretation and to produce estimates of PM1 based on unseen meteorological data to test the model. The architecture of the statistical model is determined by the hyperparameters, e.g. the number of trees, the maximum depth of each tree (i.e. the maximum number of successive splits between the root and a leaf), and the learning rate (i.e. the magnitude of the contribution of each tree to the model outcome, which is effectively the step size of the gradient descent). The hyperparameters are tuned by executing a grid search, systematically testing previously defined hyperparameter combinations and determining the best combination via a three-fold cross-validation.
Note that PM1 data are not uniformly distributed; i.e. there are more data available for mid-range PM1 concentrations. To avoid the model primarily optimizing its predictions on these values, a least-squares loss function is chosen. This loss function is more sensitive to higher PM1 values (i.e. outliers of the PM1 data distribution), as it strongly penalizes high absolute differences between predictions and observations. Accordingly, the model is adjusted to reproduce higher concentrations as well. For each PM species, a specific GBRT model is set up and used for the analysis of meteorological influences on individual PM1 species (see Sect. 4.2). Additionally, a quasi-total PM1 model is used to reproduce the sum of all species at once, which is used for an analysis of meteorological drivers of high-pollution events (see Sect. 4.3 and 4.4). Training and test data sets to evaluate each model are created by randomly splitting the full data set. These splits, however, are the same for the species models and the full PM1 model to ensure comparability between the models. Three-quarters of the data are used for training and hyperparameter tuning with cross-validation (n=1086), and one-quarter for testing (n=363). In addition, the robustness of the model results is tested by repeating this process 10 times, resulting in 10 models with different training–test splits and different hyperparameters.
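
The sequential residual fitting with a least-squares loss described in this section can be illustrated with a deliberately small sketch. It is not the scikit-learn implementation used in the study: it assumes a single input feature, trees of depth 1 (stumps), and hypothetical toy values for MLH and PM1. All function names are illustrative.

```python
def fit_stump(x, residuals):
    """Depth-1 tree: threshold on x that minimizes the squared error of the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # drop the maximum so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def fit_boosting(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean and add stumps fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For a least-squares loss the negative gradient equals the residual
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * predict_stump(stump, xi)
                 for xi, pi in zip(x, preds)]
    return base, learning_rate, stumps

def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

# Hypothetical toy data: MLH [m a.g.l.] and PM1 [µg/m3]
mlh = [200.0, 400.0, 600.0, 900.0, 1200.0, 1600.0]
pm1 = [18.0, 14.0, 10.0, 6.0, 5.0, 5.0]
model = fit_boosting(mlh, pm1)
fitted = [predict(model, xi) for xi in mlh]
```

The number of trees, the tree depth (fixed to 1 here), and the learning rate are the quantities that the grid search with three-fold cross-validation would tune in the full model.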

3.2 Explaining model decisions to infer processes: SHapley Additive exPlanation (SHAP) values

While being powerful predictive models, tree-based machine learning methods also have a high interpretability (Lundberg et al., 2020). In order to understand physical mechanisms on the basis of model decisions, the contributions of the meteorological input features to the model outcome are analysed. Feature contributions are attributed using SHAP values, which allow for an individualized, unique feature attribution for every prediction (Shapley, 1953; Lundberg and Lee, 2017; Lundberg et al., 2019, 2020). SHAP values provide a deeper understanding of model decisions than the relatively widely used partial dependence plots (Friedman, 2001; Goldstein et al., 2015; Fuchs et al., 2018; Lundberg et al., 2019; McGovern et al., 2019; Stirnberg et al., 2020). Partial dependence plots show the global mean effect of an input feature on the model outcome, while SHAP values quantify the feature contribution to each single model output, accounting for multicollinearity. Feature contributions are calculated from the difference in model outputs with that feature present, versus outputs for a retrained model without the feature. Since the effect of withholding a feature depends on other features in the model due to interactive effects between the features, differences are computed for all possible feature subset combinations of each data instance (Lundberg and Lee, 2017).
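
For reference, the weighted average over feature subsets described above corresponds to the classic Shapley regression value as formulated by Lundberg and Lee (2017). The notation is introduced here for illustration: F is the set of all input features, S a subset that excludes feature i, f_S a model using only the features in S, and x_S the corresponding input values.

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
         \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
         \Bigl[ f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S\bigl(x_S\bigr) \Bigr]
```

The factorial weight is the fraction of feature orderings in which exactly the features in S precede feature i, so that φ_i is the average marginal contribution of feature i over all orderings.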

Adding the SHAP values of all input features at a single time step to the base value yields the final model prediction of PM1 concentrations. The base value is the mean prediction when taking into account all possible input feature combinations. SHAP values can be negative: negative (positive) SHAP values reduce (raise) the prediction below (above) the base value. The higher the absolute SHAP value of a feature, the more distinct is the influence of that feature on the model predictions. An example of breaking down a model prediction into feature contributions using SHAP values is shown schematically in Fig. 2. The computation of traditional Shapley regression values is time consuming, since a large number of feature combinations have to be included. The SHAP framework for tree-based models allows a faster computation compared to full Shapley regression values while maintaining a high accuracy (Lundberg and Lee, 2017; Lundberg et al., 2019) and is therefore used here. The SHAP Python implementation is used for the computation of SHAP values (last access: 15 March 2021).
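
The additivity of SHAP values (base value plus feature contributions equals the prediction) can be checked with a brute-force computation on a toy example. This sketch makes several assumptions that differ from the study: the model is a hypothetical analytic function rather than a fitted GBRT, absent features are replaced by values from a small background sample instead of retraining the model, and all subsets are enumerated explicitly rather than using the fast tree-based algorithm of the SHAP package.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    """Hypothetical PM1 response [µg/m3] to (T, MLH, ws); not the study's model."""
    t, mlh, ws = x
    return 12.0 - 0.2 * t - 0.004 * mlh - 0.8 * ws + 0.001 * mlh * ws

def value(subset, x, background):
    """Mean output with features in `subset` fixed to x, others taken from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[i] if i in subset else row[i] for i in range(len(x))]
        total += toy_model(mixed)
    return total / len(background)

def shapley_values(x, background):
    """Exact Shapley values by enumerating all feature subsets."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i}, x, background)
                without_i = value(set(subset), x, background)
                contribution += weight * (with_i - without_i)
        phi.append(contribution)
    return phi

# Hypothetical background sample and a cold, shallow-MLH, low-wind situation
background = [(-2.0, 300.0, 1.0), (5.0, 800.0, 3.0),
              (15.0, 1500.0, 4.0), (26.0, 1800.0, 1.5)]
x = (-3.0, 250.0, 0.8)
base_value = value(set(), x, background)
phi = shapley_values(x, background)
prediction = base_value + sum(phi)
```

In this toy setup, the temperature contribution is positive because the chosen temperature lies below the background mean, while MLH and wind speed share the contribution of their interaction term.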

The interactions of input features contribute to the model output and thus reflect empirical patterns that are important to deepen the process understanding. Interactive effects are defined as the difference between the SHAP values for one feature when a second feature is present and the SHAP values for the one feature when the other feature is absent (Lundberg et al., 2019).

Figure 2. Conceptual figure illustrating the interaction of SHAP values and model output. Starting with a base value, which is the mean prediction if all data points are considered, positive SHAP values (blue) increase the final prediction of total and speciated PM1 concentrations, while negative SHAP values (red) decrease the prediction. Added to the base value, the sum of the SHAP values of all input features yields the final prediction. Depending on whether positive or negative SHAP values dominate, the prediction is higher or lower than the base value (Lundberg et al., 2018). Adapted from (last access: 15 March 2021).


4 Results and discussion

4.1 Model performance

The performance of the species and total PM1 models, each with 10 model iterations (each of which has different hyperparameters), is assessed by comparing the coefficient of determination (R2) and normalized root mean square error (NRMSE) for the independent test data that were withheld during the training process (Fig. 3). While the models for BCwb, BCff, and total PM1 show small spread, Cl and NO₃⁻ exhibit larger variations between model runs (indicated by horizontal and vertical lines in Fig. 3). This suggests that while drivers of variations in BCff concentration are well covered by the model, this is less so in the case of Cl and NO₃⁻. A possible reason for this is that no explicit information on anthropogenic emissions or chemical formation pathways is included in the models. Still, the model performance indicators highlight that a large fraction of the variations in particle concentrations is explained by the meteorological variables used as model inputs. Performances of model iterations of the species-specific and total PM1 models are generally similar, suggesting a robust model outcome.
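
The two performance indicators can be computed as in the following sketch. The normalization of the RMSE is an assumption: the text does not state which one is used, and dividing by the mean of the observations is only one common choice (the range or standard deviation are alternatives). Function names and numbers are illustrative.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def nrmse(observed, predicted):
    """RMSE divided by the mean observation (assumed normalization)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
    return rmse / (sum(observed) / len(observed))

# Hypothetical test-set values [µg/m3]
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
r2_value = r_squared(obs, pred)
nrmse_value = nrmse(obs, pred)
```

Applied to the withheld test data of each of the 10 model iterations, the spread of these two numbers gives the horizontal and vertical lines of the kind shown in Fig. 3.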

Figure 3. Performance indicators for 10 model iterations: coefficient of determination R2 against normalized root mean squared error (NRMSE) for the separate species models (Org: organics, NH₄⁺: ammonium, SO₄²⁻: sulfate, NO₃⁻: nitrate, Cl: chloride, BCff: black carbon from fossil fuel combustion, and BCwb: black carbon from wood burning), and the total PM1 model. Vertical and horizontal lines indicate the maximum spread in R2 and NRMSE, respectively, between the 10 model iterations.


The mean input feature importance of the total PM1 model, derived from the SHAP feature attribution values and ordered from high to low, is shown in Fig. 4. The NO₃⁻ fraction of the previous day has the highest impact on the model, followed by temperature, wind direction information, and MLH. To some extent, NO₃⁻ fraction can be related to PM1 mass concentrations (Petit et al., 2015; Beekmann et al., 2015). This means that the higher the PM1 levels one day, the greater the chances of having higher PM1 levels the next day (see Fig. B1). Lower wind speeds generally lead to higher particle concentrations (see Fig. B2) due to a lack of dispersion (Sujatha et al., 2016). Temperature, MLH, and wind direction require an in-depth analysis, as changes of these variables cause nonlinear responses in PM1 predictions, which also vary between species.
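
A global feature ranking of this kind is commonly obtained by averaging the absolute SHAP values of each feature over all predictions. The sketch below assumes the mean as the aggregate and uses hypothetical SHAP values; the feature names follow the abbreviations in Sect. 2, but the numbers are not results of the study.

```python
def rank_features(shap_rows):
    """Rank features by their mean absolute SHAP value, highest first."""
    names = list(shap_rows[0].keys())
    mean_abs = {
        name: sum(abs(row[name]) for row in shap_rows) / len(shap_rows)
        for name in names
    }
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values [µg/m3] for three predictions
shap_rows = [
    {"NO3_frac": 3.0, "T": -1.0, "MLH": 0.5},
    {"NO3_frac": -2.0, "T": 1.5, "MLH": -0.5},
    {"NO3_frac": 1.0, "T": -2.0, "MLH": 1.5},
]
ranking = rank_features(shap_rows)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that raises PM1 on some days and lowers it on others still ranks as important.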

Figure 4. Ranked median SHAP values of the model input features, i.e. the average absolute value that a feature adds to the final model outcome, referring to the total PM1 model [µg/m3] (Lundberg et al., 2018). Horizontal lines indicate the variability between model runs.


4.2 Influence of meteorological input features on modelled particle species and total PM1 concentrations

To gain insights into relevant processes governing particle concentrations at SIRTA, the contributions of input features to species and total PM1 concentration outcomes from the statistical model, i.e. the SHAP values, are plotted as a function of absolute feature values (Figs. 5–7). The contribution of an input feature to each (local) prediction of the species or total PM1 concentrations is shown while taking into account intra-model variability. Intra-model variability of SHAP values, i.e. different SHAP value attributions for the same feature value within one model, is shown by the vertical distribution of dots for absolute input feature values. Intra-model variability is caused by interactions of the different model input features.

4.2.1 Influence of temperature

The impact of ambient air temperature on modelled species concentrations is highly non-linear (Fig. 5). All species show increased contributions to model outcomes at temperatures below ∼4 °C, while the contribution of high temperatures to model outcomes differs substantially between species. The statistical model is able to reproduce well-known characteristics of species concentration variations related to temperature. For example, sulfate formation is enhanced with increasing temperatures (Fig. 5d) due to an increased oxidation rate of SO₂ (see Dawson et al., 2007; Li et al., 2017) and due to photochemical oxidation under strong solar irradiation (Gen et al., 2019). Dawson et al. (2007) reported an increase of 34 ng/m3 per K for PM2.5 concentrations using a CTM. The increase in sulfate at low ambient temperatures as suggested by Fig. 5d is not reported in that study. It is likely linked to increased aqueous-phase particle formation in cold and foggy situations (Rengarajan et al., 2011; Petetin et al., 2014; Cheng et al., 2016). Considerable local formation of nitrate at low temperatures (Fig. 5b) is consistent with results from previous studies in western Europe, and enhanced formation of ammonium nitrate at lower temperatures (Fig. 5c) by the shifting gas-particle equilibrium is a well-known pattern (e.g. Clegg et al., 1998; Pay et al., 2012; Bressi et al., 2013; Petetin et al., 2014; Petit et al., 2015). The increase in organic matter and BCwb concentrations at low temperatures (Fig. 5g) is likely related to the emission intensity, as biomass burning is often used for domestic heating in the study area (Favez et al., 2009; Sciare et al., 2010; Healy et al., 2012; Jiang et al., 2019). In addition, organic matter concentrations are linked to the condensation of semi-volatile organic species at low temperatures (Putaud et al., 2004; Bressi et al., 2013). The sharp increase in modelled concentrations of organics above 25 °C (Fig. 5a) could be due to enhanced biogenic activity leading to a rise in biogenic emissions and secondary aerosol formation (Guenther et al., 1993; Churkina et al., 2017; Jiang et al., 2019).

The contribution of temperature to modelled total PM1 concentrations (Fig. 5h) is consistent with the response patterns to changes in temperature described for the individual species in Fig. 5a–g, with positive contributions at both low (<4 °C) and high air temperatures (>25 °C). For temperatures below freezing, the model allocates maximum contributions to modelled total PM1 concentrations of up to 12 µg/m3. The spread of SHAP values between model iterations is generally higher for low temperatures (vertical grey bars in Figs. 5–7), where SHAP values are of greater magnitude, but in all cases the signal contained in the feature contributions far exceeds the spread between model runs.

Figure 5. Air temperature SHAP values (contribution of temperature to the prediction of species and total PM1 concentrations [µg/m3] for each data instance) vs. absolute air temperature [°C]. Inter-model variability of allocated SHAP values is shown as the variance of predicted values between the 10 model iterations and plotted as vertical grey bars. The dotted horizontal line indicates the transition from positive to negative SHAP values.


4.2.2 Influence of the mixed layer height (MLH)

Variations in MLH can have a substantial impact on near-surface particle concentrations, as the mixed layer is the atmospheric volume in which the particles are dispersed (see Klingner and Sähn, 2008; Dupont et al., 2016; Wagner and Schäfer, 2017). The effect of MLH variations on modelled particle concentrations is highly nonlinear and varies in magnitude for all species (Fig. 6). Similar to the patterns observed for temperature SHAP values, the inter-model variation of predictions is highest for low MLHs where predicted particle concentrations have the highest variation. For predicted total PM1 concentrations, the maximum positive contribution of the MLH is as high as 5.5 µg/m3 while negative contributions can amount to −2 µg/m3. While the maximum influence of MLH is lower than the maximum influence determined for air temperature, the frequency of shallow MLH is far greater than that of the minimum temperatures that have the largest effect (Figs. 5d and 6d). Contributions of MLH to predicted particle concentrations are highest for very shallow mixed layers due to the accumulation of particles close to the ground (Dupont et al., 2016; Wagner and Schäfer, 2017). In addition to causing particles to accumulate near the surface, low MLH can also provide effective pathways for local new particle formation. Secondary pollutants, such as ammonium nitrate, are increased at low MLHs when conditions favourable to their formation usually coincide with reduced vertical mixing (i.e. low temperatures, often in combination with high RH; Pay et al., 2012; Petetin et al., 2014; Dupont et al., 2016; Wang et al., 2016). BC concentrations, on the other hand, are dominated by primary emissions, as is a substantial fraction of organic matter (Petit et al., 2015). Hence, the accumulation of these particles during low buoyancy conditions can explain the strong influence of MLH on BCwb and BCff.
A relatively distinct transition from positive contributions during shallow boundary layer conditions (∼0–800 m) towards negative contributions at high MLHs is evident for all species except SO42-. Modelled SO42- concentrations show a less distinct response to changes in MLH as they are largely driven by gaseous precursor sources and particle advection, both rather independent of MLH (Pay et al., 2012; Petit et al., 2014, 2015), so that the accumulation effect is less important. The increase of SO42- concentrations with higher MLHs (>1500 m a.g.l.) could be linked to the effective transport of SO42- and its precursor SO2. In agreement with results from previous studies focusing on PM10 (Grange et al., 2018; Stirnberg et al., 2020) or PM2.5 (Liu et al., 2018), SHAP values do not change much for MLH above ∼800–900 m; i.e. boundary layer height variations above this level have little additional influence on submicron particle concentrations. Positive contributions of MLHs above ∼800–900 m to predicted PM1 concentrations, as visible in Fig. 6 for some species, have been previously reported by Grange et al. (2018), who relate this pattern to enhanced secondary aerosol formation in a very deep and dry boundary layer. The positive influence of high MLHs on species that are partly secondarily formed, e.g. SO42- and Org, could be explained following this argumentation. The increase in SHAP values observed for BCff at high MLHs could also be related to secondary aerosol formation processes, causing an “encapsulation” of BC within a thick coating of secondary aerosols (Zhang et al., 2018).

Figure 6. As Fig. 5 for MLH SHAP values (contribution of MLH to the prediction of species and total PM1 for each data instance) vs. absolute MLH values [m a.g.l.].


4.2.3 Influence of wind direction

To analyse the contribution of wind direction to predicted particle concentrations, SHAP values of the normalized 3 d mean u and v wind components were added up, and the two components were transformed to a wind direction in degrees (Fig. 7). Generally, wind direction has a positive contribution to the model outcome when winds from the northern to northeastern sectors prevail, while negative contributions are evident for southwesterly directions. Given the location of the measurement site, this pattern likely reflects the advection of particles emitted from continental Europe and/or the Paris metropolitan area under high-pressure system conditions versus cleaner marine air masses during southwesterly flow (Bressi et al., 2013; Petetin et al., 2014; Petit et al., 2015; Srivastava et al., 2018). Increased concentrations of organic matter are predicted for northerly, northeasterly, and easterly winds. These patterns suggest a significant contribution of advected organic particles from a specific wind sector. This is in agreement with the findings of Petetin et al. (2014), who estimated that 69 % of the PM2.5 organic matter fraction is advected by northeasterly winds, which is related to advected particles from wood burning sources in the Paris region and SOA formation along the transport trajectories. While Petit et al. (2015) did not find a wind direction dependence of organic matter measured at SIRTA using wind regression, they reported the regional background of organic matter to be of importance. Comparing upwind rural stations to urban sites, Bressi et al. (2013) concluded that organic matter is largely driven by mid-range to long-range transport. Influences on the SO42- model are highest for northeastern and eastern wind directions, which aligns with previous findings by Pay et al. (2012), Bressi et al. (2014), and Petit et al. (2017), who identified the Benelux region and western Germany as strong emitters of sulfur dioxide (SO2).
SO2 can be transformed to particulate SO42- (Pay et al., 2012) while being transported towards the measurement site. Nitrate concentrations are affected by long-range transport of particles from continental Europe (Benelux, western Germany), which are advected towards SIRTA from northeastern directions (Petetin et al., 2014; Petit et al., 2014). It is to be expected that the influence of mid-range to long-range transport on the particle observations at SIRTA is rather substantial, with most high-pollution days affected by particle advection from continental Europe (Bressi et al., 2013). Concerning BCff and BCwb, model results suggest a dependence on wind direction during northwestern to northeastern inflow. Although BC concentrations are expected to be largely determined by local emissions (Bressi et al., 2013), e.g. from local residential areas, a substantial contribution of imported particles from wood burning and traffic emissions from the Paris region (Laborde et al., 2013; Petetin et al., 2014) and continental sources is likely (Petetin et al., 2014).
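The conversion from wind components to a direction, and the summation of the two component contributions, can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes the usual meteorological convention (u eastward, v northward, direction given as the bearing the wind blows from), and the function names and numbers are hypothetical.

```python
import math


def wind_direction_deg(u: float, v: float) -> float:
    """Direction the wind blows FROM, in degrees clockwise from north.

    Assumed convention: u is the eastward and v the northward wind component.
    """
    return math.degrees(math.atan2(-u, -v)) % 360.0


def wind_direction_shap(shap_u: float, shap_v: float) -> float:
    """SHAP values are additive, so the two component contributions are summed."""
    return shap_u + shap_v


# Hypothetical day: flow from the northeast (both components negative).
u_norm, v_norm = -0.7, -0.7
direction = wind_direction_deg(u_norm, v_norm)
contribution = wind_direction_shap(2.1, 1.4)  # made-up SHAP values in µg/m3
```

Because SHAP values are additive, the summed value can be plotted against the derived direction without retraining the model on a separate wind direction feature.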

Figure 7. As Fig. 5 for wind direction SHAP values (contribution of 3 d mean wind direction to the prediction of species and total PM1 for each data instance) vs. absolute wind direction [°].


4.2.4 Influence of feature interactions

Pairwise interaction effects, where the effect of a specific predictor on the total PM1 prediction is dependent on the state of a second predictor, are analysed in the model. Strong pairwise interactive effects are found between MLH and time since last precipitation and between MLH and maximum wind speed; these are shown in Fig. 8a and b. SHAP interaction effects between MLH and time since last precipitation are most pronounced for MLHs below ∼500 m a.g.l. (Fig. 8a). Interaction values are negative for low MLHs paired with time since last precipitation close to zero hours. With increasing time since last precipitation, interaction effects become positive, thus increasing the contribution of Tprec and MLH to the model outcome. An explanation of this pattern concerning underlying processes could be that due to the lack of precipitation, a higher number of particles is available in the atmosphere for accumulation, hence increasing the accumulation effect of a shallow MLH. In case of recent precipitation, the accumulation effect of a shallow MLH is weakened. For higher MLHs, interactive effects with time since the last precipitation event are marginal. Interactive effects between MLH and wind speed are shown in Fig. 8b. Positive SHAP values for maximum wind speeds below ∼2 m/s reflect stable situations, favouring the accumulation of particles, whereas high wind speeds enhance the ventilation of particles (Sujatha et al., 2016). This can also be deduced from Fig. 8b, which shows increased SHAP values for low wind speeds in combination with a low MLH. Low wind speeds combined with a high MLH (>1000 m a.g.l.), on the other hand, result in decreased SHAP values. Similarly, low MLHs combined with higher wind speeds (>2 m/s) also decrease predictions of total PM1 concentrations. High MLHs in combination with high wind speeds, however, increase SHAP values.
A physical explanation of this pattern could be the more effective transport of SO42- and its precursor SO2 as well as ammonium nitrate under high-MLH conditions and stronger winds (Pay et al., 2012). Maximum wind speed and time since last precipitation (plot not shown here) interact in a similar way. The positive effect of low wind speeds on the model outcome increases with increasing time since last precipitation.
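How a pairwise interaction value separates from the main effects can be illustrated with a toy two-feature model. The sketch below computes exact Shapley values and the interaction term by enumerating all feature coalitions. It is not the tree-based model or the TreeSHAP algorithm used in the study: the response function, its coefficients, and the use of a single baseline point to represent "missing" features are simplifying assumptions made only for illustration.

```python
def toy_pm1_model(mlh_m: float, hours_since_rain: float) -> float:
    """Hypothetical PM1 response [µg/m3]; not the model used in the study."""
    shallow = 1.0 if mlh_m < 500.0 else 0.0
    return 10.0 + 4.0 * shallow + 0.01 * hours_since_rain + 0.02 * shallow * hours_since_rain


def shap_two_features(model, x, baseline):
    """Exact Shapley values and interaction value for a two-feature model.

    Features outside a coalition are set to their baseline value (assumption).
    """
    v_none = model(baseline[0], baseline[1])
    v_first = model(x[0], baseline[1])
    v_second = model(baseline[0], x[1])
    v_both = model(x[0], x[1])

    # Average marginal contribution over both feature orderings.
    phi_first = 0.5 * (v_first - v_none) + 0.5 * (v_both - v_second)
    phi_second = 0.5 * (v_second - v_none) + 0.5 * (v_both - v_first)
    # The joint effect is split equally between the two features of the pair.
    interaction = 0.5 * (v_both - v_first - v_second + v_none)
    return {
        "base": v_none,
        "phi": (phi_first, phi_second),
        "main": (phi_first - interaction, phi_second - interaction),
        "interaction": interaction,
    }


# Shallow MLH after 100 dry hours vs. shallow MLH right after rain.
dry = shap_two_features(toy_pm1_model, (300.0, 100.0), (1500.0, 0.0))
wet = shap_two_features(toy_pm1_model, (300.0, 0.0), (1500.0, 0.0))
```

In this toy setting the interaction term is positive for the dry case and zero right after rain, mirroring qualitatively the pattern described for shallow MLHs. With a single baseline point the term cannot become negative here, whereas SHAP values computed against the data distribution can.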

Figure 8. MLH vs. (a) time since last precipitation and (b) maximum wind speed, coloured by the SHAP interaction values for the respective features.


4.3 Meteorological conditions of high-pollution events

To further identify conditions that favour high-pollution episodes, the data set is split into situations with exceptionally high total PM1 concentrations (>95th percentile) and situations with typical concentrations of total PM1 (interquartile range, IQR). This is done for the meteorological summer and winter seasons to contrast dominant drivers between these seasons. Mean SHAP values refer to the total PM1 model; corresponding input feature distributions and species fractions for the two subgroups are aggregated seasonally. This allows for a quantification of seasonal feature contributions to average or polluted situations.
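The split into typical and high-pollution situations can be sketched as follows. This is a minimal illustration under stated assumptions: the input is taken to be the list of total PM1 values for a single season, SHAP values are stored as one dictionary per day, and the function names are hypothetical. The percentile interpolation method used in the study is not specified, so the "inclusive" method chosen here is an assumption.

```python
import statistics


def split_typical_and_polluted(pm1):
    """Return index lists for typical (IQR) and high-pollution (>95th percentile) days."""
    cuts = statistics.quantiles(pm1, n=100, method="inclusive")
    p25, p75, p95 = cuts[24], cuts[74], cuts[94]
    typical = [i for i, c in enumerate(pm1) if p25 <= c <= p75]
    polluted = [i for i, c in enumerate(pm1) if c > p95]
    return typical, polluted


def mean_shap(shap_rows, indices):
    """Average each feature's SHAP value over the selected days."""
    features = shap_rows[0].keys()
    return {f: statistics.fmean(shap_rows[i][f] for i in indices) for f in features}


# Synthetic season of 100 days with made-up concentrations and SHAP values.
pm1_season = [float(c) for c in range(1, 101)]
shap_season = [{"mlh": float(i), "temperature": 1.0} for i in range(100)]
typical_days, polluted_days = split_typical_and_polluted(pm1_season)
polluted_means = mean_shap(shap_season, polluted_days)
```

Applying the same two functions to the winter and summer subsets separately yields the per-season mean contributions that are compared between the two subgroups.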

Table 1. Statistics for typical PM1 concentrations (mean, median, IQR) and high-pollution concentrations (>95th percentile).

Figures 9 and 10 show mean SHAP values for typical (left) and high-pollution (right) situations in the upper panel. The distributions of SHAP values are shown as box plots for each feature. Absolute feature value distributions are given at the bottom of the figure. In the lowest subpanel, the chemical composition of the total PM1 concentration for each subgroup is shown. The largest contributor to high-pollution situations in winter is air temperature (Fig. 9). SHAP values for temperature are substantially increased during high-pollution situations, when temperatures are systematically lower. Further contributing factors to high-pollution situations are low MLHs, low wind speeds, a high average NO3- fraction of the previous day, and negative u (i.e. winds from the east) and v (i.e. winds from the north) wind components. In winter, the PM1 composition shows a relatively large fraction of nitrates, which is increased during high-pollution situations (Fig. 9, lower panel). High concentrations of nitrate in winter can be linked to advection or to enhanced formation due to the temperature-dependent low volatility of ammonium nitrate (Petetin et al., 2014). The organic matter fraction is slightly decreased during high-pollution situations. MLH and maximum wind speed influences on high-pollution situations are linked to low-ventilation conditions, which are very frequent in winter (Dupont et al., 2016). Positive influences of wind direction for inflow from the northern and eastern sectors are dominant during high-pollution situations while inflow from the southern and western sectors prevails during average-pollution situations (see Fig. 7; Bressi et al., 2013; Petetin et al., 2014; Srivastava et al., 2018). Note that the time since the last precipitation is increased during high-pollution situations, but its effect on the model outcome is weak.
This suggests that lack of precipitation is not a direct driver of modelled total PM1 concentrations but increases the contribution of other input features (see Fig. 8a) or is a meaningful factor in only some situations.

Figure 9. Mean feature contributions (i.e. SHAP values) for situations with typical total PM1 concentrations (left) and situations with high pollution (right), respectively, during winter (December, January, February). Respective ranges of SHAP values by species are shown as box plots, with median (bold line), 25–75th percentile range (boxes), and 10–90th percentile range (whiskers). Both training and test data are included. Absolute feature value distributions (given as normalized frequencies) as well as the chemical composition of the total PM1 concentration are shown in the subpanels. Colours of the box plots correspond to colours in the feature distribution subpanels. SHAP values of the input features u_norm_3d and u_norm as well as v_norm_3d and v_norm were merged to “u_norm, merged” and “v_norm, merged” to achieve better transparency.


Summer total PM1 composition (Fig. 10) is characterized by a larger fraction of organics compared to the winter season (Fig. 9). As a considerable fraction of organic matter is formed locally (Petetin et al., 2014), the increased proportion of organics could be due to more frequent stagnant synoptic situations that may limit the advection of transported SIA particles. In addition, the positive SHAP values of solar irradiation and temperature highlight that solar irradiation stimulates transformation processes and increases the number of biogenic SOA particles (Guenther et al., 1993; Petetin et al., 2014). As mean temperatures are highest in summer, positive temperature SHAP values are associated with increased organic matter concentrations (Fig. 5). The higher importance (i.e. higher SHAP values) of time since the last precipitation event during high-pollution situations points to an accumulation of particles in the atmosphere. Dry situations can also enhance the emission of dust over dry soils (Hoffmann and Funk, 2015). The negative influence of MLH during both typical and high-pollution situations reflects seasonality, as afternoon MLHs in summer are usually too high to have a substantial positive impact on total PM1 concentrations (see Fig. 6). MLH is thus not expected to be a driver of day-to-day variations of summer total PM1 concentrations. Note that the average MLH is higher during high-pollution situations, which likely points to increased formation of SO42- (see Fig. 6).

Figure 10. As Fig. 9 for mean feature contributions (i.e. SHAP values) for situations with typical total PM1 concentrations (left) and situations with high pollution (right), respectively, during summer (June, July, August).


4.4 Day-to-day variability of selected pollution events

Analysing the combination of SHAP values of the various input features on a daily basis allows for direct attribution of the respective implications for modelled total PM1 concentrations (Lundberg et al., 2020). Here, four particular pollution episodes are selected to analyse the model outcome with respect to physical processes (Figs. 11–14). The examples highlight the advantages but also the limitations of the interpretation of the statistical model results. The high-pollution episodes took place in winter 2016 (10–30 January and 25 November–25 December), spring 2015 (11–31 March), and summer 2017 (8–28 June).
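The daily attribution rests on the additivity of SHAP values: the model prediction for a day equals a base value (the mean model output) plus the sum of that day's feature contributions. A minimal sketch of this bookkeeping is given below; the function name, feature names, and all numbers are hypothetical and serve only to illustrate the principle.

```python
def explain_day(base_value, shap_values):
    """Rebuild one day's prediction from its SHAP values and rank the drivers."""
    prediction = base_value + sum(shap_values.values())
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prediction, ranked


# Hypothetical winter day; all numbers are illustrative [µg/m3].
day = {"temperature": 9.0, "mlh": 5.0, "wind_direction": 3.0, "hours_since_rain": -1.0}
prediction, drivers = explain_day(base_value=8.0, shap_values=day)
```

Ranking by absolute value keeps strongly negative contributions visible, which matters when a feature such as recent precipitation counteracts an otherwise polluted situation.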

4.4.1 January 2016

Prior to the onset of the high-pollution episode in January 2016 (Fig. 11), the situation is characterized by MLHs at approximately 1000 m, temperatures above freezing (∼5–10 °C), frequent precipitation, and winds from the southwest. The organic matter fraction dominates the particle speciation. The episode itself is reproduced well by the model. According to the model results, the event is largely temperature-driven, i.e. SHAP values of temperature explain a large fraction of the total PM1 concentration variation (note the adjusted y axis of the temperature SHAP values). On 18 January, temperatures drop below freezing, coupled with a decrease in MLH. As a consequence, both modelled and observed PM1 concentrations start to rise. A further increase in total PM1 concentrations is driven by a sharp transition from stronger southwestern to weaker northeastern winds (strong negative u component, weak negative v component) on 19 January. The combined effects of these changes lead to a marked increase in total modelled PM1 concentrations, peaking at 37 µg/m3 on 20 January. On the following days, temperatures increase steadily; thus the contribution of temperature decreases. At the same time, although values of MLH remain almost constant, the contribution of MLH drops substantially from ∼5 to ∼2 µg/m3. This is due to interactive effects between MLH and the features wind speed, time since last precipitation, and normalized v wind component. All of these features increase the contribution of MLH on 20 January but decrease its contribution on 21–23 January. The physical explanation behind this pattern would be that a lack of wet deposition, low wind speeds, and inflow from northeasterly directions all increase particle numbers in the atmosphere. Given that a large number of particles is then present, the accumulation effect of a low MLH is more efficient.
The high-pollution episode ceases after a shift to southwestern winds and the increasing temperatures. The pollution episode is characterized by a relatively large fraction of NO3- and NH4+, which explains the strong feature contribution of temperature to the modelled total PM1 concentration, as the abundance of these species is temperature dependent (see Fig. 5) and points to a large contribution of locally formed inorganic particles. Still, the contribution of wind direction and speed also suggests that advected secondary particles and their build-up in the boundary layer are relevant factors during the development of the high-pollution episode (Petetin et al., 2014; Petit et al., 2014; Srivastava et al., 2018).

Figure 11. Winter pollution episode in January 2016. Panel (a) indicates the total PM1 prediction as a horizontal black line with vertical black lines denoting the range of predictions of all 10 models. The observed species concentrations are shown as stacked planes in the corresponding colours. The subsequent panels show absolute values (left y axis, solid lines) and SHAP values (right y axis, pink bars for positive and blue bars for negative values) for the most relevant meteorological input features: MLH (b), temperature (c), hours after rain (d), maximum wind speed (e), normalized u wind (f), and normalized v wind (g) component.


4.4.2 December 2016

A high-pollution episode with several peaks of total PM1 is observed in November and December 2016. The first peak on 26 November is followed by an abrupt minimum in total PM1 concentrations on 28 November, and a build-up of pollution in a shallow boundary layer towards the second peak on 2 December with total PM1 concentrations exceeding 40 µg/m3. In the following days, total PM1 concentrations continuously decrease, eventually reaching a second minimum on 11 December. A gradual increase in total PM1 concentrations follows, resulting in a third (double-)peak total PM1 concentration on 17 December. Total PM1 concentrations drop to lower levels afterwards. Throughout the 3.5-week-long episode, high pollution is largely driven by shallow MLH (<500 m) and weak north-northeasterly winds, i.e. a regime of low ventilation associated with high-pressure conditions favourable for emission accumulation and possibly some advection of polluted air from the Paris region. During the brief periods with lower total PM1 concentrations, these conditions are disrupted by a higher MLH (∼28 November) or a change in prevailing winds (∼11 December). In contrast to the pollution episode in January 2016, this December 2016 episode is not driven by temperature changes. Temperatures range between ∼5 and 12 °C and have a minor contribution to predicted total PM1 concentrations (see also Fig. 5), emphasizing the different processes causing air pollution in the Paris region. Note that the model is not able to fully reproduce the pollution peak on 2 December, which may be indicative of missing input features in the model. Judging from the PM1 species composition during this time (relatively high fraction of NO3- and BC), it seems likely that missing information on particle emissions may be the reason for the difference between modelled and observed total PM1 concentration.

Figure 12. As Fig. 11 for a further winter pollution episode in December 2016.


4.4.3 June 2017

A period of above-average total PM1 concentrations occurred in June 2017. The episode is very well reproduced by the model, suggesting a strong dependence of the observed total PM1 concentration on meteorological drivers. Although absolute total PM1 concentrations are substantially lower than during the previously described winter pollution episodes, the event is still above average for summer pollution levels. Organic matter particles dominate the PM1 fraction throughout the episode, with a relatively high SO42- fraction. Conditions during this episode are characterized by strong solar irradiation (positive SHAP values) and high MLHs (mostly negative SHAP values), which show low day-to-day variability and reflect characteristic summer conditions. A lack of precipitation (no rain for a period of more than 2 weeks) and high temperatures also contribute to the total PM1 concentrations during this episode. While solar irradiation and time since last precipitation are associated with positive SHAP values throughout this period, air temperature only has a positive contribution when exceeding ∼25 °C. This aligns with patterns shown in Fig. 5, where increased concentrations of organic matter and SO42- are identified for high temperatures. Peak total PM1 concentrations of 17 µg/m3 are observed on 20 and 21 June. A change in the east–west wind component from western to eastern inflow directions in conjunction with an increase in temperatures to above 30 °C are the drivers of the modelled peak in total PM1 concentrations. MLH is also increased, with values of ∼2000 m a.g.l., which are associated with slightly positive SHAP values. This observation fits with findings described in Sect. 4.2.2 and is likely linked to enhanced secondary particle formation (Megaritis et al., 2014; Jiang et al., 2019). As suggested by response patterns of species to changes in MLH shown in Fig. 6, this effect is linked to an increase in SO42- concentrations.
The main fraction of the peak total PM1 values, however, is linked to an increase in organic matter concentrations due to the warm temperatures (see Fig. 5).

Figure 13. As Fig. 11 for an exemplary summer pollution episode in June 2017.


4.4.4 March 2015

High particle concentrations are measured in early March 2015 with high day-to-day variability. This episode is chosen so that its modelled course can be compared to previous studies focusing on the evolution of this episode (Petit et al., 2017; Srivastava et al., 2018). The episode is characterized by high fractions of SIA particles, in particular SO42-, NH4+, and NO3- (Fig. 14a) and similar concentrations observed at multiple measurement sites in France (Petit et al., 2017). Contributions of local sources are low, and much of the episode is characterized by winds blowing in from the northeast, advecting aged SIA particles (Petit et al., 2017; Srivastava et al., 2018) and organic particles of secondary origin (Srivastava et al., 2019) towards SIRTA. A widespread scarcity of rain probably enhanced the large-scale formation of secondary pollution across western Europe (in particular western Germany, the Netherlands, Luxembourg; Petit et al., 2017), which was then transported towards SIRTA. This is reflected by the SHAP values of the u and v wind components, which are positive throughout the episode (see Fig. 14g and h). Concentration peaks of total PM1 are measured on 18 and 20 March. Both peaks are characterized by a rapid development of total PM1 concentrations. As described in Petit et al. (2017), these strong daily variations of total PM1, which are mainly driven by the SIA fraction, could be due to varying synoptic cycles, especially the passage of cold fronts. The influence of MLH and temperature is relatively small, which is consistent with the high influence of advection on total PM1 concentrations during the episode. The exceptional character of the episode (see Petit et al., 2017) partly explains the poor performance of the model in capturing total PM1 variability during the event. An unusual rain shortage is observed in large areas of western Europe prior to the episode (Petit et al., 2017).
While time since precipitation at the SIRTA site is a large positive contributor to the model outcome (see Fig. 14d), it is not driving the day-to-day variations. The unusual nature of this event and the lack of information in the model on emissions in the source regions and on formation processes along air mass trajectories likely explain why the model has difficulties in reproducing this pollution episode. While this has implications for the application of explainable machine learning models to rare events, it is not expected to be an issue for the other cases and seasonal results presented here.

Figure 14. As Fig. 11 for an exemplary spring pollution episode in March 2015.


5 Conclusions and outlook

In this study, dominant patterns of meteorological drivers of PM1 species and total PM1 concentrations are identified and analysed using a novel, data-driven approach. A machine learning model is set up to analyse measured speciated and total PM1 concentrations based on meteorological measurements from the SIRTA supersite, southwest of Paris. The machine learning model is able to reproduce daily variability of particle concentrations well and is used to analyse and quantify the atmospheric processes causing high-pollution episodes during different seasons using a SHAP-value framework. As interactions between the meteorological variables are accounted for, the model enables the separation, quantification, and comparison of their respective impacts on the individual events. It is shown that ambient meteorology can substantially exacerbate air pollution. Results of this study point to the prominent role of shallow MLHs, low temperatures, and low wind speeds during peak PM1 episodes in winter. These conditions are often amplified by northeastern wind inflow under high-pressure synoptic circulation. A detailed analysis reveals how the meteorological drivers of winter high-pollution episodes interact. For an episode in January 2016, model results show a strong influence of temperature on the elevated PM1 concentrations during this episode (up to 11 µg/m3 are attributed to temperature), suggesting enhanced local, temperature-dependent particle formation. During a different, prolonged pollution episode in December 2016, temperature levels were relatively stable and had only a minor influence. Here, MLH (<500 m a.g.l.) was quantified to be the main driver of modelled PM1 peak concentrations with contributions up to 6 µg/m3, along with wind direction contributions of up to 6 µg/m3. Total PM1 concentrations in spring can be as high as 50 µg/m3.
These peaks in spring are not as well reproduced by the model as winter episodes and are likely due to new particle formation processes along the air mass trajectories, in particular of nitrate. Summer PM1 concentrations are lower than in other seasons. Model results suggest that summer peak concentrations are largely driven by high temperatures, particle advection from Paris and continental Europe with low wind speeds, and prolonged periods without precipitation. For an example episode in June 2017, temperatures above 30 °C contribute 3 µg/m3 to the total PM1 concentration. On-site scarcity of rain increases air pollution but does not appear to be a major driver of strong day-to-day variations in particle concentrations. Presumably, this is because droughts are synoptic-scale phenomena that extend over several days or even weeks. Thus, they present very low inter-daily variability on the local scale. Nonetheless, Petit et al. (2017) have highlighted the link between extreme PM concentrations (especially during spring) and extreme precipitation deficit (compared to average conditions). The main drivers of day-to-day variability of predicted PM1 concentrations are changes in wind direction, air temperature, and MLH. These changes are often superimposed on the influence of time without precipitation. Individual PM1 species are shown to respond differently to changes in temperature. While SO42- and organic matter concentrations are increased during both high- and low-temperature situations, NH4+ and NO3- are substantially increased only at low temperatures. Model results indicate that SIA particle formation is enhanced during shallow MLH conditions. The presented findings refer to the SIRTA supersite, but the results are likely transferable to other regions as well. For example, the importance of temperature-induced particle formation processes has been shown for the USA (Dawson et al., 2007), Europe (Megaritis et al., 2014), and China (Wang et al., 2016).
Hence, it is likely that the detailed, species-dependent disclosure of the nonlinear relationship between temperature and PM1 of this study holds for other urban and suburban areas. This has implications for the PM concentrations in the context of climate change. The empirical perspective of the current study complements the findings of various modelling studies (Dawson et al., 2007; Megaritis et al., 2013, 2014; Sá et al., 2016; Doherty et al., 2017); the insights provided here could increase the confidence in air quality estimations under climate change. Furthermore, the impact of shallow MLHs on PM1 concentrations investigated here is comparable to results found in a previous, regional-scale study over central Europe that highlighted the dominant role of MLH on PM10 concentrations (Stirnberg et al., 2020). The importance of wind direction highlights the role of pollution advected from remote, highly polluted urban or industrial hotspots. In general, the interpretation of pollution advection patterns requires knowledge of source regions and terrain. Here, the Paris agglomeration is a major source of pollutants while the relatively flat terrain allows unimpeded advection of air masses. Urban areas in a more complex terrain would likely be affected by slightly different and possibly more complex mechanisms, such as terrain- and meteorology-dependent air stagnation events (Wang et al., 2018) as well as orography-driven wind and precipitation patterns (Rosenfeld et al., 2007). Still, given that disentangling the impact of the various meteorological drivers on air quality is already a complex task, a continental, flat-terrain city such as Paris was chosen as the subject area precisely to exclude other factors (such as orographic flow or sea breeze) that would add further complexity. Certainly, the methods developed here could be transferred to other urban areas in more complex settings in the framework of future studies.

Furthermore, the analysis of meteorological drivers could be extended in future studies, e.g. by including information on anthropogenic emissions or further stations down- and upwind of SIRTA, which would allow further analysis of dominant advection patterns. In addition, information on emissions or meteorology in the source region of air masses, e.g. using satellite-based observations, might be helpful to better reproduce particle transport patterns. This could be complemented by incorporating synoptic variables, e.g. the North Atlantic Oscillation (NAO) index.

For policy makers, the presented approach could prove beneficial in multiple ways. Knowledge of meteorological conditions that exacerbate air pollution could be used to issue preventative warnings to the public if these conditions are forecast. Another potential future application could be the quantitative assessment of policy measures, e.g. traffic bans, by comparing an “expected” level of air pollution under given meteorological conditions to actual observations (e.g. Cermak and Knutti, 2009). Finally, the presented model framework could be combined with short-term weather forecasts, which would allow an air quality forecast based on the predictions of the statistical models to be provided.

Appendix A: Comparison of mixed layer height (MLH) measured at SIRTA and Charles de Gaulle airport

As mentioned in Sect. 2.2, ca. 90 missing MLH values in 2016 were replaced with measurements conducted at the Charles de Gaulle airport (see Fig. 1). Figures A1 and A2 summarize MLH values for 2016 when measurements from both sites are available (afternoon period). As shown in Fig. A1, measurements at both sites generally agree well, except for some outliers. Spearman's rank coefficient is significant (p value < 0.05) and has a value of 0.51.
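For reference, Spearman's rank coefficient is the Pearson correlation of the rank-transformed data. The sketch below is a self-contained illustration with made-up MLH values, not the SIRTA or Charles de Gaulle measurements; the function names are hypothetical and the significance test reported above is not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up afternoon MLH values [m a.g.l.] at two sites.
mlh_site_a = [450.0, 800.0, 1200.0, 1500.0, 600.0, 2000.0]
mlh_site_b = [500.0, 700.0, 1500.0, 1300.0, 900.0, 1800.0]
rho = spearman_rho(mlh_site_a, mlh_site_b)
```

Working on ranks makes the coefficient insensitive to the outliers mentioned above, as only the ordering of the paired values matters.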

Figure A1. Scatterplot for MLH [m a.g.l.] measured at SIRTA vs. MLH measured at Charles de Gaulle airport.


A comparison of the frequency of occurrence is shown as a histogram in Fig. A2 and indicates good agreement as well.

Figure A2. Histogram showing the frequency of occurrence for MLH [m a.g.l.] measured at SIRTA (red) vs. MLH measured at Charles de Gaulle airport (black).


Appendix B: Illustration of the influence of NO3- fraction and wind speed

Figures B1 and B2 illustrate the influence of the NO3- fraction and maximum wind speed on the model outcome using SHAP values.

Figure B1. As Fig. 5 for fraction of NO3- SHAP values (contribution of NO3- fraction to the prediction of species and total PM1 for each data instance) vs. absolute NO3- fraction.


Figure B2. As Fig. 5 but for maximum wind speed SHAP values (contribution of maximum wind speed to the prediction of species and total PM1 for each data instance) vs. absolute maximum wind speed.
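To make the notion of a per-instance feature contribution concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature coalitions. It is not the TreeSHAP algorithm used in this study. The toy model `toy_pm1`, its thresholds, and the baseline and instance values are all hypothetical, and features absent from a coalition are simply set to a single baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`.

    Features outside a coalition take their baseline value.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return model(x)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_pm1(x):
    """Hypothetical PM1 model [ug/m3]: inputs are MLH [m], T [degC], wind [m/s]."""
    mlh, temp, wind = x
    pm = 10.0
    if mlh < 500.0:
        pm += 5.0
    if temp < 0.0:
        pm += 4.0
        if mlh < 500.0:
            pm += 2.0  # interaction: cold AND shallow mixed layer
    return pm - 0.5 * wind


baseline = [1200.0, 10.0, 4.0]  # invented reference conditions
instance = [300.0, -2.0, 1.0]   # invented cold, shallow, calm afternoon

phi = shapley_values(toy_pm1, instance, baseline)
for name, contribution in zip(["MLH", "temperature", "wind speed"], phi):
    print(f"{name}: {contribution:+.2f} ug/m3")
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between MLH and temperature. Plotting such contributions against the absolute feature values gives dependence plots of the kind shown in Figs. B1 and B2.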


Data availability

SIRTA-ReOBS data can be accessed online (last access: 15 March 2021; SIRTA/IPSL, 2021). ACSM data are available upon request.

Author contributions

RS, JC, MH, and SK developed the study concept. The data were acquired by JEP, OF, and SK. Formal analysis, investigation, and visualization were performed by RS. The methodology was developed by RS, JC, MH, SK, HA, JF, and MK. RS wrote the original draft. All authors contributed to writing, reviewing, and editing.

Competing interests

The authors declare that they have no conflict of interest.


Acknowledgements

The authors would like to acknowledge the ACTRIS-2 project that received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 654109. Acknowledgements are extended to Rodrigo Guzman and Christophe Boitel for providing the latest update of the SIRTA ReOBS data set. Furthermore, the authors acknowledge Scott Lundberg for his work on the TreeSHAP algorithm. Roland Stirnberg was supported by the KIT Graduate School for Climate and Environment (GRACE).

Financial support

This research has been supported by the European Union's Horizon 2020 research and innovation programme (grant no. 654109).

Review statement

This paper was edited by Leiming Zhang and reviewed by Yves Rybarczyk and three anonymous referees.


References

Baklanov, A., Molina, L. T., and Gauss, M.: Megacities, air quality and climate, Atmos. Environ., 126, 235–249, 2016.

Beekmann, M., Prévôt, A. S. H., Drewnick, F., Sciare, J., Pandis, S. N., Denier van der Gon, H. A. C., Crippa, M., Freutel, F., Poulain, L., Ghersi, V., Rodriguez, E., Beirle, S., Zotter, P., von der Weiden-Reinmüller, S.-L., Bressi, M., Fountoukis, C., Petetin, H., Szidat, S., Schneider, J., Rosso, A., El Haddad, I., Megaritis, A., Zhang, Q. J., Michoud, V., Slowik, J. G., Moukhtar, S., Kolmonen, P., Stohl, A., Eckhardt, S., Borbon, A., Gros, V., Marchand, N., Jaffrezo, J. L., Schwarzenboeck, A., Colomb, A., Wiedensohler, A., Borrmann, S., Lawrence, M., Baklanov, A., and Baltensperger, U.: In situ, satellite measurement and model evidence on the dominant regional contribution to fine particulate matter levels in the Paris megacity, Atmos. Chem. Phys., 15, 9577–9591, 2015.

Bressi, M., Sciare, J., Ghersi, V., Bonnaire, N., Nicolas, J. B., Petit, J.-E., Moukhtar, S., Rosso, A., Mihalopoulos, N., and Féron, A.: A one-year comprehensive chemical characterisation of fine aerosol (PM2.5) at urban, suburban and rural background sites in the region of Paris (France), Atmos. Chem. Phys., 13, 7825–7844, 2013.

Bressi, M., Sciare, J., Ghersi, V., Mihalopoulos, N., Petit, J.-E., Nicolas, J. B., Moukhtar, S., Rosso, A., Féron, A., Bonnaire, N., Poulakis, E., and Theodosi, C.: Sources and geographical origins of fine aerosols in Paris (France), Atmos. Chem. Phys., 14, 8813–8839, 2014.

Cermak, J. and Knutti, R.: Beijing Olympics as an aerosol field experiment, Geophys. Res. Lett., 36, L10806, 2009.

Chafe, Z. A., Brauer, M., Klimont, Z., Van Dingenen, R., Mehta, S., Rao, S., Riahi, K., Dentener, F., and Smith, K. R.: Household Cooking with Solid Fuels Contributes to Ambient PM2.5 Air Pollution and the Burden of Disease, Environ. Health Perspect., 122, 1314–1320, 2014.

Chen, G., Li, S., Zhang, Y., Zhang, W., Li, D., Wei, X., He, Y., Bell, M. L., Williams, G., Marks, G. B., Jalaludin, B., Abramson, M. J., and Guo, Y.: Effects of ambient PM1 air pollution on daily emergency hospital visits in China: an epidemiological study, Lancet Planet. Heal., 1, 221–229, 2017.

Chen, Y., Schleicher, N., Chen, Y., Chai, F., and Norra, S.: The influence of governmental mitigation measures on contamination characteristics of PM2.5 in Beijing, Sci. Total Environ., 490, 647–658, 2014.

Cheng, Y., Zheng, G., Wei, C., Mu, Q., Zheng, B., Wang, Z., Gao, M., Zhang, Q., He, K., Carmichael, G., Pöschl, U., and Su, H.: Reactive nitrogen chemistry in aerosol water as a source of sulfate during haze events in China, Sci. Adv., 2, e1601530, 2016.

Chiriaco, M., Dupont, J.-C., Bastin, S., Badosa, J., Lopez, J., Haeffelin, M., Chepfer, H., and Guzman, R.: ReOBS: a new approach to synthesize long-term multi-variable dataset and application to the SIRTA supersite, Earth Syst. Sci. Data, 10, 919–940, 2018.

Churkina, G., Kuik, F., Bonn, B., Lauer, A., Grote, R., Tomiak, K., and Butler, T. M.: Effect of VOC Emissions from Vegetation on Air Quality in Berlin during a Heatwave, Environ. Sci. Technol., 51, 6120–6130, 2017.

Clegg, S. L., Brimblecombe, P., and Wexler, A. S.: Thermodynamic Model of the System H+–NH4+–Na+–SO42-–NO3-–Cl–H2O at 298.15 K, J. Phys. Chem. A, 102, 2155–2171, 1998.

Dawson, J. P., Adams, P. J., and Pandis, S. N.: Sensitivity of PM2.5 to climate in the Eastern US: a modeling case study, Atmos. Chem. Phys., 7, 4295–4309, 2007.

Dey, S., Caulfield, B., and Ghosh, B.: Potential health and economic benefits of banning diesel traffic in Dublin, Ireland, J. Transp. Heal., 10, 156–166, 2018.

Doherty, R. M., Heal, M. R., and O'Connor, F. M.: Climate change impacts on human health over Europe through its effect on air quality, Environ. Heal., 16, 118, 2017.

Drinovec, L., Močnik, G., Zotter, P., Prévôt, A. S. H., Ruckstuhl, C., Coz, E., Rupakheti, M., Sciare, J., Müller, T., Wiedensohler, A., and Hansen, A. D. A.: The “dual-spot” Aethalometer: an improved measurement of aerosol black carbon with real-time loading compensation, Atmos. Meas. Tech., 8, 1965–1979, 2015.

Dupont, J.-C., Haeffelin, M., Badosa, J., Elias, T., Favez, O., Petit, J., Meleux, F., Sciare, J., Crenn, V., and Bonne, J.: Role of the boundary layer dynamics effects on an extreme air pollution event in Paris, Atmos. Environ., 141, 571–579, 2016.

Elith, J., Leathwick, J. R., and Hastie, T.: A working guide to boosted regression trees, J. Anim. Ecol., 77, 802–813, 2008.

Ervens, B., Turpin, B. J., and Weber, R. J.: Secondary organic aerosol formation in cloud droplets and aqueous particles (aqSOA): a review of laboratory, field and model studies, Atmos. Chem. Phys., 11, 11069–11102, 2011.

Favez, O., Cachier, H., Sciare, J., Sarda-Estève, R., and Martinon, L.: Evidence for a significant contribution of wood burning aerosols to PM2.5 during the winter season in Paris, France, Atmos. Environ., 43, 3640–3644, 2009.

Fowler, D., Pilegaard, K., Sutton, M., Ambus, P., Raivonen, M., Duyzer, J., Simpson, D., Fagerli, H., Fuzzi, S., Schjoerring, J., Granier, C., Neftel, A., Isaksen, I., Laj, P., Maione, M., Monks, P., Burkhardt, J., Daemmgen, U., Neirynck, J., Personne, E., Wichink-Kruit, R., Butterbach-Bahl, K., Flechard, C., Tuovinen, J., Coyle, M., Gerosa, G., Loubet, B., Altimir, N., Gruenhage, L., Ammann, C., Cieslik, S., Paoletti, E., Mikkelsen, T., Ro-Poulsen, H., Cellier, P., Cape, J., Horváth, L., Loreto, F., Niinemets, Ü., Palmer, P., Rinne, J., Misztal, P., Nemitz, E., Nilsson, D., Pryor, S., Gallagher, M., Vesala, T., Skiba, U., Brüggemann, N., Zechmeister-Boltenstern, S., Williams, J., O'Dowd, C., Facchini, M., de Leeuw, G., Flossman, A., Chaumerliac, N., and Erisman, J.: Atmospheric composition change: Ecosystems-Atmosphere interactions, Atmos. Environ., 43, 5193–5267, 2009.

Friedman, J. H.: Greedy Function Approximation: A Gradient Boosting Machine, Ann. Stat., 29, 1189–1232, 2001.

Friedman, J. H.: Stochastic gradient boosting, Comput. Stat. Data Anal., 38, 367–378, 2002.

Fuchs, J., Cermak, J., and Andersen, H.: Building a cloud in the southeast Atlantic: understanding low-cloud controls based on satellite observations with machine learning, Atmos. Chem. Phys., 18, 16537–16552, 2018.

Geiß, A., Wiegner, M., Bonn, B., Schäfer, K., Forkel, R., von Schneidemesser, E., Münkel, C., Chan, K. L., and Nothard, R.: Mixing layer height as an indicator for urban air quality?, Atmos. Meas. Tech., 10, 2969–2988, 2017.

Gen, M., Zhang, R., Huang, D. D., Li, Y., and Chan, C. K.: Heterogeneous SO2 Oxidation in Sulfate Formation by Photolysis of Particulate Nitrate, Environ. Sci. Technol. Lett., 6, 86–91, 2019.

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E.: Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation, J. Comput. Graph. Stat., 24, 44–65, 2015.

Grange, S. K., Carslaw, D. C., Lewis, A. C., Boleti, E., and Hueglin, C.: Random forest meteorological normalisation models for Swiss PM10 trend analysis, Atmos. Chem. Phys., 18, 6223–6239, 2018.

Guenther, A. B., Zimmerman, P. R., Harley, P. C., Monson, R. K., and Fall, R.: Isoprene and monoterpene emission rate variability: Model evaluations and sensitivity analyses, J. Geophys. Res., 98, 12609–12617, 1993.

Gupta, P. and Christopher, S. A.: Particulate matter air quality assessment using integrated surface, satellite, and meteorological products: Multiple regression approach, J. Geophys. Res., 114, D14205, 2009.

Haeffelin, M., Bock, O., Boitel, C., Bony, S., Bouniol, D., Chepfer, H., Chiriaco, M., Cuesta, J., Drobinski, P., Flamant, C., Grall, M., Hodzic, A., Hourdin, F., Lapouge, F., Mathieu, A., Morille, Y., Naud, C., Pelon, J., Pietras, C., Protat, A., Romand, B., Scialom, G., and Vautard, R.: SIRTA, a ground-based atmospheric observatory for cloud and aerosol research, Ann. Geophys., 23, 253–275, 2005.

Healy, R. M., Sciare, J., Poulain, L., Kamili, K., Merkel, M., Müller, T., Wiedensohler, A., Eckhardt, S., Stohl, A., Sarda-Estève, R., McGillicuddy, E., O'Connor, I. P., Sodeau, J. R., and Wenger, J. C.: Sources and mixing state of size-resolved elemental carbon particles in a European megacity: Paris, Atmos. Chem. Phys., 12, 1681–1700, 2012.

Hennig, F., Quass, U., Hellack, B., Küpper, M., Kuhlbusch, T. A. J., Stafoggia, M., and Hoffmann, B.: Ultrafine and Fine Particle Number and Surface Area Concentrations and Daily Cause-Specific Mortality in the Ruhr Area, Germany, 2009–2014, Environ. Health Perspect., 126, 027008, 2018.

Hoffmann, C. and Funk, R.: Diurnal changes of PM10-emission from arable soils in NE-Germany, Aeolian Res., 17, 117–127, 2015.

Hu, X., Belle, J. H., Meng, X., Wildani, A., Waller, L. A., Strickland, M. J., and Liu, Y.: Estimating PM2.5 Concentrations in the Conterminous United States Using the Random Forest Approach, Environ. Sci. Technol., 51, 6936–6944, 2017.

Hueglin, C., Gehrig, R., Baltensperger, U., Gysel, M., Monn, C., and Vonmont, H.: Chemical characterisation of PM2.5, PM10 and coarse particles at urban, near-city and rural sites in Switzerland, Atmos. Environ., 39, 637–651, 2005.

Hughes, H. E., Morbey, R., Fouillet, A., Caserio-Schönemann, C., Dobney, A., Hughes, T. C., Smith, G. E., and Elliot, A. J.: Retrospective observational study of emergency department syndromic surveillance data during air pollution episodes across London and Paris in 2014, BMJ Open, 8, 1–12, 2018.

Jiang, J., Aksoyoglu, S., El-Haddad, I., Ciarelli, G., Denier van der Gon, H. A. C., Canonaco, F., Gilardoni, S., Paglione, M., Minguillón, M. C., Favez, O., Zhang, Y., Marchand, N., Hao, L., Virtanen, A., Florou, K., O'Dowd, C., Ovadnevaite, J., Baltensperger, U., and Prévôt, A. S. H.: Sources of organic aerosols in Europe: a modeling study using CAMx with modified volatility basis set scheme, Atmos. Chem. Phys., 19, 15247–15270, 2019.

Just, A., De Carli, M., Shtein, A., Dorman, M., Lyapustin, A., and Kloog, I.: Correcting Measurement Error in Satellite Aerosol Optical Depth with Machine Learning for Modeling PM2.5 in the Northeastern USA, Remote Sens., 10, 803, 2018.

Kiesewetter, G., Borken-Kleefeld, J., Schöpp, W., Heyes, C., Thunis, P., Bessagnet, B., Terrenoire, E., Fagerli, H., Nyiri, A., and Amann, M.: Modelling street level PM10 concentrations across Europe: source apportionment and possible futures, Atmos. Chem. Phys., 15, 1539–1553, 2015.

Klingner, M. and Sähn, E.: Prediction of PM10 concentration on the basis of high resolution weather forecasting, Meteorol. Z., 17, 263–272, 2008.

Kotthaus, S. and Grimmond, C. S. B.: Atmospheric boundary-layer characteristics from ceilometer measurements. Part 1: A new method to track mixed layer height and classify clouds, Q. J. Roy. Meteorol. Soc., 144, 1525–1538, 2018a.

Kotthaus, S. and Grimmond, C. S. B.: Atmospheric boundary-layer characteristics from ceilometer measurements. Part 2: Application to London's urban boundary layer, Q. J. Roy. Meteorol. Soc., 144, 1511–1524, 2018b.

Laborde, M., Crippa, M., Tritscher, T., Jurányi, Z., Decarlo, P. F., Temime-Roussel, B., Marchand, N., Eckhardt, S., Stohl, A., Baltensperger, U., Prévôt, A. S. H., Weingartner, E., and Gysel, M.: Black carbon physical properties and mixing state in the European megacity Paris, Atmos. Chem. Phys., 13, 5831–5856, 2013.

Lelieveld, J., Evans, J. S., Fnais, M., Giannadaki, D., and Pozzer, A.: The contribution of outdoor air pollution sources to premature mortality on a global scale, Nature, 525, 367–371, 2015.

Lelieveld, J., Klingmüller, K., Pozzer, A., Pöschl, U., Fnais, M., Daiber, A., and Münzel, T.: Cardiovascular disease burden from ambient air pollution in Europe reassessed using novel hazard ratio functions, Eur. Heart J., 40, 1–7, 2019.

Leung, D. M., Tai, A. P. K., Mickley, L. J., Moch, J. M., van Donkelaar, A., Shen, L., and Martin, R. V.: Synoptic meteorological modes of variability for fine particulate matter (PM2.5) air quality in major metropolitan regions of China, Atmos. Chem. Phys., 18, 6733–6748, 2018.

Li, Y., Zhang, J., Sailor, D. J., and Ban-Weiss, G. A.: Effects of urbanization on regional meteorology and air quality in Southern California, Atmos. Chem. Phys., 19, 4439–4457, 2019.

Li, Z., Guo, J., Ding, A., Liao, H., Liu, J., Sun, Y., Wang, T., Xue, H., Zhang, H., and Zhu, B.: Aerosol and boundary-layer interactions and impact on air quality, Natl. Sci. Rev., 4, 810–833, 2017.

Liu, Q., Jia, X., Quan, J., Li, J., Li, X., Wu, Y., Chen, D., Wang, Z., and Liu, Y.: New positive feedback mechanism between boundary layer meteorology and secondary aerosol formation during severe haze events, Sci. Rep., 8, 1–8, 2018.

Lundberg, S. M. and Lee, S.-I.: A Unified Approach to Interpreting Model Predictions, arXiv [preprint], arXiv:1705.07874, 25 November 2017.

Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., Liston, D. E., Low, D. K.-W., Newman, S.-F., Kim, J., and Lee, S.-I.: Explainable machine-learning predictions for the prevention of hypoxaemia during surgery, Nat. Biomed. Eng., 2, 749–760, 2018.

Lundberg, S. M., Erion, G. G., and Lee, S.-I.: Consistent Individualized Feature Attribution for Tree Ensembles, arXiv [preprint], 7 March 2019.

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.-I.: From local explanations to global understanding with explainable AI for trees, Nat. Mach. Intell., 2, 56–67, 2020.

McGovern, A., Lagerquist, R., John Gagne, D., Jergensen, G. E., Elmore, K. L., Homeyer, C. R., and Smith, T.: Making the Black Box More Transparent: Understanding the Physical Implications of Machine Learning, B. Am. Meteorol. Soc., 100, 2175–2199, 2019.

Megaritis, A. G., Fountoukis, C., Charalampidis, P. E., Pilinis, C., and Pandis, S. N.: Response of fine particulate matter concentrations to changes of emissions and temperature in Europe, Atmos. Chem. Phys., 13, 3423–3443, 2013.

Megaritis, A. G., Fountoukis, C., Charalampidis, P. E., Denier van der Gon, H. A. C., Pilinis, C., and Pandis, S. N.: Linking climate and air quality over Europe: effects of meteorology on PM2.5 concentrations, Atmos. Chem. Phys., 14, 10283–10298, 2014.

Ng, N. L., Herndon, S. C., Trimborn, A., Canagaratna, M. R., Croteau, P. L., Onasch, T. B., Sueper, D., Worsnop, D. R., Zhang, Q., Sun, Y. L., and Jayne, J. T.: An Aerosol Chemical Speciation Monitor (ACSM) for routine monitoring of the composition and mass concentrations of ambient aerosol, Aerosol Sci. Technol., 45, 770–784, 2011.

Paatero, P. and Tapper, U.: Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 5, 111–126, 1994.

Pay, M. T., Jiménez-Guerrero, P., and Baldasano, J. M.: Assessing sensitivity regimes of secondary inorganic aerosol formation in Europe with the CALIOPE-EU modeling system, Atmos. Environ., 51, 146–164, 2012.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Müller, A., Nothman, J., Louppe, G., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, É.: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., arXiv [preprint], arXiv:1201.0490, 2012.

Petäjä, T., Järvi, L., Kerminen, V.-M., Ding, A., Sun, J., Nie, W., Kujansuu, J., Virkkula, A., Yang, X., Fu, C., Zilitinkevich, S., and Kulmala, M.: Enhanced air pollution via aerosol-boundary layer feedback in China, Sci. Rep., 6, 18998, 2016.

Petetin, H., Beekmann, M., Sciare, J., Bressi, M., Rosso, A., Sanchez, O., and Ghersi, V.: A novel model evaluation approach focusing on local and advected contributions to urban PM2.5 levels – application to Paris, France, Geosci. Model Dev., 7, 1483–1505, 2014.

Petit, J.-E., Favez, O., Sciare, J., Canonaco, F., Croteau, P., Močnik, G., Jayne, J., Worsnop, D., and Leoz-Garziandia, E.: Submicron aerosol source apportionment of wintertime pollution in Paris, France by double positive matrix factorization (PMF2) using an aerosol chemical speciation monitor (ACSM) and a multi-wavelength Aethalometer, Atmos. Chem. Phys., 14, 13773–13787, 2014.

Petit, J.-E., Favez, O., Sciare, J., Crenn, V., Sarda-Estève, R., Bonnaire, N., Močnik, G., Dupont, J.-C., Haeffelin, M., and Leoz-Garziandia, E.: Two years of near real-time chemical composition of submicron aerosols in the region of Paris using an Aerosol Chemical Speciation Monitor (ACSM) and a multi-wavelength Aethalometer, Atmos. Chem. Phys., 15, 2985–3005, 2015.

Petit, J. E., Amodeo, T., Meleux, F., Bessagnet, B., Menut, L., Grenier, D., Pellan, Y., Ockler, A., Rocq, B., Gros, V., Sciare, J., and Favez, O.: Characterising an intense PM pollution episode in March 2015 in France from multi-site approach and near real time data: Climatology, variabilities, geographical origins and model evaluation, Atmos. Environ., 155, 68–84, 2017.

Putaud, J.-P., Raes, F., Van Dingenen, R., Brüggemann, E., Facchini, M.-C., Decesari, S., Fuzzi, S., Gehrig, R., Hüglin, C., Laj, P., Lorbeer, G., Maenhaut, W., Mihalopoulos, N., Müller, K., Querol, X., Rodriguez, S., Schneider, J., Spindler, G., ten Brink, H., Tørseth, K., and Wiedensohler, A.: A European aerosol phenomenology – 2: chemical characteristics of particulate matter at kerbside, urban, rural and background sites in Europe, Atmos. Environ., 38, 2579–2595, 2004.

Radke, L. F., Hobbs, P. V., and Eltgroth, M. W.: Scavenging of aerosol particles by precipitation, J. Appl. Meteorol., 19, 715–722, 1980.

Rengarajan, R., Sudheer, A., and Sarin, M.: Wintertime PM2.5 and PM10 carbonaceous and inorganic constituents from urban site in western India, Atmos. Res., 102, 420–431, 2011.

Rosenfeld, D., Dai, J., Yu, X., Yao, Z., Xu, X., Yang, X., and Du, C.: Inverse relations between amounts of air pollution and orographic precipitation, Science, 315, 1396–1398, 2007.

Rost, J., Holst, T., Sahn, E., Klingner, M., Anke, K., Ahrens, D., and Mayer, H.: Variability of PM10 concentrations dependent on meteorological conditions, Int. J. Environ. Pollut., 36, 3–18, 2009.

Rybarczyk, Y. and Zalakeviciute, R.: Machine learning approaches for outdoor air quality modelling: A systematic review, Appl. Sci., 8, 2570, 2018.

Sá, E., Martins, H., Ferreira, J., Marta-Almeida, M., Rocha, A., Carvalho, A., Freitas, S., and Borrego, C.: Climate change and pollutant emissions impacts on air quality in 2050 over Portugal, Atmos. Environ., 131, 209–224, 2016.

Samoli, E., Peng, R., Ramsay, T., Pipikou, M., Touloumi, G., Dominici, F., Burnett, R., Cohen, A., Krewski, D., Samet, J., and Katsouyanni, K.: Acute effects of ambient particulate matter on mortality in Europe and North America: Results from the APHENA study, Environ. Health Perspect., 116, 1480–1486, 2008.

Samoli, E., Stafoggia, M., Rodopoulou, S., Ostro, B., Declercq, C., Alessandrini, E., Díaz, J., Karanasiou, A., Kelessis, A. G., Le Tertre, A., Pandolfi, P., Randi, G., Scarinzi, C., Zauli-Sajani, S., Katsouyanni, K., and Forastiere, F.: Associations between Fine and Coarse Particles and Mortality in Mediterranean Cities: Results from the MED-PARTICLES Project, Environ. Health Perspect., 121, 932–938, 2013.

Schäfer, K., Wagner, P., Emeis, S., Jahn, C., Münkel, C., and Suppan, P.: Mixing layer height and air pollution levels in urban area, Proc. SPIE, 8534, 1–10, 2012.

Sciare, J., d'Argouges, O., Zhang, Q. J., Sarda-Estève, R., Gaimoz, C., Gros, V., Beekmann, M., and Sanchez, O.: Comparison between simulated and observed chemical composition of fine aerosols in Paris (France) during springtime: contribution of regional versus continental emissions, Atmos. Chem. Phys., 10, 11987–12004, 2010.

Shapley, L.: A Value for n-Person Games, in: Contributions to the Theory of Games, edited by: Kuhn, H. W. and Tucker, A. W., Princeton University Press, Princeton, USA, 307–318, 1953.

SIRTA/IPSL: SIRTA-ReOBS, last access: 15 March 2021.

Srivastava, D., Favez, O., Bonnaire, N., Lucarelli, F., Haeffelin, M., Perraudin, E., Gros, V., Villenave, E., and Albinet, A.: Speciation of organic fractions does matter for aerosol source apportionment. Part 2: Intensive short-term campaign in the Paris area (France), Sci. Total Environ., 634, 267–278, 2018.

Srivastava, D., Favez, O., Petit, J., Zhang, Y., Sofowote, U. M., Hopke, P. K., and Bonnaire, N.: Speciation of organic fractions does matter for aerosol source apportionment. Part 3: Combining off-line and on-line measurements, Sci. Total Environ., 690, 944–955, 2019.

Stirnberg, R., Cermak, J., Fuchs, J., and Andersen, H.: Mapping and Understanding Patterns of Air Quality Using Satellite Data and Machine Learning, J. Geophys. Res.-Atmos., 125, e2019JD031380, 2020.

Su, J. G., Apte, J. S., Lipsitt, J., Garcia-Gonzales, D. A., Beckerman, B. S., de Nazelle, A., Texcalac-Sangrador, J. L., and Jerrett, M.: Populations potentially exposed to traffic-related air pollution in seven world cities, Environ. Int., 78, 82–89, 2015.

Sujatha, P., Mahalakshmi, D., Ramiz, A., Rao, P., and Naidu, C.: Ventilation coefficient and boundary layer height impact on urban air quality, Cogent. Environ. Sci., 2, 1–9, 2016.

Wagner, P. and Schäfer, K.: Influence of mixing layer height on air pollutant concentrations in an urban street canyon, Urban Clim., 22, 64–79, 2017.

Wang, G., Zhang, R., Gomez, M. E., Yang, L., Zamora, M. L., Hu, M., Lin, Y., Peng, J., Guo, S., Meng, J., Li, J., Cheng, C., Hu, T., Ren, Y., Wang, Y., Gao, J., Cao, J., An, Z., Zhou, W., Li, G., Wang, J., Tian, P., Marrero-Ortiz, W., Secrest, J., Du, Z., Zheng, J., Shang, D., Zeng, L., Shao, M., Wang, W., Huang, Y., Wang, Y., Zhu, Y., Li, Y., Hu, J., Pan, B., Cai, L., Cheng, Y., Ji, Y., Zhang, F., Rosenfeld, D., Liss, P. S., Duce, R. A., Kolb, C. E., and Molina, M. J.: Persistent sulfate formation from London Fog to Chinese haze, P. Natl. Acad. Sci. USA, 113, 13630–13635, 2016.

Wang, X., Dickinson, R. E., Su, L., Zhou, C., and Wang, K.: PM2.5 Pollution in China and How It Has Been Exacerbated by Terrain and Meteorological Conditions, B. Am. Meteorol. Soc., 99, 105–119, 2018.

Yang, M., Chu, C., Bloom, M. S., Li, S., Chen, G., Heinrich, J., Markevych, I., Knibbs, L. D., Bowatte, G., Dharmage, S. C., Komppula, M., Leskinen, A., Hirvonen, M.-R., Roponen, M., Jalava, P., Wang, S.-Q., Lin, S., Zeng, X.-W., Hu, L.-W., Liu, K.-K., Yang, B.-Y., Chen, W., Guo, Y., and Dong, G.-H.: Is smaller worse? New insights about associations of PM1 and respiratory health in children and adolescents, Environ. Int., 120, 516–524, 2018.

Yang, Y. Q., Wang, J. Z., Gong, S. L., Zhang, X. Y., Wang, H., Wang, Y. Q., Wang, J., Li, D., and Guo, J. P.: PLAM – a meteorological pollution index for air quality and its applications in fog-haze forecasts in North China, Atmos. Chem. Phys., 16, 1353–1364, 2016.

Zhang, Y., Favez, O., Canonaco, F., Liu, D., Močnik, G., Amodeo, T., Sciare, J., Prévôt, A. S., Gros, V., and Albinet, A.: Evidence of major secondary organic aerosol contribution to lensing effect black carbon absorption enhancement, npj Clim. Atmos. Sci., 1, 1–8, 2018.

Zhang, Y., Favez, O., Petit, J.-E., Canonaco, F., Truong, F., Bonnaire, N., Crenn, V., Amodeo, T., Prévôt, A. S. H., Sciare, J., Gros, V., and Albinet, A.: Six-year source apportionment of submicron organic aerosols from near-continuous highly time-resolved measurements at SIRTA (Paris area, France), Atmos. Chem. Phys., 19, 14755–14776, 2019.

Short summary
Air pollution endangers human health and poses a problem particularly in densely populated areas. Here, an explainable machine learning approach is used to analyse periods of high particle concentrations for a suburban site southwest of Paris to better understand their atmospheric drivers. Air pollution is particularly exacerbated by low temperatures and low mixed layer heights, but processes vary substantially between and within seasons.