An assessment of aerosol optical properties from remote-sensing observations and regional chemistry–climate coupled models over Europe

. Atmospheric aerosols aerosol–radiation–cloud interactions


Introduction
The uncertainty of atmospheric aerosol effects on the Earth's radiative budget is much greater than for any other climateforcing agent (Boucher et al., 2013). In this sense, aerosol properties, with emphasis placed on optical properties, are vastly variable on scales of space and time due to short-lived aerosol particles and non-uniform emissions (Ramanathan et al., 2001;Kaufman et al., 2002;Randall et al., 2007). According to the International Panel on Climate Change (IPCC) Fifth Assessment Report (AR5), aerosol effects, which depend on their properties, can be classified into aerosolradiation and aerosol-cloud interactions (respectively ARI and ACI; Boucher et al., 2013).
In order to reduce, or to at least characterise, this uncertainty, there are two main approaches to study aerosol optical properties and their influences on the climate system: (1) observations and (2) climate models. Aerosol properties cannot be constrained by measurements alone and using models can improve knowledge and our understanding of physical, chemical and optical aerosol properties. However, models are based on the previous observational study of these properties in order to define and implement the behaviour of aerosol particles in modelling systems.
When aerosol optical properties are evaluated, remote sensing is one of the most widely used techniques in the observational approach. The main advantages of remote sensing are (1) it does not perturb the observed sample (aerosol particles in this case), and is sensitive to different properties, particularly to aerosol optical properties on which this study focuses; and (2) they can provide point, column or profile data. For these reasons, several studies using this approach were carried out to improve knowledge of aerosol properties (e.g. Dubovik et al., 2000;Tanré et al., 2001;Dubovik et al., 2002;Levy et al., 2007;Garland et al., 2008) and to establish their radiative effects (Haywood et al., 2001;Chou et al., 2002;Bellouin et al., 2005;Huang et al., 2006, among many others).
These studies used data that were measured chiefly from two main platforms: from the Earth's surface, the so-called ground-based measurement and from space by using satellites. Moreover, different instruments, such as sun photometers, spectroradiometers or Lidar were used. Major efforts have been made to create networks of ground-based measurements of aerosol optical properties around the world, e.g. as the AErosol RObotic NETwork (AERONET; Holben et al., 1998), the European Aerosol Research Lidar Network (EARLINET;Pappalardo et al., 2014) or the Latin American Lidar Network (LALINET; Antuña-Marrero et al., 2016). There are instruments aboard satellites that provide information about aerosol optical properties with a wide spatial coverage, e.g. the Multi-angle Imaging SpectroRadiometer (MISR; Diner et al., 1998), the Moderate Resolution Imaging Spectroradiometer (Remer et al., 2005, MODIS), the Spinning Enhanced Visible and Infrared Imager (SEVIRI; Aminou et al., 1997Aminou et al., , 1999, the Ozone Monitoring Instrument (Schoeberl et al., 2006, OMI) or the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO; Winker et al., 2003).
At this point, it is important to highlight that several studies (Fuzzi et al., 2015;Van Donkelaar et al., 2015) have discussed the differences between aerosol properties measured by ground-based stations, with a limited spatial coverage, but with more reliable information, and by satellite, with broader spatial coverage, but not with such reliable information as ground-based observations. These aspects are important, together with the wide variability in aerosol optical properties on scales of space and time (Ramanathan et al., 2001;Kaufman et al., 2002;Randall et al., 2007), the huge number of different instruments aboard satellites and the algorithms used, as they produce considerable uncertainty in the measured data.
By considering the modelling approach to reduce or characterise the uncertainty of aerosol effects as part of the COST Action ES1004 EuMetChem (European framework for online integrated air quality and meteorology modelling; see http://www.eumetchem.info/), a list of interactions were identified as the most relevant coupling processes for regional air-quality and weather predictions. These interactions were chosen because experts considered them to be the most important, but were poorly represented at the same time in the current online coupled models (Kong et al., 2015). This type of model offers the possibility to account for the feedback mechanisms between simulated aerosol concentrations and meteorological variables, and thus permits the simulation of ARI and ACI, which form part of the above-mentioned list of interactions.
The efforts made by EuMetChem Action focused on two case studies, which were chosen from the previous experience of Phase 2 of the Air Quality Model Evaluation International Initiative (AQMEII; Galmarini et al., 2015), when important aerosol episodes took place in 2010 all over Europe with potential aerosol effects on meteorology. These cases consist of the Russian wildfires and heat wave episode (in July and August 2010) and a Sahara desert dust outbreak with enhanced cloud and rain over the Mediterranean Sea (in October 2010). These case studies were chosen given the evidence of the particularly significant interactions between meteorology and chemistry during such strong air pollution episodes (Konovalov et al., 2011;Chen et al., 2014;Wong et al., 2012). This work is a companion paper of Baró et al. (2017b). While in this work the representation of optical properties is assessed, in Baró et al. (2017b) the effects of atmospheric aerosols on regional temperature during the same episodes was evaluated.
Several studies have highlighted the relevance of including aerosol radiative feedbacks during such episodes to improve meteorological forecasts (among many others Pérez et al., 2006;Bangert et al., 2012;Makar et al., 2015a, b). For example, the impacts of high aerosol loading during the desert dust events have been described by Shao et al. (2011) to demonstrate interactions with climate and ecosystem, impacts on the Earth's radiative budget and a drop in photolysis rates.
Furthermore, Konovalov et al. (2011) and Chubarova et al. (2012) have found that the very high aerosol concentration due to the fire emissions during the specific Russian wildfires episode significantly changed the regional conditions of the climate system by changes in atmospheric gas composition, as well as the optical and radiative characteristics of aerosols. In Péré et al. (2014), a reduction of solar radiation on the ground of up to 50 W m −2 in diurnal averages was found, with a major reduction in the near-surface air temperature between 0.2 and 2.6 K on a regional scale over most of eastern Europe, a reduction in the atmospheric boundary layer height from 13 to 65 % and the vertical wind speed from 5 to 80 % when the ARI effects during the Russian wildfire episode were included. Similar results have been reported by Baró et al. (2017a), which reveals the effects of ARI on surface winds.
Moreover, Forkel et al. (2016) analysed the different chemistry and physics options of the same online coupled simulations used in this work to evaluate the effects of ARI on radiation and temperature during both episodes studied herein. During the fires episode, the inclusion of this effect led to a reduction of between 10 and 100 W m −2 in the average downward short-wave radiation at the ground level and a drop in the mean 2 m-temperature of almost 1 K over the area where the fires took place. During the dust outbreak episode, the ARI effect resulted in lower mean 2 mtemperature (−0.25 K over the Mediterranean region).
However, none of the aforementioned studies have evaluated the representation of aerosol optical properties and the effects of ARI+ACI on these properties. As the previous revision reveals, a strong influence of the ARI+ACI exists on meteorological variables. Several studies untangled the influence of atmospheric aerosols on the atmospheric boundary layer (Roy and Sharp, 2013), atmospheric stability and winds (Péré et al., 2014;Baró et al., 2017a) with a consequent effect on aerosol concentrations . The winds variation affects emissions of wind-dependent particles over land, such as desert dust and ocean sea salt (Boucher et al., 2013;Prijith et al., 2014;Li et al., 2015). Moreover, the transport of pollutants (e.g. Yang et al., 2017) and the aerosol vertical distribution (Mishra et al., 2015) could be altered. The modification of all of these processes influences aerosol optical properties (e.g. Huang et al., 2010). On the other hand, ARI+ACI effects on relative humidity (Baró et al., 2017a) modify the size of the particles due to the hygroscopic growth affecting the particle extinction .
Due to the uncertainty of aerosol effects, the abovementioned coupling processes are treated differently in online coupled chemistry-meteorology models. This fact highlights the need to study the response of this type of model to the same aerosol emissions. Although the scope of this work is not to study ensemble forecasting, an ensemble mean was performed by using a set of simulations. The reason was because the ensemble mean is likely to provide the most skilful simulations compared to individual ones (Baklanov et al., 2014), like the results found by Fernández et al. (2009), Knutti et al. (2010 and Kjellström et al. (2011). These results can be explained by the paradigm of different models being considered independent samples from a given distribution that is centred on the truth (Annan and Hargreaves, 2010). Hence, the ensemble mean improves modelling-based forecasts compared to the performance of individual models, and this ensemble could be expected to converge to the truth as more models are added to the ensemble.
Hence the main objectives can be summarised as (1) assessing the representation of aerosol optical properties by a set of online coupled chemistry-climate simulations and (2) determining whether the inclusion of aerosol radiative feedbacks in this type of model improves the modelling outputs of aerosol optical properties over Europe. In order to achieve these objectives and to characterise the uncertainty in the measured data, the first step was to determine the "best" satellite data according to AERONET to assess simulations. Afterwards, the representation of the aerosol optical properties by regional online coupled models was evaluated by comparing the ensemble of the models against this "best" satellite estimation.

Data and methods
Aerosol optical properties considered for the evaluation in this work were aerosol optical depth (AOD) and Ångström exponent (AE) at different wavelengths. These optical properties were assessed during two different episodes with a high aerosol load over Europe in 2010, as established in COST Action ES1004 EuMetChem. They were identified from the previous experience of Phase 2 of the AQMEII modelling inter-comparison exercise , and consist of the Russian wildfires and heat wave episode (from 25 July to 15 August 2010) and a Saharan desert dust outbreak with enhanced cloud and rain (from 2 to 15 October 2015).

Model simulations
The used simulations were run by working group 2 (WG2) of COST Action ES1004 EuMetChem (http://www. eumetchem.info/). WG2 of EuMetChem investigated the importance of different processes and feedback in online coupled chemistry-meteorology models for air-quality simulations and weather forecasts.
The used simulations are summarised in Table 1. All these simulations (according to the model, chemical or physical options, and feedbacks used) were denoted as an experiment. Two different online coupled models were used: COSMO-MUSCAT (Wolke et al., 2012) and WRF-Chem, version 3.4.1 (Grell et al., 2005) with different configurations (CS1, CS2, ES1 and ES3).
Simulations covered two different episodes with a high aerosol load over Europe in 2010. The Russian wildfires episode lasted 22 days and the Sahara desert dust outbreak lasted 14 days. Both episodes were simulated following the common strategy for AQMEII Phase 2, a sequence of 2-day time slices in which the meteorology is re-initialised every 2 days and the chemical state is adopted from the final step of the previous time slice (Galmarini et al., 2012(Galmarini et al., , 2017. A spin-up of 5 days was used for the chemistry. The target domains covered Europe. However, for an easy and clear analysis of aerosol effects, the assessment was done in a window of the larger domain (see Fig. 1). The majority of the simulations were carried out with a grid width of approximately 23 km. Exceptions were CS2 (9.9 km) and DE3 (0.125 • , around 14 km), both with higher resolutions for the Russian wildfires episode. The outputs of all the simulations were interpolated to a common lat-long regular grid at a resolution of 0.1 • . The analysis grid for the Russian wildfires and heat wave case (Fig. 1, blue box) covered between 40 and 60 • N and 20 and 60 • E, with a grid size of 50 000 cells; that of the Saharan desert dust outbreak (Fig. 1, green box) covers 25 and 55 • N and −10 and 30 • E, with a grid size of 120 000 cells.
The simulations were run for three different feedback configurations: a base case called "no radiative feedbacks" (NRF), which does not take into account the radiative feedbacks of atmospheric aerosols in meteorology; a case that only bears in mind aerosol-radiation interactions (ARI); and another case that considers aerosol-radiation and aerosolcloud interactions (ARI+ACI).
The meteorological initial and boundary conditions came from 3-hourly data with 0.25 • of resolution from the European Centre for Medium-Range Weather Forecasts igure 1. Target (grey) and analysis domains (blue for the Russian wildfires and heat wave case and green for the Saharan desert dust outbreak).
(ECMWF) operational archive. The chemistry boundary conditions for the main trace gases and particulate matter concentrations are available from the ECMWF IFS-MOZART model run in the MACC-II project (Monitoring Atmospheric Composition and Climate; Inness et al., 2013) with a grid width of 1.125 • and a 3-hourly temporal resolution.
Anthropogenic emission details can be found in Im et al. (2015a, b). They came from the TNO (Netherlands Organisation for Applied Scientific Research) MACC emissions inventory Kuenen et al., 2014;Pouliot et al., 2015, http://www.gmes-atmosphere.eu/) with a spatial resolution of ∼ 7 km. Annual emissions of methane (CH 4 ), carbon monoxide (CO), ammonia (NH 3 ), total nonmethane volatile organic compounds (NMVOCs), nitrogen oxides (NO x ), particulate matter (PM 10 and PM 2.5 ) and sulphur dioxide (SO 2 ) were made available by 10 activity sectors. As part of the AQMEII and EuMetChem initiatives, temporal profiles (diurnal, day-of-week, seasonal) were provided from Schaap et al. (2005) and vertical distributions were also made available.
The biomass burning emission data of the total PM emissions with a spatial resolution of 0.1 • were estimated from the project IS4FIRES (Integrated monitoring and modelling system for wildland fires; Sofiev et al., 2009). Other biomass burning emission species were estimated as detailed in . No heat release due to fires was taken into account.
In the COSMO-MUSCAT simulations, biogenic emissions were treated with the model described in Guenther (1993). Dust emission and transport were computed on the basis of meteorological and hydrological fields from COSMO . In the WRF-Chem simulations, MEGAN (Model of Emissions of Gases and Aerosols from Nature; Guenther et al., 2006) was online coupled with WRF-Chem to estimate the biogenic emissions. Dust emissions were modelled according to Shaw et al. (2008), with an adjustment made to avoid extremely high desert dust fluxes .
For the COSMO-MUSCAT simulations, the physics options were the following: δ-2-stream for long-wave and short-wave fluxes (Ritter and Geleyn, 1992), prognostic turbulent kinetic energy (TKE) planetary boundary layer (PBL), a multi-layer version of the former two-layer soil model after Jacobsen and Heise (1982) and the Tiedtke (1989) mass-flux convection scheme. More details for the physical parameterisation are published in Doms et al. (2011). The cloud microphysics and chemistry options are summarised in Table 1. COSMO-MUSCAT takes into account ARI following Helmert et al. (2007). Radiative fluxes are computed online with the modified COSMO radiation scheme (Ritter and Geleyn, 1992) by considering variations in the modelled sizeresolved aerosol fields. Radiation effects can influence the COSMO meteorology and feedback on emission and aerosol transport. No COSMO-MUSCAT simulations that took into account ACI were used in this work.
The common physics options for the WRF-Chem simulations were the following: the Rapid Radiative Transfer Model for Global long-wave and short-wave radiation schemes (RRTMG; Iacono et al., 2008), the Yonsei University (YSU) PBL scheme (Hong et al., 2006), the NOAH land-surface model (Chen and Dudhia, 2001) and the Grell 3D ensemble cumulus parameterisation (Grell and Dévényi, 2002). The cloud microphysics and chemistry options are summarised in Table 1. However, a more detailed description of the WRF-Chem simulations can be found in Forkel et al. (2015) and Im et al. (2015a, b).
ARI are treated in WRF-Chem following Fast et al. (2006), Chapman et al. (2009) and Barnard et al. (2010). Each chemical constituent of the aerosol was associated with a complex index of refraction. The overall refractive index for a given size bin was determined by volume averaging. The Mie theory and the summation over all size bins were used to determine the composite aerosol optical properties. Wet particle diameters were used in the calculations. Chapman et al. (2009) also described ACI. These interactions were implemented by linking the simulated cloud droplet number with the short-wave radiation and microphysics schemes. Therefore, the droplet number affected both the calculated droplet mean radius and the cloud optical depth when using the short-wave radiation scheme. One limitation of WRF-Chem in the treatment of aerosol-cloud interactions was that these couplings were not computed in the convective clouds simulated by the cumulus parameterisation Yang et al., 2011;Archer-Nicholls et al., 2016;Palacios-Peña et al., 2017).

Observational data
The observational data used to evaluate the representation of the aerosol optical properties by the EuMetChem simulations came from different remote-sensing instruments: ground-based data from AERONET and different sensors aboard a satellite: the twin MODIS, OMI and SeaWIFS (Sea-viewing Wide Field-of-view Sensor).
The data used from AERONET were AOD of level 2.0 at different wavelengths from the available European stations during these episodes. The variables used to the satellite-AERONET comparison were AOD at the closer wavelength to satellite retrievals and those used to the model-AERONET comparison were AOD 670 nm and AE between 440 and 870 nm. Typically, the total uncertainty for the AOD data under cloud-free conditions is < ±0.01 for λ > 440 nm and < ±0.02 for shorter wavelengths (Holben et al., 1998).
For MODIS, the data used were the level 2 of the Atmospheric Aerosol Product (MxD04_L2) from collection 6 (C6) with a resolution of 10 km. Data came from the two available MODIS platforms, Aqua (MYD04_L2) and Terra (MOD04_L2). The MODIS data were estimated by two different algorithms, Dark Target (DT) and Deep Blue (DB). The DT algorithm variables used were AOD at 550 nm for both ocean (estimated error, EE; for negative EE of −0.02 − 10 % × AOD, and for positive EE of +0.04 + 10 % × AOD; http://darktarget.gsfc.nasa.gov/ products/ocean) and land (EE ±0.05 + 15 % × AOD; http:// darktarget.gsfc.nasa.gov/products/land-10), and AE between 550 and 840 nm over the ocean (preliminary EE is 0.45 on pixels with an AOD > 0.2; Levy et al., 2013). The DB algorithm used were AOD at 550 nm over land, with an EE of approximately ±0.05 + 20 % × AOD Sayer et al., 2013), and AE between 412 and 470 nm over land. Finally, a combined variable of the DT and DB algorithms with a wider coverage was used. This was AOD at 550 nm for both ocean and land, and its error has not yet been estimated (Levy et al., 2013).
The daily global level-2G, gridded at 0.25 • , data from the algorithm called near UV of the OMI sensor were also used. The used products were AOD at different UV wavelengths (342.5, 388, 442 nm). The typical values for the retrieval error of the AOD obtained by the non-linear fitting routine were below 0.03 (OMI Team, 2009).
Finally, the daily level 3, gridded at 0.5 • , data from the SeaWIFS sensor were used. As with MODIS, this sensor employed the DB algorithm; consequently the error of these products was the same Sayer et al., 2013). The used products were AOD at different wavelengths over land (412, 490 and 670 nm), over oceans (510, 670 and 865 nm), and both (550 nm); and AE over land between 412 and 490 nm and over oceans between 510 and 670 nm.

Evaluation methodology
The first evaluation step consisted of establishing the "best" satellite data to assess simulations; for this purpose, different satellite datasets were compared with the AERONET observations by calculating the linear regression and the coefficient of correlation between the daily data. Afterwards, the evaluation of the model outputs against the selected satellite database was done by using classical statistics as the individual model-prediction error or bias (e i ), the mean bias error (MBE), the mean absolute error (MAE), the coefficient of correlation (R) and the coefficient of determination (R 2 ) according to Willmott et al. (1985), Weil et al. (1992) and Willmott and Matsuura (2005). Satellite data and model data were bilinearly interpolated to a common working grid, which corresponded with the analysis grid (described above) according to case studies. After the interpolation, and in order to compare with AERONET, the satellite and simulation data were extracted from the cell that covered the corresponding station coordinates of the AERONET station by a nearest neighbour approach. When the models were compared with the satellite data, only those data that matched in time and space with the satellite retrieval was selected. The evaluation was done using both daily and hourly data.
Finally, to evaluate whether the inclusion of the radiative feedbacks in the simulations would lead to an improvement in the error of the model, two variables were defined: the Normalised Improvement of the MAE (NI_MAE = Eq. 1) and the Normalised Improvement of the Coefficient of Determination (NI_R 2 = Eq. 2), both in %. These variables were described so that a positive value would indicate an improvement due to the inclusion of the aerosol radiative feedbacks and a negative value, e.g. worsening. |e i | is the absolute error of the simulations.
3 Results and discussion The first part of this section identified the satellite data with the best skill according to the AERONET observations, which were considered referential. Then according to these results, the selected data were used to evaluate the representation of the aerosol optical properties by the different EuMetChem simulations during the Russian heat wave and wildfires episode, described in the second part of this section, and the Sahara desert dust outbreak, described in the third part.

Satellite-AERONET comparison
The results of the linear regression between the daily mean of the different satellite sensors and the AERONET data are shown in this subsection. Table 2 lists the correlation coefficient values for this statistical analysis. AE was excluded from this analysis as satellite data matching with AERONET sites were not enough, which made the evaluation statistically non-significant. In general, the best skills of the evaluation of the satellite products against AERONET were found for MODIS, as also indicated in Myhre et al. (2005) or Bibi et al. (2015). Focusing on the results obtained during the Russian wildfires episode, the best skills were obtained for the DB algorithm of MODIS (correlation values over 0.80 both from the Terra and Aqua platforms). Indeed the best AOD estimation was obtained from the DB algorithm of MODIS from the Terra platform, with an R 2 of 0.82. The aerosol products retrieved by the DB algorithm provide useful information about aerosol properties over bright-reflecting land surfaces, such as desert, semiarid and urban regions. In C6, the algorithm was enhanced in order to improve retrievals over regions of mixed vegetated and non-vegetated surfaces , and to cover all the cloud-free and snow-free land surfaces (Sayer et al., 2014). This fact could explain the good representation of this satellite product over the study area. The best estimated variable from the OMI data was AOD at 342.5 nm with a correlation value of 0.80. This could be because organic species, such as biomass burning aerosols, present a greater absorption at near-ultraviolet and blue wavelengths (Andreae and Gelencsér, 2006;Sun et al., 2007). For this episode, no data from the SeaWIFS sensor were available.
During the Saharan dust episode, the best skills of the AOD satellite products were found for the combined and DT algorithm products of MODIS from the Aqua platform. The correlation values were much higher than for the other satellite products (R 2 = 0.92). The DT algorithm was composed of two different algorithms, which were used to retrieve aerosol information over land (dark at visible and longer wavelengths) and over vegetated or dark-soiled land (dark at visible wavelengths; Sayer et al., 2014). Its combination with the enhanced DB algorithm in C6 provided expanded coverage over most of Europe, which covered most of the domain. The OMI products estimation presented lower correlation values (0.26) than for the Russian wildfires episode. In the dust episode, visible (550 nm in MODIS) wavelengths presents better results than UV wavelengths, possibly be-cause the dust spectral dependence is greater in visible bands (Müller et al., 2009).
SeaWIFS sensor data were available during the Saharan dust episode. The evaluation of the AOD estimation at 670 nm showed that this sensor produced better estimations over oceans (correlation values of 0.90) than over land (0.53). SeaWIFS data were partnered with an expanded DB algorithm and the SOAR (SeaWiFS Ocean Aerosol Retrieval) algorithm, with retrievals over oceans and inland water bodies (Sayer et al., 2012). Thus using this combination improved retrievals over oceans, as our results showed.
Therefore, in order to evaluate the representation of aerosol optical properties by the modelling experiments for the Russian wildfires episode, the variables of the DB algorithm from MODIS aboard the Terra platform were used. For the Saharan dust episode, the combined (DT and DB algorithms) AOD at 550 nm and the AE of the DT algorithm from MODIS aboard the Aqua platform were selected.

Evaluation of simulations
Once the most skilful satellite data were defined, these data (together with AERONET observations) were used to establish the biases and errors of the modelling experiments for the episodes covered in this work.

Russian wildfires case
The evaluation of the experiments for the Russian wildfires and heat wave episode are detailed in this subsection. The variables of the DB algorithm from MODIS aboard the Terra platform were used. In order to show more confident results, a mask was implemented for areas where the satellite observations were higher than 30 % of their maximum. Figure 2 shows the evaluation of AOD at 550 nm and Fig. 4 shows AE between 412 and 470 nm. For AERONET data, Fig. 3 depicts the results for AOD at 670 nm and Fig. 5 for AE between 440 and 870 nm. Figure 2a shows the values of AOD measurements by MODIS. The highest values around 2.6 were found over Russia and its surrounding areas due to emissions from the wildfires. The estimation of the mean bias error (MBE) is shown  Table 3) was obtained with the WRF-Chem simulations and the ensemble mean. For DE3, the underestimation was lower and did not cover such a large area as the rest of the experiments. However, a higher overestimation was found in DE3 over the rest of the domain.
WRF-Chem tended to underestimate high AOD values, so these AOD levels were lower than the MODIS levels, especially over the areas with high aerosol loads. When comparing the WRF-Chem experiments, following a modal aerosol approach (CS1, CS2, ES1) led to lower error values than a sectional approach (ES3). Conversely, COSMO-MUSCAT tended to present higher AOD values than WRF-Chem. Thus its underestimation was lower than WRF-Chem, but some-times led to a high overestimation of the satellite-observed AOD. As mentioned in the Introduction, the ensemble mean generally presented the best skills, as stated for other variables in different works (e.g. Baklanov et al., 2014;Fernández et al., 2009;Knutti et al., 2010;Kjellström et al., 2011).
Generally for the ARI and ARI+ACI simulations, slightly lower MBE values than NRF were found in all of the experiments (e.g. in the ES1 simulations, NRF is −1.70, ARI is −1.66, ARI+ACI is −1.64). However, the MBE for the ensemble did not show this improvement with values of −1.55 in the NRF scenario, −1.49 in ARI and −1.71 in ARI+ACI. Its analysis should be carefully taken into account because the ARI ensemble mean did not include the CS1 simulation and the ARI+ACI, the DE3 simulation.
The MBE indicates the bias of the model error, that means the tendency to over-or underestimate satellite observations, but the mean average error (MAE,  over Russia and the surrounding areas were similar or came close to the maximum or minimum MBE values, which indicated that the biases observed above moved in the same way, positive or negative, throughout the simulations period. As indicated by Palacios-Peña et al. (2017), the AOD discrepancies between simulations and observations can be attributed to the errors in the model estimations of the aerosol dry mass, the fraction of particles for a given mass or water associated with aerosols. According to the aerosol dry mass estimation,  found an overestimation in the simulated PM 2.5 levels for the AQMEII Phase 2 simulations which are comparable to EuMetChem simulations, and Soares et al. (2015) established a rough overestimation at about 50 % of the total biomass burning emissions used here. According to this evidence, an overestimation of the AOD levels over the fire area can be expected for this episode. However, our results showed an AOD underestimation. This behaviour could be due to the understated injection height of the total biomass burning emissions found by Soares et al. (2015). On the other hand, the heat released by the fires, which is responsible for the strong updraughts transporting the emitted tracers rapidly to the free troposphere and even the stratosphere (Freitas et al., 2007), was not taken into account in the simulations and could affect the aerosol vertical distribution. In other words, an inaccurate description in the simulation of the aerosol vertical distribution could affect the representation of the aerosol optical depth. We established this statement following Kipling et al. (2016), who found the extent to which the biomass burning emissions to the free troposphere affected the vertical profile of aerosols, causing changes in AOD and radiative forcing with the HadGEM3-UKCA model.
The estimated NI_MAE (Fig. 2c) indicates whether the inclusion of aerosol radiative feedbacks reduces the magnitude of model errors, i.e. whether the inclusion of ARI or ARI+ACI improves the skills of the models as regards the absolute error. The results indicated a generalized improvement of the MAE over the whole domain. For ARI, the ensemble mean presented an improvement above 30 %, but the simulations with the greatest NI_MAE were ES1 (> 40 %, over a large area of the domain) and CS2 (NI_MAE up to 90 %). These improvements took place especially over eastern Europe where medium to high AOD values were found. For ARI+ACI over the same areas, the ensemble mean presented an improvement of up to 45 %. However, in the southwest area of the domain, improvements reached 70 %. Over the fire-affected areas, the experiment that led to major improvement was CS1 (> 50 %).
The causes for the variation in AOD levels has to be sought in a number of physical mechanisms. Péré et al. (2014) and Baró et al. (2017a) found a reduction in the atmospheric boundary layer height and the wind speed over most of eastern Europe when the ARI were taken into account in their simulations to study the same wildfires episode. Due to these effects on meteorology, Zhang et al. (2015) revealed an effect on aerosol concentrations which could explain the observed improvement of AOD over the eastern part of the domain observed herein. Regarding the determination coefficient (Fig. S1 in the Supplement, which only show the areas where correlation results were significant at 90 %), no clear spatial pattern was associated with this parameter or with its improvement.
However, over the whole domain the ensemble mean presented higher values than each specific experiment for R 2 , and a clear improvement in the determination coefficient was observed when ARI+ACI were taken into account. The hourly AOD values recorded at 670 nm of the different simulations correlated with the AERONET values at this same wavelength. Only the correlation results that were significant at the 95 % level in a certain station are shown in Fig. 3a. The model-AERONET R values were > 0.4 in nearly 75 % of the stations, and were above 0.5 in nearly 50 % of the stations used for all the experiments.
The station where ensemble means showed the best skills was the Tõravere station with R in NRF of 0.68, ARI of 0.73 and ARI+ACI of 0.73 (Fig. 3c). At this station, located in northerly areas, AOD was higher between 25 and 30 July, and between 5 and 10 August. However, the maximum correlation values among all the experiments were found at the Eforie station (Fig. 3b) for the CS2 configuration with correlation coefficient for NRF of 0.83, ARI of 0.84 and ARI+ACI of 0.82. This can be explained by the enhanced resolution of CS2 around 9.9 km. This fine resolution may lead to improvements in the local representation of AOD. It should be highlighted that no clear improvement in the model-AERONET correlations was noted when the aerosol radiative feedbacks were taken into account.
The MODIS AE satellite values are shown in Fig. 4a. AE is a variable that provides an idea of particle size in the atmosphere. High AE values indicate fine particles and low values denote coarse particles. High values between 1.7 and 1.8 were found over Russia and the surrounding areas given the fine particles emitted by wildfires. Low AE values were found over the southeastern areas of the domain. This fact could be explained by the presence of coarse particles from the mineral dust that came from nearby desert areas.
A general MBE overview (Fig. 4b) indicated that all the experiments underestimated high AE values and overestimated low AE ones. Hence the model underestimated the variability in this variable. Over the Russian wildfires area, where the highest AE values were found, simulations underestimated (minimum MBE values for NRF in Table 3) this variable. Over the area with the lowest AE values in the southeast, the model produced an overestimation (maximum MBE values in Table 3). According to these results, and in the same way as AOD, the most skilful simulation was the ensemble mean. ES3, the WRF-Chem simulations using a sectional approach, presented higher AE values than the other experiments and thus, the lowest minimum and highest maximum MBE. No simulations for the DE3 experiments were available for AE. Figure 4c shows the MAE values and the improvements for AE. Maximum MAE values (Table 3) were found over the southeast of the domain. Only the ES3 experiment, which used the aerosol sectional approach, presented an improvement in AE representation when ARI+ACI were taken into account (Fig. 4c, first column). This improvement, with values around 20 %, was found over the areas where the MAE values were lower.
The determination coefficient values (Fig. S2b, where only the areas where correlation results were significant at 90 % are shown) of the evaluation made of AE were worse than they were for AOD. The highest values above 0.4 were observed over the Russian wildfires area. Values below 0.25 were found for the rest of the domain. Although the minimum MAE values were shown for ES3, this experiment presented a worse determination coefficient over the Russian wildfires area than the other experiments. Despite there being no clear improvement pattern for the determination coefficient, the ARI+ACI scenario showed isolated improvements with values between 50 and 100 %.
Regarding the model-AERONET comparison (Fig. 5a), the number of stations with significant results for AE data was very limited and substantially lower than for AOD and the results show lower correlation coefficient values. Between 10 and 40 % of the used stations obtained model-AERONET correlations above 0.5, depending on the model used. The largest number (> 40 %) of stations with correlation coefficients > 0.5 was found for the ensemble mean. The station where the ensemble mean had the most skill was Bucharest with a correlation coefficient for NRF of 0.72, ARI of 0.69 and ARI+ACI of 0.74 (Fig. 5b). For this variable, no station had higher correlation coefficient values for any specific experiment. As for AOD, the inclusion of aerosol radiative feedbacks did not improve the AE representation.

Saharan desert dust outbreak case
This section describes the evaluation of the representation of AOD and AE by the simulations done for the Saharan dust episode (Figs. 6 to 9). In this case, CS2 simulations were not available and so the combined (DT and DB algorithms) AOD at 550 nm (Fig. 6) and the AE of the DT algorithm (Fig. 8) from MODIS aboard the Aqua platform were used. As with the Russian wildfires episode, the mask of 30 % of the maximum of observations was implemented. The model-AERONET comparisons of AOD at 670 nm (Fig. 7) and AE between 440 and 870 nm (Fig. 9) were also shown. Figure 6 shows the AOD values measured by MODIS. Over the southeastern area of the domain, values of > 0.6 were observed due to transported dust. This value was not very high for a dust outbreak, but was caused by wet deposition (rain during the episode).
In Fig. 6b, all of the experiments underestimated high AOD values over the southeastern area of the domain. The MBE values over this area were around −0.3 for DE3, but were lower, around −0.2, for the other experiments (WRF-Chem simulations). However, small areas with a higher underestimation were found over this zone (minimum MBE values for NRF in Table 3). Over the rest of the domain, minor overestimations were modelled with MBE values around 0.1. Conversely, small sporadic areas with high overestimations were found (maximum MBE values for NRF in Table 3).
In order to consider the magnitude of these under or overestimations, the MAE is depicted in Fig. 6c (first column). Over the southeast of the domain, the MAE gave values around 0.2 for all of the experiments, except for DE3, which showed a MAE value > 0.3. The most relevant improvements to this statistic (Fig. 6c, second and third column) were observed for the ARI+ACI ensemble mean, with improvements of up to 25 %, over desert areas. Smaller areas over the Mediterranean Sea with improvements of up to 75 % were found for the ES3 simulation. These improvements could be explained by a better representation of the coarse mode in the sectional approach.
The determination coefficient values (Fig. S3a) were higher than for the Russian wildfires episode. These values came close to 1 for all the experiments over most of the domain. No improvement in the representation of R 2 when ARI+ACI were taken into account was observed. For both variables AOD and AE, the established requirement of showing only the areas where correlation results were significant at 90 % was implemented.
Both WRF-Chem and COSMO-MUSCAT tended to underestimate high AOD values. COSMO-MUSCAT was the model with the highest underestimation. Regarding the WRF-Chem simulations, ES3, which used the sectional approach, presented higher AOD values and thus lower minimum and higher maximum MBE values than the other experiments. According to , this underestimation can be explained by a systematic underestimation for all of the models of the PM 10 levels with the highest underestimation for the Mediterranean region. Moreover, this major underestimation can be attributed to natural emissions, such as the desert dust studied in this episode.
Regarding the model-AERONET comparison (Fig. 7a), the ensemble mean showed the best skill at the Toulon station with correlation values of 0.98, 0.98 and 0.96 for NRF, ARI and ARI+ACI, respectively (Fig. 7b). According to these values, it was not possible to establish an improvement when aerosol radiative feedbacks were taken into account. Besides, more than 60 % of the stations used gave correlation values above 0.5 for all of the experiments. Around 80 % of the used stations presented this behaviour for ES1. For these experiments, the highest correlation values were above 0.90 at stations like Toulon (Fig. 7b), Eforie (Fig. 7c) or Thessaloniki (not shown). Although the number of data per station was smaller for this episode than for the Russian wildfires episode, all of the shown stations complied with the established requirement of the correlation results, which were significant at the 95 % level (explained in the previous section).
For AE, the MODIS satellite values are shown in Fig. 8a. Values between 0 and 1 were found over the southeastern area of the domain due to coarse dust particles. When the bias was analysed (Fig. 8b, Table 3), ES3, which uses the aerosol sectional approach, displayed a different behaviour from the rest of the experiments. ES3 underestimated low AE levels over the area affected by the Saharan dust outbreak, with a minimum MBE for NRF of −1.43. An improvement to the AE representation was also achieved for ARI+ACI, with a minimum MBE error of −1.31 (this could also be observed for the NI_MAE, explained below). Small overestimations of the values with a maximum MBE of 0.33 were found for NRF. The other experiments exhibited similar behaviour as they did for the Russian wildfires episode. Low values over the southeast of the domain were overestimated and high values were underestimated. MAE (Fig. 8c, first column) estimates the magnitude of errors. Maximum MAE errors (Table 3) were found over the dust-outbreak-affected area. Regarding improvements to this statistical figure shown in Fig. 8c (second and third column), the ensemble mean presented improvements of up to 50 % over most of the domain when ARI were included. However, when considering ARI+ACI, ES3 had the highest improvement values with values of up to 75 % over the dust-outbreakaffected area. This high improvement could be caused because the inclusion of ARI+ACI affects relative humidity (Baró et al., 2017a) modifying the size of the particles due to the hygroscopic growth and hence may improve the representation of the particles size distribution.
As for AOD, the determination coefficient values (Fig. S3b) were higher than for the Russian wildfires episode and came close to 1 over most of the domain. All the experiments provided these R 2 values, with no significant improvement to this statistical figure when ARI and ACI were taken into account.
Finally, the ensemble and CS1 presented the best skill when they were compared with AERONET (Fig. 9a). Athens (Fig. 9b) was the station where the ensemble mean presented its best skill with correlation coefficients of 0.96, 0.91 and 0.92 for NRF, ARI and ARI+ACI, respectively; and Helgoland (Fig. 9c), done by CS1 simulations with correlation values for NRF of 0.96 and ARI+ACI of 0.95. The latter station did not present any strong dust outbreak influence. It was not possible to establish a significant improvement when aerosol radiative feedbacks were taken into account. Notwithstanding, more than 40 % of the used stations presented correlation values above 0.5 for all the experiments. For the ensemble mean, correlation values above 0.5 were found from 70 to 80 % of the used stations.

Summary and conclusions
This work attempts to identify the skill of a set of satellite observations and online coupled models to represent the aerosol optical properties during two important episodes: a biomass burning episode and a dust outbreak episode that occurred over Europe in 2010. The first task consisted of identifying the skills of different satellite observations compared with AERONET. In this sense, MODIS showed the highest correlation coefficient with the ground-based observations. The DB algorithm of MODIS provided the best skilful products for the Russian wildfires and heat wave case. This was because the algorithm was enhanced over regions of mixed vegetated and non-vegetated surfaces, like the study area. The SOAR algorithm also provided a high-quality product over oceans when it was used for the SeaWIFS sensor during the dust outbreak episode because SOAR was developed to ob-tain retrievals over oceans and inland water bodies. However, the most skilful AOD product for the dust outbreak episode was that provided by the combined DB and DT algorithm of MODIS. This combined product provided an expanded coverage over bright-reflecting land surfaces, such as desert, semiarid and urban regions, which was the case for this dust outbreak. For AE, there were not enough data to estimate how good its representation was.
Using the best available satellite information, a set of regional online chemistry-climate/meteorology models over Europe were evaluated for the two aforementioned episodes. The results for the Russian wildfires episode indicated that AOD was better represented by the models than AE. AOD was generally underestimated by all simulations. This behaviour could be explained by a misunderstanding in the aerosol vertical distribution due to the understated injection height of the total biomass burning emissions or because the heat released by the fires was not taken into account, which could affect the representation of AOD. The underestimation by WRF-Chem was higher than by COSMO-MUSCAT, and the latter simulated higher AOD values than all the WRF-Chem configurations. Among the WRF-Chem experiments, the modal aerosol approach (CS1, CS2 and ES1) presented lower error values than the sectional approach (ES3). The inclusion of ARI+ACI led to an improvement above 30 % to the MAE of the ensemble mean. The modification to these AOD levels could be explained by a reduction in the atmospheric boundary layer height and the wind speed over most of eastern Europe when the ARI were taken into account. It could affect the aerosol concentrations and modify the AOD representation. The ensemble mean was more skillful than the single simulations compared with the satellite products. However, at a local level, the fire episode simulation at a fine resolution (CS2) presented the highest model-AERONET correlations (around 0.8). No clear improvement to the model-AERONET correlations was found when ARI and ACI were included.
In the Russian wildfires case, the MBE, MAE and R 2 values were worse for AE than for AOD. All the experiments underestimated the variability in AE (i.e. low values were represented by too high simulated values and high val-ues by lower values than actually observed). According to all the calculated statistics, the most skillful experiment was the ensemble mean. When considering individual models, the MBE indicated that ES3, the only WRF-Chem simulation that used a sectional approach, presented higher AE values than the other experiments; and thus resulted in a lower underestimation. Despite these higher values, ES3 was the only experiment to present improvements (around 20 %) when ARI+ACI were taken into account. For R 2 no clear improvement was noted when aerosol radiative feedbacks were included. For AE, the highest model-AERONET correlations were found for the ensemble mean and they were lower (around 0.7) than for AOD.
The Saharan desert dust outbreak case results indicated that high AOD values, which were found over the southeast of the domain (dust-affected area), were underestimated by all the models and configurations. According to the results from , this underestimation can be explained by an underestimation in the PM 10 levels, attributed mainly to natural emissions, such as desert dust. DE3 presented the most pronounced underestimation. The ES3 experiments, which used the sectional approach, estimated higher AOD values than the other simulations. The most obvious improvement to the skill from including ARI went to the en- semble mean, with an improvement of up to 25 %. However, we can find a better improvement due to the inclusion of ARI+ACI over the Mediterranean Sea with values around 75 %, which can be explained by a better representation of the coarse mode in the sectional approach when the radiative feedbacks were taking into account. R 2 for AOD during the dust outbreak episode obtained values close to 1 over most of the domain. Although this statistical figure showed better results for this episode than for the Russian wildfires episode, no clear improvement for the aerosol radiative feedbacks was obtained. When models and AERONET were compared, the ensemble mean provided the best skill with correlation values above 0.9. More than 60 % of the used stations presented correlation values above 0.5 for all the experiments. ES1 showed skills for correlations over 0.5 in 80 % of the used stations, sometimes with correlations > 0.9.
Finally, the representation of AE in the dust outbreak case presented similar performance to that for the Russian wildfires episode. The modelling experiments underestimated the variability in this variable, with ES3 providing lower AE values than the other experiments. However, for ARI+ACI, an improvement by up to 75 % was found for the ES3 experiment. This could be due to a better representation of the aerosol size distribution as a result of the hygroscopic growth. The ensemble mean offered the best skill and showed an improvement by up to 50 % when ARI was taken into account. The AE determination coefficient was better for this episode than for the Russian wildfires episode. Locally, the model-AERONET comparison indicated that the ensemble mean and CS1 presented the best skills, with correlation values above 0.9. For the ensemble mean, correlation values above 0.5 were found for 70 to 80 % of the used stations; this number was lower (around 40 %) for the single experiments.
L. Palacios-Peña et al.: An assessment of aerosol optical properties over Europe By way of conclusion, the modelling experiments generally underestimated high AOD values. According to Palacios-Peña et al. (2017), AOD discrepancies between simulations and observations can be attributed to errors in the model estimation of the aerosol dry mass, the fraction of particles for a given mass or the water associated with aerosols. A model underestimation of the aerosol dry mass of dust can explain the AOD underestimation during the Saharan desert dust episode because of the underestimation in the PM 10 levels found by . However, this fact cannot explain the AOD underestimation obtained during the Russian wildfires episode, when a misunderstanding in the aerosol vertical distribution simulation, due to an understated injection height of the total biomass burning emission or because of the heat released by the fires was not taken into account, could affect the AOD representation. Locally, fine resolutions can help improving the representation of this variable. For AE, the aerosol representation by the modal WRF-Chem approach resulted in a substantial underestimation in this variable's variability. The sectional WRF-Chem approach provided high AE values for the case with fine fire particles and low values for the case with coarse particles (dust). Thus the bin representation for different particle sizes strongly influenced the estimation of this variable when a sectional approach was used. Hence the use of a sectional approach improved the size distribution representation in the models, but led to a worse skill to represent AOD.
Albeit sometimes an individual experiment may show better results, as we assumed in the first section, the ensemble mean generally provides the most skillful simulation. Regarding the inclusion of aerosol radiative feedbacks, the observed improvements (especially for a sectional approach) justify not only the inclusion of radiative feedbacks in online coupled models, but also the increase in the computational time to include these effects. Further studies are necessary to improve the model representation of aerosol optical properties.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Coupled chemistry-meteorology modelling: status and relevance for numerical weather prediction, air quality and climate communities (SI of EuMetChem COST ES1004) (ACP/GMD inter-journal SI)". It is not associated with a conference.
Acknowledgements. The authors acknowledge Project REPAIR-CGL2014-59677-R of the Spanish Ministry of the Economy and Competitiveness and the FEDER European program for support to conduct this research. Work was possible thanks to the fellowship 19677/EE/14 funded by Fundación Séneca-Agencia de Ciencia y Tecnología de la Región de Murcia, Programme Jiménez de la Espada de Movilidad, Cooperación e Internacionalización, within the framework of II PCTIRM 2011-2014 and the fellowship FPU14/05505 funded by the Spanish Ministry of Education, Culture and Sport. The authors thank the support from EuMetChem COST Action ES1004 and the Air Quality Modelling Evaluation International Initiative (AQMEII). We also thank the researchers and their staff for establishing and maintaining the European AERONET sites used in this research in the network AERONET, and many others who have been involved in the MODIS, OMI and SeaWIFS datasets (NASA).
Edited by: Saulo Freitas Reviewed by: two anonymous referees