Multi-model evaluation of aerosol optical properties in the AeroCom phase III Control experiment, using ground and space based columnar observations from AERONET, MODIS, AATSR and a merged satellite product as well as surface in-situ observations from GAW sites

Within the framework of the AeroCom (Aerosol Comparisons between Observations and Models) initiative, the present-day modelling of aerosol optical properties has been assessed using simulated data representative of the year 2010 from 14 global aerosol models participating in the Phase III Control experiment. The model versions are close or identical to those used for CMIP6 and AerChemMIP and thus also inform on biases in state-of-the-art Earth system models (ESMs). Modelled column optical depths (total, fine and coarse mode AOD) and Ångström Exponents (AE) were compared both with ground based observations from the Aerosol […]

https://doi.org/10.5194/acp-2019-1214. Preprint. Discussion started: 18 March 2020. © Author(s) 2020. CC BY 4.0 License.

adopted in all models. They concluded that this diversity in component contribution adds (via differences in aerosol size and absorption) to the uncertainties in the associated aerosol direct radiative effects.
This study investigates the modelled aerosol optical properties of the most recent models participating in the AeroCom 2019 control experiment (in the following denoted CTRL, https://wiki.met.no/aerocom/phase3-experiments) on a global scale. Making use of the increasing amount of data that have become available during the past decade, we are able to extend the assessment of modelled optical properties beyond what was originally presented in Kinne et al. (2006). Here, we use ground and space based observations of the columnar variables introduced above (total, fine and coarse AOD and AE) as well as, for the first time, surface in-situ measurements of scattering and absorption coefficients, primarily from surface observatories contributing to the Global Atmosphere Watch (GAW), obtained from the World Data Centre for Aerosols (GAW-WDCA) archive.
This paper is structured as follows. Section 2 introduces the observations (OBS), variables (VAR) and models (MOD) used, followed by a discussion of the analysis details for the model evaluation and a short section discussing the representativity of the results. The section ends with a brief discussion of results from a validation study assessing the performance of the satellite products used against ground based AERONET data. Section 3 starts with an overview of globally averaged emissions, burdens, lifetimes, mass extinction coefficients (MECs) and optical depths (ODs) for each model and aerosol species, followed by a discussion of the results from the AeroCom MEDIAN model and of the regional model diversity in the optical parameters considered. The section ends with a discussion of the results from the comparison of modelled optical properties with the different observational records used. These results are presented as performance charts of the normalised mean biases and correlation coefficients for each OBS / VAR / MOD combination. This is followed by a dedicated Section 4, which discusses the results for each model individually in order to identify the strengths and weaknesses of each model in comparison with the observations and the other models. The paper ends with our conclusions from this comprehensive inter-comparison study.

Observations and variables
Several ground and space based observations have been used in order to establish a comprehensive evaluation at all scales. Table 1 summarises all variables and observation networks that have been used; they are introduced in more detail below. Dust dominated regions such as Northern Africa and Southwest Asia are clearly visible in the coarse AOD and the AE, but also in the total AOD, indicating the importance of dust for the global AOD signal. The displayed satellite fields of AOD (MERGED-FMI) and AE (AATSR-SU) are particularly useful in remote regions and over the oceans, where ground based measurements are sparse, and thus add substantially to the global picture when assessing models. For example, the satellites capture the nearly constant ocean AOD background of around 0.1 (mostly arising from sea salt), which is barely sampled by the land dominated, ground based observation networks. The AE from AATSR-SU, for instance, decreases southwards in remote ocean regions, indicating coarser particle sizes, likely because these cleaner regions are more strongly dominated by sea salt. Transatlantic dust transport results in increased particle sizes west of the Sahara (e.g. Kim et al. (2014)), as captured by AATSR-SU. Finally, as can be seen in the lowermost panel of Fig. 1, the in-situ sites from GAW show the highest density in Europe, followed by North America, while other regions are poorly represented. The differences in the spatial coverage of each observational data-set are important to keep in mind when interpreting the results presented in Sect. 3 (especially Figs. 10 and 11). The following subsections briefly introduce each of the observational data-sets used.

AERONET
The Aerosol Robotic Network (AERONET, Holben et al. (1998)) is a ground-based, well established remote sensing network based on sun photometer measurements of columnar optical properties. The network comprises several hundred measurement sites around the globe (Fig. 1).

In this paper, cloud screened and quality assured daily aggregates of AERONET AOD, AOD < 1 µm, AOD > 1 µm and AE from the version 3 (Level 2) Sun and SDA products (e.g. O'Neill et al. (2003), Giles et al. (2019)) have been used. No further quality control measures were applied, given the already high quality of the Level 2 data.
For the analysis, the spectral AOD values were used to derive an AOD at 550 nm using the provided AE. Data from the short-term DRAGON campaigns (Holben et al. (2018)) were excluded in order to avoid putting too much weight on the associated regions (which have a high density of measurement sites) in the network-averaged statistical parameters used in this study. No further site selection was performed, since potential spatial representativity issues associated with some AERONET sites were found to be of minor relevance for this study (Sect. 2.4, Fig. A5). Fig. 1 shows the sites used for all variables, where colours indicate the 2010 mean values at each location. Table 1 includes relevant information about the data-set.

Surface in-situ data

Surface in-situ measurements of the aerosol light scattering and absorption coefficients were accessed through the GAW-WDCA database EBAS (http://ebas.nilu.no/). The EBAS database also includes various observations of atmospheric chemical composition and physical parameters, although those were not used here. For both scattering and absorption, only level 2 data from the EBAS database were used (i.e. quality controlled, hourly averaged and reported at STP). All data in EBAS are version controlled, and a detailed description of the quality assurance and quality control procedures for GAW aerosol in-situ […] with the monthly model data. For most of the absorption data, the measurements are performed at wavelengths other than 550 nm. These were converted to 550 nm assuming an absorption Ångström exponent (AAE) of 1 (e.g. Bond and Bergstrom (2006)). For the scattering coefficients, only measurements at RH ≤ 40% were considered. For the model evaluation, the 2010 monthly model data were converted to STP using the following formula:

$$\sigma_{\mathrm{STP}} = \sigma_{\mathrm{AMB}}\,\frac{p_{\mathrm{STP}}}{p_{\mathrm{AMB}}}\,\frac{T_{\mathrm{AMB}}}{T_{\mathrm{STP}}},$$

where $\sigma_{\mathrm{AMB}}$ and $\sigma_{\mathrm{STP}}$ are the modelled scattering or absorption coefficients at ambient conditions and at STP, $p_{\mathrm{STP}}$ and $T_{\mathrm{STP}}$ are the IUPAC standard pressure and temperature, and $p_{\mathrm{AMB}}$ and $T_{\mathrm{AMB}}$ are the air pressure and temperature at the corresponding site location. The correction was performed on a monthly basis, using the station altitude to estimate the pressure and the monthly near-surface (2 m) temperature from ERA5.
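The wavelength conversions described above (AERONET AOD to 550 nm via the AE, and absorption coefficients to 550 nm via an AAE of 1) follow the same power law, and the STP correction is a simple air-density scaling. The following Python sketch illustrates these conversions under stated assumptions; it is not the pyaerocom implementation. The IUPAC reference values (273.15 K, 100 kPa) are standard, whereas the helper that estimates station pressure from altitude, including its 8 km scale height, is a hypothetical simplification, since the exact pressure estimate is not specified here.

```python
import numpy as np

# IUPAC standard temperature and pressure (0 degC, 100 kPa).
T_STP_K = 273.15
P_STP_PA = 100_000.0


def power_law_to_wavelength(value, wl_from_nm, exponent, wl_to_nm=550.0):
    """Shift an AOD or absorption coefficient between wavelengths.

    Assumes x(wl) is proportional to wl**(-exponent), where exponent is the
    Angstrom exponent (AE) for AOD or the absorption Angstrom exponent (AAE)
    for absorption coefficients.
    """
    return value * (wl_to_nm / wl_from_nm) ** (-exponent)


def ambient_to_stp(coef_amb, p_amb_pa, t_amb_k):
    """Convert an ambient scattering/absorption coefficient to STP.

    Volume coefficients scale with air density, i.e. with p / T.
    """
    return coef_amb * (P_STP_PA / p_amb_pa) * (t_amb_k / T_STP_K)


def station_pressure_pa(altitude_m, p0_pa=101_325.0, scale_height_m=8_000.0):
    """Hypothetical helper: isothermal barometric estimate from altitude."""
    return p0_pa * np.exp(-altitude_m / scale_height_m)


# Absorption at 637 nm shifted to 550 nm with AAE = 1
abs_550 = power_law_to_wavelength(2.0, 637.0, exponent=1.0)
# AOD at 500 nm shifted to 550 nm with AE = 1.5
aod_550 = power_law_to_wavelength(0.3, 500.0, exponent=1.5)
# Model coefficient at a 2 km site with a monthly 2 m temperature of 275 K
p_site = station_pressure_pa(2000.0)
sigma_stp = ambient_to_stp(10.0, p_site, 275.0)
```

Because pressure drops with altitude, an elevated site yields an STP coefficient larger than the ambient one, which is why the correction matters mainly for mountain observatories.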

A few urban sites were removed from consideration for the model analysis, as these sites are likely not representative on

MODIS data
Daily gridded level 3 AOD data from the Moderate Resolution Imaging Spectroradiometer (MODIS) on both satellite platforms (Terra and Aqua) have been used for the evaluation of the models. The merged land and ocean global product (named Aerosol_Optical_Depth_Land_Ocean_Mean in the product files) of the recent collection 6.1 was used. This is an updated and improved version of collection 6 (e.g. Levy et al. (2013), Sayer et al. (2014)). For changes between the two data-sets, see Hubanks […]

The monthly AeroCom ensemble mean and median fields were computed at 2° × 3° resolution (Tab. 2). All model fields were re-gridded to this resolution before the ensemble mean and median were computed. Only those models that had submitted all required optical property variables used in this study (Tab. 1) were considered; those used for constructing the ensemble model are indicated in Table A1. In this paper, the output from the median model (denoted MEDIAN below) is used unless otherwise indicated.

In addition, local diversity fields δ were computed for each variable, defined as follows:
- Diversity of the ensemble mean: δ1 = std. dev./mean (less meaningful in the presence of outliers)
- Diversity of the ensemble median: δ2 = IQR/median (an outlier-resistant definition)
where IQR denotes the interquartile range (i.e. the difference between the 3rd and 1st quartiles).
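A minimal NumPy sketch of the two diversity measures is given below, assuming the re-gridded model fields are stacked along a leading model axis. The use of the population standard deviation (NumPy's default ddof=0) and of linear interpolation for the quartiles are assumptions, since these choices are not specified here; the model values are hypothetical.

```python
import numpy as np


def diversity_mean(fields, axis=0):
    """delta_1 = std. dev. / mean across models (sensitive to outliers)."""
    fields = np.asarray(fields, dtype=float)
    return np.std(fields, axis=axis) / np.mean(fields, axis=axis)


def diversity_median(fields, axis=0):
    """delta_2 = IQR / median across models (outlier resistant)."""
    fields = np.asarray(fields, dtype=float)
    q1, med, q3 = np.percentile(fields, [25, 50, 75], axis=axis)
    return (q3 - q1) / med


# Five hypothetical model AODs at one grid cell, one of them an outlier
aod = np.array([0.10, 0.11, 0.12, 0.13, 0.60])
d1 = diversity_mean(aod)    # about 0.92, dominated by the outlier
d2 = diversity_median(aod)  # about 0.17, barely affected

# For gridded fields of shape (n_models, n_lat, n_lon) the result is a map
maps = np.random.default_rng(0).lognormal(mean=-2.0, sigma=0.3, size=(14, 90, 120))
delta2_map = diversity_median(maps)
```

The single outlying model inflates δ1 by a factor of more than five relative to δ2, which illustrates why the median-based definition is preferred for the diversity maps.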

Data analysis
The analysis of the data was performed using the pyaerocom software (Appendix C). The ground and space based observations are colocated with the model simulations by matching with the closest model grid point in the originally provided model resolution. In the case of ground based observations (AERONET and GAW in-situ), the model grid point closest to each measurement site is used. For the satellite observations, both the model data and the (gridded) satellite product are re-gridded to a resolution of 5° × 5°, and the closest model grid point to each satellite pixel is used. The choice of this rather coarse resolution is a compromise, mostly serving to increase the temporal representativity (i.e. more data points per grid cell) in order to meet the time resampling constraints (defined below).
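For the ground based sites, the closest-grid-point colocation can be illustrated with the following sketch. It assumes a regular latitude/longitude grid described by 1D arrays of cell-centre coordinates and measures distance in degree space, handling longitude wrap-around. The function name and the example grid are hypothetical; the sketch does not reproduce pyaerocom's internal colocation routines.

```python
import numpy as np


def nearest_grid_index(lats, lons, site_lat, site_lon):
    """Return (i, j) of the grid-cell centre closest to a site.

    On a regular lat/lon grid the nearest centre in degree space can be
    found independently along each axis. Longitude differences are wrapped
    into [-180, 180) so that sites near the dateline are handled correctly.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    i = int(np.argmin(np.abs(lats - site_lat)))
    dlon = (lons - site_lon + 180.0) % 360.0 - 180.0
    j = int(np.argmin(np.abs(dlon)))
    return i, j


# Hypothetical 2 deg x 3 deg model grid (cell centres)
lats = np.arange(-89.0, 90.0, 2.0)
lons = np.arange(-178.5, 180.0, 3.0)

# A site at 19.5 N, 155.6 W is matched to the cell centred at (19.0, -154.5)
i, j = nearest_grid_index(lats, lons, 19.5, -155.6)
model_monthly = np.zeros((12, lats.size, lons.size))  # (month, lat, lon)
site_series = model_monthly[:, i, j]                   # 12 monthly values
```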
Since many model fields were only available at monthly resolution, the colocation of the data with the observations (and the computation of the statistical parameters used to compare the models) was performed at monthly resolution. Any model data provided at higher temporal resolution were averaged to monthly means prior to the analysis. For the observations with higher temporal resolution (Tab. 1), monthly means were computed using a hierarchical resampling scheme requiring at least 25% coverage. In practice, this means that the daily AERONET data were resampled to monthly resolution, requiring at least 7 daily values in each month. For the hourly in-situ data, daily means were computed first (requiring at least 6 valid hourly values), and monthly means were then computed from these daily means, requiring at least 7 daily values. Data that did not meet these coverage constraints were invalidated.
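The hierarchical resampling with coverage constraints could be sketched with pandas as follows. It assumes time series indexed by timestamps, with missing values stored as NaN. The helper names are hypothetical, not pyaerocom functions, and the thresholds mirror those given above (6 of 24 hourly values per day and 7 daily values per month, i.e. roughly 25% coverage).

```python
import numpy as np
import pandas as pd


def resample_with_coverage(series, freq, min_count):
    """Period means, invalidated (NaN) where fewer than min_count valid values."""
    resampler = series.resample(freq)
    means = resampler.mean()
    counts = resampler.count()  # counts non-NaN values only
    return means.where(counts >= min_count)


def hourly_to_monthly(hourly):
    """Hourly -> daily (>= 6 valid hours) -> monthly (>= 7 valid days)."""
    daily = resample_with_coverage(hourly, "D", min_count=6)
    return resample_with_coverage(daily, "MS", min_count=7)


# Synthetic hourly record: January 2010 complete, February only 5 days
idx = pd.date_range("2010-01-01", periods=(31 + 28) * 24, freq=pd.Timedelta(hours=1))
values = np.full(idx.size, np.nan)
values[: 31 * 24] = 1.0
values[31 * 24 : 36 * 24] = 2.0
monthly = hourly_to_monthly(pd.Series(values, index=idx))
# January is kept (mean 1.0); February is invalidated (only 5 valid days)
```

For daily AERONET records, only the second stage applies, i.e. `resample_with_coverage(daily, "MS", min_count=7)`.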

Throughout this paper, the discussion of the results will use two statistical parameters to assess the model performance. The normalised mean bias (NMB) is defined as

$$\mathrm{NMB} = \frac{\sum_{i=1}^{N} (m_i - o_i)}{\sum_{i=1}^{N} o_i},$$

where $m_i$ and $o_i$ denote the $N$ colocated monthly model and observed values […]

The next section presents several sensitivity studies that were performed to investigate the spatio-temporal representativity of this analysis strategy, which is based on network-averaged, monthly aggregates, since representativity (or the lack thereof) is a major source of uncertainty (e.g. Schutgens et al. (2016), Schutgens et al. (2017), Sayer and Knobelspiesse (2019)).
The focus here was to assess how such potential representation errors affect the biases and correlation coefficients used to assess model performance and to compare the models with each other.
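The normalised mean bias and the correlation coefficient can be computed from the colocated point cloud of monthly values as in the sketch below. The NMB follows the common definition Σ(m_i − o_i)/Σ o_i. Taking the correlation coefficient to be Pearson's R is an assumption, as is the treatment of missing values (a pair is dropped if either value is missing); the example values are hypothetical.

```python
import numpy as np


def _valid_pairs(model, obs):
    """Return model/obs arrays restricted to pairs where both are finite."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = np.isfinite(model) & np.isfinite(obs)
    return model[valid], obs[valid]


def normalised_mean_bias(model, obs):
    """NMB = sum(m_i - o_i) / sum(o_i) over valid colocated pairs."""
    m, o = _valid_pairs(model, obs)
    return np.sum(m - o) / np.sum(o)


def pearson_r(model, obs):
    """Pearson correlation coefficient over valid colocated pairs."""
    m, o = _valid_pairs(model, obs)
    return np.corrcoef(m, o)[0, 1]


# Hypothetical monthly AODs at four site-months; one observation missing
obs = [0.2, 0.4, np.nan, 0.3]
model = [0.15, 0.3, 0.5, 0.27]
nmb = normalised_mean_bias(model, obs)  # -0.2, i.e. -20 %
r = pearson_r(model, obs)               # about 0.94
```

Because the sums are taken over the whole point cloud, sites or months with large observed values carry more weight in the NMB than in a mean of per-site relative biases.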

As described in Sect. 2.3, monthly aggregates of the models and observations were colocated in space and time. The resulting point cloud of monthly mean values from all sampling coordinates (sites / aggregated satellite pixels) was then used to compute the biases and correlation coefficients. These are used to assess the performance of the individual models and the ensemble mean, discussed in the following sections (Figs. 10 and 11). Comparing against (often) temporally incomplete observational records, sampled at distinct locations, can introduce considerable representation errors on both spatial and temporal scales (see e.g. Schutgens et al. (2016), Schutgens et al. (2017), Wang et al. (2018), Sayer and Knobelspiesse (2019) and references therein). These errors can affect the biases established between model and observations, but also other performance measures such as correlation coefficients. We consider this to be the major source of uncertainty in this study. Therefore, several sensitivity studies were performed to investigate how potential spatio-temporal representation errors affect the global monthly statistical parameters used in this study. Temporal representation uncertainties were investigated 1. for in-situ absorption coefficients, using hourly TM5 data from the AeroCom INSITU experiment evaluated against GAW measurements (Fig. A4), and 2. for columnar AOD, using 3-hourly data from ECMWF-IFS evaluated against AERONET AODs (Fig. A3). In addition, spatial representativity errors were investigated by colocating the ensemble mean AOD field both with observations from all AERONET sites (available in 2010) and with a selection of sites that are considered representative on the spatial scales covered by a typical model grid cell. The latter were selected based on Wang et al. (2018), using only sites with an absolute spatial representation error smaller than 10%; the result of this comparison is shown in Fig. A5. The results of these three sensitivity studies are summarised in Tab. 3 and show that the overall differences are of the order of 10% and 0.2 for NMB and correlation, respectively. For the in-situ absorption inter-comparison, the results at monthly resolution show better performance than those at hourly resolution in nearly all statistical parameters (Fig. A4).
From these results, we conclude that differences in these network-averaged statistics arising from spatio-temporal representation errors are small compared to the diversity in the results found among the different models participating in this study (shown in Figs. 10 and 11).
Based on these findings, and because some model data were only available at monthly resolution, all model and observation comparisons in this study were performed at monthly resolution. We believe that this makes the inter-model results more consistent and hence more suitable for inter-comparison, since they carry similar representation errors (introduced by the incompletely sampled observational records).
The small differences in bias and correlation found in our sensitivity tests (Figs. A3, A4, A5) are important results, indicating that the magnitude of spatio-temporal representation uncertainties (in statistical parameters derived from annual averages over whole networks) is of the order of ±10%. For non-geostationary satellites, the absolute temporal representation errors are likely larger due to the low sampling coverage, combined with cloud contamination in certain regions (e.g. South Pacific). A detailed investigation of these uncertainties is beyond the scope of this work. Nonetheless, a further simple sensitivity study was performed to investigate how our choice of resolution in the satellite/model comparison (i.e. 5° × 5° resolution and monthly averages) affects the results (NMB and R), compared to an analysis performed at daily resolution using the highest available horizontal resolution for each model / satellite (see Tab. 1 for an overview of the satellites used). This was done for each model that provided daily (or higher resolution) data, for the variables AOD, AOD < 1 µm, AOD > 1 µm and AE. The results are shown in Table A2. The higher resolution data mostly result in slightly less negative biases, with differences of up to +10% in NMB (e.g. AE, SPRINTARS vs. AATSR-SU). In most cases, however, the differences are marginal and well below 5%.
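The aggregation of gridded fields onto the coarser 5° × 5° grid used for the satellite comparisons can be sketched as a block mean over non-overlapping cells that ignores missing satellite pixels. This assumes an input grid (here 1° × 1°) whose dimensions are divisible by the coarsening factor and whose cell edges align with the coarse grid; it is an illustration, not the regridding method used in pyaerocom. Returning the number of valid cells alongside the mean allows additional coverage constraints to be applied; the input field is hypothetical.

```python
import numpy as np


def coarsen_block_mean(field, factor):
    """Mean over non-overlapping factor x factor blocks, ignoring NaNs.

    Returns the block means (NaN where a block has no valid cells) and the
    number of valid cells per block.
    """
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    valid = np.isfinite(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))
    mean = np.full(total.shape, np.nan)
    np.divide(total, count, out=mean, where=count > 0)
    return mean, count


# Hypothetical 1 deg x 1 deg daily AOD field with gaps (e.g. cloud screening)
aod_1deg = np.full((180, 360), 0.15)
aod_1deg[:5, :5] = np.nan      # one 5 x 5 block entirely missing
aod_1deg[5, :5] = np.nan       # one row missing in the block below
aod_5deg, n_valid = coarsen_block_mean(aod_1deg, 5)
```

Each coarse cell pools up to 25 daily 1° pixels, which is how the coarser grid raises the number of samples available for the monthly coverage checks.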
Finally, we want to stress that the uncertainties established and discussed here should not be confused with the corresponding uncertainties over sub-domains or at specific locations and times, which can be significantly larger, as shown in the literature referred to above.

Evaluation of satellite products at AERONET stations
All satellite data-sets were evaluated against the ground based AERONET data in order to estimate the relative differences (biases, correlation coefficients) between the different data-sets when comparing with the models. The evaluation of the gridded level 3 satellite products was performed in the same manner as the evaluation of the models (see Sect. 2.3). Note that for this analysis the satellite data were used at their original 1° × 1° resolution.
The results of this analysis are shown in Figure 2 and reveal generally high correlation with AERONET measurements (R > 0.80). In terms of NMB, AATSR-SU and the MERGED-FMI product show slight underestimations (NMB ≈ −5%), while MODIS Aqua and Terra overestimate AOD by approximately +9% and +18%, respectively.
We note that this analysis is biased by the uneven distribution of AERONET sites (highest density in Europe and North America, Fig. 1) and that regions that are problematic for the satellite retrievals (e.g. Sahara) may not be well represented in this comparison.
In the case of the AATSR-SU data, the retrieval includes a conservative cloud mask utilising thermal channels in addition to optical ones, and thereby avoids retrievals near cloud edges. An evaluation of six data-sets within Aerosol CCI showed that AATSR, together with SeaWiFS, exhibited the lowest bias with respect to ocean and coastal sun photometers (Popp et al. (2016)).

In this section the results from the model evaluation are presented, starting with an overview of the annually averaged emissions, burdens, lifetimes, MECs and ODs for each aerosol species and model, followed by a discussion of the results from the ensemble model and of the regional model diversity. Finally, the results of the optical property evaluation are presented. This is followed by a discussion section for the results from the individual models.

[…] OA (78 Tg/yr, primary) and BC (10 Tg/yr). Models agree well in their BC emissions, which is expected since most models used the CMIP6 BC emission inventories (see AeroCom optics questionnaire (supplementary material)). The highest diversity is found for OA (64%), followed by sea salt (51%), DMS (42%) and dust (32%). These differences are not surprising, since the emissions of these species are typically computed online (fully or partly) in the models and hence are highly dependent on the individual parameterisations applied (see AeroCom optics questionnaire (supplementary material)).
The lifetimes shown in Fig. 4 were computed using the provided burdens (Fig. 5) and the total deposition of each species (not shown). The BC lifetime is around 5.5 days and, in contrast to the BC emissions, shows a rather high diversity of 42% between the models. The modelled NO3 lifetimes show the largest diversity, with values ranging from 2.7 days (GEOS) up to around 10 days (TM5 and EC-Earth). SO4 and OA have lifetimes of around 5 and 6 days, respectively (with diversities of around 30%). The ensemble median lifetimes of dust and sea salt are around 0.6 and 3.7 days, respectively. However, the individual models show high variability for these (globally dominant) species, with diversities of around 100% and 52% for the residence times of dust and sea salt, respectively.
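The lifetime calculation amounts to dividing the global burden by the total deposition flux. As a rough consistency check, assuming steady state so that deposition balances emission, combining the median BC burden of 0.16 Tg with BC emissions of 10 Tg/yr (both values quoted in this section) gives about 5.8 days, close to the value of around 5.5 days stated above. This is an illustrative calculation, not a reproduction of the per-model values in Fig. 4; the three-model example is hypothetical.

```python
import numpy as np

DAYS_PER_YEAR = 365.25


def lifetime_days(burden_tg, deposition_tg_per_yr):
    """Residence time in days: global burden / total deposition flux."""
    burden = np.asarray(burden_tg, dtype=float)
    deposition = np.asarray(deposition_tg_per_yr, dtype=float)
    return burden / deposition * DAYS_PER_YEAR


# BC: 0.16 Tg burden, with 10 Tg/yr emissions standing in for deposition
tau_bc = lifetime_days(0.16, 10.0)   # about 5.8 days

# Lifetime diversity across hypothetical models (std. dev. / mean)
taus = lifetime_days([0.12, 0.16, 0.25], [9.5, 10.0, 10.5])
spread = taus.std() / taus.mean()
```

With nearly identical fluxes, the spread in lifetimes is driven almost entirely by the spread in burdens, in line with the argument that harmonised BC emissions still allow a large BC lifetime diversity.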
The modelled atmospheric burdens for each species are shown in Fig. 5. They mostly reflect the corresponding diversities associated with their main sources (emissions) and sinks (deposition). Dust and sea salt burdens, for instance, show considerable variability among the models, with median values of 15 ± 8 Tg and 9 ± 3.4 Tg, respectively. The highest diversity is found for NO3 (among the 8 models accounting for it), with burdens ranging between 0.08 Tg (OsloCTM3) and 0.93 Tg (GEOS). The modelled BC burdens also exhibit a considerable spread of around 65%, with a median value of 0.16 Tg (Fig. 5).
Since the BC emissions are relatively harmonised among the models, the variability in the BC burden is likely due to (ageing / mixing induced) differences in the BC deposition efficiencies, particularly in strong source regions such as China and India.

The plotted diversity maps provide insights into the regional model spread. They may be useful, for instance, to identify regions where models tend to disagree, which ultimately may help to explain differences observed when comparing the models with observations (which may be performed in different regions due to lack of spatial coverage). The overall highest diversity, for instance, is found for the simulated surface in-situ aerosol absorption coefficients and is particularly prominent in Amazonia, a region of substantial regular biomass burning events (peaking in early September in 2010) and also of new particle formation (NPF) events from biogenic emissions. Reasons for these differences may be a combination of the different treatments of SOA formation (and of the absorptive properties of OA), or potential differences in emission altitudes (see AeroCom optics questionnaire (supplementary material)). The diversity in in-situ surface absorption is also high in the South Pacific / Antarctica and Australia, which are also affected by regular biomass burning events. Interestingly, models tend to agree in major source regions such as China and India (low diversity in surface absorption).
The dust dominated Sahara region shows considerable diversity in surface absorption but little diversity in surface scattering.
This indicates differences in the treatment of dust absorption optical properties. The increased diversity in AE in this region suggests differences in the dust size distribution, which may, to a certain degree, be linked with the increased diversity in AOD > 1 µm, reflecting the diversity between the models in dust emissions, burdens, lifetimes and MECs (Figs. 3, 4, 5 and 6). Explaining these dust-related differences in detail is beyond the scope of this work and needs further investigation.
Another notable region is the (comparatively clean) South Pacific and Antarctica, which shows a belt of high diversity in surface absorption (but not scattering) and AE, and considerable diversity in AOD > 1 µm (over land). This behaviour may arise from a combination of differences in sources, lifetimes and long-range transport of the aerosol (e.g. dust shows > 50% diversity in lifetime, see also Li et al. (2008)). It may also be due to differences in the absorption optical properties of OA (due to organic ocean emissions), combined with potential differences in sea ice retreat. Most likely, it is a combination of all these effects.

Furthermore, elevated and / or mountainous desert regions such as the southern Peruvian and northern Chilean Andes and Tibet show high diversity in AOD > 1 µm. These regions are, however, associated with generally low AODs, and thus such differences may not have a significant impact on the global radiation budget.
Unfortunately, most ground based observations (used in the following Sect. 3.3 to evaluate the individual models) provide little or no coverage in these remote regions, where the models show high diversity.
Figure 9 shows the annual mean biases obtained when evaluating the ensemble AODs against the merged satellite product, as well as the biases established against AERONET AODs and the surface in-situ scattering measurements. The legend provides the network biases and correlation coefficients for each data-set.
South-east Asia appears to be a region where modelled AOD is low (by about 40%) compared to both MERGED-FMI and AERONET. It can also be clearly seen that the underestimated scattering (by 44% over all GAW stations) is mostly representative of Europe and the US, where the site density is highest. These regions also show underestimated AERONET AODs, but only by about 14% (as can be seen in the web visualisation, see Appendix C).
Furthermore, models tend to underestimate scattering and AOD at the few available polar sites. This is also the case for surface absorption (e.g. Barrow, Alert, Tiksi and Neumeyer in Figs. A1, A2). However, models yield rather diverse results at some of these stations, showing both over- and underestimations (e.g. absorption at Barrow, scattering at Neumeyer). […] In terms of bias (NMB), the mean model agrees slightly better with the observations than the median, with improvements of up to 10% (e.g. surface scattering and AOD > 1 µm). In terms of correlation (Fig. 11), median and mean show similar results. Relative biases between the different satellite AOD products mostly resemble the biases found when evaluating the satellites against AERONET (Fig. 2). However, compared with the ground based observations, the satellites can yield significantly different results, as can be seen, for instance, in AOD > 1 µm from NorESM2 vs. AATSR-SU and AERONET, respectively. This is because the satellites generally have higher spatial coverage and are thus also sensitive to the oceans (Fig. 1). This demonstrates the usefulness of incorporating satellite data, even though these may carry larger uncertainties and representativity errors (Sect. 2.4). For instance, compared with AODs from the two MODIS instruments, the models show the largest negative biases, which mostly reflects the results from the satellite evaluation (Sect. 2.5, Fig. 2, i.e. positive biases of +9% and +18% for Aqua and Terra against AERONET).

The differences in NMB for AOD > 1 µm and AE between AERONET and AATSR-SU for the models mostly reflect the respective biases found in the satellite assessment (i.e. ca. -15% for AOD > 1 µm and +15% for AE).
The comparison with the surface in-situ data shows considerable negative biases (and the lowest correlations) of -44% and -32% for dry scattering and absorption, respectively, at the GAW site locations (Fig. 1). In the case of scattering, a small fraction (but likely not more than 20%) of these biases may be due to the fact that the models report at RH = 0%, whereas the observations are performed at RH between 0% and 40%.
Correlation coefficients (Fig. 11) are generally high for the median model (> 0.6) but can be as low as 0.12 for individual models.

[…] (2017)) calculates the following atmospheric aerosol and chemistry processes: emissions, gas-phase chemistry, new particle formation, condensation of sulphate, nitrate and organic aerosols, coagulation, cloud activation, aqueous-phase chemistry, dry and wet deposition, and aerosol-radiation and aerosol-cloud interactions. Aerosol particles from 1 nm to 10000 nm in dry diameter are represented with a two-dimensional sectional representation with 12 size bins and 8 BC mixing state bins. Meteorological nudging was applied to the temperature and wind fields in the free troposphere (< 800 hPa) using MERRA-2 data.
The sources and burden of OA exceed the model ensemble by 90% and 50%, respectively (Figs. 3, 5). This is likely because […]

In addition, vorticity, divergence and surface pressure fields were nudged to ERA-Interim using a Newtonian relaxation scheme with a time constant of 8 h and 15 min in the whole atmosphere.
TM5 uses the aerosol scheme M7 (Vignati et al. (2004)), which represents sulphate, black carbon, organic aerosols, sea salt and mineral dust with seven lognormal size distributions, or modes. Aerosol components are assumed to be internally mixed within the modes. The formation of secondary organic aerosols in the atmosphere is described following Bergmann et al. (in preparation). Ammonium nitrate and methane sulphonic acid (MSA) are described by their total mass and are assumed to be present only in the soluble accumulation mode (see van Noije et al. (2014) for more details). TM5 has an interactive tropospheric chemistry scheme […], which also describes the aqueous-phase oxidation of dissolved sulphur dioxide in clouds.
When calculating the dust source, TM5 does not include particles with dry diameter larger than 16 µm. This may explain why the mean emitted dust mass is smaller than in other models. Differences in 10 m wind speeds generally reduce the dust emissions from the main source regions in EC-Earth compared to TM5 (Fig. 3), leading to proportionally lower dust burdens.
Sea salt emissions, on the other hand, which depend on 10 m wind speeds and sea surface temperatures, are very similar in the two models. The mean OA lifetime in EC-Earth is 9% longer than in TM5, and in both models it is longer than in the other models. This may in part be due to the use of interactive chemistry in TM5 (and EC-Earth), which may lead to a depletion of oxidants over regions with high biogenic VOC emissions, thereby increasing their lifetime (Sporre et al. (in preparation)). The […] aerosol components, using volume weighting. In this way the extinction due to the presence of water is associated with the other aerosol components. This enhances the species AOD and MEC values for TM5 and EC-Earth compared with models in which the water contribution is not included, such as ECHAM-HAM and ECHAM-SALSA (Fig. 7).

[…] and AOD < 1 µm show good performance, with biases smaller than 10% and high correlation (R ≤ 0.79), with AOD < 1 µm being slightly overestimated and AOD slightly underestimated. The latter is due to a slightly underestimated AOD > 1 µm, both against AERONET and AATSR-SU, which is also reflected in the slightly positive AE bias. Comparison of the diagnosed dry scattering with surface in-situ measurements (at RH < 40%) results in biases of -15%. The corresponding comparison of dry absorption indicates a slightly better performance in TM5 (-2%) than in EC-Earth (-7%), which may be due to the dust burden in TM5 being about 35% larger than in EC-Earth (with similar MECs). The latter would also explain why AOD > 1 µm biases are less negative (by about +10%) in TM5 than in EC-Earth.

ECHAM-HAM
The global aerosol-climate model ECHAM6 […]

When comparing the total simulated (clear sky) AOD of SALSA to the observations (Fig. 10), values are biased low compared to AERONET as well as MODIS Aqua and Terra. The latter is likely due to the positive biases found for the MODIS instruments (Fig. 2), especially since a positive AOD bias is found against the other two satellites (AATSR and MERGED-FMI). This indicates that SALSA underestimates AOD over most of the land area while overestimating AOD over the oceans.
Exceptions to the underestimation are Australia and North Africa, where SALSA exhibits high values of total AOD. This is likely due to the contribution of dust to the AOD and is also reflected in the coarse mode AOD. Compared to AATSR-SU, the coarse mode AOD of SALSA is significantly overestimated, with a normalised bias of +24%, while the AERONET comparison indicates good agreement over land in AOD > 1 µm (NMB = -3%). On the other hand, over regions affected by dust, coarse mode AOD is overestimated in SALSA. For example, AERONET sites north of Africa exhibit simulated values higher than those measured. While the apparent strong overestimation against AATSR-SU may, to a certain degree, be due to low-biased AATSR-SU data (Fig. 2), these results indicate that possible overestimates in AOD > 1 µm are likely due to ocean regions. Regions with high dust loads also exhibit an overestimation of coarse mode AOD. This is in agreement with the findings of Kokkola et al. (2018), who find large positive biases in AOD > 1 µm over the oceans, in addition to dusty regions. This is expected to be due to high simulated relative humidity in ECHAM over the oceans or too high a hygroscopicity of the SS aerosol.
It is noteworthy that although coarse mode AOD is overestimated over regions where AOD is dominated by sea salt and dust, their emissions are not higher in SALSA (Fig. 3); it is likely that the simulated size distribution of SALSA is such that SS and DU particles interact with radiation efficiently.

As part of the Copernicus Atmosphere Monitoring Service (CAMS; https://atmosphere.copernicus.eu/), ECMWF runs a version of the IFS model that includes prognostic aerosol and tropospheric chemistry schemes to produce global forecasts of atmospheric composition. The underlying meteorological model is essentially identical to that used for operational medium-range weather forecasting and is documented at https://www.ecmwf.int/en/forecasts/documentation-and-support. It is, however, run at a lower resolution of 40 km to offset the cost of the extra schemes. The results presented here are from a "cycling forecast" configuration, that is, a forecast with free-running aerosols and chemical species (no assimilation of atmospheric composition), with meteorology reinitialised at 00 UTC each day from operational ECMWF analyses.
The aerosol component is described in Rémy et al. (2019) and based on the earlier work of Morcrette et al. (2009). This is an externally-mixed hybrid bin/bulk scheme, consisting of three size bins each for desert dust (up to 20µm dry radius) and sea salt (up to 20µm radius at 80% relative humidity), and bulk tracers for organic matter, black carbon and sulphate aerosol.

For organic matter and black carbon, there are separate hydrophobic and hydrophilic tracers, with a fixed ageing timescale for conversion of the former to the latter. There is also an SO2 precursor tracer driving the sulphate production via a latitude- and temperature-dependent conversion timescale. There is no separate DMS tracer and no primary sulphate aerosol emission. Instead, all sulphate and precursor emissions are treated as SO2, which results in a seemingly large contribution of SO2 in Fig. 3. The tropospheric chemistry scheme is described in Flemming et al. (2015), but in the version described here it is not directly coupled to the aerosol scheme.
Compared to the other AP3 models, the total sea salt emissions and burden are very large, as can be seen in Figures 3 and 5.
Emissions are three times larger than the ensemble mean, but due to a short lifetime (see Figure 4) the burden is only three times larger. However, the sea salt contribution to AOD remains similar to other models because the large particle sizes reduce the extinction per unit mass. These are known issues with the emission scheme in this version of the model. The model also has one of the smallest sulphate burdens, which appears to be the result of both relatively low total sulphur emissions and a short lifetime (Fig. 4). Organic aerosol emissions are higher than in most models, although the burden and lifetime are similar to other models. This is likely due to the fact that there is no secondary organic precursor scheme; secondary organic production is instead included as if it were a primary emission.

Although correlation coefficients for AOD (Figure 11) for this model exhibit relatively high values, there is a significant low bias against all the AOD data-sets (satellite and AERONET, Fig. 10). This is likely related to the relatively short lifetimes of many species compared to other models (Figure 4). There is also a low bias against both AERONET and AATSR AE, suggesting that particles are on average too large. This may well be due, at least in part, to the unusually high sea salt burden in the model noted above.

The calculated all-sky AOD is 10% lower than the globally averaged annual AOD from AERONET (correlation 0.76).
Comparison with satellite AOD suggests underestimations between 34% and 51%, and the relative differences here mostly reproduce the biases observed between the satellites (Fig. 2). These results indicate that EMEP underestimates AOD more over the oceans than over land. Evaluation results against those observations for different world regions are inconclusive in terms of model bias (inferred from web visualisation of the results, Appendix C). Furthermore, fine AOD is overestimated by 20% compared with AERONET data and slightly (by only 11%) underestimated compared to AATSR-SU, whereas coarse AOD is considerably underestimated (by 68% and 70%, respectively). Consistent with that, the AE is somewhat overestimated (by 36% and 44%), indicating a disproportion between the contributions to AOD from the fine and coarse aerosols. This suggests that either the EMEP model calculates too few coarse particles or the applied MECs are too low (which may be the case for dust, Fig. 6). One possible reason is that fine sea salt and dust particles are assumed to have diameters smaller than 2.5 µm. The extinction due to sea salt and dust aerosols with diameters between 1 and 2.5 µm therefore contributes to the (overestimated) AOD < 1 µm rather than to the (underestimated) AOD > 1 µm.
Aerosol specific ODs (Fig. 7) of NO3 and OA are somewhat larger than the corresponding ensemble median values. This is in agreement with the relatively large loads for those components (Fig. 5) and may be due to the fact that the model calculates both fine ammonium nitrate and coarse NO3 on sea salt and dust. Also, the OA burdens include both primary sources as well SO4, which is one of the largest (probably due to too effective hygroscopic growth). The latter, however, is compensated by the comparatively low SO4 burden (SOx emissions from ECLIPSE6b used by the EMEP model are smaller than from CMIP6).
The small discrepancy between total AOD and the sum of the aerosol specific AODs arises for two reasons. The modelled BC AOD is only due to anthropogenic emissions (and does not include forest fires), and DU AOD is only due to windblown dust (while some fugitive anthropogenic dust is also included in the total AOD).
The absorption coefficient is diagnosed from BC and dust mass concentrations, using mass absorption coefficients. Compared to the climatological GAW observations (at RH<40%), the 2010 dry (RH=0%) modelled absorption coefficients are biased low (by 40%) and the correlation is 0.66, which is a fair result given the crude simulation approach. The dry scattering coefficient is underestimated by 47% on average (R = 0.74).
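The diagnosis described above amounts to a linear sum over the absorbing species, sigma_abs = sum_i MAC_i * c_i. The short Python sketch below shows the arithmetic. It also illustrates a convenient unit identity: m²/g multiplied by µg/m³ gives Mm⁻¹ directly. The MAC values and concentrations are placeholders chosen for illustration and are not the values used in the EMEP model.

```python
def absorption_coefficient_mm1(concentrations_ug_m3, mac_m2_g):
    """Absorption coefficient [Mm^-1] from mass concentrations and mass absorption coefficients.

    MAC [m2/g] * c [ug/m3] = 1e-6 m^-1 = 1 Mm^-1, so no unit conversion is needed.
    """
    return sum(mac_m2_g[name] * conc for name, conc in concentrations_ug_m3.items())


# Placeholder values (hypothetical, for illustration only)
mac = {"BC": 10.0, "DU": 0.02}
conc = {"BC": 0.5, "DU": 5.0}
sigma_abs = absorption_coefficient_mm1(conc, mac)
print(f"sigma_abs = {sigma_abs:.2f} Mm^-1")
```

The MACs are fixed, so a diagnostic of this kind cannot respond to changes in mixing state or particle size.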

GEOS
GEOS is a global Earth system model, containing components for atmospheric circulation and composition, ocean circulation and biogeochemistry, land surface processes, and data assimilation (Rienecker et al. (2008)). The version of the GEOS Earth System Model (with a GOCART aerosol module) used for this study is Icarus-3_3_p2. The simulations run at a spatial resolution OA, which is closer to the ensemble mean (Fig. 3). The simulated atmospheric burdens are within 30% of the ensemble median, with the exception of dust and nitrate (Fig. 5). The higher dust burden in GEOS can be explained by its long lifetime (9.7 days, the longest among the models, Fig. 4). However, the higher nitrate burden cannot simply be explained by its lifetime (Fig. 5). According to the AeroCom Phase III nitrate experiment, the majority of nitrate formed in the atmosphere is associated with atmospheric dust and sea salt in the coarse mode (Bian et al. (2017)). A careful budget analysis for nitrate would need more information on its chemical formation and particle size distribution, which is beyond the scope of this paper.

GFDL-AM4
The Geophysical Fluid Dynamics Laboratory Atmospheric Model version 4 has a cubed-sphere topology with 96 × 96 grid boxes per cube face (approximately 100 km grid size) and 33 vertical levels. It contains an aerosol bulk model that generates mass concentrations of aerosol fields (sulphate, carbonaceous aerosols, sea salt and dust) from emissions. It also includes a "light" chemistry mechanism designed to support the aerosol model but with prescribed ozone and radicals (Zhao et al., 2018a). The model is driven by time-varying boundary conditions and by natural and anthropogenic forcings developed in support of CMIP6 (Eyring et al., 2016). The exception is ship emissions of SO2, which were unintentionally omitted (BC ship emissions are included).
Dust is emitted from constant sources with their erodibility expressed as a function of surrounding topography (Ginoux et al., 2001). The sea salt emissions are based on Mårtensson et al. (2003) and Monahan et al. (1986) for fine and coarse mode particles, respectively. Aerosols are externally mixed except for black carbon, which is internally mixed with sulphate. The optical properties of the mixture are calculated by volume weighting of their refractive indices using a Mie code. In the present configuration, the model is run with observed sea surface temperatures (SSTs) and sea-ice distribution (Taylor et al., 2000). In addition, the wind components are nudged, with a 6-hour relaxation time, towards the NCEP-NCAR reanalysis (Kalnay et al., 1996). This reanalysis is provided on a T62 Gaussian grid with 192 equally spaced longitudes and 94 unequally spaced latitudes. This resolution is lower than that of GFDL-AM4, which may create a low bias in aerosol emissions that depend on surface winds.
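Volume weighting of refractive indices is a simple volume-average mixing rule: m_mix = sum_i v_i m_i / sum_i v_i, with complex indices m_i. The Python sketch below applies this rule to a BC-sulphate internal mixture. The refractive indices and volume fractions are illustrative assumptions, not the GFDL-AM4 optical tables. The resulting m_mix would then be passed to a Mie code, which is not reproduced here.

```python
def volume_weighted_refractive_index(volumes, indices):
    """Volume-average mixing rule for complex refractive indices."""
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum(v * m for v, m in zip(volumes, indices)) / total


# Illustrative values at a visible wavelength (assumed, not model-specific)
m_sulphate = complex(1.43, 0.0)
m_bc = complex(1.75, 0.44)
m_mix = volume_weighted_refractive_index([0.95, 0.05], [m_sulphate, m_bc])
print(f"m_mix = {m_mix.real:.3f} + {m_mix.imag:.3f}i")
```

A BC volume fraction of only a few percent already gives the mixture a clearly non-zero imaginary part. Under this rule, the whole internally mixed particle absorbs light.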

In Fig. 3, aerosol emissions from GFDL-AM4 are within 25% of the ensemble mean. The exceptions are SO2 and SO4, which are the lowest among all models, essentially because ship emissions are missing in the simulations. The lower emissions of sulphur compounds do not translate into a low atmospheric burden (Fig. 5), as their lifetime is among the longest of all models (Fig. 4), because of either weak oxidation or weak deposition. On the other hand, the other aerosols have a shorter lifetime than in other models (Fig. 4), while their burdens are well within 25% of the AP3 mean values (Fig. 5). The opposite bias between sulphur compounds and the other aerosols suggests an issue with the oxidation of SO2 rather than with wet or dry deposition. In Figure 6, the MEC values are within the diversity of the AP3 models, except for sea salt, which is lower by a third. This may be because of the cap on hygroscopic growth at 97% relative humidity or because of the emission parameterisation, as the scheme of Mårtensson et al. (2003) generates far fewer sub-micron sea salt particles than Monahan et al. (1986). An alternative explanation is that the dry deposition velocity is too high. The GFDL-AM4 AODs from individual species (Fig. 7) are within the AP3 model diversity except for BC, which has the highest value, most likely due to the treatment of its internal mixing with sulphate. This high bias translates into a high bias of fine mode AOD. Figure 10 shows this: the positive biases of fine mode AOD compared to AERONET and AATSR-SU are the largest among all models. Other normalized biases are relatively weak compared to other models (Figure 10). The AOD bias is slightly negative against AERONET and the different satellites. The differences in these biases mostly reflect the biases found for the different satellites at AERONET stations (Fig. 2).
However, it is important to note that this model version reported all-sky AOD, while most other models report clear-sky AOD. Reporting clear-sky AOD would likely shift the biases towards increased underestimation of AOD (e.g. Sect. 4.11; see also the AeroCom optics questionnaire (supplementary material)). Overall, optical properties are well correlated with observations, with coefficients greater than 0.74. The exceptions are the scattering and absorption coefficients from the surface in-situ data, with values of 0.49 and 0.57, respectively (Fig. 11).

GISS-OMA
GISS-OMA is the short name of the GISS ModelE Earth system model (Kelley et al. (in preparation)), coupled with the One-Moment Aerosol scheme (OMA; Bauer and Tsigaridis (submitted)). In OMA, all aerosols are externally mixed and tracked by their total mass only, except for sea salt and dust, for which 2 and 5 size-resolved sections are used, respectively. OMA tracks sulphate, nitrate, ammonium, carbonaceous aerosols (black and organic carbon), dust (up to 16 µm) and sea salt (up to 4 µm).
Relevant to this work, a random-maximum cloud overlap is calculated in the column. Using pseudo-random number generation, this overlap is then used to define a totally cloudy or totally cloud-free state in radiation, as described in Hansen et al. (1983). For all-sky AOD calculations, 100% relative humidity is used, while for clear-sky we use ambient relative humidity. This applies to the whole atmospheric column, as dictated by the random-maximum cloud overlap calculation. In GISS-OMA there is no direct calculation of AE. Instead, we calculate it from the AOD calculations in radiation, which probably underestimate AOD at 870 nm by about 10%.
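The Ångström Exponent between two wavelengths follows from the ratio of the AODs: AE = -ln(AOD_1/AOD_2) / ln(λ1/λ2). The Python sketch below assumes the commonly used 440/870 nm wavelength pair (this pair is an assumption, not stated above) and uses hypothetical AOD values. It shows how a 10% underestimate of AOD at 870 nm alone would propagate into the derived AE.

```python
import math


def angstrom_exponent(aod_1, aod_2, wl_1_nm, wl_2_nm):
    """AE from AODs at two wavelengths: -ln(aod_1/aod_2) / ln(wl_1/wl_2)."""
    return -math.log(aod_1 / aod_2) / math.log(wl_1_nm / wl_2_nm)


# Hypothetical AODs at 440 and 870 nm
aod_440, aod_870 = 0.30, 0.15
ae_true = angstrom_exponent(aod_440, aod_870, 440.0, 870.0)
# Same case, but with AOD at 870 nm underestimated by 10%
ae_biased = angstrom_exponent(aod_440, 0.9 * aod_870, 440.0, 870.0)
print(f"AE (unbiased) = {ae_true:.2f}, AE (870 nm low by 10%) = {ae_biased:.2f}")
print(f"AE shift = {ae_biased - ae_true:+.2f}")  # about +0.15, independent of the AOD values
```

The shift equals ln(1/0.9)/ln(870/440), so a low bias at the longer wavelength alone pushes AE upwards. If the AOD at the shorter wavelength is also affected, the net effect depends on both biases.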
The results from the evaluation of optical properties in Figs. 10 and 11 show comparatively good agreement with the observations in terms of bias and correlation. The simulated CS AOD shows a bias of -26% against AERONET, which is slightly lower than the ensemble median. In comparison with the satellites, biases of -14% and -19% are found against the MERGED-FMI data-set and AATSR-SU, respectively. As for the other models, and as explained above, the comparison with MODIS AODs indicates larger negative biases (and slightly decreased correlation), as these satellites show the overall highest AODs (Fig. 2). Considering these relative biases established for the satellites at AERONET sites, AE, AOD > 1 µm and AOD < 1 µm show similar results when compared with AERONET and AATSR-SU, with biases of the order of -20 to -40% for all three variables.

A possible explanation for these underestimated AODs could be that the burdens of SO4 and sea salt are comparatively low (Fig. 5). This is also reflected by the fact that both AOD < 1 µm and AOD > 1 µm appear to be underestimated, both against AERONET and AATSR-SU. In the case of sea salt, however, the comparatively low burden is likely due to low emissions (Fig. 3). It may, to a certain degree, be compensated by a relatively high SS MEC (+44% compared to the median, Fig. 6). A comparatively low burden for nitrate (-33%) is compensated by the largest MEC (ca. +166%). The increased dust emissions, together with an increased lifetime, yield a comparatively high burden (Fig. 5). The fact that the corresponding DU OD is nevertheless close to the median (Fig. 7) may be due to the low dust MEC (Fig. 6). In the case of BC, the low burden (likely arising from a short lifetime) is compensated by the highest BC MEC among the models.
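The compensation arguments above rest on the relation between species AOD, burden and MEC. To first order, a global-mean species AOD is the product of its MEC and its column burden, that is, the global burden divided by Earth's surface area. The Python sketch below uses this relation with hypothetical numbers. It ignores the spatial covariance between MEC and column load, so it only approximates the area-weighted AOD that a model actually diagnoses.

```python
EARTH_SURFACE_AREA_M2 = 5.1e14  # approximate surface area of the Earth


def global_mean_aod(burden_tg, mec_m2_per_g):
    """First-order global-mean AOD from a global burden [Tg] and an MEC [m2/g]."""
    column_load_g_m2 = burden_tg * 1e12 / EARTH_SURFACE_AREA_M2
    return mec_m2_per_g * column_load_g_m2


def relative_aod_change(burden_change, mec_change):
    """Relative AOD change implied by fractional deviations in burden and MEC."""
    return (1.0 + burden_change) * (1.0 + mec_change) - 1.0


# Hypothetical species: 10 Tg burden and an MEC of 3 m2/g
print(f"AOD ~ {global_mean_aod(10.0, 3.0):.4f}")
# Hypothetical case: burden 30% below a reference model, MEC 40% above it
print(f"AOD change ~ {100 * relative_aod_change(-0.30, 0.40):+.0f}%")
```

Deviations in burden and MEC combine multiplicatively. A low burden can therefore be largely offset by a high MEC, which is the kind of compensation described for BC and nitrate.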
Compared to the in-situ measurements, GISS-OMA shows good agreement (NMB=1%) and comparatively high correlation with surface scattering, and fairly good performance also for the surface absorption coefficients (NMB=-24%), with compara-

INCA
The INCA (INteraction with Chemistry and Aerosols) module and the ORCHIDEE land surface module have been coupled to the LMDZ dynamical core to form the LMDZORINCA model. It has been run with forced sea-surface temperatures and sea-ice concentrations, and with nudged monthly wind fields from ERA-Interim. The comparisons with the climatological simulations Our values of MEC are close to the ensemble mean. For those species modelled by a single mode (like dust), we expect less spatial variation of MEC compared to other models with several modes. Regarding optical properties, the AE is strongly underestimated both against AERONET and AATSR-SU (ca. -65%). This is due to a weaker wavelength dependence of the simulated extinction in the visible compared to other models. The total AOD indicates a slight overestimation compared to the multi-model central values. This is likely due to overestimated SO4 and dust contributions to optical depth, which may be partially compensated by the expected lower values of OA optical depths (Fig. 7).

NorESM2
The atmosphere module in NorESM2 (NorESM2-MM, see Seland et al. (in preparation)) is an updated version of CAM5.3-Oslo, for which optical properties have been described and validated by Kirkevåg et al. (2018). Seen in conjunction with these studies, the results presented here can be interpreted as follows. The dust burden is the lowest (5.7 Tg) among the AP3 models. It is also low compared to the burden in the un-nudged NorESM2-LM simulation (9.9 Tg) and in CAM5.3-Oslo with fSST and nudged meteorology for year 2000 (16.3 Tg). The lifetime of dust is 1.9 days and is about the same in all these simulations; it is also the lowest among the AP3 models. The large drop in burden from CAM5.3-Oslo to the un-nudged NorESM2 is to a large degree a result of tuned dust emissions. The change between the un-nudged (1870 Tg/yr) and the nudged (1090 Tg/yr) NorESM2 simulations with fSST is consistent with the considerably lower U10 (especially over land), and hence lower dust emissions, in nudged vs. free meteorology. NorESM2 sea salt emissions are among the lowest for AP3, but the burden is mid-range. Combined with the highest MEC (4.1 m²/g), this gives the model the highest sea salt AOD values, which is reflected in the positive coarse mode bias against AATSR satellite observations (Fig. 10).
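At steady state, the global lifetime is the burden divided by the annual emission, which then equals the removal rate. The Python check below applies this standard relation to the NorESM2 numbers quoted above. A 5.7 Tg burden with 1090 Tg/yr of dust emission corresponds to a lifetime of about 1.9 days. Combining the un-nudged emission (1870 Tg/yr) with the same lifetime gives a burden close to the 9.9 Tg quoted for the un-nudged simulation. The helper names are hypothetical.

```python
DAYS_PER_YEAR = 365.0


def lifetime_days(burden_tg, emission_tg_per_yr):
    """Steady-state lifetime [days] = burden / emission rate."""
    return burden_tg / emission_tg_per_yr * DAYS_PER_YEAR


def burden_tg(emission_tg_per_yr, lifetime_d):
    """Steady-state burden [Tg] implied by an emission rate and a lifetime."""
    return emission_tg_per_yr * lifetime_d / DAYS_PER_YEAR


# NorESM2 nudged dust numbers from the text
print(f"dust lifetime ~ {lifetime_days(5.7, 1090.0):.1f} days")
# With the same lifetime, the un-nudged emission implies a larger burden
print(f"implied burden ~ {burden_tg(1870.0, 1.9):.1f} Tg")
```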
The relatively high MEC is likely due to SS particle sizes that are shifted towards the more optically efficient accumulation mode, compared to other AP3 models. Sea salt MEC was even higher in CAM5.3-Oslo (5.0 m²/g), but a change in the RH assumed for hygroscopic growth (from all-sky to clear-sky) brought about a ca. 19% reduction. The excessive sea salt AOD is implemented for OA (Lund et al. (2018)). OsloCTM3 has a BC MAC value of 12 m²/g, and its BC MEC is among the highest of all models (Fig. 6). The implementation of stronger absorption contributes to the high positive bias (+73%) in surface absorption compared to the surface in-situ observations. This contrasts with the other models, which tend to underestimate surface absorption at the in-situ locations (Fig. 10). The burden of nitrate is low, and that of sulphate high, compared to the other models, whereas all other aerosol species in OsloCTM3 are close to model mean values. An evaluation of the burdens and AOD simulated by OsloCTM3 for year 2010 CEDS emissions against in-situ and remote sensing observations is provided by Lund et al. (2018). The optical properties for aerosols emitted from biomass burning assume internally mixed aerosol. Thus, the reported AOD from BC and OA includes only fossil fuel and biofuel emissions (Fig. 7). This results in lower AOD from OA for OsloCTM3 compared to the other models. The combined BC+OA contribution to AOD amounts to 0.0086. Only all-sky (AS) AOD is provided by OsloCTM3 (see Tab. A1 for the models that provided CS). This is done because no reliable sub-grid-scale parameterisation of RH is available, and it avoids a low or high bias relative to the AOD used in the radiative transfer calculations. Compared with the observations, AOD is slightly underestimated at AERONET sites (-6%), and the satellite comparisons suggest slightly larger underestimations. The low bias (ca. -20%) for AE is consistent between ground and satellite retrievals and is also reflected in the low bias for coarse and the high bias for fine AOD (Fig. 10). In contrast to surface absorption, the surface scattering is biased low compared to observations, which would result in a strong low bias in single scattering albedo. Correlation with the observations is generally among the highest of all models (Fig. 11).

SPRINTARS
SPRINTARS (Takemura et al. (2005, 2009)), coupled with an atmosphere-ocean general circulation model (MIROC, Tatebe et al. (2019)), is used in this study. A version coupled with a global cloud resolving model, NICAM, also exists (e.g., Sato et al. (2016)). The calculated dust and sea salt emissions with the wind field nudged to meteorological reanalysis data are smaller than those without nudging. This is because the emission amounts strongly depend on the near-surface wind speed (see also Sect. 4.11): they are proportional to its 3rd and 3.41th powers for dust and sea salt, respectively. The 6-hourly reanalysis data cannot represent wind gusts. The difference between the cases with and without nudging is larger at finer horizontal resolution, and SPRINTARS has one of the finest resolutions among the participating models in this study. SPRINTARS estimates the global annual total emissions of dust and sea salt to be 1390 Tg/yr and 3390 Tg/yr, respectively (Fig. 3), at a horizontal resolution of T85 (approx. 1.4° × 1.4°). The lifetimes of both sea salt and dust are short compared to the other models (Fig. 4). In the case of dust, this may be attributed to strong wet deposition over the outflow regions. This, combined with the low emissions, explains the low burdens of these natural species (Fig. 5), which is consistent with the strong underestimation of AOD > 1 µm (Fig. 10). On the other hand, the AE calculated by SPRINTARS is underestimated, which would rather suggest an overestimation of particle size. However, for this model, this could be attributed to an inappropriate computation of the standard deviations of the log-normal size distributions of SO4 and OA when calculating optical properties (based on Mie theory).
An internal investigation has confirmed that the diagnosed AE is around 1.5 over the industrialized and biomass burning regions when appropriate standard deviations of the size distributions are used. This AE is calculated from the prognostic mass mixing ratio of each aerosol component. This revision (which is not shown in this article) results in a better AOD performance, with a global annual mean of 0.112, as opposed to 0.072 found in this study (Fig. 7).
Overall, the underestimated dust and sea salt sources result in an underestimation and low correlation in all optical properties investigated in this study (Figs. 10 and 11). Consistently, the largest negative biases are found in the evaluation of the coarse AOD, both for AERONET and AATSR (Fig. 10).
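The sensitivity of the SPRINTARS emissions to near-surface wind can be illustrated with the stated power laws (exponent 3 for dust, 3.41 for sea salt). The Python sketch below is a simplification that ignores threshold wind speeds and all other scheme details; the wind values are hypothetical. It shows two things. First, it gives the emission reduction implied by uniformly 10% weaker winds. Second, it shows why smoothing out gusts lowers emissions: for a convex power law, the emission computed from an averaged wind is smaller than the average of the emissions (Jensen's inequality).

```python
def relative_emission(wind_scale, exponent):
    """Emission ratio for winds scaled by wind_scale under E ~ u**exponent (thresholds ignored)."""
    return wind_scale ** exponent


def mean(values):
    return sum(values) / len(values)


# (a) Uniformly 10% weaker near-surface winds
dust_ratio = relative_emission(0.9, 3.0)
seasalt_ratio = relative_emission(0.9, 3.41)
print(f"dust: {100 * (dust_ratio - 1):+.0f}%, sea salt: {100 * (seasalt_ratio - 1):+.0f}%")

# (b) Hypothetical winds [m/s] within one 6-hour window, including a gust
winds = [4.0, 5.0, 14.0, 5.0]
emission_from_mean_wind = mean(winds) ** 3.0
mean_of_emissions = mean([u ** 3.0 for u in winds])
print(f"u_mean**3 = {emission_from_mean_wind:.0f}, mean(u**3) = {mean_of_emissions:.0f}")
```

A uniform 10% wind reduction lowers these idealised emissions by roughly 27% (dust) and 30% (sea salt). The gap between the two quantities in (b) grows as the wind field becomes more variable. This is consistent with larger nudging effects at finer resolution, where more of that variability is otherwise resolved.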
In this study, a comprehensive inter-comparison of 14 models from the Phase III AeroCom Control experiment has been performed. The focus was on the assessment of the modelled column-integrated aerosol optical properties AOD, AOD < 1 µm, AOD > 1 µm and AE, as well as, for the first time, surface (dry) scattering and absorption coefficients. The columnar data were compared to ground based observations from AERONET as well as to several space based observations. In addition to the model evaluation, the performance of the satellite products was investigated by comparison with AERONET observations, at the resolution at which the products were aggregated and used in this study. This was done in order to establish potential relative biases when evaluating the models using satellite observations (Fig. 2). From this analysis, AATSR-SU and MERGED-FMI showed slight underestimations of AOD (ca. -5%) at AERONET sites, while MODIS Aqua and Terra showed overestimations of about 10% and 20%, respectively. AE from AATSR-SU was found to be biased high by about 15% against AERONET, while AOD > 1 µm was found to be underestimated by about 15%. AOD < 1 µm from AATSR-SU showed good agreement with AERONET. All satellite products showed high correlation against AERONET.
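The relative biases of the satellite products at AERONET sites help interpret model biases against a satellite. If the NMB is computed as sum(m)/sum(o) - 1 over the same set of co-located points, the ratios chain exactly: (1 + NMB_model,sat) × (1 + NMB_sat,AERONET) = 1 + NMB_model,AERONET. In practice, the co-located samples differ between the comparisons, so the identity holds only approximately. The Python sketch below illustrates this with hypothetical AOD values and hypothetical helper names.

```python
def nmb(model, obs):
    """Normalized mean bias: sum(model - obs) / sum(obs)."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def implied_nmb_vs_reference(nmb_model_vs_sat, nmb_sat_vs_ref):
    """Combine a model-vs-satellite NMB with a satellite-vs-reference NMB."""
    return (1.0 + nmb_model_vs_sat) * (1.0 + nmb_sat_vs_ref) - 1.0


# Hypothetical co-located AODs at the same three sites
aeronet = [0.10, 0.20, 0.40]
satellite = [0.13, 0.23, 0.48]  # biased high against AERONET (+20%)
model = [0.09, 0.15, 0.35]

chained = implied_nmb_vs_reference(nmb(model, satellite), nmb(satellite, aeronet))
direct = nmb(model, aeronet)
print(f"chained: {100 * chained:+.1f}%, direct: {100 * direct:+.1f}%")
```

For example, a model that is 30% low against a satellite reading 20% high would be about 16% low against AERONET (0.7 × 1.2 - 1).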
The results of the model evaluation against all ground based observations are summarised in Fig. 12. It shows the results of the AeroCom MEDIAN and MEAN (triangles) and corresponding uncertainties estimated from the results of the individual models (plotted as circles). The AE is underestimated by about 9% and shows considerable spread between the models. This suggests that, on average, the simulated particle size is overestimated. This may imply too short an aerosol lifetime or too large a fraction of coarse particles in the models. It may also impact the atmospheric radiation budget due to shifts in the wavelength dependency of aerosol scattering. While the underestimated AE suggests too coarse particles in the models, the analysis of AOD > 1 µm reveals an underestimation of 40%, with a considerable inter-model spread. The average AOD bias amounts to -20% and shows the highest consistency (lowest spread) between the models. The AOD bias primarily appears to arise from the low AOD > 1 µm. In comparison, AOD < 1 µm shows a smaller bias (-10%, i.e. the smallest underestimation) against the respective observations, with a spread similar to that of AOD. Compared to Kinne et al. (2006), our AOD bias indicates a slightly larger underestimation in the more recent model versions (AP3 relative to earlier AeroCom phases). This may partly be attributed to the fact that in this study, 10 out of 14 models reported clear-sky (CS) AODs (see Tab. A1 and the AeroCom optics questionnaire (supplementary material)). In contrast, the AOD diagnostics used by Kinne et al. (2006) were likely based on more models reporting AOD under all-sky (AS) assumptions. This hypothesis is supported by a +20% increase in NMB in NorESM2 when using AS instead of CS (results available here, leftmost simulation: https://aerocom-evaluation.met.no/overall.php?project=aerocom&exp=hygro#).

The recent findings from the trends analysis by Mortier et al. (2020, submitted) be somewhere between 0% and 40%. Thus, on average, the measurements should show larger scattering due to hygroscopic growth. However, the models overestimate the scattering enhancement factor due to hygroscopic growth, as found by Burgos et al. (2020, submitted) (Fig. 5 therein). From a qualitative perspective, a potential overestimation of the scattering enhancement factor in the models agrees well with our finding that models underestimate (ambient) AOD less than dry scattering (by about a factor of 2).
Altogether, it is noteworthy that most models consistently underestimate several of the different extensive aerosol optical properties (AOD, fine and coarse mode AOD, scattering and absorption coefficients), derived from both in-situ and remote sensing observations. This suggests that aerosol loads might be underestimated in the models for the year 2010. Such underestimates are partly compensated by different aerosol optical models and, for instance, higher mass extinction coefficients.
In future studies, the biases found in this study should be investigated further, for instance by incorporating different aspects into the analysis. These include model resolution (particularly vertical), profile extinction data (to investigate "where" the mass is located) and column water content (to assess hygroscopic growth). In addition, a comparison with surface mass concentration measurements could help clarify whether the models are missing mass or whether assumptions about optical properties are causing the underestimated scattering coefficients and optical depth. Such an analysis would also benefit from better global coverage of surface measurement sites. The analysis performed in this study is mostly representative of Europe and the US, where the density of GAW sites is highest (Fig. 1).
Code and data availability. Most of the data analysis was performed using the open source software pyaerocom (version 0.9.0, release upcoming). All additional analysis scripts are stored in a private GitHub repository and can be provided upon request. All data used in this study are stored on servers of the Norwegian Meteorological Institute and can be provided upon request.

A3, A4, the former being an analysis of monthly vs. 3-hourly AOD data vs. AERONET and the latter being an analysis of hourly vs. monthly data using surface in-situ absorption data. Both tests do not indicate that the magnitude of these uncertainties

Table A2. Comparison of statistics (NMB and R) retrieved when co-locating models with satellite data a) at monthly resolution and 5° × 5° horizontally, with a requirement of at least 7 daily values to compute a monthly mean, as done in this study (Low), and b) at daily resolution and at the highest available horizontal resolution of both data-sets (High).