Role of climate model dynamics in estimated climate responses to anthropogenic aerosols

Significant discrepancies remain in estimates of climate impacts of anthropogenic aerosols between different general circulation models (GCMs). Here, we demonstrate that eliminating differences in model aerosol or radiative forcing fields results in close agreement in simulated globally averaged temperature and precipitation responses in the studied GCMs. However, it does not erase the differences in regional responses. We carry out experiments of equilibrium climate response to modern day anthropogenic aerosols using an identical representation of anthropogenic aerosol optical properties and aerosol-cloud 5 interactions, MACv2-SP, in two independent climate models (NorESM and ECHAM6). We carry out experiments of equilibrium climate response to modern day anthropogenic aerosols using an identical representation of anthropogenic aerosol optical properties and the first indirect effect of aerosols, MACv2-SP, in two independent climate models (NorESM and ECHAM6). We find consistent global average temperature responses of 0.48(±0.02) K and 0.50(±0.03) K and precipitation responses of 1.69 (±0.04)% and 1.79(±0.05)% in NorESM1 and ECHAM6, respectively, compared to modern-day equilibrium climate with10 out anthropogenic aerosols. However, significant differences remain between the two GCMs regional temperature responses around the Arctic circle and the equator and precipitation responses in the tropics. The scatter in the simulated globally averaged responses is small in magnitude when compared against literature data from modern GCMs using model intrinsic aerosols but same aerosol emissions −(0.5–1.1) K and (1.5–3.1)% for temperature and precipitation, respectively). The Pearson correlation of regional temperature (precipitation) response in these literature model experiments with intrinsic aerosols is 0.79 15 (0.34). The corresponding correlation coefficients for NorESM1 and ECHAM6 runs with identical aerosols are 0.78 (0.41). The lack of improvement in correlation coefficients between models with identical aerosols and models with intrinsic aerosols implies that the spatial distribution of regional climate responses is not improved via homogenizing the aerosol descriptions in the models. Rather, differences in the atmospheric dynamic and high-latitude cloud and snow/sea ice cover responses dominate the differences in regional climate responses. 20 Hence, further improvements in the model aerosol descriptions can be expected to have a limited value in improving our understanding of regional aerosol climate impacts, unless the dynamical cores of the climate models are improved as well. Hence, even if we would have perfect aerosol descriptions inside the global climate models, uncertainty arising from the differences in circulation responses between the models would likely still result in a significant uncertainty in regional climate responses 25


Introduction
Making reliable predictions on future changes in regional climates is crucial for estimating how climate change will impact people and societies (Hawkins et al., 2016), but there are still large uncertainties related to climate change predictions on regional scales (Giorgi and Francisco, 2000;Feser et al., 2011). Anthropogenic aerosol particles can be an important driver for regional climate change due to the near-instantaneous response of local aerosol concentrations to changes in emissions, 5 their direct radiative properties, and their ability to modify cloud microphysical processes. However, reliable implementation of aerosol effects into global climate models has been challenging. Several aerosol processes are still not well-understood , and there exists an enormous scale difference between the microphysical processes and the resolution of global scale models (Carslaw et al., 2013).
Varying descriptions of aerosols and aerosol-cloud interactions cause a wide spread in aerosol radiative forcing and climate 10 impacts between different GCMs (Wilcox et al., 2015). Shindell et al. (2015) compared historical CMIP5 runs with and without anthropogenic forcing from aerosols, ozone, and land use. The forcing showed a very large spatial variation with globally averaged values that ranged between 0.15 Wm −2 and -1.44 Wm −2 (the aerosol contribution being between −0.29 Wm −2 and −1.44 Wm −2 ). The combined changes in aerosol, ozone and land use produced globally averaged transient temperature responses between 0.00 K and −1.33 K over the twentieth century, with the spatial pattern of the temperature response varying 15 significantly between the models. Overall, the inclusion of aerosols in CMIP5 models nevertheless improved the historical temperature trends compared to observations. This applied particularly to models including sophisticated parameterizations for aerosol cloud droplet activation (Ekman, 2014).
Besides reducing the global temperature, anthropogenic aerosols are also known to reduce global precipitation (Ramanathan, 2005) and to significantly modify the Asian monsoon (Bollasina et al., 2011;Salzmann et al., 2014) . Wang (2015) demon- 20 strated that among CMIP5 models the changes in anthropogenic aerosols dominated the total precipitation changes from the pre-industrial era to the present day. Most of this change was caused by the remote impact of aerosols rather than by direct effects on local cloud processes, and cloud optical depth in all but heavily aerosol-loaded regions, such as in the Indian monsoon region. Also for precipitation changes, an improved representation of aerosol-cloud interactions was found to be the key factor in reproducing consistent distributions of past precipitation change. 25 Improvements in model aerosol descriptions have not succeeded to remove the large uncertainty in aerosol climate effects.
After CMIP5, the most representative multi-model results on aerosol climate impacts have been provided by Samset et al. (2018). They compared the equilibrium climate responses for complete removals of model intrinsic anthropogenic aerosols among four state-of-art fully coupled climate models, with aerosol emissions from CMIP5 (Lamarque et al., 2010). In their study, removing the aerosols produced global-mean temperature increases between 0.5 and 1.1 K and precipitation increases 30 between 1.5% and 3.1%. In another recent study, Kasoar et al. (2016) reduced anthropogenic SO 2 emissions from China in three independent climate models. There, identical emission reductions lead to simulated changes in aerosol optical depth and shortwave radiative flux over China that varied by up to a factor of 6 between the models. The three models also exhibited large differences in their global and regional temperature responses. However, it is unclear to which degree the existing spread in aerosol climate impacts among current climate models results from differences in modeled aerosols or from differences in model dynamical responses to aerosols. Only standardized aerosol perturbations across different models can entangle these sources of uncertainties in aerosol climate effects .
Here, we explore how robust the aerosol climate response would be in modern GCMs if the anthropogenic aerosols and their cloud interactions could be modeled exactly. To assess this question we carry out long equilibrium climate simulations 5 with fixed greenhouse gas concentrations and prescribed aerosol fields using the MACv2-SP aerosol description (Stevens et al., 2017) in two modern GCMs, NorESM1 and ECHAM6. The MACv2-SP is partly based on observational data and provides a simple representation of global aerosol optical properties. It also includes a simple empirical fit for aerosol-cloud-albedo effects. These experiments allow us to single out the contribution of climate model dynamics to the intermodel differences in the response to anthropogenic aerosols. We will compare our results against the dataset by Samset et al. (2018) to investigate 10 the robustness of global and regional climate responses in modern climate models using interactive or prescribed aerosols.

Applied climate models and set-up
We carry out modern day equilibrium climate simulations with two independent climate models, ECHAM6.1 and NorESM1.
ECHAM6.1  is the sixth generation of ECHAM general circulation model developed in Max Planck 15 Institute with 47 sigma hybrid vertical levels, with the model top at 0.01 hPa and a horizontal resolution of 1.9 • × 1.9 • . The original ECHAM model branched from an early version of the European Centre for Medium-Range Weather Forecasts (ECMWF) model for climate studies. NorESM1 is the Norwegian Earth system model with 26 sigma hybrid vertical levels (the highest model level at 2.9 hPa) and 1.9 • × 2.5 • horizontal resolution (Bentsen et al., 2013;Kirkevåg et al., 2013). NorESM1 is based on the CCSM4 model operated at NCAR. Thus, the two models applied in our study do not share 20 a common development history. Here, both models were run with identical fixed modern-day greenhouse gas concentrations.
Oceans were simulated with the intrinsic slab ocean configurations of the models. This idealization removes the effect of natural and aerosol induced variations in ocean circulation and restricts our study to the response in atmospheric circulation, oceanic heat exchange, and sea ice dynamics only.

Standardized aerosol representation 25
MACv2-SP is a standardized representation of anthropogenic aerosol radiative effects, accounting for the direct radiative as well as the cloud albedo effect of anthropogenic aerosol (Stevens et al., 2017). However, the cloud lifetime effect is not taken into account. Anthropogenic aerosols are represented by nine 3D time-varying Gaussian plumes defining the aerosol optical depth, single scattering albedo and asymmetry parameter. Four of these plumes represent aerosol emissions from biomass burning and the other five are associated with industrial emissions. The industrial plumes originate from Europe, North Amer- 30 ica, East Asia, South Asia and Australia and the biomass plumes from North Africa, South America, South central Africa and Maritime Continent ( Fig. 1 and Table 1 in Stevens et al. (2017)). The plumes differ in their annual cycle and optical properties, and have a realistic horizontal and vertical structure that represents the transports of aerosols with prevailing winds.
The aerosol properties are based on aerosol climatology by , derived from ground-based sun-photometer networks (AERONET) merged onto background maps from global models participating in the Aerosol Model Intercomparison Project (AeroCom). The cloud albedo effect in MACv2-SP is parameterized by modifying the model-intrinsic natural cloud 5 droplet number concentration (CDNC) via a relation based on the total change in AOD. This parametrization is derived from MODIS data. MACv2-SP allows for a simple and observation-based representation of the changes in aerosol optical properties and cloud droplet number concentrations due to anthropogenic aerosols.

Model experiments and analysis
Sets of 100-year equilibrium climate runs for the year 2005 were conducted with both models, with the last 60 years used for 10 the analysis: (1) The control run (CTRL) included only natural aerosols, and was constructed from two runs for each model with small initial condition perturbations.; (2) The MACSP run included both natural and anthropogenic aerosols for the year 2005. In addition, for NorESM1, a third run EF was carried out. This run employed the time-varying 3D aerosol radiative forcing field computed from the ECHAM6's MACSP run. A more detailed description of the implementation is given in the appendix A. A summary of the runs is given in Table 1 Based on these runs, the following three experiments were defined to estimate the effect of anthropogenic aerosols: ECHAM6-MACSP (the difference between the MACSP and CTRL runs for ECHAM6), NorESM1-MACSP (MACSP minus CTRL for NorESM1), and NorESM1-EF (EF minus CTRL for NorESM1). The analysis of the results was based on monthly-mean values of data, and focused on the effects of MACv2-SP aerosols on near-surface temperature, precipitation, surface albedo and total cloud cover. The statistical significance of the responses was evaluated using Student's t-test with an auto-correlation 20 correction according to Zwiers and von Storch (1995). The response uncertainties in global mean values were estimated by the standard error of means taking into account lag-1 auto-correlation according to Zwiers and von Storch (1995). The instantaneous radiative forcing was calculated using double radiation calls with and without MACv2-SP aerosols during the slab ocean runs.

Aerosol radiative forcing
The total radiative forcing from the MACv2-SP anthropogenic aerosol description was found to be very similar for the two models (see Fig. 1). For ECHAM6, the MACv2-SP aerosol scheme produces a −0.64 Wm −2 global average total shortwave radiative forcing at the top of the atmosphere (TOA) for the year 2005, with −0.35 Wm −2 arising from direct and −0.29 Wm −2 5 from indirect radiative forcing. For NorESM1, the same aerosol scheme produces a slightly higher global radiative forcing of −0.69 Wm −2 at TOA, with −0.36 Wm −2 direct and −0.33 Wm −2 indirect radiative forcing. Figure C1 shows the maps of aerosol direct and indirect radiative forcings in the two models as calculated here. The largest difference in the total forcing was found over South East Asia up to 3.20 Wm −2 , where also the largest absolute forcing was found in both models. Fiedler et al. (2019) have calculated both the MACv2-SP effective radiative forcing as well as the instantaneous radiative forcing using 10 double radiation calls with fixed sea surface temperature for the two climate models used here. They showed that with fixed sea surface temperature the MACv2-SP aerosols produce an instantaneous radiative forcing of −0.60 Wm −2 and −0.68 Wm −2 in ECHAM6 and NorESM1, respectively. The correlation coefficient for the regional total forcing in the two models due to MACv2-SP is 0.97, and 0.90 for direct and 0.89 indirect forcings only. Thus, the regional differences in direct and indirect forcing somewhat compensate for each other. 15 We used a Gaussian process emulation technique (O'Hagan, 2006) to assess the causes for the regional differences in aerosol radiative forcing (see Appendix B for details). Our analysis showed that differences in cloud cover and surface albedo can explain nearly all of the variance in the difference in total instantaneous shortwave radiative forcing between ECHAM6 and NorESM1. Our sensitivity analysis reveals that in the regions with the largest radiative forcing (close to the center of the MACv2-SP plumes) the difference in model cloud cover dominates the difference in model shortwave forcing. Vice versa, in 20 regions with low aerosol radiative forcing the differences in surface albedo dominates the differences in forcing. We note that these results apply only to fixed aerosol fields produced by the MACv2-SP representation. Previous research shows that the aerosol radiative forcing can also depend on the meteorology (surface winds and precipitation) produced by the models, partly driven by the natural variability of the climate system (Fiedler et al., 2019).

Temperature
We obtain a robust global temperature response of −0.5 K due to the inclusion of MACv2-SP anthropogenic aerosols in both models. For ECHAM6-MACSP experiment the global mean near-surface temperature response is −0.50(±0.03) K, with regional values ranging from +0.30 K to −2.10 K. For NorESM1-MACSP experiment the global mean value is −0.48(±0.02) K and the regional values range between +0.39 K and −2.28 K.  MACSP and CTRL runs in both models. Largest cooling in ECHAM6 is located in Southeast Asia whereas in NorESM1 the largest cooling is found near the Russian Far East and north of Japan, with a second minimum over the Greenland sea.
Small positive temperature responses are found close to the Antarctic coast in both models, but these temperature responses are not statistically significant and are related to natural variations in sea ice. We found some significant correlation between the regional aerosol forcing and regional temperature response in both models: 0.39 in ECHAM6 and 0.29 in NorESM1, re-5 spectively. Among the CMIP5 model considered in Shindell et al. (2015), the multimodel mean regional correlation between the combined effective aerosol and ozone forcing and temperature response was slightly negative (−0.1), varying between negative values in some models and positive values among others. Figure 3a shows the zonal-mean temperature responses obtained from ECHAM6-MACSP and NorESM1-MACSP experiments. These experiments show a moderate cooling due to anthropogenic aerosols across the Southern hemisphere latitudes, 10 whereas in the Northern hemisphere the cooling response clearly strengthens towards the high-latitudes. The modeled regional temperature responses between ECHAM6 and NorESM1 simulations disagree the most in mid-and high-latitude regions as seen in Figure 2c. In high-latitude regions temperature differences are associated with surface albedo responses (snow/sea ice) between the models (see Figure A2). Changes in surface albedo are known to amplify changes in Arctic temperatures (albedo feedback). Hence, differences in snow and sea ice responses may partly explain the difference in temperature responses in the high-latitudes. This feedback, together with ocean circulation feedback, also dominates at high-latitudes the regional dif-5 ferences in temperature responses to homogeneous greenhouse gas forcing among different climate models (Shindell et al., 2015).

Precipitation
The  2). The regional changes of the precipitation patterns are shown in Figure 4. The spatial correlation between the precipitation responses in full ECHAM6-MACSP and NorESM1-MACSP experiments is 0.47, which is much lower than the corresponding correlation for temperature. In addition, while the temperature responses are negative almost globally, both positive and negative responses occur for precipitation, with relatively sharp edges between regions with different signs of changes. While similar large-scale features of precipitation changes can be seen in both models, their dislocation leads to a weaker regional 5 correlation than for the temperature response. In both models, the relative changes in the convective precipitation are larger than the relative changes in large-scale precipitation. Also consistently across the two models, the seasonal response in the total precipitation is similar, with the largest changes in June-July-August (see Table A1). Both models consistently show an overall drying of the Northern Hemisphere, with some statistically significant regional increase in precipitation over the Northwest Africa. itation in the tropics are also related to changes in vertical motion in the same region (see Fig. C4). This is suggestive of a southward shift of the Intertropical convergence zone (ITCZ) associated with a change in hemispheric temperature gradient There appear to be several causes for the differences in the precipitation response between the two models. For instance, there is a relationship between the difference of the regional precipitation response and the difference in vertical velocity response (correlation coefficient 0.44 between Figure 4c and Figure C4c). However, it cannot be concluded that change in precipitation is caused by the change in vertical velocity. Probably, both the changes in vertical velocity and precipitation are related to changes in circulation. Also the difference in the initial equilibrium state of precipitation patterns correlates weakly with the 5 difference in the precipitation response (correlation coefficient 0.23). Furthermore, differences in the cloud cover responses (see Figure C3) are also related to differences in precipitation responses (correlation coefficient 0.32). The vertical velocity correlates also with the total cloud cover response (correlation coefficient 0.41)

Comparison between the NorESM1-MACSP and NorESM1-EF experiments
We now briefly discuss the differences between the NorESM1-MACSP and NorESM1-EF experiments. As noted in Sect. 2.3, 10 the difference between these experiments is that in NorESM1-MACSP, the radiative forcing due to the MACv2-SP aerosols is computed using NorESM1's own meteorology and own radiation scheme, while in NorESM1-EF, forcings from ECHAM6's MACSP run are applied. The forcing results are shown in Figure 1d. The minor differences seen in Fig. 1d  Also, the zonal-mean and regional temperature and precipitation responses in NorESM1-MACSP and NorESM1-EF are very similar (Figs. 2d, 3, and 4d). The spatial correlation in response between full NorESM1-MACSP and NorESM1-EF experiments is as high as 0.97 for temperature and 0.95 for precipitation, which are much higher than the correlations between NorESM1-MACSP and ECHAM6-MACSP responses (0.81 and 0.47). Indeed, with the exception of the global-mean precipi-20 tation response, for which the ECHAM6-MACSP value (-1.79 ±0.05%) falls between NorESM1-MACSP and NorESM1-EF, the responses in the two NorESM1 experiments are closer to each other than the ECHAM6-MACSP response. Therefore, it can be concluded that the differences in the effects of MACv2-SP aerosols between ECHAM6 and NorESM1 are mainly related to differences in the model circulation responses, not to the differences in the aerosol forcing fields. 25 Finally, we compare the obtained equilibrium temperature and precipitation responses with prescribed MACv2-SP aerosols in ECHAM6 and NorESM1 against those equilibrium climate responses from four fully coupled climate models (CESM1, GISS, HadGEMS2, and NorESM1) with intrinsic aerosol schemes but the same aerosol emissions, reported by Samset et al. (2018).

Comparison to models with interactive aerosols
In the four models considered by Samset et al. (2018), the global average temperature responses were −(0.5 K, 0.5 K, 1.1 K and 0.6) K, and precipitation responses −(1.5%, 1.8%, 2.6% and 3.1%), respectively. We obtain similar temperature responses  and without anthropogenic aerosols both for our and Samset et al. (2018) datasets. Note that these coefficients do not depend on the magnitude of the average responses in the models, but only on the relative regional distributions of the responses. Perhaps surprisingly, the average correlation coefficient for regional temperature response between interactive aerosol models (i.e., the Samset et al. (2018) models), 0.79, is almost identical to the correlation between our prescribed aerosol models (0.78). Also, the average correlation coefficient between experiments using interactive aerosols and a fully coupled ocean model (Samset the fully coupled interactive aerosol models only. The similar regional correlation between different experiments is remarkable considering large differences in the aerosol descriptions between the different models. It appears that the differences in aerosol descriptions do not dominate the differences in regional temperature response. The average correlation coefficient for regional precipitation changes within Samset et al. (2018) models with intrinsic aerosol descriptions is 0.34, while it is 0.41 within our models with prescribed aerosols. The average correlation coefficient for The correlation coefficient between NorESM1 experiments using different aerosol descriptions and ocean models is now only 0.33/0.38. Thus, differences in aerosol descriptions, ocean models and atmospheric responses all contribute to differences in regional precipitation responses. The correlation coefficients for precipitation responses are, however, more uncertain than 10 those for temperature responses, due to a stronger impact of natural variability.
Even long equilibrium climate runs cannot fully eliminate the natural climate variability on a regional level. With our full dataset (60 of MACSP runs+120 years of CTRL run) we obtain a spatial correlation of 0.47 between NorESM1-MACSP and ECHAM6-MACSP precipitation responses, a slight improvement over the correlation coefficient of 0.41(±0.02) for 50+50 year datasets. The spatial correlation for temperature improves from 0.78(±0.02) to 0.81. The fully coupled ocean models in 15 the Samset et al. (2018) dataset also feature long-term internal variability in the ocean states that adds to the level of natural variation with respect to our models with simpler slab ocean representations used in this paper. Therefore, we would expect the Samset et al. data to include more noise than our results with slab ocean configurations. Furthermore, it is important to note that differences in the ocean descriptions are known to have a large impact in the regional climate responses between different models ( (Deser et al., 2016;Kay et al., 2016)). Overall, we would expect that due to these differences the climate 20 signals obtained from fully coupled models would intrinsically correlate less well with each other than those from models with slab ocean configurations. Somewhat surprisingly, this turns out not to be the case.
The dependence of the calculation of time-averaged correlation coefficients on the simulation length for our data is shown in Fig. 5. There, the blue and red shaded regions represent the level of expected variation in the regional correlation coefficients between two climate models obtained from equilibrium model experiments with and without anthropogenic aerosols. 25 We obtained a correlation coefficient of 0.78 with a standard deviation of ±0.02 for temperature response and 0.41(±0.02) for precipitation after 50 years of simulation, these periods being representative for the Samset experiments but neglecting the impact of long-term ocean variations. The corresponding correlation coefficients for full model runs (60+120 years of simulation) are 0.47 for precipitation and 0.81 for temperature.

30
We have here provided results on the equilibrium climate response of modern day anthropogenic aerosols using two different climate models, ECHAM6 and NorESM1, with the MACv2-SP (Stevens et al., 2017) anthropogenic aerosol representations The results were obtained both using the same representations of aerosol optical properties and cloud-albedo effect and for identical instantaneous aerosol radiative forcing fields in the models.
The MACv2-SP aerosols produced a very similar total instantaneous anthropogenic aerosol radiative forcing in the two models (−0.64 Wm −2 in ECHAM6-MACSP and −0.69 Wm −2 in NorESM1-MACSP experiments). We found that there are differences up to 3.2 Wm −2 in the instantaneous regional aerosol forcing between the models when using the same aerosol MACSP experiments, respectively. The largest disagreements in regional temperature response were found at high-latitude regions associated with largest differences in surface albedo feedback (snow/sea ice), while the largest differences in regional 10 precipitation response were located mainly in the tropics. These key regional differences remained even when using exactly the same aerosol radiative forcing fields in both models. Several previous studies have discussed that the main driver for ITCZ shift is the northern hemisphere cooling due to anthropogenic aerosols (Broccoli et al., 2006;Hwang et al., 2013;Wang, 2015). Chiang and Bitz (2005) showed with Community Climate Model version 3 a connection between ITCZ shift and added Arctic ice cover. Based on these previous studies, it seems plausible that different responses in Arctic sea ice and snow cover in 15 ECHAM6-MACSP and in the two NorESM1 experiments result in different high-latitude temperature responses, which in turn are reflected as differences in the ITCZ shift that drives the precipitation change at low latitudes. However, it should be noted that the ITCZ shift is also sensitive to the type of ocean model used, and slab ocean models tend to exaggerate the change in ITCZ (Kay et al., 2016).
We compared our results using uniform aerosol representations to a set of four current climate models using their intrinsic aerosol representations but the same aerosol emissions, reported by Samset et al. (2018). Among the Samset et al. (2018) models the global responses to additions of anthropogenic aerosol varied between −0.5 K and −1.1 K for temperature and between −1.5% and −3.1% for precipitation. However, the correlation coefficients for regional distributions of climate responses, averaged over equal run length, were nearly as good among our experiments with prescribed aerosols and slab ocean representation The lack of improvement in the correlation coefficients suggests that differences in aerosol descriptions are not the only cause of regional differences in climate signals between the models. Rather, the differences in model circulation responses appear to dominate the differences in regional climate responses. Figure C5 shows the average 850 hPa wind responses for  (2014) argued that the differences in circulation responses cause variation in the 15 regional temperature and precipitation responses in future climate scenarios. Li et al. (2018a) showed that model consensus for circulation response is low even for atmosphere-only models forced with same time-varying SST and sea ice, anthropogenic greenhouse gases, ozone, land use, land cover, and aerosols. Both in Shepherd (2014) and Li et al. (2018a) data, the NH wintertime circulation response over the North Atlantic disagrees significantly between models. Also for ECHAM6-MACSP and NorESM1-MACSP the circulation response over the North Atlantic show differences in magnitude and pattern. Differences 20 are also seen over the North Pacific region. Combined with the difference in the sea ice and surface albedo change in the North Pacific, these circulation changes can drive the temperature response differences in the region.
Our results imply that in current global climate models the regional aerosol climate impacts cannot be better constrained by further improving aerosol descriptions alone. More extensive model comparisons are needed to explain the model discrepancies in response to aerosol forcing. Improvements on the dynamical cores, physical parameterizations and ocean models are needed 25 to narrow down model uncertainties in the regional aerosol climate responses.
Data availability. Data and scripts used for data analysis can be obtained by contacting the corresponding author by the monthly-mean incoming solar radiation at model top. Third, during the NorESM1-EF run, these normalized forcings were multiplied by the TOA incoming solar radiation at each radiation time step, and they were added to the radiative fluxes and heating rates computed without MACv2-SP aerosols.
This treatment ensures that the diurnal cycle of the aerosol forcing is approximately correct; in particular there is no aerosol forcing during the night. However, the computed forcing is independent of the clouds simulated by NorESM1. Thus, while the 5 aerosol radiative forcing is computed correctly in a monthly-mean sense, its sub-monthly correlation with clouds is ignored. In principle, this could impact the differences between NorESM1-EF and ECHAM6-MACSP. The impact is, however, most likely small. If neglecting the sub-monthly correlation between clouds and aerosol forcing were to have a substantial impact on the climate response to MACv2-SP aerosols, this should also show up in the differences between NorESM1-EF and NorESM1- MACSP. Yet the differences between NorESM1-EF and NorESM1-MACSP are very small (Tables 2 and A1), in fact much 10 smaller than the corresponding differences between ECHAM6-MACSP and either NorESM1-EF or NorESM1-MACSP. This strongly suggests that the differences between NorESM1-EF and ECHAM6-MACSP are primarily caused by the use of a different climate model rather than by the subtle differences in radiative forcing.

Appendix B: Sensitivity analysis of model aerosol forcing
We used a Gaussian process emulation technique (O'Hagan (2006)) to evaluate the regional differences in aerosol radiative 15 forcing. First, we simply assume that the forcing difference depends only on the differences in model output values, and not on the actual values themselves. Second, we selected the differences in modeled output (total cloud cover, surface albedo, precipitation, surface temperature, surface wind u-component) as trial sets for these values. These can be described via a relation Y = η(X), where X = [∆α, ∆β, ..., ξ], where α and β are total cloud cover and surface albedo, ξ is pure noise (Gaussian) variable. Next the function Y = η(X) is inferred using a Gaussian Process prior emulator for a part of the yearly 20 averaged radiative forcing data (in our case, 40 years). Each variable is assigned with a sensitivity index, which describes the relative sensitivity of Y to that variable. The sensitivity analysis of the estimated Y function was done by using Extended Fourier Amplitude Sensitivity Test (FAST) (Saltelli et al. (1999)). As an end result, FAST assesses the contributions of each emulator input variable (components of X = (X i )) to the variance in emulator output variable (Y ), where it's assumed the input variables X i have an independent and identical distribution uniform prior. The inferred function Y is finally validated by 25 comparing the emulated forcing field against validation data separate from the training data (here, 20 yearly averaged forcing fields from the model experiments).  The green dots presents the area where anthropogenic aerosols do not have a statistically significant impact at the level p < 0.05 (in panel c), or where the difference between the models is not statistically significant (in panels (a) and (b)). The units are in m/s. NorESM1-MACSP -2.27 ± 0.14 -1.93 ± 0.13 -1.71 ± 0.11 -1.82 ± 0.09 -1.93 ± 0.08 NorESM1-EF -2.28 ± 0.11 -2.08 ± 0.09 -1.83 ± 0.08 -2.12 ± 0.09 -2.08 ± 0.06