The role of anthropogenic aerosols in the anomalous cooling from 1960 to 1990 in the CMIP6 Earth System Models

. The Earth system models (ESMs) that participated in the sixth Coupled Model Intercomparison Project (CMIP6) tend to simulate excessive cooling in surface air temperature (TAS) between 1960 and 1990. The anomalous cooling is pronounced over the Northern Hemisphere (NH) midlatitudes, coinciding with the rapid growth of anthropogenic sulfur dioxide (SO 2 ) emissions, the primary precursor of atmospheric sulfate aerosols. Structural uncertainties between ESMs have a larger impact on the anomalous cooling than internal variability. Historical simulations with and without anthropogenic aerosol emissions indicate that the anomalous cooling in the ESMs is attributed to the higher aerosol burden in these models. The aerosol forcing sensitivity, estimated as the outgoing shortwave radiation (OSR) response to aerosol concentration changes, cannot well explain the diversity of pothole cooling (PHC) biases in the ESMs. The relative contributions to aerosol forcing sensitivity from aerosol–radiation interactions (ARIs) and aerosol–cloud interactions (ACIs) can be estimated from CMIP6 simulations. We show that even when the aerosol forcing sensitivity is similar between ESMs, the relative contributions of ARI and ACI may be substantially different. The ACI accounts for between 64 % and 87 % of the aerosol forcing sensitivity in the models and is the main source of the aerosol forcing sensitivity differences between the ESMs. The ACI can be further decomposed into a cloud-amount term (which depends linearly on cloud fraction) and a cloud-albedo term (which is independent of cloud fraction, to the ﬁrst order), with the cloud-amount term accounting for most of the inter-model differences


Introduction
Surface air temperature (TAS) variation is an essential indicator of climate change, and reproducing the evolution of historical TAS is a crucial criterion for model evaluation.However, the historical TAS anomaly simulated by the models in the sixth Coupled Model Intercomparison Project (CMIP6) is on average colder than that observed in the mid-20th century, whereas the CMIP5 models tracked the instrumental TAS variation quite well (Flynn and Mauritsen, 2020).This is surprising because the transient climate response in CMIP6 models is generally higher than in CMIP5 models (e.g., Flynn and Mauritsen, 2020;Meehl et al., 2020).
As a result of anthropogenic emissions, atmospheric aerosol concentrations increased along with rising greenhouse gases, but with greater decadal variability.Aerosols are generally not evenly distributed around the planet as greenhouse gases, and they have relatively short lifetimes of the order of a week.Aerosols increased rapidly in the mid-20th century, predominantly due to US and European emissions.The rate of change of global aerosol emissions slowed down in the late 20th century (Hoesly et al., 2018), and the trend of global emission has been negative since the mid-2000s (Klimont et al., 2013).There has also been a shift in emission source regions.European and US emissions have declined following the introduction of clean air legislation since the 1980s, while Asian emissions have risen due to economic development.East Asian emissions clearly increased from 2000 to 2005, followed by a decrease with large uncertainties (Aas et al., 2019).The decade-long emission reduction since 2006 over East China is not well represented by the CMIP6 emission (Wang et al., 2021).
Although greenhouse warming was concluded to be the dominant forcing for long-term changes (e.g., Weart, 2008;Bindoff et al., 2013), multidecadal variability in TAS and the reduced rate of warming in the mid-20th century in particular have been attributed to aerosol forcing (e.g., Wilcox et al., 2013).Ramanathan and Feng (2009) noted that the aerosol cooling effect might have masked as much as 47 % of the global warming by greenhouse gases in the year 2005, with an uncertainty range of 20 %-80 %.The aerosol cooling effect is mainly attributed to the ability of sulfate particles to reflect incoming solar radiation and modify the microphysical properties of clouds (e.g., Charlson et al., 1990;Mitchell et al., 1995;Lohmann and Feichter, 2005).The increase in anthropogenic aerosols was also responsible for weakening the hydrological cycle between the 1950s and the 1980s (Wu et al., 2013).
Previous work has suggested that the anomalous mid-20th century cooling in the CMIP6 models is the result of excessive aerosol forcing.Flynn and Mauritsen (2020) suggested that aerosol cooling is too strong in many CMIP6 models because there is no apparent relationship between the warming trends simulated by models and their transient climate responses (TCRs) before the 1970s.Dittus et al. (2020) found that historical simulations can better capture the observed historical record by reducing the aerosol emissions in HadGEM3-GC3.1,demonstrating an overly strong aerosol cooling effect.In this study we characterize the mid-20th century excessive cooling in CMIP6 Earth system models (ESMs).In order to quantify the role of aerosol processes in this anomalous cooling, historical experiments with and without anthropogenic aerosol emissions are employed.The remainder of the paper is organized as follows.Section 2 introduces the models, data, and a quantitative method to separate the aerosol forcing components.The major features of anomalous cooling in CMIP6 ESMs are examined in Sect.3. Section 4 investigates the possible reasons for the anomalous cooling.The relative importance of aerosol-radiation interactions and aerosol-cloud interactions in each ESM is quantified and discussed in Sect. 5. Conclusion is given in Sect.6.

Data
The CMIP6 historical experiment and hist-piAer experiment are employed.The historical experiment is forced by time-evolving, externally imposed natural and anthropogenic forcings, such as solar variability, volcanic aerosols, greenhouse gases, and aerosol emissions (Eyring et al., 2016).The hist-piAer experiment is designed by the CMIP6endorsed Aerosol Chemistry Model Intercomparison Project (AerChemMIP; Collins et al., 2017).It is run in parallel with the historical experiment but fixes aerosol and aerosol precursor emissions to pre-industrial conditions.Therefore, the differences between these two experiments are attributable to anthropogenic aerosol emissions.The design of the hist-piAer simulation means that it can also capture any nonlinearities resulting from greenhouse-gas-driven changes in clouds.This is in contrast to the hist-aer simulations available from the Detection and Attribution Model Intercomparison Project (DAMIP; Gillett et al., 2016), which resembles the historical simulations but is only forced by transient changes in aerosol.
The monthly outputs from historical and hist-piAer simulations for ESMs are used, including TAS, all-sky outgoing shortwave radiation at the top of atmosphere (OSR), OSR assuming clear sky (OSRclr), mass mixing ratio of sulfate aerosol in the atmosphere (mmrso4), total cloud amount (CLT), and cloud-top effective droplet radius (r eff ).These Atmos.Chem. Phys., 21, 18609-18627, 2021 https://doi.org/10.5194/acp-21-18609-2021variables are summarized in Table 2.The corresponding lower-complexity models have conducted the historical but not the hist-piAer simulations, and only the monthly TAS output from the historical simulations is used.Therefore, we focus on the ESMs when identifying the main aerosol processes contributing to the anomalous cooling.
The verification data used in this study is HadCRUT5, the monthly 5 • latitude by 5 • longitude gridded surface temperature (Morice et al., 2021), a blend of the Met Office Hadley Centre SST data set HadSST4 (Kennedy et al., 2019) and the land surface air temperature CRUTEM5 (Osborn et al., 2021) (Wu et al., 2019a).
The aerosol cooling is dominated by the contribution of sulfate aerosol as estimated by models and observations (Myhre et al., 2013;Smith et al., 2020).We use the evolution of sulfate loading (loadSO4) through the historic simulation as a proxy for total aerosol concentration changes to link estimates of the impact of aerosol forcing.Whilst the overall impact of aerosol forcing will also depend on other aerosol species, we adopt this approach because the sulfates dominate estimates of aerosol forcing during this period and other aerosols species can be assumed (as a first-order approximation) to have covaried with the SO 2 emissions during this period as presented by the Community Emissions Data System (CEDS) inventory adopted by CMIP6 models (Hoesly et al., 2018).As such when we present estimates of the aerosol impact/loadSO4 we are presenting the impact of all aerosol species (including absorbing aerosols such as black carbon) as they covary with the sulfate concentrations during the historic period.The motivation for presenting it in this way is we can separate differences in ESM responses to changes in aerosol amount from the differences in aerosol amount (represented by loadSO4) simulated by the ESMs.
We can estimate the impact of anthropogenic aerosol by using the difference in OSR between the historical and hist-piAer simulations, OSR.OSR of course involves any differences in natural variability and planetary albedo between the two simulations, including clear-sky albedo changes and any adjustments in the microphysical or macroscopic properties of clouds.The sensitivity of the OSR response to aerosol changes, i.e., the aerosol forcing sensitivity, can be measured by the linear fit slope between the annual mean globally averaged OSR differences and loadSO4 differences between the historical and hist-piAer simulations: aerosol forcing sensitivity = OSR/ loadSO4. (1) In this study, we diagnose the OSR differences from historical simulations that also capture the temperature response.As such the OSR differences do not represent a measure of only the aerosol forcing impact but combine OSR differences arising from both the aerosol forcing and the temperature response to this forcing, which we refer to in this paper as the aerosol forcing sensitivity.It presents a measure of the importance of aerosol changes in simulated temperature changes that can be easily calculated for existing transient simulations.The aerosol forcing sensitivity is different from the commonly used aerosol effective radiative forcing (ERFaer), which is the change in net TOA downward radiative flux after allowing adjustments in the atmosphere, but with sea surface temperatures and sea ice cover fixed at climatological values.The ERFaer for each ESM except MPI-ESM-1-2-HAM is listed in Table 3.The ERFaer is not correlated with the aerosol forcing sensitivity.
The aerosol forcing sensitivity can be further partitioned into a contribution from aerosol-radiation interactions (ARIs) and aerosol-cloud interactions (ACIs).ARI and ACI can be readily estimated from the CMIP6 output because annual mean cloud amount (CLT), OSR, and the OSR assuming only clear sky (OSRclr) are available for all the CMIP6 ESMs.For each model, the clear-sky part OSR, OSRclr_p, can be estimated as (1 − CLT/100.)• OSRclr.The aerosol forcing sensitivity in the clear-sky part can therefore be estimated as where CLT_hist and OSRclr_hist are the mean CLT and OS-Rclr in the historical experiment.The aerosol forcing sensitivity in the cloudy part are relative to the cloud-amount response to aerosol loading and cloud radiative effect changes and can be estimated as (3) Therefore, the aerosol forcing sensitivity can be decomposed as OSR/ loadSO4 aerosol forcing sensitivity where M, N , and A are empirically determined parameters.to loadSO4 and therefore measures the strength of the aerosol-radiation interactions in each model.The first term on the right-hand side of Eq. ( 4), (1 − CLT_hist/100.)• M, can therefore be identified with ARI.The parameter A is the slope of a linear fit of OSRcld to CLT and therefore measures the correlation of the shortwave radiation reflected by clouds with changes in cloud amount.That is, the parameter A represents the baseline cloud albedo which is sensitive to the cloud parameterizations via cloud droplet number concentration (CDNC), cloud-droplet effective radius, and other factors.The parameter N is the slope of a linear fit of CLT to loadSO4 and therefore measures the sensitivity of cloud amount to aerosols.Note that changes in cloud amount by definition also affect the fraction of clear sky; hence, increases in OSRcld due to increases in CLT (i.e., A • N) can be partly offset by changes in area of clear sky containing aerosols (OSRclr_hist/100.•N).The second term on the right-hand side of Eq. ( 4), (A−OSRclr_hist/100.)•N,can therefore contribute to the ACI.Specifically, it is the part of ACI that is linearly proportional to changes to cloud fraction, which we will refer to in this paper as the cloud-amount term.It is therefore sensitive to any aerosol-induced cloud fraction changes (Lohmann and Feichter, 2005), including any slow adjustments in clouds due to feedbacks within the Earth system.In addition to depending on CLT, ACI is also influenced by any changes in cloud albedo that might occur independently of cloud-amount changes.Such adjustments would include increases in cloud droplet number concentration and increases in simulated cloud-droplet effective radius without accompanying changes in cloud cover.Changes purely in the brightness of clouds, without changes in macroscopic properties of clouds, are difficult to identify from the CMIP6 output because all the bulk properties of clouds co-vary over the course of the projections.However, subtracting ARI and the cloud-amount term from the aerosol forcing sensitivity gives a residual that is, by definition, linearly independent of cloud fraction differences (since by construction these have been regressed out).This residual can then be interpreted as due to differences in the albedo of clouds between the historical and hist-piAer and will be called the "cloud-albedo term".Note that this method of calculation implies that purely albedo effects cannot be distinguished from general residual terms that result from the linear approximation made.
Decomposition of the ARI, the cloud-amount term, and cloud-albedo term of ACI is detailed further in the Appendix.The aerosol-cloud feedbacks are mainly in the ACI term which includes cloud spatial extent (amount), cloud albedo on radiative fluxes, and cloud particle swelling by humidification (Christensen et al., 2017;Neubauer et al., 2017).There is also a (smaller) effect of feedback on the ARI term that is also affected by cloud-amount changes insofar as increased/decreased cloud cover can obscure/reveal clear-sky radiative fluxes.We acknowledge that the linear approximation in our method does not explicitly account for the absorption above clouds or the adjustments to aerosol-radiation interactions (e.g., Carslaw et al., 2013) that are known to be locally important.Our formulation explicitly assumes that https://doi.org/10.5194/acp-21-18609-2021Atmos.Chem.Phys., 21, 18609-18627, 2021 there is a broadly linear relationship between (i) loadSO4 and emissions and (ii) aerosol radiation with loadSO4 (and nonlinearity due to cloud albedo or amount or any interaction is small at global scale as suggested in Booth et al., 2018).Should these interaction terms be non-negligible in this analysis, we still expect the broader attribution of the reasons for the model diversity in temperature response over the PHC period, either how they simulate aerosol concentrations or how they simulate the response to this, to generally hold.This decomposition method is an approximate approach designed to be used with existing simulations, rather than a strict decomposition by dedicated simulations/output variables not included in CMIP6.It cannot tell us precise information about each interaction and adjustment, but it can give us an indication of why models behave differently.

The pothole bias in CMIP6 ESMs
Figure 1a shows the near-global averaged time series of the annual mean TAS anomaly relative to 1850 to 1900 in Had-CRUT5 during the historical period from 1850 to 2014, as well as the ensemble means for each model except for EC-Earth3-AerChem and GFDL-ESM4 (where only a single re-alization is available for the hist-piAer experiment).The unforced, long-term drifts in TAS may occur in some of the ESMs, as estimated by their control simulation under preindustrial conditions (Yool et al., 2020).We have not accounted for long-term control simulation drifts in our study as we are assuming that our focus on inter-decadal-scale variability of TAS anomalies is likely to be fairly insensitive to any century-scale drifts.
The TAS anomaly in HadCRUT5 is generally above the baseline climate from the 1940s onwards and warms fastest from the 1980s to 1990s.Compared with the observations, all the ESM simulations have negative TAS anomaly biases after the 1940s, which are also evident in the ensemble-mean historical TAS of 25 CMIP6 models with and without interactive chemistry schemes (Flynn and Mauritsen, 2020).In the ESMs and their ensemble mean (MMM), the cold anomaly biases resemble a pothole shape (Fig. 1a), which is relatively small before the 1950s and after the 2000s but prominent from the 1960s to 1990s.To reduce the impact of the change in the spatial pattern of the emissions in the late 20th century and the Pinatubo eruption in the early 1990s, we mainly focus on the excessively cold anomaly from 1960 to 1990 in this study.The impacts from the Agung (1963) and El Chichón (1982) eruptions have been left in the PHC period as their effect on the simulated temperature is not as pronounced as the response to Pinatubo and are short-lived in time compared to the period we study.The period of anomalous cold in the global mean from 1960 to 1990 in model simulations is defined as the pothole cooling (PHC).Table 3 shows the TAS anomaly biases in two periods: the pre-PHC period  and the PHC period .The cold bias in the MMM is −0.14 in the pre-PHC period and intensified to −0.40 in the PHC period.The PHC bias ranges from −0.20 to −0.58 • C among the ESMs with a standard deviation of 0.11 • C. Intra-model spread of PHC is relatively smaller.That is, model structural uncertainty is more responsible for PHC than internal climate variability.
The PHC bias is generally smaller in the corresponding lower-complexity models (Fig. 1b and Table 3).For models with prescribed chemistry and aerosol (BCC-CSM2-MR and MPI-ESM1-2-LR), the TAS anomaly is reasonably reproduced during the pre-PHC period and the PHC period.The PHC bias is large (−0.37 • C) in EC-Earth3, which has prescribed chemistry and aerosol.The large bias may be a reflection of the large internal variability on TAS in EC-Earth3 (Döscher et al., 2021), for which we have only one member.For models with a prescribed chemistry and interactive aerosol scheme (GFDL-CM4 and HadGEM3-GC31-LL), the cold biases during the PHC period are comparable with that in the corresponding ESMs.
The spatial and temporal evolution of annually averaged TAS anomalies are further examined (Fig. 2).In HadCRUT5, TAS anomalies are generally positive after the 1940s.The most significant TAS anomalies are evident in the late 20th century and at the beginning of the 21st century, especially over the NH midlatitudes.The results from BCC-CSM2-MR and MPI-ESM1-2-LR agree well with the observations.However, the ESMs and the other lower-complexity models simulate pronounced cold anomalies over NH subtropicalto-high latitudes during the PHC period.The overestimated tropical and southern hemispheric warming in NorCPM1 offsets most of the cooling biases over NH subtropical-to-high latitudes.
Surface anthropogenic SO 2 emissions rapidly increase during the PHC period (the line contours in Fig. 2).The latitudes of the cooling centers in the ESMs and lowercomplexity models with an interactive aerosol scheme are spatially co-located with the SO 2 emission sources -North America and East Asia (at around 30 • N) and western Europe (at around 50 • N).Generally, the different behaviors seen in Figs. 1 and 2 suggest that aerosol forcings may be overestimated in the ESMs and lower-complexity models with an interactive aerosol scheme, and the anomalous cooling is a result of the extra complexity associated with aerosol processes.
During the PHC period, the TAS anomalies in HadCRUT5 are generally positive and are the largest over Eurasia and North America (Fig. 3a).The warm anomalies are on av-erage more than 0.4 • C along the 30-60 • N latitudinal belt.However, the ESMs show anomalies with the opposite sign (Fig. 3b-g) as do the lower-complexity models with an interactive aerosol scheme (figures not shown).The PHC is pronounced over major SO 2 emission centers (western Europe, East Asia, and the eastern US) and their downstream regions.The cold anomalies over Eurasia and North America are lower than −0.6 • C in the ESMs.The PHC biases are strongest at lower levels (figures not shown), which is distinct from the amplified upper-tropospheric warming response to greenhouse gases.

Possible reasons for the excessive cooling
The differences between the historical and hist-piAer simulations help to investigate the impact of anthropogenic aerosol emissions and their possible contribution to the PHC biases.In this section, we examine the TAS, OSR, and sulfate loading differences and look in detail at their relationship.As shown by the evolution of TAS anomalies in the two experiments (Fig. 4a1-6), during the PHC period, TAS anomalies in HadCRUT5 (black line) are higher than those in the historical members but lower than those in the hist-piAer members in all ESMs.That is, the model responses to anthropogenic aerosol emissions are larger than the amplitude of the PHC.The temporal evolution of the OSR corresponds with that of the TAS but occurs in the opposite direction (Fig. 4b1-6).The sulfate loading differences are relatively small in the 19th century, mildly increase in the first half of the 20th century, grow most rapidly during the PHC period, and remain high afterward (Fig. 4c1-6).The growing sulfate loading during the PHC period corresponds with the increase in Northern Hemisphere anthropogenic surface SO 2 emissions (line contours in Fig. 2).In comparison with the TAS and OSR differences, the intra-model spread of sulfate loading for each ESM is relatively small.However, the intermodel diversity of sulfate loading is large.For example, the sulfate loading difference between the historical and hist-piAer experiments around the year 2000 is about 4 mg m −2 in EC-Earth3-AerChem, almost twice of that in GFDL-ESM4.With similar anthropogenic SO 2 emission rates, the lower sulfate loading difference in GFDL-ESM4 indicates it has a shorter sulfate aerosol residence time than that in EC-Earth3-AerChem, which may be due to their different sulfate production and deposition schemes.The sulfate loading diversity is also evident in CMIP5 models and is partly responsible for the diversity in modeled radiative forcing (Wilcox et al., 2015).
The latitudinal movement of the SO 2 emission center from the 1990s affects the relative strength of aerosol forcing.Due to the more rapid oxidation and higher incoming solar flux at lower latitudes, an equatorward shift in SO 2 emissions around the 1990s results in a more efficient production of sulfate and stronger aerosol forcing (Manktelow et al., 2007)   The northern midlatitude temperature is also more sensitive to the distribution of aerosols, which is approximately twice as large as the global average (Collins et al., 2013;Shindell and Faluvegi, 2009).Therefore, we focus on the relationships between TAS, OSR, and sulfate loading after 1900 when SO 2 emission changes are dominated by its anthropogenic component, as well as before 1990.As shown in Fig. 5a, the TAS differences between the historical and hist-piAer simulations vary approximately linearly with the differences in the sulfate loading.The OSR differences are approximately linearly correlated with sulfate loading differences (Fig. 5b).In both cases, the approximation of linearity holds less well for UKESM1-0-LL, especially at small sulfate loadings.This reflects the behavior of HadGEM2, a predecessor of UKESM1-0-LL (Wilcox et al., 2015), and is likely to be due to the strong aerosol-cloud-albedo effect in these models.The global mean annual mean r eff decreases by about 0.7 µm since pre-industrial era -more than twice the magnitude of change seen in the other models (Fig. 1b in Wilcox et al., 2015;Fig. 9b in this study).
The slope of the linear fitting equation between TAS (OSR) and sulfate loading as shown in the captions in Fig. 5a (Fig. 5b) is a measure of the sensitivity of TAS (aerosol forcing) to perturbations in atmospheric aerosol.Moreover, TAS response and aerosol forcing sensitivity are linearly correlated across the ESMs (Fig. 5c).That is, the strength of the TAS response can be understood as the magnitude of aerosol forcing sensitivity within each ESM.The TAS response and aerosol forcing sensitivity is the lowest in GFDL-ESM4.The TAS response and aerosol forcing sensitivity in UKESM1-0-LL (the purple marker in Fig. 5c) are the strongest, as well as their intra-model spread (the length of arrows), indicating that TAS and aerosol forcing in this model are relatively more susceptible to changes in aerosol.
Considering the close relationship between TAS anomalies and aerosol loading (Fig. 5a), as well as the impact of aerosol forcing sensitivity on the TAS response in ESMs https://doi.org/10.5194/acp-21-18609-2021Atmos.Chem.Phys., 21, 18609-18627, 2021 (Fig. 5c), their relative contributions to the PHC biases are examined.Figure 6a  more than 0.1 • C lower; and the aerosol forcing sensitivity in UKESM1-0-LL is the strongest (∼ 1.5 W mg −1 ) but not the PHC bias.Therefore, the aerosol forcing sensitivity is not able to explain the different PHC biases among ESMs.
As shown in Fig. 6b, the sulfate loading differences between the historical and hist-piAer experiments during the PHC period are large among ESMs (the x axis), which are about 1.5 mg m −2 in GFDL-ESM4 but approximately 2.9 mg m −2 in EC-Earth3-AerChem.The sulfate loading differences during the PHC period and PHC biases show a negative correlation: the PHC bias is generally larger in models with higher sulfate loading over this period; the ESMs with similar PHC biases (MPI-ESM1-2-LR, NorESM2-LM, and UKESM1-0-LL) show similar aerosol loading differences.Therefore, the excessive cooling during the PHC period and the inter-model diversity in ESMs are attributed to the higher aerosol burden in these models.

The proportions of ARI and ACI
Although the aerosol forcing sensitivity is not responsible for the anomalous cooling biases in ESMs, it is a good way to identify model differences in the response to aerosol changes.As shown in Fig. 5c, there are significant differences in the aerosol forcing sensitivity among ESMs.The aerosol forcing sensitivity in UKESM1-0-LL is almost 3 times of that in https://doi.org/10.5194/acp-21-18609-2021Atmos.Chem.Phys., 21, 18609-18627, 2021 GFDL-ESM4.Due to the uncertainties in processes and cloud parameterizations, the dominant component (ARI or ACI) of aerosol forcing sensitivity may also vary among ESMs.Here, we separate the different components of the aerosol forcing sensitivity in each ESM by the method introduced in the Sect.2.3 and the Appendix.Sulfate loading is used as a proxy of aerosol amount for all aerosol components in the quantification of the total effect because of its dominant contribution to anthropogenic aerosol load during this period and its covariation with the other aerosol species.
The ARI can be approximated to (1 − CLT_hist/100.)• M, where CLT_hist is cloud amount in the historical simulation and parameter M is a measure of the strength of aerosol-radiation interactions ( OSRclr/ loadSO4).Parameter M varies widely from about 0.35 W mg −1 in NorESM2-LM to about 0.79 W mg −1 in BCC-ESM1 (captions in Fig. 7a-c).Since parameter M does not change much among ensemble members in each ESM, their ARI is similar across members.That is, the impact of internal climate variability on the ARI is relatively small, which is consistent with the quantitative analysis in Fig. 8 (red bars).
The ACI can be estimated from the difference between the aerosol forcing sensitivity and the ARI.The proportion of the aerosol forcing sensitivity arising from the ACI is higher than 64 % in all ESMs (Fig. 8).The inter-model variation of the ACI (0.37 W mg −1 ) is much larger than that for the ARI (0.09 W mg −1 ).For example, the ACI in UKESM1-0-LL (∼ 1.2 W mg −1 ) is higher than all the others and is about 3 times of that in GFDL-ESM4 (0.41 W mg −1 ).This demonstrates that differences in the aerosol forcing sensitivity across the ESMs are dominated by the differences in their individual representation of ACI.Chen et al. ( 2014) also suggested that ACI is the main contribution to the aerosol radiative forcing uncertainty, and the response of marine clouds to aerosol changes is paramount.The intra-model variations in the ACI are also larger than that for the ARI.That is because the intra-model variations of the ACI are influenced by the effects of climate system internal variability on aerosolinduced cloud microphysics, with cloud radiative properties and cloud lifetimes varying regionally.The intra-model variations are also attributable to the differences in atmospheric circulation among different ensemble members, which may affect the geographical distributions of aerosols and clouds and lead to a different magnitude of interactions.
The quantitative analysis in Fig. 8 also indicates that ESMs with similar aerosol forcing sensitivity may have different contributions from ARI and ACI.The aerosol forcing sensitivity is similar in BCC-ESM1, EC-Earth3-AerChem, MPI-ESM-1-2-HAM, and NorESM2-LM, but the fractional contribution from the ACI is the largest in NorESM2-LM, and its ARI is less than half of that in BCC-ESM1.Generally, BCC-ESM1 has the largest fractional ARI contribution (34 %), whereas NorESM2-LM has the largest fraction of ACI contribution (86 %).

The proportions of cloud-amount and cloud-albedo terms
Our ACI metric includes several mechanisms by which aerosols can alter cloud properties.This includes the cloudalbedo effects (or Twomey effect), referred to as the radiative forcing part of ACI, and effects of aerosols on the macroscopic properties of clouds (for example, cloud extent and lifetime), referred to as the adjustment part of ACI.However, it is complicated to separate these two parts of ACI directly using available CMIP6 diagnostics, because the former is most accurately defined as a change in cloud albedo with all other cloud properties held constant (i.e., a change in cloud-droplet number concentration only), whilst the latter allows cloud properties to respond.
Figure 9 shows the evolution of global mean differences in total cloud amount ( CLT) and cloud-top effective droplet radius ( r eff ) between the historical and hist-piAer experiments.The CLT and r eff in UKESM1-0-LL are the largest and highly correlated with each other (with a correlation coefficient of −0.92 during the 1900 to 1990 period).For the other two ESMs for which r eff was archived, the correlation coefficient is −0.40 for MPI-ESM-1-2-HAM and insignificant for GFDL-ESM4 (−0.09).The CLT and r eff differences are smaller in MPI-ESM-1-2-HAM and GFDL-ESM4 than in UKESM1-0-LL, especially for the r eff differences.r eff is generally related to the cloud optical depth https://doi.org/10.5194/acp-21-18609-2021Atmos.Chem.Phys., 21, 18609-18627, 2021 and cloud water path, and CLT is related to adjustments in cloud cover due to ACI.Therefore, the radiative forcing part and adjustment part of ACI may be closely coupled in UKESM1-0-LL and are hard to separate statistically.The strong correlation between cloud amount and r eff response in UKESM1-0-LL indicates that this model is sensitive to aerosol-cloud interactions, which likely contributes to it having the strongest aerosol forcing sensitivity and intra-model spread of all the CMIP6 models (Fig. 5c).MPI-ESM-1-2-HAM and UKESM1-0-LL have similar ensemble mean PHC biases and close sulfate burden, but the aerosol forcing sensitivity difference in UKESM1-0-LL is almost twice of that in MPI-ESM-1-2-HAM (Fig. 5).That is, the overestimated sulfate burden dominates the PHC biases, but the ACI sensitivity may partly affect the amplitude and uncertainty ranges of PHC biases.Despite of the closely coupled radiative forcing part and adjustment part of ACI in UKESM1-0-LL, it is still possible to split the ACI into a part that is correlated with cloudamount differences and a residual term.This can be done statistically by regressing out the approximate linear dependence of the differences between historical and hist-piAer simulations of the cloudy part of OSR (OSRcld_p) on cloud fraction in each ESM (parameter A in Fig. 7g-i).We call the degree of linear correlation of OSRcld_p with CLT the "cloud-amount term", and the residual will be referred to as the "cloud-albedo term".However, we reiterate that the so-called "cloud-amount term" may also include changes in the reflectivity of clouds if these are correlated with changes in cloud amount.Similarly, the cloud-albedo term will contain any sources of cloud-amount changes which have not been removed by linearly regressing OSRcld_p against cloud amount.As such, we do not intend for this nomenclature to indicate a precise separation of the radiative forcing part and adjustment part of ACI.Our decomposition allows first-order assessment of these terms from historical simulations without the need for extra simulations or calls and also allows estimates from observations and intermodel comparisons.
As described in the Sect.2.3 and the Appendix, the cloud-amount term is sensitive to two parameters: the cloudamount response (parameter N in Fig. 7d-f) and the sensitivity of OSR reflected from clouds to cloud-amount changes (parameter A, Fig. 7g-i).As shown in Fig. 8, UKESM1-0-LL has the largest contribution of the cloud-amount term to aerosol forcing sensitivity (62 %, 0.91 W mg −1 ); the cloud-amount term is the smallest in GFDL-ESM4 (∼ 0.18 W mg −1 ).The cloud-albedo term is defined to be linearly independent of cloud-amount changes (adjustments).For the CMIP6 ESMs, it can only be estimated as the residual after subtracting the cloud-amount term from the ACI.The cloud-albedo term is similar in BCC-ESM1, MPI-ESM-1-2-HAM, and NorESM2-LM.The inter-model variation for the cloud-amount term is about twice of that for the cloud-albedo term (0.29 W mg −1 vs. 0.16 W mg −1 ).That is, the variations of cloud-amount term are the major source of inter-model ACI (and the aerosol forcing sensitivity) differences between ESMs.Therefore, difference in the cloud-amount terms, across the ESMs, dominates the uncertainties in the aerosol forcing sensitivity.
Note that our definitions do not correspond to the effects measured by using multiple calls to the radiation scheme of a model, with and without aerosols, which measure instantaneous radiative effects; multiple calls give a measure of the fast response of clouds to aerosol perturbations in a fixed thermodynamic and dynamical background, allowing for a clear separation between ACI and rapid adjustments (e.g., Bellouin et al., 2013).This differs from aerosol forcing diagnosed by differencing climate projections with different aerosol forcings, which include the slow effects of other feedbacks.For example, differences in climate forcings can lead to different SST patterns, which in turn alter the location and characteristics of clouds.Despite these differences, an advantage of our classification is that it provides a possible method for model evaluation since the variables used are also, in principle, available from the observations.

Conclusions
This study focuses on the reproduction of historical surface air temperature anomalies in six CMIP6 ESMs.The ESMs systematically underestimate TAS anomalies relative to 1850 to 1900 in the NH midlatitudes, especially from 1960 to 1990, the pothole cooling (PHC) period.Previous studies suggested that aerosol cooling is too strong in many CMIP6 models.Our study more specifically found that the PHC is concurrent in time and space with anthropogenic SO 2 emissions, which rapidly increase in the PHC period in NH.Models with larger aerosol burdens have larger PHC biases.The primary role of aerosol emissions in these biases is further supported by the differences between ESMs and the lowercomplexity models with prescribed aerosol.
Differences between historical simulations and simulations with aerosol emissions fixed at their pre-industrial levels (hist-piAer) are used to isolate the impacts of industrial aerosol emission.We propose that the overestimated aerosol concentrations in the ESMs are responsible for the spurious drop in TAS in the mid-20th century, rather than a high sensitivity of the models to aerosol forcing.Although the aerosol forcing sensitivity differences in ESMs cannot explain the PHC biases, they are a good measurement of aerosol effects that can be used to explore structural differences between models.A simple metric is derived for determining the dominant contribution to the aerosol forcing sensitivity in any specific model: ARI or ACI.The ACI accounts for more than 64 % of the aerosol forcing sensitivity in all analyzed ESMs.The considerable inter-model variation in the aerosol forcing sensitivity is mainly attributable to the uncertainty in the ACI within models.The ACI can be further decomposed into a cloud-amount term and a cloud-albedo term.The cloud-amount term is found to be the major source of inter-model diversity of ACI.Considering the crucial role of cloud properties on the inter-model spread in aerosol forcing sensitivity, the aerosol-cloud interactions should be a focus in the development of aerosol schemes within ESMs.

Appendix A: Decomposition of the aerosol-radiation interaction and aerosol-cloud interaction
Considering the dominant role of sulfate aerosol on anthropogenic aerosol forcing, we use the sulfate loading (loadSO4) as a proxy for all aerosol in our analysis.The aerosol forcing sensitivity (as determined by the difference between the historical and hist-piAer experiments) is estimated by the all-sky OSR differences per sulfate burden unit ( OSR/ loadSO4), and it is the combination of OSR differences in the clear-sky parts ( OSRclr_p/ loadSO4) and the cloudy parts ( OSRcld_p/ loadSO4): OSR/ loadSO4 = OSRclr_p/ loadSO4 + OSRcld_p/ loadSO4. (A1) The OSRclr_p for a particular experiment can be calculated as where N is the parameter defined above.Therefore, the first term on the right-hand side of Eq. (A4), A•N, corresponds to the impact of cloud-amount changes on the cloud radiation; and the cloud-albedo term can be obtained as a residual after subtracting A • N from OSRcld_p/ loadSO4, thereby eliminating any linear dependence of the cloudy-sky shortwave flux response on cloud-amount changes.
As with the clear-sky decomposition, residual_cld is a possible non-linear term and is assumed to be small.This term cannot in fact be distinguished from the cloud-albedo term in this analysis: we must therefore accept that cloudalbedo changes could be accompanied by non-linear changes in macroscopic cloud properties (in this framework).
The total aerosol forcing sensitivity can be measured by substituting the derived values of OSR/ loadSO4 from both the clear sky (Eq.A3) and cloudy (Eq.A4) parts back into Eq.(A1):

Figure 1 .
Figure 1.(a) Historical near-global mean (60 • S to 65 • N) surface air temperature (TAS) anomalies relative to 1850-1900 mean from HadCRUT5 (thick black line), the ensemble mean for each ESM (solid color lines), and multi-model mean (MMM, dashed black line).Panel (b) is the same as panel (a) but for the lower-complexity models.Units: • C. Value in bracket is the number of available members for each model.

Figure 2 .
Figure 2. Time-latitude cross section for annual-mean TAS anomalies (shaded) from (a) HadCRUT5, the ensemble mean for each ESM (b, d, f, h, j, and l), and the corresponding lower-complexity model (c, e, g, i, k, and m).The anomalies are related to the 1850-1900 mean.Units: • C. Note that the color scale intervals in the positive and negative directions are 0.2 and −0.1 • C, respectively.Line contours range from 20 to 40 ng m −2 s −1 with an interval of 10 ng m −2 s −1 showing the zonal mean anthropogenic surface SO 2 emission provided by CMIP6.

Figure 3 .
Figure 3.The TAS anomalies during the pothole period (1960-1990) from (a) HadCRUT5 and (b-g) the ensemble mean for each ESM.The anomalies are relative to the 1850-1900 mean.Units: • C.

Figure 5 .
Figure 5. Scatter plots of 1900-1990 yearly sulfate loading differences between the historical and hist-piAer simulations (x axis) versus (a) TAS differences and (b) OSR (y axis).Results are from the ensemble mean for each ESM.The captions are the linear fitting equations.Panel (c) shows the TAS response (x axis) and aerosol forcing sensitivity (y axis) which is equal to slope of linear fitting for each ESM (markers), as well as the corresponding intra-model spread (arrows).

Figure 6 .
Figure 6.Pothole cooling (PHC) bias in ESMs ( • C) versus (a) the aerosol forcing sensitivity (W mg −1 ) and (b) sulfate loading differences (mg m −2 ) during the PHC period.The arrows show the uncertainty ranges among the members in each ESM.

Figure 7 .
Figure 7. Annual mean differences between the historical and hist-piAer simulations in the ESM members during the 1900-1990 period for (a-c) sulfate loading (mg m −2 ) versus clear-sky OSR (OSRclr, W m −2 ), (d-f) sulfate loading versus total cloud fraction (%), and (g-i) total cloud fraction versus OSR in cloudy parts (W m −2 ).Slopes of the linear fitting equations from the top row to the bottom row refer to the parameters M, N, and A, respectively.

Figure 8 .
Figure 8.Total aerosol forcing sensitivity from each member in ESMs.The number marked on the top is the total aerosol forcing sensitivity.Partitions of aerosol-radiation interaction term, cloudalbedo term, and cloud-amount term are marked in the corresponding color bars.Unit: W mg −1 .Where multiple realizations are available for a model, a bar is shown for each member.

Figure 9 .
Figure 9. (a) Evolutions of global mean cloud-amount differences between the historical and hist-piAer simulations in ensemble mean for each ESM (units: %).Panel (b) is the same as panel (a) but for cloud-top effective droplet radius (r eff , µm).The r eff data are only available for GFDL-ESM4, MPI-ESM-1-2-HAM, and UKESM1-0-LL.
OSRclr_p = (1 − CLT/100.)• OSRclr, (A2)where CLT is the total cloud amount (unit: %), and OS-Rclr is the OSR assuming all clear sky (unit: W m −2 ).The cloud-amount changes ( CLT) will modify the proportion of clear sky and then affect the OSR changes attributed to the clear-sky part by covering or uncovering aerosols in clear sky.Therefore, based on Eq. (A2), OSRclr_p/ loadSO4 can be decomposed into the OS-Rclr response ( OSRclr/ loadSO4) and CLT response ( CLT/ loadSO4):OSRclr_p/ loadSO4 = (1 − CLT_hist/100.)• OSRclr/ loadSO4 − OSRclr_hist/100 • CLT/ loadSO4 + residual_clrp = (1 − CLT_hist/100.)• M − OSRclr_hist/100 • N + residual_clr, (A3)where CLT_hist and OSRclr_hist are the mean CLT and OS-Rclr in the historical experiment.Residual_clr is the residual term that is non-linear in OSRclr and CLT.The parameter M = OSRclr/ loadSO4 is related to the strength of the aerosol-radiation interaction and can be estimated by linear fitting of OSRclr on loadSO4.The parameter N = CLT/ loadSO4 is related to CLT response and estimated by linear fitting of CLT on loadSO4.Therefore, the first term on the right-hand side of Eq. (A3), (1 − CLT_hist/100.)• M, corresponds to the aerosol radiative effect; the second term, −OSRclr_hist/100 • N, corresponds to the impact of changes in the clear-sky area.The OSRcld_p is the cloudy part of OSR, accounting for the difference between OSR and OSRclr_p.The cloudy part of the OSR differences ( OSRcld_p) can be generally estimated as OSRcld_p = A • CLT + cloud-albedo relative changes + residual_cld, where the parameter A = (OSR − OSRclr_p)/ CLT is the sensitivity of the shortwave flux reflected by clouds to changes in cloud amount.The parameter A depends on the baseline cloud albedo (radiative flux per cloud-amount unit) and can be estimated by linear fitting of OSRcld_p on CLT.Hence, OSRcld_p/ loadSO4 = A • CLT/ loadSO4 + cloud-albedo term + residual_cld, = A • N + cloud-albedo term + residual_cld, (A4)

Table 1 .
Information of the ESMs with an interactive chemistry and aerosol scheme, as well as the corresponding lower-complexity models.

Table 2 .
Variables used in this study. .

Table 3 .
Biases in near-global averaged TAS anomalies relative to 1850-1900 from the ensemble mean and standard deviation (SD) for each ESM and the corresponding lower-complexity model in the pre-PHCand the PHC period.Units: • C. Biases are relative to the HadCRUT5.The multi-model mean (MMM) and the SD of the ESMs are shown in the bottom row.The aerosol effective forcing (ERFaer) is also shown for each ESM except for MPI-ESM-1-2-HAM.ERFaer result for MPI-ESM-1-2-HAM are not available.