Change from aerosol-driven to cloud-feedback-driven trend in short-wave radiative flux over the North Atlantic

Aerosol radiative forcing and cloud–climate feedbacks each have a large effect on climate, mainly through modification of solar short-wave radiative fluxes. Here we determine what causes the long-term trends in the upwelling short-wave (SW) top-of-the-atmosphere (TOA) fluxes ($F_{\mathrm{SW}\uparrow}$) over the North Atlantic region. Coupled atmosphere–ocean simulations from the UK Earth System Model (UKESM1) and the Hadley Centre Global Environment Model (HadGEM3-GC3.1) show a positive $F_{\mathrm{SW}\uparrow}$ trend between 1850 and 1970 (increasing SW reflection) and a negative trend between 1970 and 2014. We find that the 1850–1970 positive $F_{\mathrm{SW}\uparrow}$ trend is mainly driven by an increase in cloud droplet number concentration due to increases in aerosol, while the 1970–2014 trend is mainly driven by a decrease in cloud fraction, which we attribute mainly to cloud feedbacks caused by greenhouse gas-induced warming. In the 1850


Introduction
Many changes have occurred over the historical period, 1850-2014: the industrial revolution, during which North America and Europe in particular emitted increasing amounts of greenhouse gases, aerosols, and aerosol precursors; the introduction of clean air acts by North America and Europe starting in the 1950s that led to reduced aerosol emissions from those regions; the industrialization of China and India leading to increased emissions of greenhouse gases and aerosols; and the general continued rise in the rate of global greenhouse gas emissions. Climate change over the North Atlantic (NA) on decadal and longer timescales is influenced by many different factors, with the most significant likely being greenhouse gas forcing, aerosol forcing, mid-latitude cloud feedbacks mediated by temperature changes, natural variability and the Atlantic Meridional Overturning Circulation (AMOC). Many of the processes involve changes in the upwelling short-wave (SW) radiative flux at the top of the atmosphere ($F_{\mathrm{SW}\uparrow}$); therefore, $F_{\mathrm{SW}\uparrow}$ is a key property of the Earth system when considering climate variability.
Fairly long-term observational records of $F_{\mathrm{SW}\uparrow}$ exist for the recent part of the historical period (e.g. from 1985 onwards; Allan et al., 2014a; Liu et al., 2015, 2017a) that may be useful for evaluating models and attributing changes in climate to various causes. However, to understand what model evaluation using such long-term datasets is telling us about the causes of regional climate change, it is necessary to understand the driving factors of long-term changes in $F_{\mathrm{SW}\uparrow}$. In this study we use the UK Met Office climate models to better understand the underlying processes and what the observed long-term trends in $F_{\mathrm{SW}\uparrow}$ might be telling us about model performance and causes of regional climate change.
We focus on the NA region, because it plays a major role in several aspects of the Earth system. The NA ocean sequesters large amounts of carbon and heat from the atmosphere and therefore helps to regulate the global climate (Buckley and Marshall, 2016). Processes in the NA are thought to help determine the speed of the AMOC (Buckley and Marshall, 2016), which transports a significant amount of heat northwards, representing ∼ 25 % of the total (atmospheric plus oceanic) global northward heat transport at 24-26° N (Srokosz et al., 2012). The AMOC transports a large amount of energy from the Southern Hemisphere to the Northern Hemisphere, something that is not true for the equivalent circulations in the Pacific (Buckley and Marshall, 2016). This cross-equatorial oceanic heat flow is important, because it leads to a compensating southward cross-Equator heat flow within the atmosphere, and this in turn causes the Intertropical Convergence Zone (ITCZ) to be positioned north of the Equator. Changes in the AMOC can therefore lead to changes in the ITCZ position, which could bring great disruption to the climate of not only the Atlantic region but also the climates of the global tropics, subtropics and potentially the midlatitudes via changes in precipitation and changes to the Indian and Asian monsoons (Buckley and Marshall, 2016; Chiang and Friedman, 2012; Srokosz et al., 2012).
The NA is surrounded by North and Central America, Europe, and North Africa, which are large regions of high population density. This means that (1) there is a great deal of influence from short-lived anthropogenic species such as aerosols over the NA and (2) that changes in the NA climate system can have large impacts on human society. Sea surface temperature (SST) variability in the NA has been associated with impacts on important phenomena such as tropical storm and hurricane activity (Zhang and Delworth, 2006;Smith et al., 2010;Dunstone et al., 2013); anomalies in rainfall in Europe and North America (Sutton and Hodson, 2005;Sutton and Dong, 2012); African Sahel and Amazonian droughts (Hoerling et al., 2006;Knight et al., 2006;Ackerley et al., 2011); Greenland ice-sheet melt rates (Holland et al., 2008;Hanna et al., 2012); sea-level anomalies (McCarthy et al., 2015); and the strength of the mid-latitude jet (Woollings et al., 2015). Robson et al. (2018) provides a review of changes in the North Atlantic climate system, with a focus on more recent changes.
Aerosol effective radiative forcing ($F^{\mathrm{eff}}_{\mathrm{aer}}$) is a key driver of long-term changes in $F_{\mathrm{SW}\uparrow}$ over the NA and globally. For the calculation of $F^{\mathrm{eff}}_{\mathrm{aer}}$, all physical variables are allowed to respond to perturbations except for those concerning the ocean and sea ice (e.g. see Myhre et al., 2013), meaning that surface temperatures need to be constant. As discussed further below, this makes $F^{\mathrm{eff}}_{\mathrm{aer}}$ difficult to calculate using time series from observations or coupled climate models since radiative fluxes respond to changes in temperature, for example, due to cloud feedbacks. $F_{\mathrm{SW}\uparrow}$ is a focus for aerosol forcing, because aerosol forcing generally occurs through the effect of aerosols on short-wave radiative fluxes rather than long-wave fluxes; for example, O'Connor et al. (2021) estimate a global SW $F^{\mathrm{eff}}_{\mathrm{aer}}$ of −1.26 W m$^{-2}$ and a long-wave $F^{\mathrm{eff}}_{\mathrm{aer}}$ of 0.17 W m$^{-2}$ for UKESM1 (UK Earth System Model v1). Henceforth in this paper, we only consider short-wave fluxes, forcings and feedbacks.
$F^{\mathrm{eff}}_{\mathrm{aer}}$ can be separated into a component due to aerosol-radiation interaction (ARI) that occurs in cloud-free air ($F^{\mathrm{eff}}_{\mathrm{ari}}$; sometimes also known as the direct effect) and a component due to aerosol-cloud interaction (ACI, or indirect effects), designated as $F^{\mathrm{eff}}_{\mathrm{aci}}$. The ACI component of $F^{\mathrm{eff}}_{\mathrm{aer}}$ can also be broken down into two further components. The first is due to a change in cloud droplet concentration ($N_d$) at constant liquid water content (LWC) and constant cloud fraction ($f_c$), which causes a change in the cloud droplet effective radius ($r_e$) and hence cloud albedo. Here we designate this component $F^{\mathrm{eff}}_{\mathrm{aci},N_d}$; it is often termed the instantaneous radiative forcing or the Twomey effect (Twomey, 1977). The second is due to rapid adjustments of LWC (or its vertical integral, the liquid water path (LWP), $L$) and/or adjustments in $f_c$ that occur in response to the initial decrease in droplet size associated with the $N_d$ increase. Note that we define $L$ to be the in-cloud value, not the mean of a partly cloudy sky. We designate the forcings from these adjustments as $F^{\mathrm{eff}}_{\mathrm{aci},L}$ and $F^{\mathrm{eff}}_{\mathrm{aci},f_c}$ and note that $F^{\mathrm{eff}}_{\mathrm{aci}} \approx F^{\mathrm{eff}}_{\mathrm{aci},N_d} + F^{\mathrm{eff}}_{\mathrm{aci},L} + F^{\mathrm{eff}}_{\mathrm{aci},f_c}$. The mechanisms that cause these adjustments involve several microphysical and thermodynamical processes (Albrecht, 1989; Stevens et al., 1998; Ackerman et al., 2004; Bretherton et al., 2007; Hill et al., 2009; Berner et al., 2013; Feingold et al., 2015). For regions of the NA north of 18° N, UKESM1 suggests that $F^{\mathrm{eff}}_{\mathrm{aci}}$ greatly dominates over $F^{\mathrm{eff}}_{\mathrm{ari}}$. Decomposing $F^{\mathrm{eff}}_{\mathrm{aci}}$ further, Grosvenor and Carslaw (2020) found that $F^{\mathrm{eff}}_{\mathrm{aci},N_d}$ and $F^{\mathrm{eff}}_{\mathrm{aci},L}$ dominate the $F^{\mathrm{eff}}_{\mathrm{aci}}$ forcing in the northern regions of the NA (north of around 40° N), whereas $F^{\mathrm{eff}}_{\mathrm{aci},f_c}$ dominates further south.
Models show that aerosol forcing has influenced the climate variability of the NA. Booth et al. (2012) showed that surface aerosol radiative forcing was the dominant driver of decadal changes in sea surface temperatures (SSTs) for the atmosphere-ocean coupled global circulation model (the UK Met Office HadGEM2-ES model) that was used in the Fifth Coupled Model Intercomparison Project (CMIP5). Menary et al. (2020) showed that for the CMIP6 models aerosols acted to speed up the AMOC during the historical period, whereas greenhouse gases slowed it down. Climate models also predict that during the 21st century a region in the northern NA will experience less warming under the influence of greenhouse gases than the rest of the globe (termed the NA "warming hole"), related to the slowing down of the AMOC (Manabe and Stouffer, 1993; Robson et al., 2016; Chemke et al., 2020). Over the historical period, aerosols have likely delayed the formation of this warming hole by speeding up the AMOC (Dagan et al., 2020).
Despite its importance, aerosol forcing remains the most uncertain of the forcings. It would be desirable to be able to use long-term trends in observable quantities like $F_{\mathrm{SW}\uparrow}$ to determine aerosol forcing from the observations in order to constrain models. Long-term records of $F_{\mathrm{SW}\uparrow}$ (e.g. the DEEP-C dataset for 1985-2019; Allan et al., 2014a; Liu et al., 2015, 2017a) have the potential to evaluate some aspects of model performance in terms of aerosol forcing. However, in order to understand what model evaluation using such a dataset is telling us about model performance, it is necessary to understand what has been driving long-term changes in $F_{\mathrm{SW}\uparrow}$. There has been some previous work towards using long time records to estimate aerosol forcing and evaluate models, although the feasibility of this approach remains in question. Cherian et al. (2014) used observations of surface SW flux from the GEBA (Global Energy Balance Archive) network over Europe for the period 1990-2005 to attempt a constraint on the global aerosol forcings predicted by the CMIP5 climate models. At the locations of the GEBA stations, the effective global aerosol forcings across the different models correlated with the change in surface SW model flux. The observations of the latter were then used to infer the most realistic range of effective global aerosol forcing. A potential issue with this approach is that it relies on the accuracy of the relationship between the two variables across the different models. For example, the relationship is likely affected by the balance of forcings and feedbacks within the different models, which are highly uncertain and may vary depending on the time period chosen. Kramer et al. (2021) used satellite observations to infer the total instantaneous global radiative forcing of the climate for the 2003-2018 period. This included the effect of greenhouse gases and a variety of other forcings, but for aerosol forcing only the ARI component was included and not ACI. Using MODIS time series from 2003 to 2017 for oceanic regions of the NA (off the east coast of the US and the west coast of Portugal) and off the east coast of China, Bai et al. (2020) found no relationship between long-term changes in aerosol and changes in LWP, which may indicate a forcing from aerosols via cloud adjustments that is too small to be identified over the relatively short time period given the large inter-annual variability in LWP.
One of the main complications with using long-term records to estimate aerosol forcing is that there are several other drivers of changes in clouds over long timescales that we attempt to characterize in this study. One such driver is climate change, i.e. changes in temperature and sea-ice cover, which causes cloud-climate feedbacks. For example, over recent decades, warming due to greenhouse gas emissions has increased rapidly, but aerosol emission rates have also varied over the historical record which will affect temperatures too. The resulting changes in clouds from cloud-climate feedbacks must be taken into consideration when attempting to estimate aerosol forcing using long-term records.
Cloud-climate feedbacks are very important in the NA region. Norris et al. (2016) showed using satellite observations that cloud fraction has changed substantially between 1983 and 2009 and that these changes are well predicted by models. The cloud feedback operating in this region is thought to be the mid-latitude cloud feedback, whereby warming can cause an expansion of the Hadley cell and a poleward shift of the storm tracks (Held and Hou, 1980; Lu et al., 2007; Seidel et al., 2008) that can reduce mid-latitude cloudiness (Norris et al., 2016), leading to an increase in short-wave radiation reaching the surface. This amplifies the temperature change and hence is a positive feedback. Satellite observations have been used to evaluate global model cloud feedbacks, but this approach may lead to an estimate of cloud feedback that is too negative (Armour et al., 2013; Zhou et al., 2016; Andrews et al., 2018) due to the specific pattern of SSTs that occurred over this period, namely a cooling over subtropical stratocumulus regions despite the overall global warming. This caused a local increase in cloud coverage over subtropical stratocumulus regions that acted to increase $F_{\mathrm{SW}\uparrow}$, thus making the cloud feedback more negative. Care is therefore needed when using observations to infer cloud feedbacks.
In this study we use the UK Met Office climate models to better understand the underlying processes and what the observed trends in $F_{\mathrm{SW}\uparrow}$ are telling us about model performance and the causes of climate change over the NA. There has been some work with related aims before. For example, Wang et al. (2021) showed that, across the CMIP6 models, mean cloud feedback strength and an estimate of mean aerosol forcing were negatively correlated over the 1950-2000 period such that models with a stronger negative aerosol forcing tended to have a more positive cloud feedback. This was particularly true for models that were more consistent with the observed historical temperature change. For those models, the equilibrium climate sensitivity was also negatively correlated with the aerosol forcing. These results suggest some degree of model tuning between aerosol forcing (causing a cooling) and cloud feedback (causing a warming) to allow for the recreation of observed temperatures. Changes in radiative flux relative to preindustrial times for the 1950-2000 period in the models with the strongest cloud feedback parameters were caused almost entirely by aerosol forcing rather than temperature-induced feedbacks; the models with small cloud feedback parameters showed very little change in radiative flux for this period.
We go further than the above work since we focus on simulations from one modelling centre and break down the underlying causes of the long-term short-wave radiative changes in that model in terms of clear-sky effects, cloud variables and emission types. We separate the aerosol forcing and cloud-climate feedback effects on short-wave radiative changes using different techniques to those used previously in order to estimate the aerosol forcing more precisely. Finally, we use the results to draw conclusions about the feasibility of using long-term observations to quantify aerosol forcing and to evaluate model performance, and we compare our model results to long-term observations.

The UKESM1 and HadGEM3-GC3.1 climate models
We examine output from the atmosphere-ocean-coupled UKESM1 (UK Earth System Model; Sellar et al., 2019) and the HadGEM3-GC3.1 (Hadley Centre Global Environment Model 3 Global Coupled configuration version 3.1; here shortened to HadGEM; Williams et al., 2017; Kuhlbrodt et al., 2018) models, which were submitted as part of the Sixth Coupled Model Intercomparison Project (CMIP6; Eyring et al., 2016). UKESM1 is based on the atmosphere-ocean-coupled HadGEM physical climate model but in addition couples several Earth system processes. These additional components include the MEDUSA ocean biogeochemistry model (Yool et al., 2013), the TRIFFID dynamic vegetation model (Cox, 2001; Clark et al., 2011; Wiltshire et al., 2020; Sellar et al., 2019) and the stratospheric-tropospheric version of the United Kingdom Chemistry and Aerosol (UKCA) model of atmospheric composition (Archibald et al., 2020). This version of UKCA allows for a more complete description of atmospheric chemistry compared to HadGEM. For example, the latter uses an offline climatology for oxidants, whereas in UKESM1 oxidants are simulated. An N96 resolution horizontal grid is used in both models, which is 1.875° × 1.25° (208 × 139 km) at the Equator. Eighty-five vertical levels are used between the surface and 85 km altitude with a stretched grid such that the vertical resolution is 13 m near the surface and around 150-200 m at the top of the boundary layer.

Model data
All CMIP6 model data originate from the Earth System Grid Federation (ESGF) archive (https://esgf-node.llnl.gov/search/cmip6/, last access: 7 June 2023). Monthly averaged model data are used since higher time resolution data are not available for most variables. We average the monthly data to annual averages for time series but use the monthly data when calculating SW fluxes (Sect. 3.2).

The CMIP6 UKESM1 and HadGEM coupled atmosphere-ocean ensembles
We use output from the 16-member UKESM1 and the 4-member HadGEM coupled atmosphere-ocean historical ensemble runs that were performed for CMIP6 (Williams et al., 2017; Kuhlbrodt et al., 2018). These ran from 1850 to 2014 using greenhouse gas (GHG), aerosol, natural (e.g. volcanic) and other emissions that were designed to represent the real emissions over this period. We note that there are likely to be uncertainties in these emissions that will lead to model errors. The ensembles were designed to capture a range of possible ocean and atmospheric states in order to sample the natural multi-decadal variability.

The AerChemMIP and DAMIP coupled atmosphere-ocean experiments
We also make use of the DAMIP (Detection and Attribution Model Intercomparison Project; Gillett et al., 2016) and AerChemMIP experiments. The AerChemMIP experiments are based on a three-member subset of the 16-member UKESM1 ensemble described in Sect. 2.2.1, which we refer to as AerChemMIP-all-emissions. The "piAer" experiments used historical emissions for all emission types except for aerosols and aerosol precursors, for which preindustrial emissions were used. We assume that these runs are equivalent to greenhouse gas-only runs (similar to DAMIP-Hist-GHG) since the DAMIP results (see Appendix A and B) show that aerosols and greenhouse gas emissions are the main drivers of long-term trends for the North Atlantic region. For this reason, we refer to the piAer experiment as AerChemMIP-GHG-only-proxy. We estimate the effects of aerosol emissions alone by subtracting the AerChemMIP-piAer time series from the all-emissions UKESM1 runs for the three-member subset of ensemble members used for the AerChemMIP experiments. We refer to this as AerChemMIP-aerosol-only-proxy. The accuracy of this approach is tested using the DAMIP results and is shown in Appendix B. Box 1 of the schematic in Fig. 1 depicts the above methodology for the AerChemMIP experiments. In the main part of the paper, we focus on the UKESM1 results derived using the AerChemMIP experiments and mostly show the DAMIP/HadGEM results in Appendix A.
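The differencing used to build the aerosol-only proxy can be illustrated with a minimal Python sketch. The function names and the toy numbers are hypothetical and are not taken from the paper; the sketch assumes each ensemble member is a list of annual means covering the same years.

```python
def ensemble_mean(members):
    """Average several equally long annual time series year by year."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


def aerosol_only_proxy(all_emissions, pi_aer):
    """Subtract the piAer (GHG-only proxy) series from the all-emissions series."""
    if len(all_emissions) != len(pi_aer):
        raise ValueError("time series must cover the same years")
    return [a - g for a, g in zip(all_emissions, pi_aer)]


# Toy annual-mean upwelling SW flux values (W m-2); illustrative only.
all_emis = ensemble_mean([[100.0, 102.0], [102.0, 104.0]])
pi_aer = ensemble_mean([[99.0, 100.0], [101.0, 100.0]])
proxy = aerosol_only_proxy(all_emis, pi_aer)
```

The subtraction treats the aerosol and greenhouse gas responses as additive, which is the assumption the text says is checked against DAMIP in Appendix B.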

The UKESM1 atmosphere-only run
We also examine data from the atmosphere-only UKESM1 runs performed as part of CMIP6, which have the same historical forcings as in the coupled CMIP6 runs but with sea-surface temperatures (SSTs) and sea-ice concentrations prescribed from observations. Examination of these simulations helps to quantify how deviations of the coupled model SSTs and sea ice from the observed state affect clouds and short-wave fluxes. It also allows for a model assessment of the atmospheric components against observations when given the correct ocean conditions. There is currently only one atmosphere-only run available, which prevents examination of the effects of atmospheric variability via an ensemble; for example, despite SSTs being fixed the atmosphere can exhibit different modes of variability that may not match the actual modes that occurred in reality, and so some differences between the atmosphere-only run and reality might be expected.

Surface albedo calculation and sea-ice screening
The surface albedo ($A_{\mathrm{surf}}$) is required for the offline radiative calculations described in Sect. 3.2 and for the screening of high-sea-ice regions. It is calculated as the ratio of the monthly mean upwelling to downwelling clear-sky SW surface fluxes:
$$A_{\mathrm{surf}} = \frac{F^{\text{clear-sky}}_{\mathrm{SW}\uparrow,\mathrm{surf}}}{F^{\text{clear-sky}}_{\mathrm{SW}\downarrow,\mathrm{surf}}} \qquad (1)$$
Grid boxes within the NA region were excluded where substantial sea ice was formed in any of the simulations such that the same grid boxes were excluded for all runs; the criterion for exclusion was the annual-mean surface albedo exceeding 20 % at some point during the historical time series.
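A minimal Python sketch of the albedo calculation and the sea-ice screening is given below. It assumes the albedo is the ratio of upwelling to downwelling clear-sky surface SW flux and that a grid box is dropped if its annual-mean albedo exceeds the threshold in any year of any run; the function names and the handling of zero downwelling flux are illustrative choices, not the authors' code.

```python
def surface_albedo(sw_up_clear, sw_down_clear):
    """Ratio of upwelling to downwelling clear-sky surface SW flux."""
    if sw_down_clear <= 0.0:
        return None  # no sunlight (e.g. polar night): albedo undefined
    return sw_up_clear / sw_down_clear


def exclude_grid_box(annual_albedo_by_run, threshold=0.20):
    """True if the annual-mean albedo exceeds the threshold in any year of any run."""
    return any(a > threshold
               for series in annual_albedo_by_run
               for a in series)


# Two hypothetical runs for one grid box; the second run develops sea ice.
open_ocean = [[0.06, 0.07], [0.06, 0.07]]
icy_in_one_run = [[0.06, 0.07], [0.06, 0.25]]
```

Applying the test across all runs at once is what makes the excluded set of grid boxes identical for every simulation.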

Uncertainties in trends
Temporal trends are calculated using a linear least squares method, and the errors in the trends are calculated following Santer et al. (2000), where an effective sample size is used that takes into account temporal autocorrelation using the lag-1 autocorrelation coefficient.
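The trend and uncertainty calculation can be sketched as follows, assuming the commonly used form of the Santer et al. (2000) adjustment in which the effective sample size is n_eff = n (1 − r1) / (1 + r1), with r1 the lag-1 autocorrelation of the regression residuals. The function name and return values are illustrative.

```python
import math


def trend_with_uncertainty(t, y):
    """Least-squares slope, its adjusted standard error, and n_eff."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

    # Lag-1 autocorrelation of the residuals.
    r_mean = sum(resid) / n
    num = sum((resid[i] - r_mean) * (resid[i + 1] - r_mean) for i in range(n - 1))
    den = sum((e - r_mean) ** 2 for e in resid)
    r1 = num / den if den > 0.0 else 0.0

    # Effective sample size replaces n in the residual variance.
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for a trend error")
    s_e2 = sum(e ** 2 for e in resid) / (n_eff - 2.0)
    return slope, math.sqrt(s_e2 / sxx), n_eff


# Toy series with persistent (positively autocorrelated) departures from a trend.
years = list(range(12))
noise = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
values = [0.5 * yr + e for yr, e in zip(years, noise)]
slope, slope_err, n_eff = trend_with_uncertainty(years, values)
```

With positively autocorrelated residuals n_eff falls below n, so the standard error of the slope is wider than the unadjusted least-squares estimate.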

MODIS cloud droplet number concentration observations
We use cloud droplet concentration ($N_d$) as an indicator of aerosol-driven changes in clouds, because it more directly represents the first step in the chain of processes by which aerosols affect clouds. $N_d$ gives some indication of the number of cloud condensation nuclei (CCN; a subset of the whole aerosol population) that were available to produce clouds but is also affected by other factors such as updraught speed, droplet collision coalescence, droplet scavenging by rain, cloud evaporation, etc. We evaluate model $N_d$ and its trends against MODIS $N_d$ observations. We use a 1° × 1° resolution data set calculated from 1 km MODIS retrievals of cloud optical depth ($\tau_c$) and effective radius ($r_e$). Two-dimensional fields of $N_d$ are derived by the retrieval since it is assumed that $N_d$ is constant throughout the depth of the cloud, which has been shown to be a good approximation by aircraft observations of stratocumulus (Painemal and Zuidema, 2011). Details of the retrieval and dataset are given in Grosvenor and Carslaw (2020). For the model, two-dimensional $N_d$ fields were obtained from the monthly 3D $N_d$ fields by calculating a weighted vertical mean $N_d$, with the liquid water mixing ratio ($q_L$) on each level used for the weights. This ensures that the levels with the most $q_L$ contribute most to the average $N_d$, which is similar to what is assumed in the MODIS retrieval since most of the $r_e$ signal comes from near cloud top where $q_L$ is usually the largest, and the $N_d$ calculation is a strong function of $r_e$. It also reduces the weight of contributions from very thin clouds that would not be detected by MODIS. Only data points for which the mean cloud top height is below 3.2 km were included for the satellite $N_d$ calculation in order to help exclude satellite retrieval errors for deeper clouds (see Grosvenor et al., 2018).
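The liquid-water-weighted vertical mean used to collapse the model's 3D droplet concentration field to 2D can be sketched for a single column as below. The function name and the treatment of cloud-free columns are assumptions made for illustration.

```python
def weighted_column_nd(nd_levels, ql_levels):
    """Vertical mean droplet concentration weighted by liquid water mixing ratio."""
    total_weight = sum(ql_levels)
    if total_weight <= 0.0:
        return None  # no liquid water in the column
    return sum(n * q for n, q in zip(nd_levels, ql_levels)) / total_weight


# Hypothetical column: droplet concentration (cm-3) and liquid water (g kg-1)
# on three levels, with most liquid water near cloud top.
column_nd = weighted_column_nd([50.0, 100.0, 200.0], [0.0, 1.0, 3.0])
```

Levels with no liquid water receive zero weight, so they cannot dilute the column value, and thin layers with little liquid water contribute only weakly.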

Variables considered and assumptions for changes in short-wave flux
We attribute trends in $F_{\mathrm{SW}\uparrow}$ to changes in liquid clouds, clear-sky $F_{\mathrm{SW}\uparrow}$ and surface albedo. Changes in clear-sky $F_{\mathrm{SW}\uparrow}$ ($F^{\text{clear-sky}}_{\mathrm{SW}\uparrow}$) will include the effects of changes in aerosol in cloud-free air, changes in the surface albedo ($A_{\mathrm{surf}}$) and changes in trace gas concentrations. However, we do not attempt to separate these effects here. For changes in the all-sky (i.e. combined cloudy and clear regions) albedo, we consider the effect of changes in the three main variables that affect it, namely cloud fraction ($f_c$), cloud droplet number concentration ($N_d$) and in-cloud liquid water path ($L$), along with $F_{\mathrm{SW}\uparrow}$ from the clear-sky regions above clouds and also $A_{\mathrm{surf}}$ in cloudy-sky conditions. However, changes in the latter were found to have negligible impact for the region considered. $L$ is the LWP from the cloudy regions only. For $f_c$, we use the total cloud fraction since liquid-only cloud fractions aggregated over all heights (i.e. accounting for overlap assumptions) were not available. Occasionally, the all-sky liquid water path ($L_{\text{all-sky}}$) is also considered (i.e. including both the cloudy and clear-sky portions of model grid boxes or observed regions). To calculate $L$ from the $L_{\text{all-sky}}$ values provided by the models, we assume that $L = L_{\text{all-sky}}/f_c$ (e.g. as also in Seethala and Horváth, 2010); we use monthly values for this calculation.
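The conversion from all-sky to in-cloud liquid water path can be written as a short Python sketch. The minimum cloud fraction guard is a hypothetical safeguard against division by very small values; the paper does not state whether such a threshold was used.

```python
def in_cloud_lwp(lwp_all_sky, cloud_fraction, min_fraction=0.01):
    """In-cloud LWP from the grid-box mean, assuming L = L_all-sky / f_c."""
    if cloud_fraction < min_fraction:
        return None  # too little cloud for a meaningful in-cloud value
    return lwp_all_sky / cloud_fraction


# A grid-box mean of 40 g m-2 with half the box cloudy implies 80 g m-2 in cloud.
lwp_in_cloud = in_cloud_lwp(40.0, 0.5)
```

Because the ratio is non-linear in cloud fraction, applying it to monthly values and then averaging, as the text describes, will generally differ from dividing the annual means.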

Calculation of aerosol radiative forcing
The effective radiative forcings (ERFs) due to aerosol-cloud interactions ($F^{\mathrm{eff}}_{\mathrm{aci}}$) and aerosol-radiation interactions ($F^{\mathrm{eff}}_{\mathrm{ari}}$) are considered. The total aerosol ERF ($F^{\mathrm{eff}}_{\mathrm{aer}}$) is the sum of the two. In the coupled climate runs, SSTs vary over time, and so ERFs cannot be directly calculated from the change in $F_{\mathrm{SW}\uparrow}$ in aerosol-only emissions runs. Instead, the ERFs for the coupled climate runs (see Sect. 3.5) were estimated by scaling $F^{\mathrm{eff}}_{\mathrm{ari}}$ and $F^{\mathrm{eff}}_{\mathrm{aci}}$ from nudged simulations based on the ratio of the change in $N_d$ over time in the coupled models to the change in $N_d$ in the nudged runs (see Box 2 of Fig. 1; Appendix C gives more details of the calculations). The nudged model runs consist of a pair of atmosphere-only nudged UKESM1 simulations with prescribed time-varying SSTs, as presented in Grosvenor and Carslaw (2020); one simulation used preindustrial (PI) aerosol emissions and the other present-day (PD) emissions from 2009. The nudging (using 2009 reanalysis) was applied only to the winds above the boundary layer and kept the large-scale meteorology approximately the same in both simulations whilst allowing the local boundary layer and associated clouds to respond to the different aerosol loadings.
For each variable, trends have been fitted to the ensemble mean time series for the two periods and then multiplied by the duration of the time periods to give the total change in quantity $x$ (denoted $\Delta x$). These values and the associated uncertainties in the fitted trends are given in Table 1. For the pre-1970 period for UKESM1, $\Delta F_{\mathrm{SW}\uparrow}$ was 4.7 ± 0.98 W m$^{-2}$, and over the post-1970 period it was −6.0 ± 1.4 W m$^{-2}$; hence, the magnitude of the change in $F_{\mathrm{SW}\uparrow}$ is slightly larger for the second period. Short periods of enhanced $F_{\mathrm{SW}\uparrow}$ are evident, which reach close to or extend beyond the 2σ variation of the ensemble. These are due to volcanic eruptions. One example occurs in 1991 and is due to the Mount Pinatubo eruption.
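Two of the calculations described above lend themselves to a short Python sketch: scaling a forcing from the nudged runs by the ratio of droplet concentration changes, and converting a fitted trend into a total change over a period. Both functions are illustrative assumptions; in particular, the simple linear scaling shown here stands in for the procedure detailed in Appendix C, which is not reproduced in this section, and taking the duration as end year minus start year is also an assumption.

```python
def scale_erf(erf_nudged, delta_nd_coupled, delta_nd_nudged):
    """Scale a nudged-run ERF by the ratio of droplet concentration changes."""
    if delta_nd_nudged == 0.0:
        raise ValueError("nudged runs show no change in droplet concentration")
    return erf_nudged * delta_nd_coupled / delta_nd_nudged


def total_change(slope_per_year, start_year, end_year):
    """Total change over a period: fitted trend multiplied by its duration."""
    return slope_per_year * (end_year - start_year)


# Hypothetical numbers: a nudged-run forcing of -1.0 W m-2 for a 40 cm-3 change
# scales to -0.5 W m-2 if the coupled model changed by only 20 cm-3.
scaled = scale_erf(-1.0, 20.0, 40.0)
change = total_change(0.04, 1850, 1970)
```

The uncertainty quoted alongside each total change would scale by the same duration factor as the trend itself.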

North Atlantic time series for UKESM1
The maps of $\Delta F_{\mathrm{SW}\uparrow}$ in Fig. 3 show that the NA is one of the main oceanic regions that shows large changes in $F_{\mathrm{SW}\uparrow}$ over the chosen time periods, which justifies the choice of this region as the focus of this paper. The other oceanic regions that show large changes are the Barents Sea (north of Scandinavia), the Southern Ocean and the northern Pacific. The Barents Sea and Southern Ocean changes in $F_{\mathrm{SW}\uparrow}$ are likely to be related to sea-ice changes. The North American and western European continental regions also show large changes that are often larger than those over the ocean regions. For the pre-1970 period, the UKESM and HadGEM models are consistent in that larger changes in $F_{\mathrm{SW}\uparrow}$ occur in the western North Atlantic region than in the east, suggesting a connection with pollution outflow from North America. This is also true for the post-1970 period for the HadGEM model, but for the UKESM model there is a stronger change in the eastern part of the North Atlantic, suggesting that different processes may be occurring compared to pre-1970 or potentially more natural variability in the spatial patterns.
D. P. Grosvenor and K. S. Carslaw: Short-wave flux trends in the North Atlantic
We now discuss the potential drivers of changes in $F_{\mathrm{SW}\uparrow}$. Cloud fraction shows a small increase over the pre-1970 period ($\Delta f_c = (8.4 \pm 2.9) \times 10^{-3}$), whereas over the post-1970 period there is a distinct decrease ($\Delta f_c = (-33.9 \pm 3.7) \times 10^{-3}$). The start of the negative trend in cloud fraction occurs at around the same time as the start of the negative trend in $F_{\mathrm{SW}\uparrow}$ (1971). $N_d$ shows a large increase over the pre-1970 period ($\Delta N_d = 46.3 \pm 8.7$ cm$^{-3}$) and, similarly to $F_{\mathrm{SW}\uparrow}$ and $f_c$, decreases ($\Delta N_d = -20.2 \pm 6.4$ cm$^{-3}$) after around 1971. Aerosol optical depth at 550 nm ($\tau_a$, including dust) shows very similar trends to $N_d$ although it is more variable.
On the whole, the changes in variables and trends in the HadGEM model are very similar to those for UKESM1 although with slightly smaller magnitude changes in $F_{\mathrm{SW}\uparrow}$, $N_d$ and $L$ and larger magnitude changes in $\tau_a$ (see Table A1 and Appendix A for details on the HadGEM results). In addition, there is a notable difference in the magnitude of $F^{\text{clear-sky}}_{\mathrm{SW}\uparrow}$ with HadGEM being around 1 W m$^{-2}$ higher than UKESM1, which is consistent with the higher $\tau_a$ values. The reasons for this are left to other work to explore. Due to the similarity of the two models, we mostly focus on the UKESM1 model for this paper and show results from HadGEM in Appendix A.

Decomposing the F SW↑ trends in UKESM1 into contributions from individual variables
The above results show that the increase in $F_{\mathrm{SW}\uparrow}$ over the pre-1970 period is likely to be caused by a combination of increases in $N_d$, $L$ and $F^{\text{clear-sky}}_{\mathrm{SW}\uparrow}$ since there is little change in $f_c$. In contrast, for the post-1970 period the $F_{\mathrm{SW}\uparrow}$ decrease is likely to be caused by decreases in $f_c$, $N_d$ and $F^{\text{clear-sky}}_{\mathrm{SW}\uparrow}$ since $L$ is fairly constant. To quantify the relative contributions of the changes in cloud properties to the changes in $F_{\mathrm{SW}\uparrow}$, we first recreate the $F_{\mathrm{SW}\uparrow}$ flux time series using offline radiative flux calculations with monthly ensemble mean $f_c$, $N_d$, $L$, $F^{\text{clear-sky}}_{\mathrm{SW}\uparrow}$, downwelling SW at TOA and $A_{\mathrm{surf}}$ as inputs following the technique described in Grosvenor et al. (2017) and Grosvenor and Carslaw (2020) for TOA fluxes. The approach used here differs slightly from those studies due to the inclusion of $F^{\text{clear-sky}}_{\mathrm{SW}\uparrow}$ from the model for the clear-sky regions rather than assuming a constant transmissivity. Multiple scattering between the surface and cloud is also included here following Seinfeld and Pandis (2006). The offline radiative flux calculations can then be used to quantify the individual contributions from the changes in the different cloud properties.
Table 1. Changes in radiative fluxes and cloud properties ($\Delta x$ values) over the 1850-1970 and 1971-2014 periods for the ensemble mean time series of the UKESM1 and AerChemMIP simulations. Also shown are the minimum and maximum $\Delta x$ values across the UKESM1 16-member ensemble. The uncertainties are from the uncertainty in the fit lines used to calculate $\Delta x$.
Figure: Time series of annual mean $F_{\mathrm{SW}\uparrow}$ as calculated using the offline radiative transfer model (labelled "Calculated") and that directly from the model output for the UKESM1. The blue shading denotes ± 2 times the standard deviation across the ensemble for the model output data. The region for area averaging is the same as for Fig. 2.
Figure 4a compares the reconstructed $F_{\mathrm{SW}\uparrow}$ flux time series with the time series from the model (i.e. that calculated online by the UKESM1 at each radiation time step of the model, as previously shown in Fig. 2). The inter-annual variability of the calculated fluxes matches that from the model output very well. The $\Delta F_{\mathrm{SW}\uparrow}$ values from the reconstructed time series are similar to the actual model values during the pre-1970 period and the post-1970 period (see Table 2), although with a 6 % overestimate for the pre-1970 period ($\Delta F_{\mathrm{SW}\uparrow}$ = 5.0 vs. 4.7 W m$^{-2}$ for the estimated vs. actual values, respectively) and a 20 % underestimate in the absolute magnitude of $\Delta F_{\mathrm{SW}\uparrow}$ for the post-1970 period ($\Delta F_{\mathrm{SW}\uparrow}$ = −4.8 vs. −6.0 W m$^{-2}$). Despite these discrepancies, the appearance of a positive trend in the pre-1970 period and a negative trend in the post-1970 period, along with trends that are close to those from the full model, gives confidence that the reconstructed radiative fluxes are sufficient for estimating the contributions from the individual cloud properties to $\Delta F_{\mathrm{SW}\uparrow}$.
The individual contributions to ΔF SW↑ were estimated by recalculating the F SW↑ fluxes and the linear trends in F SW↑ for the two periods while allowing one variable at a time to vary and holding the other cloud properties fixed at their time-mean values for each time period.
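The one-at-a-time attribution described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: `toy_flux` is a hypothetical stand-in for the offline radiative calculation (a simple two-stream cloud albedo with an assumed optical-depth scaling and an assumed constant insolation), the input time series are synthetic, and all function and variable names are invented for the example.

```python
import numpy as np

def period_change(years, series):
    """Change over the period implied by a least-squares line (slope x period length)."""
    slope = np.polyfit(years, series, 1)[0]
    return slope * (years[-1] - years[0])

def toy_flux(fc, nd, lwp, clear):
    """Hypothetical stand-in for the offline flux calculation (not the paper's model)."""
    tau = 0.05 * lwp ** (5.0 / 6.0) * nd ** (1.0 / 3.0)  # assumed optical-depth scaling
    albedo = tau / (tau + 7.7)                            # simple two-stream cloud albedo
    return fc * 340.0 * albedo + (1.0 - fc) * clear       # 340 W m-2: assumed insolation

def attribute(flux_fn, years, inputs):
    """Vary one input at a time, holding the others at their period means."""
    fixed = {k: np.full(len(years), float(np.mean(v))) for k, v in inputs.items()}
    total = period_change(years, flux_fn(**inputs))
    parts = {}
    for name, series in inputs.items():
        args = dict(fixed, **{name: series})  # only `name` keeps its time variation
        parts[name] = period_change(years, flux_fn(**args))
    residual = total - sum(parts.values())
    return total, parts, residual

# Synthetic linear ramps, for illustration only (not model output).
years = np.arange(1850, 1971)
n = len(years)
inputs = {
    "fc": np.linspace(0.70, 0.71, n),
    "nd": np.linspace(60.0, 106.0, n),
    "lwp": np.linspace(90.0, 94.0, n),
    "clear": np.linspace(40.0, 41.0, n),
}
total, parts, residual = attribute(toy_flux, years, inputs)
for name, value in parts.items():
    print(f"{name}: {value:+.2f} W m-2 ({100 * value / total:.1f} % of total)")
print(f"residual: {residual:+.3f} W m-2")
```

The residual is whatever part of the total change is not captured by the single-variable recalculations, which is how a small residual such as the one quoted below can arise.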
For the pre-1970 period, all variables cause an increase in F SW↑ trend. The trend in N d contributes most to F SW↑ ( Fig. 5 and Table 2) with 58.6 % of the total, followed by and L decrease slightly over this period but have minimal influence on the F SW↑ change (10.1 % and 7.5 %), respectively, and with large uncertainties. There is a small residual of −0.57 %.
These results show that the long-term changes in F SW↑ over the pre-1970 period are dominated by cloud brightening via the Twomey effect (i.e. an increase in N d with other cloud properties held constant). The increase in the macrophysical cloud properties, L and f c , which account for a combined 33.8 % of ΔF SW↑ , could indicate some cloud adjustments in response to changes in N d but could also be influenced by non-aerosol factors such as changes in SST, air temperature, or atmospheric and oceanic circulation. These effects will be discussed in the next section. For the post-1970 period, the Twomey effect (−0.98 W m −2 ) is considerably smaller in magnitude than for the pre-1970 period (2.9 W m −2 ), because ΔN d is only −20.2 ± 6.4 cm −3 in the post-1970 period compared to 46.3 ± 8.7 cm −3 in the pre-1970 period. Another factor is that cloud albedo, and hence F SW↑ , is more sensitive to changes in N d when N d is lower, so the reduction in N d between its peak in 1971 and 2014 will have had less effect on F SW↑ than the same ΔN d would have had in the pre-industrial-like conditions of 1850: the Twomey contribution to ΔF SW↑ for the post-1970 period is 34 % of the pre-1970 value in magnitude, whereas the post-1970 ΔN d is 44 % of the pre-1970 value. The much larger change in f c during the post-1970 period compared to the pre-1970 period suggests that the reduction in f c is unlikely to be dominated by cloud adjustments to aerosol, given that the change in N d is much smaller over the post-1970 period than over the pre-1970 period. There are several factors that could influence the macrophysical cloud changes during the two periods, and we now attempt to quantify the influence of the individual drivers.
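As a quick arithmetic check of the two ratios quoted above, the following sketch uses only the values given in the text (the variable names are ours). It also prints the flux change per unit change in N d for each period, which illustrates the lower sensitivity at higher N d.

```python
# Values quoted in the text: Twomey (N_d) contribution and change in N_d per period.
twomey_pre, twomey_post = 2.9, -0.98   # W m-2
dnd_pre, dnd_post = 46.3, -20.2        # cm-3

flux_ratio = abs(twomey_post) / twomey_pre   # post-1970 relative to pre-1970
nd_ratio = abs(dnd_post) / dnd_pre

# Flux change per unit N_d change in each period (W m-2 per cm-3).
sens_pre = twomey_pre / dnd_pre
sens_post = twomey_post / dnd_post

print(f"Twomey flux ratio: {100 * flux_ratio:.0f} %")
print(f"N_d change ratio:  {100 * nd_ratio:.0f} %")
print(f"sensitivity pre-1970:  {sens_pre:.3f} W m-2 per cm-3")
print(f"sensitivity post-1970: {sens_post:.3f} W m-2 per cm-3")
```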

Quantifying the effects of individual emission types on F SW↑ and cloud variable changes
So far we have attributed the changes in F SW↑ to changes in cloud and aerosol properties. We now attempt to attribute the changes in radiative fluxes and the associated cloud variables to changes in emissions (see Sect.  Figure 6 shows the time series of F SW↑ and the cloud variables expressed as an anomaly relative to the 1850-1859 mean for the AerChemMIP aerosol-only and greenhouse gas-only proxies (see Sect. 2.2.2). Anthropogenic aerosol emissions (AerChemMIP-aerosol-only-proxy) generally cause an increase in F SW↑ , whereas greenhouse gas emissions (AerChemMIP-GHG-only-proxy) cause a decrease. When all emissions are applied (AerChemMIP-allemissions), the effects of aerosols and greenhouse gases act in opposite directions, resulting in a smaller-magnitude change in F SW↑ than would occur with only one of the emission types. For the majority of the time series, changes in aerosols have the most influence; therefore, there is an overall increase in F SW↑ over most of the time series. However, by the end of the time series, F SW↑ is similar to the value at the start. Figure 7 and Table 1 summarize the contributions of each emission type to F SW↑ in UKESM1. For the pre-1970 period, the F SW↑ estimated to be due to aerosol emissions is 6.6 ± 1.7 W m −2 (see Table 1), which is much larger in magnitude than the reduction in F SW↑ caused by greenhouse gas emissions (−2.3 ± 0.73 W m −2 ). However, the reduction due to greenhouse gases is still important and shows that in the models with all emissions applied the effect of SW aerosol forcing is offset by around 35 % by opposing greenhouse gas effects. For the post-1970 period, there is less contribution from aerosol emissions (−3.4 ± 3.3 W m −2 ), which is consistent with the smaller-magnitude change in N d due to aerosol emission reductions (−20.9 ± 6.7 vs. 47.7 ± 4.7 cm −3 for the pre-1970 period). There is a similarly sized negative contribution from greenhouse gas emissions (−2.8 ± 1.8 W m −2 ).
For F clear-sky SW↑ , only the aerosol emissions drive meaningful trends, suggesting that greenhouse gas-driven changes in clear-sky SW (e.g. those caused by changes in water vapour or gaseous absorption) are negligible.
We next consider how the individual emission types affect the underlying cloud variables that were shown in the previous sections to drive the changes in F SW↑ . Figure 7 shows the overall changes in f c , N d and L for AerChemMIP-all-emissions, AerChemMIP-aerosol-only-proxy and AerChemMIP-GHG-only-proxy. Figure 5 showed that in the pre-1970 period for the UKESM1 run there is little net contribution to F SW↑ from changes in f c with changes in N d dominating. The results for the UKESM all-emissions run in Figs. 6 and 7 show that this is due to a fairly small net change in f c during this period for UKESM1 relative to the post-1970 period. However, the AerChemMIP experiments suggest that this small change in f c is the result of opposing large changes due to the aerosol and greenhouse gas emissions. Changes in N d (and τ a ) are dominated by the aerosol emissions during both periods with virtually no contribution from greenhouse gases. This indicates that the substantial changes to climate from greenhouse gases have virtually no effect on N d or aerosols in this model. It is conceivable that changes in cloud location, cloud coverage, updraught speeds or precipitation in response to greenhouse gases could affect N d , but this appears not to be the case for this model.
The dominant driver of L (Figs. 6 and 7) during the pre-1970 period is aerosol emissions (AerChemMIP-aerosol-only-proxy), and there is no significant change in L due to greenhouse gas emissions. During the post-1970 period, contributions to L from greenhouse gases are near zero, and there is a small negative aerosol contribution. However, the uncertainties for this period are larger than the values, indicating that these changes are likely spurious.

Effect of emissions on surface temperature
During the pre-1970 period, the warming from greenhouse gases (0.68 ± 0.14 K) and the cooling from aerosols (−0.82 ± 0.17 K) roughly cancel out to give a net change in temperature that is nearly zero (−0.14 ± 0.19 K). During the post-1970 period, greenhouse gases produce a warming of 0.80 ± 0.25 K that is similar to that for the pre-1970 period, although it occurs within a shorter time frame (i.e. the trend is larger). Aerosol emissions declined during the post-1970 period; hence, there is also a warming effect from aerosols of 0.43 ± 0.12 K, which is around half the greenhouse gas warming.

Decomposing the F SW↑ trends in the single-emissions experiments into contributions from individual cloud and aerosol variables
In this section we perform the same analysis as in Sect. 3.2 to quantify how much the individual changes in aerosol and cloud properties contribute to F SW↑ , but for the single-emissions experiments (aerosol-only and GHG-only). It is clear from the DAMIP experiment results in Figs. A1 and A3 and Table A1 (see Appendix A) that the DAMIP natural aerosol forcing, which comes mostly from volcanic aerosols, has almost no influence on the F SW↑ trends; therefore, we do not consider natural aerosols further. However, there could be influences from natural aerosols that are not captured by the DAMIP natural emissions, such as feedbacks between sea-spray CCN and temperature. Some of these will be represented in the experiments used here, such as the effects on sea spray from changes in wind speed as a result of temperature change. Our results (Table 1 and Fig. 6d) show that there is little change in N d in the AerChemMIP-GHG-only-proxy experiment (1.0 ± 0.49 cm −3 for the pre-1970 period and 0.54 ± 0.91 cm −3 for the post-1970 period) in which large changes in temperature occur, which suggests little influence of climate feedbacks on CCN. Our results are likely to exclude the impact of changes in sea spray due to changes in sea-ice coverage since we deliberately excluded sea-ice-covered regions. Therefore we calculated the changes in N d for only the sea-ice regions and found values of 0.57 ± 0.47 and −0.84 ± 0.74 cm −3 for the pre- and post-1970 periods, respectively, suggesting that the effect is small for this model. Figure 8 shows the contributions to F SW↑ from the changes in the different aerosol and cloud variables for the AerChemMIP-aerosol-only-proxy run calculated, as in Sect. 3.2, using offline radiative calculations. Percentages are quoted relative to the offline-estimated total F SW↑ for the AerChemMIP-aerosol-only-proxy (6.7 ± 1.2 W m −2 ) rather than the actual F SW↑ (6.6 ± 1.7 W m −2 ).
N d provides the largest contribution during the pre-1970 period (3.0 ± 0.36 W m −2 or 44.2 % of the total). The f c contribution is significantly smaller (1.9 ± 1.1 W m −2 or 27.8 %) with the L contribution (21.8 %) being slightly smaller still. The F clear-sky SW↑ contribution is small and uncertain at 0.53 ± 0.53 W m −2 or 7.8 %.

Aerosol-only emissions
The small F clear-sky SW↑ contribution in the pre-1970 period indicates that the ARI forcing is quite small, which is consistent with Grosvenor and Carslaw (2020). The large N d contribution shows that the Twomey ACI effect is very important in driving the F SW↑ from aerosols. However, the contributions from changes in the cloud macrophysical properties (f c and L) are slightly more important than the Twomey ACI effect when considered together, comprising 49.1 % of the F SW↑ change compared to 44.2 % from the cloud microphysical response (i.e. due to N d changes). However, in Sect. 3.5.2 we show that some of the changes in f c and potentially in L are due to cloud feedbacks that are likely to have been induced by changes in temperature, and hence they do not solely represent forcing via cloud adjustments.
For the post-1970 period, the contribution to the total F SW↑ (−2.7 ± 1.9 W m −2 ) from changes in N d is −0.95 ± 0.38 W m −2 . The contribution from changes in f c is also negative and of a similar magnitude but highly uncertain (−1.1 ± 1.6 W m −2 ). The L and F clear-sky SW↑ contributions (−0.41 ± 0.67 W m −2 and −0.33 ± 1.3 W m −2 , respectively) are smaller and also very uncertain. Changes in the macrophysical cloud properties (f c and L; 54.1 %) therefore dominate over those of the microphysical variables (N d ; 34.7 %), although the macrophysical contributions are highly uncertain.

Greenhouse gas-only emissions
The effects of greenhouse gases on F SW↑ are almost entirely driven by changes in f c for both the pre-1970 period and the post-1970 period (Fig. 9), with a larger-magnitude contribution for the post-1970 period (−2.0 ± 0.84 vs. −1.5 ± 0.54 W m −2 in the pre-1970 period) despite the post-1970 period being a shorter span of time. This is likely due to an enhanced rate of greenhouse gas emissions during the post-1970 period resulting in a more rapid temperature increase.

Aerosol forcing vs. cloud-climate feedbacks
Here we examine the relative roles of aerosol forcing and feedbacks resulting from climate change (temperature, atmospheric/ocean circulation changes, etc.) on the change in F SW↑ and the cloud variables.
Aerosol forcing is the change in F SW↑ caused by a change in aerosols without a change in climate (SSTs, water vapour, atmosphere and ocean circulation, etc.). This includes rapid cloud adjustments of f c and L which are potentially a major cause of changes in F SW↑ .
For the greenhouse gas-only runs, we assume that the changes in F SW↑ , f c and L are due to climate feedbacks with no effect of greenhouse gases on cloud or clear-sky adjustments. However, we acknowledge that such effects may be possible. For example, the results of Andrews and Forster (2008) showed a −0.18 W m −2 global change in F SW↑ from greenhouse gas adjustments (termed semi-direct forcing in that paper) for the HadGEM1 model in a doubling CO 2 experiment. This would represent a small fraction (6.4 %) of the −2.8 W m −2 change from the AerChemMIP-GHG-only-proxy run for the post-1970 period (although the latter is for the North Atlantic region only) and is also likely to be an overestimate for our case since the change in CO 2 for the post-1970 period is less than a doubling. Furthermore, Fig. 7.4 of the AR6 assessment (Forster et al., 2021) estimates the global CO 2 adjustment effect to be around 5 % of the total ERF, although this is for the combined short-wave and long-wave values.
For the AerChemMIP-aerosol-only-proxy runs, changes in F SW↑ , f c and L are split between aerosol forcing and climate feedback terms using two different methods. The first method estimates the feedback term as the change of the quantity (ΔX aer , where X represents F SW↑ , f c or L for the AerChemMIP-aerosol-only-proxy run) minus the change in X induced by the aerosol effective radiative forcing (ΔX eff aer ):
$$\Delta X^{\mathrm{feedback}}_{\mathrm{aer}} = \Delta X_{\mathrm{aer}} - \Delta X^{\mathrm{eff}}_{\mathrm{aer}}. \tag{2}$$
See Box 3 in Fig. 1. Here, ΔX eff aer was calculated using the results from the nudged runs of Grosvenor and Carslaw (2020) (see Box 2 in Fig. 1, Sect. 2.7 and Appendix C).
The second method estimates the change in X due to climate feedbacks (ΔX T feedback aer ) in the AerChemMIP-aerosol-only-proxy run using the temperature change in that run (ΔT aer ) based on the sensitivity of X to temperature in the AerChemMIP-GHG-only-proxy run:
$$\Delta X^{T\,\mathrm{feedback}}_{\mathrm{aer}} = \frac{\Delta X_{\mathrm{GHG}}}{\Delta T_{\mathrm{GHG}}}\,\Delta T_{\mathrm{aer}}, \tag{3}$$
where ΔX GHG and ΔT GHG are the changes in X and temperature, respectively, in the AerChemMIP-GHG-only-proxy run.
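The two feedback estimates can be sketched in a few lines. This is an illustrative sketch rather than the analysis code: the function names are ours, the pre-1970 inputs are the rounded values quoted elsewhere in this section, the forcing term is approximated as 77 % of the 4.3 W m −2 all-emissions change, and we assume that the quoted surface temperature changes are the ones entering the second method.

```python
def feedback_residual(dx_aer, dx_eff_aer):
    """Method 1: total change in the aerosol-only run minus the forcing-driven change."""
    return dx_aer - dx_eff_aer

def feedback_from_temperature(dx_ghg, dt_ghg, dt_aer):
    """Method 2: GHG-run sensitivity dX/dT scaled by the aerosol-run temperature change."""
    return dx_ghg / dt_ghg * dt_aer

# Pre-1970 values for X = F_SW (rounded values quoted in the text).
dF_aer = 6.6             # W m-2, aerosol-only proxy
dF_ghg = -2.3            # W m-2, GHG-only proxy
dT_ghg = 0.68            # K, warming in the GHG-only proxy
dT_aer = -0.82           # K, cooling in the aerosol-only proxy
dF_eff_aer = 0.77 * 4.3  # W m-2, assumed: 77 % of the all-emissions change

method1 = feedback_residual(dF_aer, dF_eff_aer)
method2 = feedback_from_temperature(dF_ghg, dT_ghg, dT_aer)
print(f"feedback, method 1: {method1:.2f} W m-2")
print(f"feedback, method 2: {method2:.2f} W m-2")
```

With these rounded inputs both methods give a positive feedback contribution of similar size, although the exact values depend on the rounding of the inputs.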
The climate feedback term could include several processes. For example, aerosol and greenhouse gas forcing can change global and local temperatures and sea ice which can then cause changes in atmospheric and oceanic circulation, and subsequent changes in the distribution of aerosols and clouds. There is evidence that warming can cause an expansion of the Hadley cell and a poleward shift of the storm tracks (Held and Hou, 1980;Lu et al., 2007;Seidel et al., 2008) that can reduce mid-latitude cloudiness (Norris et al., 2016). Cooling would have the opposite effect, leading to increases in F SW↑ in the North Atlantic region. It has also been suggested that aerosols may have a local influence on the Atlantic Meridional Overturning Circulation (AMOC) that is more direct than the effect of aerosols on large-scale temperatures (Yu and Pritchard, 2019;Robson et al., 2022). Menary et al. (2020) show that the AMOC speeds up in the DAMIP-Hist-Aer run as a result of aerosol emissions, and it is feasible that changes in the AMOC could also lead to changes in cloud cover or properties and hence changes in F SW↑ .

Forcing vs. feedbacks for F SW↑
The balance between aerosol forcing and climate feedbacks is first examined for F SW↑ . Figure 10 shows that for both periods F eff aci is much larger than F eff ari for the AerChemMIP-aerosol-only-proxy run. For the pre-1970 period, the estimated aerosol ERF ( F eff aer = F eff ari + F eff aci ) of the AerChemMIP-aerosol-only-proxy run accounts for 77 % of the F SW↑ of the all-emissions run (see Table 3). Climate responses in the AerChemMIP-aerosol-only-proxy run (labelled "Aerosol Feedback" in Fig. 10) also account for 77 % of the F SW↑ of the all-emissions run, showing that the initial aerosol ERF and the subsequent climate feedbacks are equally important in causing changes in F SW↑ in the aerosol-only run. The F SW↑ from the AerChemMIP-GHG-only-proxy run (assumed to be all due to climate feedback) is −53 % of F SW↑ of the all-emissions run, which brings the total of the aerosol forcing, aerosol-driven cloud-climate feedback and greenhouse gas-driven cloud-climate feedback terms to 100 % (to within rounding). Figures 6 and 7 show that during the pre-1970 period aerosols caused a cooling of around 0.85 K in AerChemMIP-aerosol-only-proxy. This is likely to have caused a climate response that affected F SW↑ , for example, via an increase in cloud fraction due to mid-latitude cloud feedbacks. The X T feedback aer value (Eq. 3) is another estimate of this cloud-climate feedback using the above temperature change for aerosol-only emissions and is shown in Fig. 10 as the "Aerosol Feedback from T " datapoint. It shows good agreement with the "Aerosol Feedback" value, suggesting that the local temperature change is a good indicator of the feedback contribution.
If F SW↑ from the greenhouse gas-driven cloud-climate feedback (from the AerChemMIP-GHG-only-proxy run) is added to the aerosol-driven cloud-climate feedback value ("Aerosol Feedback"), then we obtain an estimate of the overall change in F SW↑ due to feedbacks from both types of emissions. For the pre-1970 period, this overall feedback effect on F SW↑ , termed "Total (Aerosol + GHG) Feedback", is considerably lower in magnitude than the aerosol forcing term and accounts for 23 % of F SW↑ from the all-emissions run (cf. 77 % for aerosol forcing). This indicates that in the all-emissions run, which is assumed to be the run most similar to the real world, the aerosol forcing has a larger influence on F SW↑ than climate feedbacks during the pre-1970 period. This dominance of aerosol forcing is mainly due to the cancellation of the warming effect of greenhouse gases and the cooling effect of aerosols (Figs. 6 and 7).
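The pre-1970 budget described above can be checked with the rounded values quoted in this section. The sketch below is only a consistency check under our own naming; the forcing term is taken as 77 % of the all-emissions change, as stated in the text, so the shares it prints inherit the rounding of the quoted numbers.

```python
# Pre-1970 changes in F_SW (W m-2), rounded values quoted in the text and Table 3.
all_emissions = 4.3
aerosol_only = 6.6
ghg_only = -2.3   # assumed to be entirely climate feedback

forcing = 0.77 * all_emissions             # aerosol ERF (ARI + ACI), quoted as 77 %
aerosol_feedback = aerosol_only - forcing  # Eq. (2) applied to F_SW
total_feedback = aerosol_feedback + ghg_only

def share(value):
    """Express a term as a percentage of the all-emissions change."""
    return 100.0 * value / all_emissions

for label, value in [
    ("aerosol forcing", forcing),
    ("aerosol feedback", aerosol_feedback),
    ("GHG feedback", ghg_only),
    ("total feedback", total_feedback),
]:
    print(f"{label:17s} {value:+.2f} W m-2  ({share(value):+.0f} %)")
```

The aerosol forcing and total feedback shares sum to 100 % because the rounded aerosol-only and greenhouse gas-only changes add up to the all-emissions change.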
For the post-1970 period, the aerosol ERF is in the opposite direction and is smaller in magnitude than for pre-1970 as expected from the smaller-magnitude change in N d . The estimated change in F SW↑ due to aerosol-driven cloud feedbacks is now negative in contrast to the pre-1970 period, which is consistent with the increase in temperature caused by aerosols during the post-1970 period (Figs. 6 and 7). The sign of F SW↑ estimated from the temperature change ("Aerosol Feedback from T "; Eq. 3) is in agreement with the F SW↑ due to aerosol-driven cloud feedbacks, although it is a little lower in magnitude. For the post-1970 period, the total change in F SW↑ due to feedbacks associated with aerosols and greenhouse gases is considerably larger in magnitude (87 % of the all-emissions run value) than the overall aerosol forcing (13 %). This implies that observations of changes in F SW↑ over the post-1970 period cannot be used directly to evaluate aerosol forcing in models without taking account of feedbacks.

Forcing vs. feedbacks for f c , N d and L
The changes in cloud variables from the AerChemMIP-aerosol-only-proxy run are further split into forcing and climate feedback components in a similar way to how the F SW↑ term was split earlier, i.e. using the results from the nudged runs of Grosvenor and Carslaw (2020) (see Sect. 2.7 and Appendix C). Note that for f c and L it is not possible to split the forcing into ARI and ACI terms since in Grosvenor and Carslaw (2020) this could only be done for F SW↑ .
For the pre-1970 period (Fig. 11), slightly more of f c (247 % of the f c of the all-emissions run) in AerChemMIP-aerosol-only-proxy comes from the climate feedback effect rather than the aerosol forcing (206 %). Likewise, most of the L (Fig. 12) comes from the climate feedback (93 %) with 30 % coming from the aerosol forcing. Hence most of the contributions to F SW↑ in AerChemMIP-aerosol-only-proxy from f c and L seen in Fig. 8 are from the climate responses to the increase in aerosol rather than cloud adjustments.
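The percentages above exceed 100 % because they are expressed relative to the all-emissions change, which for f c is the small net result of large opposing aerosol and greenhouse gas contributions. As a worked check, using the pre-1970 aerosol-only and all-emissions values listed in Table 3 (units as in the table; the subscript notation here is ours), the forcing and feedback shares add up to the aerosol-only total:

```latex
\begin{align*}
\frac{\Delta f_{c,\mathrm{aer}}}{\Delta f_{c,\mathrm{all}}}
  &= \frac{22.2}{4.9} \approx 453\,\% = 206\,\% + 247\,\%, \\
\frac{\Delta L_{\mathrm{aer}}}{\Delta L_{\mathrm{all}}}
  &= \frac{5.3}{4.3} \approx 123\,\% = 30\,\% + 93\,\%.
\end{align*}
```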
For the post-1970 period, the aerosol-induced changes in f c and L are negative, which is consistent with the sign of the aerosol forcing. The predicted aerosol forcings are very small for both variables. The estimated climate feedback terms are larger in magnitude than the aerosol forcings; however, the uncertainties in the aerosol-induced changes are large, particularly for L. Figure 7 showed that there was little change in L in the AerChemMIP-GHG-only-proxy run for either period. This is a little surprising since greenhouse gas forcing caused a large reduction in f c during both periods, presumably through climate response changes. Hence, given the estimated large response of L to climate responses in AerChemMIP-aerosol-only-proxy (Fig. 12), a fairly large climate response for L due to greenhouse gas forcing may have been expected. It is possible that the aerosol and greenhouse gas-induced climate responses are somewhat different and have different effects on clouds, although we also note that the L time series is particularly noisy (Fig. 6) such that the AerChemMIP-GHG-only-proxy error bar for L (Fig. 7) for the post-1970 period extends into negative values, and the error bar for AerChemMIP-aerosol-only-proxy in Fig. 12 is large enough to be consistent with a much smaller climate response or even a zero climate response with the aerosol forcing accounting for all of the change. However, the uncertainties for the pre-1970 period are much smaller, suggesting that the above arguments do not apply for that period. In that case the large increase in L in response to aerosol-induced climate feedbacks during the pre-1970 period, when uncertainties were lower, might indicate that some of the ΔL during the post-1970 period was caused by a similar circulation change in reverse (due to the opposite sign of ΔT over the two periods). It is also possible that the magnitude of the aerosol forcing for L is underestimated, which would produce a smaller estimate for the magnitude of the climate feedback contribution for AerChemMIP-aerosol-only-proxy. Determining the reasons for the above surprising result is left to future work.
As discussed in Sect. 3.3, Figs. 6 and 7 show that there is no change in either N d or F clear-sky SW↑ over the two periods in the AerChemMIP-GHG-only-proxy run despite the large climate responses to greenhouse gas emissions. It is therefore likely that there was also no impact upon N d or F clear-sky SW↑ from the climate responses in the AerChemMIP-aerosol-only-proxy run and hence that the changes in these variables are almost entirely driven by the aerosol changes. This suggests that almost all of the F SW↑ that was apportioned to climate responses in the AerChemMIP-aerosol-only-proxy run (Fig. 10) was due to the associated changes in f c and L.
Figure 10. ... for the AerChemMIP UKESM1 runs. "Aerosol-only" is the change in the AerChemMIP-aerosol-only-proxy runs as in Fig. 7 ( F SW↑ aer ). "ACI" and "ARI" are the aerosol effective radiative forcings ( F eff aci coupled and F eff ari coupled ). "Aerosol Feedback" is the climate feedback term for the AerChemMIP UKESM1 runs ( F SW↑ feedback aer ) calculated using Eq. (2), and "Aerosol Feedback from T " ( F T SW↑ feedback aer ) is that calculated using Eq. (3). "Total (Aerosol + GHG) Feedback" is the estimated total climate feedback in the all-emissions run ( F SW↑ feedback tot ) calculated by summing F SW↑ from the AerChemMIP-GHG-only-proxy run ( F SW↑ feedback GHG ) and F SW↑ feedback aer . Also shown is F SW↑ for the all-emissions UKESM1 AerChemMIP runs (AerChemMIP-all-emissions). Arrows in panel a are drawn to indicate values that add together to give other values on the plot (see Eq. 2 and Appendix D). These also apply to panel b and to all panels for Figs. 11 and 12 but are omitted for clarity. The black arrows also apply to the DAMIP experiments (Figs. A6, A7 and A8), but the orange ones do not. Arrows for F eff aer coupled = F eff ari coupled + F eff aci coupled have also been omitted.

Comparison with observations
We now compare the modelled time series with observations. Reliable observations are only available in the later parts of the time series. For F SW↑ , we use the DEEP-C dataset (Allan et al., 2014b) that is available for 1985-2014; for N d we use MODIS data from 2003-2012 (see Sect. 2.5 for details); for τ a we use 2003-2012 Level-3 MODIS Aqua monthly mean data from the combined 550 nm Dark Target and Deep Blue product "Dark_Target_Deep_Blue_Optical_Depth_550_Combined" (Levy et al., 2013); for f c we use PATMOSx and ISCCP data from Norris et al. (2016) for 1983-2009; for L all-sky we use the MAC-LWP (Multi-Sensor Advanced Climatology of Liquid Water Path) microwave satellite instrument dataset (Elsaesser et al., 2017) for 1988-2014 (L is not available from this instrument); and for surface temperature we use the data from the UKESM1 atmosphere-only model run that uses observed SSTs from 1985 to 2014 (chosen to coincide with the F SW↑ observations).
Table 3. Contributions to changes in F SW↑ , f c and L from various processes as in Figs. 10, 11 and 12 along with the addition of the changes from the AerChemMIP-GHG-only-proxy run (assumed to be the climate feedback term for that run) for the UKESM1-based AerChemMIP experiments. The percentages in brackets are the contribution expressed as a percentage of the contribution of the AerChemMIP-all-emissions run.
Period | Contribution | ΔF SW↑ | Δf c | ΔL
Pre-1970 period | AerChemMIP-all-emissions (ΔX all ) | 4.3 ± 1.00 | 4.9 ± 6.1 | 4.3 ± 0.74
Pre-1970 period | AerChemMIP-aerosol-only-proxy (ΔX aer ) | 6.6 ± 0.81 | 22.2 ± 7.5 | 5.3 ± 1.0
Pre-1970 period | ACI+ARI forcing (
Figure 13 shows the same time series as in Fig. 2 but with the observations added and with the trends shown for the period of the relevant observations. Figure 14 shows the modelled and observed trends for the two time periods along with uncertainties. It shows both the range of trends across the model ensembles and the trend from the ensemble mean (along with its uncertainty). It is clear that the modelled F SW↑ values are too high over the 1985-2014 time period and that the ensemble mean trends are too steep. There is a reasonable amount of spread across the model ensembles, but all of the ensemble members have a stronger trend than the DEEP-C data. However, the trends from some members are within the uncertainty of the observations. The results indicate that most ensemble members have an F SW↑ trend that is too steep, resulting in a ΔF SW↑ that is too large in magnitude.
Modelled N d trends and absolute values for UKESM1 are very close to those observed, although the time period is quite short and the uncertainties are large. We also note that the time-mean N d in this model tends to be underestimated in the north of the Atlantic and overestimated in the south; hence, the good agreement may disguise some compensating biases. The HadGEM model slightly underestimates the absolute values and trend, suggesting that the larger aerosol forcing seen in UKESM1 and the larger-magnitude N d values pre- and post-1970 may be more realistic.
The absolute values of τ a match the observations (MODIS Aqua) well for UKESM1, but τ a is overestimated by HadGEM. Since N d was slightly underestimated by HadGEM, this demonstrates that τ a is not always a good proxy for N d . Similar reasoning may explain why there is a fairly small trend in τ a from the observations but a fairly large trend in the observed N d . The UKESM1 trend is slightly larger in magnitude than that from the observations, and the trend from HadGEM is larger still. However, there is considerable uncertainty in the observed trend and considerable spread in the trends across the ensemble members such that it is difficult to conclude that the model τ a trends are too large.
For f c , the observations are not useful to evaluate the absolute magnitude since they are only provided as anomalies, but they are useful for looking at trends. The modelled trends match the ISCCP trend well but slightly overestimate the magnitude of the PATMOSx trend. However, the observation time series is very noisy and the trends are uncertain. There is also a wide spread of model trends across the ensembles, showing that cloud fraction trends over these lengths of time are highly variable such that some of the ensemble members agree with both sets of observations. This makes it difficult to evaluate the model against reality; only one realization out of a range of possibilities will have occurred in the real world. Since it was shown earlier that changes in f c are the main driver of the changes in F SW↑ in the post-1970 period (Fig. 5), the expectation is that the model mean f c trends would be too steep in order to produce the F SW↑ trends that were too steep. This is certainly possible given the uncertainties of the observations.
The observed L all-sky shows no trend and a high degree of time variability, whereas the models show negative trends that look similar to the f c time series. Since this is the all-sky liquid water path, trends will include the effect of varying f c as well as of varying L. The lack of an observed trend might suggest that a small-magnitude f c trend occurred in reality in order to produce the small L all-sky trend, or it could indicate a compensating small L trend. A small f c trend would be consistent with the small observed F SW↑ trend and might indicate that the PATMOSx f c trend is more accurate so that the model f c trend magnitude is overestimated.
The surface temperatures in the model are too low, and the trends for most ensemble members and the ensemble means are too steep. However, there is a high degree of variability across the ensemble members, and some of the ensemble members do agree with the observations. The ensemble mean temperature trend being too steep is consistent with a picture of too much cloud reduction via cloud feedbacks to temperature, which would in turn cause too strong a reduction in f c , L all-sky and F SW↑ , which is consistent with the other results described in this section. It indicates that the model climate sensitivity is too strong, which may be related to the N. Atlantic cloud feedback (as also suggested in Andrews et al., 2019) but could also be due to unrelated factors.

What causes the too-large ∆F SW↑ ?
The question that arises is what causes the too-large ΔF SW↑ in the models. Assuming that cloud feedbacks and aerosol forcing are likely the two main mechanisms that control ΔF SW↑ , we can approximate ΔF SW↑ as
$$\Delta F_{\mathrm{SW}\uparrow} = \lambda\,\Delta T + \Delta F^{\mathrm{eff}}_{\mathrm{aer}}, \tag{4}$$
where ΔT is the change in surface temperature, ΔF eff aer is the aerosol forcing and λ is a measure of the cloud feedback strength. Thus, a too-large ΔF SW↑ could be due to a cloud feedback strength (λ) that is too strong, an aerosol forcing that is too strong, or a too-large ΔT . To rule out the possibility that the F SW↑ model trend is too steep purely because of the too-large temperature trend rather than because the aerosol forcing or cloud feedback are too large, we now make an estimate of the error caused by the too-large model temperature trend alone. We do this using an estimate of λ calculated using the ratio of the change in F SW↑ over the different time periods to the change in temperature (T ) for the greenhouse gas-only runs:
$$\lambda = \frac{\Delta F_{\mathrm{SW}\uparrow\,\mathrm{GHG}}}{\Delta T_{\mathrm{GHG}}}. \tag{5}$$
We assume that in the greenhouse gas-only run the effect of changes in temperature on clouds via cloud feedbacks is the only factor affecting F SW↑ , which is supported by Figs. 9 and A5. We then further assume that this value of λ applies to the all-emissions runs. Table 4 shows λ values for different time periods. The AerChemMIP-GHG-only-proxy estimates are consistent across the different periods with values ranging between −3.4 and −3.5 W m −2 K −1 . The DAMIP-Hist-GHG (HadGEM-based) value for 1850-1970 (−3.4 W m −2 K −1 ) is also consistent with these values, whereas the DAMIP-Hist-GHG estimates for the 1971-2014 and 1985-2014 periods are quite different (−2.5 and −1.8 W m −2 K −1 ). It has been noted previously that cloud feedback magnitudes can vary over time due to natural variability (Armour et al., 2013;Zhou et al., 2016;Andrews et al., 2018), and the HadGEM results may be indicative of such natural variability. Given the consistency of the UKESM1 results, we therefore choose the 1985-2014 λ value of −3.5 W m −2 K −1 for AerChemMIP-GHG-only-proxy since this is the period of interest when comparing with observations, noting that the HadGEM-based value was similar to this for the longer 1850-1970 period; the longer period is likely to reduce uncertainties from short-term variability. Using the larger-magnitude λ values from AerChemMIP-GHG-only-proxy also leads to an upper limit on the estimate of the temperature-bias effect (see below).
Multiplying λ by the difference between the observed and modelled ΔT values (i.e. ΔT observed − ΔT model ) gives an estimate of the correction to the modelled ΔF SW↑ that is needed to estimate the ΔF SW↑ from cloud feedbacks that would be produced by using the observed temperature trend in place of the modelled one:

$$\Delta F_{SW\uparrow,corrected} = \Delta F_{SW\uparrow} + \lambda\,(\Delta T_{observed} - \Delta T_{model}). \tag{6}$$

For the 1985-2014 period, the corrected estimate (ΔF SW↑,corrected ) for UKESM1 is −3.6 W m −2 (corrected from −5.0 W m −2 ) and for HadGEM it is −3.1 W m −2 (from −4.5 W m −2 ). These are closer to the observed value of −1.7 W m −2 but are still considerably too negative. This suggests that either the model cloud feedback (λ) is too strong or the aerosol forcing is too strong. Either of these scenarios would cause a temperature increase that is too steep and hence is also consistent with these factors playing a role in causing the too-large temperature increase. We also note here that using the smaller-magnitude λ values from DAMIP-Hist-GHG would lead to a smaller correction and hence would strengthen this conclusion.
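The temperature-bias correction of Eq. (6) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the two ΔT values are assumed placeholders chosen so that ΔT observed − ΔT model = −0.4 K, which is the difference implied by the quoted corrections if λ = −3.5 W m −2 K −1 is applied to both models. The actual temperature changes are not restated here.

```python
def feedback_parameter(delta_f_ghg, delta_t_ghg):
    """Eq. (5)-style estimate of lambda (W m-2 K-1) from a GHG-only run."""
    if delta_t_ghg == 0:
        raise ValueError("temperature change must be non-zero")
    return delta_f_ghg / delta_t_ghg


def corrected_flux_change(delta_f_model, lam, delta_t_obs, delta_t_model):
    """Eq. (6): flux change expected if the model had the observed warming."""
    return delta_f_model + lam * (delta_t_obs - delta_t_model)


LAMBDA = -3.5                 # W m-2 K-1, 1985-2014 value quoted in the text
DT_OBS, DT_MODEL = 0.8, 1.2   # K, HYPOTHETICAL values (difference of -0.4 K)

# Modelled 1985-2014 flux changes quoted in the text (W m-2).
for name, delta_f in [("UKESM1", -5.0), ("HadGEM", -4.5)]:
    corrected = corrected_flux_change(delta_f, LAMBDA, DT_OBS, DT_MODEL)
    print(f"{name}: {delta_f:.1f} -> {corrected:.1f} W m-2")
```

With these assumed inputs the sketch returns the corrected values quoted in the text (−3.6 and −3.1 W m −2 ), both still more negative than the observed −1.7 W m −2 .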
A caveat here is that it has been shown that the specific global pattern of SSTs that occurred in reality is likely to have influenced the magnitude of cloud feedbacks and the climate sensitivity (Armour et al., 2013; Zhou et al., 2016; Andrews et al., 2018) in the real world; this is known as the "pattern effect". Thus it could be the case that the model cloud feedback response (i.e. λ), and by extension the model climate sensitivity, to a given pattern and magnitude of SST changes is reasonable, but the model is not capturing the correct pattern of SSTs, and hence this is why the F SW↑ trend is too steep. Figures 13 and 14 also show F SW↑ results from a single-member atmosphere-only (UKESM-AMIP) simulation where observed SSTs and sea-ice concentrations are imposed (see Sect. 2.2.3 for more details). It is clear that this run better matches the observed F SW↑ time series and trend, although the trend is still steeper than that observed. Figure 14 shows that the trend from the atmosphere-only run is actually very similar to the estimates made in the previous paragraph where we used the observed ΔT to correct the ΔF SW↑ (converted to a trend for Fig. 14) of the all-emissions runs (UKESM1 and HadGEM). This hints that the magnitude of the SST change may be more important than the spatial pattern for F SW↑ in the N. Atlantic, leaving open the possibility that the cloud feedbacks or aerosol forcing in the model are incorrect. However, the uncertainties are large and further work is needed to determine this.

Table 4. λ values (W m −2 K −1 ; see Eq. 5) for the UKESM-based (AerChemMIP-piAer) and HadGEM-based (DAMIP-Hist-GHG) greenhouse gas-only simulations (or proxies) for three different time periods.

Simulation                          1850-1970   1971-2014   1985-2014
AerChemMIP-GHG-only-proxy           −3.4        −3.4        −3.5
DAMIP-Hist-GHG (HadGEM GHG-only)    −3.4        −2.5        −1.8

Discussion and conclusions
In this study we used the HadGEM global coupled climate model and the UKESM1 Earth system model to explore the factors driving historical changes in F SW↑ for the North Atlantic region for ocean grid boxes that contained little sea ice. We found that there is a positive trend in F SW↑ between 1850 and 1970 and then a negative trend until 2014. The analysis shows that the pre-1970 trend is mainly driven by an increase in cloud droplet concentrations (N d ) due to increases in aerosol emissions, and the trend in the later period is mainly driven by a decrease in cloud fraction, likely due to cloud feedbacks caused by greenhouse gas-induced warming.
We also examined the relative effects of aerosol radiative forcing and climate feedbacks on the change in F SW↑ . In the pre-1970 period, aerosol-induced cooling and greenhouse gas warming roughly counteracted each other so that there was little cloud feedback effect. Therefore, in this period aerosol forcing is the dominant cause of changes in F SW↑ . However, in the post-1970 period the warming from greenhouse gases intensified, leading to a large warming over the North Atlantic and reduction in F SW↑ from cloud feedbacks. Combined with a reduction in aerosol forcing during this period, this led to temperature feedbacks dominating over the aerosol forcing. This is summarized in the schematic of Fig. 15. These results suggest that it is unfeasible to use the post-1970 period (during which there are useful satellite observations) to evaluate and constrain ACIs but that cloud feedbacks might be usefully evaluated, although it may be possible to identify smaller regions or specific times during the satellite era when the aerosol effects are stronger, e.g. when temperature changes are small.
Comparisons to satellite observations between 1985 and 2014 indicate that the model reduction in F SW↑ is too strong for both UKESM1 and HadGEM. The simulated increase in temperature during this period is also too strong. We analysed the extent to which the too-strong temperature trend could explain the excess ΔF SW↑ via cloud feedbacks. However, we find that the bias in temperature trend can only account for part of the ΔF SW↑ discrepancy given the estimated model feedback strength (λ = ∂F SW↑ /∂T ). This suggests that UKESM1 and HadGEM have biases in λ towards too strong a feedback or that the negative aerosol effective radiative forcings are too strong (a too-strong aerosol forcing would produce a positive bias in the temperature increase during the 1985-2014 period because aerosol emissions declined).

… 1985-2014 for the surface temperature UKESM atmosphere-only (AMIP) dataset (more data are available for surface temperature, but this period was chosen to coincide with the F SW↑ time period). For F SW↑ in (a), estimates of the model trend that would occur if the model surface temperature was correct (i.e. equal to the observed temperatures) are also shown ("using T obs "; ΔF SW↑,corrected in Eq. 6). For the models, box and whisker plots of the trends across all ensemble members are shown along with the trend from the ensemble mean time series and its uncertainty. The whiskers show the minimum and maximum, except when there are outliers, in which case they show the minimum and maximum of the non-outlier values. Outliers are values that are more than 1.5 times the interquartile range away from the bottom or top of the box and are represented as plus signs. The box edges are the 25th and 75th percentiles, the line within the box is the median and the filled circle is the mean.
A λ value that is too negative (too strong a cloud feedback) would directly impact the equilibrium climate sensitivity of the model (producing too much warming for a given forcing). Hence, biases in either the aerosol forcing or the feedback strength would have large implications for future climate projections for these models.
The analysis also hints that the pattern effect, whereby a particular spatial pattern of SSTs has a large influence on λ and climate sensitivity (Armour et al., 2013; Zhou et al., 2016; Andrews et al., 2018), is not having a large influence on λ for the North Atlantic region. This conclusion is based on the result that ΔF SW↑ from the domain-mean time series for the 1985-2014 period from a simulation that used observed SSTs and sea ice (the atmosphere-only simulation) was similar to estimates made using the UKESM1 and HadGEM coupled model data in which the observed temperature change was substituted for the modelled surface temperature change from the domain-mean time series. This suggests that it is the magnitude of the temperature change rather than the spatial pattern that leads to a difference in ΔF SW↑ between the coupled and the atmosphere-only simulations for the North Atlantic. However, the result may not extend to other regions, and uncertainties are large; further work is required to clarify this. Even if there was a large pattern effect, this would still require an explanation of why the model SST trends in the N. Atlantic were too steep and why the model SST pattern was incorrect. It is possible that the natural SST pattern exhibits a high degree of variability such that it might be difficult for a model to simulate the observed pattern, which may have been a low-probability event. We also note that some of the ensemble members did have reasonable N. Atlantic SST trends. On the other hand, the lack of SST agreement could indicate model issues.
If the model cloud feedback strength is too large, then the conclusion (based on the model results) that feedbacks are the dominant cause of the change in F SW↑ during the post-1970 period in the real world would be weakened. However, for the post-1970 period, the ΔF SW↑ value from feedbacks would have to change from −5.4 to −0.83 W m −2 in order for the feedback and aerosol forcing effects to be equal. Therefore, the conclusion is likely to remain robust. On the other hand, if the model aerosol forcing is too large, then using the correct aerosol forcing would enhance the ratio between cloud feedback and aerosol forcing and hence strengthen the conclusions. Furthermore, the strength of the aerosol forcing was decreased during UKESM1 model development, showing that an excessive forcing strength is a long-standing concern of the model developers.
A recent paper (Dong et al., 2023) examined the individual effects of changes in SSTs/sea-ice extent (SIE), aerosol emissions and GHG emissions for a similar region to that studied here. They used the Met Office GA6.0 atmosphere and land model, which is an older version of the climate model used in this study. They used atmosphere-only simulations with SSTs taken from observations and examined differences between 2000-2015 time averages and 1980-1985 time averages, which is within the post-1970 period analysed in our paper. They focused on the June, July and August (JJA) period. Their results showed that aerosol emission changes dominated the change in downwelling surface solar radiation (SSR) with little influence from SSTs/SIE or GHGs. The lack of influence from SSTs/SIE for that period is in contrast to our results where the cloud feedbacks (driven by SST changes) dominated over aerosol forcing in terms of producing changes in F SW↑ (and presumably SSR too). This difference in results could be due to a number of reasons. One is their use of observed SSTs in contrast to our simulations where the SSTs were predicted by the coupled ocean model. We showed earlier that the SST trends in our coupled model simulations for the post-1970 period were too strong, although correcting for that bias did not change our conclusions. Another potential reason is that there have been a number of advancements of the model between the version used in Dong et al. (2023) and that used in our paper. Those changes are likely to have affected the model feedback responses as well as the aerosol responses; hence, a different balance of aerosol forcing to feedbacks is perhaps expected. Finally, they focused on the JJA season, whereas we used annual averages. Further work is recommended to determine the reasons for these differences as well as to examine differences amongst various other models.
A final interesting implication that follows from our results is that the appearance of coincident peaks in the N d and F SW↑ time series from the UKESM1 and HadGEM models at around 1970 is due to chance. The decrease in F SW↑ after 1970 is almost entirely caused by the growing effects of greenhouse gas emissions on the larger-scale atmospheric and/or ocean circulation rather than the decrease in aerosols that also starts around 1970. Hence if the greenhouse gas-related effects were shifted to earlier or later in the time series (e.g. due to the rapid increase in greenhouse gas emissions occurring earlier or later), we would expect the decline in F SW↑ to occur correspondingly earlier or later such that the peaks would no longer be coincident. This can be contrasted to the situation over land where the turning point in surface SW flux has been associated with a decline in aerosol emissions (Wild et al., 2005).

Figure A1 shows the time series of F SW↑ and the cloud variables expressed as an anomaly relative to the 1850-1859 mean for the DAMIP experiments. This is similar to the AerChemMIP proxy time series shown in Fig. 6 except that for the DAMIP experiments the different emission types (aerosol, greenhouse gases and natural emissions) were applied individually, and so there is no need to approximate the effects of greenhouse gas-only and aerosol-only emissions as for AerChemMIP (see Sect. 2.2.2). Aerosol emissions (DAMIP-Hist-Aer) cause an increase in F SW↑ over time, whereas greenhouse gas emissions (DAMIP-Hist-GHG) cause a decrease. Natural aerosols (DAMIP-Hist-Nat) produce no trend over the historical period. The sum of the F SW↑ perturbations from the single-emissions runs matches the total from the HadGEM run (with all emissions) reasonably well, suggesting that the main forcing agents are accounted for and that the effects of the individual forcing agents can be combined in a linear sum to approximate the overall change in the full model.
Figure A3 summarizes the contributions of each emission type to ΔF SW↑ in the two periods for the HadGEM model. For the pre-1970 period, the ΔF SW↑ due to aerosol emissions (estimated from the DAMIP-Hist-Aer simulation) is 5.3 ± 0.81 W m −2 (see Table A1 for the values), which is much larger in magnitude than the reduction in F SW↑ caused by greenhouse gas emissions (from DAMIP-Hist-GHG; −1.9 ± 0.46 W m −2 ). However, this reduction is still important and shows that in the models with all emissions applied the effect of SW aerosol forcing is offset by around 36 % by opposing greenhouse gas effects. For the post-1970 period, there is very little contribution from aerosol emissions (0.10 ± 1.1 W m −2 ) despite a reduction in N d that is 40 % of the magnitude of the increase of the pre-1970 period. This differs from the results from the AerChemMIP proxy (Fig. 7) where the negative contribution to ΔF SW↑ from the reduction in aerosol emissions during the post-1970 period was estimated to be larger in magnitude than that from the greenhouse gas emissions increase. For the DAMIP experiment, there is a relatively large negative contribution from greenhouse gas emissions (−2.3 ± 0.68 W m −2 ) for the post-1970 period. For both periods, there is very little contribution to ΔF SW↑ from natural emissions, which justifies the assumption that aerosols and greenhouse gases are the main drivers of changes in F SW↑ that was made for the AerChemMIP calculations.

Here we consider how the individual emission types affect the underlying cloud variables that were shown in the previous sections to drive the F SW↑ changes (Fig. A3). The pre-1970 results for f c for the DAMIP experiments are very similar to those from AerChemMIP, with opposing effects on f c from greenhouse gases and aerosols giving little overall f c change. However, post-1970 aerosol emissions actually cause a small increase in f c for DAMIP, which is consistent with the near-zero change in F SW↑ from aerosols. In contrast, for AerChemMIP aerosols caused a decrease in F SW↑ and f c .
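The roughly 36 % offset quoted above for the pre-1970 HadGEM results follows directly from the two DAMIP central values. The short sketch below reproduces that arithmetic; the variable names are illustrative only, the quoted uncertainties are not propagated, and the small natural-emissions contribution is ignored.

```python
# Pre-1970 DAMIP central estimates quoted in the text (W m-2).
aerosol_df = 5.3   # DAMIP-Hist-Aer contribution
ghg_df = -1.9      # DAMIP-Hist-GHG contribution

# Fraction of the aerosol-driven increase cancelled by greenhouse gas effects.
offset_fraction = abs(ghg_df) / aerosol_df

# Linear sum of the two contributions (natural emissions neglected here).
net_df = aerosol_df + ghg_df

print(f"offset: {100 * offset_fraction:.0f} %")
print(f"aerosol + GHG sum: {net_df:.1f} W m-2")
```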

A1 North Atlantic time series for HadGEM
The results for changes in L are broadly similar between the DAMIP and AerChemMIP, particularly for the pre-1970 period. However, for the post-1970 period the error bars are quite large, which likely explains any differences.

A3.3 Effect of emissions on surface temperature
For the pre-1970 period, the DAMIP results are similar to the AerChemMIP ones. For the post-1970 period, aerosol emissions in the DAMIP experiment cause the surface temperature to decrease slightly, whereas in AerChemMIP they caused a relatively large increase. This opposing behaviour is consistent with the decrease in F SW↑ caused by aerosols in AerChemMIP and near-zero change in F SW↑ in DAMIP since it is likely that the decrease in F SW↑ in AerChemMIP might cause a warming, but also the warming was shown to

A4.2 Greenhouse gas-only emissions
The results for the DAMIP-Hist-GHG simulations (Fig. A5) are very similar to those from AerChemMIP for both periods.
A5 Aerosol forcing vs. cloud-climate feedbacks

A5.1 Forcing vs. feedbacks for F SW↑
The DAMIP results (Fig. A6) are similar to those from AerChemMIP for the pre-1970 period. However, for the post-1970 period cloud-climate feedbacks in the aerosol-only DAMIP-Hist-Aer simulation drive an increase in F SW↑ (consistent with the decrease in surface temperature) as opposed to a decrease in F SW↑ in AerChemMIP (consistent with the increase in surface temperature). As a result, the overall cloud-climate feedback term driven by the combination of aerosols and greenhouse gases is smaller for DAMIP than for AerChemMIP. However, the overall feedback is still larger than the aerosol radiative forcing for DAMIP/HadGEM.

D. P. Grosvenor and K. S. Carslaw: Short-wave flux trends in the North Atlantic

A5.2 Forcing vs. feedbacks for f c , N d , L
For the pre-1970 period, the DAMIP results for f c (Fig. A7) are again similar to those from AerChemMIP. For the post-1970 period, the cloud feedbacks cause an increase in f c for DAMIP-Hist-Aer, whereas they caused a decrease in AerChemMIP, which is consistent with the respective changes in temperature. For the pre-1970 period, the DAMIP results for changes in L (Fig. A8) are similar to those from AerChemMIP. As for AerChemMIP, there is a large estimated change in L due to the cloud-climate feedback term in the aerosol-only run when calculated as the difference between the total change in L and the estimated change from aerosol radiative forcing. Again, though, this feedback term is a lot larger than the feedback term estimated using λ (Eq. 5) from the greenhouse gas-only simulation and T from the aerosol-only simulation. This suggests that this discrepancy is not due to the particular model setup of either UKESM1 or HadGEM and that it is a robust result between the two sets of ensemble runs. We can speculate that the discrepancy might be due to the temperature change in the NA not being the controlling factor for cloud feedbacks onto L (e.g. the temperature change elsewhere may be more important) or that this result is spurious due to the noisy nature of the L time series (Fig. A1). Further research is needed to fully determine the cause.
For the post-1970 period, the overall feedback term for L for HadGEM is positive, whereas it was negative for UKESM1. This is partly driven by a positive feedback term in DAMIP-Hist-Aer (instead of negative in AerChemMIP-aerosol-only-proxy) and a larger feedback term in DAMIP-Hist-GHG than in AerChemMIP-GHG-only-proxy. Again, though, the time series are noisy, and the error bars in Figs. 12 and A7 are large so that confidence in this result is low.

A6 Summary
The HadGEM-based DAMIP results are broadly similar to those from the UKESM1-based AerChemMIP experiments. The most prominent discrepancy is the lack of reduction in F SW↑ , f c and surface temperature during the post-1970 period for the DAMIP-Hist-Aer (aerosol-only) ensemble; all of these quantities reduce for the AerChemMIP-aerosol-only-proxy ensemble. Here we can only speculate about possible reasons for this; further work would be needed to draw conclusions. One possibility is that the dedicated aerosol-only DAMIP-Hist-Aer simulation allows for the AMOC to increase in strength until 1970 due to the increase in aerosols over that period, which is an effect of aerosol forcing seen in many models (Menary et al., 2020; Robson et al., 2022). This may prevent a rapid response of the climate in the NA to the post-1970 reduction in aerosols due to inertia in the AMOC, perhaps related to ocean heat storage, sea-ice changes, etc. Such effects would not be captured by the estimate of aerosol-only effects from the AerChemMIP-aerosol-only-proxy time series. Another possibility is that the temperature and cloud-climate feedbacks in DAMIP-Hist-Aer are being controlled by changes in aerosols outside of the NA region where aerosols may continue to rise after 1970 (e.g. Asia). This idea is supported by the dominance of cloud feedbacks in determining the change in f c in DAMIP-Hist-Aer for the post-1970 period (Fig. A7). This hypothesis would require an explanation for why this result is not seen in the AerChemMIP-aerosol-only-proxy ensemble, suggesting a non-linearity between the effects of the aerosol-only and greenhouse gas-only simulations in explaining the all-emissions simulation results.

Here we examine the DAMIP single-emission experiment results. The DAMIP experiments may provide some extra insight into the range of possible behaviour given the likely high degree of natural variability.
Furthermore, they are based on the HadGEM model rather than the UKESM1 model and hence may display some different behaviour due to the slightly different model physics and settings. We also use the DAMIP experiments in order to validate some of the assumptions made when using the AerChemMIP experiment to approximate single-emission experiments. For example, for the AerChemMIP experiment there is not a true aerosol-only or greenhouse gas-only simulation as there are for the DAMIP experiment. We make the assumption that aerosols and greenhouse gases are the main factors that cause changes in the variables of interest. This means that the AerChemMIP-piAer experiment, where the full set of historical emissions are used except for aerosols for which preindustrial (PI) emissions are used, would be equivalent to a greenhouse gas-only experiment. A proxy for an aerosol emissions-only experiment was estimated by subtracting the time series from AerChemMIP-piAer from the full emissions simulations. We do the same here for the DAMIP experiment to estimate the accuracy of the AerChemMIP estimate. Figure A1 shows the proxy aerosol emissions-only time series calculated using the DAMIP results (HadGEM minus DAMIP-Hist-GHG). Comparison of these with the DAMIP-Hist-Aer (true aerosol-only emission experiment) time series shows that the two time series and trends are very similar for all variables, suggesting that the approach used for the calculation of the AerChemMIP-aerosol-only-proxy time series is valid. Table B1 lists those results for ΔF SW↑ . The estimated change in F SW↑ (aerosol proxy) for the pre-1970 period is very similar to that from the DAMIP-Hist-Aer experiment suggesting that the proxy works well. For the post-1970 period, the uncertainties in the ΔF SW↑ values are large for DAMIP-Hist-Aer and the proxy, so a comparison is not meaningful.

Appendix B: Testing the assumptions made for AerChemMIP aerosol and greenhouse gas-only proxies
The table also compares the results of the offline radiative calculations used to estimate the change in F SW↑ and contributions to the changes in F SW↑ from changes in the different cloud variables for the aerosol-only runs. For the pre-1970 period, the aerosol proxy values agree with the DAMIP-Hist-Aer values within the uncertainties, suggesting that the aerosol proxy estimates are sufficient for the AerChemMIP UKESM1 analysis in the main text. The uncertainties are very large for the post-1970 period, and so it is difficult to assess the accuracy of the aerosol proxy method. However, the offline analysis of the contributions to the change in F SW↑ from the change in cloud variables is highlighted as being uncertain in the main text, and is not relied upon for the conclusions that are drawn. We also note that the use of the aerosol proxy only applies for the analysis of the aerosol-only emissions and not the other runs.

Appendix C: Estimation of aerosol forcing based on nudged simulations
Here we utilize output from the same controlled meteorology (nudged) simulations as used in Grosvenor and Carslaw (2020), but for the region of the North Atlantic used in the current paper, to estimate the changes due to aerosol that would occur in the coupled simulations (UKESM1, HadGEM, AerChemMIP and DAMIP) if there were no responses of the large-scale atmospheric or ocean circulation to the aerosol forcing. This was the case in the Grosvenor and Carslaw (2020) simulations, where the nudging and use of prescribed SSTs prevent such responses. From the Grosvenor and Carslaw (2020) simulations, changes in various quantities and a change in N d were calculated, and therefore the sensitivities of the quantities to N d changes that result from changing from 1850 to 2009 aerosol emissions could also be calculated. We assume that the same sensitivity (when expressed in relative form) would occur in the coupled runs, and we then scale these sensitivities by the ΔN d from the coupled runs for both the pre-1970 period and the post-1970 period to estimate the change in quantity x, where x is either F eff ari , F eff aci , f c or L. We express the sensitivities in relative form to account for the different mean values between the simulations of Grosvenor and Carslaw (2020) and the coupled simulations. We find that the aerosol increase in Grosvenor and Carslaw (2020) causes a 59 % increase in N d , a 1.1 % increase in f c , a 0.82 % increase in L and a 2.7 % increase in F SW↑ . The F SW↑ increase was further split into a 2.2 % increase from F eff aci and a 0.5 % increase from F eff ari , again based on the values of these from the nudged runs.
We can then use these percentage changes in the various quantities (denoted Δx eff nudged (%)) and the percentage changes in N d (ΔN eff d nudged (%)) to estimate the percentage changes in x that would occur in the coupled simulations if the sensitivity of these quantities to N d were the same as in the Grosvenor and Carslaw (2020) simulations:

$$\Delta x^{eff}_{coupled}(\%) = \frac{\Delta x^{eff}_{nudged}(\%)}{\Delta N^{eff}_{d\,nudged}(\%)}\,\Delta N_{d\,coupled}(\%), \tag{C1}$$

where

$$\Delta N_{d\,coupled}(\%) = 100\,\frac{\Delta N_{d\,coupled}}{\overline{N}_{d\,coupled}}.$$

Here ΔN d coupled is the change in N d over the period in question from the coupled run, and $\overline{N}_{d\,coupled}$ is a mean value of N d from the coupled run used as a baseline for the relative changes. Here we use the mean over the first 5 years of the period. Δx coupled can then be calculated using

$$\Delta x_{coupled} = \frac{\Delta x^{eff}_{coupled}(\%)}{100}\,\overline{x}_{coupled},$$

where Δx eff coupled (%) comes from Eq. (C1), and $\overline{x}_{coupled}$ is the mean value of x from the coupled run taken over the first 5 years of the period.
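The scaling described in this appendix can be sketched as follows. The nudged-run percentage changes are the values quoted above; the function names and all coupled-run inputs in the usage example are hypothetical and are included only to show the mechanics, not to reproduce any result from the coupled simulations.

```python
# Percentage changes from the nudged simulations, as quoted in the text.
# F_aci and F_ari are expressed as percentages of F_SW_up.
NUDGED_PCT = {
    "N_d": 59.0,
    "f_c": 1.1,
    "L": 0.82,
    "F_SW_up": 2.7,
    "F_aci": 2.2,
    "F_ari": 0.5,
}


def percent_change(delta, baseline_mean):
    """Relative change (%) of a quantity with respect to a baseline mean."""
    return 100.0 * delta / baseline_mean


def scaled_change(name, delta_nd_coupled, mean_nd_coupled, mean_x_coupled):
    """Estimate the aerosol-driven change in quantity `name` in a coupled run.

    The nudged relative sensitivity to N_d is assumed to carry over unchanged
    (Eq. C1), and the result is converted back to absolute units using the
    baseline mean of the quantity from the coupled run.
    """
    sensitivity = NUDGED_PCT[name] / NUDGED_PCT["N_d"]   # % change per % N_d
    dnd_pct = percent_change(delta_nd_coupled, mean_nd_coupled)
    dx_pct = sensitivity * dnd_pct                        # Eq. (C1)
    return dx_pct * mean_x_coupled / 100.0


# HYPOTHETICAL coupled-run inputs: N_d rises by 30 cm-3 from a 60 cm-3
# baseline, and the baseline cloud fraction is 0.7.
delta_fc = scaled_change("f_c", delta_nd_coupled=30.0,
                         mean_nd_coupled=60.0, mean_x_coupled=0.7)
print(f"estimated change in f_c: {delta_fc:.4f}")
```

Expressing everything in relative form, as in the text, means the baselines of the nudged and coupled runs do not need to match; only the ratio of percentage changes is transferred between them.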

Appendix D: Summation of terms in feedback figures
The overall change in a quantity X in the AerChemMIP-aerosol-only-proxy experiment is denoted as ΔX aer and is estimated as the difference between the all-emissions experiment (AerChemMIP-all-emissions) and the GHG-only proxy (AerChemMIP-GHG-only-proxy; see Sect. 2.2.2):

$$\Delta X_{aer} = \Delta X_{all} - \Delta X_{GHG}. \tag{D1}$$

For the AerChemMIP and DAMIP aerosol-only experiments, feedbacks are estimated from the difference between ΔX aer and the change due to aerosol ERF (ΔX eff aer ) following Eq. (2). For the AerChemMIP experiments, following Eq. (D1), Eq. (2) becomes

$$\Delta X^{feedback}_{aer} = \Delta X_{aer} - \Delta X^{eff}_{aer} = \Delta X_{all} - \Delta X_{GHG} - \Delta X^{eff}_{aer}. \tag{D2}$$

The total feedback is calculated in Figs. 10, 11 and 12 as ΔX feedback aer plus the feedback from the GHG-only experiment (ΔX feedback GHG ). Since the feedback for the GHG-only experiment is assumed to be ΔX GHG , we have the following (using Eq. D2):

$$\Delta X^{feedback}_{tot} = \Delta X^{feedback}_{aer} + \Delta X^{feedback}_{GHG} = \Delta X^{feedback}_{aer} + \Delta X_{GHG} = \Delta X_{all} - \Delta X^{eff}_{aer}.$$

It then follows that the total feedback plus the aerosol forcing is

$$\Delta X^{feedback}_{tot} + \Delta X^{eff}_{aer} = \Delta X_{all},$$

such that this term is exactly equal to the change in X from the all-emissions run. However, this does not apply to the DAMIP results for which Eq. (D1) does not apply.
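The bookkeeping in this appendix can be checked numerically with a short sketch. The function and key names are illustrative, and the input values in the usage example are hypothetical; the point is only that, under the proxy definition of Eq. (D1), the total feedback plus the aerosol forcing term recovers the all-emissions change by construction.

```python
def decompose(delta_all, delta_ghg, delta_aer_eff):
    """Split a change in X into forcing and feedback terms (AerChemMIP proxy)."""
    delta_aer = delta_all - delta_ghg            # Eq. (D1): aerosol-only proxy
    feedback_aer = delta_aer - delta_aer_eff     # Eq. (D2): aerosol-run feedback
    feedback_ghg = delta_ghg                     # GHG-only change is all feedback
    feedback_tot = feedback_aer + feedback_ghg
    return {
        "aer": delta_aer,
        "feedback_aer": feedback_aer,
        "feedback_ghg": feedback_ghg,
        "feedback_tot": feedback_tot,
    }


# HYPOTHETICAL changes in X (arbitrary units) for illustration only.
terms = decompose(delta_all=-4.0, delta_ghg=-3.0, delta_aer_eff=-0.5)

# Total feedback plus aerosol forcing equals the all-emissions change.
closure = terms["feedback_tot"] + (-0.5)
print(terms, closure)
```

Because the closure is an algebraic identity, it holds for any inputs; it does not hold for DAMIP, where the aerosol-only change comes from a dedicated simulation rather than from Eq. (D1).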