The decomposition of cloud–aerosol forcing in the UK Earth System Model (UKESM1)

Climate variability in the North Atlantic influences processes such as hurricane activity and droughts. Global model simulations have identified aerosol–cloud interactions (ACIs) as an important driver of sea surface temperature variability via surface aerosol forcing. However, ACIs are a major cause of uncertainty in climate forcing; therefore, caution is needed in interpreting the results from coarse-resolution, highly parameterized global models. Here, we separate and quantify the components of the surface shortwave effective radiative forcing (ERF) due to aerosol in the atmosphere-only version of the UK Earth System Model (UKESM1) and evaluate the cloud properties and their radiative effects against observations. We focus on a northern region of the North Atlantic (NA) where stratocumulus clouds dominate (denoted the northern NA region) and a southern region where trade cumulus and broken stratocumulus dominate (southern NA region). Aerosol forcing was diagnosed using a pair of simulations in which the meteorology is approximately fixed via nudging to analysis; one simulation has pre-industrial (PI) and one has present-day (PD) aerosol emissions. This model does not include aerosol effects within the convective parameterization (but aerosol does affect the clouds associated with detrainment) and so it should be noted that the representation of aerosol forcing for convection is incomplete. Contributions to the surface ERF from changes in cloud fraction (fc), in-cloud liquid water path (LWPic) and droplet number concentration (Nd) were quantified. Over the northern NA region, increases inNd and LWPic dominate the forcing. This is likely because the already-high fc there reduces the chances of further large increases in fc and allows cloud brightening to act over a larger region. Over the southern NA region, increases in fc dominate due to the suppression of rain by the additional aerosols. Aerosol-driven increases in macrophysical cloud properties (LWPic and fc) will rely on the response of the boundary layer parameterization, along with input from the cloud microphysics scheme, which are highly uncertain processes. Model grid boxes with low-altitude clouds present in both the PI and PD dominate the forcing in both regions. In the northern NA, the brightening of completely overcast low cloud scenes (100 % cloud cover, likely stratocumulus) contributes the most, whereas in the southern NA the creation of clouds with fc of around 20 % from clear skies in the PI was the largest single contributor, suggesting that trade cumulus clouds are created in response to increases in aerosol. The creation of near-overcast clouds was also important there. The correct spatial pattern, coverage and properties of clouds are important for determining the magnitude of aerosol forcing, so we also assess the realism of the modelled PD clouds against satellite observations. We find that the model reproduces the spatial pattern of all the observed cloud variables well but that there are biases. The shortwave top-of-the-atmosphere (SWTOA) flux is overestimated by 5.8 % in the northern NA region and 1.7 % in the southern NA, which we attribute mainly to positive biases in lowaltitude fc.Nd is too low by−20.6 % in the northern NA and too high by 21.5 % in the southern NA but does not contribute greatly to the main SWTOA biases. Cloudy-sky liquid water path mainly shows biases north of Scandinavia that reach bePublished by Copernicus Publications on behalf of the European Geosciences Union. 15682 D. P. Grosvenor and K. S. Carslaw: Aerosol forcing in the UKESM model tween 50 % and 100 % and dominate the SWTOA bias in that region. The large contribution to aerosol forcing in the UKESM1 model from highly uncertain macrophysical adjustments suggests that further targeted observations are needed to assess rain formation processes, how they depend on aerosols and the model response to precipitation in order to reduce uncertainty in climate projections.

Abstract. Climate variability in the North Atlantic influences processes such as hurricane activity and droughts. Global model simulations have identified aerosol-cloud interactions (ACIs) as an important driver of sea surface temperature variability via surface aerosol forcing. However, ACIs are a major cause of uncertainty in climate forcing; therefore, caution is needed in interpreting the results from coarse-resolution, highly parameterized global models.
Here, we separate and quantify the components of the surface shortwave effective radiative forcing (ERF) due to aerosol in the atmosphere-only version of the UK Earth System Model (UKESM1) and evaluate the cloud properties and their radiative effects against observations. We focus on a northern region of the North Atlantic (NA) where stratocumulus clouds dominate (denoted the northern NA region) and a southern region where trade cumulus and broken stratocumulus dominate (southern NA region). Aerosol forcing was diagnosed using a pair of simulations in which the meteorology is approximately fixed via nudging to analysis; one simulation has pre-industrial (PI) and one has present-day (PD) aerosol emissions. This model does not include aerosol effects within the convective parameterization (but aerosol does affect the clouds associated with detrainment) and so it should be noted that the representation of aerosol forcing for convection is incomplete.
Contributions to the surface ERF from changes in cloud fraction (f c ), in-cloud liquid water path (LWP ic ) and droplet number concentration (N d ) were quantified. Over the northern NA region, increases in N d and LWP ic dominate the forcing. This is likely because the already-high f c there reduces the chances of further large increases in f c and allows cloud brightening to act over a larger region. Over the southern NA region, increases in f c dominate due to the suppression of rain by the additional aerosols. Aerosol-driven increases in macrophysical cloud properties (LWP ic and f c ) will rely on the response of the boundary layer parameterization, along with input from the cloud microphysics scheme, which are highly uncertain processes.
Model grid boxes with low-altitude clouds present in both the PI and PD dominate the forcing in both regions. In the northern NA, the brightening of completely overcast low cloud scenes (100 % cloud cover, likely stratocumulus) contributes the most, whereas in the southern NA the creation of clouds with f c of around 20 % from clear skies in the PI was the largest single contributor, suggesting that trade cumulus clouds are created in response to increases in aerosol. The creation of near-overcast clouds was also important there.
The correct spatial pattern, coverage and properties of clouds are important for determining the magnitude of aerosol forcing, so we also assess the realism of the modelled PD clouds against satellite observations. We find that the model reproduces the spatial pattern of all the observed cloud variables well but that there are biases. The shortwave top-of-the-atmosphere (SW TOA ) flux is overestimated by 5.8 % in the northern NA region and 1.7 % in the southern NA, which we attribute mainly to positive biases in lowaltitude f c . N d is too low by −20.6 % in the northern NA and too high by 21.5 % in the southern NA but does not contribute greatly to the main SW TOA biases. Cloudy-sky liquid water path mainly shows biases north of Scandinavia that reach be-

Introduction
Uncertainty in the radiative forcing (RF) from aerosols is the largest of the climate RF uncertainties over the industrial period (Boucher et al., 2013). General circulation models (GCMs) that simulate a small magnitude of cooling from aerosols are able to reproduce the observed temperature record if they have a low climate sensitivity and vice versa, which results in large uncertainties in climate sensitivity and therefore also in temperature change predictions (Andreae et al., 2005;Golaz et al., 2013). Mülmenstädt and Feingold (2018) suggests that one reason for the lack of progress in reducing the uncertainty in aerosol forcing despite years of research is that there has been a lack of studies that target (via evaluation and improvement) the individual processes that cause aerosol-cloud interactions within GCMs.
Aerosol effective radiative forcing (ERF, which differs from RF in that all physical variables are allowed to respond to perturbations except for those concerning the ocean and sea ice; e.g. see Myhre et al., 2013) can be separated into a component due to aerosol radiative interactions (ARIs) that occur outside of clouds (sometimes also known as the direct effect) and a component due to aerosol-cloud interactions (ACIs, or indirect effects). The ACI ERF is often also broken down into two further components. The first is due to an increase in cloud droplet concentration (N d ) alone at constant liquid water content (LWC) and constant cloud fraction (f c ), which causes a decrease in the cloud droplet effective radius (r e ). Here, this is termed ERF N d (or often the Twomey effect; Twomey, 1977). The second ERF component concerns rapid adjustments of LWC (or the vertical integral of this, which is the liquid water path, LWP) for only the cloudy parts of model grid boxes (termed in-cloud LWP, or LWP ic here) and/or adjustments in f c that occur in response to the initial decrease in droplet size associated with the N d increase. These are termed ERF LWP ic and ERF f c , respectively.
The mechanisms that cause the adjustments involve several microphysical and thermodynamical processes (Albrecht, 1989;Stevens et al., 1998;Ackerman et al., 2004;Bretherton et al., 2007;Hill et al., 2009;Berner et al., 2013;Feingold et al., 2015). Simulation of adjustments in GCMs therefore requires the parameterization of many subgridscale processes, which are likely to be more difficult for GCMs to get right than ERF N d , where only the change in r e needs to be parameterized. It is therefore desirable to separate ERF N d and the adjustment effects within GCMs so that they can be evaluated (against observations and highresolution models) and improved individually. The observational constraint of ERF N d (which is likely easier than the constraint of adjustments) might then reduce the overall forcing uncertainty and highlight issues with the adjustment part of the forcing. Separating the adjustments into ERF LWP ic and ERF f c components will further facilitate more detailed process-level improvements. A further reason to separate the different components is that current models simulate the same forcing with different combinations of ERF N d and adjustment components (Gryspeerdt et al., 2020). Therefore, one aim of this study is to separate and quantify the ERF contributions from N d , LWP ic and f c changes in a GCM. A second aim is to also quantify the amount of aerosol forcing from a GCM that originates from different cloud types and changes between cloud types. This too will allow a more targeted model evaluation and improvement in future work.
The aerosol forcing of GCMs is also important regionally, for example, in the North Atlantic (NA) region, which is the focus of this paper. It has been suggested (Booth et al., 2012, hereafter B12) that surface radiative aerosol forcing is the dominant driver of the variability in multi-decadal NA sea surface temperatures (SSTs) for the ocean-atmosphere coupled GCM (the UK Met Office HadGEM2-ES model) that was used in the fifth Coupled Model Intercomparison Project (CMIP5). NA SST variability has been linked to impacts on important climate phenomena such as hurricane and tropical storm activity (Zhang and Delworth, 2006;Smith et al., 2010;Dunstone et al., 2013); rainfall anomalies in Europe and North America (Sutton and Hodson, 2005;Sutton and Dong, 2012); droughts in the African Sahel and Amazonian regions (Hoerling et al., 2006;Knight et al., 2006;Ackerley et al., 2011); Greenland ice-sheet melt rates (Holland et al., 2008;Hanna et al., 2012); anomalies in sea levels (McCarthy et al., 2015); and the midlatitude jet strength (Woollings et al., 2015). For a review of changes in the North Atlantic climate system (with a focus on more recent changes), see Robson et al. (2018).
B12 showed that HadGEM2-ES, which represented aerosol-cloud interactions, reproduced the observed NA SSTs with good fidelity in contrast to the CMIP3 models that mostly did not include aerosol-cloud effects. Furthermore, tests using constant aerosols clearly showed the impact of aerosols upon NA SST variability in HadGEM2-ES. Aerosol ARI forcing was found to be negligible in the NA compared to aerosol indirect forcing. The spatial patterns of the indirect forcing, the downwelling surface shortwave (SW) radiation anomalies and the SST anomalies were all consistent, indicating a link between the three. Moreover, a simulation with fixed SSTs showed similar incoming surface SW to that in the coupled model. This suggested that the simulated surface SW anomalies were not brought about by the modification of SSTs as a result of ocean dynamics or other processes.
Thus, the implication from the HadGEM2-ES model is that aerosol indirect forcing has a direct local impact on SSTs via the modification of surface downwelling SW radiation.
The claims made in B12 are based upon a GCM and not direct observations. As mentioned earlier, the aerosol forcing in GCMs is highly uncertain for many potential reasons. For example, B12 used a coarse N96 model resolution, and thus the model relies upon parameterizations to represent subgrid cloud formation and cloud-aerosol interactions. It could be the case that HadGEM2-ES overestimates the magnitude of the aerosol forcing and thus overstates its role in driving NA SST variability. Zhang et al. (2013) suggest that the HadGEM2-ES model has some shortcomings in its representation of the ocean that may affect its ability to properly simulate the influence of the ocean. Other papers also argue for an important role for ocean processes in determining the NA SST variability (Ba et al., 2014;Knight, 2005;Menary et al., 2015;Robson et al., 2016;R. Zhang et al., 2016;Yan et al., 2018), which may indicate a lesser role for aerosol than simulated in B12.
The aim of this paper is to quantify the mechanisms by which a global climate model (an improved version of the model used in B12) produces aerosol forcing, with a focus on the NA region. Some work breaking down the aerosol ERF of a different GCM into that due to changes in N d , LWP ic and f c has already been recently published (Mülmenstädt et al., 2019). They found that in the ECHAM-HAMMOZ model ERF LWP ic was quite similar to ERF N d for most latitudes, except in the Southern Ocean where there was a much larger ERF LWP ic contribution. ERF f c contributions were mostly between around 50 % and 75 % of the ERF N d contribution, but again in the Southern Ocean there was a larger ERF f c contribution than ERF N d contribution. This was also true in the polar regions. Globally, the overall contribution from adjustments (ERF f c and ERF LWP ic ) was −0.92 W m −2 , compared to an ERF N d contribution of −0.52 W m −2 , so that in this model more aerosol forcing is coming from the more complicated adjustment processes, highlighting the importance of evaluating these processes in more detail. Here, we perform a similar analysis but with a different model. This will allow the two models to be compared in terms of how they produce their aerosol forcing. A very different breakdown between the models would mean that one of the models was incorrect and allow the basis for more detailed evaluation. We also focus on the NA region and on surface aerosol forcing rather than top-of-the-atmosphere (TOA) forcing due to the potential importance of aerosol forcing for the climate variability via sea surface temperature changes there. The focus on the NA also allows a more detailed look at the processes than is possible from a global study. We also go beyond the study of Mülmenstädt et al. (2019) by also characterizing the cloud regimes and the changes in cloud regimes for which aerosol forcing predominantly occurs according to the model; this will then allow (in future work) these regimes to be comprehensively evaluated against observations and also against high-resolution models. High-resolution modelling needs to be targeted to a smaller selection of regimes given the high computational cost. In addition, observations will be used to evaluate the general model cloud properties (spatial positioning, areal coverage, thickness, droplet concentrations, etc.) since these will affect the resulting aerosol forcing.

Definition of liquid water path
Throughout this paper, the term "LWP" refers to the all-sky value (i.e. including both the cloudy and clear-sky portions of model grid boxes or observed regions) and LWP ic refers to the in-cloud liquid water path, which is that from the cloudy regions only. It is assumed here that LWP = f c LWP ic (e.g. as also in Seethala and Horváth, 2010).

Model details
We examine the aerosol-cloud interactions in the UKESM1-A model, which is the atmosphere-only version of the coupled UKESM1 (UK Earth System Model), which was submitted as part of the sixth Coupled Model Intercomparison Project (CMIP6). UKESM1 is based on the HadGEM3-GC3.1 physical climate model (Williams et al., 2017;Kuhlbrodt et al., 2018) but in addition couples several Earth system processes . These additional components include the MEDUSA ocean biogeochemistry model (Yool et al., 2013), the TRIFFID dynamic vegetation model (Cox, 2001) and the stratospheric-tropospheric version of the United Kingdom Chemistry and Aerosol (UKCA) chemistry model (Archibald et al., 2020). This version of the UKCA allows a more complete description of atmospheric chemistry compared to HadGEM3-GC3.1. For example, the latter uses an offline climatology for oxidants, whereas in the UKESM1 oxidants are treated explicitly.
The UKESM1-A atmosphere-only version differs from the UKESM1 in that it does not include the ocean and sea-ice models, MEDUSA and TRIFFID. Instead, the UKESM1-A configuration uses observed sea surface temperatures and sea-ice concentrations provided by the Program for Climate Model Diagnosis and Intercomparison (Taylor et al., 2000, https://pcmdi.llnl.gov/mips/amip/, last access: 11 December 2020. Vegetation (vegetation fractions, leaf area index, canopy height) and surface ocean biology fields (dimethyl sulfide and chlorophyll ocean concentrations) are prescribed from a member of the UKESM1 CMIP6 historical ensemble . This ensures that the prescribed vegetation and ocean biological fields mirror those simulated by the TRIFFID dynamic vegetation and MEDUSA scheme in the coupled historical run. The vegetation fractions are prescribed from time-varying annual means; leaf area index, dimethyl sulfide (DMS) and chlorophyll concentrations are monthly values from the time means of the 1979-2014 pe-riod; and the canopy heights are an overall time mean for 1979-2014. All other emissions of gases and particles from sea and land surfaces are prescribed from the CMIP6 inventories; see Mulcahy et al. (2020) for details of the specific implementation for the UKESM1.
The atmospheric component is the GA7.1 atmospheric configuration of the Unified Model (UM). Full details of this can be found in Walters et al. (2019) and Mulcahy et al. (2020). Here, we only describe in detail the features that are more relevant to aerosol-cloud interactions. We use an N96 horizontal resolution, which is 1.875 × 1.25 • (208 × 139 km) at the Equator. A total of 85 vertical levels are used between the surface and 85 km altitude with a stretched grid such that the vertical resolution is 13 m near the surface and around 150-200 m at the top of the boundary layer. We chose this resolution since it is the same as that used for long climate runs in the CMIP6 model intercomparison and is the same horizontal resolution as that used in the B12 study.
Aerosol number concentrations are treated prognostically with the Global Model of Aerosol Processes (GLOMAP) multi-modal scheme (Mann et al., 2010(Mann et al., , 2012, which uses five log-normal aerosol size modes and includes sulfate, sea salt, black carbon and organic matter chemical components. These aerosol components are treated as being internally mixed within each size mode. Mann et al. (2010) and Mann et al. (2012) provide further details with some small changes for the implementation within UKESM1-A described in Mulcahy et al. (2020). Mineral dust is simulated separately using the CLASSIC dust scheme (Woodward, 2001;Mulcahy et al., 2020).
Shallow, middle and deep convection are parameterized separately to other cloud types (see Walters et al., 2019, for details). The parameterizations do not take into account aerosol, nor droplet concentrations, and they use their own simplified microphysics scheme. Therefore, the representation of aerosol forcing is incomplete. For clouds that are not shallow, middle or deep convection (termed large-scale clouds), the UKCA-Activate scheme is used to calculate cloud droplet concentrations from the aerosol size distribution using the West et al. (2014) scheme based on the parameterization of droplet activation in Abdul-Razzak and Ghan (2000). Cloud droplet concentrations at cloud base are replicated vertically throughout contiguous columns of clouds. The cloud droplet activation scheme is a diagnostic scheme since it is run on each model time step without consideration of how many cloud droplets were present before. The cloud droplet number concentration is then passed to the radiation and microphysics schemes.
The large-scale cloud microphysics is a single-moment scheme, such that the mass of liquid water, but not the cloud droplet number, is advected by the model and retained in memory between model time steps. It is based on Wilson and Ballard (1999) but with improvements to the warm rain parameterizations suggested by Boutle et al. (2014), which include bug fixes, a treatment of rain fraction that is con-sistent with the prognostic rain formulation, a switch to the Khairoutdinov and Kogan (2000) parameterization for autoconversion and accretion, and a bias correction for the latter processes to deal with subgrid variability of clouds and rainwater. Relative to Wilson and Ballard (1999), there is also an improved treatment of drizzle rates (Abel and Shipway, 2007) and a prognostic treatment of rain that allows the three-dimensional advection of precipitation. The introduction of the latter required modifications to be made to the aerosol wet scavenging processes as described in Mulcahy et al. (2020). The bulk properties (cloud fraction, cloud liquid water content, vertical overlap, etc.) of large-scale clouds are parameterized using the prognostic cloud fraction and prognostic condensate (PC2) scheme (Wilson et al., 2008a, b) with modifications described in Morcrette (2012). The atmospheric boundary layer is parameterized using the turbulence closure scheme of Lock et al. (2000) with modifications described in Lock (2001) and Brown et al. (2008).
There are some differences in the treatment of aerosols in UKESM-A and the physical climate model (HadGEM-GC3.1) primarily related to natural aerosols, aerosol chemistry and the prescription of anthropogenic SO 2 emissions (see Mulcahy et al., 2020, for details).

Simulation details
The model is run from 1 March 2009 to 28 March 2010. The first 27 d were used to spin up the aerosol and chemistry fields, leaving 1 year of data for analysis. The time period was chosen to allow comparisons with relevant field campaigns that will be performed in future work. Two parallel simulations were performed: one using pre-industrial (PI) CMIP6 aerosol emissions from the year 1850 and the other using present-day (PD) emissions (i.e. the CMIP6 emissions corresponding to the simulation period). Both simulations were nudged every 6 h to ERA-Interim horizontal wind fields between ∼ 2277 and ∼ 47 251 m (applied between the 18th and 76th model levels from the surface). The nudging ensures that the meteorology in the two runs is very similar, thereby allowing cloud radiative effects due solely to the aerosol perturbation to be calculated. Following the recommendations of Zhang et al. (2014), we do not nudge the temperature field in order to allow fast-acting local responses to aerosolinduced temperature changes (such as those from precipitation suppression, ARI and semi-direct radiative effects). Instantaneous model fields are output every 27 h, which allows the sampling of the diurnal cycle and more complex output analysis than is possible using time-averaged data.
The results in the main body of the paper are based on meteorology and emissions from the period of 28 March 2009 to 28 March 2010. It is possible that the results presented vary depending on the chosen year since meteorology, cloud fields, etc. vary from year to year. To address this, we have also run the PI and PD simulations for an additional year for the period of 28 March 2010 to 28 March 2011. We

Surface forcing calculations and forcing partitioning
In this paper, we are concerned with the shortwave aerosol ERF at the surface since we are interested in the effects on SSTs and NA climate variability. To separate the aerosol ERF into ARI and ACI components, we use output from the triple calls to the radiation scheme that are made by the model every time step. One call calculates the surface SW fluxes taking into account all radiatively active components including aerosols and clouds (designated here as SW aerosol + cloudy ), one call calculates the fluxes with the background aerosol removed (i.e. the aerosol outside of clouds and interstitial aerosol; aerosol Twomey and adjustment effects on the clouds are still included here; SW clean+cloudy ), and one call calculates the fluxes with both the background aerosol and clouds removed (SW clean+clear ). These are detailed in Table 1. The instantaneous changes in SW due to ARIs and ACIs for a given grid box are then calculated as SW ARI = SW aerosol + cloudy − SW clean+cloudy SW ACI = SW clean+cloudy − SW clean+clear .
The corresponding instantaneous radiative forcings due to anthropogenic aerosols (ERF ARI and ERF ACI ) can then be calculated from simulations with PI and PD aerosol emissions: This is similar to the technique used in Ghan (2013) and Jiang et al. (2016). We also decompose the surface ACI aerosol forcing into contributions from changes in N d , LWP ic and f c between the PI and PD. This is achieved using an offline calculation of the surface SW fluxes, which allows the magnitude of each term to be assessed individually. The approach is described in Grosvenor et al. (2017) and in Appendix F. We also note that TOA fluxes could also be decomposed using this technique, but we focus on the surface forcing here.

Model evaluation against satellite observations
Here, we evaluate the PD simulation against satellite observations but limit our analysis to ocean regions due to the lesser reliability of satellite products over land. The motivation for the model evaluation is that without a good representation of cloud properties and spatial distribution it is likely that the model aerosol forcing will be incorrect. For example, if the simulated clouds have a cloud fraction that is too high, then a cloud albedo perturbation due to aerosol would be larger than it is in reality. Placement of clouds in the wrong locations could affect forcing via the differing incoming solar insolation, or it could affect their chances of interacting with sources of aerosol. Where possible, we use the COSP (Cloud Feedback Model Intercomparison Project Observation Simulator Package; Bodas-Salcedo et al., 2011) satellite simulator model output for the relevant satellite in order to get a fairer comparison between the model and satellite. This is particularly important when comparing cloud fractions since the cloud detection threshold (e.g. in terms of optical depth) varies between the different satellite instruments and needs to be consistent with the threshold LWC used to define a cloud in the model. Furthermore, COSP accounts for the effect of overlying layers of clouds, which affects the observation of underlying clouds.
Following Gustafson and Yu (2012), model biases are quoted in terms of the normalized mean bias factor (NMBF) and the normalized mean absolute error factor (NMAEF), which are both expressed as a percentage, in order to provide unbiased metrics that are symmetric about zero. These quantities are similar to the mean percentage bias and the root mean square error (RMSE) except that account is taken of whether the model is greater or lower than the observations. In addition, the NMAEF does not involve squaring the error, which can lead to an exaggerated influence from larger biases. Appendix B provides the definitions for these metrics. Table 2 lists these bias metrics (along with the spatial correlation coefficient r, between the model and the observations) for the different regions considered (see next section) and for the different cloud variables. It should also be borne in mind that, as well as being due to issues with the representation of clouds in the model, differences in cloud properties between the model and satellite might also be due to cloud adjustments to aerosol that are too strong or weak, and that it is difficult to distinguish between the two.

Cloud fraction evaluation
The distribution of time-mean low-altitude f c from the model is compared to CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation) satellite lidar observations in Fig. 1. Version 3 of the CALIPSO-GOCCP (GCM Oriented Cloud Calipso Product; Chepfer et al., 2010) is used, which is available at https://climserv.ipsl.polytechnique.fr/ cfmip-obs/, last access: 11 December 2020. The model has a good spatial correlation with the satellite (r = 0.83) over the region shown, with both model and observations showing higher cloud fractions in the northern part of the NA where the time-mean values can reach around 70 % especially towards the west near the coast of Canada and north of Iceland and Scandinavia. More than 40 %-50 % of the low-altitude clouds in these regions are stratocumulus (Wood, 2012). The model has a positive bias of around 30 %-40 % in the region off the eastern Canadian coast and along the east coast of the US. There is also a band of higher cloud fractions (around 20 %-30 %) down the west coast of Africa in both the model and observations; this region is also associated with a high occurrence of stratocumulus (40 %-60 % of the low clouds according to Wood, 2012). Model biases are minimal in these regions. The model and observations both show lower cloud fractions in the southwest region of the domain.
The model captures the main pattern of low-altitude cloud fraction well, giving us some confidence that it provides a realistic representation of the broad cloud types and locations in the NA. However, the model uses observed SSTs, and the winds above the boundary layer are nudged to reanalyses, which will help the model to replicate the correct cloud patterns to some degree. But even with the correct SSTs and wind fields, accurate simulation of the spatial pattern of f c is a strong test of the model boundary layer and cloud schemes, in particular the correct vertical thermodynamical structure and entrainment, the corresponding cloud fraction response, etc. Furthermore, the free-running ocean-coupled version of the model (UKESM1; Sellar et al., 2019), which has no wind nudging and predicts the SSTs itself, also reproduces the cloud pattern and amount well (Robson et al., 2020). This also implies that the coupled model and the nudged model used here exhibit similar cloud regimes and gives more confidence that the results in this paper apply to coupled models.
We define a subregion (marked in Fig. 1 and denoted northern North Atlantic, or northern NA) to represent the region where the stratocumulus cloud fraction is large. We also define a second region immediately to the south of the northern NA region (southern North Atlantic, or southern NA; see Fig. 1) where the annual mean stratocumulus cloud cover is lower (Wood, 2012), indicating more broken stratocumulus and/or a lower frequency of occurrence of stratocumulus. The overall low-altitude f c area-weighted biases for these regions were 5.1 % and −23.9 % for the northern and southern NA regions, respectively (see Table 2).
For mid-altitude clouds (Fig. 2), the spatial pattern is again captured well by the model (r = 0.80). Percentage biases are generally low in the northern NA region with a mean model bias of −11.0 %. There are larger negative biases in the southern part of the domain with a mean bias of −25.8 % for the southern NA region; however, the observed cloud fractions are very low in those regions, making higher percentage biases more likely. There are also positive biases of up to around 30 % in the stratocumulus region north of Scandinavia.
The modelled high-altitude cloud fraction is generally biased high over the ocean, with mean biases of 7.3 % for the northern NA and 4.5 % for the southern NA, although the spatial pattern is again good (r = 0.79 for the whole region). Positive biases are particularly evident in the region north of around 60 • N, where biases of up to 90 % occur. Since low-altitude clouds are likely to be the most important for aerosol-cloud interactions, we will leave an investigation of the biases in the mid-and high-altitude clouds to future research.

LWP evaluation
Liquid water path is important for cloud optical depth and hence cloud albedo. For example, for a fully overcast cloud with a liquid water content that increases with height in a manner consistent with adiabatic uplift, the optical depth (τ c ) is proportional to LWP 5/6 ic N 1/3 d . Therefore, a given relative change in LWP ic produces more than twice the relative change in τ c that the same relative change in N d does. LWP ic is also important in terms of cloud physics since it helps to determine rain rates and droplet sizes.
Microwave satellite instruments such as AMSR-E (Advanced Microwave Scanning Radiometer for Earth Observing System; e.g. Wentz and Meissner, 2000), whilst only being able to retrieve over ocean surfaces, probably represent the most accurate of the available retrievals of LWP since they are not subject to the biases associated with retrievals of LWP that use a combination of visible and shortwaveinfrared wavelengths (e.g. Moderate Resolution Imaging Spectroradiometer, MODIS; Salomonson et al., 1998), they give a better representation of the total column LWP than from such retrievals, they are not affected by the presence of ice, and they are available during both the daytime and nighttime (O'Dell et al., 2008;Elsaesser et al., 2017). Note that LWP from AMSR-E is the all-sky LWP and so includes the zero values from clear regions and the LWP contributions from cloudy regions (as does the LWP output from the model), whereas MODIS retrieves the in-cloud LWP, LWP ic . However, AMSR-E retrievals are still subject to biases; Lebsock and Su (2014) suggest that the main cause of bias is the presence of rain (and the inability to directly detect whether rain is present). Therefore, as suggested in Elsaesser et al. (2017), we place more confidence in the microwave LWP observations when the ratio of the all-sky LWP to total liquid water path (TLWP) (the sum of LWP and RWP, where RWP is rainwater path) is high; we denote the ratio as f LWP . A caveat here, though, is that the partitioning of LWP and RWP from microwave instruments, and hence the estimate of f LWP , is itself uncertain.  Given the uncertainties when rain is present, we also initially only use the model LWP and not the RWP. Furthermore, we only consider the model LWP from the large-scale cloud scheme and not that from the convective parameterization. Convective LWP will be mostly associated with heavily raining environments and so the observations in such regions are again more uncertain. However, we examine and discuss how RWP and convective LWP and RWP change the model evaluation in Appendix D. The AMSR-E instrument is aboard the Aqua satellite, which has local overhead overpass times of 01:30 and 13:30 LT, and we use data from both overpasses. Therefore, when comparing to AMSR-E for LWP and RWP, the model local times within 3 h on either side of both of these times are used. This is important for LWP and RWP because cloud water content can have a large diurnal cycle. Figure 4 shows f LWP from the model and the AMSR-E satellite along with the model bias relative to the retrieval. f LWP is quite high throughout the region in both the model and the observations, with values over the ocean generally larger than around 0.8. The model bias is generally small (< 10 %), except off the east coast of the US where there is an overestimate, although with biases mostly less than 0.1 (∼ 15 %). This region corresponds to a region of high LWP (see Fig. 5) and might suggest that too much rain is produced by the model, although, given that f LWP is low, it is also a region where the satellite estimate of f LWP is likely to be more uncertain. The largely good agreement between the model and satellite for f LWP provides some confidence in the ability of the model to represent cloud and rain formation processes, but insofar as we assume the model to be realistic, it also lends confidence to the satellite estimates of this quantity for this region, which, as explained above, is also uncertain.
LWP ic values were estimated by dividing the time-mean all-sky LWP by the time-mean low-altitude cloud fraction from CALIPSO. The same was done for the model using the time-mean CALIPSO low cloud fraction from the COSP  . Time average of f LWP (ratio of LWP to the sum of LWP and RWP). For the model, LWP and RWP contributions from both the large-scale cloud scheme and the convective parameterization are included. The RWP for AMSR-E is calculated from the retrieved LWP and rain rate by inverting the retrieval algorithm, as described in Elsaesser et al. (2017). The same raindrop size distribution and fall speed relationship as for the model is used. Both daytime and nighttime AMSR-E overpasses are used. satellite simulator. For reference, the evaluation of the gridbox mean LWP (including cloudy and clear regions) is shown in Appendix D. Figure 5 evaluates the time-mean LWP ic with no filtering using f LWP . The bias plot in Fig. 5 shows the 0.9 value of the time-mean satellite f LWP contour to highlight regions where the satellite data are likely to be more reliable. The spatial patterns of the model and satellite have a spatial correlation r value of 0.68. There is a negative model bias off the east coast of Florida, although there is generally a lot of rain present in this region, as indicated by the f LWP contour and Fig. 4, such that the satellite measurements are more uncertain. There is also a negative model bias between Greenland and Canada and on the east side of Greenland where the f LWP is high. The grid-box mean LWP is also biased low there (Fig. D1), indicating that the model clouds are too thin (rather than the cloud fraction too low; see also Fig. 1). Positive biases occur off the west coast of Africa at around 18 • N; however, the LWP ic is very low in this region. Further positive biases occur in the stratocumulus to the north of Scandinavia. Again, these are not associated with cloud fraction biases (see Figs. D1 and 1), suggesting model stratocumulus clouds that are too thick. Figure 6 compares LWP ic after filtering both the model and satellite data (before time averaging) to only include data points for which f LWP > 0.99 (hereafter LWPic 0.99 ). This selects cloud scenes that are likely to be non-precipitating and non-convective clouds, which are likely to be either stratocumulus or shallow cumulus clouds. The regions of stratocumulus north of Scandinavia and around Iceland where there were positive biases with no filtering for f LWP > 0.99 now have even larger LWP ic biases, again indicating that the Table 2. Model evaluation statistics for the various subregions using time-averaged data. The normalized mean bias factor (NMBF) and the normalized mean absolute (NMAEF) error factor statistics are used following Gustafson and Yu (2012); see Appendix B for definitions. r is the spatial correlation coefficient between the model and observed time averages. All values are area weighted to account for the variation in area of the model grid boxes. model low clouds may to be too thick. The negative biases west of Greenland remain. Since there is generally little RWP in this region and the f LWP filtering has been performed, this indicates that the LWP ic evaluation in this region is likely to be robust. There are now negative biases extending from the coast of Florida up to near the UK, suggesting that clouds there are too thin, although instrumental uncertainties may still be high here given the uncertainty in the satellite estimate of f LWP and the larger amount of RWP that occurs here. The LWP ic values from the AMSR-E observations in Fig. 6 show remarkably little spatial variability, suggesting that non-precipitating clouds tend to be fairly uniform across broad regions with an LWP ic value of around 60-100 g m −2 . Cloud fraction ( Fig. 1) and N d (Fig. 7) vary quite widely across the same region, potentially indicating a physical process that maintains a constant LWP ic despite a varying aerosol environment and cloud regime.

Cloud droplet concentration evaluation
Cloud droplet concentration (N d ) is an important quantity because it generally represents the first step in the chain of processes by which aerosols affect clouds. N d gives some indication of the number of cloud condensation nuclei (CCN) that were available to produce clouds as well as other factors  such as updraft speed, droplet collision coalescence, droplet scavenging by rain, cloud evaporation, etc. Thus, an evaluation of model N d can give an idea of how well the model captures a range of processes.
Here, we evaluate N d using a 1×1 o resolution data set calculated from MODIS retrievals of τ c and r e . Twodimensional fields of N d are derived by the retrieval since it is assumed that N d is constant throughout the depth of the cloud, which has been shown to be a good approximation by aircraft observations of stratocumulus (Painemal and Zuidema, 2011). Details of the retrieval are given in Appendix A. For the model, two-dimensional N d fields were obtained from the instantaneous 3D N d fields by calculating a weighted vertical mean N d , with the LWC on each level used for the weights. This ensures that the levels with the most LWC contribute most to the average N d , which is similar to what is assumed in the MODIS retrieval since most of the r e signal comes from near cloud top where the LWC is assumed to be the largest and the N d calculation is a strong function of r e . It also avoids contributions from very thin clouds that would not be detected by MODIS. Only data points for which the liquid cloud fraction is larger than 80 % and for which the mean cloud top height is below 3.2 km were included for the satellite N d calculation in order to help exclude satellite retrieval errors (see Grosvenor et al., 2018b). The same filtering was applied to the model to minimize sampling errors; the COSP MODIS liquid cloud fraction and a calculation of model cloud top height were used for this. The satellite data set was regridded to the model grid before comparison. Figure 7 shows that the modelled and observed N d values are largest near the continents and that N d decreases towards the middle of the Atlantic Ocean. This fits with the idea that CCN are scavenged during eastward transport . There are also likely to be some dilution effects as distance from the sources increases. Other factors may also influence the spatial N d pattern such as changes in boundary layer height, predominant cloud type, other meteorological factors, etc. CCN concentrations also increase close to the European and African source regions, consistent with the prevailing northerly wind, which transports European and African pollution down the coast. Overall, the model produces a reasonable spatial pattern with a spatial correlation coefficient of 0.67 for the whole region. However, the model has a tendency to overestimate N d over the southern part of the North Atlantic region, off the east coast of the UK and over Europe. Conversely, the model underestimates in the northern part of the domain. The NMBF is −20.6 % for the northern NA region where stratocumulus dominates compared to 21.5 % for the southern NA region where the clouds are more broken.

Shortwave top-of-the-atmosphere flux evaluation
Clouds and the Earth's Radiant Energy System -Energy Balanced and Filled (CERES-EBAF) data were taken from https://ceres.larc.nasa.gov/order_data.php, last access: 11 December 2020 and are the monthly averaged product of observed TOA for which the TOA net flux is constrained to the ocean heat storage. Figure 8 shows the evaluation of the time-mean top-ofthe-atmosphere SW radiative fluxes (SW TOA ) vs. the CERES satellite. The spatial pattern of the model matches the observed pattern well with a correlation coefficient of 0.89. The model biases are mostly small, with positive biases that are <∼20 % over all of the oceanic regions except south of the Equator where positive biases of up to <∼35 % occur. The NMBF bias for the whole region (including land) is −2.4 % and it is 5.8 % and 1.7 % for the northern NA and southern NA regions, respectively.
The SW TOA biases can be caused by a combination of biases in f c , TLWP ic and N d . To estimate the relative contributions of these different biases individually to the SW TOA bias, we used a similar calculation to that described in Sect. 2.4 but applied to TOA fluxes, following the technique described in Grosvenor et al. (2017). Firstly, the modelled time-mean SW TOA field was reconstructed by using the modelled timemean values of the COSP CALIPSO f c , LWP ic (with no filtering using f LWP ) and N d (filtered as in Sect. 3.1.3) as inputs into the SW TOA offline calculation. This was also done using the time-mean observed fields from CALIPSO, AMSR-E and MODIS. The results are shown in Fig. 9 and can be compared to Fig. 8a and b. Comparisons are not made over land since the higher land albedo was not taken into account in the calculations. In both cases, the spatial pattern of the calculated SW TOA (Fig. 9a and b) over the ocean correlates well with that of the actual SW TOA ( Fig. 8a and b).
The SW TOA values from the calculations are somewhat smaller than those actually modelled or observed. If considering only oceanic regions, the underestimates for the model off the east coast of Canada in the NA stratocumulus region are perhaps of greatest concern, since this is where the SW TOA values and their biases were highest. The underestimates are likely due to the approximations and assumptions that have been made with this approach such as using the time-mean shortwave flux and cloud fields. However, the agreement is sufficient for the purposes of estimating the relative contributions from biases in individual cloud properties to the SW TOA bias.
Next, perturbations to SW TOA were calculated by applying the model biases in the individual cloud fields (f c , LWP ic and N d ) to the observed cloud values on a one-at-a-time basis ( Fig. 10). They are expressed as a percentage of the observed time-mean SW TOA field. The sum of these perturbations, again expressed as a percentage, is shown in Fig. 9c Figure 8. Time-mean model SW TOA evaluation vs. the CERES satellite. The monthly mean CERES-EBAF data product is used here; this product uses data from both the Terra and Aqua satellites as well as geostationary satellites in order to approximate averaging across the diurnal cycle. The arrangement of the panels is as for Fig. 1. and is intended for comparison to the actual model SW TOA bias shown in Fig. 8c. The spatial pattern of the combined bias estimate matches the actual bias well, although there are some regions of negative bias that are not present in the actual bias. These are in regions of low SW TOA and low f c and so are more prone to error and likely less important for the overall SW TOA . In the regions of positive model SW TOA bias, the estimate is a little lower than the true model bias, but again the agreement should be sufficient to compare the relative contributions from the individual cloud field biases.
From Fig. 10, it is clear that perturbations of the magnitude of the f c model biases have a very large impact on the SW TOA and are the main contributor to model SW TOA biases in most regions. Large negative contributions occur to the SW TOA bias in the southern part of the domain due to the negative cloud biases there (Fig. 1). Smaller positive contributions from N d and LWP ic biases also occur in this region to give less overall estimated SW TOA bias (Fig. 9c) in agreement with the small SW TOA model bias (i.e. directly from the model output; Fig. 8c). This suggests that some cancellation of biases in f c , LWP ic and N d is occurring here resulting in the observed low SW TOA bias. Large positive contributions to the SW TOA bias from f c biases occur in the stratocumulus region in the western part of the northern NA. These are also partially cancelled by negative N d and LWP ic biases, but the overall SW TOA is positive in this region (Fig. 8c). LWP ic (note the different colour scale) has only a minor effect north of Scandinavia (contributing to positive biases) and  west and east of Greenland (negative contributions). The N d biases contribute even less to the SW TOA model bias than LWP ic biases; positive contributions from N d biases are seen south of 36 • N and negative contributions north of there.
Next, an attempt to quantify the relative percentage contributions (P x ) of the individual cloud field properties (denoted as x, where x is either f c , LWP ic or N d ) to the total SW TOA bias is made using the following equation: where SWTOA x is the perturbation in SW TOA due to the bias in cloud field property x. Figure 11 shows that f c biases dominate over nearly all regions south of around 54

Aerosol forcing
We now examine maps of the temporal mean aerosol forcings following Eq. (2). Note that the calculations were performed using the instantaneous fields from the 27-hourly output, which ensures that the diurnal cycle is sampled evenly throughout the course of the simulation. Figure 12 shows maps of ERF ARI and ERF ACI . The magnitude of ERF ACI is generally larger than that of ERF ARI for oceanic regions. For example, in the northern NA box, the mean ERF ARI is −0.50 W m −2 and ERF ACI is −3.1 W m −2 (see Table 3). For the southern NA box, ERF ARI and ERF ACI are −1.0 and −1.8 W m −2 , respectively. The dominance of ERF ACI over the NA region is in agreement with B12 ( Fig. 4b of that paper). The overall spatial pattern of ERF ACI is also similar to that in B12 with a band of large negative forcing running across the Atlantic from west to east between around 25 and 50 • N, and another region of negative forcing following the west coast of Africa. Some differences are apparent, though; for example, the negative forcing is smaller in magnitude southwest of the UK in our simulations and larger in magnitude near Africa south of the Equator. However, over the whole NA domain, ERF ARI is larger in magnitude than ERF ACI (−1.9 vs. −1.7 W m −2 ) due to the fact that ERF ARI is usually larger over land. The dominance of ERF ARI in the southern regions of the domain will also have more influence on the area-weighted average. The changes in f c are largest in the southern regions of the domain below around 35 • S. The responses of LWP ic and f c can be termed macrophysical responses (or also adjustments), since they affect bulk cloud properties, as opposed to the response of N d that is termed a microphysical response. The macrophysical responses are mostly due to the suppression effects of aerosols on rain rates via the autoconversion process. This is demonstrated in Figs. E1, E2 and Table 3, which show the impact of preventing aerosol from affecting the autoconversion process (see Appendix E for details). We hypothesize that the changes in LWP ic over the northern NA regions and lesser changes in f c reflect the presence of stratocumulus clouds with very large areal coverage in the north that can only respond to the suppression of rain by thickening (i.e. more LWP ic ) rather than by increasing f c . In the southern NA, the cloud coverage is much lower due to the presence of broken stratocumulus and cumulus clouds, which respond in a different way to the rain suppression, namely by increasing their coverage (f c ) more than their thickness (LWP ic ). We note that in reality clouds can also respond to enhanced aerosol by increasing cloud top entrainment, which can reduce LWP ic Table 3. Temporal and spatial means of various quantities for the whole NA domain and the various subregions. Values are shown for ERF ARI and ERF ACI , and the percentage changes in N d , LWP ic and f c between the PI and PD runs. Percentage f c changes are also shown for the runs with no aerosol effect on the rain autoconversion process. The other columns show results from the offline estimates of the ACI forcing. ERF ACI,all is the estimated forcing due to simultaneous PI to PD perturbations of N d , LWP ic and f c . ERF N d , ERF LWP ic and ERF f c are the forcing estimates from one-at-a-time perturbations of N d , LWP ic and f c , respectively, and ERF ACI,sum is the sum of these three forcing estimates. ERF ACI,macro is the forcing contribution from the cloud macrophysical changes, i.e. the sum of ERF LWP ic and ERF f c . All values are area weighted to account for the variation in area of the model grid boxes.

Region
and f c (Ackerman et al., 2004;Bretherton et al., 2007;Hill et al., 2009). However, this mechanism is currently not included in the model. Based on the maps of the cloud property changes alone, it is difficult to judge the relative contributions of each type of change to the forcing. Thus, we now make a quantitative estimate of this using offline radiative calculations as described in Sect. 2.4 and Appendix F. Figure 14a shows the ACI forcing estimated using the offline method following Eq. (F13) for the case where all of the cloud variables have been perturbed from their PI values to their PD values (ERF ACI,all ). Figure 14b shows the sum of the contributions calculated by separately perturbing the individual cloud properties in one-at-a-time tests (ERF ACI,sum ). These can be compared to the forcing diagnosed from the full model outputs as generated by the online radiation code (Fig. 12b). The general pattern and magnitude of the actual and estimated forcings are very similar. The fact that ERF ACI,all and ERF ACI,sum are similar indicates linearity in the effect of the perturbation of the individual cloud properties, such that they are almost directly additive. The mean forcing over the northern subregion (northern NA) is −3.2 W m −2 for ERF ACI,sum , −3.0 W m −2 for ERF ACI,all and −3.1 W m −2 for the actual forcing. In the southern subregion (southern NA), the mean forcings are −1.9, −2.0 and −1.8 W m −2 for ERF ACI,sum , ERF ACI,all and the actual forcing, respectively. Overall, the match is very good, suggesting that the estimation technique is sound and that it is likely to be useful for estimating the contributions from the different cloud property changes to the forcing. Figure 15 shows the contribution to the surface ACI forcing from the changes in N d , LWP ic and f c following Eq. (F15). Percentage contributions (P x ) from the individual cloud fields (denoted as x, where x is either N d , f c or LWP ic ) to the sum of the absolute contributions are calculated in a similar way to Eq. (3): where ERF x is the ACI forcing contribution due to the change in cloud field x. These values are mostly negative since the ACI forcing is negative, but positive contributions and thus positive ERF x are possible. Figure 16 shows the results.
From the summary of the area means in Table 3, we see that for the northern NA region the area-mean contributions from changes in N d and LWP ic (ERF N d and ERF LWP ic ) are similar with values of −1.4 and −1.2 W m −2 , respectively. These are larger than the f c contribution (ERF f c ) of −0.54 W m −2 . In contrast, for the southern NA region ERF f c is −1.2 W m −2 , which is considerably larger than the N d or LWP ic contributions (−0.63 and −0.23 W m −2 , respectively). Overall, for the whole NA region, ERF f c is −0.74 W m −2 , which is similar in magnitude to that from the The sum of the macrophysical contributions from f c and LWP ic (termed ERF ACI,macro ) is the dominant contribution to aerosol forcing in the NA region providing 63.9 % of ERF ACI,sum (= ERF N d + ERF LWP ic + ERF f c ). This is important because these changes are likely to be associated with a higher model uncertainty than pure cloud albedo effects (due to N d changes alone). The macrophysical contribution to ERF ACI,sum is larger in the southern NA region than in the northern NA region (71.5 % vs. 54.4 %). Figures 15 and 16 show the spatial patterns of the contributions and reveal that the contribution from N d is restricted to the northern part of the domain (including the region of the northern NA subregion) and is generally small elsewhere, except for a small contribution down the west coast of Africa. Somewhat surprisingly, this contribution is not co-located with where the largest changes in N d are simulated in Fig. 13 but occurs to the north and east. This likely reflects the fact that the cloud fraction is large in this region due to the prevalence of stratocumulus, which results in a large radiative impact from even modest N d increases. A similar argument can be applied for LWP ic changes. The forcing contribution from LWP ic changes follows a similar spatial pattern to that from N d changes and with similar magnitudes, although for LWP ic the largest LWP ic changes are co-located with the forcing contributions. The N d contribution is generally dominant in terms of Eq. (4) north of 36 • N but also provides a relatively large contribution down the west coast of Africa. The contribution from f c changes is largest in the southern NA region and around the coast of west Africa.

Determining the most important meteorological situations for surface ACI forcing
We now attempt to determine the meteorological situations that are most important for aerosol forcing, since this information can then be used to target particular situations for modelling at high resolution or for more detailed comparisons to observations. Initially, we quantify the forcing as a function of "cloud state", which we define in terms of the eight possible combinations of low-, mid-and high-altitude clouds (see Fig. 17), which are defined to be consistent with the International Satellite Cloud Climatology Project (ISCCP) definitions (Rossow et al., 1999): cloud-top pressure (CTP) > 680 hPa for low-altitude clouds, 680 ≤ CTP < 440 hPa for mid-altitude clouds and CTP ≤ 440 hPa for high-altitude clouds. For this analysis, clouds are considered present if the cloud fraction is greater than 0.01. The results are presented in the form of 2-D matrix plots showing the contribution to the overall ERF ACI for different combinations of PI and PD cloud states (Fig. 18) for both the northern NA and southern NA regions. The contributions take into account the frequency of occurrence of each pairing of PI and PD cloud states so that the values associated with the colours in the plots add up to the overall regional forcing.
Generally, for both regions, the largest negative contributions to ERF ACI are associated with low-altitude cloud states (cloud states that have even numbers). To some extent, this is expected since the convection scheme, which is likely one of the sources of higher-altitude clouds, is not currently coded to respond to aerosol and because the lower-altitude clouds are closer to the surface aerosol sources. For the northern NA region, the largest terms are those with the same low cloud state in the PI and PD (i.e. the diagonal terms for the even numbered states in the matrix), although it should be noted that the diagonal terms could involve f c or LWP ic changes within these states. The largest overall term occurs when both the PI and PD are in cloud state 8 (i.e. no change in cloud state). This is when low-, mid-and high-altitude clouds are present at the same time. However, if the mid-and highaltitude clouds are thin, it is still possible that the low-altitude cloud has the largest impact here. The next two (joint) largest forcing contributions occur when both the PI and PD have only low clouds (state no. 2) and when low clouds and high clouds are present together (state no. 6).
However, contributions from transitions between cloud states are not negligible for the northern region. For example, PI to PD transitions between states 1 and 2 (clear sky in the PI and low clouds in the PD) contribute −0.35 W m −2 . However, there is a reciprocal positive response of 0.30 W m −2 for transitions between states 2 and 1, which represents the removal of a low cloud that was present in the PI to create clear skies in the PD. There is a certain degree of randomness that is likely in the cloud state changes due to all of the changes not necessarily being driven by aerosol; there is still some model freedom despite the nudging to meteorology since the model is only nudged above the boundary layer and only the winds are nudged. Thus, it is perhaps more useful to consider the net forcing contribution from both the PI to PD transitions and those from the reciprocal transitions, as shown in the bottom row of Fig. 18. Appendix G explains the details of how these are produced. Other transitions have a larger net negative effect, such as the transitions between states 6 and 8 (net effect of −0.18 W m −2 ). This indicates transitions from low-and high-altitude clouds to low-, mid-, and high-altitude clouds, suggesting that the creation of midaltitude clouds also has an impact. Nevertheless, these transition terms are considerably smaller than the diagonal (no transition) terms described earlier for the northern NA region.
For the southern NA region, the low-altitude-only state (no. 2) in PI and PD again has a large impact (−0.53 W m −2 ). There is also now a large negative contribution due to transitions from the clear state (no. 1) in the PI to the low-altitude only state 2 in the PD, but again there is a positive forcing due to the reverse situation, with a net contribution of −0.29 W m −2 . This transition indicates the creation of additional low-altitude clouds in the PD due to aerosol effects,  which is consistent with the finding in the previous section that macrophysical changes to clouds (in particular changes in f c ) are more important in this region compared to the northern NA region and suggest that low-altitude clouds are one of the most important cloud types for this. However, as in the northern NA region, pairings with the same cloud states in the PI and PD (the diagonal terms) that involve mid-and high-altitude clouds are also important for forcing, particularly for states 6 and 8, although these are relatively less important than in the northern NA region, suggesting that a higher-altitude cloud has less effect on the forcing in the southern NA region.

Determining the most important cloud types for surface ACI forcing
We now break down the forcing as a function of the PI and PD cloud fractions in order to determine the types of low clouds involved, i.e. whether they are stratocumulus (high cloud fraction) or trade cumulus (low cloud fraction). Given that the contributions involving only cloud state 2 (lowaltitude only clouds) were the single largest contribution in Fig. 18 for the southern NA region and provided the joint second-largest contribution for the northern NA (with low clouds also involved for the highest contributor), we examine the forcing contribution for this cloud state only, along with the clear-sky state (no. 1).  Figure 14b shows the sum of these three terms. For the northern NA region, Fig. 19 shows that by far the largest single term comes from the situation when both the PI and PD are fully overcast (−2.24 W m −2 ). Consistent with the previous figures, this represents mainly the brightening of stratocumulus clouds due to increases in droplet concentrations combined with a smaller macrophysical effect from LWP ic increases. There are also some large negative contributions associated with increases in cloud fraction between PI and PD for this region too. The largest net contributions (ranging from −0.1 to −0.37 W m −2 ) associated with cloud fraction changes come from the creation of fully overcast or f c = 0.8 stratocumulus from lower cloud fractions (ranging from 0 to 0.8), indicating that the creation of overcast clouds from more broken stratocumulus and cumulus is playing some role in this region. These contributions sum to −0.91 W m −2 . For the southern NA region, the contribution from the overcast cloud state in both the PI and PD is much smaller (−0.24 W m −2 ) compared to the northern NA, indicating that pure microphysical (Twomey) changes are less important. The net contribution from zero and overcast state combinations is (−1.50 + 1.14 = −0.36 W m −2 ) and there are also large net contributions from PI f c values between 0 and Figure 18. Contributions (colours; the highest 10 values in absolute magnitude are also labelled with text) to the surface ACI forcing for the different combinations of cloud state in the PI emission run (x axis) and in the PD emission run (y axis). The cloud states are the eight different possible combinations of low-, mid-and high-altitude clouds (see Fig. 17). The overall contribution is shown, which includes the frequencies of occurrence such that the sum of all of the values gives the overall regional mean contribution. (a) The northern NA region; (b) the southern NA region. The top row shows all of the combinations. In the bottom row plots, the contributions due to PI to PD transitions between states are added to the same transition between PD and PI in order to get the net contribution; see Appendix G for details of this. 0.4 transitioning to PD values of 0.8-1.0. The creation of f c = 0.2 clouds from clear skies is the most important single term associated with cloud fraction changes for the southern NA region, in contrast to the northern NA region. The Twomey effect for f c ≤ 0.2 clouds is also important. This indicates a relatively more important role for trade cumulus or open-cell stratocumulus for the southern NA region. As also suggested from the previous results, this indicates that cloud fraction changes are relatively more important than the Twomey effect for this region compared to the northern NA region.

Conclusions
Previous work (Booth et al., 2012) has suggested that cloudaerosol forcing is the dominant driver of multi-decadal SST variability in the North Atlantic region. In this paper, we examined the cloud-aerosol response and surface aerosol forcing of a more modern version of the climate model used in Booth et al. (2012), namely the UKESM1-A, which is the atmosphere-only (i.e. non-ocean-coupled) version of the model used in the latest CMIP intercomparison (CMIP6). We focused on the North Atlantic region and used 1 year of meteorologically nudged model output. The aims were to (i) examine the representation of clouds and cloud-aerosol interactions in order to identify potential biases in cloud properties compared to observations; (ii) quantify the effect of cloud property biases on the SW TOA fluxes; (iii) determine the surface (downward) aerosol forcing response of the model; and (iv) identify the most important meteorological situations/cloud types for the surface forcing. The latter two aims should allow a more targeted evaluation of the model forcing in future work using observations and high-resolution modelling in order to test the magnitude of the forcing from the low-resolution climate model.

Model evaluation conclusions
The spatial patterns of low-, mid-and high-altitude clouds in the model compare well against observations. However, in the southern North Atlantic (southern NA), the model underestimated the low-and mid-level cloud fractions by −23.9 % and −25.8 %, respectively. Low-altitude cloud biases in the northern NA subregion were positive but lower in magnitude (5.1 %). Low-altitude cloud fraction biases have the potential to significantly impact aerosol forcing since they are closest in altitude to the aerosol sources. If we assume that the PI cloud cover is biased by a similar amount to the PD cloud cover and assume no cloud adjustments to aerosol, then the bias in forcing will be similar to the bias in f c . The results from this paper suggest that the second assumption is reasonable for the northern NA region because the Twomey effect dominates. Thus, we might expect the 5.1 % low-altitude f c bias there to make a small contribution to any error in forcing bias. Of course, it is also possible that the f c increase between PI and PD is underestimated by the model for this region, but the PI low-altitude f c is too high in order to give an overall PD bias. As such, it is difficult to make firm conclusions.
In the southern NA, f c changes in response to aerosols (adjustments) were large, so the above assumptions are less valid and the effects on forcing even less clear. The negative present-day f c biases could indicate a cloud fraction response to aerosol that is too low, which would cause subsequent negative forcing biases. On the other hand, the too-low f c may mean that the model PI era is in a broken, precipitating cloud regime too often. Such regimes are thought to be more sensitive to aerosols and more prone to produce cloud adjustments (Ackerman et al., 2004). In this case, the model forcing values would be too large. The effect of this may be significant given the large bias here (−23.9 %). Likewise, the too-large f c values in the northern NA (if also occurring in the PI) might prevent some instances of f c increase between PI and PD and lead to a forcing that is too small, although the overall bias for the northern NA region was only 5.1 %. The presence of clouds can also mask ARI forcing, and hence the f c and LWP ic biases might therefore affect the predicted ERF ARI magnitude. The larger low-and mid-altitude biases (which are likely to be thicker and hence have a stronger masking effect than high-altitude clouds) in the southern NA combined with the larger ERF ARI suggest that this effect would likely be more pronounced there. Here, the f c biases are negative, which would produce a positive ERF ARI bias by this mechanism. Mid-and high-altitude clouds can also mask ERF ACI forcing from low-altitude clouds (which is likely to be the biggest contributor to forcing). For both the northern NA and southern NA, mid-altitude clouds tend to have negative biases that are larger in magnitude than the positive biases of the high-altitude clouds suggesting the potential for an overall negative ERF ACI forcing from this mechanism. However, further work would be needed to quantify the effect of the model f c biases on the aerosol forcing.
In-cloud liquid water path (LWP ic ) was evaluated vs. the AMSR-E microwave radiometer satellite instrument. The model percentage LWP ic biases are small in the northern part of the Atlantic where higher cloud fraction stratocumulus dominates and where the satellite retrievals are likely to be more reliable due to lower rain amounts. In the NE part of the domain (again a region with likely reliable retrievals), there were some positive biases, particularly north of Scandinavia, which indicates that the stratocumulus in this region are too thick in the model. Elsewhere, off the east coast of Florida, there tended to be negative LWP ic biases. When the raining data points were filtered out, the biases tended to get worse, but the spatial pattern was preserved, suggesting that the biases are likely real rather than a result of artefacts in the observations or comparison method.
The overestimate of LWP ic in the model in the regions dominated by stratocumulus (northern part of the domain) has implications for its ability to simulate the correct cloud effective radius (r e ) given the correct cloud droplet concentration. This may confuse evaluation efforts using r e . An incorrect simulation of cloud droplet size also has implications for the conversion of cloud water into rain (autoconversion), suggesting the model might convert clouds into rain too readily at a given cloud droplet concentration; this will affect rain rates but may also make the response of cloud fraction more sensitive to aerosols and hence enhance aerosol forcing. LWP ic is also likely to affect the sensitivity of the cloud albedo to cloud droplet concentration (and hence aerosol), which will also affect the magnitude of aerosol forcing.
Model cloud droplet number concentrations (N d ) were compared to satellite observations from MODIS. The model captures the observed spatial pattern well, suggesting that the processes that govern the removal or dilution of aerosol as it travels eastwards from the North American continent are broadly captured by the model. Such processes are likely to include the scavenging of CCN during precipitation ; dilution effects as distance from the sources increases; variations in boundary layer height, which may affect aerosol scavenging due to cloud type and precipitation changes, and may cause the dilution of aerosol concentration over deeper boundary layers; as well as other unidentified meteorological effects. However, the model underestimates N d over most of the northern NA but overestimates it over the southern NA, which could indicate that there are some issues with aerosol sources and/or scavenging. Cloud processes could also be involved, such as updraft speed distribution errors, problems with the droplet activation scheme or issues relating to droplet removal (evaporation, coagulation, etc.). A caveat to the conclusion that the model has biases in N d is that there is uncertainty in the MODIS N d observations that may be of the same magnitude as, or greater than, the model biases (Grosvenor et al., 2018b).
The positive model N d biases associated with the aerosol outflow regions of the US and Europe have the potential to cause a positive bias in the aerosol forcing. In the northern parts of the Atlantic, which are less affected by anthropogenic aerosol, the negative N d biases may indicate a lack of aerosols from natural sources (or too much scavenging). This would create pre-industrial background aerosol concentrations that are too low, leading to a positive forcing bias (e.g. Carslaw et al., 2013). Further investigation into the cause of the spatial N d pattern using model experiments, an examination of the realism of anthropogenic and natural aerosol sources, and quantification of the MODIS N d uncer-tainties (e.g. through comparison with aircraft data for this region) is therefore warranted.
Shortwave top-of-the-atmosphere (SW TOA ) radiative fluxes from the model were compared to those from CERES. Again, the model reproduced the observed spatial pattern well, indicating a general good fidelity of overall cloud positions and properties. However, there was also an overestimate in most regions, showing that the clouds were either too bright, occurred too frequently or both.
The main cloud properties that determine SW TOA fluxes are f c , LWP ic and N d . Offline radiative analysis suggested that the f c bias is likely to contribute the most to the SW TOA biases, particularly in the stratocumulus region in the northern Atlantic, suggesting that biases in f c are the most important to address. This result also shows that biases in the response of cloud fraction to aerosol are likely to cause large forcing biases. LWP ic biases were determined to be most important in causing the positive SW TOA bias north of Scandinavia and so improvements there are also needed. N d biases were deemed important in causing SW TOA biases in the northern NA but with only small regions of high impact in the southern NA regions. However, it should be considered that a small impact from biases in a variable on the mean SW TOA flux bias does not preclude a large impact on forcing since the magnitude of the aerosol forcing is a lot smaller (of the order of 1 %-2 %) than the mean SW TOA flux and so small biases can still have the potential to produce a significant forcing error.

Aerosol forcing conclusions
The ARI and ACI surface aerosol forcings were calculated using the instantaneous model output. In agreement with B12, the magnitude of the ARI forcing over ocean grid boxes in the NA region was generally lower than the ACI forcing. The ACI forcing for the NA region (including land points) is negative with a mean value over the region of −1.7 W m −2 , showing that aerosol forcing is likely to be important in terms of the energy received at the ocean surface in this region. The spatial pattern agrees well with that from B12; however, the magnitude is somewhat lower. This likely reflects the steps taken to reduce the aerosol forcing in the UKESM model as described in Mulcahy et al. (2018).
The ACI forcing was decomposed into contributions from changes between PI and PD in N d , LWP ic and f c ( N d , LWP ic and f c , respectively) using an offline calculation of the net surface SW downwelling radiative fluxes. The results showed that N d and LWP ic contributed most in the northern part of the NA where overcast stratocumulus clouds are prevalent. f c had the highest contribution in the southern subtropical regions where cloud fractions are lower, indicating more broken clouds. We speculate that the higher cloud cover in the northern region allows the N d and LWP ic effect to be larger since the associated albedo perturbations can operate over a wider area. In the southern region, it is likely that the broken cloud scenes are more susceptible to aerosol-induced cloud fraction increases. In contrast, in the northern NA region, the overcast clouds cannot increase in cloud fraction much further.
Changes in LWP ic and f c can be considered to be cloud macrophysical responses because they are linked to the thermodynamics of liquid water production, whereas changes in N d can be considered more of a microphysical response since they are mainly caused by aerosol changes, although there is a link between the two. In the northern NA region, the macrophysical changes contribute to approximately 54.4 % of the total ACI forcing, whereas in the southern North Atlantic region they contribute 71.5 %.
Our results are important in the context of Malavelle et al. (2017), who showed that, for an earlier version of this model, the macrophysical responses to volcanic sulfate aerosol perturbations are likely to be small in the impact region of the volcano (north of ∼ 50 • N). Note that the Malavelle et al. (2017) study quoted responses of the all-sky LWP, which is the product of LWP ic and f c . Our Figures 15 and 16 show that with an updated model, using a year of data and considering PI to PD aerosol changes, macrophysical changes (dominated by LWP ic changes) have a similar radiative impact to N d changes for this region. It would be interesting to discover whether the implied larger LWP ic response to aerosol in the newer model in the Holuhraun region is now larger than suggested by the observational constraint used in Malavelle et al. (2017). Furthermore, our results show that macrophysical responses are even more important in other NA regions, especially the southern NA. This result highlights that cloud responses to aerosol perturbations in specific regions are not necessarily representative of those elsewhere.
In order to understand the cause of the aerosol-induced LWP ic and f c increases in the model (i.e. the macrophysical responses), simulations have been performed where N d has been prevented from affecting rain formation from cloud liquid through the autoconversion parameterization. Instead, a constant N d value is used. In these simulations, the changes in LWP ic and f c between PI and PD in the NA region drop to nearly zero or even decrease between PI and PD, strongly indicating that the mechanism for the increases in macrophysical quantities in the standard simulations is the suppression of rain. This is consistent with previous studies which have shown that precipitation is a major factor in causing cloud breakup (Stevens et al., 1998;Berner et al., 2013) mainly due to the stabilization effect of rain evaporation in the lower boundary layer but also with some positive feedback due to the removal of aerosol by rain and the subsequent enhancement of rain due to the lower aerosol concentrations (larger droplets). In a global model, the representation of the thermodynamics of this process will be reliant on the boundary layer scheme, along with input from the cloud microphysics scheme. Since both processes are highly parameterized in global models, it is likely that there is some uncertainty in their representation and thus uncertainty in the response of the clouds to aerosol. Since the contribution to aerosol forcing from changes in macrophysical quantities is very large in this model, this highlights the need for detailed and targeted model evaluation of these processes.
Our results are somewhat different from those obtained in Mülmenstädt et al. (2019) using a similar technique for the ECHAM-HAMMOZ model. For the North Atlantic region, their model produced ERF ACI contributions from N d , LWP ic and f c that were located in approximately the same locations in contrast to the f c contribution from our model being predominantly in the southern NA and the N d and LWP ic contributions being further north. This suggests that the lifetime effect (and the associated aerosol-induced precipitation suppression process described above) operates differently between the two models, or that the types and locations of clouds differ between the models. Their model also has very little aerosol forcing in the region north of Scandinavia, whereas our model had quite a large forcing due to N d and LWP ic changes there. It is possible that some of these differences are due to the use of different years or different decomposition techniques. Nevertheless, understanding these model differences may lead to a method of model evaluation to determine which is correct that might help bring down aerosol forcing uncertainty.

Important meteorological situations: conclusions
Contributions to the surface forcing were calculated for situations composed of each of the eight different combinations (termed here as "cloud states") of cloudy or clear for low, middle and high cloud altitudes (see Fig. 17). The largest contributions (taking into account the frequency of occurrence as well as the forcing effect) always involved the presence of low-altitude clouds in either the PI or PD suggesting that such clouds are driving most of the aerosol forcing in the model. There was some expectation of this a priori since the convection scheme in the model does not respond to aerosol and it is likely that this would be responsible for a lot of the creation of higher-altitude clouds. However, it is also possible that such clouds are created by the convection scheme and then get handed over to the large-scale cloud scheme and could then be affected by aerosol. It may be that low clouds are more affected simply because they are closer to the surface aerosol sources.
In both regions, the largest forcing contributions came from situations with the same cloud states in the PI and PD. Of these, the largest contributions for the northern NA region (and a considerable contribution for the southern NA region) was when low-, mid-and high-altitude clouds were present, at the same time suggesting that in these regions considerable forcing occurs when higher-altitude clouds are present. Part of the reason for this is likely due to the prevalence of clouds from a range of altitudes in this region. Thus, it may be important to consider the potential radiative effects of mid-and high-altitude clouds such as the shielding of low-altitude clouds, as also indicated in Malavelle et al. (2017). The neglecting of mid-and high-altitude clouds in the radiative calculations presented in this paper may therefore lead to some inaccuracy. On the other hand, the mid-and highaltitude clouds may be sufficiently thin that it has negligible impact. Further work is needed to determine whether this is the case. However, the idea is supported by the fact that the offline calculations of ACI forcing that considered only lowaltitude clouds matched the model calculated forcing (that included all cloud types) very closely.
Situations with only low-altitude clouds in the PI and PD were also very important for both the northern NA and southern NA regions, again highlighting the importance of lowaltitude clouds for aerosol forcing. Net contributions involving the creation or destruction of a cloud type (low, middle or high clouds) were small for the northern NA region but higher for the southern NA region. For the latter, the creation of low-altitude clouds was implicated in each case. This fits with the result that f c was a large contributor to the surface forcing in the southern NA region.
Overall, the results suggest that low-altitude clouds are the largest contributor to aerosol forcing in the model and so improvements to the representation of this and its response to aerosol are likely to yield the biggest improvement in the aerosol forcing estimate in this model. On the other hand, the lack of aerosol awareness of the convection scheme is unrealistic and may lead to missing aerosol-cloud interaction processes in the model. There are some indications that cyclones respond to increased aerosol concentrations by increasing their LWP and hence radiative forcing McCoy et al. (2018). The degree to which the convection scheme is involved in representing the clouds associated with cyclones and hence the likelihood of such aerosol-cloud interactions being missed at climate model resolution could be explored with high-resolution models (e.g. the nested version of the UM) in future work.

Important cloud types: conclusions
The forcing for grid boxes with low-altitude-only clouds and clear skies was further examined in terms of pairings of cloud fraction from the PI and PD simulations. The aim was to try to understand the PI to PD cloud transitions that occur to produce the model aerosol forcing and therefore gain insight into the types of clouds involved. The results showed that grid boxes with overcast (100 % cloud cover) in both the PI and the PD accounted for by far the largest contribution to forcing in the northern NA region, although there were some smaller contributions from transitions between 60 %-80 % and 100 % cloud cover. This agrees with the result from the forcing contribution calculations that changes in cloud fraction due to aerosol are not particularly important for forcing in this region and suggests that cloud fraction transitions that do occur are mostly likely within the stratocumulus regime.
The brightening of stratocumulus clouds by the Twomey effect seems to dominate the forcing in this region.
For the southern NA region, the largest contribution is from transitions from 0 to <∼ 30 % cloud cover, indicating the creation of trade cumulus from clear skies. The creation of 80 %-100 % cloud fraction states from 0 %-20 % states also has an important forcing contribution, suggesting the formation of stratocumulus clouds from clear skies and cumulus to stratocumulus transitions due to the anthropogenic aerosol input. The brightening of overcast clouds is also still very important here but not as important as in the northern NA.
A caveat is that with this analysis is that we only pair the same times in the PI and PD and do not allow for evolution over time. Therefore, the pairings may not be accurate since the Lagrangian trajectories associated with a given pairing may have had different cloud fractions in the time before the snapshot. It may therefore be useful to examine the evolution of cloud fraction, etc. over time using Lagrangian trajectories and examine how this varies with initial aerosol/droplet concentrations (e.g. see Eastman and Wood, 2018). This could be done in the PD simulation and compared to observations, but also comparisons between the evolution in the PI and PD runs could be made.

Recommendations
The results from this paper suggest that future studies should target the improvement of different cloud responses to aerosol depending on the location within the North Atlantic. In the northern NA, the Twomey effect was a large contributor to the aerosol forcing. This suggests that constraint of the aerosol forcing using observations will be easier here than in the southern region where cloud adjustments/macrophysical effects were more dominant. Similar approaches to those performed in Malavelle et al. (2017), whereby the model N d and r e responses to known aerosol perturbations (the Holuhrahn volcanic eruption in the case of Malavelle et al., 2017) were evaluated using satellite data, may therefore be able to constrain aerosol forcing. Examination of modelled trends in N d and r e over time in response to known aerosol emission changes may also prove useful in this regard. The examination of N d and r e changes alone is advantageous since their natural variability is significantly lower than that of LWP ic , f c and SW fluxes (Malavelle et al., 2017), making it easier to quantify aerosol-induced signals from the observations. However, our model results also showed a fairly large influence on aerosol forcing from increases in LWP ic (i.e. cloud thickening) in the northern NA region, which may complicate such efforts.
In the southern NA, the accuracy of the cloud fraction response in the model should be evaluated, with a particular focus on low clouds and the creation of trade cumulus and stratocumulus. As mentioned earlier, this is likely to be more difficult than evaluating the Twomey effect due to the larger number of processes involved. Ideally, the individual processes would be targeted for model evaluation with the formation of rain via the autoconversion process a prime first target since this was shown to be the cause of the f c and LWP ic response to aerosol in our model. Ways forward with this might include the use of observational data from the ground-based ARM site on the Azores and data from aircraft observations that have also taken place there. This is ideally located since it is near a location where the older version of the model predicts a large contribution to the aerosol forcing from cloud fraction changes but the newer version less so. Thus, the observations might help to determine which is the most realistic. Combined with satellite data, the long time series observations of aerosols, N d , LWP ic , f c and rain rates can be used to evaluate the relationships between these variables. Separating a causative signal due to the aerosol alone from that due to meteorology, etc. is difficult, but it may be possible to use the techniques described in McCoy et al. (2020). In that paper, several meteorological drivers are controlled for via binning and multi linear regression, along with using the model to estimate and subtract the influence of non-aerosol-induced changes. However, it is possible that even an evaluation of the observed relationship between rain rates and other variables, without separating the causative effects of aerosol, will prove useful in constraining the cloud adjustments.
A complementary approach is to model this region using a high-resolution nested version of the model, which would also allow for the use of the more sophisticated CASIM (Cloud AeroSol Interaction Microphysics) microphysics scheme (e.g. see Grosvenor et al., 2017;Stevens et al., 2018;Miltenberger et al., 2018a, b;Gordon et al., 2018). The assumption would be that the cloud responses to aerosol of this would be more accurate than the global model resolution simulations, although its performance should be tested using the observations.
In order to avoid issues regarding the representativeness of a limited number of years of meteorology, it would be useful to do future work such as that performed here using multiple years of data. Noise in some of the fields shown here may also be reduced by employing an ensemble approach as in Liu et al. (2018).
Finally, work to evaluate and improve the accuracy of the satellite evaluation in this paper would be very useful. For example, to deal with the issues of evaluating the model LWP in situations with precipitation, a microwave radiometer simulator could be implemented into the model, which would be able to estimate the total attenuation of the 37 GHz channel (used by AMSR-E to retrieve LWP), taking into account the differing attenuation strengths of cloud water and rainwater (Lebsock and Su, 2014). This attenuation could then be directly compared to that from AMSR-E. Comparing the global model evaluation results with and without the convective contribution to those from a high-resolution model (where the convective TLWP will be explicitly resolved) might help to assess the role of the convective parameterization on the evaluation of model TLWP. This could be combined with the microwave radiometer simulator mentioned above to help overcome the issue of retrieval problems in raining conditions too. Much greater confidence in N d retrievals would gained through further validation using in situ aircraft data. Cloud fraction could be evaluated with additional cloud fraction satellite data sets and ground-based data to improve the confidence in the result. More sophisticated analyses such as 2-D histograms of cloud fraction and LWP ic would assess this important cloud fraction variable at different cloud thicknesses and may help to isolate issues in particular regimes. Model improvements to the subgrid cloud scheme might be considered, such as a link between the boundary layer turbulence and the width of the subgrid relative humidity distribution. Comparisons to high-resolution models may also help with this.
Appendix A: Details on the satellite data sets A1 Droplet concentration N d can be estimated using satellite retrievals of τ c and r e (Han et al., 1998;Brenguier et al., 2000;Boers et al., 2006;Bennartz, 2007;Grosvenor et al., 2018b) made using observations from the visible and shortwave infrared wavelengths from instruments like MODIS (Nakajima and King, 1990). Here, we use a data set based on the methods from Grosvenor et al. (2018a), which represented some modifications to the methods described in Grosvenor and Wood (2014). The methodology used here differs slightly from that used in Grosvenor et al. (2018a) in that data are not filtered to remove data points with τ c of less than 5 and the correction for the vertical penetration depth bias proposed in Grosvenor et al. (2018a) is not applied. The 3.7µm r e is used for the N d calculations, which has been suggested to be less prone to errors due to cloud heterogeneity (Zhang et al., 2012;Grosvenor et al., 2018b). The data set excludes 1×1 • data points with mean solar zenith angle (SZA) greater than 65 • , mean cloud top heights greater than 3.2 km, liquid cloud fractions less than 80 % and for which the maximum sea-ice areal coverage over a moving 2-week window exceeded 0.001 %. The sea-ice data used were the daily 1×1 • version of the "Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data, Version 1" data set (Cavalieri et al., 1996). The N d data set (without sea-ice screening) is available; see Grosvenor and Wood (2018).

Appendix B: Bias metrics
Following Gustafson and Yu (2012), the NMBF is defined as where M is the mean of the model values and O is the mean of the observed values. In this paper, this refers to the spatial mean time-averaged values. The NMAEF is defined as  We now examine the changes between PI and PD for various aerosol-related model fields. Figures C2 and 13 show that the spatial patterns of the changes in AOD, CCN at 0.2 % supersaturation using aerosol from the lowest model layer (CCN 0.2 % ) and N d are very similar, suggesting that changes in both AOD and CCN 0.2 % are a good proxy for N d changes in this region. In terms of aerosol composition (Fig. C3), the largest changes in column aerosol mass (the vertically integrated aerosol mass concentrations for all size modes) occur for sulfate and BC. Sulfate changes occur further north than BC changes, with the latter occurring mostly over southern Europe and northern Africa. Sulfate changes dominate over the ocean except at around 18 • N where there is a band of larger BC change likely associated with outflow from Africa. Similarly to BC, OM increases also occur over Africa and over the Atlantic Ocean at 18 • N, with both BC and OM potentially contributing to the increase in CCN 0.2 % and N d over the Atlantic that region. Sulfate changes likely dominate the N d changes further north in the Atlantic. OM aerosol mass decreases in the northern part of the Atlantic. Dust and seasalt column mass changes are very small in comparison to the other aerosol types for this region.

Appendix D: Evaluation of grid-box mean LWP
Here, we show plots similar to Figs. 5 and 6 but for the allsky (cloudy and clear contributions) LWP, i.e. without applying the step of dividing by the cloud fraction. Table D1 summarizes the results for this and the other plots in this section. The bias pattern is very similar to that of the incloud values, suggesting that the conversion between LWP and LWP ic (by dividing by the low-altitude COSP-CALIPSO cloud fraction) does not greatly affect the evaluation, probably because the low cloud fraction biases are generally very small (Fig. 1). However, for the regions of negative bias between the Equator and 18 • N, the biases are larger for LWP than for LWP ic because there is a negative low cloud fraction bias here, which acts to increase the model LWP ic values relative to the observed ones to give a lower LWP ic bias. When filtering to include only data points with f LWP > 0.99 (Fig. D2), the bias pattern is again similar to that for LWP ic .
As discussed earlier, AMSR-E observations are potentially biased when it is raining. This is because assumptions are made about the partitioning between LWP and RWP in order to facilitate the retrieval since rainwater attenuates the microwave signal more strongly than liquid droplets (Lebsock and Su, 2014). It is assumed that rainwater does not occur for water paths below 180 g m −2 , which may lead to inaccuracies since the true partitioning value is likely to vary. Because of these issues, some previous model evaluation studies (e.g. Furtado et al., 2016) have chosen to compare the TLWP (the sum of LWP and RWP) to the TLWP provided in some microwave-based products. We also do this in Fig. D3; for this plot, we only use the large-scale RWP and not the convective RWP (or LWP). However, it should be borne in mind that if the assumed partitioning threshold is incorrect, the TLWP value retrieved by the satellite will also be incorrect, and so it is unclear whether this leads to a more accurate model-to-satellite evaluation.
The results show that the addition of RWP (compare Figs. D1 and D3) enhances both the model and the observations, so that the pattern of bias remains similar. This suggests that it does not matter greatly whether LWP or TLWP are used for model evaluation. The magnitudes of the negative model biases for TLWP are slightly enhanced and those of the positive biases reduced relative to those for LWP, though. The NMBFs for the northern NA region are −16.4 % for TLWP and −8.3 % for LWP. For the southern NA region, the corresponding values are −65.7 % and −39.5 %. Figure D4 shows the ratio of the TLWP from the model's convection parameterization to that of the large-scale plus convective TLWP. In most of the southern part of the NA region, the convective TLWP accounts for the majority of the TLWP. However, it is not straightforward to decide whether the convective TLWP from the model is physically meaningful and whether it should be included when comparing to the microwave instruments. On the one hand, LWP and RWP from convection in the real world will be detected by the instruments, but on the other hand it is unclear whether the condensed water from the convection parameterization in models properly represents this. Liquid water content and vapour are detrained to the environment from the convection scheme and incorporated into the large-scale cloud scheme (see UM Documentation Paper 030; hereafter UMDP030; https://code.metoffice.gov.uk/doc/ um/vn11.8/papers/umdp_030.pdf, last access: 11 December 2020). It may therefore be the case that the large-scale cloud amount is the more appropriate quantity even if there is condensate associated with the convection scheme and that double counting would occur if also using the convective values (UMDP30). However, there is also some convective condensate from the convective core that is not transferred to the large-scale scheme. Another point of note is that LWP associated with the convection scheme is only used by the radiation scheme for shallow convective clouds (clouds with geometrical depths less than 500 m over land and 1500 m over ocean), meaning that the majority of liquid water in deep clouds has no effect on radiation. Given the uncertainty, here we examine the effect of adding the convective LWP on the model evaluation (Fig. D5). As might be expected from Fig. D4, its inclusion leads to much larger model values, particularly in the southern parts of the region. The bias pattern also changes, so that large positive biases occur almost everywhere, compared to negative biases in the south and positive biases in the north when using only large-scale TLWP (Fig. D3). Given this, it would be useful for future studies to determine whether any of the convective TLWP should be included; ideas for how to make progress on this are discussed in Sect. 4.5.
The effect of the further addition of convective RWP is shown in Fig. D6. This increases the model values somewhat in the convective regions but not by a large relative amount. As might be expected, the model biases are also therefore increased but not to the same extent as the addition of the convective LWP; the spatial pattern of model bias remains very similar to that in Fig. D5.
We also test the effect of filtering using f LWP > 0.99 for the model and satellite data when including the large-scale RWP and the convective LWP and RWP (Fig. D7). The filtering reduces the large positive biases in the southern part of the domain (which were mainly due to the addition of the convective LWP) considerably, presumably because most of the grid points with convection are removed. The spatial patterns of the biases here are similar to those in Fig. D2, where only the LWP from the large-scale cloud scheme was used, but the biases in the southern part of the domain have changed from being slightly negative to slightly positive, while the biases in the northern part have become a little more positive. These relatively small changes suggest that whether the convective LWP, which was the biggest contributor to the total water path, is used is not as critical when filtering for non-precipitating situations. However, there are still large overall differences in the NMBF values between Figs. D2 and D7 (see Table D1) such that whether RWP or convective LWP/RWP is used or not still has significant effects on model evaluation attempts and therefore should be an avenue of future investigation.   Fig. D1 except both the model and satellite data have been filtered before time averaging to only include data points for which f LWP is greater than 0.99. This quantity is denoted as LWP 0.99 . Figure D3. As for Fig. D1 except with the addition of the all-sky rainwater path (RWP) for both the model and AMSR-E satellite data. No filtering for f LWP has been applied. Figure D4. Ratio of the TLWP (where TLWP is the sum of LWP and RWP) from the model convection scheme to the total (from the convection and large-scale scheme) TLWP. The RWP from the convection scheme is calculated from the convective rain rate diagnostic by assuming a raindrop size distribution and fall speed relationship (see Furtado et al., 2016, for details).     and E2 show the percentage increases between PI and PD for LWP ic and f c ( LWP ic and f c ), respectively, when aerosols are prevented from affecting the rain autoconversion process. This is done by setting N d in the autoconversion process equation to a constant value of 300 cm −3 over land and 100 cm −3 over oceans in both the PI and PD runs. These are termed the ConstantNdAutoCon runs here.
Most oceanic regions showed positive LWP ic and f c in the full model and near-zero or negative changes for Con-stantNdAutoCon. Table 3 shows that between the standard and ConstantNdAutoCon runs LWP ic reduced from 3.2 % to −5.0 % for the NA region and from 2.7 % to −0.43 % for the northern NA. For the southern NA region, there was a small negative LWP ic in the standard runs (−0.59 %) and a very similar value for ConstantNdAutoCon (−0.50 %) consistent with the idea that aerosols have little impact on LWP ic in this region, as discussed earlier. Respective f c values for the standard and ConstantNdAutoCon runs were 1.5 % and 0.14 % for the NA region; 0.94 % and 0.02 % for the northern NA; and 2.7 % and −0.75 % for the southern NA. These results suggest that most of the PI to PD increases in the macrophysical cloud properties (LWP ic and f c ) were due to the impact of aerosols on the rain autoconversion process, likely via the precipitation suppression effect of enhanced aerosol. However, in the region of the Atlantic near the Equator, Fig. E2 suggests that the increase in f c between the PI and PD runs was not due to the impact of aerosol on the autoconversion process since there are still positive f c values for the ConstantNdAutoCon runs. We hypothesize that, as far as allowed by the nudging (wind-only nudging applied above the boundary layer), the f c increases in this region may be related to other aerosol impacts such as the ARI, ACI or semidirect effects, via thermodynamical or dynamical changes. 15714 D. P. Grosvenor and K. S. Carslaw: Aerosol forcing in the UKESM model Figure E1. Mean percentage increase in LWP ic between PI and PD runs for (a) the full run and (b) the run where aerosol has been prevented from affecting the rain autoconversion.
where we have assumed a constant clear-sky transmissivity (T atmos ). The cloudy-sky surface flux is calculated by assuming that the flux reaching the cloud top is equal to SW downSURFclear : Thus, we now have a function SW downSURF (N d , f c , LWP ic ) that estimates the surface downwelling SW flux as a function of the three cloud variables of interest. This allows us to estimate the surface ERF ACI following Eqs. (2) and (1) where we have assumed that the PI and PD SW clean+clear values are the same. The forcing contributions from the changes to the individual cloud parameters are then estimated using ERF N d = 0.5 ERF N d ,PI base + ERF N d ,PD base ERF f c = 0.5 ERF f c ,PI base + ERF f c ,PD base ERF LWP ic = 0.5 ERF LWP ic ,PI base + ERF LWP ic ,PD base , ( Here, all of the PI cloud property values have been used as a baseline value but with the PI value for either N d , f c or LWP ic replaced with the PD value. A similar calculation is done using the PD values as baselines and replacing with This follows the work of Mülmenstädt et al. (2019), who found that an average of values obtained from using both the PI and PD values as baselines was more accurate than only using say the PI as a baseline. If f c is zero in the baseline state (i.e. PI for Eq. F15, PD for Eq. F16), then N d and LWP ic are undefined. Therefore, in order to calculate the effect of an increased f c between the baseline and perturbed state (i.e. PD for Eq. F15, PI for Eq. F16), the N d and LWP ic values from the perturbed state are used in such cases.
Appendix G: Net contributions to forcing from cloud state and cloud fraction state transitions Section 3.2.2 and 3.2.3 showed the contributions to the surface forcing from different combinations of PI and PD cloud states (low-, mid-and high-altitude cloud combinations) and cloud fraction. We explained that it is useful to consider the net forcing contribution from both the PI to PD transitions and those from the reciprocal transitions. That is, if g i and g j are two different cloud states (where the possible states run from 1 to 8; see Fig. 17), the net contribution (ERFcont NET ) for that pairing of cloud states is given by ERFcont NET g i , g j = ERFcont g iPI ,g j PD + ERFcont g j PI ,g iPD g j > g i , where ERFcont g iPI ,g j PD is the forcing contribution from pairings of g i in the PI simulation and g j in the PD simulation, and ERFcont g j PI ,g iPD is the contribution from pairings of g j in the PI simulation and g i in the PD simulation. A similar equation can be formulated using the five different cloud fraction bins instead of cloud states.

Appendix H: Results from an alternate year
The results in the main body of the paper were based on meteorology and emissions from the period of 28 March 2009 to 28 March 2010. It is possible that the results presented vary depending on the chosen year since meteorology, cloud fields, etc. vary from year to year. To address this, we have run the PI and PD simulations for an additional year and present results here for the period of 28 March 2010 to 28 March 2011. For brevity, we only present selected results focusing on those that are most likely to be subject to variability and noise issues, which are (1) the ARI and ACI aerosol forcings (Fig. 12); (2) the decomposition of the aerosol forcing into contributions from changes in N d , LWP ic and f c between the PI and PD (e.g. Fig. 15); (3) the contributions from changes between different cloud states and between cloud fractions (Figs. 18 and 19). Figure H1 shows that the pattern and magnitude of both ERF ARI and ERF ACI are very similar for the alternate year with only slight differences (compare to Fig. 12): the region of ERF ARI forcing off the coast of the US extends further east across the Atlantic; the main region of ERF ACI around Newfoundland is still present but extends further south and less to the southeast; the region of ERF ACI north of Scandinavia has a similar spatial pattern but is enhanced somewhat in the alternate year.  Figure H2 (compare to Fig. 15) shows the contributions to ERF ACI from the changes in N d , LWP ic and f c between the PI and PD for the alternate year. It shows that there are some small differences in the spatial patterns, but the overall patterns of the contributions remain very similar with N d and LWP ic changes dominating over the northern NA region and f c changes dominating over the southern NA as was the case for the original chosen time period.
A figure equivalent to Fig. 18 for the alternative year showing the contributions from changes between different cloud states (not shown) reveals a very similar pattern with very similar magnitudes of contribution. Likewise, the alternative version of Fig. 19 (contributions from changes between cloud fractions; not shown) is again very similar to original version in terms of both the pattern and magnitude of the contributions. Data availability. Raw model data are kept on tape archive available through JASMIN (http://www.jasmin.ac.uk/, last access: 11 December 2020) service. Please see http://www.ceda.ac.uk/blog/ access-to-the-met-office-mass-archive-on-jasmin-goes-live/, last access: 11 December 2020 for details on how to arrange access to Met Office data via JASMIN.
Author contributions. DPG designed and ran the simulations, analysed the output, processed the satellite data and produced the text and figures. KSC helped to analyse the model output and provided feedback and edits to manuscript drafts.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Marine aerosols, trace gases, and clouds over the North Atlantic (ACP/AMT inter-journal SI)". It is not associated with a conference.
Acknowledgements. We acknowledge use of the MONSooN system, a collaborative facility supplied under the Joint Weather and Climate Research Programme, a strategic partnership between the Met Office and the Natural Environment Research Council. Version 3 of the CALIPSO-GOCCP (GCM Oriented Cloud Calipso Product; Chepfer et al., 2010) was obtained from https://climserv. ipsl.polytechnique.fr/cfmip-obs/. AMSR-E LWP data are produced by Remote Sensing Systems and sponsored by the NASA Earth Science MEaSUREs DISCOVER Project and the NASA AMSR-E Science Team. Data are available at http://www.remss.com, last access: 11 December 2020. The MODIS data were obtained from NASA's Level 1 and Atmosphere Archive and Distribution System (LAADS) (http://ladsweb.nascom.nasa.gov/, last access: 11 December 2020). CERES-EBAF data were taken from https://ceres. larc.nasa.gov/order_data.php and are the monthly averaged product of observed TOA for which the TOA net flux is constrained to the ocean heat storage.
Finally, we thank the two anonymous referees who helped to improve the manuscript through their constructive comments.
Financial support. This research has been supported by the National Environment Research Council (North Atlantic Climate System: integrated Study (ACSIS) project). Daniel P. Grosvenor was supported by the National Environmental Research Coun40 cil (NERC) national capability grant for the The North Atlantic Climate System Integrated Study (ACSIS) programme (grant no. NE/N018001/1) via NCAS.
Review statement. This paper was edited by Hailong Wang and reviewed by two anonymous referees.