How does the UKESM1 climate model produce its cloud-aerosol forcing in the North Atlantic?

Abstract. Climate variability in the North Atlantic influences processes such as hurricane activity and droughts. Global model simulations have identified aerosol-cloud interactions (ACIs) as an important driver of sea surface temperature variability via surface aerosol forcing. However, ACIs are a major cause of uncertainty in climate forcing, therefore caution is needed in interpreting the results from coarse resolution, highly parameterized global models. Here we separate and quantify the components of the surface shortwave effective radiative forcing (ERF) due to aerosol in the atmosphere-only version of the UK Earth System Model (UKESM1) and evaluate the cloud properties and their radiative effects against observations. We focus on a northern region of the North Atlantic (NA) where stratocumulus clouds dominate (denoted the northern NA region) and a southern region where trade cumulus and broken stratocumlus dominate (southern NA region). Aerosol forcing was diagnosed using a pair of simulations in which the meteorology is approximately fixed via nudging to analysis; one simulation has pre-industrial (PI) and one has present-day (PD) aerosol emissions. Contributions to the surface ERF from changes in cloud fraction (fc), in-cloud liquid water path (LWPic) and droplet number concentration (Nd) were quantified. Over the northern NA region increases in Nd and LWPic dominate the forcing. This is likely because the high fc there precludes further large increases in fc and allows cloud brightening to act over a larger region. Over the southern NA region increases in fc dominate due to the suppression of rain by the additional aerosols. Aerosol-driven increases in macrophysical cloud properties (LWPic and fc) will rely on the response of the boundary layer parameterization, along with input from the cloud microphysics scheme, which are highly uncertain processes. Model gridboxes with low-altitude clouds present in both the PI and PD dominate the forcing in both regions. In the northern NA the brightening of completely overcast low cloud scenes (100 % cloud cover, likely stratocumlus) contributes the most, whereas in the southern NA the creation of clouds with fc of around 20 % from clear skies in the PI was the largest single contributor, suggesting that trade cumulus clouds are created in response to increases in aerosol. The creation of near-overcast clouds was also important there. The correct spatial pattern, coverage and properties of clouds are important for determining the magnitude of aerosol forcing so we also assess the realism of the modelled PD clouds against satellite observations. We find that the model reproduces the spatial pattern of all the observed cloud variables well, but that there are biases. The shortwave top-of-the-atmosphere (SWTOA) flux is overestimated by 5.8 % in the northern NA region and 1.7 % in the southern NA, which we attribute mainly to positive biases in low-altitude fc. Nd is too low by −20.6 % in the northern NA and too high by by 21.5 % in the southern NA, but does not contribute greatly to the main SWTOA biases. Cloudy-sky liquid water path mainly shows biases north of Scandinavia that reach up to between 50 and 100 % and dominate the SWTOA bias in that region. The large contribution to aerosol forcing in the UKESM1 model from highly uncertain macrophysical adjustments suggests that further targeted observations are needed to assess rain formation processes, how they depend on aerosols and the model response to precipitation in order to reduce uncertainty in climate projections.


that in the ECHAM-HAMMOZ model ERF LW P ic was quite similar to ERF N d for most latitudes, except in the Southern Ocean where there was a much larger ERF LW P ic contribution. ERF f c contributions were mostly between around 50-75% of 95 the ERF N d contribution, but again in the Southern Ocean there was a larger ERF f c contribution than ERF N d contribution.
This was also true in the polar regions. Globally the overall contribution from adjustments (ERF f c and ERF LW P ic ) was -0.92 W m −2 compared to an ERF N d contribution of -0.52 W m −2 , so that in this model more aerosol forcing is coming from the more complicated adjustment processes highlighting the importance of evaluating these processes in more detail.
Here we perform a similar analysis but with a different model. This will allow the two models to be compared in terms of how 100 they produce their aerosol forcing. A very different breakdown between the models would mean that one of the models was incorrect and allow the basis for more detailed evaluation. We also focus on the NA region and on surface aerosol forcing rather than top of the atmosphere (TOA) forcing due to the potential importance of aerosol forcing for the climate variability via sea surface temperature changes there. The focus on the NA also allows a more detailed look at the processes than is possible from a global study. We also go beyond the study of Mülmenstädt et al. (2019) by also characterizing the cloud regimes and 105 the changes in cloud regimes for which aerosol forcing predominantly occurs according to the model; this will then allow (in future work) these regimes to be comprehensively evaluated against observations and also against high resolution models.
High resolution modelling needs to be targeted to a smaller selection of regimes given the high computational cost. In addition, observations will be used to evaluate the general model cloud properties (spatial positioning, areal coverage, thickness, droplet concentrations, etc.) since these will affect the resulting aerosol forcing.

Definition of Liquid Water Path
Throughout this paper the term LWP refers to the all-sky value (i.e., including both the cloudy and clear-sky portions of model gridboxes or observed regions) and LW P ic refers to the in-cloud liquid water path, which is that from the cloudy regions only.
It is assumed here that LW P = f c LW P ic .

Model details
We examine the aerosol-cloud interactions in the UKESM1-A model, which is the atmosphere-only version of the coupled UKESM1 (UK Earth System Model), which was submitted as part of the 6 th Coupled Model Intercomparison Project (CMIP6).
UKESM1 is based on the HADGEM3-GC3.1 physical climate model (Williams et al., 2017;Kuhlbrodt et al., 2018), but in addition couples several earth system processes . These additional components include the MEDUSA 120 ocean biogeochemistry model (Yool et al., 2013), the TRIFFID dynamic vegetation model (Cox, 2001) and the stratospherictropospheric version of the United Kingdom Chemistry and Aerosol (UKCA) chemistry model . This version of the UKCA allows a more complete description of atmospheric chemistry compared to HADGEM3-GC3.1. For example, the latter uses an offline climatology for oxidants, whereas in the UKESM1 oxidants are treated explicitly. models; MEDUSA; and TRIFFID. Instead, the UKESM1-A configuration uses observed sea surface temperatures and sea ice concentrations provided by the Program for Climate Model Diagnosis and Intercomparison (Taylor et al., 2000, https:// pcmdi.llnl.gov/mips/amip/). Vegetation (vegetation fractions, Leaf Area Index, canopy height) and surface ocean biology fields (dimethyl sulphide and chlorophyll ocean concentrations) are prescribed from a member of the UKESM1 CMIP6 historical ensemble . This ensures that the prescribed vegetation and ocean biological fields mirror those simulated by and particles from sea and land surfaces are prescribed from the CMIP6 inventories; see Mulcahy (2020) for details of the specific implementation for the UKESM1. 135 The atmospheric component is the GA7.1 atmospheric configuration of the Unified Model (UM). Full details of this can be found in Walters et al. (2019) and Mulcahy (2020). Here we only describe in detail the features that are more relevant to aerosol-cloud interactions. We use an N96 horizontal resolution, which is 1.875 × 1.25 o (208 ×139 km) at the equator. 85 vertical levels are used between the surface and 85 km altitude with a stretched grid such that the vertical resolution is 13 m near the surface and around 150-200 m at the top of the boundary layer. We chose this resolution since it is the same as that 140 used for long climate runs in the CMIP6 model intercomparison and is the same horizontal resolution as that used in the B12 study.
Aerosol number concentrations are treated prognostically with the GLOMAP multi-modal scheme (Mann et al., 2010(Mann et al., , 2012, which uses five log-normal aerosol size modes and includes sulfate, sea-salt, black carbon and organic carbon chemical components. These aerosol components are treated as being internally mixed within each size mode. Mann et al. (2010) and Mann 145 et al. (2012) provide further details with some small changes for the implementation within UKESM1-A described in Mulcahy (2020). Mineral dust is simulated separately using the CLASSIC dust scheme (Woodward, 2001;Mulcahy, 2020).
Shallow, mid and deep convection are parameterized separately to other cloud types (see Walters et al., 2019, for details). The parameterizations do not take into account aerosol, or droplet concentrations and they use their own simplified microphysics scheme. For cloud that is not shallow, mid, or deep convection (termed large-scale cloud), the UKCA-Activate scheme is 150 used to calculate cloud droplet concentrations from the aerosol size distribution using the West et al. (2014) scheme based on the parameterisation of droplet activation in Abdul- Razzak and Ghan (2000). Cloud droplet concentrations at cloud base are replicated vertically throughout contiguous columns of cloud. The cloud droplet activation scheme is a diagnostic scheme since it is run on each model time step without consideration of how many cloud droplets were present before. The cloud droplet number concentration is then passed to the radiation and microphysics schemes.

155
The large-scale cloud microphysics is single-moment, such that the mass of liquid water, but not the cloud droplet number, is advected by the model and retained in memory between model time steps. It is based on Wilson and Ballard (1999), but with improvements to the warm rain parameterisations suggested by Boutle et al. (2014), which include bug fixes, a treatment of rain fraction that is consistent with the prognostic rain formulation, a switch to the Khairoutdinov and Kogan (2000) pa-rameterisation for autoconversion and accretion, and a bias correction for the latter processes to deal with sub-grid variability of cloud and rain water. Relative to Wilson and Ballard (1999) there is also an improved treatment of drizzle rates (Abel and Shipway, 2007) and a prognostic treatment of rain that allows the 3-dimensional advection of precipitation. The introduction of the latter required modifications to be made to the aerosol wet scavenging processes as described in (Mulcahy, 2020). The bulk properties (cloud fraction, cloud liquid water content, vertical overlap, etc.) of large-scale cloud are parameterized using the prognostic cloud fraction and prognostic condensate (PC2) scheme (Wilson et al., 2008a, b) with modifications described 165 in Morcrette (2012). The atmospheric boundary layer is parameterizated using the turbulence closure scheme of Lock et al. (2000) with modifications described in Lock (2001) and Brown et al. (2008).
There are some some differences in the treatment of aerosols in UKESM-A and the physical climate model (HADGEM-GC3.1) primarily related to natural aerosols, aerosol chemistry and the prescription of anthropogenic SO 2 emissions (see Mulcahy, 2020, for details).

Simulation details
The model is run from 1st March 2009 to 28th March 2010. The first 27 days were used to spin-up the aerosol and chemistry fields leaving one year of data for analysis. The time period was chosen to allow comparisons with relevant field campaigns that will be performed in future work. Two parallel simulations were performed; one using pre-industrial (PI) CMIP6 aerosol emissions from the year 1850 and the other using present-day (PD) emissions (i.e., the CMIP6 emissions corresponding to 175 the simulation period). Both simulations were nudged every 6 h to ERA-Interim horizontal wind fields between ∼2277 m and ∼47,251 m (applied between the 18th and 76th model level from the surface). The nudging ensures that the meteorology in the two runs is very similar, thereby allowing cloud radiative effects due solely to the aerosol perturbation to be calculated.
Following the recommendations of Zhang et al. (2014), we do not nudge the temperature field in order to allow fast-acting local responses to aerosol-induced temperature changes (such as those from precipitation suppression, ARI and semi-direct 180 radiative effects). Instantaneous model fields are output every 27 hours, which allows the sampling of the diurnal cycle and more complex output analysis than is possible using time-averaged data.

Surface forcing calculations and forcing partitioning
In this paper we are concerned with the shortwave aerosol ERF at the surface since we are interested in the effects on SSTs and N. Atlantic climate variability. To separate the aerosol ERF into ARI and ACI components we use output from the triple calls to 185 the radiation scheme that are made by the model every timestep. One call calculates the surface SW fluxes taking into account both aerosols and clouds (designated here as SW aerosol+cloudy ), one call calculates the fluxes with the background aerosol removed (i.e., the aerosol outside of clouds and interstitial aerosol; aerosol Twomey and adjustment effects on the clouds are still included here; SW clean+cloudy ) and one call calculates the fluxes with both the background aerosol and clouds removed are then calculated as :- The corresponding instantaneous radiative forcings due to anthropogenic aerosols (ERF ARI and ERF ACI ) can then be calculated from simulations with PI and PD aerosol emissions :- We also decompose the surface ACI aerosol forcing into contributions from changes in N d , LW P ic and f c between the PI and PD. This is achieved using an offline calculation of the surface SW fluxes, which allows the magnitude of each term to be assessed individually. The approach is described in Grosvenor et al. (2017) and in Appendix E.

Model evaluation against satellite observations 200
Here we evaluate the PD simulation against satellite observations. The motivation for the model evaluation is that without a good representation of cloud properties and spatial distribution it is likely that the model aerosol forcing will be incorrect.
For example, if the simulated clouds have a cloud fraction that is too high then a cloud albedo perturbation due to aerosol would be larger than in reality. Placement of clouds in the wrong locations could affect forcing via the differing incoming solar insolation, or could affect their chances of interacting with sources of aerosol. Where possible we use the COSP (Cloud when comparing cloud fractions since the cloud detection threshold (e.g., in terms of optical depth) varies between the different satellite instruments and needs to be consistent with the threshold LWC used to define a cloud in the model. Furthermore, COSP accounts for the effect of overlying layers of cloud, which affects the observation of underlying clouds.

210
Following Gustafson and Yu (2012), model biases are quoted in terms of the normalized mean bias factor (NMBF) and the normalized mean absolute error factor (NMAEF), which are both expressed as a percentage, in order to provide unbiased metrics that are symmetric about zero. These quantities are similar to the mean percentage bias and the RMSE except that account is taken of whether the model is greater or lower than the observations. In addition, the NMAEF does not involve squaring the error, which can lead to an exaggerated influence from larger biases. Appendix B provides the definitions for 215 these metrics. Table 2 lists these bias metrics (along with the spatial correlation coefficients, r, between the model and the observations) for the different regions considered (see next section) and for the different cloud variables. It should also be borne in mind that as well being due to issues with the representation of clouds in the model, differences in cloud properties between the model and satellite might also be due to cloud adjustments to aerosol that are too strong or weak, and that it is difficult to distinguish between the two. reach around 70% especially towards the west near the coast of Canada and north of Iceland and Scandinavia. More than 40-50% of the low-altitude clouds in these regions are stratocumulus (Wood, 2012). The model has a positive bias of around 30 to 40% in the region off the eastern Canadian coast and along the east coast of the USA. There is also a band of higher cloud fractions (around 20-30%) down the west coast of Africa in both the model and observations; this region is also associated 230 with a high occurrence of stratocumulus (40-60% of the low clouds according to Wood, 2012). Model biases are minimal in these regions. The model and observations both show lower cloud fractions in the southwest region of the domain.
The model captures the main pattern of low-altitude cloud fraction well giving us some confidence that it provides a realistic representation of the broad cloud types and locations in the NA. However, the model uses observed SSTs and the winds above the boundary layer are nudged to reanalyses, which will help the model to replicate the correct cloud patterns to some 235 degree. But even with the correct SSTs and wind fields, accurate simulation of the spatial pattern of f c is a strong test of the model boundary layer and cloud schemes, in particular the correct vertical thermodyamical structure and entrainment, the corresponding cloud fraction response, etc. Furthermore, the free-running ocean-coupled version of the model (the UKESM1; Sellar et al., 2019), which has no wind nudging and predicts the SSTs itself, also reproduces the cloud pattern and amount well (Robson et al., 2020, submitted to JAMES). 240 We define a sub-region (marked in Fig. 1 and denoted northern North Atlantic, or northern NA) to represent the region where the stratocumulus cloud fraction is large. We also define a second region immediately to the south of the northern NA region 8 https://doi.org/10.5194/acp-2020-502 Preprint. Discussion started: 24 June 2020 c Author(s) 2020. CC BY 4.0 License.
(southern North Atlantic, or southern NA, see Fig. 1) where the annual mean stratocumulus cloud cover is lower (Wood, 2012) indicating more broken stratocumulus, and/or a lower frequency of occurrence of stratocumulus. The overall low-altitude f c area-weighted biases for these regions were 5.1% and -23.9% for the northern and southern NA regions, respectively (see 245   Table 2).
For mid-altitude clouds (Fig. 2) the spatial pattern is again captured well by the model (r=0.80). Percentage biases are generally low in the northern NA region with a mean model bias of -11.0%. There are larger negative biases in the southern part of the domain with a mean bias of -25.8% for the southern NA region; however, the observed cloud fractions are very low in those regions making higher percentage biases more likely. There are also positive biases of up to around 30% in the 250 stratocumulus region north of Scandinavia.
The modelled high-altitude cloud fraction is generally biased high over the ocean with mean biases of 7.3% for the northern NA and 4.5% for the southern NA, although the spatial pattern is again good (r=0.79 for the whole region). Positive biases are particularly evident in the region north of around 60 o N where biases of up to 90 % occur. Since low-altitude clouds are likely to be the most important for aerosol-cloud interactions we will leave an investigation into the biases in the mid and high altitude 255 clouds to future research.

LWP evaluation
Liquid water path is important for cloud optical depth and hence cloud albedo. For example, for a fully overcast cloud with a liquid water content that increases with height in a manner consistent with adiabatic uplift, the optical depth (τ c ) is proportional to LW P 5/6 ic N 1/3 d . Therefore, a given relative change in LW P ic produces more than twice the relative change in τ c that the 260 same relative change in N d does. LW P ic is also important in terms of cloud physics since it helps to determine rain rates and droplet sizes.
Microwave satellite instruments such as AMSR-E (Advanced Microwave Scanning Radiometer for Earth Observing System; e.g., Wentz and Meissner, 2000), whilst only being able to retrieve over ocean surfaces, probably represent the most accurate of the available retrievals of LWP since they are not subject to the biases associated with retrievals of LWP that use a combination 265 of visible and shortwave-infrared wavelengths (e.g., MODIS, MODerate Imaging Spectroradiometer; Salomonson et al., 1998), they give a better representation of the total column LWP than from such retrievals, are not affected by the presence of ice and are available in both the daytime and nighttime (O'Dell et al., 2008;Elsaesser et al., 2017). Note that LWP from AMSR-E is the all-sky LWP and so includes the contributions from both clear and cloudy regions (as does the LWP output from the model), whereas MODIS retrieves the in-cloud LWP, LW P ic . However, AMSR-E retrievals are still subject to biases; Lebsock and Su  (2014) suggest that the main cause of bias is the presence of rain (and the inability to directly detect whether rain is present).
Therefore, as suggested in Elsaesser et al. (2017) we place more confidence in the microwave LWP observations when the ratio of the all-sky LWP to TLWP (LWP + RWP, where RWP is rain water path) is high; we denote the ratio as f LWP . A caveat here, though, is that the partitioning of LWP and RWP from microwave instruments, and hence the estimate of f LWP , is itself uncertain.

275
Given the uncertainties when rain is present we also initially only use the model LWP not the RWP. Furthermore, we only consider the model LWP from the large-scale cloud scheme and not that from the convective parameterization. Convective LWP will be mostly associated with heavily raining environments and so the observations in such regions are again more uncertain.
However, we examine and discuss how RWP and convective LWP and RWP change the model evaluation in Appendix C. The AMSR-E instrument is onboard the Aqua satellite, which has local overhead overpass times of 01:30 and 13:30 and we use 280 data from both overpasses. Therefore, when comparing to AMSR-E for LWP and RWP the model local times within 3 hours either side of both of these times are used. This is important for LWP and RWP because cloud water content can have a large diurnal cycle. Fig. 4 shows f LWP from the model and the AMSR-E satellite along with the model bias relative to the retrieval. f LWP is quite high throughout the region in both the model and the observations, with values over the ocean generally larger than around 0.8.

285
The model bias is generally small (<10%) except off the east coast of the USA where there is an overestimate, although with biases mostly less than 0.1 (∼15%). This region corresponds to a region of high LWP (see Fig. 5) and might suggest that too much rain is produced by the model. Although, given that f LWP is low, it is also a region where the satellite estimate of f LWP is likely to be more uncertain. The largely good agreement between the model and satellite for f LWP provides some confidence in the ability of the model to represent cloud and rain formation processes, but insofar as we assume the model to be realistic, it 290 also lends confidence to the satellite estimates of this quantity for this region, which, as explained above, is also uncertain.
LW P ic values were estimated by dividing the time-mean all-sky LWP by the time-mean low-altitude cloud fraction from CALIPSO. The same was done for the model using the time-mean CALIPSO low cloud fraction from the COSP satellite simulator. For reference, the evaluation of the grid box mean LWP (including cloudy and clear regions) is shown in Appendix C.  no filtering for f LWP > 0.99 now have even larger LW P ic biases, again indicating that the model low clouds may to be too thick. The negative biases west of Greenland remain. Since there is generally little RWP in this region and the f LWP filtering has been performed, this indicates that the LW P ic evaluation in this region is likely to be robust. There are now negative 310 biases extending from the coast of Florida up to near the UK suggesting that clouds there are too thin, although instrumental uncertainties may still be high here given the uncertainty in the satellite estimate of f LWP and the larger amount of RWP that occurs here.
The LW P ic values from the AMSR-E observations in Fig. 6 show remarkably little spatial variability, suggesting that non-precipitating clouds tend to be fairly uniform across broad regions with an LW P ic value of around 60-100 g m −2 . Cloud 315 fraction ( Fig. 1) and N d (Fig. 7) vary quite widely across the same region potentially indicating a physical process that maintains a constant LW P ic despite a varying aerosol environment and cloud regime. Cloud droplet concentration (N d ) is an important quantity because it generally represents the first step in the chain of processes by which aerosols affect clouds. N d gives some indication of the number of CCN that were available to produce clouds as well as other factors such as updraft speed, droplet collision coalesence, droplet scavenging by rain, cloud evaporation, etc.

Cloud droplet concentration evaluation
Thus, an evaluation of model N d can give an idea of how well the model captures a range of processes.
Here we evaluate N d using a 1×1 o resolution data set calculated from MODIS retrievals of τ c and r e . Two-dimensional fields of N d are derived by the retrieval since it is assumed that N d is constant throughout the depth of the cloud, which has been shown to be a good approximation by aircraft observations of stratocumulus (Painemal and Zuidema, 2011 which the liquid cloud fraction is larger than 80% and for which the mean cloud top height is below 3.2 km were included for the satellite N d calculation in order to help prevent satellite retrieval errors (see Grosvenor et al., 2018b). The same filtering was applied to the model to minimise sampling errors; the COSP MODIS liquid cloud fraction and a calculation of model cloud top height were used for this. The satellite data set was re-gridded to the model grid before comparison. Figure 7 shows that the modelled and observed N d are largest near the continents and that N d decreases towards the middle 335 of the Atlantic Ocean. This fits with the idea that CCN are scavenged during eastward transport . There are also likely to be some dilution effects as distance from the sources increases. Other factors may also influence the spatial N d pattern such as changes in boundary layer height, predominant cloud type, other meteorological factors, etc. CCN concentrations also increase close to the European and African source regions, consistent with the prevailing northerly wind, which transports European and African pollution down the coast. Overall, the model produces a reasonable spatial pattern with a 340 spatial correlation coefficient of 0.67 for the whole region. However, the model has a tendency to overestimate N d over the southern part of the North Atlantic region, off the east coast of the UK and over Europe. Conversely the model underestimates in the northern part of the domain. The NMBF is -20.6% for the northern NA region where stratocumulus dominates compared to 21.5% for the southern NA region where the clouds are more broken.

345
CERES-EBAF data was taken from https://ceres.larc.nasa.gov/order_data.php and is the monthly averaged product of observed TOA for which the TOA net flux is constrained to the ocean heat storage. (including land) is -2.4% and it is 5.8% and 1.7% for the northern NA and southern NA regions, respectively.
The SW TOA biases can be caused by a combination of biases in f c , T LW P ic and N d . To estimate the relative contributions of these different biases individually to the SW TOA bias we used a similar calculation to that described in Section 2.4, but The SW TOA values from the calculations are somewhat smaller than those actually modelled or observed. If considering only oceanic regions, the underestimates for the model off the east coast of Canada in the N. Atlantic stratocumulus region are perhaps of greatest concern since this is where the SW TOA values and their biases were highest. The underestimates are likely due to the approximations and assumptions that have been made with this approach such as using the time-mean shortwave 365 flux and cloud fields. However, the agreement is sufficient for the purposes of estimating the relative contributions from biases in individual cloud properties to the SW TOA bias.
Next, perturbations to SW TOA were calculated by applying the model biases in the individual cloud fields (f c , LW P ic and N d ) to the observed cloud values on a one-at-a-time basis (Fig. 10). They are expressed as a percentage of the observed timemean SW TOA field. The sum of these perturbations, again expressed as a percentage, is shown in Fig. 9c and is intended for  Next, an attempt to quantify the relative percentage contributions (P x ) of the individual cloud field properties (denoted as x, where x is either f c , LW P ic , or N d ) to the total SW TOA bias is made using the following equation :- where ∆SW T OA x is the perturbation in SW TOA due to the bias in cloud field property x. Figure 11 shows that f c biases  Note, that the absolute values of the percentages for each grid-box add up to 100%.

Aerosol Forcing
We now examine maps of the temporal mean aerosol forcings following Eqn. 2. Note that the calculations were performed using the instantaneous fields from the 27-hourly output, which ensures that the diurnal cycle is sampled evenly throughout the course of the simulation. 400 Figure 12 shows maps of ERF ARI and ERF ACI . The magnitude of ERF ACI is generally larger than that of ERF ARI for oceanic regions. For example in the northern NA box the mean ERF ARI is -0.50 W m −2 and ERF ACI is -3.1 W m −2 (see Table 3). For the southern NA box ERF ARI and ERF ACI are -1.0 W m −2 and -1.8 W m −2 , respectively. The dominance of ERF ACI in over the N. Atlantic ocean region is in agreement with B12 ( Fig. 4b of that paper). The overall spatial pattern of ERF ACI is also similar to that in B12 with a band of large negative forcing running across the Atlantic from west to east 405 between around 25 and 50 o N, and another region of negative forcing following the west coast of Africa. Some differences are apparent, though; for example, the negative forcing is smaller in magnitude southwest of the UK in our simulations and larger in magnitude near Africa south of the equator. However, over the whole of the NA domain ERF ARI is larger in magnitude than ERF ACI (-1.9 vs -1.7 W m −2 ) due to the fact that ERF ACI is usually larger over land. The dominance of ERF ARI in the southern regions of the domain will also have more influence on the area-weighted average.  (or also adjustments) since they affect bulk cloud properties, as opposed to the response of N d that is termed a microphysical response. The macrophysical responses are mostly due to the suppression effects of aerosols on rain rates via the autoconversion and ERF f c are the forcing estimates from one-at-a-time perturbations of Nd, LW Pic and fc, respectively, and ERFACI,sum is the sum of these three forcing estimates. ERFACI,macro is the forcing contribution from the cloud macrophysical changes, i.e., the sum of ERFLW P ic and ERF f c . All values are area-weighted to account for the variation in area of the model grid-boxes.
process. This is demonstrated by Figs. D1 and D2, and Table 3 that show the impact of preventing aerosol from affecting the autoconversion process (see Appendix D for details). We hypothesize that the changes in LW P ic over the northern NA regions and lesser changes in f c reflects the presence of stratocumulus clouds with very large areal coverage in the north that can only 425 respond to the suppression of rain by thickening (i.e., more LW P ic ) rather than by increasing f c . In the southern NA the cloud coverage is much less due to the presence of broken stratocumulus and cumulus clouds, which respond in a different way to the rain suppression, namely by increasing their coverage (f c ) more than their thickness (LW P ic ).
Based on the maps of the cloud property changes alone it is difficult to judge the relative contributions of each type of change to the forcing. Thus, we now make a quantitative estimate of this using offline radiative calculations as described in 430 Section 2.4 and Appendix E. Fig. 14a shows the ACI forcing estimated using the offline method following Eq. E13 for the case where all of the cloud variables have been perturbed from their PI values to their PD values (ERF ACI,all ). Fig. 14b shows the sum of the contributions calculated by separately perturbing the individual cloud properties in one-at-a-time tests (ERF ACI,sum ). These can be compared to the forcing diagnosed from the full model outputs as generated by the online radiation code (Fig. 12b). The general pattern and magnitude of the actual and estimated forcings are very similar. The fact that 435 ERF ACI,all and ERF ACI,sum are similar indicates linearity in the effect of the perturbation of the individual cloud properties, such that they are almost directly additive. The mean forcing over the northern sub-region (northern NA) is -3.2 W m −2 for ERF ACI,sum , -3.0 W m −2 for ERF ACI,all and -3.1 W m −2 for the actual forcing. In the southern sub-region (southern NA) the mean forcings are -1.9, -2.0 and -1.8 W m −2 for ERF ACI,sum , ERF ACI,all and the actual forcing, respectively. Overall, the match is very good suggesting that the estimation technique is sound and that it is likely to be useful for estimating the 440 contributions from the different cloud property changes to the forcing. Figure 15 shows the contribution to the surface ACI forcing from the changes in N d , LW P ic and f c following Eqn. E15.
Percentage contributions (P x ) from the individual cloud fields (denoted as x, where x is either N d , f c , or LW P ic ) to the sum of the absolute contributions are calculated in a similar way to Eqn. 3 :- where ERF x is the ACI forcing contribution due to the change in cloud field x. These values are mostly negative since the ACI forcing is negative, but positive contributions and thus positive ERF x are possible. Figure 16 shows the results.
From the summary of the area means in The sum of the macrophysical contributions from f c and LW P ic (termed ERF ACI,macro ) is the dominant contribution to aerosol forcing in the NA region providing 63.9 % of ERF ACI,sum (=ERF N d + ERF LW P ic + ERF f c ). This is important 455 because these changes are likely to be associated with a higher model uncertainty than pure cloud albedo effects (due to N d changes alone). The macrophysical contribution to ERF ACI,sum is larger in the southern NA region than the northern NA region (71.5 % versus 54.4 %). Figures 15 and 16 show the spatial patterns of the contributions and reveal that the contribution from N d is restricted to the northern part of the domain (including the region of the northern NA sub-region) and is generally small elsewhere, except 460 for a small contribution down the west coast of Africa. Somewhat surprisingly this contribution is not co-located with where the largest changes in N d are simulated in Fig. 13, but occurs to the north and east. This likely reflects the fact that the cloud fraction is large in this region due to the prevalence of stratocumulus, which results in a large radiative impact from even modest N d increases. A similar argument can be applied for LW P ic changes. The forcing contribution from LW P ic changes follows a similar spatial pattern to that from N d changes and with similar magnitudes, although for LW P ic the largest LW P ic changes

Determining the most important meteorological situations for surface ACI forcing
We now attempt to determine the meteorological situations that are most important for aerosol forcing since this information 470 can then be used to target particular situations for modelling at high resolution, or for more detailed comparisons to observations.
Initially we quantify the forcing as a function of "cloud state", which we define in terms of the 8 possible combinations of low-, mid-and high-altitude clouds (see Fig. 17), which are defined to be consistent with the ISCCP definitions (Rossow et al., 1999): cloud top pressure (CTP) > 680 hPa for low-altitude cloud, 680<CTP<440 hPa for mid-altitude cloud and 475 CTP<440 hPa for high altitude cloud. For this analysis cloud is considered present if the the cloud fraction is greater than 0.01. The results are presented in the form of 2D matrix plots showing the contribution to the overall ERF ACI for different combinations of PI and PD cloud states (Fig. 18) for both the northern NA and southern NA regions. The contributions take into account the frequency of occurrence of each pairing of PI and PD cloud states so that the values associated with the colours in the plots add up to the overall regional forcing.

480
Generally for both regions the largest negative contributions to ERF ACI are associated with low-altitude cloud states (cloud states that have even numbers). To some extent this is expected since the convection scheme, which is likely one of the sources of higher altitude clouds, is not currently coded to respond to aerosol and because the lower-altitude clouds are closer to the surface aerosol sources. For the northern NA region, the largest terms are those with the same low cloud state in the PI and PD (i.e., the diagonal terms for the even numbered states in the matrix). Although it should be noted that the diagonal terms could involve f c or LW P ic changes within these states. The largest overall term occurs when both the PI and PD are in cloud state 8 (i.e., no change in cloud state). This is when low, mid and high altitude clouds are present at the same time. However, if the mid and high altitude clouds are thin, it is still possible that the low-altitude cloud is having the largest impact here. to meteorology since the model is only nudged above the boundary layer and only the winds are nudged. Thus, it is perhaps more useful to consider the net forcing contribution from both the PI to PD transitions and those from the reciprocal transitions from low+high altitude cloud to low+mid+high cloud, suggesting that the creation of mid-altitude clouds is also having an 500 impact. Nevertheless, these transition terms are considerably smaller than the diagonal (no transition) terms described earlier for the northern NA region.
For the southern NA region, the low-altitude-only state (no. 2) in PI and PD again has a large impact (-0.53 W m −2 ). There is also now a large negative contribution due to transitions from the clear state (no. 1) in the PI to the low-altitude only state 2 in the PD, but again there is a positive forcing due to the reverse situation, with a net contribution of -0.29 W m −2 . This 505 transition indicates the creation of additional low-altitude cloud in the PD due to aerosol effects, which is consistent with the finding in the previous section that macrophysical changes to clouds (in particular changes in f c ) are more important in this region compared to the northern NA region and suggest that low-altitude cloud is one of the most important cloud types for this.
However, as in the northern NA region pairings with the same cloud states in the PI and PD (the diagonal terms) that involve mid-and high-altitude clouds are also important for forcing, particularly for states 6 and 8, although these are relatively less 510 important than in the northern NA region suggesting that higher altitude cloud has less effect on the forcing in the SSA region.

Determining the most important cloud types for surface ACI forcing
We now break down the forcing as a function of the PI and PD cloud fractions in order to determine the types of low cloud involved, i.e., whether they are stratocumulus (high cloud fraction) or trade cumulus (low cloud fraction). Given that the contributions involving only cloud state 2 (low-altitude only clouds) were the single largest contribution in Fig. 18 for the 515 southern NA region and provided the joint second-largest contribution for the northern NA (with low clouds also involved for the highest contributer) we examine the forcing contribution for this cloud state only, along with the clear sky state (no. 1).  Fig. 17). The overall contribution is shown, which includes the frequencies of occurrence such that the sum of all of the values gives the overall regional mean contribution. Left:the northern NA region; right: the southern NA region. The top row shows all of the combinations. In the bottom row plots the contributions due to PI to PD transitions between states are added to the same transition between PD and PI in order to get the net contribution; see Appendix F for details of this.
For the northern NA region Fig. 19 shows that by far the largest single term comes from the situation when both the PI and PD are fully overcast (-2.24 W m −2 ). Consistent with the previous figures, this represents mainly the brightening of stratocumulus clouds due to increases in droplet concentrations combined with a smaller macrophysical effect from LW P ic 520 increases. There are also some large negative contributions associated with increases in cloud fraction between PI and PD for this region too. The largest net contributions (ranging from -0.1 to -0.37 W m −2 ) associated with cloud fraction changes 28 https://doi.org/10.5194/acp-2020-502 Preprint. Discussion started: 24 June 2020 c Author(s) 2020. CC BY 4.0 License.

Model evaluation conclusions
The spatial pattern of low, mid and high altitude clouds in the model compare well against observations. However, in the southern North Atlantic (southern NA) the model underestimated the low and mid-level cloud fractions by -23.9 and -25.8%, respectively. Low-altitude cloud biases in the northern NA sub-region were positive, but lower in magnitude (5.1%).

550
Low-altitude cloud fraction biases have the potential to significantly impact aerosol forcing since they are closest in altitude to the aerosol sources. If we assume that the PI cloud cover is biased by a similar amount to the PD cloud cover and assume no cloud adjustments to aerosol then the bias in forcing will be similar to the bias in f c . The results from this paper suggest that the second assumption is reasonable for the northern NA region because the Twomey effect dominates. Thus we might expect the 5.1% low-altitude f c bias there to make a small contribution to any error in forcing bias.

555
However, in the southern NA f c changes in response to aerosols (adjustments) were large, so the above assumptions are less valid and the effects on forcing less clear. The negative present-day f c biases could indicate a cloud fraction response to aerosol that is too low, which would cause subsequent negative forcing biases. On the other hand, a model f c that is too low may mean that the model is in a broken, precipitating cloud regime too often. Such regimes are thought to be more sensitive to aerosols and more prone to produce cloud adjustments (Ackerman et al., 2004b). In this case the model forcing values would be too 560 large. Further work would be needed to quantify the effect of the model f c biases on the aerosol forcing.
In-cloud Liquid Water Path (LW P ic ) was evaluated vs the AMSR-E microwave radiometer satellite instrument. The model percentage LW P ic biases are small in the northern part of the Atlantic where higher cloud fraction stratocumulus dominates and where the satellite retrievals are likely to be more reliable due to lower rain amounts. In the NE part of the domain (again that the stratocumulus in this region are too thick in the model. Elsewhere, off the east coast of Florida there tended to be negative LW P ic biases. When the raining data points were filtered out the biases tended to get worse, but the spatial pattern was preserved, suggesting that the biases are likely real rather than a result of artifacts in the observations or comparison method. The overestimate of LW P ic in the model in the regions dominated by stratocumulus (northern part of the domain) has 570 implications for its ability to simulate the correct cloud effective radius (r e ) given the correct cloud droplet concentration. This may confuse evaluation efforts using r e . An incorrect simulation of cloud droplet size also has implications for the conversion of cloud water into rain (autoconversion), suggesting the model might convert cloud into rain too readily at a given cloud droplet concentration; this will affect rain rates, but may also make the response of cloud fraction more sensitive to aerosols and hence enhance aerosol forcing. LW P ic is also likely to affect the sensitivity of the cloud albedo to cloud droplet concentration (and 575 hence aerosol), which will also affect the magnitude of aerosol forcing.
Model cloud droplet number concentrations (N d ) were compared to satellite observations from MODIS. The model captures the observed spatial pattern well suggesting that the processes that govern the removal or dilution of aerosol as it travels eastwards from the American continent are broadly captured by the model. Such processes are likely to include: the scavenging of CCN during precipitation ; dilution effects as distance from the sources increases; variations in boundary 580 layer height, which may affect aerosol scavenging due to cloud type and precipitation changes, and may cause the dilution of aerosol concentration over deeper boundary layers; as well as other unidentified meteorological effects. However, the model underestimates N d over most of the northern NA, but overestimates it over the southern NA, which could indicate that there are some issues with aerosol sources and/or scavenging. Cloud processes could also be involved, such as updraft speed distribution errors, problems with the droplet activation scheme, or issues relating to droplet removal (evaporation, coagulation, etc.). A 585 caveat to the conclusion that the model has biases in N d is that there is uncertainty in the MODIS N d observations that may be of the same magnitude, or more, as the model biases (Grosvenor et al., 2018b).
The positive model N d biases associated with the aerosol outflow regions of the USA and Europe have the potential to cause a positive bias in the aerosol forcing. In the northern parts of the Atlantic, which are less affected by anthropogenic aerosol, the negative N d biases may indicate a lack of aerosols from natural sources (or too much scavenging). This would 590 create pre-industrial background aerosol concentrations that are too low, leading to a positive forcing bias (e.g., Carslaw et al., 2013). Further investigation into the cause of the spatial N d pattern using model experiments; an examination of the realism of anthropogenic and natural aerosol sources; and quantification of the MODIS N d uncertainties (e.g., through comparison with aircraft data for this region) is therefore warranted.
Shortwave Top of the Atmosphere (SW TOA ) radiative fluxes from the model were compared to those from CERES. Again, the 595 model reproduced the observed spatial pattern well indicating a general good fidelity of overall cloud positions and properties.
However, there was also an overestimate in most regions, showing that the clouds were either too bright, or occur too frequently, or both. https://doi.org/10.5194/acp-2020-502 Preprint. Discussion started: 24 June 2020 c Author(s) 2020. CC BY 4.0 License.
The main cloud properties that determine SW TOA fluxes are f c , LW P ic and N d . Offline radiative analysis suggested that the f c bias is likely to contribute the most to the SW TOA biases, particularly in the statocumulus region in the northern Atlantic, 600 suggesting that biases in f c are the most important to address. This result also shows that biases in the response of cloud fraction to aerosol are likely to cause large forcing biases. LW P ic biases were determined to be most important in causing the positive SW TOA bias north of Scandinavia and so improvements there are also needed. N d biases were deemed important in causing SW TOA biases in the northern N. Atlantic, but with only small regions of high impact in the southern N. Atlantic regions. However, it should be considered that a small impact from biases in a variable on the mean SW TOA flux bias does not 605 preclude a large impact on forcing since the magnitude of the aerosol forcing is a lot smaller (of order 1-2%) than the mean SW TOA flux and so small biases can still have the potential to produce a significant forcing error.

Aerosol forcing conclusions
The ARI and ACI surface aerosol forcings were calculated using the instantaneous model output. In agreement with B12 the magnitude of the ARI forcing over ocean grid boxes in the NA region generally lower than the ACI forcing. The ACI forcing 610 for the NA region (including land points) is negative with a mean value over the region of -1.7 W m −2 , showing that aerosol forcing is likely to be important in terms of the energy received at the ocean surface in this region. The spatial pattern agrees well with that from B12, however, the magnitude is somewhat lower. This likely reflects the steps taken to reduce the aerosol forcing in the UKESM model as described in Mulcahy et al. (2018).
The ACI forcing was decomposed into contributions from changes between PI and PD in N d , LW P ic and f c (∆N d , ∆LW P ic 615 and ∆f c , respectively) using an offline calculation of the net surface SW downwelling radiative fluxes. The results showed that ∆N d and ∆LW P ic contributed most in the northern part of the NA where overcast stratocumulus clouds are prevalent. ∆f c had the highest contribution in the southern subtropical regions where cloud fractions are lower, indicating more broken clouds.
We speculate that the higher cloud cover in the northern region allows the ∆N d and ∆LW P ic effect to be larger since the associated albedo perturbations can operate over a wider area. In the southern region it is likely that the broken cloud scenes 620 are more susceptible to aerosol-induced cloud fraction increases. In contrast, in the northern NA region the overcast clouds cannot increase in cloud fraction much further.
Changes in LW P ic and f c can be considered to be cloud macrophysical responses because they are linked to the thermodynamics of liquid water production, whereas changes in N d can be considered more of a microphysical response since they are mainly caused by aerosol changes, although there is a link between the two. In the northern NA region the macrophysi-625 cal changes contribute to approximately 54.4 % of the total ACI forcing, whereas in the southern North Atlantic region they contribute 71.5 %.
Our results are important in the context of Malavelle et al. (2017) who showed that, for an earlier version of this model, the macrophysical responses to volcanic sulphate aerosol perturbations are likely to be small in the impact region of the volcano (north of ∼50 o N) . Note that the Malavelle et al. (2017) study quoted responses of the all-sky LWP, which is the product 630 of LW P ic and f c . Our Figures 15 and 16 show that with an updated model, using a year of data and considering PI to PD aerosol changes, macrophysical changes (dominated by LW P ic changes) have a similar radiative impact to N d changes for this region. It would be interesting to discover whether the implied larger LW P ic response to aerosol in the newer model in the Holuhraun region is now larger than suggested by the observational constraint used in Malavelle et al. (2017). Furthermore, our results show that macrophysical responses are even more important in other NA regions, especially the southern NA. This 635 result highlights that cloud responses to aerosol perturbations in specific regions are not necessarily representative of those elsewhere.
In order to understand the cause of the aerosol-induced LW P ic and f c increases in the model (i.e., the macrophysical responses), simulations have been performed where N d has been prevented from affecting rain formation from cloud liquid through the autoconversion parameterization. Instead a constant N d value is used. In these simulations the changes in LW P ic 640 and f c between PI and PD in the N. Atlantic region drop to nearly zero, or even decrease between PI and PD, strongly indicating that the mechanism for the increases in macrophysical quantities in the standard simulations is the suppression of rain. This is consistent with previous studies which have shown that precipitation is a major factor in causing cloud breakup (Stevens et al., 1998;Berner et al., 2013) mainly due to the stabilization effect of rain evaporation in the lower boundary layer, but also with some positive feedback due to the removal of aerosol by rain and the subsequent enhancement of rain due to the lower 645 aerosol concentrations (larger droplets). In a global model the representation of the thermodynamics of this process will be reliant on the boundary layer scheme, along with input from the cloud microphysics scheme. Since both processes are highly parameterized in global models it is likely that there is some uncertainty in their representation and thus uncertainty in the response of the clouds to aerosol. Since the contribution to aerosol forcing from changes in macrophysical quantities is very large in this model, this highlights the need for detailed and targeted model evaluation of these processes.

Important meteorological situations: conclusions
Contributions to the surface forcing were calculated for situations composed of each of the 8 different combinations (termed here as "cloud states") of cloudy or clear for low, mid and high cloud altitudes (see Fig. 17). The largest contributions (taking into account the frequency of occurrence as well as the forcing effect) always involved the presence of low-altitude clouds in either the PI or PD suggesting that such clouds are driving most of the aerosol forcing in the model. There was some 655 expectation of this apriori since the convection scheme in the model does not respond to aerosol and it is likely that this would be responsible for a lot of the creation of higher altitude cloud. However, it is also possible that such cloud is created by the convection scheme and then gets handed over to the large-scale cloud scheme and could then be affected by aerosol. It may be that low clouds are more affected simply because they are closer to the surface aerosol sources.
In both regions the largest forcing contributions came from situations with the same cloud states in the PI and PD. Of these 660 the largest contributions for the northern NA region (and a considerable contribution for the southern NA region) was when low, mid and high altitude cloud were present at the same time suggesting that in these regions considerable forcing occurs when higher altitude cloud is present. Part of the reason for this is likely due to the prevalence of clouds from a range of altitudes in this region. Thus, it may be important to consider the potential radiative effects of mid and high altitude clouds such as the shielding of low-altitude cloud, as also indicated in Malavelle et al. (2017). The neglecting of mid and high altitude clouds in the radiative calculations presented in this paper may therefore lead to some inaccuracy. On the other hand, the mid and high altitude cloud may be sufficiently thin that it has negligible impact. Further work is needed to determine whether this is the case. However, the idea is supported by the fact that the offline calculations of ACI forcing that considered only low-altitude clouds matched the model calculated forcing (that included all cloud types) very closely.
Situations with only low-altitude cloud in the PI and PD were also very important for both the northern NA and southern NA 670 regions, again highlighting the importance of low-altitude cloud for aerosol forcing. Net contributions involving the creation or destruction of a cloud type (low, mid or high cloud) were small for the northern NA region, but higher for the southern NA region. For the latter the creation of low-altitude cloud was implicated in each case. This fits with the result that ∆f c was a large contributor to the surface forcing in the southern NA region.
Overall, the results suggest that low-altitude cloud is the largest contributor to aerosol forcing in the model and so improve-675 ments to the representation of this and its response to aerosol are likely to yield the biggest improvement in the aerosol forcing estimate in this model. On the other hand, the lack of aerosol awareness of the convection scheme is unrealistic and may lead to missing aerosol-cloud interaction processes in the model. There are some indications that cyclones respond to increased aerosol concentrations by increasing their LWP and hence radiative forcing Mccoy et al. (2018). The degree to which the convection scheme is involved in representing the cloud associated with cyclones and hence the likelihood of such aerosol-cloud 680 interactions being missed at climate model resolution could be explored with high resolution models (e.g., the nested version of the UM) in future work.

Important cloud types: conclusions
The forcing for gridboxes with low-altitude-only clouds and clear skies were further examined in terms of pairings of cloud fraction from the PI and PD simulations. The aim was to try to understand the PI-to-PD cloud transitions that occur to produce 685 the model aerosol forcing and therefore gain insight into the types of clouds involved. The results showed that gridboxes with overcast (100% cloud cover) in both the PI and the PD accounted for by far the largest contribution to forcing in the northern NA region, although there were some smaller contributions from transitions between 60-80 and 100% cloud cover. This agrees with the result from the forcing contribution calculations that changes in cloud fraction due to aerosol are not particularly important for forcing in this region and suggests that cloud fraction transitions that do occur are mostly likely within the 690 stratocumulus regime. The brightening of stratocumulus clouds by the Twomey effect seems to dominate the forcing in this region.
For the southern NA region the largest contribution is from transitions from 0 to <∼30% cloud cover indicating the creation of trade cumulus from clear skies. The creation of 80-100% cloud fraction states from 0-20% states also has an important forcing contribution, suggesting the formation of stratocumulus clouds from clear skies and cumulus to stratocumulus transi-695 tions due to the anthropogenic aerosol input. The brightening of overcast clouds is also still very important here, but not as important as in the northern NA.
A caveat is that with this analysis is that we only pair the same times in the PI and PD and don't allow for evolution over time. Therefore, the pairings may not be accurate since the Lagrangian trajectories associated with a given pairing may have had different cloud fractions in the time before the snapshot. It may therefore be useful to examine the evolution of cloud fraction, etc. over time using Lagrangian trajectories and examine how this varies with initial aerosol/droplet concentrations (e.g., see Eastman and Wood, 2018). This could be done in the PD simulation and compared to observations, but also comparisons between the evolution in the PI and PD runs could be made.

Recommendations
The results from this paper suggest that future studies should target the improvement of different cloud responses to aerosol 705 depending on the location within the North Atlantic. In the northern NA the Twomey effect was a large contributor to the aerosol forcing. This suggests that constraint of the aerosol forcing using observations will be easier here than in the southern region where cloud adjustments/macrophyiscal effects were more dominant. Similar approaches to those performed in Malavelle et al. (2017) whereby the model N d and r e responses to known aerosol perturbations (the Holuhrahn volcanic eruption in the case of Malavelle et al., 2017) were evaluated using satellite data may therefore be able to constrain aerosol forcing. Examination of 710 modelled trends in N d and r e over time in response to known aerosol emission changes may also prove useful in this regard. The examination of N d and r e changes alone is advantageous since their natural variability is significantly lower than that of LW P ic , f c and SW fluxes (Malavelle et al., 2017) making it easier to quantify aerosol induced signals from the observations. However, our model results also showed a fairly large influence on aerosol forcing from increases in LW P ic (i.e., cloud thickening) in the northern NA region, which may complicate such efforts.

715
In the southern NA, the accuracy of the cloud fraction response in the model should be evaluated, with a particular focus on low clouds and the creation of trade cumulus and stratocumulus. As mentioned earlier, this is likely to be more difficult than evaluating the Twomey effect due to the larger number of processes involved. Ideally, the individual processes would be targeted for model evaluation with the formation of rain via the autoconversion process a prime first target since this was shown to be the cause of the f c and LW P ic response to aerosol in our model. Ways forward with this might include the use 720 of observational data from the ground-based ARM site on the Azores and data from aircraft observations that have also taken place there. This is ideally located since it is near a location where the older version of the model predicts a large contribution to the aerosol forcing from cloud fraction changes, but the newer version less so. Thus, the observations might help to determine which is the most realistic. Combined with satellite data, the long time series observations of aerosols, N d , LW P ic , f c and rain rates can be used to evaluate the relationships between these variables. Separating a causative signal due to the aerosol alone 725 from that due to meteorology, etc. is difficult, but it may be possible to use the techniques described in McCoy et al. (2019).
In that paper several meteorological drivers are controlled for via binning and multi linear regression, along with using the model to estimate and subtract the influence of non-aerosol-induced changes. However, it is possible that even an evaluation of the observed relationship between rain rates and other variables, without separating the causative effects of aerosol, will prove useful in constraining the cloud adjustments.

730
A complementary approach is to model this region using a high resolution nested version of the model, which would also allow for the use of the more sophisticated CASIM (Cloud AeroSol Interaction Microphysics) microphysics scheme (e.g., see Grosvenor et al., 2017;Stevens et al., 2018;Miltenberger et al., 2018a, b;Gordon et al., 2018). The assumption would be that the cloud responses to aerosol of this would be more accurate than the global model resolution simulations, although its performance should be tested using the observations. Finally, work to evaluate and improve the accuracy of the satellite evaluation in this paper would be very useful. For example, to deal with the issues of evaluating the model LWP in situations with precipitation, a microwave radiometer simulator could be implemented into the model, which would be able to estimate the total attenuation of the 37 GHz channel (used by AMSR-E to retrieve LWP) taking into account the differing attenuation strengths of cloud water and rain water (Lebsock and Su, 2014).
This attenuation could then be directly compared to that from AMSR-E. Comparing the global model evaluation results with 740 and without the convective contribution to those from a high resolution model (where the convective TLWP will be explicitly resolved) might help to assess the role of the convective parameterization on the evaluation of model TLWP. This could be combined with the microwave radiometer simulator mentioned above to help overcome the issue of retrieval problems in raining conditions too. Much greater confidence in N d retrievals would gained through further validation using in-situ aircraft data. Cloud fraction 745 could be evaluated with additional cloud fraction satellite datasets and ground based data to improve the confidence in the result. More sophisticated analyses such as 2-D histograms of cloud fraction and LW P ic would assess this important cloud fraction variable at different cloud thicknesses and may help to isolate issues in particular regimes. Model improvements to the sub-grid cloud scheme might be considered, such as a link between the boundary layer turbulence and the width of the sub-grid relative humidity distribution. Comparisons to high resolution models may also help with this.

750
Data availability. Raw model data is kept on tape archive available through the JASMIN (http://www.jasmin.ac.uk/) service. Please see http://www.ceda.ac.uk/blog/access-to-the-met-office-mass-archive-on-jasmin-goes-live/ for details on how to arrange access to Met Office data via JASMIN.
Appendix A: Details on the satellite data sets A1 Droplet concentration 755 N d can be estimated using satellite retrievals of τ c and r e (Han et al., 1998;Brenguier et al., 2000;Boers et al., 2006;Bennartz, 2007a;Grosvenor et al., 2018b) made using observations from the visible and shortwave infrared wavelengths from instruments like MODIS (Nakajima and King, 1990). Here we use a dataset based on the methods from Grosvenor et al. (2018a), which represented some modifications to the methods described in Grosvenor and Wood (2014). The methodology used here differs slightly from that used in Grosvenor et al. (2018a) in that data are not filtered for the presence of τ c <5 data points and the 760 correction for the vertical penetration depth bias proposed in Grosvenor et al. (2018a) is not applied. The 3.7 µm r e is used for the N d calculations, which has been suggested to be less prone to errors due to cloud heteorogeneity (Zhang et al., 2012(Zhang et al., , 2016bGrosvenor et al., 2018b). The data set excludes 1×1 o data points with mean SZA greater than 65 o , mean cloud top heights greater than 3.2 km, liquid cloud fractions less than 80% and for which the maximum sea-ice areal coverage over a moving 2-week window exceeded 0.001 %. The sea-ice data used were the daily 1×1 o version of the "Sea Ice Concentrations 765 from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data, Version 1" data set (Cavalieri et al., 1996). The N d dataset (without sea-ice screening) is available; see Grosvenor and Wood (2018).

Appendix B: Bias metrics
Following Gustafson and Yu (2012) , the NMBF is defined as Here we show plots similar to Figures 5 and 6, but for the all-sky (cloudy and clear contributions) LWP; i.e., without applying the step of dividing by the cloud fraction. Table C1 summarizes the results for this and the other plots in this section. The bias pattern is very similar to that of the in-cloud values suggesting that the conversion between LWP and LW P ic (by dividing by the low-altitude COSP-CALIPSO cloud fraction) does not greatly affect the evaluation, probably because the low cloud fraction biases are generally very small (Fig. 1). However, for the regions of negative bias between the equator and 18 o N the 780 biases are larger for LWP than for LW P ic because there is a negative low cloud fraction bias here, which acts to increase the model LW P ic values relative to the observed ones to give a lower LW P ic bias. When filtering to include only datapoints with f LWP > 0.99 (Fig. C2) the bias pattern is again similar to that for LW P ic .
As discussed earlier, AMSR-E observations are potentially biased when it is raining. This is because assumptions are made about the partitioning between LWP and RWP in order to facilitate the retrieval since rainwater attenuates the microwave 785 signal more strongly than liquid droplets (Lebsock and Su, 2014). It is assumed that rain water does not occur for water   paths below 180 g m −2 , which may lead to inaccuracies since the true partitioning value is likely to vary. Because of these issues, some previous model evaluation studies (e.g., Furtado et al., 2016) have chosen to compare the total liquid water path (TLWP=LWP+RWP) to the TLWP provided in some microwave based products. We also do this in Fig. C3; for this plot we only use the large-scale RWP and not the convective RWP (or LWP). However, it should be borne in mind that if the assumed 790 partitioning threshold is incorrect the TLWP value retrieved by the satellite will also be incorrect and so it is unclear whether this leads to a more accurate model-to-satellite evaluation.
The results show that the addition of RWP (compare Fig. C1 to Fig. C3 the large scale cloud amount is the more appropriate quantity even if there is condensate associated with the convection scheme and that double counting would occur if also using the convective values (UMDP30). However, there is also some convective condensate from the convective core that is not transferred to the large scale scheme. Another point of note is that LWP associated with the convection scheme is only used by the radiation scheme for shallow convective clouds (clouds with geometrical depths less than 500 m over land and 1500 m over ocean) meaning that the majority of liquid water in deep clouds 810 has no effect on radiation. Given the uncertainty, here we examine the effect of adding the convective LWP on the model evaluation (Fig. C5). As might be expected from Fig. C4 its inclusion leads to much larger model values, particularly in the southern parts of the region. The bias pattern also changes, so that large positive biases occur almost everywhere, compared to negative biases in the south and positive biases in the north when using only large-scale TLWP (Fig. C3). Given this, it would be useful for future studies to determine whether any of the convective TLWP should be included; ideas for how to make 815 progress on this are discussed in Section 4.5.
The effect of the further addition of convective RWP is shown in Fig. C6. This increases the model values somewhat in the convective regions, but not by a large relative amount. As might be expected, the model biases are also therefore increased, but not by the same extent as the addition of the convective LWP; the spatial pattern of model bias remains very similar to that in Fig. C5. 820 We also test the effect of filtering using f LWP > 0.99 for the model and satellite data when including the large-scale RWP and the convective LWP and RWP (Fig. C7). The filtering reduces the large positive biases in the southern part of the domain (which were mainly due to the addition of the convective LWP) considerably, presumably because most of the gridpoints with convection are removed. The spatial pattern of the biases here are similar to those from Fig. C2 where only the LWP from the large-scale cloud scheme was used, but the biases in the southern part of the domain have changed from being slightly negative 825 to slightly positive, while the biases in the northern part have become a little more positive. These relatively small changes suggest that whether the convective LWP, which was the biggest contributor to the total water path, is used is not as critical when filtering for non-precipitating situations. However, there are still large overall differences in the NMBF values between Figs. C2 and C7 (see Table C1) such that whether RWP or convective LWP/RWP is used or not still has significant effects on model evaluation attempts and therefore should be an avenue of future investigation. for ConstantNdAutoCon. Table 3 shows that between the standard and ConstantNdAutoCon runs ∆LW P ic reduced from 3.2 Figure C4. Ratio of the TLWP (where T LW P = LW P + RW P ) from the model convection scheme to the total (from the convection + large-scale scheme) TLWP. The RWP from the convection scheme is calculated from the convective rain rate diagnostic by assuming a raindrop size distribution and fallspeed relationship (see Furtado et al., 2016, for details). to -5.0 % for the NA region and from 2.7 to -0.43 % for the northern NA . For the southern NA region there was a small negative ∆LW P ic in the standard runs (-0.59 %) and a very similar value for ConstantNdAutoCon (-0.50 %) consistent with the idea that aerosols have little impact on LW P ic in this region as discussed earlier. Respective ∆f c values for the standard and ConstantNdAutoCon runs were 1.5 and 0.14 % for the NA region; 0.94 and 0.02 % for the northern NA; and 2.7 and 840 -0.75 % for the southern NA. These results suggest that most of the PI to PD increases in the macrophysical cloud properties (LW P ic and f c ) were due to the impact of aerosols on the rain autoconversion process, likely via the precipitation suppression effect of enhanced aerosol.
However, in the region of the Atlantic near the equator Fig. D2 suggests that the increase in f c between the PI and PD runs was not due to the impact of aerosol on the autoconversion process since there are still positive ∆f c values for the Con-845 stantNdAutoCon runs. We hypothesize that, as far as allowed by the nudging (wind-only nudging applied above the boundary layer), the f c increases in this region may be related to other aerosol impacts such as the ARI, ACI or semi-direct effects, via thermodynamical or dynamical changes. Figure C5. As for Fig. C3 except with the addition of the all-sky LWP from the convection parameterization for the model. Figure C6. As for Fig. C5 except with the addition of the all-sky RWP from the convection parameterization of the model.

42
https://doi.org/10.5194/acp-2020-502 Preprint. Discussion started: 24 June 2020 c Author(s) 2020. CC BY 4.0 License. Figure C7. As for Fig. C6 except that the model and AMSR-E data have been filtered before time averaging to only include datapoints for which fLWP is greater than 0.99 (as in Fig. C2).

Appendix E: Offline shortwave flux calculation and partitioning
Here we give details of the offline calculations used to estimate the SW fluxes, which are needed to estimate the contributions 850 to the forcing from the individual changes in cloud properties between the PI and PD simulations.
An estimate of the cloud optical depth (τ c ) can be made following Eqns. 1 and 5 of Grosvenor et al. (2018b) :- , where Q ext is the scattering efficiency, which we assume to have a constant value of 2; this has been shown to be the case for droplet radii that are much larger than the wavelength of light concerned (Bennartz, 2007b). ρ w is the density of liquid 855 water, L(z) is the liquid water content, r e (z) is the effective radius, z is height, z base is cloud base height and z top is cloud top height.
We assume that the clouds are adiabatic (or some constant fraction of adiabatic) so that their liquid water increases linearly with height, and it is assumed that N d is constant throughout their depth. Observations suggest that both are valid assumptions Table C1. Model evaluation statistics for the various sub-regions using time-averaged data. See Gustafson and Yu (2012) for details of the normalized mean bias factor (NMBF) and the normalized mean absolute (NMAEF) error factor. r is the spatial correlation coefficient between the model and observed time-averages. All values are area-weighted to account for the variation in area of the model grid-boxes. for stratocumulus clouds (Albrecht et al., 1990;Zuidema et al., 2005;Painemal and Zuidema, 2011;Miles et al., 2000;Wood, 860 2005). The adiabatic assumption means that :- where f ad is the adiabatic fraction, which is assumed constant with height and with a value of 0.8. c w (T, P ) is the rate of increase of liquid water content with height (dL/dz, with units kg m −4 ) and is referred to as the "condensation rate" in Bennartz (2007b), or the "water content lapse rate" in Painemal and Zuidema (2011). See Ahmad et al. (2013) for a definition.

865
A constant temperature of 278 K is used for the temperature (T ) in the calculation of c w , along with a constant pressure (P ) of 850 hPa. This is an approximation since these values would vary depending upon cloud height and location. However, since we only consider low clouds here and because in Eqn. E1 the dependence of τ c upon c w is very weak (τ ∝ 1/c 1/6 w ), this introduces very little error. Figure D1. Mean percentage increase in LW Pic between PI and PD runs for left: the full run; right: the run where aerosol has been prevented from affecting the rain autoconversion.
In the radiative scheme of the UKESM model the parameterisation of Liu et al. (2008) is used, which makes the width of 870 the droplet size distribution (assumed to be represented by a gamma function) a function of N d and L. For consistency with the model we also apply this parameterisation to our offline radiative calculations. From Mulcahy et al. (2018) :- where β m is a parameter related to the droplet distribution width, which is parameterized in Liu et al. (2008) as :-

875
x and y are constants set at 0.0266 (with units kg y , which has been converted from the value of 0.07 g y from Liu et al. (2008)) and 0.14, respectively, following Liu et al. (2008) and Mulcahy et al. (2018). N.B., β m is related to the k parameter, which is often used to describe the droplet distribution width (e.g., Martin et al., 1994), as β m = 1/k (1/3) .
, which follows from integrating Eqn. E2 over the depth of the cloud. This now allows τ c to be calculated from the cloud LW P ic and N d .
The cloud albedo (A c ) is then estimated using Eqn. 24.38 of Seinfeld and Pandis (2006), which is based on the two-stream 890 approximation for a non-absorbing, horizontally homogeneous cloud :- The shortwave downwards flux at the sufrace (SW downSU RF ) for a given cloud fraction (f c ) can then be estimated from the cloudy and clear-sky fluxes (SW downSU RF cloudy and SW downSU RF clear , respectively) using :- where we have assumed a constant clear-sky transmissivity (T atmos ). The cloudy sky surface flux is calculated by assuming that the flux reaching the cloud top is equal to SW downSU RF clear :- Thus we now have a function SW downSU RF (N d , f c , LW P ic ) that estimates the surface downwelling SW flux as a function of the three cloud variables of interest. This allows us to estimate the surface ERF ACI following Eqns. 2 and 1 as :-ERF ACI,all = ERF P D − ERF P I = SW P Dclean+cloudy − SW P Dclean+clear − SW P Iclean+cloudy + SW P Iclean+clear = SW P Dclean+cloudy − SW P Iclean+cloudy = SW downSU RF (N dP D , f cP D , LW P icP D ) − SW downSU RF (N dP I , f cP I , LW P icP I ) , where we have assumed that the PI and PD SW clean+clear values are the same.
The forcing contributions from the changes to the individual cloud parameters are then estimated using :-905 ERF N d = 0.5 (ERF N d,P Ibase + ERF N d,P Dbase ) ERF f c = 0.5 (ERF f c,P Ibase + ERF f c,P Dbase ) ERF LW P ic = 0.5 (ERF LW P ic,P Ibase + ERF LW P ic,P Dbase ) , where :-ERF N d,P Ibase = SW downSU RF (N dP D , f cP I , LW P icP I ) − SW downSU RF (N dP I , f cP I , LW P icP I ) ERF f c,P Ibase = SW downSU RF (N dP I , f cP D , LW P icP I ) − SW downSU RF (N dP I , f cP I , LW P icP I ) ERF LW P ic,P Ibase = SW downSU RF (N dP I , f cP I , LW P icP D ) − SW downSU RF (N dP I , f cP I , LW P icP I ) Here, all of the PI cloud property values have been used as a baseline value, but with the PI value for one of either N d , f c or LW P ic replaced with the PD value. A similar calculation is done using the PD values as baselines and replacing with one of This follows the work of Mülmenstädt et al. (2019) who found that an average of values obtained from using both the PI and PD values as baselines was more accurate than only using say the PI as a baseline. If f c is zero in the baseline state (i.e., PI for Eqn. E15, PD for Eqn. E16) then N d and LW P ic are undefined. Therefore, in order to calculate the effect of an increased f c 915 between the baseline and perturbed state (i.e., PD for Eqn. E15, PI for Eqn. E16), the N d and LW P ic values from the perturbed state are used in such cases.
Appendix F: Net contributions to forcing from cloud state and cloud fraction state transitions Sections 3.2.2 and 3.2.3 showed the contributions to the surface forcing from different combinations of PI and PD cloud states (low, mid, high altitude cloud combinations) and cloud fraction. We explained that it is useful to consider the net forcing 920 contribution from both the PI to PD transitions and those from the reciprocal transitions. I.e., if g i and g j are two different and sponsored by the NASA Earth Science MEaSUREs DISCOVER Project and the NASA AMSR-E Science Team. Data are available at www.remss.com. The MODIS data were obtained from NASA's Level 1 and Atmosphere Archive and Distribution System(LAADS http://ladsweb.nascom.nasa.gov/). CERES-EBAF data was taken from https://ceres.larc.nasa.gov/order_data.php and is the monthly averaged product of observed TOA for which the TOA net flux is constrained to the ocean heat storage.