Constraining the Twomey effect from satellite observations: issues and perspectives

Abstract. The Twomey effect describes the radiative forcing associated with a change in cloud albedo due to an increase in anthropogenic aerosol emissions. It is driven by the perturbation in cloud droplet number concentration (ΔNd, ant) in liquid-water clouds and is currently understood to exert a cooling effect on climate. The Twomey effect is the key driver in the effective radiative forcing due to aerosol–cloud interactions, but rapid adjustments also contribute. These adjustments are essentially the responses of cloud fraction and liquid water path to ΔNd, ant and thus scale approximately with it. While the fundamental physics of the influence of added aerosol particles on the droplet concentration (Nd) is well described by established theory at the particle scale (micrometres), how this relationship is expressed at the large-scale (hundreds of kilometres) perturbation, ΔNd, ant, remains uncertain. The discrepancy between process understanding at particle scale and insufficient quantification at the climate-relevant large scale is caused by co-variability of aerosol particles and updraught velocity and by droplet sink processes. These operate at scales on the order of tens of metres at which only localised observations are available and at which no approach yet exists to quantify the anthropogenic perturbation. Different atmospheric models suggest diverse magnitudes of the Twomey effect even when applying the same anthropogenic aerosol emission perturbation. Thus, observational data are needed to quantify and constrain the Twomey effect. At the global scale, this means satellite data. There are four key uncertainties in determining ΔNd, ant, namely the quantification of (i) the cloud-active aerosol – the cloud condensation nuclei (CCN) concentrations at or above cloud base, (ii) Nd, (iii) the statistical approach for inferring the sensitivity of Nd to aerosol particles from the satellite data and (iv) uncertainty in the anthropogenic perturbation to CCN concentrations, which is not easily accessible from observational data. This review discusses deficiencies of current approaches for the different aspects of the problem and proposes several ways forward: in terms of CCN, retrievals of optical quantities such as aerosol optical depth suffer from a lack of vertical resolution, size and hygroscopicity information, non-direct relation to the concentration of aerosols, difficulty to quantify it within or below clouds, and the problem of insufficient sensitivity at low concentrations, in addition to retrieval errors. A future path forward can include utilising co-located polarimeter and lidar instruments, ideally including high-spectral-resolution lidar capability at two wavelengths to maximise vertically resolved size distribution information content. In terms of Nd, a key problem is the lack of operational retrievals of this quantity and the inaccuracy of the retrieval especially in broken-cloud regimes. As for the Nd-to-CCN sensitivity, key issues are the updraught distributions and the role of Nd sink processes, for which empirical assessments for specific cloud regimes are currently the best solutions. These considerations point to the conclusion that past studies using existing approaches have likely underestimated the true sensitivity and, thus, the radiative forcing due to the Twomey effect.


. In turn, cloud albedo (α c , the fraction of solar radiative energy reflected back to space by clouds in relation to that incident at the cloud top) increases, as it is a monotonically increasing function of N d . Following Platnick and Twomey (1994) and Ackerman et al. (2000), a formulation which relies (i) on a two-stream radiative transfer approximation, and (ii) the assumption that clouds obey vertical 50 stratification that scales with an adiabatic one and that is horizontally homogeneous. Eq. 1 is expressed as a partial derivative: other quantities -notably cloud water path -are considered constant.
These two facts -N d is a monotonic function of CCN and α c in the partial-derivative sense is a monotonic function of N d -imply that the anthropogenic increase in CCN concentrations causes a negative (cooling) radiative forcing due to aerosolcloud interactions, RF aci , denoted as F aci (Bellouin et al., 2020b). It can be approximately (neglecting 55 absorption in the column above the cloud after scattering at cloud top) written as (Quaas et al., 2008;Bellouin et al., 2020b): with the downward solar radiative flux density (irradiance) above clouds, F ↓ s , and a quantitative description of CCN denoted here as a. The relative anthropogenic perturbation to a is denoted ∆ ln a ant . This formulation assumes (i) that only the solar spectrum is relevant, which is well justified for the optically thick, liquid water clouds considered here, since an N d perturbation 60 only marginally changes the cloud radiative effect in the terrestrial spectrum of an optically thick cloud; and (ii) that there is one liquid water cloud layer that determines the effect so that the problem can be considered as purely horizontal in space. In contrast to the formulation by Bellouin et al. (2020b), we consider the problem as horizontally variable in space (x, y) and in time (t), i.e. F aci = F aci (x, y, t). If Eq. 2 is assessed from temporally-sparse satellite data, a proper integration over temporally varying solar zenith angles and cloud diurnal cycles is necessary.
∂ ln re ∂ ln a , in which case the evaluation of the partial derivative requires stratifying by L, in addition to updraught regime, which adds substantial complexity.
Among the four factors on the right-hand side of Eq. 2, the first one, F ↓ s , is well quantified for each given latitude, longitude, and time. The second one, ∂α c /∂ ln N d , can be evaluated using Eq. 1 (Bellouin et al., 2020b;Hasekamp et al., 2019a), or alternatively by radiative-transfer simulations . This implies that the two key problems in determining RF aci are the quantification of the anthropogenic perturbation of CCN, ∆ ln a ant , and the sensitivity of N d to CCN perturbations, β = ∂N d /∂ ln a (Feingold et al., 2001). Taken together, this is the distribution of the anthropogenic perturbation of N d 85 (here expressed in absolute, not relative terms): The plausible range of the sensitivity is 0 ≤ β ≤ 1, except for heavily polluted situations (where it may become negative; Feingold et al., 2001), or when giant CCN play an important role (Ghan et al., 1998;Betancourt and Nenes, 2014;Gryspeerdt et al., 2016;McCoy et al., 2017) where competition for water vapor during droplet formation is at its strongest. Such conditions 90 represent significant challenge to models and parameterizations of the process (Betancourt and Nenes, 2014).
The aerosol forcing has to be evaluated at a scale much larger than an individual cloud. One of the key reasons for this is that there is currently no way to use satellite data to determine the anthropogenic fraction of the CCN population for a single air parcel. Methods applying model information, or data-tied approaches such as Bellouin et al. (2013) instead use the scale of model resolution or aggregate data resolution which is typically of the order of 1 • × 1 • (or about 100 × 100 km 2 ). The problem 95 formulated in Eq. 3 then has to be reformulated, using an overbar to denote the averaging over a 1 • × 1 • grid-box as which considers the mean sensitivity of N d to CCN,β, given the probability density function (PDF) of cloud-base updraught velocity, w in the grid-box, P(w), the PDF of CCN at cloud base within the scene, P(a), and the anthropogenic perturbation of the CCN concentration at the grid-box scale, ∆ ln a ant . Note in the above equation, β is assumed independent of ln a ant , 100 which assumes that P(w) is independent of cloud properties (primarily, liquid water content), which applies to stratus clouds (Morales and Nenes, 2010) but not in general. Similarly, the covariance of P(w) and P(a) may not be zero (e.g., Kacarab et al., 2020-in addition to Bougiatioti et al., 2020. All of the above suggest that observation of β at a cloud parcel scale is not directly transferrable to the large-scale for an assessment of the Twomey effect. Rather,β has to be estimated. Beyond RF aci , aerosol-cloud interactions also lead to rapid adjustments: once cloud droplet size distributions are altered 105 due to anthropogenic CCN, cloud microphysical and dynamical processes are modified as well (Albrecht, 1989;Ackerman et al., 2000;Wang et al., 2003;Heyn et al., 2017;Mülmenstädt and Feingold, 2018). Aerosols can induce transitions between cloud regimes, for instance by changing drizzle behavior (Rosenfeld et al., 2006;Feingold et al., 2010;Wood et al., 2011). The direction and magnitude of these changes depends on the cloud state and regime, because responses to aerosol changes occur due to processes spanning a range from microphysics to the mesoscale (Christensen and Stephens, 2012;Kazil et al., 2011;110 Wang et al., 2011). These processes include precipitation suppression (Albrecht, 1989), rapid feedbacks involving cloud-top entrainment (Ackerman et al., 2004;Bretherton et al., 2007;Hill et al., 2009;Bulatovic et al., 2019), and rapid feedbacks involving cloud lateral entrainment (Xue and Feingold, 2006;Small et al., 2009) as well as responses in dynamics (Xue et al., 2008;Stevens and Feingold, 2009;Wang and Feingold, 2009). If one considers also deep clouds, further intricate cloud adjustments may occur that are not considered here (e.g., Ekman et al., 2011;Fan et al., 2013;Yan et al., 2014). As a result 115 of these adjustment processes, cloud horizontal extent (Gryspeerdt et al., 2016) and liquid water path  respond to perturbations in N d . The sum of RF aci and the radiative effects of these adjustments is the effective radiative forcing due to aerosol-cloud interactions, ERF aci . Based on modelling and data analysis, it is evident that the adjustments and, thus, also ERF aci , scale with ∆N d,ant (Bellouin et al., 2020b;Gryspeerdt et al., 2020;Mülmenstädt et al., 2019). Analysis of model data shows that the rapid adjustments due to other contributions (small-to mesoscale circulation 120 changes, thermodynamic changes) are small (Heyn et al., 2017;Mülmenstädt et al., 2019). Even so, thermodynamic and dynamic adjustments to aerosol changes can still have an important impact on droplet formation -especially under conditions where droplet formation is largely velocity-limited (Kacarab et al., 2020;Bougiatioti et al., 2020).
Despite the fact that the activation of an individual CCN to form a droplet is well understood in thermodynamic equilibrium (Köhler, 1936), it is not clear how N d responds to perturbations of CCN at the scale of a cloudy air parcel, an entire cloud, 125 or at the scale of a cloud field up to the large scale of order of 1 • × 1 • as used in Eq. 4. A one-to-one relationship between CCN in the updraught below cumulus and N d above cloud base within the cumulus has been observed (Werner et al., 2014); although even at the cloud updraft scale this relationship could be a convolution of the effect of CCN on droplet number, vertical velocity variability and lateral entrainment (Morales et al., 2011). At a larger scale, this relation is less pronounced (Boucher and Lohmann, 1995), consistent with the expectation from Eq. 4. In turn, there may be co-variability of updraughts 130 and aerosol concentrations that lead to largerβ compared to situations with constant w (Kacarab et al., 2020;Bougiatioti et al., 2017Bougiatioti et al., , 2020. Ground-based remote sensing methods provide data to infer the sensitivity term β from long-term observations McComiskey et al., 2009;Schmidt et al., 2015;Liu and Li, 2018). However, this approach is limited to individual sites and cloud regimes. In consequence, when investigating the global radiative forcing relevant for climate studies, the 135 sensitivity term necessarily is derived from satellite remote sensing (Nakajima and Schulz, 2009).
This leads to a number of problems and challenges discussed in more detail in the following sections: -Retrieval of CCN. The first issue is the missing coincidence of cloud and aerosol retrievals. Usually, no aerosol is retrieved below or within clouds. It is thus questionable how representative aerosol in cloudless scenes is for (neighboring) cloud-base CCN. The second issue is the imperfect nature of proxies for CCN. Often the aerosol optical depth (AOD, 140 see below) or a variant thereof is used, which can only imperfectly be related to CCN due to differences in sensitivity and the lack of vertical resolution.
-Retrieval of N d . There are (i) retrieval errors and biases in N d , which depend on cloud regimes, and (ii) one needs to consider the link between N d as formed by CCN activation at cloud base, and the retrieved cloud-top N d . Cloud-top N d (N d,top ) is the one that determines the scattering of sunlight and, thus, is relevant for the top-of-atmosphere cloud radiative effect. It differs from cloud-base N d (N d,base ) in conditions where N d sinks such as precipitation or mixing play a role. When using r e rather than N d the additional problem of stratification by retrieved L arises.
-Cloud-regime dependence. Cloud base droplet concentration, N d,base , is a function of both CCN and updraught, and N d,top further a function of N d sinks such as precipitation formation and entrainment-mixing. Thus, one needs to understand how the characteristics of w and its PDF, as well as precipitation and mixing processes depend on cloud regime 150 and how this may be used for an empirical estimation ofβ.
-Aggregation scale. The relation of aggregate quantities is not the same as the aggregate relation, and, thus, one needs to determine how to deriveβ optimally from remote sensing data (Grandey and Stier, 2010;McComiskey and Feingold, 2012).
In practical terms, one further needs to assess to which extent a simple scalar sensitivity metric is sufficient, or whether a 155 joint-PDF approach is preferable (McComiskey and Feingold, 2012;Gryspeerdt et al., 2017).
Beyond these questions which are discussed in the following sections, it is necessary to quantify the anthropogenic perturbation to CCN, ∆ ln a ant , which is not easily quantified from observations. The key problem is that there is little potential to observe an atmosphere unperturbed by anthropogenic emissions (Carslaw et al., 2013(Carslaw et al., , 2017. Some studies attempt to quantify the anthropogenic perturbation to the column aerosol light extinction, or aerosol optical depth (AOD; τ a ), in a data-tied 160 approach (Kaufman et al., 2005;Bellouin et al., 2005Bellouin et al., , 2013Kinne, 2019). Such approaches rely on simplifying parameterisations, such as the assumption that small-mode aerosol particles are predominantly anthropogenic. The other option is to estimate it from simulations (Quaas et al., 2009b;Gryspeerdt et al., 2017). There are some indirect ways to infer the anthropogenic impacts on N d (Quaas, 2015), such as from trends (Krüger and Graßl, 2002;Bennartz et al., 2011) or periodicity in anthropogenic emissions such as the weekly cycle (Quaas et al., 2009a). Hence, models are involved in determining an an-165 thropogenic perturbation of CCN concentrations, which can even be attempted for individual weather events (Schwartz et al., 2002). In any case, it seems impossible to know the anthropogenic perturbation to the aerosol at the scale of an air parcel; it rather is possible only at larger, aggregate scales. The remainder of this review will focus on the sensitivity termβ.

Remote sensing of CCN concentrations
The aerosol quantity most accessible to passive satellite remote sensing is AOD (Kaufman et al., 2002). It is derived from 170 the multi-spectral reflectance of the Earth-atmosphere system using the incident solar radiation and retrieving or assuming surface albedo characteristics as well as aerosol absorption coefficient and scattering phase functions. There are four key issues with using the retrieved AOD for estimating the N d to CCN sensitivity, which will be discussed in the following subsections, namely: -AOD is the vertical integral of the extinction coefficient. For the sensitivity of N d to the aerosol, one needs to know -AOD is an optical integral and does not provide information on the aerosol size distribution and its hygroscopicity.
The use of AOD does not isolate aerosol particles that have the size and chemical composition to serve as CCN. It is also affected by aerosol swelling due to hygroscopic growth.
-AOD can be derived only for pixels determined as cloud-free. The degree to which this correlates with the CCN at 180 the base of (neighbouring) clouds is questionable. In addition, retrieved AOD can show a positive bias due to enhanced reflectance from neighbouring cloudy pixels or due to the lack of detecting spurious clouds in a retrieval scene.
-The optical signal is very weak at low concentrations. Therefore, retrievals become more and more uncertain below a certain aerosol load, especially over land and in situations with variable or uncertain surface albedo.
At aggregate scales, i.e. for monthly averages over regions, AOD from ground-based remote sensing retrievals (AERONET; 185 Holben et al., 2001) correlates well with CCN surface measurements (Andreae, 2009;Shen et al., 2019). Similar results were also reported for aircraft measurements (Clarke and Kapustin, 2010;Shinozuka et al., 2015). However, at shorter timescales or less spatial aggregation, there are significant deviations from a perfect correlation (Liu and Li, 2014). AOD due to aerosol light extinction is determined by the vertical integral of the extinction cross section, proportional to the vertical integral of the second moment of the aerosol size distribution. In turn, for a given chemical composition of aerosol particles, the CCN concentration 190 is the zeroth moment of the size distribution for particles exceeding a size threshold that depends on supersaturation. In the following, the different problems are discussed in more detail, together with options for a better proxy for CCN from satellite remote sensing.

Vertical co-location
Stier (2016) investigated the correlation between AOD and CCN as represented in a climate model. He confirmed a mostly 195 positive correlation of the temporal variability of the two quantities, although in some regions the correlation is low or even negative. A key reason for the partly low correlation is the fact that AOD is a vertically integrated quantity and may include aerosol layers that are not interacting with clouds. A similar result was reported from a statistical analysis of satellite data: cloud microphysical parameters correlate well with aerosol properties only if the vertical alignment of the aerosol and cloud layers is accounted for Bréon, 2010, 2013 AOD. Ship measurements of CCN and microwave-retrieved N d at cloud base between Los Angeles and Hawaii show weaker β metric as the boundary layer deepens thus indicating that surface aerosol measurements become less representative for aerosol variability at cloud base as the boundary layer deepens (Painemal et al., 2017), or that the updraughts become high enough to activate smaller aerosols than the accumulation mode. In-situ observations suggests that AOD may even be anticorrelated with 205 CCN at cloud base (Kacarab et al., 2020).
A way forward is the use of spaceborne vertically resolved observations such as lidar measurements (Shinozuka et al., 2015;Stier, 2016). The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO; Winker et al., 2009) lidar retrieves aerosol backscatter profiles, and thus is capable of identifying aerosol layers (Costantino and Bréon, 2010). Profiles of aerosol particle extinction are inferred from these backscatter profiles by using typical extinction-to-backscatter ratios 210 based on aerosol type. However, the signal is not sensitive to smaller aerosol concentrations which hampers a quantitative analysis at large scale (Watson-Parris et al., 2018;Ma et al., 2018). For situations with sufficient aerosol loading for reliable CALIPSO aerosol profile observations, methods for retrieving CCN concentrations from ground-based lidar measurements can be adapted (Feingold and Grund, 1994;Lv et al., 2018;Haarig et al., 2019). These methods apply empirical extinction-toparticle-concentration relationships to obtain input for CCN concentrations for different aerosol types (Mamouri and Ansmann,215 2016). In the future, the EarthCARE satellite mission currently scheduled for launch in 2022 (Illingworth et al., 2015;Hélière et al., 2017) shows promise to extend and improve upon the success of the CALIPSO mission. Its Atmospheric Lidar (ATLID) is a linearly polarized high-spectral resolution lidar (HSRL) operating at a wavelength of 355 nm. The instrument allows to directly infer profiles of aerosol backscatter and extinction coefficients, thereby substantially increasing the retrieval accuracy.
The direct retrieval of the extinction-to-backscatter (lidar) ratio (Müller et al., 2007) with ATLID (compared to the use of 220 pre-set values in the CALIPSO retrieval Kim et al., 2018) and the large difference between lidar ratios of aerosols (20 sr -80 sr) and clouds (20 sr -30 sr) is also expected to provide better distinction between optically thin cirrus clouds and aerosols than CALIPSO (Reverdy et al., 2015). While a similar sensitivity to aerosol load is expected for ATLID and CALIOP observations during nighttime, ATLID promises a better daytime sensitivity. EarthCARE is also expected to provide better distinction between optically thin clouds and aerosols than CALIPSO (Reverdy et al., 2015). Airborne measurements have shown that fur-225 ther utilizing HSRL at more than one wavelength (extending beyond ATLID) would provide substantial additional information content for retrieving vertically resolved aerosol parameters, especially when combined with polarimeter measurements (Burton et al., 2016). From the passive-remote sensing perspective, promising results have been obtained for retrievals of aerosol vertical information from near-ultra-violet polarimetry (Wu et al., 2016), although the quality degrades for small aerosol concentrations. Passive observations with high spectral resolution within the oxygen A absorption band around 760 nm can also be 230 used to infer aerosol layer height (Hollstein and Fischer, 2014;Geddes and Bösch, 2015). In particular, an operational aerosol layer height product is now available from the Tropospheric Monitoring Instrument (TROPOMI) flown on the Sentinel-5p mission (Sanders et al., 2015). Also, a recent study presents promising results based on Orbiting Carbon Observatory 2 (OCO-2) observations (Zeng et al., 2020). In particular, a combination of such approaches, e.g. passive polarimetry and active lidar observations (Stamnes et al., 2018) or multi-angle polarimetry and oxygen A band observations as planned for NASA's Plankton, 235 Aerosol, Cloud, ocean Ecosystem (PACE) mission  shows potential. Retrievals could also combine observations and model adjoints to constrain below-cloud aerosol number, which is directly relevant for aerosol-cloud interactions (Saide et al., 2012).
In summary, the lack of vertical co-location between retrieved CCN proxy and clouds leads to an underestimate in N d -CCN sensitivity (Costantino and Bréon, 2010). Model studies suggest that this bias may be approximately cancelled by a 240 corresponding bias in the anthropogenic component of the cloud base CCN (Gryspeerdt et al., 2017). However, the extent of this cancellation in current observational studies is unknown and requires further investigation. For an accurate estimation ofβ the use of lidar retrievals seems to be the best way forward, while additional information on the vertical distribution of aerosol can also be gained from present and upcoming passive satellite instruments.

Horizontal co-location 245
In studies examining β from satellite data, spatial aggregates are considered (i.e.,β as in Eq. 4), in which the aerosol retrievals in the cloud-free pixels are averaged at a coarse resolution (such as 1 • ) and taken to define the relation with N d retrievals in the same grid-box (Quaas et al., 2008). This assumes that the aerosol population is horizontally homogeneous at such large scales. According to Anderson et al. (2003), this is often the case. It has been confirmed from aircraft data for the stratocumulus cases investigated by Shinozuka et al. (2020). However, CCN is consumed when droplets activate and aerosol is scavenged 250 when clouds precipitate. Hence, the assumption of aerosol concentration horizontal homogeneity is questionable at least in precipitating clouds.
It is the aerosol in air masses before cloud particles form that is relevant to compute the aerosol impact on N d (Gryspeerdt et al., 2015). In one of the early aerosol-cloud interaction studies from satellite data (Bréon et al., 2002) used trajectories to identify cloudless situations in which aerosol retrievals were possible for air masses that later formed clouds. This is a 255 promising solution but it requires much more effort than the simpler co-location assumptions. It also requires reliable, highresolution information about atmospheric trajectories. Another complication is that the formation rate of secondary aerosol is enhanced by aqueous phase reactions, potentially enhancing aerosol concentrations in the vicinity of clouds (Jeong and Li, 2010). Such trajectory approaches are particularly useful when they exploit the high temporal resolution that are available from geostationary satellites. Aerosol retrievals from geostationary satellites may be combined using trajectory modelling to link 260 these to clouds that form in these airmasses (Kikuchi et al., 2018), or also the aerosol retrieval from a polar orbiter could be related to clouds retrieved from geostationary satellites that form in the same air masses (Christensen et al., 2020).
Altogether, the lack of horizontal co-location may imply somewhat too lowβ due to the potential de-correlation of CCN concentrations and N d in situations with spatially heterogeneous aerosol. The consideration of backward trajectory analysis seems the best option to address the issue since there is no solution yet to retrieve aerosols below or within clouds from satellite.

Hygroscopic growth of aerosol particles
The extinction of solar radiation by aerosol particles is a strong function of the hygroscopic growth of the particles. Haze particles attenuate much more sunlight compared to the same aerosol particle ensemble in dry conditions. AOD is thus heavily influenced by the variability of relative humidity. The light extinction caused by dry particles (at relative humidities below 30%) is much better correlated to CCN concentrations than the extinction of particles at ambient relative humidity (Shinozuka 270 et al., 2015). Liu and Li (2018) showed that using total AOD compared to dry AOD as a CCN proxy when estimatingβ from measurements at different Atmospheric Radiation Measurements (ARM) sites resulted in a 23% underestimate. A way forward is to apply parameterisations in terms of retrievals of relative humidity to account for the aerosol swelling. These need information about aerosol hygroscopicity and relative humidity at the appropriate scale. Hygroscopicity information could rely on the kappa-Köhler parameterisation approach (Petters and Kreidenweis, 2007;Pringle et al., 2010), and a parameterisation 275 of small-to mesoscale humidity variability could make use of approaches exploited in GCMs (Quaas, 2012;Petersik et al., 2018). Another alternative would be to retrieve the amount of aerosol water, making use of the real part of the refractive index (Schuster et al., 2009). This would allow to translate the size distribution of humidified aerosol particles to the corresponding dry size distribution. In the near future, accurate refractive index retrievals are expected from polarimeters such as the SPEXone instrument on the NASA PACE mission (Hasekamp et al., 2019b;Werdell et al., 2019), to be launched in 2022.

280
Summarizing, using AOD as a proxy for CCN results in low-biased estimates ofβ due to aerosol swelling. Approaches to parameterise the dry aerosol properties on the basis of the humidified one can help alleviate the problem.
2.4 Approaches using aerosol index, column-CCN, reanalysis or cloud-base updraught The aerosol index (AI 1 ) is defined as the product of AOD and the Ångström exponent (Deuzé et al., 2001). This latter quantity is the slope of the spectral variation in AOD and is typically larger for smaller particles (Ångström, 1929). AI is more weighted 285 towards smaller particles, which makes it better suited as a proxy for CCN concentration at typical supersaturations than AOD.
For log-normal size distributions, AI is approximately proportional to the column aerosol number concentration (Nakajima et al., 2001). Studies using models concluded that AI is a better predictor for CCN (Stier, 2016) and that AI -N d relationships are better suited to predict ∆N d,ant than AOD -N d relationships (Penner et al., 2011;Gryspeerdt et al., 2017). However, retrievals of the Ångström exponent, and thus of AI, over land are not reported in operational products such as the MODIS 290 dark target algorithm, and are in general not as reliable as they are over ocean (Lee and Chung, 2013;Sayer et al., 2013).
Further refining this idea, Hasekamp et al. (2019a) aimed to retrieve the column CCN concentrations over oceans. The analysis of polarimetric observations allowed to account for some aspects of the aerosol particle size distribution, and for particle sphericity, which is related to particle hygroscopicity. This column-CCN retrieval implied largerβ, increasing the resulting RF aci by almost 50%. It is an example of how additional information from polarimetry is useful for studying the CCN Such aerosol re-analysis information has been used for assessing RF aci in several studies (Bellouin et al., 2013;McCoy et al., 2017;Bellouin et al., 2020a). However, assessing the validity of model results requires extensive and rigorous evaluation, especially for coarsely-resolved models with regard to aerosol scavenging below clouds. For this, independent data is required such as from ground-based observations or satellite observations from sensors other than those that are assimilated.

305
Yet another solution initially proposed by Feingold et al. (1998) and applied to satellite retrievals by Rosenfeld et al. (2016) is to parameterize the cloud-base updraught, w, on the basis of cloud retrievals, rather than to retrieve the aerosol. For convec-tive clouds, Zheng et al. (2015) suggested that w scales with cloud-base altitude, which can be retrieved from satellites. For stratocumulus clouds, Zheng et al. (2016) proposed that updraught is a function of cloud-top radiative cooling, and that this can be computed by radiative transfer modelling on the basis of cloud quantities retrieved from passive sensors and thermodynamic 310 profiles from meteorological re-analyses. The retrieved profiles of r e together with derivations of supersaturation as a function of w and N d  then allows to parameterize the CCN concentration at any given supersaturation. This approach does not suffer from the problem of lower detection limit. However, it has not yet been used to quantify the Twomey effect.
Concluding, all four approaches alleviate many problems encountered when using AOD. An ideal solution may be the 315 combination of several of these by assimilating, in addition to AOD, also polarimetric satellite observations, as well as lidar measurements, into the analysis of the atmospheric state in high-resolution models.

Remote sensing of cloud droplet concentrations
The problem of the remotely sensed N d as used to estimateβ has three different facets to it, which will be discussed in this section, namely:

320
-Consideration of r e rather than N d in aerosol-cloud interaction studies: In many studies, the droplet effective radius, r e , is used, and the datasets are stratified with respect to L in order to estimateβ. This is very difficult to perform adequately and leads to biases.
-Biases in the retrieved N d : For the assessment of sensitivity, systematic (rather than random) errors in retrieved N d are relevant. Also, N d is not retrieved in standard operational procedures, so that inconsistencies between the retrieval of 325 standard components and in the computation of N d on the basis of retrievals can lead to additional errors.
-Relationship of N d formed at activation with retrieved and radiation-relevant N d,top : Retrieved N d,top refers to the drop concentration within the top 1 to 2 optical depths of the clouds, and it is N d,top that is relevant for determining the cloud radiative effect. N d sink processes such as coagulation imply that N d,top is smaller than the one resulting from activation at of above cloud base, N d,base .

330
N d is vertically constant for single-layer, purely-liquid-water clouds with (i) a vertically homogeneous droplet size spectrum, (ii) for adiabatically stratified clouds, or (iii) for sub-adiabatic clouds in which mixing is homogeneous. However, in many situations, precipitation formation or entrainment can lead to reduction of N d above cloud base. In such situations, it is N d,top that is relevant to determine the cloud radiative effect (cloud albedo in Eq. 2). Building on Eq. 4 thus gives

335
When estimatingβ as regression coefficient from, e.g. satellite-retrieved N d and a proxy for CCN such as AOD, it is thus thiŝ β that is inferred.
3.1 Considering r e rather than N d Many past studies have used operationally-retrieved r e rather than N d in aerosol-cloud interaction studies. However, r e is a function of both N d and L. This introduces the requirement for stratifying the data with respect to L in order to estimateβ. To 340 further complicate matters, N d and L have been found to be correlated (e.g. Michibata et al., 2016;Gryspeerdt et al., 2019).
A precise estimation ofβ is thus only possible for a large amount of data combined with suitable binning by L. Errors in this approach that are related to a lack of data increase at aggregated scales (McComiskey and Feingold, 2012). Using derived N d is therefore preferable to avoid unnecessary complications.

Biases in the N d retrieval 345
Satellite retrievals of N d were extensively reviewed by Grosvenor et al. (2018). Since N d currently is not retrieved by operational algorithms and new developments to retrieve N d (e.g. from polarimetry) are still in their infancy, the most frequently used method is to infer N d from retrieved r e and cloud optical depth, τ c , using the relationship where γ ≈ 1.37 · 10 −5 m −0.5 is a parameter provided as a constant here but more realistically depending on cloud base tem-350 perature and pressure, the adiabatic fraction, and the drop size distribution breadth (Boers et al., 2006;Quaas et al., 2006;Grosvenor et al., 2018). The relationship in Eq. 6 assumes that clouds are adiabatic or nearly adiabatic (i.e. adiabatic clouds or sub-adiabatic clouds with homogeneous mixing only; Brenguier et al., 2000). The most common method uses a bispectral approach to retrieve r e and τ c (Nakajima and King, 1990). Various error sources lead to an overall retrieval error for N d (Grosvenor et al., 2018;Wolf et al., 2019). As can be deduced form Eq. 6, the most important contributions are from retrieval 355 errors in r e . Other error sources are the uncertainty in sub-adiabatic factor, the cloud model used in the retrieval, and the droplet size distribution width. Satellite retrievals of the vertical profile of cloud droplet size may help to improve the retrieval (Chang and Li, 2002;Chen et al., 2008). Grosvenor et al. (2018) identified biases of retrieved N d especially for broken cloud regimes and at large solar zenith angles. In stratocumulus, it was suggested that the retrieval yields the most trustworthy results when considering only the brightest pixels (Zhu et al., 2018). For the ideal case of homogeneous, low-latitude stratiform clouds, 360 relative errors in the N d retrieval at pixel scale are quantified as 78% (Grosvenor et al., 2018). In such cases, the error was assumed as random. However, systematic errors occur especially in broken cloud regimes and for large solar zenith angles, leading to an underestimation (broken cloudiness) and overestimation (large solar zenith angles), respectively, of N d . Painemal et al. (2020) addressed the N d bias for broken clouds by only sampling N d retrieved for large clouds (larger than 5 × 5 km 2 ) to find that the relation between N d and aerosols is substantially enhanced.

365
For improvements in estimates of N d , it would be beneficial to formulate a retrieval in terms of N d directly rather than in terms of r e and τ c . It is also possible to reduce uncertainties in retrievals of r e and τ c , or to reduce uncertainties related to assumptions of the vertical structure of the cloud and particle size distribution shape. Approaches to quantify and partly correct for retrieval biases as discussed in Grosvenor et al. (2018) include accounting for cloud heterogeneity by using those channels in passive imagers that provide spatial resolution that exceeds the one at which the standard retrieval products are provided (Zhang et al., 2016). The combination of passive observations with radar may further improve the retrieval (Posselt et al., 2017). Substantially more accurate retrievals of r e and additional relevant information about droplet size distributions may also come from multi-angular polarimetric measurements (Alexandrov et al., 2012a, b;Shang et al., 2019), which will be possible from orbit at pixel level from the Hyper-Angular Rainbow Polarimeter-2 (HARP-2) on the NASA PACE mission (Martins et al., 2018;McBride et al., 2019). Polarimetric retrievals allow to infer the spectral width or general shape of the 375 droplet size distribution at cloud top (Hu et al., 2007). This approach is not substantially sensitive to sub-pixel cloudiness, mixed-phase conditions and 3D radiative effects (Alexandrov et al., 2012b). The sensitivity of derived N d to uncertainties in r e from polarimetric retrievals may further be reduced by additionally inferring cloud physical thickness. In this case, N d can be inferred as linear in τ c and inversely linear in geometrical thickness and mean droplet extinction cross-section at cloud top (Sinclair et al., 2019). The geometrical thickness may also be inferred from total and/or polarized reflectances measured 380 in oxygen or water vapour absorption bands (Desmons et al., 2013;Sanghavi et al., 2015;Richardson et al., 2019;Sinclair et al., 2019) or by retrieving cloud base using lidar  or using multi-angle observations (Böhm et al., 2019). When exploiting passive observations together with lidar, N d at cloud top can be robustly inferred as the ratio of incloud extinction (lidar) and extinction cross section (passive). A slightly less direct approach using depolarization to estimate extinction and effective radius to estimate extinction cross section has been presented by Hu et al. (2007).

Relationship between N d formed at CCN activation and retrieved radiation-relevant N d
In stratiform clouds, droplets form in updraughts near cloud base which is where N d most closely relates to CCN. In convective clouds, updraught in some cases increases with height above cloud base. Hence, additional CCN may activate above cloud base and lead to vertically increasing N d in the lower third of the cloud with a decrease further up (Endo et al., 2015). However, in most cumulus, and in stratiform clouds, N d is found to be largest at cloud base and to slightly decrease above it (Jiang et al., 390 2008;Small et al., 2009;vanZanten et al., 2011). In the approach discussed by Grosvenor et al. (2018), the retrieved N d is representative of the cloud-top reflectance, and thus, the relevant proxy for the N d that matters for cloud albedo and RF aci (Platnick, 2000). To which extent the microphysical structure of lower parts of a cloud exactly impacts radiation (weighting function) depends on the multiple scattering and thus on the vertical structure of N d itself (Platnick, 2000;Krisna et al., 2018).
For vertically constant N d , the retrieved N d represents the droplet concentration formed by CCN activation. However, there 395 are N d sinks, in particular due to collision-coalescence (in liquid clouds, the autoconversion and accretion, or "warm rain" processes) that lead to droplet depletion. Wood (2006) demonstrated that the depletion is exponential in precipitation rate and estimated a loss in N d of 100 cm −3 day −1 for precipitation rates of 1 mm day −1 . There may also be lateral and vertical mixing (of heterogeneous type, Lehmann et al., 2009) of cloud air with environmental cloud-free air that can lead to the full evaporation of droplets. In both sinks for N d , the one due to precipitation formation and the one due to mixing, the retrieved N d is expected 400 to be smaller than the N d formed at activation of CCN. In an aged cloud, however, updraughts may have decayed such that no additional droplets are formed, while existing droplets persist, or may be advected from elsewhere. Also, in case they are very large, raindrops may break up into droplets, in which case N d is increased. Arguably, it is the right choice to relate the retrieved N d , as the radiation-relevant one, to CCN, i.e. to useβ, when computing the N d to CCN sensitivity with the aim to constrain RF aci .

405
Cloud-resolving models are a good tool to investigate these interpretations (McComiskey and Feingold, 2012). Fig. 1 shows an analysis of a large-domain large-eddy simulation with the ICON-LEM model (Heinze et al., 2017;Costa-Surós et al., 2019).
CCN concentrations in these simulations are relaxed towards pre-computed spatially and temporally varying fields and are consumed at activation. In the 22 million grid columns, the droplet concentration at cloud top (what is retrieved from satellites) is compared to the maximum droplet concentration (approximately the concentration of activated CCN / formed droplets). This 410 demonstrates that there is a link between the droplet concentration formed at activation and N d determining the cloud radiative effect at its top. These two quantities correlate rather well in the joint histogram, though that link is far from one-to-one. The second plot (Fig. 1b) assesses the possibility to infer cloud-top N d from cloud-top r e and τ c (Grosvenor et al., 2018). For this, the MODIS simulator (Pincus et al., 2012)  and τ c . From these, N d is computed as in Eq. 6. This approach mimics the satellite retrieval but assumes no retrieval errors, i.e. the comparison is a lower bound on the accuracy of the retrieved N d in representing the actual N d at cloud top. There is a meaningful co-variation of the two quantities, but it is far from perfect. In particular, there is a systematic overestimation of N d in the retrieval approach, especially at low N d . The relative error even is a function of N d , with larger relative errors at low N d .

420
In conclusion, the fact that cloud-top N d is in general lower than N d at activation height implies thatβ is indeed somewhat smaller than unity. This is not a problem, but a desired analysis result when studying the Twomey effect. However, N d obtained from retrieval products is biased high for low values of N d,top . This relative error, which is a function of N d , implies that the regression between satellite-derived N d and CCN yields a sensitivity that is too weak.

Cloud regime dependence 425
Aerosol-cloud interactions depend on cloud regime (Stevens and Feingold, 2009;Mülmenstädt and Feingold, 2018). When it comes to RF aci , there are three reasons for this: (i) the radiative sensitivity (Oreopoulos and Platnick, 2008;Alterskjaer et al., 2012), i.e. the first two terms on the right-hand-side of Eq. 2 (in particular the sensitivity expressed in Eq. 1), (ii) the updraughtdependence ofβ, and (iii) the dependence of the relation of cloud-top to cloud-base N d on characteristics of turbulence and rain. The latter two are of interest here. "Cloud regime" thus means here, a cluster of clouds with similar P(w) and similar in Eq. 5. When considering CCN at a certain supersaturation level,β is larger at larger updraught, w (MacDonald et al., 2020). Broadly, cumulus clouds have larger w than stratiform clouds. In addition, clouds over land usually have larger w than clouds over ocean. Building on Eq. 5, this suggests a regime-based analysis expressed as Fig. 2 shows the spatial distribution of the N d -AI regression coefficient from its temporal variability within 1 • × 1 • grid 435 boxes. The large spatial heterogeneity is not straightforward to interpret. Some problems may be due to the lack of aerosol retrieval sensitivity (e.g. in regions with low CCN concentrations such as the southern oceans) or lack of vertical or horizontal co-incidence (e.g. in regions with heterogeneous aerosol and large cloud coverage such as mid-latitude storm tracks). However, aspects of the geographical heterogeneity may indeed be attributable to physical and relevant reasons. However, it is difficult to determine any attributable factors in the spatial and cloud regime variations inβ (Gryspeerdt and Stier, 2012) before retrieval 440 errors are remedied.
In precipitating situations, the two-way interactions can lead to large challenges in determining theβ term (Ekman et al., 2011). Precipitation scavenges aerosol and, in certain situations, the interplay between aerosol, droplet concentrations and precipitation determines both aerosol and droplet concentrations. This may yield bifurcations between situations with large N d in which no drizzle forms, and very low N d and cloud dissolution when precipitation forms (e.g. Yamaguchi et al., 2017). In 445 such situations, it is particularly challenging to identify the N d -CCN concentration sensitivity.

Aggregation scale
The impact of aggregation scale on estimates of β has been discussed in detail by McComiskey and Feingold (2012). Their key conclusion is that at scales larger than the cloud variability scale of about 1 to 10 km, aerosol and cloud data become de-correlated so that the diagnosed β becomes less and less representative for individual cloud parcels. In turn, Sekiguchi et al. 6 Quantification for the regression coefficient When sensitivities are approximated by linear regression coefficients from an ordinary least squares (OLS) line fitting method, rather than derived e.g., in form of joint histograms, the problem of regression dilution arises to the extent that the aerosol 470 quantity shows errors: the regression coefficient becomes gradually smaller as the stochastic error increases (Cantrell, 2008;Pitkänen et al., 2016;Wu and Yu, 2018). Regression dilution, also known as regression attenuation is a problem if the independent variable (x-axis) in the regression is subject to a statistical error.If the regression method does not take the statistical error into account, that is often the case (for example in OLS), the regression coefficient is always systematically biased low. In turn, statistical error on the dependent variable (y-axis) only causes uncertainty in the regression coefficient but no systematic bias.

475
This is quantified for the column-CCN vs. N d sensitivity evaluated as regression coefficient in Fig. 3. Due to the regression dilution, the sensitivity decreases by factors of 2 to 3 as the error in column CCN increases when considering relative errors of 50%. This can to a large extent be remedied by ignoring data points at low CCN concentrations from the regression (Fig. 3, right panel). However, this solution is limited to regions not dominated by low aerosol concentrations. Fig. 3 also illustrates that an absolute bias in the data translates to relative bias in logarithmic scale. Therefore, if no bias correction is applied, an absolute 480 bias in the data will cause a bias in the sensitivity estimates. As shown by Pitkänen et al. (2016), the regression dilution in turn becomes weaker at coarser aggregation scales in cases of auto-correlated data, which is the case for aerosol concentrations.
This is of relevance in case of both temporal and spatial aggregation. In other words, the systematic low bias in the sensitivity is reduced if data are aggregated. This could partly explain some previous findings of increasing sensitivity with decreasing resolution (see discussion in the previous section), in addition to the actual bias due to the aggregation over smaller scale of 485 cloud processes. These considerations imply that it is necessary to either analyse the full variability of aerosol-cloud interactions, e.g. in the form of joint histograms, or to account for the regression dilution using established mathematical approaches that properly consider measurement uncertainties, as discussed in Mikkonen et al. (2019), for instance.

Conclusions
The radiative forcing due to aerosol-cloud interactions, or Twomey effect, requires quantification based on observational data, 490 since models are associated with large uncertainties. At a large scale, this calls for satellite retrievals. There are, however, large challenges when using satellite data and this review summarizes these challenges and suggests some potential ways forward.
The key data-related question is the sensitivity of droplet concentration, N d , to perturbations in the cloud-active aerosol, i.e. the cloud condensation nuclei (CCN) concentration at or above cloud base. The most widely-used proxy of the cloud-base CCN concentration is the aerosol optical depth (AOD), or alternatively the aerosol index (AI), taken from cloud-free pixels in the 495 vicinity of the locations the cloud retrievals. The four main caveats with AOD are the lack of vertical resolution, the additional influence of hygroscopic swelling, the fact that the detected aerosol might be not active as CCN, as well as the impossibility to retrieve it below clouds. In terms of the vertical resolution, satellite-based lidar offers help. However, current lidar retrievals are even more constrained to large aerosol concentrations than passive AOD retrievals. EarthCARE's ATLID lidar will allow direct inference of the ratio of backscatter to extinction, enabling greatly improved retrievals of aerosol extinction profile. Adding 500 a second wavelength with ATLID capabilities and combining it with polarimetric measurements would substantially extend vertically resolved aerosol information content. In terms of horizontal co-location, trajectory computations may help to identify the aerosol representative of that affecting specific clouds. However, this requires extra effort and reliable information about trajectories. The hygroscopic swelling can be addressed by parameterisations that use retrievals and ancillary data to compute the swelling. Further relevant information is possible from polarimetric measurements.

505
Cloud droplet number concentration, N d , is only indirectly available from current operational satellite retrievals. It is generally computed from retrieved cloud-top droplet effective radius, r e and cloud optical thickness, τ c , leading to substantial biases in comparison to the cloud-top droplet number concentration, especially in inhomogeneous, broken and/or precipitating cloud regimes. Sink processes for N d and variability due to atmospheric dynamics, including turbulent mixing, imply that the radiatively-relevant cloud-top N d relates imperfectly to the N d formed by CCN activation. In addition, at a given CCN con-510 centration, the updraught variability also leads to sensitivies of N d to CCN that are much less than one. These latter two facts are not problematic when assessing the N d to aerosol sensitivity from data for the estimation of the Twomey effect. In fact, it is desirable to quantify at a large scale the net impact of aerosol perturbations of the (radiatively-relevant) cloud-top N d that accounts for updraught and N d sink variability. However, it is necessary to operationally retrieve N d , rather than to indirectly compute it from r e and τ c retrievals. It is also necessary to improve these retrievals in particular for low droplet concentrations 515 and broken cloud conditions. In addition, these retrievals should take into account additional information e.g. about the onset of drizzle.
Regression dilution influences the statistically inferred sensitivity as a result of stochastic retrieval errors in CCN concentration. On the one hand, at aggregate scales, this problem becomes less relevant due to the autocorrelation of the aerosol concentrations. The relationship between N d , that varies at cloud-dynamics scales, and CCN proxies becomes weaker at ag-520 gregate scales. Relative retrieval errors in N d that depend on actual N d (with larger high-biases at low true N d ) lead to a further reduction in the estimated sensitivity. It is thus necessary to account for the impact of CCN errors in the statistics and to optimize the resolution of N d and CCN retrievals towards cloud-scale resolutions.