Quantifying Albedo Susceptibility Biases in Shallow Clouds

The evaluation of radiative forcing associated with aerosol-cloud interactions remains a significant source of uncertainty in future climate projections. The problem is confounded by the fact that aerosol particles influence clouds locally, and that averaging to larger spatial and/or temporal scales carries biases that depend on the heterogeneity and spatial correlation of the interacting fields and the non-linearity of the responses. Mimicking commonly applied satellite data analyses for calculation of albedo susceptibility So, we quantify So aggregation biases using an ensemble of 127 large eddy simulations of marine 5 stratocumulus. We explore the cloud field properties that control this spatial aggregation bias, and quantify the bias for a large range of shallow stratocumulus cloud conditions manifesting a variety of morphologies and range of cloud fractions. We show that So spatial aggregation biases can be on the order of 100s of percent, depending on methodology. Key uncertainties emanate from the typically applied adiabatic drop concentration Nd retrieval, the correlation between aerosol and cloud fields, and the extent to which averaging reduces the variance in cloud albedo Ac and Nd. Biases are more often positive than negative. So 10 biases are highly correlated to biases in the L adjustment. Temporal aggregation biases are shown to offset spatial averaging biases. Both spatial and temporal biases have significant implications for observationally based assessments of aerosol indirect effects and our inferences of underlying aerosol-cloud-radiation effects.


15
Shallow liquid clouds are a poorly quantified component of the climate system and one of the greatest sources of uncertainty for climate projections (e.g. Bony and Dufresne, 2005;Bony et al., 2017). The problem is multifaceted and encompasses fundamental understanding of how these clouds are affected by the thermodynamic structure of the atmosphere, how they might change in a warmer world, how they are influenced by the atmospheric aerosol, and how all of these components are represented in climate models. The difficulty in quantifying the radiative effects of shallow clouds emanates, to a large extent, 20 from the large range of spatiotemporal scales involved: aerosol-cloud interaction processes need to be understood and resolved at the scale of centimeters (e.g. Hoffmann et al., 2019) while cloud fields and their organization are driven by larger scale circulations at scales of 100s to 1000s of kms (e.g. Norris and Klein, 2000). Importantly, aerosol-cloud interactions acting at scales on the order of 100 m need to be resolved; (a) because they can lead to fundamental changes in the radiative state of a cloud system by changing the cloud albedo, cloud fraction, and spatial distribution of condensate (e.g. Sharon et al., 2006;25 Stevens et al., 2005;Wang and Feingold, 2009); and (b) because non-linearities in aerosol-cloud-radiation interactions mean that the methodology of averaging small-scale properties to larger-scales might generate biases in the radiative response. This paper focuses on the ramifications of small-scale processes for cloud albedo susceptibility and cloud liquid water path adjustments (changes in liquid water path in response to changes in aerosol concentration) within the context of how they are treated in satellite-based analyses. Two key aspects are addressed: the first relates to spatial averaging from the level of the 30 satellite pixel (order 1 km) to commonly used aggregation scales on the order of 10s -100s of kms; the second relates to temporal averaging from the individual scene snapshot up to a timeframe on the order of months. Large scale analyses often aggregate data both spatially and temporally into data sets that might cover e.g. 4 ⇥ 4 and five years (Chen et al., 2014), or a range of scales from kms -10s kms for cloud microphysics, liquid water path, and radiation, 1 ⇥ 1 for aerosol, and three month seasonal responses (Lebsock et al., 2008). While it is impractical not to aggregate to some degree -e.g., to smooth noisy 35 retrievals or extract signals from a noisy background -the implications for quantifying aerosol-cloud interactions are still not well understood.
We quantify aerosol-cloud interactions using the albedo susceptibility metric (Platnick and Twomey, 1994) defined here as where A c is the cloud albedo, N d is the drop concentration, and L is the liquid water path. The expression comprises the cloud 40 brightening or 'Twomey' component (1 A c )/3 and the adjustment term L o = dlnL/dlnN d , and assumes no changes in drop distribution width (e.g. Feingold and Siebert, 2009). The factor of 5/2 suggests a potentially strong but uncertain contribution from L adjustments; even the sign of this term varies from positive for precipitating clouds to negative for non-precipitating clouds (Christensen and Stephens, 2011;Glassmeier et al., 2021).
Using satellite-based observation systems, e.g., the MODerate Imaging Spectroradiometer (MODIS; Salomonson et al.,45 1998), one can derive a drop concentration N d,a , based on adiabatic assumptions, from retrieved visible cloud optical depth ⌧ and cloud-top drop effective radius r e : where c w (T, P ) is related to the condensation rate and is a know function of cloud-base temperature T and pressure P , f ad is the adiabatic fraction (assumed in this paper to be 0.8), Q ext is the extinction coefficient (⇡ 2 in the visible part of the 50 spectrum), ⇢ w is the density of liquid water and k is a factor that is inversely proportional to the width of the drop size distribution (assumed to be 0.8). When f ad = 1, N d,a is the adiabatic drop concentration.
Liquid water path L is derived from MODIS data as The theoretical framework for addressing this question is well-known from similar examples in the atmospheric sciences, notably biases in rain formation processes that result from large-scale averaging of the cloud water and drop concentration terms in expressions for autoconversion and accretion (e.g. Lebsock et al., 2013;Zhang et al., 2019). In the interests of completeness, we repeat the key equations here. Assuming a lognormal PDF of quantity x: where the lognormal parameters are geometric mean (or median) x g and geometric standard deviation g,x . Quantity x represents A c or L, and N . Using well known integral properties of the lognormal function (e.g. Feingold and Levin, 1986) it is easy to show that the bias in a moment x x associated with averaging can be written as: with the relative dispersion D x , the ratio of standard deviation to the mean, defined as If the two interacting fields, e.g. L and N , are assumed to follow a bivariate lognormal distribution, the bias associated with the covariance between L and N is

100
where r(L, N) is the spatial correlation between L and N . (Note that we elect to present the theory in terms of L and N rather than A c and N because A c includes compounded dependence on N via ⌧ . Since L and A c are highly correlated, this choice does not affect the general framework for discussion.) The overall bias associated with averaging for two covarying fields is then given by

105
The equations allow theoretical calculation of the S o biases associated with two covarying fields L and N , each characterized by its own heterogeneity D L and D N , respectively. These are shown in Fig. 1 in D L ; D N space, for three values of r(L, N).
We note that large positive and negative biases can result from averaging; for negative r(L, N), biases are positive, and on the order of 20 -60 % whereas for positive values of r(L, N), biases are negative, and on the order of -20 --70%. When correlation between the fields is zero, biases are 10 -20 %. The central role of r(L, N) is further illuminated by plotting the 110 bias in S o as a function of r(L, N) for specified D L and D N combinations (Fig. 2).
Note that the assumption of a bivariate lognormal distribution is common when dealing with geophysical fields. Regardless of how well our fields actually adhere to this assumption, the theoretical analysis does provide a framework within which to identify possible key controls over biases.
2.2 Effects of spatial and temporal aggregation on variance, and correlation between fields

115
The second framework for assessment of biases derives from the basic definition of the linear regression fit: whereb is the regression slope and x is the standard deviation of field x. In our case, x= aerosol or drop concentration (we will use N d ) and y = cloud variable (A c or L). As shown by McComiskey and Feingold (2012), aggregation increases r(x, y) but decreases x and y to varying degrees. Of interest is therefore, the extent to which aggregation changes r(x, y) and the 120 ratio y / x .

Large Eddy Simulation as a Data Source
We calculate S o biases using output from 127 large eddy simulations of marine stratocumulus under a range of conditions from fairly homogeneous overcast to broken open cellular structures.
Simulations are generated by the System for Atmospheric Modeling (SAM; Khairoutdinov and Randall, 2003). Input condi- The LES output provides microphysical fields drop number concentration and liquid water content as three-dimensional 135 prognostic variables, under the assumption of a bimodal lognormal distribution of cloud drops and rain drops with fixed distribution width ( g = 1.2 for both). However to mimic satellite retrievals, we work with the derived model variables ⌧ and cloud-top r e to calculate N d , and L based on Equations (2) and (3). Both cloud water and rain water contribute to ⌧ and r e .
A c is calculated based on the modeled value of ⌧ (Eq. 4) and then averaged, and therefore does not suffer from aggregation bias. S o is calculated directly for each scene based on the definition (S o = dlnA c /dlnN d ) using least squares regression to the 140 natural logarithms of A c and N d .

Level 2 Analysis
The standard averaging method follows the MODIS level 2 (L2) methodology in the sense that calculations are performed at high resolution prior to averaging. In the current work, Eqns. (2) and ( 3) are calculated based on the native 200 m model mesh 145 (n =1) and then aggregated to n ⇥ n tiles. Results will be shown for n = 30 vs. n = 4, i.e., 6 km ⇥ 6 km boxes vs. 800 m ⇥ 800 m boxes. These choices result from a desire for a reasonable number of regression points (n = 30) and for some small amount of smoothing (n = 4) to reduce the noise in N d retrievals (Eq. 2). The choice of n = 4 also brings us close to the typical 1 km length-scale used for analysis of L2 MODIS data. Note that L2 methodology removes the biases associated with aggregation of non-linear functions discussed in section 2.1 (Jensen's inequality) since N d is calculated using Eq.
(2) at the pixel level and 150 then averaged up. The same is true for A c , which as previously noted is calculated based on Eq. (4).
Biases are defined as where X represents S o calculated at n = 4 andX represents S o at n = 30 . Correlations r(L, N d ) refer to calculations for n = 4 unless otherwise stated. N d in the L2 analysis represents cloudy column averaged drop concentration, i.e., N d = N d,a based 155 on Eq. (2).
The N d retrieval (Eq. 2) is very sensitive to thin clouds particularly when r e , but also ⌧ are small. As is common in MODIS data analyses, calculations are only applied to thicker clouds, in our case, to r e > 3 µm and ⌧ > 3.

Variants
Two variants of the calculations will be presented. (2) and (3) for N d and L, respectively, are applied to data aggregated to n = 4 and n = 30. By doing so the aggregation biases associated with Jensen's inequality are introduced. The aggregation biases are expected to derive from a mix of influences: low for N d (a convex function in r e dominates a concave function in ⌧ ; Eq. 2), and negligible for L (Eq. 3). The effect of smoothing associated with level 3 aggregation is thus likely to be highly dependent on cloud field 165 heterogeneity.
2. Mimic MODIS L2 analysis but eliminate uncertainties in the (sub)adiabatic retrieval of N d by assuming a 'perfect' N d , which is taken directly from the LES. Because the LES N d is a 3-dimensional variable, we use in-cloud column average values. This methodology will be referred to as L2 N . Note that L is still calculated as per Eq.
(3). The goal here is to understand the extent to which the retrieval of N d drives the regression biases.

Temporal aggregation
To address temporal aggregation we consider the same individual LES snapshots described above but calculate S o in two ways: (i) S o regression fits to individual cloud scenes are simply averaged up over all cloud scenes; (ii) LES output is temporally aggregated to a large data set, from which S o is calculated via regression. The first approach preserves the individual cloud scene susceptibilities while the second aggregates many different cloud fields before performing the regression. The biases are 175 calculated based on Eq. (11), with the overbar indicating the second approach. The methodology is followed (separately) for both n = 4 and n = 30 spatial aggregation, and for L2, L3, and L2 N . Responses for the L3 analysis are distinctly different in a number of ways: first, bothS o and S o are almost always negative, and low f c cloud scenes often have lower biases than high f c scenes. This is because L3 averaging has a stronger smoothing effect on broken cloud fields, and therefore somewhat unexpectedly reduces the averaging bias for broken cloud fields compared to solid cloud fields. Nevertheless, the non-physical shift in the sign of S o associated with L3 methodology should act as To quantify the biases, these same analyses are shown as % biases in S o (Eq. 11) as a function of the correlation between L and N d (r(L, N d ); Fig. 4), the calculations of which are consistent with the derivation of variables averaged to n = 4; e.g., L2 and L3 calculations apply r(L, N d ) based on Eqns. 2 and 3, and L2 N calculates r(L, N d ) using the true N d and Eq. 3.

200
L2 biases are almost always positive and can reach values of many 100s of % (Fig. 3). As expected from Section 2.1, r(L, N d ) has a strong influence over the S o bias, particularly for L2, with the bias increasing noticeably with decreasing r(L, N d ), in a manner qualitatively similar to Fig. 2. The high values and high variability in the bias as one approaches r(L, N d ) ⇡ 0 are to some extent a consequence of an uncertain regression fit when the correlation between the L (or the closely related A c ) and N d fields is poor.

205
Of note is that L3 analysis methodology (aggregate first, then derive) changes the sign of r(L, N d ) to negative values, as it did the sign of S o (Fig. 3). L3 biases are more widely dispersed and show no clear trend with r(L, N d ) or f c . L2 N biases tend to be capped at about 100% and values of r(L, N d ) are noticeably more positive than those in L2. These results reinforce two points: (i) that L3 analysis generates non-physical results (negative S o and a change in the sign of r(L, N d )); and (ii) that the use of the (sub)adiabatic assumption (Eq. 2) as a proxy for N d incurs a significant increase in the S o bias relative to L2 N , most 210 noticeably at low r(L, N d ) where Eq. 2 results in a significantly reduced (but for the most part, positive) correlation.

Effect of Spatial Aggregation on Regression
It is useful to turn to the underlying definitions of regression analysis (section 2.2) to explore more deeply the influence of averaging on the S o bias. According to Eq. (10), S o biases are related to the ratio ofb (6 km) tob (800 m): 215 We therefore consider (i) the ratio¯ Ac /¯ N d to Ac / N d (henceforth the ' -ratio'; to adhere more rigorously to the regression analysis definitions. The interpretation of these ratios is non-trivial. They express the extent to which Ac / N d and r(A c , N d ) are modified by aggregation. In the case of Ac / N d this amounts to interpreting a 'ratio of ratios'. Later we will delve into this in more detail.

220
L2 results show a clear dependence of the S o bias on the -ratio, and especially large biases when the -ratio is high (Fig.   5a). These also happen to be points exhibiting high f c (cf. Fig 4a). Also apparent is that the separation of positive and negative biases is demarcated at a -ratio of 1. The results suggest a strong correlation between the -ratio and f c . At high r(L, N d ), the S o bias increases systematically with increasing -ratio ( Fig. 5a) but with decreasing r(L, N d ) the strong, and orthogonal influence of the r-ratio becomes more important (Fig. 6a). Clearly evident in Fig. 6a is the anticipated unstable calculation of 225 the r-ratio in the vicinity of r(L, N d ) = 0.
The L3 methodology exhibits an even clearer dependence of the S o bias on the -ratio, except for cloud fields with r(L, N d ) ⇡ 0.2 (Fig. 5b) where the high bias is clearly related to both the -and r-ratios (Fig. 6b). The aforementioned vertically oriented green colored points have a -ratio of about 1 and r-ratio of 2.5, i.e., the bias is driven by the r-ratio.
For the L2 N methodology, here too the -ratio (Fig. 5c) provides a clearer indication of the magnitude of the bias compared 230 to either the r-ratio (Fig. 6c) or f c (Fig. 4).
Based on Eq. (12), the influence of the -and r-ratios on S o bias is expected. What is more revealing is the influence of averaging on the components of these ratios. To this end, Fig. 7 examines the effect of averaging on the reduction in Ac and L2 analysis shows that at low f c the normalized reductions in (N d ) tend to be smaller than those in (A c ) but that the reverse tends to be true for f c > 0.75 (Fig. 7a), i.e. at high f c , averaging smooths the N d field more than it smooths the A c field. We will show below that these high f c scenes, although inherently more homogeneous, often manifest significant inhomogeneity in N d as a result of the N d retrieval. (See further discussion in section 4.1.4; Examples.) The clear exception to the trend of aggregation smoothing N d more than A c with increasing f c is the group of low f c points 240 that exists in Fig. 4a at r(L, N d ) < 0.2. These anomalous points appear below the 1:1 line in Fig. 7a and show up as lower r-ratio points (reddish points) in a sea of higher r-ratio points (brown colors) in Fig. 8a. Another distinct feature is the group of vertically oriented high f c (Fig. 7a) and low r-ratio (Fig. 8a) points for which smoothing of N d increasingly exceeds smoothing in A c as one moves below the 1:1 line. Because these points are characterized by small values of the r-ratio, there appears to be an offsetting of -ratio and r-ratio effects.

245
A closer look shows that these points manifest as a negative S o bias (Fig. 9a); in other words, the reduction in the r-ratio dominates the increase in the -ratio. Although these negative bias points tend to be more rare, they can be identified in Fig. 4a at high f c and low r(L, N d ) (< 0.3).
Analysis of L3 shows some similarities and some differences from L2. First, there is a much more significant scatter in points, particularly at lower f c (Fig. 7b); second, as in L2, averaging-related smoothing of N d tends to exceed smoothing in

250
A c at higher f c (Fig. 7b); third, and different from L2, values of the r-ratio of ⇡ 1 (Fig. 8b, green colors) are associated with higher S o biases (Fig. 9b), which must derive from the -ratio. and 9c). In fact it is clear from Fig. 9 that S o biases exceeding 100% always occur when averaging-related smoothing has a stronger effect on N d than on A c , although the magnitude and even sign of these biases vary significantly depending on the methodology. The richness (lack of monotonicity) in responses under these conditions depends on the relative strength of the r-ratio, and the extent to which it amplifies or counteracts the -ratio.
We note one interesting difference between L2 and L2 N : for the latter, low f c points reside in conditions under which 260 averaging-related smoothing is dominated by more as well as less significant smoothing of N d vs A c . (Fig. 7c). In very rare cases the low f c points above, but close to the 1:1 line in Fig. 9c cause negative S o biases (cf. Fig. 6c). Finally, some very high positive S o biases (up to 500 %) do exist for L2 N . These can be traced to conditions when the r-ratio is large (Fig. 8c) and averaging smooths N d more than A c . In this case the effects of averaging on the r-ratio and the -ratio work in unison to amplify the bias. The topic of L adjustments is of great interest given that the term may both enhance or offset the overall albedo susceptibility (Eq. 1) (e.g. Glassmeier et al., 2021). For example, a value of L o = dlnL/dlnN d < 0.4 will change the sign of S o from positive to negative. Numerous recent articles, based on models and observations, point to L o being positive in precipitating conditions, following the familiar Albrecht (1989) 'cloud lifetime' hypothesis which posits that aerosol perturbations will 270 suppress collision-coalescence and decrease precipitation, and therefore L losses, while it is negative in the non-precipitating regime, as a result of enhanced evaporation-entrainment feedbacks (Wang et al., 2003;Ackerman et al., 2004;Xue et al., 2008;Christensen and Stephens, 2011;Gryspeerdt et al., 2019). Fig. 10 shows the relationship between spatial averaging-related S o and L o biases, with points colored by f c . For clarity we show a subsample of 58 of the total 127 simulations to avoid points clustering and obscuring points below.

275
First, we note a strong positive correlation between the two biases, with L o biases larger than S o biases in L2 and L2 N (Fig.   10a, c). The reverse is true for L3 (Fig. 10b). For L2 and L3, two distinct branches appear; the first is a tight relationship for f c > 0.7, while the second is somewhat less well defined and associated with lower f c . There is a saturation in the ratio for f c on the order of 0.3 in L2 (Fig. 10a), while L3 shows a distinctly stronger S o bias at low f c compared to the approximately linear relationship for high f c (Fig. 10b). The separation of these branches is much less distinct for L2 N (Fig. 10c)

Examples
While our goal has been to provide a broad assessment of susceptibility biases in terms of cloud field properties, some examples 285 are helpful to illustrate the issues. We focus on L2 and L2 N to isolate the effect of the N d retrieval for individual cases, chosen randomly based on their visual physical characteristics, but supported by other cases. Figure 11 presents an LESgenerated stratocumulus scene characterized by high f c (= 0.98) and a very realistic closed-cellular structure ( Fig. 11a,c).
Drop concentration fields for the (sub)adiabaticlly-derived N d (Eq. 2) and the 'true' (LES) N d show the problem very clearly.
The retrieved N d shows a great deal more fine-scale structure than the true N d , and although in the mean the retrieved N d is not  Fig. 4). We emphasize that the significant inhomogeneity in N d is a result of the N d retrieval, rather than an inherent property of the cloud field, and that it is this increase in inhomogeneity and reduction in r(L, N d ) that drives up the S o bias.
The second example (Fig. 12) is a low f c case ( While examination of case studies proves useful, we argue that the broader statistical view is essential to understanding the error landscape that might be encountered in highly variable natural cloud scenes. In this regard, Figs. 3 -6, supported by Figs.

310
Although the focus of the work has been on spatial aggregation, we now briefly consider the effects of temporal aggregation, which by Eq. (10) will also affect S o . Large scale analyses often aggregate data over multiyear timescales (e.g. Chen et al., 2014), or three month seasonal timescales (Lebsock et al., 2008), extending the range of conditions and changing the variance and correlation between the fields. The result is that the regression fit to a longterm, temporally aggregated data set will be different from the short timeframe fits, averaged up to include the same data.
315 Table 1 summarizes the results for the three averaging methodologies, and for spatial averaging of n = 4 and n = 30.
Here Of immediate note is that the analysis shows that temporal aggregation results in a reduced S o on the order of 320 70 -110 %, depending on the methodology applied. For L2 and L2 N the average of local-in-time cloud albedo susceptibility is larger than that calculated by temporally aggregating many scenes. This bias is of opposite sign to the typical biases associated with spatial aggregation (e.g. Fig. 4) and not significantly different in magnitude. This offsetting of errors should be seen as a cautionary flag when making choices of how to aggregate data, rather than as a fortuitous occurrence, since as we have shown above, the biases are highly dependent on cloud field properties.

325
In the case of L3, the temporal aggregation of many different scenes results in an increase in albedo susceptibility and a change in sign from negative to weakly positive. Here too, the spatial and temporal aggregation has opposite effects (cf. Fig.   3b). (Note that the % differences for L3 in Table 1  Satellite-based measurements are our best means of assessing aerosol-cloud-radiation interactions at the global scale and providing model constraints for one of the most uncertain forcings of the atmospheric system, namely aerosol indirect effects. Space-based data draw heavily on polar-orbiting satellites carrying passive instruments from which we infer aerosol properties 335 (e.g., aerosol optical depth), cloud properties (cloud optical depth, cloud-top drop effective radius, liquid water path), and radiative fluxes. Typical approaches to quantification of aerosol-cloud-radiation interactions aggregate these inferred properties spatially and temporally to suppress the 'noisy' retrievals inherent to these measurements. The goal of this paper has been to address the ramifications of spatial and temporal aggregation for standard metrics of indirect effects in the form of cloud albedo susceptibility S o , which is in turn strongly dependent on the liquid water path adjustment (L o ) (Eq. 1). The question 340 is central to our ability to quantify aerosol indirect effects, and raises fundamental questions of non-linearities in the aerosolcloud interaction system, and natural co-variability of the interacting fields. Early work recognized the effects of aggregation on cloud albedo (A c ) -the so-called "albedo bias" (e.g. Cahalan et al., 1994). The current study has assumed no aggregation error in A c (as in the case of a direct CERES-based retrieval) but has focused instead on derivatives of the form of Eq. (1) that include highly non-linear satellite-based retrievals of drop concentration (N d ). In addition, we have derived liquid water The assessment of spatial aggregation biases considers three methodologies. The first is standard level 2 (L2) satellite methodology, which retrieves cloud properties at high resolution (order 1 km) and aggregates them up to a scale on the order of 100 km. Given the limitations in our LES domain size (40 km), we instead compare S o based on 800 m and 6 km scales.
This forms the basis of our spatial scale aggregation. Recognizing that some might choose to use the more compact aggregated 360 data as a start point, our second approach is level 3 (L3) methodology, which applies microphysical retrievals to spatially aggregated data. Considering that a key and uncertain variable in the S o calculation is the derived drop concentration (N d ), a third set of calculations repeats the level 2 analysis but uses the 'true' (LES-calculated) N d , which we refer to as the L2 N methodology. All consider perfect cloud albedo retrievals based on LES cloud optical depth (Eq. 4), removing the well-known cloud albedo aggregation bias from the discussion. Furthermore, derived variables r e and ⌧ , which are central to the S o bias analysis have been assumed to be free of measurement error. Real-world satellite retrieval errors of these variables, especially in heterogeneous or broken cloud fields, could amplify (or perhaps counter) the biases identified here. Therefore, applying the conclusions directly to satellite studies should be done with caution.
Key results pertaining to spatial aggregation biases are 1. S o biases are generally positive for all three approaches and, consistent with theory, tend to increase with decreasing 370 correlation between the fields (Fig. 4). For L2, biases can reach 500 %, whereas they are capped at about 100 % for L2 N .
These biases will translate to biases in aerosol-cloud radiative forcing.
2. L3 methodology (falsely) creates negative correlations between the cloud fields, resulting in generally unpredictable S o bias behavior (Fig. 4b). Moreover, it generates negative S o values, exacerbated by increasing degrees of aggregation ( Fig. 3b). The positive S o biases for L3 are therefore misleading (Figs. 3b and 4b). 3. High cloud fraction f c states are equally or even more prone to S o bias than low f c (high D(L)) states. This is in part due to the nature of the N d retrieval, which introduces heterogeneity into the field (see example Fig. 11), but L2 N cases also tend to exhibit this trend. The literature tends to consider high f c states as homogeneous, but this works shows that this is not necessarily the case since the N d retrieval generates unrealistic small-scale heterogeneity in relatively homogeneous conditions. 380 4. Using regression theory (Eq. 10) we can interpret biases in S o based on the extent to which aggregation changes (i) the ratio of the correlation r x,y (r-ratio) and (ii) the ratio of y / x ( -ratio) at the 6 km vs the 800 m scales. Fig. 5 shows that that the S o bias is strongly dependent on the -ratio, particularly for L3 and L2 N , and for L2 at high r(L, N d ). The r-ratio is a weaker control over the S o biases (Fig. 6).
5. Since the -ratio is a ratio of ratios, we further expand our analysis of this term to assess the extent to which aggregation 385 changes Ac relative to N d . We find that while aggregation can reduce Ac as much as N d , there is a tendency for larger reductions in N d (Fig. 7), particularly for high f c and low r-ratio L2 cases (Fig. 8) and for low f c L2 N cases (Fig.   7).
6. S o biases exceeding 100% always occur when aggregation-related smoothing has a stronger effect on N d than on A c ( Fig. 9) although the magnitude and even sign of these biases vary significantly depending on the methodology. The 390 lack of monotonicity in responses under these conditions depends on the relative strength of the r-ratio, and the extent to which it amplifies or counteracts the -ratio. This in turn, depends on the cloud fields themselves in ways that are not always easy to clarify. 7. As anticipated by Eq. (1) the S o bias is a strong function of the L o bias (Fig. 10). In the case of L2, the L o bias is noticeably smaller than the S o bias at low f c , while the reverse is true for L3.

395
Regarding temporal aggregation, we note that if aerosol-cloud interaction metrics such as S o are based on aggregation of cloud scenes over an extended period of time, bias can be expected as a result of extending the range of conditions beyond the natural local fluctuations inherent to covarying meteorological and aerosol conditions. To assess the effects of this temporal aggregation, we consider the difference between the average of S o from individual cloud scenes and the S o that is derived from a best fit to the data from all those scenes, with the former reflecting the average of local-in-time S o and the latter reflecting the 400 effect of temporal aggregation. Calculations are performed at both 800 m and 6 km scales (Table 1). Of note is that temporal averaging, as calculated in this manner, reduces (in % terms) S o at both averaging scales, and for L2, L3, and L2 N analysis.
The negative bias is on the order of 70 -80 % for L2 and L2 N . As noted in section 4.1.1, L3 spatial aggregation methodology generates negative S o , and temporal aggregation has the beneficial effect of creating small positive, and therefore more realistic S o (relative to temporally unaggregated) values. (In percentage change terms, however, the bias is negative.) 405 We emphasize that these offsetting effects of spatial and temporal S o biases are very situationally-dependent and require further investigation. As in the case of the regression analysis (Eq. 10) performed for spatial aggregation, the pertinent question becomes the extent to which temporal aggregation will affect the -and r-ratios. Further work will need to place this offsetting effect on firmer footing.
Finally, it is of great importance that similar assessments of S o biases be considered in real-world satellite-based data. This 410 will allow the community to assess the degree of coherence between the aforementioned studies and the model-based results presented here, and the implications for quantification of aerosol-cloud radiative forcing.               Fig. 4 but with points colored by the ratio of s Ac /s Nd at 6 km to s Ac /s Nd at 800 m (the s -ratio). Note the clear dependence of the S o bias on the s -ratio and that the bias ~ 0 when the s -ratio ~1 (green colors). Exceptions to the latter occur for L2 at low r(L, N d ).