Evaluating Models’ Response Of Tropical Low Clouds to SST Forcings Using CALIPSO Observations

Recent studies have shown that, in response to surface warming, the marine tropical low-cloud cover (LCC) observed by passive-sensor satellites decreases substantially, thereby generating a smaller negative value of the top-of-the-atmosphere cloud radiative effect (CRE). Here we study the interannual LCC and CRE changes in response to sea surface temperature (SST) forcings in the GISS Model E2 climate model, a developmental version of the GISS Model E3 climate model, and 12 other climate models, as a function of their ability to represent the vertical structure of the cloud response to SST change against 10 years of CALIPSO observations. The more realistic models (those that satisfy the observational constraint) capture the observed interannual LCC change quite well (∆LCC/∆SST = -3.49 ±1.01 % K-1 vs. ∆LCC/∆SSTobs = -3.59 ±0.28 % K-1), while the others largely underestimate it (∆LCC/∆SST = -1.32 ±1.28 % K-1). Consequently, the more realistic models simulate a more positive shortwave feedback (∆CRE/∆SST = 2.60 ±1.13 W m-2 K-1) than the less realistic models (∆CRE/∆SST = 0.87 ±2.63 W m-2 K-1), in better agreement with the observations (∆CRE/∆SSTobs = 3.05 ±0.28 W m-2 K-1), although slightly underestimated. The ability of the models to represent moist processes within the planetary boundary layer and to produce persistent stratocumulus decks appears crucial to replicating the observed relationship between clouds, radiation and surface temperature. In the observations, this relationship depends on the type of low cloud. Over stratocumulus regions, cloud top height increases slightly with SST, accompanied by a large decrease of cloud fraction, whereas over trade cumulus regions, cloud fraction decreases everywhere, to a smaller extent.

To avoid daytime noise contamination of the lidar signal, we use only nighttime data; however, the results using both nighttime and daytime data are similar, with a slightly larger amplitude of interannual LCC changes (10 % to 15 % larger).
To derive an uncertainty estimate of the relationship between monthly cloud amount change and SST anomalies over several years, referred to as interannual change, we use four different datasets for the SST: ERAI, Extended Reconstructed SST version 5 (ERSSTv5, Huang et al., 2017), NOAA Optimum Interpolation (OI) SST version 2 (NOAA-OI SSTv2, Reynolds et al., 2002) and Centennial in situ Observation-Based Estimates SST version 2 (COBE-SST2, Hirahara et al., 2014). The uncertainty related to clouds is due to the cloud threshold and the attenuation of the lidar beam. However, these effects are reproduced in the model via the lidar simulator and therefore do not necessitate further investigation here. The "actual" observed relationship may be biased low because of the lidar attenuation and the sensitivity of the dataset to the cloud threshold. While lidar-only products of LCC agree with each other (e.g., Chepfer et al., 2013), some disagreements exist in their cloud profiles due to different definitions of cloudy and fully attenuated pixels in their algorithms (Cesana et al., 2016; Chepfer et al., 2013).
Additionally, CloudSat-CALIPSO combined products have been shown to retrieve larger cloud fraction in regimes of weak subsidence, but these datasets are only available for a short period of time (Mace and Zhang, 2014) and are therefore unsuited for this study. For radiative fluxes, we use the monthly CERES Energy Balanced and Filled (EBAF) edition 4 dataset (CERES-EBAF 4.0, Loeb et al., 2018). The large-scale circulation (⍵500) is obtained from the monthly ERA-interim reanalysis (Dee et al., 2011). All datasets are averaged over a 2.5˚ horizontal grid and are used over the same time period as CALIPSO-GOCCP.
Using a finer grid (1˚) does not impact the results (not shown).
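The uncertainty estimate described above, in which the same cloud series is related to several SST datasets, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `slope_spread`, the use of a sample standard deviation, and the synthetic input values are all hypothetical.

```python
import numpy as np

def slope_spread(cloud, sst_by_dataset):
    """Regress one cloud series on each SST anomaly series.

    Returns the mean and the sample standard deviation of the slopes,
    which serves here as a simple observational uncertainty estimate.
    """
    slopes = np.array([
        np.polyfit(sst, cloud, 1)[0]  # index 0 is the slope of the linear fit
        for sst in sst_by_dataset.values()
    ])
    return slopes.mean(), slopes.std(ddof=1)

# Synthetic stand-ins for the SST products (values are NOT real data).
sst = np.linspace(-1.0, 1.0, 120)
datasets = {"ERAI": sst, "ERSSTv5": 1.05 * sst,
            "NOAA-OI SSTv2": 0.95 * sst, "COBE-SST2": sst}
cloud = 30.0 - 3.5 * sst  # hypothetical LCC series in %
mean_slope, std_slope = slope_spread(cloud, datasets)
print(f"dLCC/dSST = {mean_slope:.2f} +/- {std_slope:.2f} % per K")
```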

Simulations
In this study, we analyze prescribed-SST (Atmospheric Model Intercomparison Project, AMIP) monthly outputs from two generations of the GISS Model GCM. The first one is the GISS-E2 model that was used for the 5th Coupled Model Intercomparison Project (CMIP5) (Schmidt et al., 2014). The second one is a developmental version of the GISS-E3 model, tuned to achieve radiative balance (+0.29 W/m2) using some of the cloud parameters described below, that will be submitted to CMIP6 and will undergo additional changes and tunings by then. E3 and E2 differ in many ways that can potentially affect low clouds:
(1) Layering in lower troposphere: E2 uses a 40-layer vertical grid, whereas these E3 runs use 62 levels with the greatest refinement in the lower atmosphere: at the surface and at 850 hPa pressure, nominal layer thicknesses for E2 are respectively 20 and 35 hPa, and for the 62-layer grid they are 10 and 20 hPa.
(2) Turbulence: the E2 scheme (Yao and Cheng, 2012), which includes nonlocal transport and does not consider moist processes, has been replaced by the scheme of Bretherton and Park (2009) for E3, which includes moist processes in the computation of turbulent fluxes and uses a novel relaxation approach to parameterize the nonlocal transport of TKE within well-mixed regions; the turbulent transfer coefficients it computes are applied to all prognostic variables separately, with a water-cloud-only saturation adjustment applied immediately after the transport is treated, using the scheme described below for stratiform cloud macrophysics. The Galperin et al. (1988) scheme that is used by the Bretherton and Park (2009) scheme has been replaced by a second-order scheme with a larger critical Richardson number.
Atmos. Chem. Phys. Discuss., https://doi.org/10.5194/acp-2018-1008. Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 15 November 2018. © Author(s) 2018. CC BY 4.0 License.
(3) Stratiform cloud macrophysics: while designed differently, both E2 and E3 use a diagnostic determination of cloud fraction as a function of grid-mean moisture and a condition-dependent sub-grid variance expressed as a threshold grid-mean relative humidity (RH) for cloud formation. The Sundqvist-type scheme of E2 (Del Genio et al., 1996), applied identically to water and ice clouds, is replaced for E3 by a scheme that uses a triangular probability density function (PDF) to compute water cloud fraction and cloud water mixing ratio (Smith, 1990). For E3, ice cloud fraction is obtained independently via inversion of that PDF scheme (Wilson and Ballard, 1999), with a different variance than for water. For E3 water clouds, different prescribed values of threshold RH determine the width of the PDF for layers that are within and outside well-mixed regions as determined by the turbulence scheme; this distinction is loosely congruent to Ua and Ub in E2 (Schmidt et al., 2014, section 2.5). In E2, suppression of stratiform cloud under conditions favoring convective cloud is primarily through restriction of the maximum possible areal extent of stratiform cloud to a fraction determined by the depth of convection. In E3 the following check is applied instead: if, above the PBL, a hypothetical saturated parcel is conditionally unstable, stratiform cloud is assumed to be meteorologically inconsistent with the stratification and not allowed to form except at 100% grid-mean RH.
(4) Stratiform cloud microphysics: the Sundqvist-type prognostic cloud water parameterization used in E2 (Del Genio et al., 1996) is replaced in E3 by a two-moment microphysics scheme with prognostic precipitation (Gettelman and Morrison, 2015). For our implementation we use a fixed relative dispersion for the gamma size distribution of water droplets following Geoffroy et al. (2010) and the Meyers et al. (1992) expression for deposition mode heterogeneous ice nuclei, and allow homogeneous aerosol freezing to occur (with a prescribed number concentration) when the RH with respect to ice (grid-mean divided by the fractional threshold RH used to define the width of the PDF used for water cloud fraction) exceeds the threshold of Kärcher and Lohmann (2002). Cloud droplet concentrations are prescribed with different values over land and ocean.
(5) Moist convection: as in E2, the cumulus category realized for a given environment is a function of dynamically determined entrainment, which is stronger in E3 as described below. The default entrainment efficiency results in a relatively large rate producing shallow cumulus for typical subtropical conditions; this highly entraining plume may grow deeper under more unstable or moister free-tropospheric conditions. As in E2, a fraction of cloud-base mass flux seeds a second plume with a small entrainment rate conducive to deep convection when conditions are diagnosed to be favorable to mesoscale organization. The E3 version in this study relates the less-entraining fraction to the downdraft mass flux forming cold pools, mirroring Del Genio et al. (2015), whose cold pool parameterization also affects the determination of updraft properties at cloud base. This choice reduces the global frequency and shifts the pattern of less-entraining convection compared to E2, which related it to the large-scale vertical velocity. Reformulations of the numerics in E3, targeting layering independence, eliminated inadvertent but systematic reductions of entrainment rate occurring in E2. Other E2 to E3 convection changes directly affecting lower-tropospheric conditions include (a) rain evaporation above cloud base, a moistening countered by (b) more efficient venting of the PBL, with the restriction that (c) convection may only originate at the top of a turbulent layer as defined in item (3) above.
(6) Convective cloud microphysics: particle size distributions (PSDs) and size-fall speed relationships used in E2 (Del Genio et al., 2005) have been replaced for E3 with field experiment-based normalized gamma PSDs and fall speeds for ice described by Elsaesser et al. (2017); for liquid, the E2 formulations have been replaced with bimodal (cloud and rain) drop size distributions (DSDs) (each DSD provided by Thompson et al. (2008), with a modified shape parameter from Shipway and Hill (2012) for the rain DSD), while droplet fall speed formulations are now provided by Seifert (2008).
We note that the improved representation of stratocumulus in E3 relative to E2 is principally attributable to the implementation of the moist turbulence scheme, together with critical linkages to stratiform cloud macrophysics and moist convection.
To provide context for the GISS model results, we also analyze AMIP simulations from 12 other CMIP5 models (Table 1).
Except for GISS-E3 (2007-2015), we use the last 18 years of AMIP simulations (1991-2008). To ensure a fair evaluation, we compare simulated and observed cloud fields through the use of the lidar simulator (e.g., Cesana and Chepfer, 2012), although the relationships found in this study are very similar (in terms of sign and shape) when original cloud fractions are utilized in GISS-E3. The model outputs are monthly means of the CALIPSO low-level cloud fraction and CALIPSO cloud fraction, so-called cllcalipso and clcalipso, respectively. The simulator package (Bodas-Salcedo et al., 2011) uses profiles of model variables (temperature, pressure, mixing ratios and cloud fraction) in each longitude-latitude grid box for each time step, divides them into sub-columns to account for sub-grid scale variability (Klein and Jakob, 1999) and mimics the lidar simulator signal (Chepfer et al., 2008). Then, the simulated lidar signal is interpolated to the CALIPSO-GOCCP vertical resolution, 40 levels of 480 m thickness between 0 and 19.2 km, and the different diagnostics are computed and accumulated into statistics.
A sub-pixel is diagnosed as cloudy when its scattering ratio (SR) is larger than 5, and low-level clouds are diagnosed in the column whenever a cloudy pixel is present below 3.36 km.
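The cloud diagnosis just described can be illustrated with a short sketch. It assumes an array of simulated SR values with one row per sub-column and 40 levels of 480 m ordered from the surface upward; the function name `low_cloud_cover` is hypothetical, and the treatment of fully attenuated pixels is deliberately left out.

```python
import numpy as np

SR_THRESHOLD = 5.0   # a pixel is cloudy when SR exceeds this value
DZ_KM = 0.48         # CALIPSO-GOCCP layer thickness
N_LEVELS = 40        # levels between 0 and 19.2 km
LOW_TOP_KM = 3.36    # upper bound of the low-level category
N_LOW = int(round(LOW_TOP_KM / DZ_KM))  # 7 layers lie below 3.36 km

def low_cloud_cover(sr):
    """Fraction of sub-columns with at least one cloudy pixel below 3.36 km.

    sr: array of shape (n_subcolumns, N_LEVELS), level 0 nearest the surface.
    Attenuation handling is omitted in this sketch.
    """
    sr = np.asarray(sr, dtype=float)
    cloudy = sr > SR_THRESHOLD
    return cloudy[:, :N_LOW].any(axis=1).mean()

# Four synthetic sub-columns: two contain a cloudy pixel in the low levels.
sr = np.ones((4, N_LEVELS))
sr[0, 2] = 10.0   # low cloud
sr[1, 10] = 10.0  # cloud above 3.36 km only
sr[2, 6] = 6.0    # cloud in the highest low-level layer
print(low_cloud_cover(sr))
```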

Definition of low cloud regions
In this work, we focus on the low-level clouds that form over the tropical oceans (between 35˚S and 35˚N) in subsidence regimes defined as having a large-scale pressure vertical velocity at 500 hPa (⍵500) greater than 10 hPa/day. This filtering captures most of the stratocumulus and stratocumulus-to-shallow-cumulus transition regions, which are located climatologically within the blue contours in Fig. 1. In the literature, some studies use a 0 hPa/d ⍵500 threshold (e.g., Myers and Norris, 2015, 2016). Here we choose a more conservative ⍵500 threshold to minimize areas where high clouds are common and may mask the detection of underlying low clouds in the observations. We confirm this by looking at the height at which the lidar signal becomes completely attenuated, so-called z_opaque (Guzman et al., 2017). The 10 hPa/d threshold almost perfectly encompasses areas where z_opaque is smaller than 2 km (see Fig. S1), meaning that the lidar is able to detect virtually all low clouds in these regions (clouds with cloud top lower than ~3 km).
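As an illustration of the regime filter described above, the following sketch builds a mask for tropical ocean grid boxes in subsidence. The function names, the boolean ocean mask input, and the cosine-latitude area weighting are assumptions made for the example and are not taken from the study.

```python
import numpy as np

def subsidence_mask(omega500, lat, is_ocean, threshold=10.0, lat_max=35.0):
    """Boolean mask of tropical ocean grid boxes in subsidence.

    omega500: (nlat, nlon) vertical velocity at 500 hPa in hPa/day
    lat:      (nlat,) latitudes in degrees
    is_ocean: (nlat, nlon) boolean ocean mask (assumed to be available)
    """
    lat = np.asarray(lat, dtype=float)
    in_tropics = (np.abs(lat) <= lat_max)[:, None]
    return in_tropics & np.asarray(is_ocean, dtype=bool) & (np.asarray(omega500) > threshold)

def masked_mean(field, mask, lat):
    """Mean of field over the mask, weighted by cos(latitude) (an assumption)."""
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, None] * mask
    return (np.asarray(field) * weights).sum() / weights.sum()

# Tiny synthetic grid: only the box at (lat=0, second longitude) passes all tests.
lat = np.array([-40.0, 0.0, 20.0])
omega = np.array([[20.0, 20.0], [5.0, 15.0], [12.0, 8.0]])
ocean = np.array([[True, True], [True, True], [False, True]])
mask = subsidence_mask(omega, lat, ocean)
lcc = np.arange(6, dtype=float).reshape(3, 2)
print(mask, masked_mean(lcc, mask, lat))
```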

Cloud-SST relationship and observational constraint
Two main goals of our study are to investigate the interannual variation of the vertical cloud fraction (CF) and LCC in response to a change in SST in both the observations and the models, and to use the observed relationship to evaluate the models. By interannual variation we mean the monthly variations over multiple years, a decade in this case. Capturing the mechanisms that govern the change of clouds in response to a surface warming is an essential condition, although not the only one, to predict future climate. Thus, we select the GCMs that produce the most realistic change in cloud profile per K of SST warming.
We refer to these as "constrained" models, in the sense that they are distinguished from other models in our analysis using an observational constraint; we emphasize though that the models have not been changed in response to the observations. We compare the cloud fraction and shortwave (SW), longwave (LW) and net cloud radiative effect (CRE) changes of these models to the others, which we refer to as "unconstrained" models.
To calculate the interannual relationship between SST and cloud amount, we compute the monthly mean of CF and LCC and monthly anomalies of SST after having filtered out all grid boxes where ⍵500 is lower than 10 hPa/d, referred to as CFsub, LCCsub and SSTsub,anom. Those can be seen as dynamically-based means and anomalies, as opposed to spatially-based anomaly/mean studies that focus on particular regions (e.g., McCoy et al., 2017; Qu et al., 2015). Hence, the cloud response is dominated by the local component rather than the large-scale component (dynamics). It is therefore complementary to imposing a uniform +4 K increase (e.g., Cesana et al., 2017) or an abrupt 4×CO2 increase (e.g., Brient et al., 2016), both of which are also significantly affected by dynamical changes. We then linearly regress CFsub and LCCsub against SSTsub,anom to obtain the change (∆) in cloud fraction and low cloud cover per K of SST warming, ∆C/∆SST, where C is either the CF or LCC. Using a centered finite-differencing scheme as in Myers and Norris (2015) instead of a linear regression does not impact the results (not shown).
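The regression step described above can be sketched as follows. The sketch assumes that the SST anomalies are taken relative to the mean annual cycle of the regime-averaged series, which is one common definition and is not confirmed by the text; the function names and the synthetic series are hypothetical.

```python
import numpy as np

def monthly_anomalies(series):
    """Remove the mean annual cycle from a monthly series.

    The series length must be a multiple of 12 (whole years).
    """
    x = np.asarray(series, dtype=float).reshape(-1, 12)
    return (x - x.mean(axis=0)).ravel()

def sensitivity(cloud, sst_anom):
    """Least-squares slope of the cloud series against the SST anomalies."""
    slope, _intercept = np.polyfit(sst_anom, cloud, 1)
    return slope

# Ten synthetic years: an annual cycle plus a slower interannual signal.
months = np.arange(120)
sst_sub = 25.0 + 2.0 * np.sin(2 * np.pi * months / 12) \
               + 0.5 * np.cos(2 * np.pi * months / 40)
sst_sub_anom = monthly_anomalies(sst_sub)
lcc_sub = 30.0 - 3.5 * sst_sub_anom  # hypothetical LCC (%) tied to the anomalies
print(sensitivity(lcc_sub, sst_sub_anom))  # slope in % per K
```

In this toy series the slope recovers the value used to build the data, which is only a check of the method and says nothing about the observed sensitivity.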

Assumptions and caveats
By using this method, we make some assumptions that generate some caveats. For example, we assume that the relationship between SST and low cloud amount is time-scale invariant, i.e., the same regardless of the time-scale over which anomalies are calculated. This assumption seems to be supported by several previous studies (e.g., Klein et al., 2017; McCoy et al., 2017), but we note that any such relevance to cloud feedback in the regions we study does not necessarily have broader implications for the global equilibrium climate sensitivity (Caldwell et al., 2018). Moreover, we analyze the effect of SST on clouds by assuming that the cloud effect on the SST is negligible on a monthly time-scale based on previous studies (e.g., here. However, the standard deviation (STD) computed using the four SST datasets (or the 5-95 % confidence intervals when using a single SST dataset, not shown) is far smaller than the multimodel mean STD and bias, as shown in section 4. In addition, using a shorter period of time does not change the sign and shape of the results but may change their magnitude (not shown).

Other environmental factors may cause low cloud changes, such as the estimated inversion strength or ⍵500 (Qu et al., 2015; Myers and Norris, 2016). When these factors are held constant, the variation of the cloud amount as a function of the SST becomes a partial derivative. Past studies have shown that computing the partial derivative may decrease the magnitude of ∆LCC (e.g., Myers and Norris, 2015; Qu et al., 2015). We find a similar decrease in our study using four of the five observational datasets of section 4.3 (∆LCC ~20 % smaller, see section 4.3).

As stated earlier, our ⍵500 filter targets stratocumulus and stratocumulus-to-shallow-cumulus transition regions. Such a definition of low clouds, while extensively used in the literature, does not permit us to distinguish between the two most common low-cloud types, that is to say trade cumulus and stratocumulus, and it also excludes parts of the trade cumulus regimes that have been argued to be important to overall cloud feedback (weak convective regimes, e.g., Nuijens et al., 2015). As a consequence, our results do not target a specific type of cloud but rather represent the regional-only averaged effect of all types of low clouds. Nevertheless, we attempt to provide some information on the observed interannual changes of low clouds in trade-cumulus and stratocumulus regimes in section 4.3.

Figure 2a shows averaged cloud fraction profiles over the tropical oceans (35˚S to 35˚N) in subsidence regimes (⍵500 > 10 hPa/d). In the low levels (z < 3.36 km), both GISS models underestimate the CF. Although GISS-E2's peak (purple line with stars) is slightly larger than E3's (blue line with stars), the shape of the GISS-E3 profile is in better agreement with the observations (two large values at 1.2 km and 1.68 km). In addition, GISS-E3's CF values are in very good agreement with the observations at 2.16 km and above while they are overestimated in GISS-E2, suggesting an excess of trade cumulus type of clouds.
Most of the other models (9/12) also underestimate the CF, yielding a multi-model mean peak ~43 % smaller than observed (triangle green line, 11.2 %, vs. circled orange line, 19.6 %, Fig. 2a). In addition, the model behavior is relatively diverse, which highlights the large uncertainty around the simulation of low clouds. The observed shape of the cloud fraction profile, a single peak around 1.2 km, is not captured by all models. Some simulate a double-peak shape, which is likely the result of the distinct contributions of stratocumulus and trade cumulus clouds, the latter typically having smaller CF and higher cloud top (and typically treated by separate parameterizations in a model). Other models show a single peak as in the observations but with a far smaller CF. This could have several explanations: a PBL that is too shallow, a general lack of low clouds for a given thermodynamic state, a strong masking effect by overlying high clouds, or a larger influence of the convection parameterization than of the large-scale cloud and turbulence parameterizations that determine stratocumulus clouds.

Constraining the vertical response of low-level cloud fraction
In Figure 2b, we show the interannual change in CF per K of SST warming (∆CF/∆SST) based on a linear regression between SST anomalies and CF, as described in Section 3.2. As for the mean cloud profiles, the model responses are quite diverse, generating a very large variability compared to the observed STD. One group of models predicts a very small change, which can be an increase, a decrease, or both at different heights. Other models simulate a large increase of CF at cloud top and a large decrease below, i.e., an upward shift rather than a cloud cover change. Finally, the remaining models reproduce the shape of the observed change well, that is to say a large decrease below 2 km.

In this study, we assume that i) the physical mechanisms that control the subtropical low-cloud response to warmer surface temperature remain identical across all time scales and ii) those mechanisms are essential to predict the correct subtropical low-cloud change in the future, although they may not necessarily be the only ones (e.g., current climate variability does not include the radiative effect of increased CO2 on cloud-top turbulence). Additional phenomena, e.g., large-scale dynamical feedbacks that differ on interannual and centennial time scales, could also mitigate or amplify the change. However, we believe that the present-day interannual change in the cloud fraction (∆CF/∆SST) is one important test that a model must pass for us to have confidence in its prediction of future climate. We therefore isolate the change of the low-cloud cover associated with a surface warming, as well as the related top-of-atmosphere radiative impact, for the subset of models that best reproduce the observed cloud fraction change, i.e., a large CF decrease (< -1 % K-1) and no significant CF cloud top increase (< +0.5 % K-1) (see Fig. S2 for details). In the remainder of the manuscript, we will call this category the "constrained models" (6/14, marked with a star in Table 1), represented in blue, and the other models the "unconstrained models" (8/14), represented in purple. The two GISS models fall into different categories: the unconstrained category for GISS-E2 and the constrained category for the newest version, GISS-E3.
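The selection criterion stated above can be written as a simple test on a model's low-level ∆CF/∆SST profile. How the two thresholds are applied to the profile is an assumption here (the strongest decrease and the largest increase anywhere in the low levels); Fig. S2 of the paper gives the actual details, and the function name and sample profiles are hypothetical.

```python
import numpy as np

def is_constrained(dcf_dsst, decrease_max=-1.0, increase_max=0.5):
    """Apply the two thresholds to a low-level dCF/dSST profile (% per K).

    A model passes when its strongest decrease is below decrease_max and
    its largest increase stays below increase_max.
    """
    profile = np.asarray(dcf_dsst, dtype=float)
    large_decrease = profile.min() < decrease_max
    no_top_increase = profile.max() < increase_max
    return bool(large_decrease and no_top_increase)

# Illustrative profiles from the surface upward (values are made up).
print(is_constrained([-0.2, -1.8, -2.5, -0.6, 0.1]))  # decrease, no rise
print(is_constrained([-2.0, -1.5, 0.2, 1.4]))         # upward shift
print(is_constrained([-0.3, 0.1, 0.2]))               # weak response
```

The three sample profiles correspond to the three model behaviors described in the text: a clear decrease, an upward shift, and a very small change.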
Overall, the constrained models simulate a larger cloud amount at low levels, in better agreement with CALIPSO, than the unconstrained models (Fig. 2c). In addition to underestimating the low-level cloud amount and its decrease with surface warming, some unconstrained models predict low-level cloud top rising, either because of a deepening of the PBL or due to an increase of the upper cloud fraction peak (Fig. 2d). This cloud-top rising may imply an excess of trade cumuli in the present-day climate in the models having a dual-peak cloud fraction in the low levels (e.g., CCSM4-CAM4, MIROC, MRI, GISS-E2 and MPI, Fig. S2): one large peak close to the surface (stratocumulus type) and another smaller peak above (trade cumulus type).

Consequences for low-cloud cover
In the remainder of the manuscript, we use star shapes in our plots to distinguish the GISS models from the other models and emphasize the effect of cloud parameterization changes with respect to interannual LCC and cloud radiative effect (CRE) changes in a GCM.

Based on this observational constraint, we now investigate how well the models simulate LCC in present-day climate and with a surface warming. Figure 1 shows the LCC maps for the observations and for the two model categories as well as their biases.
Although the LCC global means of the GISS models are almost identical (LCCE2 = 28.5 % and LCCE3 = 28.6 %), their spatial patterns are completely different (E2 failing to produce any stratocumulus clouds), which results in a very poor correlation factor for E2 (r = 0.11, the smallest of all 14 models) as opposed to a very good one for E3 (r = 0.86, the largest of all 14 models). The reader should also bear in mind that E3 cloud fraction and cloud cover are slightly underestimated in the present study because the simulator is run offline (at daily frequency), which generates lower cloud fractions and cloud covers than the inline version (not shown). The constrained models (Fig. 1h) simulate larger LCC global (and tropical) means (LCC = 30.5 %, r = 0.92), closer to the observations (LCC = 37 %), and also better reproduce the observed LCC pattern than the unconstrained models (Fig. 1j, LCC = 25.7 %, r = 0.86) and the multimodel mean (Fig. 1f, LCC = 27.8 %, r = 0.90).

We apply the same method as in Section 3.2 to calculate the interannual change in LCC per K of surface warming (Figure 3a and Table 2 first column, ∆LCC/∆SST). Consistent with the cloud fraction profiles, GISS-E3, the only model within the observational uncertainty, predicts a decrease of the LCC in response to a local 1 K surface warming (-3.55 % K-1), like most models (12/14), as opposed to a small increase for GISS-E2 (0.22 % K-1). Like the difference between GISS-E2 and E3, the multimodel spread is large (5.4 % K-1, Table 2), about two and a half times greater than the absolute value of the multimodel mean (-2.25 % K-1, Table 2). However, the constrained models simulate a ∆LCC/∆SST slightly smaller than the observation but within the observational uncertainty (-3.59 % K-1 +/- 0.28 % K-1) and with a much-reduced spread (-3.49 % K-1 +/- 1.01 % K-1).
The observed ∆LCC/∆SST is significant as its amplitude is more than three times larger than the LCC annual standard deviation in the same dynamical regimes (1 % K-1).

It is plausible to think that ∆LCC could depend on the initial amount of LCC in a model (e.g., Brient and Bony, 2012). While the difference between GISS-E2 and GISS-E3 is not substantial, comparing this relationship for multiple versions of the GISS-E3 model (run along the course of its development) supports a relationship between ∆LCC and the present-day LCC in subsidence regions (Fig. 3b). This relationship holds regardless of whether the simulator is used or not. Except for MIROC5, which simulates a present-day LCC almost as large as the observations, the constrained models simulate a larger present-day LCC in subsidence regions (consistent with what was found in Fig. 2). When MIROC5 is set aside, the correlation between the LCC and ∆LCC in Fig. 3 becomes more obvious (r = -0.57 vs. r = -0.40 for all models). One should note that the present-day LCC could be biased low in some models, due to a too strong shielding effect by overlying high clouds compared to the observations, possibly affecting the relationship between the present-day LCC and ∆LCC. In the GISS-E3 model, the simulator does not affect ∆LCC (Fig. 3; compare red and black versions of the same symbols), despite its significant impact on the present-day LCC as hypothesized before. In addition, the relationship may be different depending on the type of clouds, since

Consequences for annual low-cloud feedbacks
In this section, we further examine the impact of cloud changes on the radiative budget for the same stratocumulus and stratocumulus-to-shallow-cumulus transition regions (over the tropical oceans and based on ⍵500), using CRE, defined as the difference between the all-sky flux and the clear-sky flux at the TOA. Figure 4 shows the change in the SW, LW, and net CRE per K of surface warming, referred to as ∆CRE/∆SST (i.e., dCRE/dSST). A positive ∆CRE/∆SST implies a warming of the climate system due to clouds when the SST increases; conversely, a negative ∆CRE/∆SST implies a cooling effect. This quantity may be used as a proxy to characterize cloud feedbacks at the top of the atmosphere (TOA; e.g., Medeiros et al., 2015; Cesana et al., 2017). All observed ∆CRESW/∆SST, ∆CRELW/∆SST and ∆CRENET/∆SST are positive, a feature particularly well captured by GISS-E3, which performs surprisingly well for both the SW and LW components of the interannual feedback, while GISS-E2 gets the sign of the SW component wrong. Both constrained and unconstrained multimodel means (colored triangles) get the correct sign of all three feedbacks, although the sign and the magnitude of ∆CRENET/∆SST vary significantly among the models, mostly driven by the SW component, in agreement with previous studies (e.g., Medeiros et al., 2015; Cesana et al., 2017). Overall, the constrained models perform better than the unconstrained models for all three components, in terms of absolute value and variability. In particular, the unconstrained models largely underestimate the ∆CRESW/∆SST (0.73 W m-2 K-1, Table 2 second column) compared to the observations (3.05 +/- 0.28 W m-2 K-1), whereas the constrained models almost fall within the observed uncertainty (2.60 W m-2 K-1).
Because of the optical properties of their spherical droplets, low-lying warm marine clouds reflect more sunlight than the underlying ocean surface. As a result, any change in LCC should affect the CRESW at TOA and one should expect a good correlation between the two quantities, which is demonstrated in Fig. 4a, with a linear correlation coefficient of -0.94 (excluding the outlier from the calculation). There is little correlation for the LW component, whereas for the net component the correlation is also very large (r = -0.94), driven by the shortwave radiation, confirming its crucial role in determining the cloud feedback spread of CMIP models (e.g., Andrews et al., 2012). Once again, both the magnitude and the variability of the three components are better reproduced by the constrained category of models.

In addition, we analyzed the sensitivity of ∆CRESW to ∆LCC by simply computing the ratio between the two quantities as in GISS-E3 stands out among the best models and replicates the observed ratio. Like GISS-E2, the unconstrained models largely overestimate the radiative impact of an LCC loss (-3.13 W m-2 %-1) compared to the observations (-0.85 W m-2 %-1), while the constrained models reproduce the observed relationship quite well (-0.74 W m-2 %-1). The inability of the unconstrained models to simulate a sufficient amount of LCC in the present-day climate may generate a lack of outgoing SW radiation at TOA, which is compensated by artificially increasing the reflectivity of the clouds during the tuning process in some modeling centers (e.g., Nam et al., 2012).
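As a quick arithmetic check of the ratios quoted above, the observed and constrained values can be recovered from the ∆CRESW/∆SST and ∆LCC/∆SST numbers given in the text. The function name is hypothetical. Note that a ratio of mean values need not equal a mean of per-model ratios, which may be why the unconstrained figure is not recovered by the same calculation.

```python
def sw_sensitivity(dcre_sw_dsst, dlcc_dsst):
    """Ratio of dCRE_SW/dSST (W m-2 K-1) to dLCC/dSST (% K-1), in W m-2 %-1."""
    return dcre_sw_dsst / dlcc_dsst

# Values quoted in the text for the observations and the constrained models.
observed = sw_sensitivity(3.05, -3.59)
constrained = sw_sensitivity(2.60, -3.49)
print(round(observed, 2), round(constrained, 2))  # -0.85 -0.74
```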
The constrained models all generate large stratocumulus decks along with a substantial amount of tropical low clouds in non-stratocumulus regions, which seems key to simulating the correct global response of low clouds to surface warming. This behavior is likely due to the fact that they simulate moist processes in the PBL through turbulence (e.g., GISS-E3, CESM1-CAM5, GFDL AM3, hadGEM2A, CanAM4), convection (IPSL5B) or both parameterizations (hadGEM2A), in addition to having stratocumulus decks. This becomes more evident when looking at the evolution of individual models. For example, implementing a more physically-based "moist" turbulence parameterization (following Bretherton and Park, 2009) in the GISS-E3 model changes the sign of ∆LCC/∆SST and ∆CRESW/∆SST and brings the model results within the range of uncertainty of the observations. Similarly, the changes in the IPSL model from version 5A to 5B significantly improved its simulation of the ∆LCC and ∆CRESW quantities, most likely because its "dry" PBL was effectively turned into a "moist" PBL through the implementation of moist shallow convection within the PBL (Rio and Hourdin, 2008), which improved its wind profiles and PBL height (Hourdin et al., 2013), combined with a revision of its turbulence scheme, which improved its representation of stratocumulus clouds. However, the MPI "moist-PBL" model does not fall into the constrained category. Even though its results are quite close to the observations, the clear overestimation of the cloud frequency above 2.16 km (Fig. S2, likely trade cumulus clouds) alters its ∆CF and leads to a sensitivity of ∆CRESW to ∆LCC that is too strong. Conversely, the BCC "dry-PBL" model captures ∆LCC and ∆CRESW variations well (within the range of the constrained models) although its ∆CF is unrealistic.
Therefore, the capacity of the models to replicate the observed response of low-level clouds and radiation to warmer surface temperature seems to be tied to whether or not i) they simulate moist processes in the PBL and ii) their turbulence scheme sustains stratocumulus clouds. Such results also demonstrate that a simple 2D description of the cloud properties (i.e., as seen from space-borne passive sensors) is not sufficient to fully understand and predict how clouds may react to surface temperature forcings; information on the vertical structure of clouds is also required.

Discriminating trade cumulus from stratocumulus clouds
Given the different factors controlling cumulus and stratocumulus clouds, one could expect a different response of each type of cloud to a surface temperature perturbation. This is further supported by the diverse behavior of modeled ∆LCC/∆SST, which is correlated with whether or not the models produce a large amount of stratocumulus in the present climate. To verify this, we determine the ∆LCC/∆SST of trade-cumulus (∆LCCTrCu/∆SST) and stratocumulus-dominated regions (∆LCCSc/∆SST) and their associated ∆CRESW/∆SST. Distinguishing cumulus from stratocumulus clouds is particularly challenging in the observations. Climatologically the two cloud types can be separated using k-means clustering of optical thickness-cloud top pressure histograms over GCM grid-sized areas (Chen and Del Genio, 2009), although instantaneous errors can arise, e.g., from overlying clouds. In the PBL, as the inversion strength increases, the moisture tends to increase, leading to larger cloud fractions (e.g., Klein and Hartmann, 1993). This phenomenon explains why lower tropospheric stability (LTS), defined as the difference between potential temperature at 700 mb and the surface, is well correlated with LCC in the observations over the tropical oceans (e.g., Klein and Hartmann, 1993; Wood and Bretherton, 2006). We verified this relationship using LTS derived from the ERAI reanalysis and CALIPSO-GOCCP LCC. The correlation between the two quantities is 0.65 but decreases when limited to larger LTS values (0.51 for LTS > 15 K, 0.38 for LTS > 17 K). We tried to use LTS-based thresholds to separate stratocumulus decks from other low-level clouds but the method does not work well for monthly climatology (not shown). Besides, only a few models have high LTS-LCC correlations (4/14 larger than 0.6), and for those, the larger LTS values do not match the stratocumulus areas.
Note that using convective and stratiform cloud fraction (often separated in GCMs) would solve this problem on the model side but such partitioning is not provided in the CMIP5 archive.
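As an illustration of the LTS diagnostic described above, the sketch below computes LTS as the potential temperature at 700 mb minus that at the surface, together with a Pearson correlation that could be applied to paired LTS and LCC samples. The function names, the 1000 hPa reference pressure, the approximate exponent 0.286 for dry air, and the sample temperatures are assumptions made for this example; it is not the processing chain applied to ERAI and CALIPSO-GOCCP in this study.

```python
import math

P0_HPA = 1000.0  # reference pressure (assumed standard value)
KAPPA = 0.286    # R_d / c_p for dry air (approximate)

def potential_temperature(temp_k, pressure_hpa):
    """Dry potential temperature in K from temperature (K) and pressure (hPa)."""
    return temp_k * (P0_HPA / pressure_hpa) ** KAPPA

def lower_tropospheric_stability(t700_k, t_sfc_k, p_sfc_hpa):
    """LTS in K: potential temperature at 700 hPa minus that at the surface."""
    return (potential_temperature(t700_k, 700.0)
            - potential_temperature(t_sfc_k, p_sfc_hpa))

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical grid point: 280 K at 700 hPa, 290 K at a 1000 hPa surface.
lts_example = lower_tropospheric_stability(280.0, 290.0, 1000.0)
```

Restricting the correlation to samples above a chosen LTS value (for example LTS > 15 K) only requires filtering the paired series before calling `pearson_r`.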

Instead, we focus on eight specific regions (Fig. S3) that have been identified as being dominated by either stratocumulus clouds (Sc) or trade cumulus clouds (Cu) in previous studies (e.g., McCoy et al., 2017). This method does not allow a model evaluation as the models may not be able to simulate the correct type of clouds in these regions, regardless of their ability to reproduce the response of each type of cloud to SST variability. Therefore we focus our analysis on observations only. In the literature, all studies referenced above except Brient and Schneider (2016) exclusively used passive sensors to derive ∆LCC/∆SST composites. In contrast, we use CALIPSO-GOCCP, which has a shorter time record and poorer sampling but a greater sensitivity to trade cumulus clouds due to both its narrower horizontal footprint and its better instrument sensitivity to liquid cloud particles. However, because we are using different methods and regions in our study, we included two ISCCP (Qu et al., 2015 and Pincus et al., 2012)  dataset includes the cloud fraction from cloud retrievals, which are the cloud fraction used to derive MODIS cloud properties in the collection 5.1 product. These are pixels that are entirely filled with clouds. On the other hand, the second dataset contains the cloud fraction from the so-called MODIS mask, which includes partially cloud-filled pixels (Pincus et al., 2012; Platnick et al., 2003). Here we used 15 years of data from 2001 to 2016. Figure 5 shows the ∆LCC/∆SSTs for all datasets using the same cloud regimes based on ω500, and the same four trade cumulus- and four stratocumulus-dominated regions (Fig. S3). Consistent with previous studies (e.g., McCoy et al., 2017; Myers and Norris, 2017; Qu et al., 2015), all datasets agree on a decrease of the LCC for increasing SSTs.
The decrease still occurs but to a smaller extent (~ 20 % smaller) when the Estimated Inversion Strength (EIS, Wood and Bretherton, 2006), another supposed low-cloud controlling factor, is held constant (cf. Klein et al., 2017). The overall magnitude of the change is larger in CALIPSO (-3.59 % K-1) than in the passive sensor datasets (-1 to -2.95 % K-1, Table 3). When only the stratocumulus regions are considered, all datasets show a larger decrease of the LCC than for the trade cumulus clouds or all tropical low clouds.
This may suggest that the overall behavior of clouds is controlled by the Cu (e.g., Bony and Dufresne, 2005), which supposedly cover a larger area of the tropics, although this is not guaranteed, as the observations do not tell us with certainty which type of cloud covers which part of the tropics. The difference between ISCCP and GOCCP is also relatively smaller in Sc regions (∆LCCGOCCP,Sc/∆SST = -5.32 % K-1 vs. ∆LCCISCCP,Sc/∆SST = -5.22 and -6.06 % K-1 for the two ISCCP products described above) than in the Cu regions (∆LCCGOCCP,Cu/∆SST = -3.62 % K-1 vs. ∆LCCISCCP,Cu/∆SST = -2.31 and -1.4 % K-1, Table 3).
Without the region off the coast of Peru, MODIS observations also agree well with ISCCP and GOCCP ∆LCCSc (within 15 %, not shown) although the MODIS mask cloud cover remains smaller for the most part than all other datasets regardless of the regions selected. Finally, it is also worth mentioning that the sensitivity of CRE at TOA per unit change in cloud fraction is significantly smaller in magnitude for trade cumulus clouds than for stratocumulus clouds in the three satellite estimates, including CALIPSO-GOCCP (-0.44 W m-2 %-1 vs. -1.34 W m-2 %-1), consistent with the fact that trade cumulus clouds are less reflective than stratocumulus clouds (Table 3). Even though differences in TOA CRE would emerge if one could use CERES-like observations at the CALIPSO horizontal resolution, these biases would remain small (e.g., Ham et al., 2015). In addition, we document the cloud opacity in the two regions using the ratio of opaque cloud cover (fully attenuating the lidar, Guzman et al., 2017) to the total cloud cover, Ropacity. We find that the stratocumulus Ropacity (75.9 %) is 50 % larger than that of trade cumulus regions (50.6 %), confirming the larger optical thickness of clouds in the stratocumulus regions than in the trade cumulus regions. Overall, all passive sensor estimates of the CRE sensitivity per unit change in cloud fraction are larger in magnitude than that of CALIPSO-GOCCP (-0.85 W m-2 %-1 vs. -1.11 to -3.26 W m-2 %-1, Table 3), even more so in the trade cumulus regions.
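The two ratios quoted above can be checked with simple arithmetic. The sketch below assumes that the CRE sensitivity per unit cloud cover can be approximated as the ratio of the two interannual slopes, (∆CRE/∆SST) / (∆LCC/∆SST), using the all-tropics CALIPSO values quoted in the abstract and in this section (3.05 W m-2 K-1 and -3.59 % K-1). Whether the Table 3 values were derived this way or by a direct regression is not stated here, so this is only a consistency check; the function names are hypothetical.

```python
def cre_sensitivity_per_cloud_cover(dcre_dsst, dlcc_dsst):
    """Ratio of slopes: (W m-2 K-1) / (% K-1) gives W m-2 %-1."""
    if dlcc_dsst == 0:
        raise ValueError("cloud-cover slope must be non-zero")
    return dcre_dsst / dlcc_dsst

def relative_excess_percent(a, b):
    """How much larger a is than b, as a percentage of b."""
    if b == 0:
        raise ValueError("reference value must be non-zero")
    return 100.0 * (a - b) / b

# All-tropics CALIPSO slopes quoted in the text.
goccp_sensitivity = cre_sensitivity_per_cloud_cover(3.05, -3.59)

# Ropacity in stratocumulus (75.9 %) vs. trade cumulus (50.6 %) regions.
opacity_excess = relative_excess_percent(75.9, 50.6)
```

Under this assumption the slope ratio comes out at about -0.85 W m-2 %-1, in line with the CALIPSO-GOCCP value given above, and the stratocumulus Ropacity is about 50 % larger than the trade cumulus value.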
Finally, the vertical response of the CF to a surface warming (∆CFall, ∆CFSc, ∆CFCu) is shown in Figure 6b while in the trade cumulus regions, the cloud top does not change and the decrease is significantly smaller (green line). Note that because our Sc and Cu are defined by regions rather than actual cloud types, we cannot distinguish between an actual rising of Sc cloud tops and a transition from lower-topped Sc to higher-topped Cu; both may contribute to the behavior of the ∆CFSc profile. Depending on low-level cloud top height and the type of cloud, the effect of a surface warming is therefore different, generating a small decrease for "higher" low-level clouds (Cu) as compared to a larger decrease of the "lower" low-level clouds (Sc) along with an increase of their cloud top height. Thus, favoring one cloud type over the other in the models may result in either an overestimate (too many Sc) or an underestimate of the ∆LCC/∆SST (too many Cu). Because the overall ∆LCC/∆SST and the height of the CF change (Fig. 2) are underestimated by most models, it is likely that models do not simulate enough stratocumulus clouds. We also investigate how well GISS-E3, a model that can produce a decent spatial pattern and amount of low clouds as well as a correct ∆CF, performs against the observations. ∆CFCu (green dotted line) is quite well captured by GISS-E3 whereas ∆CFSc (purple dotted line) is underestimated (its magnitude being outside the observed STD) and the Sc cloud top lifting is not reproduced. The overall ∆CF (orange dotted line) peaks slightly too high and with too small a magnitude compared to the observations. Therefore, GISS-E3 likely underestimates the amount of Sc clouds compared to Cu, supporting the aforementioned hypothesis, in addition to slightly underestimating the amount of all types of clouds in the present-day climate (Fig. 6a).
The good agreement of the total ∆CRESW with the observations results from compensating errors between an overestimated Cu ∆CRESW and underestimated Sc ∆CRESW although the corresponding ∆LCCs are well simulated. This suggests that the regional GISS-E3 radiative effect of low clouds is likely not well captured, in accordance with a long-standing problem in GCMs (e.g., Nam et al., 2012).

Conclusion
In response to interannual surface warming, the marine tropical low cloud cover (LCC) as observed by the active sensor from the CALIPSO satellite over a 10-year period significantly decreases (∆LCC/∆SST = -3.59 % K-1). This reduction of the LCC is larger than that found using results from passive sensor satellites (∆LCC/∆SST = -1 to -2.95 % K-1), albeit consistent in terms of sign and magnitude (e.g., McCoy et al., 2017; Qu et al., 2015; Seethala et al., 2015). Overall, the ensemble mean of CMIP5 models captures the sign and the shape of the observed interannual low-cloud cover change (∆LCC/∆SST) quite well.
However, its magnitude is underestimated and the model variability is large (∆LCC/∆SST = -2.25 ±1.58 % K-1), with some models (2 out of 14) even producing the wrong sign (a gain instead of a loss).
When scrutinized as a function of height, the interannual cloud fraction change (∆CF/∆SST) in the lower levels reveals various behaviors, which depend on the type of cloud and its height. We further show that it is possible to separate the model responses to SST variations using CALIPSO observations of the vertical cloud fraction (∆CF/∆SST) as a constraint: we select the GCMs that produce the most realistic change in cloud profile per K of SST warming, referred to as "constrained" models. By doing so, we find that the "constrained" models, including the latest version of the GISS model (GISS-E3), simulate a more realistic behavior of low-level cloud fraction and the associated interannual radiative feedbacks (∆CRESW/∆SST), together with a smaller variability in response to a surface warming. Their averaged ∆LCC/∆SST is within the observed uncertainty while they slightly underestimate the ∆CRESW/∆SST. Meanwhile, the "unconstrained" category, which includes the CMIP5 version of the GISS model (GISS-E2), fails to reproduce the observed magnitude of both quantities by a factor of 3 to 4. The fact that models that simulate moist processes within the PBL produce sustainable stratocumulus decks appears crucial to replicating the observed relationship between cloud, radiation and surface temperature.
Separating clouds between stratocumulus and trade cumulus categories helps us better quantify their contribution to global tropical low-level cloud change. The vertical structure of the change is indeed different in regions dominated by stratocumulus clouds than in those dominated by cumulus clouds. Over the stratocumulus regions, the cloud top increases slightly, accompanied by a large decrease of the cloud fraction below, whereas over the trade cumulus regions, cloud fraction decreases to a smaller degree, but over its full vertical extent. As a result, the cloud cover change per unit SST change is smaller over trade cumulus regions than over stratocumulus regions (∆LCCCu/∆SST = -3.62 % K-1 compared to ∆LCCSc/∆SST = -5.32 % K-1). Passive sensor observations confirm this result, although their overall ∆LCC/∆SST is consistently smaller regardless of the SST dataset used (Fig. S4), a difference mostly attributable to trade cumulus regions, where passive sensors are less sensitive to broken cumulus. However, the derived slopes for trade cumulus and stratocumulus from active and passive methods are within the measurement uncertainty and cannot formally be distinguished (Fig. S4).
Finally, a region-based evaluation of the GISS-E3 model suggests that producing realistic global ∆CF, ∆LCC and ∆CRE may be the result of compensating errors between the Sc-dominated and Cu-dominated regions. However, it is difficult to determine with certainty whether the model is biased or not as we discriminate these cloud types by regions and not by actual type with the method used in this study. Future work will focus on developing a method to discriminate stratocumulus from trade cumulus clouds in satellite-based observations. By doing so, we will be able to assess the spatial distributions of these clouds and to evaluate the models more precisely. In addition, refining the contribution of additional cloud controlling factors may advance our understanding of physical processes driving the change of cloud fraction in response to a warming climate.

Data availability
The GISS-E3 simulations can be made available upon request; the final version of GISS-E3 will be made part of the CMIP6