Articles | Volume 24, issue 22
https://doi.org/10.5194/acp-24-13025-2024
Research article | 26 Nov 2024

Analysis of the cloud fraction adjustment to aerosols and its dependence on meteorological controls using explainable machine learning

Yichen Jia, Hendrik Andersen, and Jan Cermak
Abstract

Aerosol–cloud interactions (ACI) have a pronounced influence on the Earth's radiation budget but continue to pose one of the most substantial uncertainties in the climate system. Marine boundary-layer clouds (MBLCs) are particularly important since they cover a large portion of the Earth's surface. One of the biggest challenges in quantifying ACI from observations lies in isolating adjustments of cloud fraction (CLF) to aerosol perturbations from the covariability and influence of the local meteorological conditions. In this study, this isolation is attempted using 9 years (2011–2019) of near-global daily satellite cloud products in combination with reanalysis data of meteorological parameters. With cloud-droplet number concentration (Nd) as a proxy for aerosol, MBLC CLF is predicted by region-specific gradient boosting machine learning (ML) models. By means of SHapley Additive exPlanation (SHAP) regression values, CLF sensitivity to Nd and meteorological factors as well as meteorological influences on the Nd–CLF sensitivity are quantified. The regional ML models are able to capture, on average, 45 % of the CLF variability. Based on our statistical approach, global patterns of CLF sensitivity suggest that CLF is positively associated with Nd, particularly in the stratocumulus-to-cumulus transition regions and the Southern Hemispheric midlatitudes. However, Nd retrieval bias may contribute to non-causality in these positive sensitivities, and hence they should be considered upper-bound estimates. CLF sensitivity to estimated inversion strength (EIS) is ubiquitously positive and strongest in tropical and subtropical regions topped by stratocumulus and within the midlatitudes. Globally, increased sea-surface temperature (SST) reduces CLF, particularly in stratocumulus regions. The spatial patterns of CLF sensitivity to horizontal wind components in the free troposphere may point to the impact of synoptic-scale weather systems and vertical wind shear on MBLCs. 
The Nd–CLF relationship is found to depend more on the selected thermodynamical variables than dynamical variables and in particular on EIS and SST. In the midlatitudes, a stronger inversion is found to amplify the Nd–CLF relationship, while this is not observed in the stratocumulus regions. In the stratocumulus-to-cumulus transition regions, the Nd–CLF sensitivity is found to be amplified by higher SSTs, potentially pointing to Nd more frequently delaying this transition in these conditions. The expected climatic changes in EIS and SST may thus influence future forcings from the CLF adjustment. The novel data-driven framework, whose limitations are also discussed, produces a quantification of the response of MBLC CLF to aerosols, taking into account the covariations with meteorology.

1 Introduction

The emission of aerosols into the atmosphere affects the Earth's climate, in particular by masking part of the warming effect from greenhouse gases through reflecting solar radiation and changing cloud properties. Aerosol–cloud interactions (ACIs) can strongly influence the Earth's energy distribution and thus also contribute a substantial uncertainty to past and future climate projections. The effective radiative forcing due to ACI (ERFaci) is assessed to be −1.0 W m−2, with an uncertainty range of −1.7 to −0.3 W m−2 (Forster et al., 2021), even though decades of effort have been invested and headway has been made in understanding the complex system of aerosols, clouds, and their environmental controls. The correct representation of ACI in Earth system models (ESMs) remains a tremendous challenge because of the lack of accurate global quantification of the cloud-related fine-scale processes and the lack of larger-scale constraints from the existing measurement systems at the ESM spatiotemporal resolution (Fan et al., 2016; Seinfeld et al., 2016; Sato et al., 2018).

Marine boundary-layer clouds (MBLCs) cover over 23 % of the global ocean surface (Wood, 2012). Due to relatively small temperature differences between MBLC top and the sea surface, they only weakly impact outgoing longwave radiation but greatly reflect incoming shortwave radiation, leading to a strong net cooling effect (Hartmann et al., 1992). MBLCs play a critical role in the Earth's radiative balance (Zheng et al., 2021) and, in this regard, are the most important cloud type (Chen et al., 2014). Furthermore, MBLCs are especially susceptible to aerosol perturbations due to their relatively low optical depths (Turner, 2007; Leahy et al., 2012) and their formation in environments typically characterized by lower anthropogenic aerosol loading than continental clouds (Platnick and Twomey, 1994). Therefore, a deeper understanding of the aerosol–MBLC interactions is crucial to reduce the uncertainties in climate predictions. Atmospheric aerosols are critical for the formation of clouds as cloud condensation nuclei (CCN). Increases in aerosols are associated with increases in cloud-droplet number concentration (Nd). As the cloud water is distributed among more droplets, cloud-droplet effective radius (re) shrinks at constant liquid water content, resulting in an enhancement of cloud brightness and a negative instantaneous radiative forcing (Twomey, 1977). The likelihood of collision and coalescence subsequently decreases due to smaller drop sizes, hampering rainfall formation, which can prolong cloud lifetime and thus increase cloud fraction (CLF) (Albrecht, 1989). However, the aerosol–CLF relationship is complex, and the sign of the CLF adjustment can also be the opposite.
This has been found in particular for non-precipitating clouds, stemming from enhanced entrainment mixing with ambient air over the clouds owing to shorter evaporation timescales (Wang et al., 2003; Jiang et al., 2006; Small et al., 2009) or reduced sedimentation (Ackerman et al., 2004; Bretherton et al., 2007) because of smaller droplet sizes.

From the perspective of observations at satellite scales, though there are studies suggesting a negative relationship between aerosols and CLF (Dey et al., 2011; Small et al., 2011), it has been documented by multiple studies that the overall CLF increases in response to increasing aerosols (e.g. Kaufman and Koren, 2006; Yuan et al., 2011; Gryspeerdt et al., 2016; Christensen et al., 2017; Andersen et al., 2017; Fuchs et al., 2018; Rosenfeld et al., 2019; Christensen et al., 2020). Likewise, studies based on ESMs reported substantial negative ERFaci due to liquid water path (LWP) and CLF adjustments (e.g. Zelinka et al., 2014). In spite of the attribution of such adjustments in ESMs primarily to LWP adjustments (Ghan et al., 2016), a global satellite-based study by Bender et al. (2019) suggested that LWP adjustments are overestimated in ESMs and that aerosol impact on CLF dominates the negative aerosol forcing. This is supported by observational evidence presented by Toll et al. (2019), who also reported an overestimation of LWP adjustment in climate models, and by Y. Chen et al. (2022), who recently highlighted the role of CLF increases due to aerosols from a large volcanic eruption as the main cause of the associated forcing. Some large-eddy simulations have, however, suggested a negative response of CLF of trade wind cumulus to aerosol perturbations (Xue and Feingold, 2006; Seifert et al., 2015). While most studies, from both observational and model points of view, are in agreement that generally CLF increases with increasing aerosols due to a prolonged lifetime (Douglas and L'Ecuyer, 2022), the magnitude of the response of CLF to aerosols and its corresponding adjustments are still highly uncertain.
For satellite-based analyses, one of the most challenging aspects in the quantification of CLF adjustment is isolating the influence of the aerosol loading on cloud properties from confounding covariations with meteorological parameters (Andersen et al., 2016; Gryspeerdt et al., 2019; Bellouin et al., 2020) paired with aerosol retrieval issues related to aerosol swelling and 3D radiative effects in the vicinity of clouds (Loeb and Schuster, 2008; Schwarz et al., 2017). Recent observational studies have utilized different methods to tackle this issue. A first approach is to stratify the data by meteorological factors, therefore accounting for local meteorology in the relationships (e.g. Su et al., 2010; Chen et al., 2014; Andersen and Cermak, 2015). Secondly, using Nd as a mediating variable was proposed by Gryspeerdt et al. (2016) to analyse the causal pathway between aerosol optical depth and CLF. Another approach is to use a sampling strategy that applies a cloud–aerosol pairing algorithm (Christensen et al., 2017). However, these methods do not account for aerosol retrieval issues, meteorological influencing factors, and confounders at once, which is essential to constrain the CLF adjustment. Recently, several studies have successfully used machine learning (ML) to account for non-linearities and meteorological factors to quantify ACI (Andersen et al., 2017; Fuchs et al., 2018; Dadashazar et al., 2021; Zipfel et al., 2022). ML regression algorithms allow for the prediction of CLF (predictand) on the basis of aerosol and meteorological factors at the same time and treat the aerosol–cloud–meteorology system as a whole. In addition, ML models can represent non-linear interactive systems, which can be analysed in sensitivity analyses with explainable ML techniques.
Explainable ML refers to the techniques explaining the predictions of a trained ML model by explicitly quantifying the relationships, which helps improve the understandability, transparency, and trustworthiness of the ML models (Beucler et al., 2023).

In this study, we set up region-specific ML models at a global scale using satellite and reanalysis data sets to predict CLF and to analyse Nd-induced changes in MBLCs. The goal of the explainable ML framework is to quantify the global sensitivity patterns of CLF to Nd and meteorological factors. In addition, we aim to estimate the magnitude of the dependence of Nd–CLF sensitivity on the meteorological factors using SHapley Additive exPlanation (SHAP) interaction values, providing a new and insightful pathway to more profound knowledge of the physical processes relevant to the CLF adjustment and, hence, to a global constraint on aerosol-induced CLF changes accounting for meteorological covariations. The hypothesis of this study is that the response of cloud fraction of MBLCs to aerosol perturbations is positive but modulated, i.e. reduced or amplified, by ambient meteorology and that both the sensitivities and the interactions with meteorological factors have distinct regional patterns.

2 Data and methods

2.1 Data sets

This work combines 9 years (2011–2019) of satellite retrievals from the Moderate Resolution Imaging Spectroradiometer (MODIS) and reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) from 60° N to 60° S. In this study, MBLCs are defined as single-layer warm cloud fields with cloud top temperatures higher than 268 K. To achieve this, the information on CLF (Cloud_Retrieval_Fraction_1L_Liquid product), re (Cloud_Effective_Radius_1L_Liquid_Mean product), cloud optical depth (τc; Cloud_Optical_Thickness_1L_Liquid_Mean product), cloud top temperature (CTT; Cloud_Top_Temperature_Mean product), and satellite viewing geometry is obtained from MODIS level-3 collection-6.1 atmosphere daily products on the Terra platform (MOD08_D3), which are gridded globally at 1° × 1° from level-2 atmospheric products. CLF serves as the predictand in this study. The computation of Nd relies on τc and re, with filtering criteria based on CTT, solar zenith angle, and sensor viewing zenith angle, as elaborated in the following.

The equation used to calculate the MODIS Nd is from Quaas et al. (2006). It depends on the retrievals of re and τc, so the uncertainties in Nd stem from the errors propagated from re and τc:

(1) $N_{\mathrm{d}} = \alpha \, \tau_{\mathrm{c}}^{0.5} \, r_{\mathrm{e}}^{-2.5},$

where α = 1.37 × 10−5 m−0.5 is a constant related to the adiabatic growth rate. The uncertainties in Nd retrievals are exhaustively evaluated by Grosvenor et al. (2018), who suggest that the uncertainties in averaged Nd over a 1° × 1° grid box (spatial resolution of the MODIS products used in this study) decrease by over 50 % compared to pixel-level uncertainties. This derivation approach relies on the assumed adiabaticity in global marine warm clouds, where liquid water content and re increase monotonically with height and Nd is vertically constant. Departure from the adiabatic assumption (e.g. due to entrainment) would result in Nd retrieval biases (Merk et al., 2016; Bennartz and Rausch, 2017). The uncertainty related to the estimation of Nd from MODIS also depends on liquid CLF. Nd is less biased in the regions of larger CLF, where clouds are more homogeneous, while in the regions with lower CLF Nd retrievals are sparser and less reliable (Grosvenor et al., 2018; Zhu et al., 2018). In such heterogeneous cloud fields, subpixel effects in the retrieval of re can negatively bias the retrieved Nd values (Zhang and Platnick, 2011; Zhang et al., 2012; Grosvenor et al., 2018). Such retrieval biases could cause a bias in the Nd–CLF relationship as well. Furthermore, the interpretation of the causal effect of Nd on CLF can also be obscured by small-scale sampling issues. In particular, apart from the retrieval errors in re and τc, the natural spatial variability in cloud fields can also propagate to the Nd estimate and distort the Nd–CLF relationship (Arola et al., 2022; Liu et al., 2024).
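As a worked illustration of Eq. (1), the following minimal sketch evaluates Nd for a single sample. The function name, the choice of micrometres as the input unit for re, and the conversion of the result to cm−3 are assumptions made for readability; the source only states the value and unit of α.

```python
import math

ALPHA = 1.37e-5  # m^-0.5, constant from Eq. (1)


def droplet_number_concentration(tau_c: float, r_e_um: float) -> float:
    """Return Nd in cm^-3 from cloud optical depth and effective radius.

    Assumption: r_e_um is given in micrometres and converted to metres so
    that it is consistent with the unit of ALPHA (m^-0.5).
    """
    r_e_m = r_e_um * 1e-6                                  # µm -> m
    nd_per_m3 = ALPHA * math.sqrt(tau_c) * r_e_m ** -2.5   # Eq. (1), in m^-3
    return nd_per_m3 * 1e-6                                # m^-3 -> cm^-3


# Hypothetical sample: tau_c = 10 and r_e = 10 µm give roughly 137 cm^-3.
nd_example = droplet_number_concentration(tau_c=10.0, r_e_um=10.0)
print(f"Nd = {nd_example:.1f} cm^-3")
```

Because re enters with an exponent of −2.5, Nd is far more sensitive to errors in re than to errors in τc, which is why the screening of re retrievals described below matters.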

Following the screening criteria for a more reliable Nd set out by Gryspeerdt et al. (2022), only single-layer liquid-phase clouds with a CTT higher than 268 K are considered. As suggested by Quaas et al. (2006), samples with re < 4 µm and τc < 4 are excluded to cope with the high re retrieval uncertainties at low τc. In addition, samples with solar and sensor viewing zenith angles greater than 65° and 55°, respectively, are removed to avoid the large biases in re and τc retrievals (as in Grosvenor et al., 2018). The pixels selected according to the above sampling strategies generate more reliable Nd estimates.
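The screening thresholds listed above can be expressed as a boolean mask. This is a minimal sketch with hypothetical array and function names; it assumes that a sample is dropped as soon as it violates any single threshold and that the single-layer liquid criterion is already met by the choice of the 1L_Liquid products.

```python
import numpy as np


def nd_screening_mask(ctt_k, r_e_um, tau_c, solar_zenith_deg, sensor_zenith_deg):
    """Return True where a sample passes all Nd screening thresholds."""
    ctt_k = np.asarray(ctt_k, dtype=float)
    r_e_um = np.asarray(r_e_um, dtype=float)
    tau_c = np.asarray(tau_c, dtype=float)
    solar_zenith_deg = np.asarray(solar_zenith_deg, dtype=float)
    sensor_zenith_deg = np.asarray(sensor_zenith_deg, dtype=float)
    return (
        (ctt_k > 268.0)                  # warm cloud tops only
        & (r_e_um >= 4.0)                # drop re < 4 µm
        & (tau_c >= 4.0)                 # drop tau_c < 4
        & (solar_zenith_deg <= 65.0)     # drop solar zenith angle > 65°
        & (sensor_zenith_deg <= 55.0)    # drop sensor zenith angle > 55°
    )


# Four hypothetical samples; only the first one passes every criterion.
mask = nd_screening_mask(
    ctt_k=[275.0, 260.0, 280.0, 285.0],
    r_e_um=[12.0, 10.0, 3.0, 15.0],
    tau_c=[8.0, 10.0, 6.0, 9.0],
    solar_zenith_deg=[30.0, 40.0, 20.0, 70.0],
    sensor_zenith_deg=[20.0, 10.0, 30.0, 40.0],
)
print(mask)
```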

Atmospheric and oceanic variables are taken from the fifth-generation ECMWF atmospheric reanalysis of the global climate (ERA5) at an hourly frequency (Table 1; Hersbach et al., 2020). The ERA5 data sets are harmonized to fit the level-3 MODIS data by first being resampled to 1° × 1° from their default 0.25° × 0.25° spatial resolution using bilinear interpolation; they are subsequently collocated with Terra MODIS by extracting hourly data to align with the UTC overpass times of the Terra satellite for each grid cell, yielding a spatiotemporally matched MODIS-ERA5 combined data set for training the ML models. For Nd retrievals, only samples within the 1st–99th percentiles are retained to exclude potentially unrealistic outliers from re and τc retrievals (Zipfel et al., 2022). Furthermore, the explanation of ML models in this study relies on using linear regressions to capture the distribution of individual prediction instances, and the extreme values may excessively magnify or reduce the sensitivity or interactive effects quantified by SHAP (shown in Fig. 1 and discussed in Sect. 2.3.2). The threshold of the 1st–99th percentiles for each predictor is thus adopted to remove the values at the very tails of the specific distribution and to improve the robustness of the estimated sensitivities. To define the sensitivities of CLF and the interactive effects of meteorological factors, the natural logarithm of Nd is taken (see Sect. 2.3.2 for details). Estimated inversion strength (EIS) is calculated based on the formulation from Wood and Bretherton (2006), and in this study, it is dependent only on atmospheric temperatures at 700 hPa and at the level of 1000 hPa.
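The percentile filter and the log transform of Nd could look like the sketch below. The function name and the synthetic lognormal Nd sample are illustrative assumptions; the source does not state whether the percentile bounds are inclusive or how they are computed.

```python
import numpy as np


def percentile_mask(values, lower=1.0, upper=99.0):
    """Return True where values lie within the [lower, upper] percentiles."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.nanpercentile(values, [lower, upper])
    return (values >= lo) & (values <= hi)   # NaNs compare False and are dropped


# Synthetic Nd sample in cm^-3 (illustrative only, not real retrievals).
rng = np.random.default_rng(0)
nd = rng.lognormal(mean=4.5, sigma=0.6, size=10_000)

keep = percentile_mask(nd)
ln_nd = np.log(nd[keep])                      # natural logarithm of retained Nd
print(f"retained {keep.mean():.1%} of samples")
```

In the study the same filter is applied to every predictor, so in practice the masks of all predictors would be combined with a logical AND before training.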

All input predictors for each Extreme Gradient Boosting (XGB) model (i.e. for each 5° × 5° window aggregated from 1° × 1° grid boxes, as detailed in Sect. 2.2) are standardized by centring around the mean and scaling to have unit variance as in Scott et al. (2020). Hamby (1994) suggested that standardization is standard practice when aiming for comparability of sensitivity estimates across predictors. This process eliminates the influence of units and aligns data on the same scale instead of the original natural ones, thereby ensuring the comparability of the quantified sensitivities and interactive effects with meteorology among different variables. This standardization procedure has been applied in other studies investigating different cloud sensitivities to various cloud-controlling factors (e.g. Ceppi and Nowack, 2021; Andersen et al., 2023). This procedure, however, may result in reduced spatial comparability due to variations in mean and standard deviation values across different 5° × 5° windows. To assess the trade-off between comparability among different predictors and comparability in space, we provide results without standardization in the Supplement (Figs. S2 to S7 therein) as done by Grise and Kelleher (2021). In terms of spatial patterns, the results are nearly identical to their corresponding ones presented in the following sections of the main text, suggesting that standardizing the data based on the local mean and standard deviation for each window has only a small impact on comparability across windows. Therefore, we primarily benefit from achieving comparability among different predictors while making only a minor compromise in spatial comparability.
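A minimal sketch of the per-window standardization follows, assuming a predictor matrix with one column per predictor. The function name and the example values are hypothetical, and the source does not state whether the mean and standard deviation are taken from the training period only or from all data in a window.

```python
import numpy as np


def standardize(predictors):
    """Centre each column on its mean and scale it to unit variance."""
    predictors = np.asarray(predictors, dtype=float)
    mean = predictors.mean(axis=0)
    std = predictors.std(axis=0)
    return (predictors - mean) / std, mean, std


# Hypothetical window with two predictors: lnNd (dimensionless) and SST (K).
window = np.array([
    [4.2, 290.1],
    [4.8, 291.5],
    [3.9, 289.4],
    [5.1, 292.0],
])
z, mean, std = standardize(window)
print(z.round(2))
```

After this step a sensitivity of, say, 0.1 CLF σ−1 means the same thing for lnNd and for SST within a window, which is the comparability across predictors that the paragraph describes.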

Table 1. Summary of the predictors from ERA5 reanalysis.


2.2 Machine learning model setup

Extreme Gradient Boosting (XGB) is a distributed tree boosting algorithm aiming to provide a scalable, portable, and flexible library under the gradient boosting framework (Chen and Guestrin, 2016). The state-of-the-art XGB algorithm can be implemented efficiently in Python and has been recently used to study clouds and ACI (Andersen et al., 2022; Douglas and L'Ecuyer, 2022). As an extension of previous gradient boosting methods, XGB has incorporated regularization techniques which help prevent overfitting and improve model generalization. In addition, subsampling of the training data and column (feature) subsampling can shorten the running time and also avert overfitting and hence improve model performance (Chen and Guestrin, 2016). Relevant regularization and subsampling hyperparameters are tuned using Bayesian optimization to determine the best combination; see Table 2 for the search space.

Table 2. Overview of the hyperparameters tuned for regional Extreme Gradient Boosting models using Bayesian optimization.


Data from 2011 to 2016 are used for training and data from 2017 to 2019 for testing (an independent train/test split of about 67 % / 33 %). By chronologically splitting the training and test sets without random shuffling, we ensure that the models are not trained on future information and that the autocorrelation in the data does not lead to an overoptimistic evaluation of the model's performance (Beucler et al., 2023; Kapoor et al., 2023). As suggested by Karpatne et al. (2017), a single ML model may not perform well across all regions due to the heterogeneity of relevant processes. Therefore, data at a 1° × 1° spatial resolution are aggregated into 5° × 5° geographical windows, where an individual independent XGB model is trained and tested for each “window”. Hereby, a region-specific ML framework is established to potentially capture regional relationships and characteristics and thus the regional patterns of CLF adjustment. The coarser 5° × 5° spatial resolution of the modelling grid increases the sample size by a factor of 25, which is helpful to establish robust sensitivity estimates. In addition, at the spatial resolution of 1° × 1° summarized in 5° × 5° windows, the spatial scale is adequate for ACI sensitivity estimation (Grandey and Stier, 2010). To ensure a sufficient data amount for training and testing the XGB models, only the geographical windows with over 6000 available data points are retained. Consequently, 34 out of 1190 oceanic windows have been excluded. These windows, located between 47.5° W and 122.5° E and between 52.5 and 57.5° S in the Southern Ocean (Fig. 2), contain fewer than 6000 valid samples due to the screening for Nd retrievals. For each model, the hyperparameters are tuned by implementing Bayesian optimization, which uses a Gaussian process prior distribution over hyperparameters to initialize a probabilistic model for the objective function to be optimized.
After the initialization, the probabilistic model is updated iteratively, and Bayesian optimization suggests the optimal combination of hyperparameters to try for the next iteration according to the previous one and samples gathered from the search space (Table 2) (Snoek et al., 2012). Each iteration is evaluated by five-fold cross-validation using the root mean square error (RMSE) as score. The number of boosting rounds (the number of trees) for each XGB model is then determined by the early stopping technique to further avoid overfitting; i.e. the training of the model stops once the monitored cross-validation score has not improved within 20 boosting rounds.
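The chronological split and the early-stopping rule described above can be sketched without any ML library. Both function names and the synthetic RMSE curve are assumptions for illustration; in practice the cross-validation scores would come from the XGB training itself, and the exact stopping convention of the library used may differ by one round.

```python
import numpy as np


def chronological_split(years, last_train_year=2016):
    """Split sample indices by year, without shuffling."""
    years = np.asarray(years)
    train_idx = np.flatnonzero(years <= last_train_year)
    test_idx = np.flatnonzero(years > last_train_year)
    return train_idx, test_idx


def best_round_with_early_stopping(cv_rmse, patience=20):
    """Return the 0-based round with the lowest RMSE seen before stopping.

    Scanning stops once the score has not improved for `patience` rounds.
    """
    best_round, best_score = 0, float("inf")
    for i, score in enumerate(cv_rmse):
        if score < best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            break
    return best_round


# Hypothetical window: 100 samples per year for 2011-2019.
years = np.repeat(np.arange(2011, 2020), 100)
train_idx, test_idx = chronological_split(years)

# Synthetic cross-validation RMSE curve with its minimum at round 50.
cv_rmse = [0.2 + 0.001 * abs(i - 50) for i in range(200)]
best_round = best_round_with_early_stopping(cv_rmse, patience=20)
print(len(train_idx), len(test_idx), best_round)
```

Because the split is by year, every test sample is later in time than every training sample, which is what prevents temporal autocorrelation from inflating the test score.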

2.3 Explaining the machine learning models

2.3.1 SHapley Additive exPlanation (SHAP) values

SHAP values were proposed by Lundberg and Lee (2017) on the basis of cooperative game theory to explain the outputs of ML models. The SHAP approach has been implemented with XGB in Python, and it has been reported that outputs from XGB models with various numbers of trees can be well explained by the SHAP framework in different subject areas (e.g. Padarian et al., 2020; Lundberg et al., 2018, 2020; Kim et al., 2021; Li et al., 2022). The contribution of a predictor value to a specific model prediction is calculated as the difference between the predictions of the model in the presence and absence of this particular predictor, averaged over all possible combinations of the other predictors. Since this is performed at a “local” level (i.e. for this specific instance's prediction), it allows for insights into how a certain model outcome is achieved, thereby complementing more traditional “global” (considering all instances) feature importance measures (e.g. partial dependence plots).

The base value in the context of SHAP values is what would be predicted in the absence of any feature information (Lundberg and Lee, 2017), and it is typically computed as the average of all predictions by ML models over the entire training data set. Positive (negative) SHAP values indicate that the specific feature value increases (decreases) the prediction compared to this base value. In other words, the base value serves as the reference point against which the contributions of individual features are measured. SHAP values for all features always sum up to the difference between the base value and the final model prediction so that SHAP values are additive and internally consistent. The base value could be analogous to the climatological CLF for a given geographical window, assuming no information about the input parameters is known. In this context, the SHAP values of input features indicate the extent to which knowing information about each feature value would deviate the prediction from the climatological CLF (base value).
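The additivity property can be checked on a toy example with a brute-force Shapley computation. This sketch is not the Tree SHAP algorithm used with XGB; it enumerates all feature subsets and fills in "absent" features from a small background sample, which is one common convention and an assumption here. The toy model and its coefficients are invented for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, background):
    """Return (Shapley values, base value) for one instance x."""
    n = len(x)

    def value(subset):
        # Mean prediction when only the features in `subset` are taken
        # from x and the rest from each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi, value(set())


def toy_model(features):
    """Invented CLF model with an lnNd-SST interaction (standardized inputs)."""
    ln_nd, sst = features
    return 0.5 + 0.1 * ln_nd - 0.05 * sst - 0.03 * ln_nd * sst


background = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
x = [1.0, 0.5]
phi, base_value = exact_shapley(toy_model, x, background)
print(phi, base_value, toy_model(x))
```

The two contributions in `phi` add up to `toy_model(x) - base_value`, mirroring the statement that SHAP values sum to the difference between the prediction and the base value.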

Furthermore, the influence of meteorology on the Nd–CLF relationship can be analysed using SHAP interaction values, which are an extension of SHAP values. They measure the difference between the SHAP values for a feature when another (secondary) feature is included versus when it is not included, offering a potential tool for insights into feature interactions captured by the tree ensembles. SHAP values have been applied to study atmospheric aerosols in the context of air pollution (Stirnberg et al., 2021) and have been used by Zipfel et al. (2022) to explore the satellite-observed Nd–LWP relationship in MBLCs in the southeast Atlantic, finding that meteorological variables have considerable influences on the Nd–LWP relationship using SHAP interaction values. Moreover, the use of SHAP interaction values in these studies allows for a more in-depth comprehension of the underlying processes with respect to local meteorology. SHAP values provide insights into the behaviour of the XGB models, and as with all statistical/ML models, they may not necessarily reflect real-world physical causality. Nevertheless, this state-of-the-art technique allows us to account for meteorological covariations when deriving sensitivities and to appraise to what extent the meteorological predictors interact with and influence the Nd–CLF relationship beyond traditional global-level feature attributions.

2.3.2 Quantification of sensitivities and interactive effects

Figure 1 is an exemplary graph for a regional XGB model at a specific 5° × 5° window (27.5–32.5° S, 122.5–127.5° W). SHAP values and SHAP interaction values are used to explain this XGB model and to quantify and isolate the CLF sensitivity to Nd and the interactive effects of meteorological factors (here sea-surface temperature, SST). Each dot in Fig. 1 represents an individual data instance (i.e. a single observation at a specific grid cell and time step) and shows how individual Nd or lnNd values impact the CLF prediction.

Plotting SHAP values of Nd against Nd values without the standardization process (Fig. 1a) for each data sample illustrates that increased Nd values lead to an increase in the predicted CLF, while the rate of the increase (dSHAP/dNd) drops with Nd as shown by the orange line. For each 20 cm−3 wide bin of Nd, dSHAP/dNd is calculated as the slope of the linear regression between Nd and Nd SHAP values. The non-linear positive association between Nd and predicted CLF aligns well with findings of prior studies (e.g. Gryspeerdt et al., 2016; Rosenfeld et al., 2019) that the aerosol impact on CLF saturates at relatively high aerosol loading. This relationship also resembles the one reported by Yuan et al. (2023), which is attributed to the precipitation suppression effect due to a relatively high Nd.
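The binned slope dSHAP/dNd can be sketched as one linear fit per 20 cm−3 wide Nd bin. The function name, the minimum number of samples per bin, and the synthetic saturating SHAP curve are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def binned_slopes(nd, shap_nd, bin_width=20.0, min_samples=10):
    """Return bin centres and the regression slope of SHAP vs. Nd per bin."""
    nd = np.asarray(nd, dtype=float)
    shap_nd = np.asarray(shap_nd, dtype=float)
    edges = np.arange(nd.min(), nd.max() + bin_width, bin_width)
    centres, slopes = [], []
    for left in edges[:-1]:
        in_bin = (nd >= left) & (nd < left + bin_width)
        if in_bin.sum() < min_samples:      # skip sparsely populated bins
            continue
        slope = np.polyfit(nd[in_bin], shap_nd[in_bin], 1)[0]
        centres.append(left + bin_width / 2.0)
        slopes.append(slope)
    return np.array(centres), np.array(slopes)


# Synthetic saturating relationship: SHAP contribution grows with ln(Nd).
nd = np.linspace(20.0, 300.0, 2801)         # cm^-3
shap_nd = 0.05 * np.log(nd / 100.0)         # illustrative values only
centres, slopes = binned_slopes(nd, shap_nd)
print(centres[:3], slopes[:3])
```

For this synthetic curve the slopes stay positive but shrink from bin to bin, which is the saturating behaviour the orange line in Fig. 1a is described as showing.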

Expressing the sensitivity logarithmically in Nd is ideal because cloud processes tend to respond to a relative change in Nd rather than an absolute one (Carslaw et al., 2013; Bellouin et al., 2020). Furthermore, the log-transformed Nd facilitates the application of simple linear regressions to capture the relationship between the contribution of Nd to the predicted CLF (Nd SHAP values) and its feature values. As depicted in Fig. 1b, the contribution of lnNd to the predicted CLF increases almost linearly with a rising lnNd. Thus, the CLF sensitivity to Nd is estimated as the slope of the linear regression between lnNd SHAP values and lnNd values (0.098 CLF σ−1). A similar method to estimate sensitivity has also been used by Li et al. (2022), where it is also suggested that this method can enhance the robustness of the sensitivity estimation because it can leverage the benefits of an XGB model, including bagging techniques and no need for distribution assumptions, along with the advantages of SHAP, which provides global interpretations consistent with local explanations (Lundberg et al., 2020; Molnar, 2022). It should be noted that the notably linear relationship in Fig. 1b does not hold across all geographical windows. Figure S1 displays additional exemplary windows where the relationships exhibit less linearity. Our approach also captures non-linearity in the system; in these cases, the linear regression helps decrease the convolved relationships as in Gryspeerdt et al. (2016). Note that unlike Nd (cm−3) in panel (a), lnNd and SST in (b) and (c) have been standardized, and thus sensitivities and interaction indices (IAIs) are expressed with the unit of cloud fraction change per standard deviation (CLF σ−1). Standardizing all predictors ensures that the results become comparable across all of them. We also present the SHAP dependence plots for the same example window in Fig. S2 where non-standardized lnNd and SST are used to plot panels (b) and (c).
The patterns are alike and only the magnitudes of the example sensitivity and IAI are different because they are then expressed on a physical scale rather than per standard deviation.
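The sensitivity estimate described above reduces to a single least-squares slope. The sketch below uses synthetic SHAP values constructed around a slope of 0.098 purely to mirror the example window; the function name and the noise level are assumptions.

```python
import numpy as np


def shap_sensitivity(feature_values, shap_values):
    """Slope of the linear regression of SHAP values on feature values."""
    slope, _intercept = np.polyfit(feature_values, shap_values, 1)
    return slope


# Synthetic standardized lnNd and matching SHAP values (illustrative only).
rng = np.random.default_rng(1)
ln_nd_std = rng.normal(size=2000)
shap_ln_nd = 0.098 * ln_nd_std + rng.normal(scale=0.02, size=2000)

sensitivity = shap_sensitivity(ln_nd_std, shap_ln_nd)
print(f"sensitivity = {sensitivity:.3f} CLF per standard deviation")
```

The scatter around the fitted line plays the role of the vertical dispersion in Fig. 1b; in the real models that dispersion is not random noise but carries the interaction with other predictors.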

The vertical dispersion around the lnNd–CLF relationship captured by the SHAP dependence plot is due to the dependence of the lnNd contribution to the predicted CLF on meteorological factors (e.g. SST) in the model, which is captured by SHAP interaction values, as displayed in Fig. 1c. The colouring of the data points by SST illustrates how interactions with SST split up the lnNd–CLF relationship, with low SST values amplifying the lnNd contribution and vice versa. To quantify this interaction effect, the meteorological data are then divided into a group of above-average feature values and a group of below-average feature values. A linear regression is fit to the lnNd values and the SHAP interaction values in each group. An interaction index (IAI) is derived from these regression fits and defined as the slope for the high-value group (> mean) with the slope for the low-value group (< mean) subtracted:

(2) $\mathrm{IAI} = \beta_{x,\mathrm{high}} - \beta_{x,\mathrm{low}},$

where β is the slope of the linear regression between SHAP interaction values and lnNd values and the subscripts denote the high-value group and the low-value group, respectively, for a specific meteorological variable x (SST in the example). At the exemplary geographical window, the influence of SST on the Nd–CLF sensitivity is quantified by IAI = −0.029 CLF σ−1 (Fig. 1c). Similar to sensitivities, the unit of IAIs is also CLF σ−1. Therefore, for a positive sensitivity such as the Nd–CLF sensitivity shown in Fig. 1b, a negative IAI value means that the Nd–CLF sensitivity is larger with low feature values, as shown in Fig. 1c (the positive relationship is weakened by high SST values). On the contrary, a positive IAI value corresponds to a larger positive sensitivity with high feature values.
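Equation (2) can be sketched as two regressions on the samples above and below the mean of the meteorological variable. The function name and the synthetic interaction values, built to give an IAI near −0.029, are assumptions for illustration; samples exactly equal to the mean fall into neither group here.

```python
import numpy as np


def interaction_index(ln_nd, interaction_values, met_values):
    """IAI = slope in the high-value group minus slope in the low-value group."""
    ln_nd = np.asarray(ln_nd, dtype=float)
    interaction_values = np.asarray(interaction_values, dtype=float)
    met_values = np.asarray(met_values, dtype=float)
    mean = met_values.mean()
    high = met_values > mean
    low = met_values < mean
    beta_high = np.polyfit(ln_nd[high], interaction_values[high], 1)[0]
    beta_low = np.polyfit(ln_nd[low], interaction_values[low], 1)[0]
    return beta_high - beta_low


# Synthetic standardized lnNd and SST; high SST weakens the lnNd contribution.
rng = np.random.default_rng(2)
ln_nd = rng.normal(size=4000)
sst = rng.normal(size=4000)
sign = np.where(sst > sst.mean(), 1.0, -1.0)
interaction = -0.0145 * sign * ln_nd + rng.normal(scale=0.005, size=4000)

iai = interaction_index(ln_nd, interaction, sst)
print(f"IAI = {iai:.3f} CLF per standard deviation")
```

A negative result, as in this synthetic case, corresponds to the situation in Fig. 1c where the positive lnNd contribution is stronger at low SST.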

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f01

Figure 1. SHAP dependence plots for the cloud-droplet number concentration (Nd) in the region from 27.5 to 32.5° S and from 122.5 to 127.5° W. (a) Dots show Nd SHAP values versus Nd values. The orange line shows the change rate of Nd SHAP values with respect to Nd (dSHAP/dNd) versus Nd values for each 20 cm−3 wide Nd bin. Panel (b) is similar to panel (a) but shows the relationship between lnNd SHAP values and lnNd with the corresponding sensitivity defined as the slope of the linear regression. Panel (c) shows SHAP interaction values coloured by sea-surface temperature (SST), showing the dependence of the lnNd–CLF relationship on the interactive effects of SST. The interaction values are further divided into two groups by the mean feature value of SST. Linear regressions are performed separately for the high-value group and the low-value group, and the interaction index (IAI) is defined as the slope for the high-value group minus the slope for the low-value group. The horizontal dashed lines are a demarcation between negative and positive SHAP (interaction) values. Note that Nd in (a) is not standardized, while lnNd and SST in (b) and (c) are standardized.


2.3.3 Limitations of observation-based machine learning of aerosol-cloud processes

In this section, limitations of this study are discussed. A fundamental limitation of our study is that causality cannot easily be asserted from the statistical relationships between aerosols/Nd and cloud fraction/properties. While causal inference approaches exist and have been applied in the field of aerosol–cloud interactions (Fons et al., 2023), we employ a more traditional approach of analysing statistical relationships of instantaneous observations (i.e. correlations). Unless such causal inference approaches are explicitly incorporated, studies utilizing statistical or ML models to explore observational aerosol–cloud processes contend with this common limitation. For instance, some studies assessed satellite-based statistical relationships between CLF and Nd (Christensen et al., 2016, 2017), between LWP and Nd (Michibata et al., 2016; Rosenfeld et al., 2019), and between Nd and other aerosol proxies (Gryspeerdt et al., 2017; McCoy et al., 2017a), all resting on statistically inferring sensitivities of cloud quantities to aerosol proxies (Forster et al., 2021). While we interpret the derived relationships with respect to the known physical relationships, uncertainties regarding the physical interpretation are mainly driven by two sources: uncertainties in the data and uncertainties from the methods.

  1. Data. Uncertainties exist for each satellite/reanalysis quantity but may be particularly large for Nd. For example, the subpixel effect can introduce more bias in the Nd retrieval process within broken-cloud regimes due to increased heterogeneity. The Nd retrieval biases are discussed in Sect. 2.1. Also, Nd and CLF observations are not fully independent, which may introduce a spurious positive correlation between the two variables. As such, we expect the physical relationship between Nd and CLF to be weaker than our estimate, so that the derived sensitivities represent an upper bound of the physical relationship.

    Another caveat of our data is that Nd values in our study are computed using MODIS level-3 large-scale mean re and τc values instead of joint histograms as in Gryspeerdt et al. (2016). This may introduce additional biases given the non-linearity of the Nd calculation. In future work, Nd data calculated from the underlying joint histograms, or the pre-filtered data of Gryspeerdt et al. (2022), could be used for comparison with the results of this study.

  2. Methods.

    a. The exact quantification of sensitivities depends on the choice of the statistical/machine learning model. While for (more linearly related) monthly data, Andersen et al. (2022) have shown that XGB, artificial neural networks, and linear models tend to lead to very similar results, this is not expected for more instantaneous data. Here, non-linear relationships are expected, and a more complex non-linear model is a more appropriate choice. XGB and other tree ensemble methods are a particularly popular choice because of their interpretability, high accuracy relative to their computational cost (Lundberg et al., 2020), and ability to model the interactions between predictors (Elith et al., 2008). They have been frequently used to study aerosols and clouds in the past (Fuchs et al., 2018; Dadashazar et al., 2021; Andersen et al., 2021; Y. Chen et al., 2022; Bender et al., 2024). In addition, the Tree SHAP algorithm, specifically tailored for tree-based models to compute exact Shapley values, further enhances their interpretability and has been applied in this field as well (Stirnberg et al., 2021; Zipfel et al., 2022).

    b. The quantification of sensitivities with SHAP values depends on methodological details. The choice of algorithm for estimating Shapley values is application-specific and comes down to a trade-off between being true to the data and being true to the model, which rely on observational and interventional conditional expectations respectively (Chen et al., 2020). The true-to-the-model approach is preferable when trying to understand how an ML model makes a prediction, and it requires assuming feature independence. In this study, we focus on potential mechanisms behind CLF sensitivities and therefore choose to respect the correlations among input features (true to the data) (Frye et al., 2021; Chen et al., 2022). Consequently, we incur the disadvantage of being true to the data: importance attributions of correlated features are entangled, so that, for example, a feature not explicitly used by the model for the prediction task might be assigned a non-zero contribution. In return, we avoid the drawback of being true to the model, namely unrealistic input instances (Sundararajan and Najmi, 2020; Linardatos et al., 2021; H. Chen et al., 2023). Despite the inherent trade-off, the SHAP approach has been employed in the true-to-the-data setting before (e.g. Stirnberg et al., 2021; Zipfel et al., 2022; Li et al., 2022).
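The trade-off between being true to the data and true to the model can be illustrated with a deliberately tiny example. The sketch below is not the Tree SHAP algorithm used in this study; it is a brute-force Shapley computation on a hypothetical two-feature data set in which the second feature is a perfect copy of the first and the hypothetical model ignores it. All names (`BACKGROUND`, `model`, `value_interventional`, `value_observational`) are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

# Toy background data: two perfectly correlated features (x2 == x1).
BACKGROUND = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]


def model(x):
    """Hypothetical model that uses only the first feature."""
    return x[0]


def value_interventional(x, subset):
    """Mean prediction with features in `subset` fixed to x and the others
    taken from the background, ignoring correlations (true to the model)."""
    preds = []
    for b in BACKGROUND:
        mixed = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        preds.append(model(mixed))
    return sum(preds) / len(preds)


def value_observational(x, subset):
    """Mean prediction over background rows that match x on `subset`,
    i.e. a conditional expectation (true to the data)."""
    rows = [b for b in BACKGROUND if all(b[i] == x[i] for i in subset)]
    return sum(model(b) for b in rows) / len(rows)


def shapley_values(x, value_fn):
    """Exact Shapley values by enumerating all feature coalitions."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                with_i = value_fn(x, set(coalition) | {i})
                without_i = value_fn(x, set(coalition))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


x = (3.0, 3.0)  # instance to explain (must be present in BACKGROUND)
phi_model = shapley_values(x, value_interventional)
phi_data = shapley_values(x, value_observational)
print("true to the model:", phi_model)
print("true to the data: ", phi_data)
```

In this toy case the interventional variant assigns the whole attribution to the feature the model actually uses, whereas the observational variant splits it evenly between the two correlated features. This mirrors the entangled attributions described in item b.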

The derived estimates of sensitivities and interactive effects in this paper should thus be interpreted with these limitations and uncertainties in mind.
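The averaging caveat mentioned under the data limitations (Nd computed from large-scale mean re and τc) can be made concrete with a short sketch. It assumes the commonly used adiabatic scaling Nd ∝ τc^0.5 re^−2.5 with the prefactor omitted; the exact retrieval used in this study is described in Sect. 2.1 and is not reproduced here. The pixel values and function names are hypothetical.

```python
import math


def nd_proxy(r_e, tau_c):
    """Relative droplet number under the assumed adiabatic scaling
    Nd ~ tau_c**0.5 * r_e**-2.5 (prefactor omitted, arbitrary units)."""
    return math.sqrt(tau_c) * r_e ** -2.5


def mean(values):
    return sum(values) / len(values)


def compare_averaging(r_e_pixels, tau_c_pixels):
    """Return (Nd from averaged inputs, average of per-pixel Nd)."""
    nd_from_means = nd_proxy(mean(r_e_pixels), mean(tau_c_pixels))
    mean_of_nd = mean([nd_proxy(r, t) for r, t in zip(r_e_pixels, tau_c_pixels)])
    return nd_from_means, mean_of_nd


# Hypothetical heterogeneous scene (r_e in micrometres, tau_c unitless).
hetero = compare_averaging([8.0, 16.0], [5.0, 20.0])
# Hypothetical homogeneous scene: both routes should agree.
homog = compare_averaging([12.0, 12.0], [10.0, 10.0])
print("heterogeneous:", hetero)
print("homogeneous:  ", homog)
```

For a homogeneous scene the two routes coincide, while for a heterogeneous scene they diverge because the relation is non-linear. The size and sign of the difference depend on the joint distribution of re and τc within the grid box.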

3 Results and discussion

3.1 Model performance

The skill of the region-specific XGB models in predicting CLF is evaluated by the coefficient of determination (R2) on the unseen hold-out test data. The global weighted mean R2 is 0.45, with a standard deviation of 0.10; that is, the models explain about 45 % of the CLF variability on weighted average and at most 73.57 %. While this means that, on average, about half of the variability in CLF cannot be explained by the machine learning models, this is expected, as previous studies have shown that the performance of statistical models decreases when going from monthly to daily data (Andersen et al., 2017; Fuchs et al., 2018; Dadashazar et al., 2021), and the performance is on par with that reported by Dadashazar et al. (2021), who used machine learning models to predict Nd with daily reanalysis data. The models in the tropical regions of the Indian Ocean and the western Pacific explain the variability in CLF relatively poorly, while the XGB models perform well in the subtropical stratocumulus regions near the continents and in the midlatitudes, particularly the Southern Hemispheric midlatitudes. The high skill in predicting CLF in the Southern Hemispheric midlatitudes contrasts with a recent study in which this region was found to be particularly difficult to model statistically with monthly data (Andersen et al., 2023). In this region, the day-to-day CLF variability is high due to the large influence of synoptic-scale weather systems, and hence data at daily resolution are better suited to representing the CLF variability there.
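As a minimal sketch of the evaluation described above, the following computes R2 on hold-out data for each region and combines the regional scores into a weighted mean. The weighting by the cosine of latitude is an assumption made here for illustration, and the regional data are hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def area_weight(lat_deg):
    """Assumed weight: cosine of the region's centre latitude."""
    return math.cos(math.radians(lat_deg))


# Hypothetical hold-out CLF values and predictions, keyed by centre latitude.
regions = {
    -30.0: ([0.2, 0.5, 0.8, 0.9], [0.3, 0.5, 0.7, 0.8]),
    -55.0: ([0.6, 0.7, 0.9, 1.0], [0.6, 0.8, 0.9, 0.9]),
}
lats = list(regions)
scores = [r2_score(*regions[lat]) for lat in lats]
global_r2 = weighted_mean(scores, [area_weight(lat) for lat in lats])
print(scores, global_r2)
```

An R2 of 1 indicates a perfect prediction, and a value of 0 corresponds to a model that does no better than predicting the mean of the test data.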

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f02

Figure 2. R2 score of the regional Extreme Gradient Boosting models predicting the cloud fraction of marine boundary-layer clouds in the independent test data set (2017–2019).

3.2 CLF sensitivity: global perspectives and regional characteristics

3.2.1 Global overview of CLF sensitivities

Figure 3 summarizes the means and distributions of the near-global sensitivities of CLF to all predictors. The sensitivities are estimated as described in Sect. 2.3.2. The sequence is sorted by descending mean values of the absolute sensitivities (i.e. by feature importance) of the predictor variables. A strong and consistently positive Nd–CLF sensitivity is found. The fact that CLF is the most sensitive to Nd is to be expected, as cloud observations from the same sensor are more directly related than a reanalysis product, so their overall magnitude should not be compared (Zipfel et al.2022). The entrainment of relatively dry air from the free troposphere into the MBL is impeded by a stronger inversion (i.e. higher EIS), resulting in a shallower, better-mixed, and more humid MBL conducive to stratocumulus clouds (Bretherton and Wyant1997; Wood and Hartmann2006; Qu et al.2015a; Myers et al.2021). The salient positive sensitivity to EIS is in accordance with the links found in previous studies (e.g. Klein and Hartmann1993; Qu et al.2015b; Andersen et al.2017), suggesting that EIS is a crucial controlling factor for low marine cloud cover. Note that in some studies, the strength of the inversion over the boundary layer is measured by lower tropospheric stability, which can be regarded as a similar metric outperformed by EIS (Wood and Bretherton2006). Precipitation fraction is the fraction of the original ERA5 grid box covered by large-scale precipitation. The strong positive CLF sensitivity to precipitation fraction is likely caused by the ML model learning that precipitation can be viewed as a proxy for cloudiness rather than being an indicator of the physical processes via which precipitation exerts controls on the macrophysics of MBLCs. Humidity shows positive CLF sensitivities greater at 850 hPa, where cloud tops are often located (Gryspeerdt and Stier2012), than at 700 hPa, which is typically in the free troposphere above the MBLCs (Myers and Norris2013). 
Likewise, the atmospheric temperature at 850 hPa (t850) presents a stronger CLF sensitivity than the temperature at 700 hPa (t700). For winds, by contrast, the 700 hPa pressure level is more relevant than the 850 hPa level. A relatively pronounced negative sensitivity to the eastward wind component at 700 hPa (u700) seems to indicate that cloud cover is reduced when westerlies at this level strengthen. CLF exhibits negative sensitivities to vertical pressure velocities at both 850 and 700 hPa, showing that large-scale ascending motion is connected to increases in MBLCs (Myers and Norris, 2013; Bretherton et al., 2013; Blossey et al., 2013). In general, the global averages of CLF sensitivity to the dynamical predictors (i.e. 3D winds at the surface and at pressure levels) vary in sign and are weaker. A marked negative sensitivity of CLF to SST is found, in agreement with many prior studies (e.g. Qu et al., 2015b; Scott et al., 2020), in which increases in SST have been found to lead to low-cloud breakup and dissipation through a number of processes described in, for example, Scott et al. (2020). One of these is that the associated enhancement of mean surface latent heat flux (LHF) deepens the MBL and facilitates buoyancy and thus the entrainment of dry free-tropospheric air (Rieck et al., 2012; Andersen et al., 2022). However, CLF is much less sensitive to LHF than to SST, which may indicate that this mechanism is less important at the spatial scale and timescale considered in this study. CLF exhibits a considerable negative sensitivity to mean surface sensible heat flux (SHF); because upward SHF is negative, this corresponds to an increase in CLF with increasing upward SHF.
While increased SHF can promote the transition from decks of stratus or stratocumulus clouds (high CLF) to more convective clouds (low CLF) due to the deepening of the boundary layer (Fan et al.2016), potentially leading to a positive SHF–CLF relationship, increased SHF is associated with situations of cold air advection where turbulent surface fluxes are enhanced, which could lead to marked increases in CLF (Miyamoto et al.2018; Zelinka et al.2018; Grise and Kelleher2021).
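The sensitivity and ranking used in this overview can be sketched in a few lines. The sketch assumes that each regional sensitivity is the least-squares slope of a predictor's SHAP values against its standardized feature values (as defined for lnNd in the caption of Fig. 1) and that predictors are ranked by the mean of the absolute regional sensitivities. All numbers and names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def rank_by_mean_abs(sensitivities):
    """Sort predictor names by descending mean absolute regional sensitivity."""
    def mean_abs(name):
        vals = sensitivities[name]
        return sum(abs(v) for v in vals) / len(vals)
    return sorted(sensitivities, key=mean_abs, reverse=True)


# Hypothetical standardized EIS values and their SHAP values in one region.
eis_std = [-1.0, 0.0, 1.0, 2.0]
eis_shap = [-0.05, 0.0, 0.05, 0.10]
eis_sensitivity = ols_slope(eis_std, eis_shap)  # in CLF per sigma
print(eis_sensitivity)

# Hypothetical regional sensitivities for three predictors.
regional = {
    "u700": [-0.01, 0.02, -0.01],
    "EIS": [0.05, 0.08, 0.03],
    "SST": [-0.06, -0.02, -0.04],
}
ranking = rank_by_mean_abs(regional)
print(ranking)
```

Ranking by the absolute value keeps predictors with consistently negative sensitivities, such as SST, from being ranked below weak predictors whose regional sensitivities merely happen to be positive.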

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f03

Figure 3. The distribution of the sensitivities of the cloud fraction to all predictors listed in Table 1. Boxes represent the interquartile range, which is extended by whiskers to up to 1.5 interquartile ranges, with outliers shown as points outside the range. The solid line and white dot in each box show the median and mean values of the sensitivities respectively. Predictors are sorted by the mean of the absolute sensitivity values. The dashed line across the figure separates positive and negative sensitivity values.


3.2.2 Spatial patterns of the CLF sensitivity to Nd

The sensitivity of the MBLC fraction to the aerosol proxy, Nd, is ubiquitously positive, in accordance with the global correlations or sensitivities found in, for example, Gryspeerdt et al. (2016) and Andersen et al. (2017). This is presumably due to the lifetime effect but could also partially result from the Nd retrieval biases discussed in Sect. 2.1. The global weighted mean value of the Nd–CLF sensitivity is 0.074 CLF σ−1, with a standard deviation of 0.036 CLF σ−1. The relationship between CLF and Nd is found to be particularly strong in the regions of frequent stratocumulus-to-cumulus transition off the western continental coasts. These marked positive Nd–CLF sensitivities may be caused by high Nd delaying the transition from stratocumulus to cumulus clouds (Gryspeerdt et al., 2016; Christensen et al., 2020). However, as this cloud regime transition involves clouds shifting from more overcast to more broken, the strong relationships in these regions may be more affected by Nd retrieval errors. The Nd–CLF sensitivity is also pronounced in the Southern Hemispheric midlatitudes, where stratiform clouds dominate. The Nd–CLF sensitivity is weak and close to zero in the tropics, in particular in the deep convective warm-pool region. These spatial patterns of Nd–CLF sensitivity resemble those found by Gryspeerdt et al. (2016), in particular those in which the aerosol optical depth–CLF relationship is mediated by Nd, but are more pronounced in the Southern Hemispheric midlatitudes. This difference in estimated sensitivity is noteworthy and should be investigated in future work. As Nd retrievals tend to be biased low at lower CLF and biased high at higher CLF, the Nd–CLF sensitivity may be overestimated and, at the scales considered here, should be interpreted as an upper bound to the physical Nd–CLF sensitivity.
The global weighted average of the CLF–lnNd sensitivity without standardization is 0.112 (unitless), and its spatial pattern is shown in Fig. S4. This value is higher than the upper bound of 0.1 reported by Bellouin et al. (2020), which is based on global climate models and large-eddy simulations. This may be partly due to the aforementioned bias. However, it is important to note that our non-standardized CLF–Nd sensitivity, shown in Fig. 1a, closely mirrors that from Yuan et al. (2023), with a similar range. In addition, the high lnCLF–lnNd values estimated in Y. Chen et al. (2022) and Chen et al. (2024) suggest that values exceeding the upper bound of 0.1 might be plausible. These recent observational studies, which quantify the cloud fraction adjustment based on ship tracks (Yuan et al., 2023) and volcanic aerosol perturbations (Y. Chen et al., 2022; Chen et al., 2024), together with our SHAP approach using global satellite observations, indicate that the 0.1 upper bound may need to be extended. In future work, estimating a radiative forcing from the SHAP-based sensitivities will make our study more comparable with other research on cloud fraction adjustment.
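The link between the standardized sensitivity (in CLF σ−1) and the non-standardized CLF–lnNd sensitivity within a single region can be written out explicitly. The relation below assumes that standardization is a z-score using the regional mean μ and standard deviation σ of lnNd. The symbols x̃, φ, S_std and S_raw are introduced here for illustration only.

```latex
\[
\tilde{x} = \frac{\ln N_\mathrm{d} - \mu}{\sigma}, \qquad
S_\mathrm{std} = \frac{\partial \phi}{\partial \tilde{x}}, \qquad
S_\mathrm{raw} = \frac{\partial \phi}{\partial \ln N_\mathrm{d}}
  = \frac{\partial \phi}{\partial \tilde{x}}\,
    \frac{\partial \tilde{x}}{\partial \ln N_\mathrm{d}}
  = \frac{S_\mathrm{std}}{\sigma}
\]
```

Here φ denotes the SHAP value of lnNd. Because σ would differ between regions under this assumption, the two global weighted means (0.074 CLF σ−1 and 0.112) are not related by a single conversion factor.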

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f04

Figure 4. Sensitivity of the marine boundary-layer cloud fraction to lnNd.

3.2.3 Spatial patterns of the CLF sensitivity to thermodynamical drivers

There is a strong consensus that EIS and SST are two important determinants of the cloud fraction of marine boundary-layer clouds and their corresponding radiative effects across different geographical regions and on varying timescales (e.g. Bretherton, 2015; Myers and Norris, 2015; McCoy et al., 2017b; Wall et al., 2017). Stronger inversions capping the MBL (i.e. higher EIS) hamper the entrainment of dry air from the free troposphere aloft and thus lead to a shallower MBL with more moisture trapped within it, promoting the development and maintenance of low-level clouds (Andersen et al., 2017). The regional EIS–CLF sensitivity patterns (Fig. 5a) show that low marine cloud fraction increases ubiquitously in response to stronger EIS, in particular in the tropical and subtropical stratocumulus-capped regions and within the midlatitudes. The sensitivity pattern is in good agreement with that found by Scott et al. (2020) and Andersen et al. (2023) and with related studies at different timescales (Grise and Medeiros, 2016; Kelleher and Grise, 2019; de Szoeke et al., 2016).

MBLC cover decreases globally in response to increased SST, most markedly in the stratocumulus regions over the eastern oceanic basins (Fig. 5b), in good agreement with Scott et al. (2020). Higher SST can favour MBLC dissipation by increasing surface latent heat fluxes and deepening the MBL, which facilitates dry-air entrainment and eventually desiccates the MBL and clouds (Rieck et al., 2012; Qu et al., 2015b). Yet, as stated in Sect. 3.2.1, the weak CLF sensitivity to LHF relative to the strong sensitivity to SST may imply that another process makes a more substantial contribution: the stronger moisture gradient between the free troposphere and the MBL arising from increased SST makes the entrained air more efficient in evaporating cloud water (van der Dussen et al., 2015; Qu et al., 2015b). This process has been shown to be the driving mechanism for the observed reduction in marine low cloud cover near the coast of Baja California (Andersen et al., 2022).

Figure 5c shows that low marine cloud fraction increases with negative (upward) SHF most markedly in the stratocumulus regions. CLF can increase in response to increased surface fluxes in situations of cold advection (Zelinka et al.2018). Over the south Indian Ocean, a marked SHF–CLF sensitivity is also found. Here, enhancements of SHF due to the subtropical anticyclone and midlatitude storm-track activity have been found to increase CLF (Miyamoto et al.2018). The results may be a hint that the increase in CLF presumably due to increased SHF (e.g. due to cold advection) outweighs the influence of SHF on CLF by controlling the transition from marine stratocumulus to open-cellular marine clouds (Kazil et al.2014; Fan et al.2016) in the core stratocumulus regions. Consequently, the SHF–CLF sensitivity is less pronounced in regions of frequent closed- to open-cell and cumulus transitions. Relative humidity at 850 hPa (RH850) is positively related to marine low liquid cloud fraction across the globe. The positive sensitivity is particularly strong in the trade cumulus regions, where the 850 hPa level is representative of the boundary layer. In the coastal stratocumulus regions, clouds are frequently below this level (Adebiyi and Zuidema2016), so that clouds are not as sensitive to variability in RH at that level.

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f05

Figure 5. Sensitivity of the marine boundary-layer cloud fraction to the estimated inversion strength (EIS), sea-surface temperature (SST), sensible heat flux (SHF), and relative humidity at 850 hPa (RH850). Note that the colour-bar range for SHF and RH850 (−0.075 to 0.075) is narrower than that for EIS and SST (−0.15 to 0.15).

3.2.4 Spatial patterns of the CLF sensitivity to dynamical drivers

Large-scale circulations and dynamical conditions play an essential role in controlling cloud fraction and the indirect effects of aerosols (Su et al.2010; Small et al.2011). The large-scale dynamics are represented by the horizontal and vertical winds at 700 and 850 hPa, which display clear and distinct regional patterns of CLF sensitivity (Fig. 6). It can also be seen that at the considered scales and pressure levels, horizontal wind vectors have stronger CLF sensitivities than large-scale vertical motion. There is a coherent pattern of negative CLF sensitivity to the zonal wind at 700 hPa in the stratocumulus-dominated regions (also apparent at 850 hPa), and the Southern Hemispheric midlatitudes, indicating a decrease in MBLCs with westerly anomalies at this pressure level. Recently, a study using monthly data has also found a similar sensitivity pattern of stratocumulus clouds to zonal wind at 700 hPa, finding that the reduced CLF is related to increased vertical wind shear (as the boundary-layer flow is easterly), leading to increased turbulence and dry-air entrainment (Andersen et al.2023). However, using monthly data, Andersen et al. (2023) did not find a similar CLF sensitivity to zonal winds in the Southern Hemispheric midlatitudes. As the CLF sensitivity to u700 in the Southern Hemispheric midlatitudes is only apparent using daily data and only at 700 hPa, it seems likely that it is related to synoptic variability that drives day-to-day variability in MBLCs in this region (Kelleher and Grise2019). Positive CLF sensitivities to u700 (higher CLF with westerly anomalies) and, to a lesser degree, u850 are found off the eastern Asian and North American continents. CLF increases due to cold-air outbreaks in NW Atlantic and NW Pacific may be the reason for these positive sensitivities. 
Cold-air outbreaks occur during winter as cold continental air moves over warmer SSTs, increasing moisture and heat fluxes into the MBL so that the formation of MBLCs is favoured (Young et al.2002). This leads to wintertime maxima in CLF in these regions (Yuan and Oreopoulos2013).

The sensitivity of CLF to the meridional winds at 700 hPa exhibits two bands straddling the subtropical regions between about 15 and 35° in both hemispheres but opposite in sign (positive in the Northern Hemisphere and negative in the Southern Hemisphere), illustrating that in these regions, the poleward winds are associated with an increase in low cloud fraction. The bands are still apparent at 850 hPa, while the negative band in the Southern Hemisphere extends northward to tropical areas. These hemispheric sensitivity bands to the v wind component at 700 hPa closely resemble those found in Andersen et al. (2023), with their analysis suggesting that the poleward winds on the eastern side of midlatitude cyclones may be related to warm and moist advection, increasing CLF. However, they also find a strong correlation of these free-tropospheric poleward winds with large-scale ascending air motion making the assertion of causality difficult. Poleward winds are also found to decrease CLF over the Southern Hemispheric midlatitudes.

CLF is negatively connected to the vertical pressure velocity at both 700 and 850 hPa (ω700 and ω850) over the entire Earth, indicating that ascending large-scale air motion enhances the cover of MBLCs globally. It is shown in the bottom of Fig. 6 column (a) that the CLF sensitivity to ω700 is larger in the midlatitude ocean basins, whereas the CLF sensitivity to ω850 is larger in the subtropical oceans, where subsidence is climatologically prevalent (Myers and Norris2015, 2016; Scott et al.2020). This seems indicative of CLF being the most sensitive to large-scale ascending motion at the typical altitude of the clouds. It is interesting to note that between 30° N and 30° S, no marked CLF sensitivity to ω700 is found, contrasting the finding of enhanced subsidence at this level reducing MBLCs by Myers and Norris (2013). This effect is likely better described in the ω850 data, which is more related to the altitude of the cloud top.

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f06

Figure 6. Sensitivity of cloud fraction to the horizontal wind components u and v and to the vertical velocities at 700 hPa (column a) and 850 hPa (column b). Note that the range of the colour bars (−0.04 to 0.04) is in general smaller than in Fig. 5.

3.3 Dependence of Nd–CLF relationship on meteorology

3.3.1 Global overview of the interaction indices

In this section, we use the IAI as defined in Sect. 2.3.2 to quantify how the response of the MBLC fraction attributed to the aerosol proxy Nd varies with the meteorological factors. As discussed in Sect. 2.3.2, since the sensitivity to Nd is positive across the globe (Fig. 4), a positive IAI can be interpreted as an amplification of the Nd–CLF sensitivity at high (above-average) feature values of a meteorological variable, whereas a negative IAI signifies an amplification of the sensitivity at low feature values.
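The interaction index can be sketched following the definition given in the caption of Fig. 1: the SHAP interaction values are split at the mean of the meteorological feature, a linear regression against lnNd is fitted in each group, and the low-group slope is subtracted from the high-group slope. The data below are hypothetical, and assigning values exactly equal to the mean to the low group is an assumption of this sketch.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def interaction_index(ln_nd, interaction_shap, modifier):
    """Slope for above-mean modifier values minus slope for the rest."""
    threshold = sum(modifier) / len(modifier)
    high = [(a, s) for a, s, m in zip(ln_nd, interaction_shap, modifier) if m > threshold]
    low = [(a, s) for a, s, m in zip(ln_nd, interaction_shap, modifier) if m <= threshold]
    slope_high = ols_slope([a for a, _ in high], [s for _, s in high])
    slope_low = ols_slope([a for a, _ in low], [s for _, s in low])
    return slope_high - slope_low


# Hypothetical standardized lnNd and SST values with SHAP interaction values.
ln_nd_std = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
sst_std = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
shap_inter = [-0.02, 0.0, 0.02, 0.01, 0.0, -0.01]

iai = interaction_index(ln_nd_std, shap_inter, sst_std)
print(iai)
```

In this constructed case the slope is positive in the high-SST group and negative in the low-SST group, so the index is positive, which corresponds to an amplification of the Nd–CLF sensitivity at above-average SST.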

In Fig. 7, analogous to Fig. 3, the features along the x axis are arranged in descending order of their averaged absolute IAIs, that is, by the strength of the impact of each meteorological feature on the Nd–CLF sensitivity. Similar to the feature importance summarized in Fig. 3, EIS, SST, RH850, and SHF have relatively strong interaction effects and can thus be regarded as critical controlling factors not only for marine low cloud cover but also for its response to changes in Nd (and, by extension, aerosols). Compared to their rank among the CLF sensitivities, the IAIs associated with atmospheric temperatures at 700 and 850 hPa have greater strengths. Furthermore, the vertical and horizontal winds at the surface and at different pressure levels are generally ranked lower. In general, the thermodynamical factors seem to have a stronger influence on the Nd–CLF sensitivity than the dynamical factors.

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f07

Figure 7. Similar to Fig. 3 but for the interaction effect of Nd with all environmental parameters, quantified by the interaction index (CLF σ−1).


3.3.2 Spatial patterns of the interaction indices

Coherent and distinct spatial distributions of the impact of selected meteorological parameters on the Nd–CLF relationship can be observed. Hereafter, we show the regional characteristics of the interaction effects of EIS and SST, which are the two most important meteorological factors for CLF in MBLCs and have the greatest absolute strengths of IAI. EIS exerts the most noticeable positive IAIs over the midlatitude oceanic areas (Fig. 8a), reflecting that stronger temperature inversions capping the MBL over these regions may amplify the positive Nd–CLF relationship. The interpretation of possible underlying physical mechanisms of these interaction effects is difficult and remains speculative. The results seem to suggest that in these regions, potentially through hampering the entrainment of drier air from the free troposphere, the stronger inversion and more stable conditions are capable of trapping more moisture within a shallower MBL and could thus weaken the evaporation–entrainment feedback. As a result, it may ultimately favour a more positive Nd–CLF relationship (Chen et al.2014; Christensen et al.2020). It is interesting to note that these interactions are not apparent in the stratocumulus regions, where EIS is a strong control of CLF, and in the stratocumulus-to-cumulus transition regions, where Christensen et al. (2020) found the aerosol effect on this transition to be confined to stable atmospheric conditions. This may imply that the suggested entrainment effect is dependent on the EIS and stronger at slightly lower EIS values typically found in the midlatitudes (Scott et al.2020). The observed impact of EIS on the Nd–CLF relationship found in the midlatitudes may also have implications within the context of climate change. While in the subtropics global climate models predict an increase in EIS with a warming climate, in the midlatitudes EIS is predicted to decrease (Myers et al.2021), potentially decreasing the sensitivity of CLF to Nd there.

Figure 8b shows that higher SSTs are found to amplify the positive Nd–CLF relationship (positive IAI) in the regions of frequent stratocumulus-to-cumulus transition (Cesana and Del Genio2021). The physical interpretation could be the following: here, higher SSTs tend to lead to the transition from stratocumulus clouds to shallow convective clouds (Cesana et al.2019); however, this transition has been found to be delayed when aerosol is increased (Goren et al.2019; Christensen et al.2020). Tentatively, the positive IAIs in these transition regions may thus point to increased control of Nd on CLF at higher SST values as these are the situations where transitions typically occur and when increased Nd can act to delay this transition. In these regions, higher SSTs in the future might thus increase the sensitivity of MBLC CLF to aerosols. It should be noted that the quantification of the dependence of the Nd–CLF relationship on meteorological factors (EIS, SST discussed in this section) is also likely subject to the biases in the Nd–CLF sensitivity caused by the Nd retrieval biases as a function of CLF. This would potentially contribute to the non-causal facets of the relationships and interactive effects quantified by SHAP values.

https://acp.copernicus.org/articles/24/13025/2024/acp-24-13025-2024-f08

Figure 8. Patterns of the interaction index showing the dependence of the Nd–CLF relationship on estimated inversion strength (EIS) (a) and sea-surface temperature (SST) (b).

4 Conclusions

In this study, 9 years (2011–2019) of daily satellite and reanalysis data have been analysed to better understand the effect of Nd on CLF in MBLCs and its dependence on meteorological factors. We have established a near-global machine learning framework to predict the cloud fraction of marine boundary-layer clouds using regionally specific XGB regression models. With many confounding and influencing factors included together, the explainable machine learning technique of SHAP regression values has been used to explain the regional XGB models; to quantify the CLF sensitivity to all cloud controlling factors, with a specific focus on Nd; and, moreover, to quantify the meteorological influence on the Nd–CLF relationship at a global scale. The statistical sensitivities and interactive effects are interpreted with the guidance of hypothesized causal pathways and the state-of-the-art physical understanding of the system. The main findings of this study, which should be interpreted in light of the data and methodology limitations discussed in Sect. 2.3.3, are summarized as follows:

  1. The marine boundary-layer cloud fraction shows a notable positive sensitivity to Nd (a surrogate for aerosols) in the regions of stratocumulus-to-cumulus transition, which may arise from the high Nd delaying this transition. The Nd–CLF sensitivity in the Southern Hemispheric midlatitudes is observed to be higher than in previous studies, which should be investigated in future work. The estimated Nd–CLF sensitivity and its magnitude suggest that aerosols likely have a considerable impact on MBL cloudiness although this may partially result from an overestimation caused by the effect of a positive retrieval bias of Nd at high CLF.

  2. Consistent with the literature, our statistical method shows that EIS and SST are two important determinants for low marine clouds by regulating surface fluxes and dry-air entrainment processes. In addition, strong negative CLF sensitivity and spatial patterns for SHF are also found, suggesting that the effect of cold air advection might surpass the SHF enhancement of closed-to-open-cell and cumulus transitions. Dynamic drivers (meridional and zonal winds) indicate that midlatitude synoptic-scale disturbances and vertical wind shear seemingly make considerable contributions to marine low cloud amounts.

  3. In general, thermodynamical parameters exert a more important influence on the Nd–CLF relationship than dynamical parameters. EIS, RH850, SST, and temperatures at 700 and 850 hPa have the strongest effect on the Nd–CLF sensitivity. In the midlatitudes, higher EIS is found to amplify the positive Nd–CLF sensitivity, which may be related to a reduced entrainment feedback in these conditions, whereas higher SST is found to amplify the Nd–CLF sensitivity in stratocumulus-to-cumulus transition regions, which is potentially because the transition induced by higher SSTs may be delayed by increased Nd. These findings have potential implications for possible future changes in the sensitivity of CLF to aerosols.

  4. For the dynamical and thermodynamical factors shown here, both CLF sensitivities and interactive effects (dependence of Nd–CLF relationship on meteorology) exhibit distinct regional patterns. These coherent spatial patterns indicate that the proposed explainable machine learning framework not only is capable of skilfully predicting CLF for marine low clouds but also has the potential to capture regional characteristics of the relation between CLF and Nd as well as meteorological influences.

In the future, the observation-based sensitivities and interactive effects quantified by the ML framework here will be compared to those in Earth system models (ESMs), which has the potential to support the evaluation of ESM parameterizations related to ACI and may even help gain insights into how the models could be tuned in this respect. In addition, incorporating causal approaches for SHAP, such as those proposed by Heskes et al. (2020) and Frye et al. (2021), would help to test to what extent the observed statistical relationships and interaction effects represent physical processes.

Code availability

Code is available from the corresponding author upon reasonable request.

Data availability

All data sets used in this study are publicly available. The MODIS data set (https://doi.org/10.5067/MODIS/MOD08_D3.061, Platnick et al., 2015) was acquired from the Level-1 and Atmosphere Archive and Distribution System (LAADS) Distributed Active Archive Center (DAAC) (NASA: MODIS Data Collection, https://ladsweb.modaps.eosdis.nasa.gov/search/, last access: 17 November 2024); the hourly reanalysis data at single levels (https://doi.org/10.24381/cds.adbb2d47, Hersbach et al., 2023a) and pressure levels (https://doi.org/10.24381/cds.bd0915c6, Hersbach et al., 2023b) were obtained from the Copernicus Climate Change Service (C3S) Climate Data Store.

Supplement

The supplement related to this article is available online at: https://doi.org/10.5194/acp-24-13025-2024-supplement.

Author contributions

HA and JC designed the initial research idea. YJ, HA, and JC developed the study concept and methodology. YJ and HA obtained and analysed the data sets. YJ implemented the explainable machine learning framework, performed the visualization, and wrote the original draft. All authors contributed to interpreting the results and reviewing and improving the paper.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

The (co-)authors have received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 821205 (FORCeS) and the Deutsche Forschungsgemeinschaft (DFG) as part of the project Constraining Aerosol-Low cloud InteractionS with multi-target MAchine learning (CALISMA; project no. 440521482). We thank three anonymous reviewers whose helpful comments contributed to improving the manuscript.

Financial support

This research has been supported by Horizon 2020 (grant no. 821205) and the Deutsche Forschungsgemeinschaft (grant no. 440521482).

The article processing charges for this open-access publication were covered by the Karlsruhe Institute of Technology (KIT).

Review statement

This paper was edited by Yuan Wang and reviewed by three anonymous referees.

References

Ackerman, A. S., Kirkpatrick, M. P., Stevens, D. E., and Toon, O. B.: The impact of humidity above stratiform clouds on indirect aerosol climate forcing, Nature, 432, 1014–1017, https://doi.org/10.1038/nature03174, 2004. a

Adebiyi, A. A. and Zuidema, P.: The role of the southern African easterly jet in modifying the southeast Atlantic aerosol and cloud environments, Q. J. Roy. Meteorol. Soc., 142, 1574–1589, https://doi.org/10.1002/qj.2765, 2016. a

Albrecht, B. A.: Aerosols, cloud microphysics, and fractional cloudiness, Science, 245, 1227–1230, https://doi.org/10.1126/science.245.4923.1227, 1989. a

Andersen, H. and Cermak, J.: How thermodynamic environments control stratocumulus microphysics and interactions with aerosols, Environ. Res. Lett., 10, 024004, https://doi.org/10.1088/1748-9326/10/2/024004, 2015. a

Andersen, H., Cermak, J., Fuchs, J., and Schwarz, K.: Global observations of cloud-sensitive aerosol loadings in low-level marine clouds, J. Geophys. Res., 121, 936–12, https://doi.org/10.1002/2016JD025614, 2016. a

Andersen, H., Cermak, J., Fuchs, J., Knutti, R., and Lohmann, U.: Understanding the drivers of marine liquid-water cloud occurrence and properties with global observations using neural networks, Atmos. Chem. Phys., 17, 9535–9546, https://doi.org/10.5194/acp-17-9535-2017, 2017. a, b, c, d, e, f

Andersen, H., Cermak, J., Stirnberg, R., Fuchs, J., Kim, M., and Pauli, E.: Assessment of COVID-19 effects on satellite-observed aerosol loading over China with machine learning, Tellus B, 73, 1–14, https://doi.org/10.1080/16000889.2021.1971925, 2021. a

Andersen, H., Cermak, J., Zipfel, L., and Myers, T. A.: Attribution of Observed Recent Decrease in Low Clouds Over the Northeastern Pacific to Cloud‐Controlling Factors, Geophys. Res. Lett., 49, 1–10, https://doi.org/10.1029/2021gl096498, 2022. a, b, c, d

Andersen, H., Cermak, J., Douglas, A., Myers, T. A., Nowack, P., Stier, P., Wall, C. J., and Wilson Kemsley, S.: Sensitivities of cloud radiative effects to large-scale meteorology and aerosols from global observations, Atmos. Chem. Phys., 23, 10775–10794, https://doi.org/10.5194/acp-23-10775-2023, 2023. a, b, c, d, e, f

Arola, A., Lipponen, A., Kolmonen, P., Virtanen, T. H., Bellouin, N., Grosvenor, D. P., Gryspeerdt, E., Quaas, J., and Kokkola, H.: Aerosol effects on clouds are concealed by natural cloud heterogeneity and satellite retrieval errors, Nat. Commun., 13, 7357, https://doi.org/10.1038/s41467-022-34948-5, 2022. a

Bellouin, N., Quaas, J., Gryspeerdt, E., Kinne, S., Stier, P., Watson-Parris, D., Boucher, O., Carslaw, K. S., Christensen, M., Daniau, A. L., Dufresne, J. L., Feingold, G., Fiedler, S., Forster, P., Gettelman, A., Haywood, J. M., Lohmann, U., Malavelle, F., Mauritsen, T., McCoy, D. T., Myhre, G., Mülmenstädt, J., Neubauer, D., Possner, A., Rugenstein, M., Sato, Y., Schulz, M., Schwartz, S. E., Sourdeval, O., Storelvmo, T., Toll, V., Winker, D., and Stevens, B.: Bounding Global Aerosol Radiative Forcing of Climate Change, Rev. Geophys., 58, 1–45, https://doi.org/10.1029/2019RG000660, 2020. a, b, c

Bender, F. A., Frey, L., McCoy, D. T., Grosvenor, D. P., and Mohrmann, J. K.: Assessment of aerosol–cloud–radiation correlations in satellite observations, climate models and reanalysis, Clim. Dynam., 52, 4371–4392, https://doi.org/10.1007/s00382-018-4384-z, 2019. a

Bender, F. A., Lord, T., Staffansdotter, A., Jung, V., and Undorf, S.: Machine Learning Approach to Investigating the Relative Importance of Meteorological and Aerosol-Related Parameters in Determining Cloud Microphysical Properties, Tellus B, 76, 1–18, https://doi.org/10.16993/tellusb.1868, 2024. a

Bennartz, R. and Rausch, J.: Global and regional estimates of warm cloud droplet number concentration based on 13 years of AQUA-MODIS observations, Atmos. Chem. Phys., 17, 9815–9836, https://doi.org/10.5194/acp-17-9815-2017, 2017. a

Beucler, T., Ebert-Uphoff, I., Rasp, S., Pritchard, M., and Gentine, P.: Machine Learning for Clouds and Climate, in: Clouds and Their Climatic Impacts, Geophysical Monograph Series, 325–345, ISBN 9781119700357, https://doi.org/10.1002/9781119700357.ch16, 2023. a, b

Blossey, P. N., Bretherton, C. S., Zhang, M., Cheng, A., Endo, S., Heus, T., Liu, Y., Lock, A. P., de Roode, S. R., and Xu, K.-M.: Marine low cloud sensitivity to an idealized climate change: The CGILS LES intercomparison, J. Adv. Model. Earth Sy., 5, 234–258, https://doi.org/10.1002/jame.20025, 2013. a

Bretherton, C. S.: Insights into low-latitude cloud feedbacks from high-resolution models, Philos. T. R. Soc. A, 373, 20140415, https://doi.org/10.1098/rsta.2014.0415, 2015. a

Bretherton, C. S. and Wyant, M. C.: Moisture Transport, Lower-Tropospheric Stability, and Decoupling of Cloud-Topped Boundary Layers, J. Atmos. Sci., 54, 148–167, https://doi.org/10.1175/1520-0469(1997)054<0148:MTLTSA>2.0.CO;2, 1997. a

Bretherton, C. S., Blossey, P. N., and Uchida, J.: Cloud droplet sedimentation, entrainment efficiency, and subtropical stratocumulus albedo, Geophys. Res. Lett., 34, L03813, https://doi.org/10.1029/2006GL027648, 2007. a

Bretherton, C. S., Blossey, P. N., and Jones, C. R.: Mechanisms of marine low cloud sensitivity to idealized climate perturbations: A single-LES exploration extending the CGILS cases, J. Adv. Model. Earth Sy., 5, 316–337, https://doi.org/10.1002/jame.20019, 2013. a

Carslaw, K. S., Lee, L. A., Reddington, C. L., Pringle, K. J., Rap, A., Forster, P. M., Mann, G. W., Spracklen, D. V., Woodhouse, M. T., Regayre, L. A., and Pierce, J. R.: Large contribution of natural aerosols to uncertainty in indirect forcing, Nature, 503, 67–71, https://doi.org/10.1038/nature12674, 2013. a

Ceppi, P. and Nowack, P.: Observational evidence that cloud feedback amplifies global warming, P. Natl. Acad. Sci. USA, 118, e2026290118, https://doi.org/10.1073/pnas.2026290118, 2021. a

Cesana, G., Del Genio, A. D., Ackerman, A. S., Kelley, M., Elsaesser, G., Fridlind, A. M., Cheng, Y., and Yao, M.-S.: Evaluating models' response of tropical low clouds to SST forcings using CALIPSO observations, Atmos. Chem. Phys., 19, 2813–2832, https://doi.org/10.5194/acp-19-2813-2019, 2019. a

Cesana, G. V. and Del Genio, A. D.: Observational constraint on cloud feedbacks suggests moderate climate sensitivity, Nat. Clim. Change, 11, 213–218, https://doi.org/10.1038/s41558-020-00970-y, 2021. a

Chen, H., Janizek, J. D., Lundberg, S., and Lee, S.-I.: True to the Model or True to the Data?, ArXiv, abs/2006.16234, https://arxiv.org/abs/2006.16234 (last access: 17 November 2024), 2020. a

Chen, H., Lundberg, S. M., and Lee, S.-I.: Explaining a series of models by propagating Shapley values, Nat. Commun., 13, 4512, https://doi.org/10.1038/s41467-022-31384-3, 2022. a

Chen, H., Covert, I. C., Lundberg, S. M., and Lee, S.-I.: Algorithms to estimate Shapley value feature attributions, Nat. Mach. Intell., 5, 590–601, https://doi.org/10.1038/s42256-023-00657-x, 2023. a

Chen, T. and Guestrin, C.: XGBoost: A scalable tree boosting system, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'16: The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining San Francisco California USA, 13–17 August, Association for Computing Machinery, New York, NY, United States, 785–794, https://doi.org/10.1145/2939672.2939785, 2016. a, b

Chen, Y., Haywood, J., Wang, Y., Malavelle, F., Jordan, G., Partridge, D., Fieldsend, J., De Leeuw, J., Schmidt, A., Cho, N., Oreopoulos, L., Platnick, S., Grosvenor, D., Field, P., and Lohmann, U.: Machine learning reveals climate forcing from aerosols is dominated by increased cloud cover, Nat. Geosci., 15, 609–614, https://doi.org/10.1038/s41561-022-00991-6, 2022. a, b, c, d

Chen, Y., Haywood, J., Wang, Y., Malavelle, F., Jordan, G., Peace, A., Partridge, D. G., Cho, N., Oreopoulos, L., Grosvenor, D., Field, P., Allan, R. P., and Lohmann, U.: Substantial cooling effect from aerosol-induced increase in tropical marine cloud cover, Nat. Geosci., 17, 404–410, https://doi.org/10.1038/s41561-024-01427-z, 2024. a, b

Chen, Y. C., Christensen, M. W., Stephens, G. L., and Seinfeld, J. H.: Satellite-based estimate of global aerosol-cloud radiative forcing by marine warm clouds, Nat. Geosci., 7, 643–646, https://doi.org/10.1038/ngeo2214, 2014. a, b, c

Christensen, M. W., Chen, Y.-C., and Stephens, G. L.: Aerosol indirect effect dictated by liquid clouds, J. Geophys. Res.-Atmos., 121, 614–636, https://doi.org/10.1002/2016JD025245, 2016. a

Christensen, M. W., Neubauer, D., Poulsen, C. A., Thomas, G. E., McGarragh, G. R., Povey, A. C., Proud, S. R., and Grainger, R. G.: Unveiling aerosol–cloud interactions – Part 1: Cloud contamination in satellite products enhances the aerosol indirect forcing estimate, Atmos. Chem. Phys., 17, 13151–13164, https://doi.org/10.5194/acp-17-13151-2017, 2017. a, b, c

Christensen, M. W., Jones, W. K., and Stier, P.: Aerosols enhance cloud lifetime and brightness along the stratus-to-cumulus transition, P. Natl. Acad. Sci. USA, 117, 17591–17598, https://doi.org/10.1073/pnas.1921231117, 2020. a, b, c, d, e

Dadashazar, H., Painemal, D., Alipanah, M., Brunke, M., Chellappan, S., Corral, A. F., Crosbie, E., Kirschler, S., Liu, H., Moore, R. H., Robinson, C., Scarino, A. J., Shook, M., Sinclair, K., Thornhill, K. L., Voigt, C., Wang, H., Winstead, E., Zeng, X., Ziemba, L., Zuidema, P., and Sorooshian, A.: Cloud drop number concentrations over the western North Atlantic Ocean: seasonal cycle, aerosol interrelationships, and other influential factors, Atmos. Chem. Phys., 21, 10499–10526, https://doi.org/10.5194/acp-21-10499-2021, 2021. a, b, c, d

de Szoeke, S. P., Verlinden, K. L., Yuter, S. E., and Mechem, D. B.: The Time Scales of Variability of Marine Low Clouds, J. Clim., 29, 6463–6481, https://doi.org/10.1175/JCLI-D-15-0460.1, 2016. a

Dey, S., Di Girolamo, L., Zhao, G., Jones, A. L., and McFarquhar, G. M.: Satellite-observed relationships between aerosol and trade-wind cumulus cloud properties over the Indian Ocean, Geophys. Res. Lett., 38, L01804, https://doi.org/10.1029/2010GL045588, 2011. a

Douglas, A. R. and L'Ecuyer, T.: Possible evidence of increased global cloudiness due to aerosol-cloud interactions, Atmos. Chem. Phys. Discuss. [preprint], https://doi.org/10.5194/acp-2022-688, 2022. a, b

Elith, J., Leathwick, J. R., and Hastie, T.: A working guide to boosted regression trees, J. Anim. Ecol., 77, 802–813, https://doi.org/10.1111/j.1365-2656.2008.01390.x, 2008. a

Fan, J., Wang, Y., Rosenfeld, D., and Liu, X.: Review of aerosol-cloud interactions: Mechanisms, significance, and challenges, J. Atmos. Sci., 73, 4221–4252, https://doi.org/10.1175/JAS-D-16-0037.1, 2016. a, b, c

Fons, E., Runge, J., Neubauer, D., and Lohmann, U.: Stratocumulus adjustments to aerosol perturbations disentangled with a causal approach, npj Clim. Atmos. Sci., 6, 1–10, https://doi.org/10.1038/s41612-023-00452-w, 2023. a

Forster, P. M., Storelvmo, T., Armour, K., Collins, W., Dufresne, J. L., Frame, D., Lunt, D. J., Mauritsen, T., Palmer, M. D., Watanabe, M., Wild, M., and Zhang, H.: Chapter 7: The Earth’s Energy Budget, Climate Feedbacks, and Climate Sensitivity, in: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, edited by Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S. L., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M. I., Huang, M., Leitzell, K., Lonnoy, E., Matthews, J., Maycock, T. K., Waterfield, T., Yelekçi, O., Yu, R., and Zhou, B., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 923–1054, https://doi.org/10.1017/9781009157896.009, 2021. a, b

Frye, C., Rowat, C., and Feige, I.: Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability, Adv. Neural Informat. Process. Syst., 33, 1229–1239, https://doi.org/10.48550/arXiv.1910.06358, 2021. a, b

Fuchs, J., Cermak, J., and Andersen, H.: Building a cloud in the southeast Atlantic: Understanding low-cloud controls based on satellite observations with machine learning, Atmos. Chem. Phys., 18, 16537–16552, https://doi.org/10.5194/acp-18-16537-2018, 2018. a, b, c, d

Ghan, S., Wang, M., Zhang, S., Ferrachat, S., Gettelman, A., Griesfeller, J., Kipling, Z., Lohmann, U., Morrison, H., Neubauer, D., Partridge, D. G., Stier, P., Takemura, T., Wang, H., and Zhang, K.: Challenges in constraining anthropogenic aerosol effects on cloud radiative forcing using present-day spatiotemporal variability, P. Natl. Acad. Sci. USA, 113, 5804–5811, https://doi.org/10.1073/PNAS.1514036113, 2016. a

Goren, T., Kazil, J., Hoffmann, F., Yamaguchi, T., and Feingold, G.: Anthropogenic Air Pollution Delays Marine Stratocumulus Breakup to Open Cells, Geophys. Res. Lett., 46, 14135–14144, https://doi.org/10.1029/2019GL085412, 2019. a

Grandey, B. S. and Stier, P.: A critical look at spatial scale choices in satellite-based aerosol indirect effect studies, Atmos. Chem. Phys., 10, 11459–11470, https://doi.org/10.5194/acp-10-11459-2010, 2010. a

Grise, K. M. and Kelleher, M. K.: Midlatitude Cloud Radiative Effect Sensitivity to Cloud Controlling Factors in Observations and Models: Relationship with Southern Hemisphere Jet Shifts and Climate Sensitivity, J. Clim., 34, 5869–5886, https://doi.org/10.1175/JCLI-D-20-0986.1, 2021. a, b

Grise, K. M. and Medeiros, B.: Understanding the Varied Influence of Midlatitude Jet Position on Clouds and Cloud Radiative Effects in Observations and Global Climate Models, J. Clim., 29, 9005–9025, https://doi.org/10.1175/JCLI-D-16-0295.1, 2016. a

Grosvenor, D. P., Sourdeval, O., Zuidema, P., Ackerman, A., Alexandrov, M. D., Bennartz, R., Boers, R., Cairns, B., Chiu, J. C., Christensen, M., Deneke, H., Diamond, M., Feingold, G., Fridlind, A., Hünerbein, A., Knist, C., Kollias, P., Marshak, A., McCoy, D., Merk, D., Painemal, D., Rausch, J., Rosenfeld, D., Russchenberg, H., Seifert, P., Sinclair, K., Stier, P., van Diedenhoven, B., Wendisch, M., Werner, F., Wood, R., Zhang, Z., and Quaas, J.: Remote Sensing of Droplet Number Concentration in Warm Clouds: A Review of the Current State of Knowledge and Perspectives, Rev. Geophys., 56, 409–453, https://doi.org/10.1029/2017RG000593, 2018. a, b, c, d

Gryspeerdt, E. and Stier, P.: Regime-based analysis of aerosol-cloud interactions, Geophys. Res. Lett., 39, 1–5, https://doi.org/10.1029/2012GL053221, 2012. a

Gryspeerdt, E., Quaas, J., and Bellouin, N.: Constraining the aerosol influence on cloud fraction, J. Geophys. Res., 121, 3566–3583, https://doi.org/10.1002/2015JD023744, 2016. a, b, c, d, e, f, g, h

Gryspeerdt, E., Quaas, J., Ferrachat, S., Gettelman, A., Ghan, S., Lohmann, U., Morrison, H., Neubauer, D., Partridge, D. G., Stier, P., Takemura, T., Wang, H., Wang, M., and Zhang, K.: Constraining the instantaneous aerosol influence on cloud albedo, P. Natl. Acad. Sci. USA, 114, 4899–4904, https://doi.org/10.1073/pnas.1617765114, 2017. a

Gryspeerdt, E., Goren, T., Sourdeval, O., Quaas, J., Mülmenstädt, J., Dipu, S., Unglaub, C., Gettelman, A., and Christensen, M.: Constraining the aerosol influence on cloud liquid water path, Atmos. Chem. Phys., 19, 5331–5347, https://doi.org/10.5194/acp-19-5331-2019, 2019. a

Gryspeerdt, E., McCoy, D. T., Crosbie, E., Moore, R. H., Nott, G. J., Painemal, D., Small-Griswold, J., Sorooshian, A., and Ziemba, L.: The impact of sampling strategy on the cloud droplet number concentration estimated from satellite data, Atmos. Meas. Tech., 15, 3875–3892, https://doi.org/10.5194/amt-15-3875-2022, 2022. a, b

Hamby, D. M.: A review of techniques for parameter sensitivity analysis of environmental models, Environ. Monit. Assess., 32, 135–154, https://doi.org/10.1007/BF00547132, 1994. a

Hartmann, D. L., Ockert-Bell, M. E., and Michelsen, M. L.: The Effect of Cloud Type on Earth's Energy Balance: Global Analysis, J. Clim., 5, 1281–1304, https://doi.org/10.1175/1520-0442(1992)005<1281:TEOCTO>2.0.CO;2, 1992. a

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N.: The ERA5 global reanalysis, Q. J. Roy. Meteorol. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020. a

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 hourly data on single levels from 1940 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], https://doi.org/10.24381/cds.adbb2d47, 2023a. a

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 hourly data on pressure levels from 1940 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], https://doi.org/10.24381/cds.bd0915c6, 2023b. a

Heskes, T., Sijben, E., Bucur, I. G., and Claassen, T.: Causal Shapley Values: Exploiting Causal Knowledge to Explain Individual Predictions of Complex Models, Adv. Neural Informat. Process. Syst., 33, 4778–4789, https://doi.org/10.48550/arXiv.2011.01625, 2020. a

Jiang, H., Xue, H., Teller, A., Feingold, G., and Levin, Z.: Aerosol effects on the lifetime of shallow cumulus, Geophys. Res. Lett., 33, L14806, https://doi.org/10.1029/2006GL026024, 2006. a

Kapoor, S., Cantrell, E., Peng, K., Pham, T. H., Bail, C. A., Gundersen, O. E., Hofman, J. M., Hullman, J., Lones, M. A., Malik, M. M., Nanayakkara, P., Poldrack, R. A., Raji, I. D., Roberts, M., Salganik, M. J., Serra-Garcia, M., Stewart, B. M., Vandewiele, G., and Narayanan, A.: REFORMS: Reporting Standards for Machine Learning Based Science, ArXiv, https://arxiv.org/abs/2308.07832, 2023. a

Karpatne, A., Ebert-Uphoff, I., Ravela, S., Babaie, H. A., and Kumar, V.: Machine Learning for the Geosciences: Challenges and Opportunities, ArXiv, https://arxiv.org/abs/1711.04708, 2017. a

Kaufman, Y. J. and Koren, I.: Smoke and Pollution Aerosol Effect on Cloud Cover, Science, 313, 655–658, https://doi.org/10.1126/science.1126232, 2006. a

Kazil, J., Feingold, G., Wang, H., and Yamaguchi, T.: On the interaction between marine boundary layer cellular cloudiness and surface heat fluxes, Atmos. Chem. Phys., 14, 61–79, https://doi.org/10.5194/acp-14-61-2014, 2014. a

Kelleher, M. K. and Grise, K. M.: Examining Southern Ocean Cloud Controlling Factors on Daily Time Scales and Their Connections to Midlatitude Weather Systems, J. Clim., 32, 5145–5160, https://doi.org/10.1175/JCLI-D-18-0840.1, 2019. a, b

Kim, M., Brunner, D., and Kuhlmann, G.: Importance of satellite observations for high-resolution mapping of near-surface NO2 by machine learning, Remote Sens. Environ., 264, 112573, https://doi.org/10.1016/j.rse.2021.112573, 2021. a

Klein, S. A. and Hartmann, D. L.: The Seasonal Cycle of Low Stratiform Clouds, J. Clim., 6, 1587–1606, https://doi.org/10.1175/1520-0442(1993)006<1587:TSCOLS>2.0.CO;2, 1993. a

Leahy, L. V., Wood, R., Charlson, R. J., Hostetler, C. A., Rogers, R. R., Vaughan, M. A., and Winker, D. M.: On the nature and extent of optically thin marine low clouds, J. Geophys. Res.-Atmos., 117, D22201, https://doi.org/10.1029/2012JD017929, 2012. a

Li, W., Migliavacca, M., Forkel, M., Denissen, J. M. C., Reichstein, M., Yang, H., Duveiller, G., Weber, U., and Orth, R.: Widespread increasing vegetation sensitivity to soil moisture, Nat. Commun., 13, 3959, https://doi.org/10.1038/s41467-022-31667-9, 2022. a, b, c

Linardatos, P., Papastefanopoulos, V., and Kotsiantis, S.: Explainable AI: A Review of Machine Learning Interpretability Methods, Entropy, 23, 18, https://doi.org/10.3390/e23010018, 2021. a

Liu, Y., Lin, T., Zhang, J., Wang, F., Huang, Y., Wu, X., Ye, H., Zhang, G., Cao, X., and de Leeuw, G.: Opposite effects of aerosols and meteorological parameters on warm clouds in two contrasting regions over eastern China, Atmos. Chem. Phys., 24, 4651–4673, https://doi.org/10.5194/acp-24-4651-2024, 2024. a

Loeb, N. G. and Schuster, G. L.: An observational study of the relationship between cloud, aerosol and meteorology in broken low-level cloud conditions, J. Geophys. Res.-Atmos., 113, D14214, https://doi.org/10.1029/2007JD009763, 2008. a

Lundberg, S. M. and Lee, S. I.: A unified approach to interpreting model predictions, Adv. Neur. In., abs/1705.07874, https://doi.org/10.48550/arXiv.1705.07874, 2017. a, b

Lundberg, S. M., Erion, G. G., and Lee, S.-I.: Consistent Individualized Feature Attribution for Tree Ensembles, ArXiv, http://arxiv.org/abs/1802.03888, 2018. a

Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., and Lee, S.-I.: From local explanations to global understanding with explainable AI for trees, Nat. Mach. Intell., 2, 56–67, https://doi.org/10.1038/s42256-019-0138-9, 2020. a, b, c

McCoy, D. T., Bender, F. A.-M., Mohrmann, J. K. C., Hartmann, D. L., Wood, R., and Grosvenor, D. P.: The global aerosol-cloud first indirect effect estimated using MODIS, MERRA, and AeroCom, J. Geophys. Res.-Atmos., 122, 1779–1796, https://doi.org/10.1002/2016JD026141, 2017a. a

McCoy, D. T., Eastman, R., Hartmann, D. L., and Wood, R.: The Change in Low Cloud Cover in a Warmed Climate Inferred from AIRS, MODIS, and ERA-Interim, J. Clim., 30, 3609–3620, https://doi.org/10.1175/JCLI-D-15-0734.1, 2017b. a

Merk, D., Deneke, H., Pospichal, B., and Seifert, P.: Investigation of the adiabatic assumption for estimating cloud micro- and macrophysical properties from satellite and ground observations, Atmos. Chem. Phys., 16, 933–952, https://doi.org/10.5194/acp-16-933-2016, 2016. a

Michibata, T., Suzuki, K., Sato, Y., and Takemura, T.: The source of discrepancies in aerosol–cloud–precipitation interactions between GCM and A-Train retrievals, Atmos. Chem. Phys., 16, 15413–15424, https://doi.org/10.5194/acp-16-15413-2016, 2016. a

Miyamoto, A., Nakamura, H., and Miyasaka, T.: Influence of the Subtropical High and Storm Track on Low-Cloud Fraction and Its Seasonality over the South Indian Ocean, J. Clim., 31, 4017–4039, https://doi.org/10.1175/JCLI-D-17-0229.1, 2018. a, b

Molnar, C.: Interpretable Machine Learning, 2nd Edn., https://christophm.github.io/interpretable-ml-book (last access: 17 November 2024), 2022. a

Myers, T. A. and Norris, J. R.: Observational Evidence That Enhanced Subsidence Reduces Subtropical Marine Boundary Layer Cloudiness, J. Clim., 26, 7507–7524, https://doi.org/10.1175/JCLI-D-12-00736.1, 2013. a, b, c

Myers, T. A. and Norris, J. R.: On the Relationships between Subtropical Clouds and Meteorology in Observations and CMIP3 and CMIP5 Models, J. Clim., 28, 2945–2967, https://doi.org/10.1175/JCLI-D-14-00475.1, 2015. a, b

Myers, T. A. and Norris, J. R.: Reducing the uncertainty in subtropical cloud feedback, Geophys. Res. Lett., 43, 2144–2148, https://doi.org/10.1002/2015GL067416, 2016. a

Myers, T. A., Scott, R. C., Zelinka, M. D., Klein, S. A., Norris, J. R., and Caldwell, P. M.: Observational constraints on low cloud feedback reduce uncertainty of climate sensitivity, Nat. Clim. Change, 11, 501–507, https://doi.org/10.1038/s41558-021-01039-0, 2021. a, b

Padarian, J., McBratney, A. B., and Minasny, B.: Game theory interpretation of digital soil mapping convolutional neural networks, Soil, 6, 389–397, https://doi.org/10.5194/soil-6-389-2020, 2020. a

Platnick, S. and Twomey, S.: Determining the Susceptibility of Cloud Albedo to Changes in Droplet Concentration with the Advanced Very High Resolution Radiometer, J. Appl. Meteorol. Climatol., 33, 334–347, https://doi.org/10.1175/1520-0450(1994)033<0334:DTSOCA>2.0.CO;2, 1994. a

Platnick, S., King, M., and Hubanks, P.: MODIS Atmosphere L3 Daily Product, NASA MODIS Adaptive Processing System, Goddard Space Flight Center [data set], https://doi.org/10.5067/MODIS/MOD08_D3.061, 2015. a

Qu, X., Hall, A., Klein, S. A., and Caldwell, P. M.: The strength of the tropical inversion and its response to climate change in 18 CMIP5 models, Clim. Dynam., 45, 375–396, https://doi.org/10.1007/s00382-014-2441-9, 2015a. a

Qu, X., Hall, A., Klein, S. A., and Deangelis, A. M.: Positive tropical marine low-cloud cover feedback inferred from cloud-controlling factors, Geophys. Res. Lett., 42, 7767–7775, https://doi.org/10.1002/2015GL065627, 2015b. a, b, c, d

Quaas, J., Boucher, O., and Lohmann, U.: Constraining the total aerosol indirect effect in the LMDZ and ECHAM4 GCMs using MODIS satellite data, Atmos. Chem. Phys., 6, 947–955, https://doi.org/10.5194/acp-6-947-2006, 2006. a, b

Rieck, M., Nuijens, L., and Stevens, B.: Marine Boundary Layer Cloud Feedbacks in a Constant Relative Humidity Atmosphere, J. Atmos. Sci., 69, 2538–2550, https://doi.org/10.1175/JAS-D-11-0203.1, 2012. a, b

Rosenfeld, D., Zhu, Y., Wang, M., Zheng, Y., Goren, T., and Yu, S.: Aerosol-driven droplet concentrations dominate coverage and water of oceanic low-level clouds, Science, 363, 6427, https://doi.org/10.1126/science.aav0566, 2019. a, b, c

Sato, Y., Goto, D., Michibata, T., Suzuki, K., Takemura, T., Tomita, H., and Nakajima, T.: Aerosol effects on cloud water amounts were successfully simulated by a global cloud-system resolving model, Nat. Commun., 9, 1–7, https://doi.org/10.1038/s41467-018-03379-6, 2018. a

Schwarz, K., Cermak, J., Fuchs, J., and Andersen, H.: Mapping the Twilight Zone – What We Are Missing between Clouds and Aerosols, Remote Sens., 9, 577, https://doi.org/10.3390/rs9060577, 2017. a

Scott, R. C., Myers, T. A., Norris, J. R., Zelinka, M. D., Klein, S. A., Sun, M., and Doelling, D. R.: Observed Sensitivity of Low-Cloud Radiative Effects to Meteorological Perturbations over the Global Oceans, J. Clim., 33, 7717–7734, https://doi.org/10.1175/JCLI-D-19-1028.1, 2020. a, b, c, d, e, f, g

Seifert, A., Heus, T., Pincus, R., and Stevens, B.: Large-eddy simulation of the transient and near-equilibrium behavior of precipitating shallow convection, J. Adv. Model. Earth Sy., 7, 1918–1937, https://doi.org/10.1002/2015MS000489, 2015. a

Seinfeld, J. H., Bretherton, C., Carslaw, K. S., Coe, H., DeMott, P. J., Dunlea, E. J., Feingold, G., Ghan, S., Guenther, A. B., Kahn, R., Kraucunas, I., Kreidenweis, S. M., Molina, M. J., Nenes, A., Penner, J. E., Prather, K. A., Ramanathan, V., Ramaswamy, V., Rasch, P. J., Ravishankara, A. R., Rosenfeld, D., Stephens, G., and Wood, R.: Improving our fundamental understanding of the role of aerosol-cloud interactions in the climate system, P. Natl. Acad. Sci. USA, 113, 5781–5790, https://doi.org/10.1073/pnas.1514043113, 2016. a

Small, J. D., Chuang, P. Y., Feingold, G., and Jiang, H.: Can aerosol decrease cloud lifetime?, Geophys. Res. Lett., 36, 1–5, https://doi.org/10.1029/2009GL038888, 2009. a

Small, J. D., Jiang, J. H., Su, H., and Zhai, C.: Relationship between aerosol and cloud fraction over Australia, Geophys. Res. Lett., 38, L23802, https://doi.org/10.1029/2011GL049404, 2011. a, b

Snoek, J., Larochelle, H., and Adams, R. P.: Practical Bayesian Optimization of Machine Learning Algorithms, in: Advances in Neural Information Processing Systems, edited by Pereira, F., Burges, C. J., Bottou, L., and Weinberger, K. Q., vol. 25, Curran Associates, Inc., 9 pp., https://arxiv.org/abs/1206.2944 (last access: 17 November 2024), 2012. a

Stirnberg, R., Cermak, J., Kotthaus, S., Haeffelin, M., Andersen, H., Fuchs, J., Kim, M., Petit, J. E., and Favez, O.: Meteorology-driven variability of air pollution (PM1) revealed with explainable machine learning, Atmos. Chem. Phys., 21, 3919–3948, https://doi.org/10.5194/acp-21-3919-2021, 2021. a, b, c

Su, W., Loeb, N. G., Xu, K.-M., Schuster, G. L., and Eitzen, Z. A.: An estimate of aerosol indirect effect from satellite measurements with concurrent meteorological analysis, J. Geophys. Res.-Atmos., 115, D18219, https://doi.org/10.1029/2010JD013948, 2010. a, b

Sundararajan, M. and Najmi, A.: The many Shapley values for model explanation, in: 37th International Conference on Machine Learning, ICML 2020, PartF16814, 9210–9220, ISBN 9781713821120, 2020. a

Toll, V., Christensen, M., Quaas, J., and Bellouin, N.: Weak average liquid-cloud-water response to anthropogenic aerosols, Nature, 572, 51–55, https://doi.org/10.1038/s41586-019-1423-9, 2019. a

Turner, D. D.: Improved ground-based liquid water path retrievals using a combined infrared and microwave approach, J. Geophys. Res.-Atmos., 112, D15204, https://doi.org/10.1029/2007JD008530, 2007. a

Twomey, S.: The Influence of Pollution on the Shortwave Albedo of Clouds, J. Atmos. Sci., 34, 1149–1152, https://doi.org/10.1175/1520-0469(1977)034<1149:tiopot>2.0.co;2, 1977. a

van der Dussen, J. J., de Roode, S. R., Dal Gesso, S., and Siebesma, A. P.: An LES model study of the influence of the free tropospheric thermodynamic conditions on the stratocumulus response to a climate perturbation, J. Adv. Model. Earth Sy., 7, 670–691, https://doi.org/10.1002/2014MS000380, 2015. a

Wall, C. J., Hartmann, D. L., and Ma, P.-L.: Instantaneous Linkages between Clouds and Large-Scale Meteorology over the Southern Ocean in Observations and a Climate Model, J. Clim., 30, 9455–9474, https://doi.org/10.1175/JCLI-D-17-0156.1, 2017. a

Wang, S., Wang, Q., and Feingold, G.: Turbulence, Condensation, and Liquid Water Transport in Numerically Simulated Nonprecipitating Stratocumulus Clouds, J. Atmos. Sci., 60, 262–278, https://doi.org/10.1175/1520-0469(2003)060<0262:TCALWT>2.0.CO;2, 2003. a

Wood, R.: Stratocumulus Clouds, Mon. Weather Rev., 140, 2373–2423, https://doi.org/10.1175/MWR-D-11-00121.1, 2012. a

Wood, R. and Bretherton, C. S.: On the relationship between stratiform low cloud cover and lower-tropospheric stability, J. Clim., 19, 6425–6432, https://doi.org/10.1175/JCLI3988.1, 2006.

Wood, R. and Hartmann, D. L.: Spatial Variability of Liquid Water Path in Marine Low Cloud: The Importance of Mesoscale Cellular Convection, J. Clim., 19, 1748–1764, https://doi.org/10.1175/JCLI3702.1, 2006.

Xue, H. and Feingold, G.: Large-Eddy Simulations of Trade Wind Cumuli: Investigation of Aerosol Indirect Effects, J. Atmos. Sci., 63, 1605–1622, https://doi.org/10.1175/JAS3706.1, 2006.

Young, G. S., Kristovich, D. A. R., Hjelmfelt, M. R., and Foster, R. C.: Rolls, Streets, Waves, And More: A Review of Quasi-Two-Dimensional Structures in the Atmospheric Boundary Layer, Bull. Am. Meteorol. Soc., 83, 997–1002, https://doi.org/10.1175/1520-0477(2002)083<0997:RSWAMA>2.3.CO;2, 2002.

Yuan, T. and Oreopoulos, L.: On the global character of overlap between low and high clouds, Geophys. Res. Lett., 40, 5320–5326, https://doi.org/10.1002/grl.50871, 2013.

Yuan, T., Remer, L. A., and Yu, H.: Microphysical, macrophysical and radiative signatures of volcanic aerosols in trade wind cumulus observed by the A-Train, Atmos. Chem. Phys., 11, 7119–7132, https://doi.org/10.5194/acp-11-7119-2011, 2011.

Yuan, T., Song, H., Wood, R., Oreopoulos, L., Platnick, S., Wang, C., Yu, H., Meyer, K., and Wilcox, E.: Observational evidence of strong forcing from aerosol effect on low cloud coverage, Sci. Adv., 9, eadh7716, https://doi.org/10.1126/sciadv.adh7716, 2023.

Zelinka, M. D., Andrews, T., Forster, P. M., and Taylor, K. E.: Quantifying components of aerosol-cloud-radiation interactions in climate models, J. Geophys. Res.-Atmos., 119, 7599–7615, https://doi.org/10.1002/2014JD021710, 2014.

Zelinka, M. D., Grise, K. M., Klein, S. A., Zhou, C., DeAngelis, A. M., and Christensen, M. W.: Drivers of the Low-Cloud Response to Poleward Jet Shifts in the North Pacific in Observations and Models, J. Clim., 31, 7925–7947, https://doi.org/10.1175/JCLI-D-18-0114.1, 2018.

Zhang, Z. and Platnick, S.: An assessment of differences between cloud effective particle radius retrievals for marine water clouds from three MODIS spectral bands, J. Geophys. Res.-Atmos., 116, D20215, https://doi.org/10.1029/2011JD016216, 2011.

Zhang, Z., Ackerman, A. S., Feingold, G., Platnick, S., Pincus, R., and Xue, H.: Effects of cloud horizontal inhomogeneity and drizzle on remote sensing of cloud droplet effective radius: Case studies based on large-eddy simulations, J. Geophys. Res.-Atmos., 117, D19208, https://doi.org/10.1029/2012JD017655, 2012.

Zheng, G., Wang, Y., Wood, R., Jensen, M. P., Kuang, C., McCoy, I. L., Matthews, A., Mei, F., Tomlinson, J. M., Shilling, J. E., Zawadowicz, M. A., Crosbie, E., Moore, R., Ziemba, L., Andreae, M. O., and Wang, J.: New particle formation in the remote marine boundary layer, Nat. Commun., 12, 527, https://doi.org/10.1038/s41467-020-20773-1, 2021.

Zhu, Y., Rosenfeld, D., and Li, Z.: Under What Conditions Can We Trust Retrieved Cloud Drop Concentrations in Broken Marine Stratocumulus?, J. Geophys. Res.-Atmos., 123, 8754–8767, https://doi.org/10.1029/2017JD028083, 2018.

Zipfel, L., Andersen, H., and Cermak, J.: Machine-Learning Based Analysis of Liquid Water Path Adjustments to Aerosol Perturbations in Marine Boundary Layer Clouds Using Satellite Observations, Atmosphere, 13, 586, https://doi.org/10.3390/atmos13040586, 2022.

Short summary
We present a near-global observation-based explainable machine learning framework to quantify the response of cloud fraction (CLF) of marine low clouds to cloud droplet number concentration (Nd), accounting for the covariations with meteorological factors. This approach provides a novel data-driven method to analyse the CLF adjustment by assessing the CLF sensitivity to Nd and numerous meteorological factors as well as the dependence of the Nd–CLF sensitivity on the meteorological conditions.