Cloud drop number concentrations over the western North Atlantic Ocean: seasonal cycle, aerosol interrelationships, and other influential factors

Cloud drop number concentrations (Nd) over the western North Atlantic Ocean (WNAO) are generally highest during the winter (DJF) and lowest in summer (JJA), in contrast to aerosol proxy variables (aerosol optical depth, aerosol index, surface aerosol mass concentrations, surface cloud condensation nuclei (CCN) concentrations) that generally peak in spring (MAM) and JJA with minima in DJF. Using aircraft, satellite remote sensing, ground-based in situ measurement data, and reanalysis data, we characterize factors explaining the divergent seasonal cycles and furthermore probe into factors influencing Nd on seasonal timescales. The results can be summarized well by features most pronounced in DJF, including features associated with cold-air outbreak (CAO) conditions such as enhanced values of CAO index, planetary boundary layer height (PBLH), low-level liquid cloud fraction, and cloud-top height, in addition to winds aligned with continental outflow. Data sorted into high- and low-Nd days in each season, especially in DJF, revealed that all of these conditions were enhanced on the high-Nd days, including reduced sea level pressure and stronger wind speeds. Although aerosols may be more abundant in MAM and JJA, the conditions needed to activate those particles into cloud droplets are weaker than in colder months, which is demonstrated by calculations of the strongest (weakest) aerosol indirect effects in DJF (JJA) based on comparing Nd to perturbations in four different aerosol proxy variables (total and sulfate aerosol optical depth, aerosol index, surface mass concentration of sulfate). We used three machine learning models and up to 14 input variables to infer about most influential factors related to Nd for DJF and JJA, with the best performance obtained with gradient-boosted regression tree (GBRT) analysis. The model results indicated that cloud fraction was the most important input variable, followed by some combination (depending on season) of CAO index and surface mass concentrations of sulfate and organic carbon. Future work is recommended to further understand aspects uncovered here such as impacts of free tropospheric aerosol entrainment on clouds, degree of boundary layer coupling, wet scavenging, and giant CCN effects on aerosol–Nd relationships, updraft velocity, and vertical structure of cloud properties such as adiabaticity that impact the satellite estimation of Nd.


Introduction
Aerosol indirect effects remain the dominant source of uncertainty in estimates of total anthropogenic radiative forcing (Boucher et al., 2013;Myhre et al., 2013). Central to these effects is knowledge about cloud drop number concentration (N d ), as it is the connection between the subset of particles that activate into drops (cloud condensation nuclei, CCN) and cloud properties. It is widely accepted that warm clouds influenced by higher number concentrations of aerosol particles have elevated N d and smaller drops (all else held fixed), resulting in enhanced cloud albedo at fixed liquid water path (Twomey, 1977) and potentially suppressed precipitation (Albrecht, 1989) and increased vulnerability to overlying air resulting from enhanced cloud-top entrainment (Ackerman et al., 2004).
Reducing uncertainty in how aerosols and clouds interact within a given meteorological context requires accurate estimates of N d and aerosol concentrations and properties. Since intensive field studies struggle to obtain broad spatial and temporal coverage of such data, satellite remote sensing and reanalysis datasets are relied on for studies examining intra-and interannual features over large spatial areas. Limitations of satellite retrievals are important to recognize. N d is not directly retrieved but derived using other parameters (e.g., cloud optical depth, cloud drop effective radius, cloud-top temperature) and with assumptions about cloud adiabatic growth and N d being vertically constant .
Aerosol number concentrations are usually represented by a columnar parameter such as aerosol optical depth (AOD) and thus not directly below clouds, which is the aerosol layer most likely to interact with the clouds. Furthermore, aerosol data are difficult to retrieve in cloudy columns. Reanalysis datasets circumvent issues for the aerosol parameters as they provide vertically resolved data (e.g., surface layer and thus below clouds) and are available for cloudy columns.
Of special interest in this work is the western North Atlantic Ocean (WNAO) where decades of extensive research have been conducted for topics largely unrelated to aerosol-cloud interactions , thereby providing an opportunity for closing knowledge gaps for this area in a region with a wide range of aerosol and meteorological conditions (Corral et al., 2021;Painemal et al., 2021). Past work showed different seasonal cycles of AOD and N d in this region Sorooshian et al., 2019), which partly motivates this study to unravel why N d behaves differently on seasonal timescales. A previous study investigating seasonal cycles of N d in the North Atlantic region found that cloud microphysical properties were primarily dependent on CCN concentrations while cloud macrophysical properties were more dependent on meteorological conditions (e.g., Sinclair et al., 2020). However, due to the complexity of interactions involved and the covariability between individual components, the magnitude and sign of these feedbacks remain uncertain.
This study uses a multitude of datasets to characterize the N d seasonal cycle and factors related to N d variability. The structure of the results and discussion is as follows: (i) case study flight highlighting the wide range of N d in wintertime and factors potentially affecting that variability; (ii) seasonal cycle of N d and aerosol concentrations based on different proxy variables; (iii) seasonal cycles of factors potentially influential for N d such as aerosol size distribution, vertical distribution of aerosol, humidity effects, and aerosol-cloud interactions; (iv) composite analysis of influential factors on high-and low-N d days in each season; (v) modeling analysis to probe more deeply into N d relationships with other parameters for winter and summer seasons; and (vi) discussion of other factors relevant to N d unexplored in this work.

Study region
We focus on the WNAO, defined here as being bounded by 25-50° N and 60-85° W. A subset of the results focuses on six individual sub-domains representative of different parts of the WNAO (shown later), with five just off the East Coast extending from south to north (south: S; central-south: C-S; central: C; central-north: C-N; north: N) and one over Bermuda. (Minnis et al., 2011, which are based on the application of CERES's retrieval algorithms on the radiances measured by the MODerate resolution Imaging Spectroradiometer (MODIS) instrument aboard the Aqua satellite. Aqua observations used to estimate N d were from the daytime overpasses of the satellite around 13:30 local time (LT). Level 3 daily cloud properties at 1° × 1° spatial resolution (listed in Table 1) were used for the period between January 2013 and December 2017 from CERES-MODIS edition 4 Single Scanning Footprint (SSF) products (Loeb et al., 2016). The CERES-MODIS SSF Level 3 product includes 1° × 1° averaged data according to the cloudtop pressure of individual pixels: low (heights below 700 hPa), mid-low (heights within 700-500 hPa), mid-high (heights within 500-300 hPa), and high (heights above 300 hPa) level clouds. For this study, we only use low-cloud averages.

Satellite observations (CERES-MODIS/CALIPSO)-Relevant cloud parameters were obtained from the Clouds and the Earth's Radiant Energy System (CERES) edition 4 products
N d is estimated based on an adiabatic cloud model : where τ is cloud optical depth and r e is cloud drop effective radius, both of which are obtained from CERES-MODIS for low-level (i.e., surface to 700 hPa) liquid clouds. Q ext is the unitless extinction efficiency factor, assumed to be 2 for liquid cloud droplets, and ρ w is the density of water (1 g cm −3 ). Methods described in Painemal (2018) were used to estimate parameters in Eq. (1) as follows. (i) Adiabatic water lapse rate (C w ) was determined using cloud-top pressure and temperature provided by CERES-MODIS. (ii) The N d estimation is often corrected for the sub-adiabatic profile by applying the adiabatic value (f ad ), but in this work, a value of f ad = 1 was assumed due to both lack of consensus on its value and its relatively minor impact on N d estimation . (iii) k is the parameter representing the width of the droplet spectrum and was assumed to be 0.8 over the ocean. Statistics of N d are often estimated after screening daily observations based on cloud fractions (Wood, 2012;Grosvenor et al., 2018). The purpose of such filters is to reduce the uncertainties associated with the estimation of N d (Eq. 1) driven by the errors in the retrieval of r e and τ from MODIS's observed reflectance in a highly heterogeneous cloud field.
However, this may inadvertently mask the effects of cloud regime on aerosol-cloud interactions by only including certain low-level cloud types in the analyses (e.g., closed-cell stratocumulus). Therefore, we use all N d data regardless of cloud fraction with exceptions being Sects. 3.5 and 4.2 where a filter of low-level liquid cloud fraction (i.e., CF low-liq. ≥ 0.1) was applied.
The Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) instrument aboard the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) provides data on the vertical distribution of aerosols (Winker et al., 2009). Nighttime extinction profiles were acquired from Level 2 version 4.20 products (i.e., 5 km aerosol profile data), between January 2013 and December 2017. We averaged the Level 2 daily extinctions in (i.e., 273K and 101.325 kPa) while FCDP and 2DS measurements correspond to ambient conditions.

Regression analyses
Regression modeling was conducted to investigate relationships between environmental variables and N d . The gradient-boosted regression trees (GBRT) model, classified as a machine learning (ML) model, is used, consisting of several weak learners (i.e., regression trees with a fixed size) that are designed and subsequently trained to improve prediction accuracy by fitting the model's trees on residuals rather than response values (Hastie et al., 2009). Desirable characteristics of the GBRT model include both its capacity to capture nonlinear relationships and being less vulnerable to overfitting (Persson et al., 2017;Fuchs et al., 2018;Dadashazar et al., 2020). Two separate GBRT models were trained using daily CERES-MODIS N d data (1° × 1°) in winter (DJF) and summer (JJA) to reveal potential variables impacting N d . Winter and summer are chosen as they exhibit the highest and lowest N d concentrations, respectively, among all seasons over the WNAO.
Many variables were picked as input parameters (Table 2) for the GBRT model, categorized as being aerosol, dynamic/thermodynamic, or cloud variables. Aerosol parameters included MERRA-2 surface mass concentrations for sulfate, sea salt, dust, and organic carbon. Black carbon concentration was removed from input parameters because of its high correlation (R 2 = 0.6) with organic carbon. The following is the list of thermodynamic/dynamic input parameters derived from MERRA-2: vertical pressure velocity at 800 hPa (ω 800 ), planetary boundary layer height (PBLH), cold-air outbreak (CAO) index, wind speed and wind direction at 2 m (wind 2 m and wind-dir 2 m ), and relative humidity (RH) in the PBL and free troposphere represented by RH 950 and RH 800 , respectively. CAO index is defined as the difference between skin potential temperature (θ skt ) and air potential temperature at 850 hPa (θ 850 ) (Papritz et al., 2015). Updraft velocity plays a crucial role in the activation of aerosol into cloud droplets in warm clouds (Feingold, 2003;Reutter et al., 2009). Since the direct representation of updraft speed is not available from reanalysis data, near-surface wind speed (i.e., wind 2 m ) is used as a representative proxy parameter as an input parameter to the regression models. CERES-MODIS cloud parameters include liquid cloud fraction and cloud-top height for low-level clouds. In addition, PERSIANN-CDR daily precipitation (Rain) was included as a relevant cloud parameter.
Data were split into two sets: training/validation (70 %) and testing (30 %). Five-fold crossvalidation was implemented to train the GBRT model using the training/validation data. Furthermore, both performance and generalizability of the trained models were tested via the aid of the test set, which was not used in the training process. Hyperparameters of the GBRT models were optimized through a combination of both random and grid search methods. Table S1 in the Supplement shows the list of important hyperparameters of the GBRT model and associated ranges tested via random and grid search methods. The optimized model hyperparameters can also be found in Table S1. The GBRT models were performed using the scikit-learn module designed in Python (Pedregosa et al., 2011).
The regression analyses were not performed solely to construct and provide a highly accurate model useful for prediction, but rather to disclose and examine the possible effects of the relevant input variables on N d considering all the shortcomings of such analyses. For instance, there is some level of interdependency between input variables. To reduce unwanted consequences of correlated features, the interpretation of the results was done with the aid of accumulated local effect (ALE) plots, which are specifically designed to be unbiased to the correlated input variables (Apley and Zhu, 2020). ALE plots illustrate the influence of input variables on the response parameter in ML models. The ALE value for a particular variable s at a specific value of x s (i.e., f s,ALE (x s )) can be calculated as follows: where f s (z s x c ) is the gradient of model's response with respect to variable s (i.e., local effect) and P(x c |z s ) is the conditional distribution of x c , where c denotes the other input variables rather than s, and x c is the associated point in the variable space of c. z 0,1 is chosen arbitrarily below the smallest observation of feature s (Apley and Zhu, 2020). The steps in Eq.
(2) can be summarized as follows (Molnar, 2019;Apley and Zhu, 2020): (i) the average change in the model's prediction is calculated using the conditional distribution of features; (ii) the average change will then be accumulated by integrating it over feature s; and (iii) a constant will be subtracted to vertically center (i.e., the average of ALE becomes zero) the ALE plot. The aforementioned steps, although seemingly complex, assure the avoidance of undesired extrapolation (especially an issue for correlated variables) occurring in alternative approaches such as partial dependence (PD) plots. The value of f s,ALE (x s ) can be viewed as the difference between the model's response at x s and the average prediction. We used the source code available in https://github.com/blent-ai/ALEPython (Jumelle et al., 2021) for the calculation of ALE plots.

Results and discussion
3.1 Aircraft case study of N d gradient ACTIVATE Research Flight 5 (RF05) on 22 February 2020 demonstrates the wide range in N d offshore in the PBL (1.6 km) over the WNAO (Fig. 1). On this day, the ACTIVATE study region was dominated by a surface high-pressure system centered over the southeastern US, with a significant ridge axis extending from the main high to the eastnortheast off the Virginia-North Carolina coast and into the WNAO. Aloft, the flight region was located in northwesterly flow behind a trough offshore. This setup led to subsidence in the region and generally clear skies, except where scattered to broken marine boundary layer clouds formed along and east of the Gulf Stream. The 2 d NOAA HYSPLIT (Stein et al., 2015;Rolph et al., 2017) back trajectories using the "model vertical velocity" method and "REANALY-SIS" meteorology data indicate air in the flight region (between 0-3 km) had wrapped around the surface high from the north and left the New England coast 12-24 h before-hand (with a descending profile). Along the flight segment shown, winds were approximately 6 m s −1 , out of the northnorthwest during the initial descent, Min. Alt. 1, and BCB1 legs and primarily from the northeast for the other sections of the flight. Sea surface temperatures were 6-9 °C near the coast during the descent and Min. Alt. 1 leg (readers are referred to  Fig. 1, supermicrometer concentrations varied over 2 orders of magnitude (0.002-0.51 cm −3 ) and expectedly did not exhibit a pronounced offshore gradient as it is naturally emitted from the ocean.
Closer to shore during the Min. Alt. 1 leg, nitrate was the dominant aerosol species (~ 70% mass fraction). Farther offshore during both the BCB1 leg and cloud-free portion of the ACB1 leg, organics were the dominant constituent (~ 46% mass fraction), whereas farther during the BCB3 leg, the mean mass fraction of sulfate was the highest (75 %). Droplet residual particle data show a greater contribution of organics farther offshore, increasing from 46% to 75% between the ACB1 and ACB3 legs, respectively. These composition results, albeit limited to the non-refractory portion of submicrometer aerosol particles, reveal significant changes with distance offshore indicative of varying chemical properties of particles activating into droplets.
The cloudy portions of ACB1 are characterized as having little or no rain with a maximum RWC value of 0.02 g m −3 and mean value of 0.003 g m −3 . There is a notable RWC peak at the beginning of the Min. Alt. 2 leg, reaching as high as 1.81 g m −3 associated with clouds aloft. The precipitation occurrence was also evident in a subsequent BCT1 leg where RWC reached as high as 0.18 g m −3 . GOES satellite imagery of the study region ( Fig. 1) also reflects the effect of precipitation on cloud morphology where clouds farther offshore resemble open-cell structures. Associated scavenging of particles through the washout process is presumed to contribute to the decline in aerosol concentrations with distance offshore. Figure 1 shows changes in aerosol characteristics coincident with the large gradient in N d .
While ACTIVATE airborne data collection is ongoing to build flight statistics over multiple years, the wide changes in microphysical properties in RF05 motivate looking at other datasets with broader spatiotemporal coverage to learn about potential seasonally dependent drivers of N d , including meteorological parameters that vary throughout the year.
Furthermore, other datasets can provide insight into the source(s) of seasonal discrepancy between columnar aerosol remote sensing parameters and N d . consistent with the airborne data. The seasons with the greatest AOD values, accompanied by the most pronounced spatial gradient offshore, were JJA and MAM. The offshore gradient owes to continental pollution outflow (Corral et al., 2021, and references therein). In contrast, DJF and SON exhibited lower AOD values with a distinct area of higher AOD values offshore between ~ 35-40° N accounted for by sea salt. MERRA-2 speciated AOD data ( Fig. 3) indicate that sea salt and sulfate dominate total AOD regardless of season and that sulfate, organic carbon, and black carbon most closely follow the offshore gradient pattern owing to continental sources. Dust and sea salt have different spatial distributions with the former derived from sources such as North Africa leading to enhanced AODs < 30° N in particular in JJA and sea salt being enhanced offshore in particular in JJA. . We attribute the slightly different seasonal cycle over Bermuda to its remote nature, leading to differences in meteorology and aerosol sources between seasons.

Seasonal cycles of N d and AOD
One factor that could bias AOD towards higher values with disproportionately less impact on N d is aerosol hygroscopic growth in humid conditions. Table 3 summarizes mean MERRA-2 RH values in the PBL and free troposphere (FT). Results show that while RH is highest in JJA (except for FT of DJF in sub-domain N), differences between seasons were not very large. The maximum difference among the four seasons when considering mean RH in the PBL and FT for all sub-domains ranged between 3 %-9% and 7 %-25 %, respectively. Consequently, humidity effects on remotely sensed aerosol parameters are less likely to be the sole explanation of the dissimilar seasonal cycle of N d and AOD, but can plausibly contribute to some extent.
One factor that could drive the seasonal variation in N d is the unwanted effects of retrieval errors in the estimation of N d at low-cloud-coverage conditions. Uncertainty associated with the estimation of N d from MODIS observation increases as cloud fraction decreases . This is mainly because of the overestimation of droplet effective radius (r e ) in the retrieval algorithm due to the interference of cloud-free pixels and also high spatial inhomogeneity in low-cloud-coverage conditions that violates horizontal homogeneity assumptions in the retrieval of r e and τ from radiative transfer modeling (Zhang et al., 2012(Zhang et al., , 2016. To test whether retrieval errors in N d are the main driver of seasonal trends, Fig. S1 shows the seasonal cycle of N d at various low-level liquid cloud fractions. The results show that as cloud fraction increases the average N d increases, regardless of season. Perhaps the more important result is that the seasonal trend in spatial maps of N d remains similar regardless of cloud fraction. This finding is important as it confirms that the seasonal cycle in N d cannot be solely explained by the uncertainties associated with the retrieval of N d at low cloud fraction.

Contrasting AOD and aerosol index
While previous studies have pointed to the limitations of AOD as an aerosol proxy (e.g., Stier, 2016;Gryspeerdt et al., 2017;Painemal et al., 2020), the N d -AOD anticorrelation at seasonal scale over the WNAO is at odds with findings for other regions supporting the relationship between these two parameters (Nakajima et al., 2001;Sekiguchi et al., 2003;Quaas et al., 2006Quaas et al., , 2008Grandey and Stier, 2010;Penner et al., 2011;Gryspeerdt et al., 2016) and also that between sulfate and N d (Boucher and Lohmann, 1995;Lowenthal et al., 2004;Storelvmo et al., 2009;McCoy et al., 2017McCoy et al., , 2018MacDonald et al., 2020). Values of N d are influenced by the number concentration of available CCN, which is determined by aerosol properties (size distribution and composition) and supersaturation level. AOD is an imperfect CCN proxy variable because it does not provide information about composition and size distribution and is sensitive to relative humidity. Aerosol index (AI) is more closely related to CCN as it partially accounts for the size distribution of aerosols (Deuze et al., 2001;Nakajima et al., 2001;Breon et al., 2002;Hasekamp et al., 2019). The sensitivity of AI to size is evident in spatial maps for each season showing more of an offshore gradient (like sulfate AOD in Fig. 3) in each season and lacking both the offshore peak in sea salt between ~ 35-40° N and the maximum AOD for dust south of 30° N in JJA. However, when comparing absolute values between the four seasons in Fig. 2b, AI exhibits a similar seasonal cycle to AOD, thereby indicating that size distribution alone cannot explain diverging seasonal cycles for N d and AOD. We next compare N d to aerosol data in the PBL where CCN more relevant to droplet activation are confined. Size distribution effects in the PBL can instead be more of a factor, especially as sea salt is abundant.

Aerosol size distribution and vertical aerosol distribution
Vertical profiles of aerosol extinction coefficient estimated from CALIOP nighttime observations are shown in Fig. 4 for the six sub-domains. Also shown are the seasonally representative planetary boundary layer heights (PBLHs) from MERRA-2, with numerical values of both PBLH and fractional AOD contributions to the PBL and FT in Table 3. Although here we used nighttime observations from CALIOP because of having higher signal-to-noise ratio than daytime observations, we expect the general seasonal trends discussed here to remain the same regardless of the observation time. The CALIOP results indicate that aerosol extinction more closely follows the N d seasonal cycle with the highest (lowest) values in the PBL during DJF (JJA). However, aerosol extinction coefficient is sensitive to aerosol size distribution, and a plausible scenario is that DJF extinction in the PBL is primarily contributed by coarse sea salt particles, which are especially hygroscopic but do not contribute significantly to number concentration as demonstrated clearly by airborne observations (i.e., FCDP >3μm time series shown in Fig. 1d). This is supported in part by how DJF is marked by the highest fractional AOD contribution from the PBL (59 %-72 %) where sea salt is concentrated. In contrast, JJA has the lowest fractional AOD contribution from the PBL (11.3 %-52.6 %). It is also possible that the higher fractional AOD contribution from the PBL in winter is partly owed to aerosol particles being more strongly confined to the PBL compared to the summer. Sub-domains C-N and N exhibit the greatest changes in AOD fraction in the PBL between seasons with a maximum in DJF (59 %-61 %) and a minimum in JJA (11%-19%), suggesting they are relatively more sensitive to the aerosol vertical distribution in leading to contrasting AOD and N d seasonal cycles.
Bermuda stands out as having the highest AOD fractional contributions in the PBL in DJF (72 %) and SON (69 %) and among the highest seasonal total AODs in those two seasons (0.14 in DJF and 0.10 in SON) assisted in large part by sea salt (Fig. 3) (Aldhaif et al., 2021), coincident with high seasonal wind speeds (Corral et al., 2021).
To explore aerosol number concentration characteristics in the PBL in different seasons, we next discuss results from an opportune dataset over the US East Coast (Cape Cod, MA) providing an annual profile of CCN concentration at 1% supersaturation (Fig. 5). Cape Cod is a coastal location representative of the outflow, providing an important fraction of the CCN impacting offshore low-level clouds. As the supersaturation examined is relatively high (1 %), the measured CCN include smaller particles representing high number concentrations that would not appreciably contribute to the high aerosol extinctions from CALIOP in the PBL in direct contrast to sea salt (i.e., high extinction due to fewer but larger particles). Seasonal mean CCN values do not follow the seasonal cycle of N d nor CALIOP extinction in the PBL, with values being as follows: DJF = 1436 cm −3 ; MAM = 1533 cm −3 ; JJA = 1895 cm −3 ; and SON = 1326 cm −3 . These results suggest the following: (i) size distribution effects are significant in the PBL when comparing extinction to number concentration, and (ii) aerosol vertical distribution behavior cannot alone explain the divergent seasonal cycles of N d and aerosol parameters (e.g., AOD, AI, surface number concentrations).
We next compare MERRA-2 speciated aerosol concentrations at the surface (Fig. 6) to those of speciated AOD (Fig. 3). Surface mass concentrations have the limitation of being biased by larger particles (similar to extinction). The seasonal cycle of mean values for speciated AOD and surface concentration for individual sub-domains generally agree with the exception that there was disagreement for sulfate in each sub-domain (see seasonal mean values in Table S2). Sulfate exhibited higher AODs in JJA but with surface concentrations usually being highest in DJF or MAM; although differences in seasonal mean mass concentrations were relatively small (< 1 μg m −3 ). A plausible explanation is enhanced secondary production of sulfate via oxidation of SO 2 or DMS convectively lifted to the free troposphere in JJA. An important result confirmed by the surface mass concentrations is that sea salt is an order of magnitude higher than the other species, supporting the previous speculation that sea salt dominates the aerosol extinction in the PBL from CALIOP.

Aerosol-cloud interactions
Studies of China's east coast have shown that the aerosol indirect effect is especially strong in wintertime, whereby pollution outflow leads to high N d and suppressed precipitation (Berg et al., 2008;Bennartz et al., 2011). It is hypothesized that a similar effect is taking place off of North America's east coast, which could in part explain enhanced N d without a significant jump in aerosol parameter (e.g., AOD, AI) values necessarily. Grosvenor et al. (2018) suggested that high cloud fractions in wintertime off these east coasts relative to other seasons are coincident with strong temperature inversions usually associated with coldair outbreaks that serve to concentrate and confine surface layer aerosols. We examine the relative seasonal strength of the aerosol indirect effect via spatial maps of the following metric commonly used in aerosol-cloud interaction (ACI) studies: where α represents an aerosol proxy parameter that is represented here as AI, AOD, the speciated sulfate AOD (Sulfate AOD ), and sulfate surface mass concentration (Sulfate sf-mass ). The expected range by common convention is 0-1, with higher values suggestive of greater enhancement in N d for the same increase in the aerosol proxy parameter. Table 4 shows that DJF always exhibits the highest ACI values regardless of the aerosol proxy used, consistent with a stronger aerosol indirect effect in DJF over East Asia. The mean ACI values in DJF using AI, AOD, Sulfate AOD , and Sulfate sf-mass ranged from 0.25 to 0.55, 0.28-0.59, 0.25-0.53, and 0.22-0.47, respectively, depending on the sub-domain. Spatial maps of ACI (Fig. 7) do not point to significant geographic features. Coefficients of determination (R 2 ) for the linear regression between ln(N d ) and ln(α) when computing seasonal ACI values were generally low (≤ 0.30), with spatial maps of R 2 and data point numbers in Fig. S2. Poor correlations are suggestive of the non-linear nature of aerosolcloud interactions (e.g., Gryspeerdt et al., 2017) and the influence of other likely factors such as dynamical processes and turbulence, data spatial resolution and dataset size, cloud adiabaticity, wet scavenging effects, and aerosol size distribution (McComiskey et al., 2009). The results of this section suggest though that aerosol indirect effects could be strongest in DJF, meaning that N d values increase more for the same increase in aerosol. Factors that can contribute to higher ACI values in winter than summer include seasonal differences in the following: (i) dynamical processes and turbulent structures of the marine boundary layer; (ii) aerosol size distributions and consequently varying particle number concentrations for a fixed mass concentration; and (iii) hygroscopicity of particles, especially as a result of changes in the composition of the carbonaceous aerosol fraction. Regarding dynamical processes and the effects of turbulence, Fig. 2 in Painemal et al. (2021) shows that heat fluxes (i.e., latent and sensible fluxes) are strongest (lowest) in the winter (summer) over the WNAO. The greater heat fluxes in DJF can contribute to more turbulent and coupled marine boundary layer conditions in winter than summer, presumably resulting in more efficient transport and activation of aerosol in the marine boundary layer, leading to higher ACI values. Forthcoming work will probe this issue in greater detail.

Discussion of potential influential factors
We probe deeper into factors related to the N d seasonal cycle by using (Sect. 4.1) composite analyses based on high-and low-N d days and (Sect. 4.2) advanced regression techniques tackling non-linear relationships. We focus the analyses on one sub-domain (C-N) for both simplicity and intriguing characteristics: (i) among the highest anthropogenic AOD values over the WNAO; (ii) significant seasonal changes in fractional AOD contribution to the PBL; (iii) close to the Cape Cod site where CCN data were shown; and (iv) the aerosol indirect effect (Table 4) strongest (weakest) in DJF (JJA). precipitation (and thus not capturing all drizzle) and for all cloud types, including more heavily precipitating clouds deeper and higher than the low-level clouds examined for N d .

Composite analysis
Another factor potentially contributing to the observed counterintuitive trends is the temporal offset between N d estimations from MODIS-Aqua and precipitation data from PERSIANN-CDR.
The mean seasonal climatological values and anomalies suggest that high-N d cases are marked by continental outflow, high cloud fractions, high PBLH, and low SLP, all of which occur most commonly in DJF and are associated with cold-air outbreaks. These events are marked by cold air over the warm ocean leading to strong surface heat fluxes, boundary layer deepening, weakened inversion strength, and high and deep clouds (Brummer, 1996;Kolstad et al., 2009;Fletcher et al., 2016;Abel et al., 2017;Naud et al., 2018). Coincident with these features is the Icelandic Low, which is a significant climatological feature of the North Atlantic whereby subpolar low pressure builds in extratropic areas beginning in the fall with westerly winds in the boundary layer that shift more to northerly in the winter Painemal et al., 2021). This low-pressure system seems to be stronger on high-N d days, resulting in more continental outflow and high number concentrations of CCN; the greater CAO index values near the coast promote high cloud coverage, affording more opportunity for cloud processing of particles to ultimately enhance droplet activation. While there can be considerable enhancement in N d as cold-air outbreak air masses evolve over warmer waters, precipitation scavenging farther downwind will be an efficient method of boundary layer aerosol (and N d ) removal (Abel et al., 2017;Lloyd et al., 2018), which contributes at least in part to the sharp N d gradients offshore demonstrated in Fig. 1.

Multivariate regression analysis
Modeling analysis focuses on the two seasons (DJF and JJA) with the extremes in terms of seasonal mean values for N d and aerosol parameters. Added motivation for examining those two seasons stems from spatial maps of R 2 based on ACI analysis (Fig. S2). Using the surface sulfate concentration as the aerosol proxy generally yielded higher R 2 values in three seasons (DJF = 0.13, MAM = 0.05, SON = 0.08) except JJA (0.02) for which the choice did not matter owing to low R 2 (≤ 0.03) values for all four aerosol proxy variables tested. Although the R 2 values are all generally low, DJF and JJA are the seasons when surface sulfate levels are the most and least capable of explaining N d , with R 2 among the four proxy variables exhibiting the widest (DJF values: 0.07-0.13) and narrowest range (JJA: 0.01-0.03) of values. We address here how much improvement is gained in modeling N d by advancing from linear regressions based on one input variable to (i) adding more input variables and (ii) moving to a more sophisticated model (GBRT) that captures non-linear relationships.
We show in Table 5 the performance of two linear models based on a single linear regression (with sulfate mass concentration) and a multi-regression that uses 14 input variables listed in Table 2. In addition, Table 5 also lists the performance of the GBRT model that ingests 14 input variables, similar to the linear multi-regression model. The average R 2 scores of the test set for predicting N d based on a linear regression using only sulfate surface mass concentration were 0.17 and 0.09 in DJF and JJA, respectively. In contrast, R 2 between the multi-regression linear model and the test dataset increased to 0.28 and 0.25 for DJF and JJA, respectively. This increase in predictive capability was helpful to reduce the gap between seasons by presumably accounting for factors more important in JJA aside from surface concentration of sulfate. The R 2 scores increased even more to 0.47 and 0.43 for DJF and JJA, respectively, for the GBRT model. Therefore, accounting for non-linear relationships improved predictive capability in both seasons. It is important to note that the GBRT model was robust in terms of overfitting and especially generalizability as R 2 values of the test and validation sets were similar for both seasons.
We next discuss the importance ranking of different parameters from Table 2 in terms of influencing N d for DJF and JJA (Fig. 13). Low-level liquid cloud fraction was the most important parameter in both seasons with some commonality in the next three parameters for both seasons. In DJF, sulfate surface mass concentrations were the second most important factor, followed by organic carbon surface concentrations and low-level liquid cloud-top effective height. As sulfate is secondarily produced via gas-to-particle conversion processes, this result is consistent with those from Fig. 1 showing the presumed strong impact of particles smaller than 100 nm in impacting N d values close to shore. In JJA, the CAO index was the second most important, followed by organic carbon and sulfate surface concentrations. Also, our results throughout the study and supported by modeling are in agreement with Quinn et al. (2017) that sulfate particles contribute more to the CCN budget than sea salt particles. In DJF and JJA, the fifth most important factor was CAO index (second most important in JJA) and PBLH (11th most important in DJF), respectively. Fig. 13. In both seasons, but especially DJF, enhanced surface concentrations of sulfate and organic carbon coincide with higher N d , whereas there was not any obvious positive association between N d and either sea salt or dust (Fig. 14). Dust in JJA and sea salt in DJF, seasons of which each respective aerosol type is most predominant, exhibited negative relationships with N d . Such a negative relationship is plausibly related to differences between ACI when calculated using AOD versus AI (Painemal et al., 2021); for instance, coarse sea salt can expedite collision-coalescence and thus reduce N d , which has the effect of reducing ACI (Eq. 3) and even possibly yielding negative values (Table 4). Negative values of other ACI constructs coincident with poor R 2 values have previously been attributed to potential effects of giant CCN (Terai et al., 2015;Dadashazar et al., 2017), but further research needs to examine this in more detail. (i.e., rising motion) and CAO index values above 0 in DJF without such relationships in JJA (Fig. 15). The six parameters in Fig. S21 (PBLH, RH 950 , RH 800 , Rain, Wind 2 m , Winddir 2 m ) did not reveal very pronounced trends with N d in either season consistent with how they did not rank highly in importance (Fig. 13). Of particular interest is Wind 2 m , which is used here as a proxy variable for updraft speed in the marine boundary layer, which is expected to have a high impact on N d via its effect on in-cloud supersaturation. Although the ALE plot of Wind 2 m suggested a small increase of about ~ 10 cm −3 in N d as the wind speed increased, Wind 2 m did not come out as a very important parameter in either season. This may be due to the fact that environmental conditions representing updraft speed were already included in parameters such as cloud fraction and CAO index. Another explanation can be the shortcomings and high uncertainties associated with the use of Wind 2 m as a proxy for updraft speed.

Figures 14 and 15 show accumulated local effect (ALE) plots for the various parameters ranked in
The results of regression analysis highlight the high sensitivity of N d to cloud fraction regardless of season. As discussed earlier, this can be attributed largely to two factors: (i) the relationship between cloud type (e.g., stratocumulus, shallow cumulus) and cloud fraction, which can, in turn, influence cloud microphysical properties like N d , and (ii) (Boucher and Lohmann, 1995;Lowenthal et al., 2004;Storelvmo et al., 2009;McCoy et al., 2017McCoy et al., , 2018MacDonald et al., 2020). The second most important factor for DJF was the surface mass concentrations of organic carbon followed by CF low-liq. and sea salt surface mass concentrations. On the other hand, the second most important factor in JJA was CAO index followed by CF low-liq. and wind direction. ALE plots presented in Figs. S23-S25 showed similar relationships between N d and input parameters as observed for the original runs where full datasets were used as the input. Figure S26 shows the results of the GBRT model using input data with a cloud fraction between 0.2 and 0.4, the condition relatively more prevalent in JJA than DJF. The average R 2 scores for validation and test sets for these runs were 0.30/0.30 (DJF/JJA) and 0.33/0.31 (DJF/JJA), respectively. It is interesting to see that for both seasons, an aerosol parameter emerged as the most important factor. Mass concentrations of OC appeared as the most important factor in JJA (the fourth most important factor in DJF) while in DJF, sulfate concentration came out as the most important factor (the fourth most important factor in JJA) consistent with the results of previously discussed models for DJF. It should be noted that ALE plots also suggested less sensitivity of N d to sulfate in JJA than DJF, similar to the results observed in the original model run including all the data points. The second most Dadashazar et al. Page 17 Atmos Chem Phys. Author manuscript; available in PMC 2021 August 09.
important factor in DJF turned out to be the cloud-top effective height of low-level liquid clouds followed by CAO index. On the other hand, CAO index was the second most important factor in JJA followed by PBLH. ALE plots presented in Figs. S27-S29 also showed similar qualitative trends observed in original and high-cloud-coverage runs.

Unexplored factors
Additional factors impacting the relationship between aerosol and N d seasonal cycles are discussed here that warrant additional research with more detailed data at finer scales such as with aircraft. We are cognizant that this list is not fully exhaustive. As low-level cloud fraction impacted model results of Sect. 4.2 so substantially, the dynamics of the studied clouds require further characterization. As cloud fraction and CAO index are well related, especially in DJF, aerosol-cloud interactions are likely stronger than other seasons (as implied by Sect. 3.5) due in part to enhanced surface fluxes and turbulence and thus more droplet activation with higher cloud supersaturations (Painemal et al., 2021); in contrast, the smaller shallow cumulus clouds in summertime may be less favorable for droplet activation due to factors such as reduced turbulence and more lateral entrainment.
Entrainment of free tropospheric aerosol can impact N d values, with potentially varying degrees of influence between seasons. It is presumed that with summertime convection, the more broken cumulus scenes are less adiabatic through the cloudy column and more affected by entrainment and mixing; hence, N d values derived using data that remote sensors retrieve near cloud top could be considerably lower than values lower by cloud base. Satellite remote sensing studies of aerosol-cloud interactions will presumably be more challenging in winter periods versus the summer with regard to the spatial and temporal mismatch between cloud and aerosol retrievals. More specifically, it is easier to get nearly coincidental sampling in summertime due to lower cloud fractions, while in winter the frontal regions with high cloud fractions make it challenging to get aerosol retrievals. There is complexity in understanding how aerosols relate to N d due to how giant CCN can reduce N d and also since wet scavenging can remove aerosols efficiently. As aircraft data are limited and difficult to use for assessing seasonal cycles, new techniques of retrieving CCN and N d from space will greatly assist such types of studies in the future.

Conclusions
This work investigates the seasonal cycle of N d over the WNAO region in terms of concentration statistics and with discussion of potential influential factors. The results of this work have implications for increased understanding of aerosol-cloud interactions and meteorological factors influencing concentration of cloud droplets in the marine boundary layer. The results and interpretations can be summarized as follows in the order of how they were presented.
• An ACTIVATE case flight during the DJF season shows a sharp offshore N d gradient ranging from > 1000 to < 50 cm −3 explained in part by particles smaller than 100 nm activating into drops during a cold-air outbreak with post-frontal clouds. There were significant changes in aerosol composition in cloud-free air and also in droplet residual particles as a function of offshore distance. These changes included a sharp decrease in aerosol number concentration, a decrease in mass fraction of sulfate in droplet residual particles, and an increase in mass fraction of organic and chloride of droplet residual particles moving offshore.
• N d is generally highest (lowest) in DJF (JJA) over the WNAO, but aerosol parameters such as AOD, AI, surface-based aerosol mass concentrations for most species, and CCN concentrations (1% supersaturation) are generally highest in JJA and MAM and are at (or near) their lowest values in DJF. While aerosol extinction in the PBL is highest in DJF, it is driven largely by sea salt (large but few in number) and thus cannot explain the N d peak in wintertime.
• While relative humidity was generally highest in JJA across the WNAO, the differences between seasons in the PBL and FT were not sufficiently large to explain the divergent seasonal cycles of AOD and N d .

•
The susceptibility of N d to aerosols (Eq. 3) was strongest in DJF using four different proxy variables for aerosols, suggestive of at least one reason why N d can be highest when aerosol proxy variables for concentration are typically near or at their lowest values.

•
Composite maps of high-versus low-N d days across the WNAO reveal that conditions associated with the highest-N d days, regardless of season (but especially DJF), are reduced sea level pressure, stronger winds aligned with continental outflow, high low-level liquid cloud fraction, higher CAO index and PBLH, and enhanced AOD. Cold-air outbreaks are coincident with all of these conditions, especially in the colder months of DJF in sharp contrast to JJA when N d is lowest.

•
Gradient-boosted regression analysis shows that the most important predictors of N d in DJF and JJA vary to some extent, but with cloud fraction being the most important parameter, followed by either (for DJF) surface mass concentrations of sulfate and organic carbon and CAO index or (for JJA) CAO index, surface mass concentrations of organic carbon, and sulfate concentrations. Accumulated local effect plots confirm that sulfate and organics help drive the high N d values via continental outflow, which is assisted in large part by conditions associated with CAOs such as high cloud fraction and high CAO index.
Therefore, the combination of continental pollution outflow and turbulence changes contributed by surface fluxes (manifested in the strongest CAO index values in DJF and weakest in JJA) markedly influence the N d cycle, leading to differing annual cycles in cloud microphysics and aerosols. More detailed data such as from aircraft and modeling can help extend this line of research to confirm these findings and speculations such as how (i) the aerosol indirect effect is strongest in DJF due to boundary layer dynamics such as with more turbulence and mixing than other seasons (Painemal et al., 2021); (ii) enhanced giant CCN in forms such as sea salt and dust can reduce N d via expediting the collision-coalescence process; and (iii) substantial aerosol removal can occur far offshore as postfrontal clouds associated with CAOs build and then begin to precipitate. The latter hypothesis may help explain why Bermuda (> 1000 km offshore the US East Coast) was the only selected subdomain in this study to not have a seasonal N d peak in DJF.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.        Seasonal maps of the aerosol-cloud interaction (ACI) parameters over the WNAO using daily N d and four different aerosol proxy parameters (AI, AOD, Sulfate AOD , Sulfate sf-mass ) from CERES-MODIS and MERRA-2, respectively. ACI statistics associated with the six sub-domains shown are summarized in Table 4. Dadashazar et al. Page 32 Atmos Chem Phys. Author manuscript; available in PMC 2021 August 09.

Figure 8.
Seasonal climatology of sea level pressure (SLP) (middle column) and anomalies from seasonal averages for low-N d days (left column) and high-N d days (right column). In the left and right columns, red and blue contours are associated with positive and negative anomalies from the climatology, respectively. The green box represents sub-domain C-N for which the analysis was conducted. Dadashazar et al. Page 33 Atmos Chem Phys. Author manuscript; available in PMC 2021 August 09.

Figure 9.
Seasonal climatology of near-surface (2 m above ground) wind speed (middle column) and mean values for low-N d days (left column) and high-N d days (right column). The reference wind vector is shown in the top left panel. The red box represents sub-domain C-N for which the analysis was conducted.   Seasonal averages of cold-air outbreak (CAO) index (middle column) and associated anomalies on low-N d days (left column) and high-N d days (right column). The red box represents sub-domain C-N for which the analysis was conducted.  Seasonal averages of MERRA-2 AOD (middle column) and associated anomalies on low-N d days (left column) and high-N d days (right column). The red box represents sub-domain C-N for which the analysis was conducted.     List of input parameters used as predictor variables in the GBRT and linear models. Variables are grouped into three general categories. Average drop number concentration (N d ), MERRA-2 AOD, and vertically resolved AOD characteristics from CALIOP for each season over the subdomains shown in Fig. 2. Total CALIOP AOD is shown outside parentheses, and numbers inside are the percent AOD fraction in the planetary boundary layer followed by in the free troposphere. Also shown are PBLHs (shown in Fig. 4) and the relative humidity in the PBL and FT.