Parameterising cloud base updraft velocity of marine stratocumuli

The number of cloud droplets formed at the cloud base depends both on the properties of aerosol particles and on the updraft velocity of an air parcel at the cloud base. As the spatial scale of updrafts is too small to be resolved in global atmospheric models, the updraft velocity is commonly parameterised based on the available turbulent kinetic energy. Here we present alternative methods that parameterise updraft velocity based on high-resolution large eddy simulation (LES) runs of marine stratocumulus clouds. First, we use our simulations to assess the accuracy of a simple linear parameterisation where the updraft velocity depends only on cloud top radiative cooling. In addition, we present two different machine learning methods (Gaussian process emulation and random forest) that account for different boundary layer conditions and cloud properties. We conclude that both machine learning parameterisations reproduce the LES-based updraft velocities with about the same accuracy, while the simple approach employing radiative cooling only produces on average lower coefficient of determination and higher root mean square error values. Finally, we apply these machine learning methods to find the key parameters affecting cloud base updraft velocities.


Introduction
Clouds are important for the global climate due to their ability to reflect solar (shortwave) radiation and trap outgoing longwave infrared radiation, but there are still several unknowns and uncertainties related to their dynamics, including aerosol-cloud interactions (e.g., Wood, 2012; Seinfeld et al., 2016; Schneider et al., 2017; Rosenfeld et al., 2019). Cloud for-

Methods
The process for creating a LES-based updraft velocity parameterisation for marine boundary layer clouds is shown as a pipeline in Fig. 1. The aim is to create a parameterisation that represents detailed LES runs at a low computational cost.
The pipeline has three main stages. First, we need source data that describes boundary layer conditions in a large number of marine boundary layer cloud cases (Sect. 2.2.1). Second, a representative subset of this data is selected (Sect. 2.2.2) and the corresponding simulations are run with the LES model (Sect. 2.3). Third, specific LES model outputs are selected from these simulations and corresponding parameterisations are created (Sect. 2.4).

LES
The parameterisations developed in this study are based on simulations with UCLALES-SALSA (Tonttila et al., 2017; Ahola et al., 2020), which models atmospheric dynamics with a LES and includes a four-stream radiative transfer solver (Fu and Liou, 1993). We used two different cloud microphysical modules with the LES. First, the default UCLALES (Stevens et al., 1999, 2005) includes Seifert & Beheng microphysics (SB) with diagnostic clouds (saturation adjustment for cloud water, constant cloud droplet number concentration as an input) and a two-moment bulk scheme with parameterised autoconversion for rain (Seifert and Beheng, 2006). The second set-up employs SALSA, which is a sectional scheme where aerosol, cloud droplet and raindrop size distributions and chemical composition are described with several size bins (Kokkola et al., 2008; Tonttila et al., 2017). Cloud activation is calculated by solving equations for condensation of water on aerosol particles and then counting the number of droplets reaching the critical droplet size. Rain formation uses the same autoconversion parameterisation as the SB microphysics. Due to the high number of prognostic variables, SALSA simulations are about ten times computationally heavier than the SB simulations.
Both SALSA and SB simulations are initialised with temperature and humidity profiles, and solar zenith angle is an additional input for daytime simulations. In addition, SALSA requires aerosol composition and size distributions, while cloud droplet number concentration is the only cloud input for SB. The remaining model parameters and settings are the same for all simulations. These inputs are obtained from a global climate model as described in the next section.
Sampling representative initial cloud states

Source data from ECHAM

The source data describes typical boundary layer cloud conditions so that these can be used to initialise LES runs. In principle, source data could be collected from different satellite products or re-analysis data sets, but we decided to use outputs of the global aerosol-chemistry-climate model ECHAM-HAMMOZ (echam6.3-ham2.3-moz1.0) because it provides all the needed variables, including details about aerosols (Tegen et al., 2019).
The source data was collected from standard model outputs of a one-year ECHAM-HAMMOZ AMIP (Atmospheric Model Intercomparison Project) type run (Fig. 1 point A). Filtered source data was sampled from open ocean columns that represent single-layer low clouds (Fig. 1 point 1). In practice, first, continental or sea ice covered columns were excluded. Next, columns without a single-layer cloud above the sea surface and below the 700 hPa level were screened out. The threshold cloud water content for a cloudy grid cell was 0.01 g kg⁻¹. Liquid (LWP) and ice (IWP) water paths were calculated for the low cloud and for the whole column. The column was accepted if the column IWP was less than 10 % of the low cloud's LWP and the low cloud's LWP was more than half of the total cloud water path (LWP+IWP). These conditions ensure that the selected columns contain mostly liquid single-layer clouds below the 700 hPa level (about 3000 m).
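The column acceptance criteria above can be sketched as a simple filter function. The thresholds follow the text; the function name and argument layout are illustrative, not taken from the actual processing code.

```python
def accept_column(lwp_low, lwp_col, iwp_col):
    """Return True if a column contains a mostly liquid single-layer low cloud.

    lwp_low : liquid water path of the low cloud
    lwp_col : liquid water path of the whole column
    iwp_col : ice water path of the whole column
    (all in the same units, e.g. g m-2)
    """
    # Column IWP must be less than 10 % of the low cloud's LWP...
    mostly_liquid = iwp_col < 0.1 * lwp_low
    # ...and the low cloud must hold over half of the total cloud water path.
    dominant_low_cloud = lwp_low > 0.5 * (lwp_col + iwp_col)
    return mostly_liquid and dominant_low_cloud
```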
The previously described calculations produced two of the source data variables, namely LWP for the low cloud and planetary boundary layer height (H_PBL), which was defined as the difference between sea level and cloud top pressures. These and the rest of the source data variables are shown in Table 1. The first five meteorological variables in the table describe liquid water potential temperature profiles and liquid water content for a well-mixed boundary layer. The last variable is the cosine of the solar zenith angle, which determines the solar radiative flux at the top of the atmosphere.
SB microphysics requires cloud droplet number concentration as input. On the other hand, SALSA needs aerosol size distributions and chemical composition or hygroscopicity. In order to reduce the number of variables, log-normal particle size modes are described by using effective dry radii (r_eff). The hygroscopicity parameter κ (Petters and Kreidenweis, 2007) in ECHAM-HAMMOZ depends on aerosol chemical composition, while the aerosol in SALSA is assumed to be sulphuric acid, for which κ_H2SO4 = 1.008 is constant. Sulphuric acid was chosen as the only chemical component of aerosols to limit computational expenses, and it resembles aged sea salt aerosols. When the critical supersaturation for cloud activation is the same in ECHAM-HAMMOZ and SALSA, then κ r³ = κ_H2SO4 r_eff³, where r is the dry radius in ECHAM. The effective dry radius of the accumulation mode is considered as a variable, because this mode can be important for cloud activation. For the Aitken and coarse modes we fixed the effective dry radii to 0.0105 and 0.4813 µm, respectively, based on their time averages in the source data. The Aitken, accumulation and coarse modes have fixed standard deviations of 1.59, 1.59 and 2.0, respectively, in ECHAM-HAMMOZ, and these are also used in the LES simulations.
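The relation κ r³ = κ_H2SO4 r_eff³ solves directly for the effective dry radius; a minimal sketch of the conversion (function name is illustrative):

```python
KAPPA_H2SO4 = 1.008  # hygroscopicity of sulphuric acid, as in the text

def effective_dry_radius(r_dry, kappa):
    """Map an ECHAM dry radius and hygroscopicity kappa to the
    sulphuric-acid-equivalent radius used in SALSA, assuming equal
    critical supersaturation: kappa * r**3 = KAPPA_H2SO4 * r_eff**3."""
    return r_dry * (kappa / KAPPA_H2SO4) ** (1.0 / 3.0)
```

Less hygroscopic aerosol (smaller κ) thus maps onto a smaller sulphuric acid particle with the same activation behaviour.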

Sampling the source data
The next part of the pipeline is sampling representative subsets from the filtered source data (point 2 in Fig. 1). Here, these subsets are called designs (point C in Fig. 1). They are used to initialise the LES simulations and subsequently, after filtering, as part of the inputs for the updraft parameterisations (Table 2).
Creating a design is a crucial part of the parameterisation development (Diggle and Lophaven, 2006). For instance, a Gaussian process emulator is in general accurate near the points it is trained with but loses accuracy as one tries to predict conditions far away from them (Rasmussen and Williams, 2006). Therefore, determining the distribution of the points (a good design) plays a vital role in the prediction. It is of key importance to identify regions where a higher accuracy is required, either due to their relevance to the application or simply because predictions are more often done there. On the other hand, one should not focus too heavily on any single region because of the diminishing return from each additional nearby point. As the designs are model-based, the aforementioned aspects imply two competing requirements for a design: i) the design should cover the whole input space, and ii) it should concentrate points in the regions that are most relevant for the predictions (see, e.g., Vanhatalo, 2020).
In this work, we used a simple stratified sampling method based on binary space partitioning (BSP) trees (e.g., Fuchs et al., 1980; Tóth, 2005). The goal of the algorithm is to create a partitioning that represents the high-level distributional properties of the collection (Sect. 2.2.1) and then uniformly sample a point in each partition. By enforcing a single sample per partition, we ensure a space-covering property while still letting the distribution of the points in the collection guide which areas to focus on.
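The splitting-and-sampling idea can be sketched as follows. This is a minimal illustration: it always splits the largest remaining partition at the median of its widest dimension, whereas the actual design tool may use different split and stopping rules.

```python
import numpy as np

def bsp_sample(data, n_samples, rng=None):
    """Stratified sampling via a simple binary space partitioning:
    repeatedly split the largest partition at the median of its widest
    dimension until n_samples partitions remain, then draw one point
    uniformly from each partition."""
    rng = np.random.default_rng(rng)
    leaves = [np.asarray(data, dtype=float)]
    while len(leaves) < n_samples:
        leaves.sort(key=len, reverse=True)
        leaf = leaves.pop(0)                      # largest partition
        dim = int(np.argmax(leaf.max(axis=0) - leaf.min(axis=0)))
        order = np.argsort(leaf[:, dim])          # split along widest dim
        half = len(leaf) // 2
        leaves.append(leaf[order[:half]])
        leaves.append(leaf[order[half:]])
    # one uniformly chosen sample per partition -> space-covering subset
    return np.stack([leaf[rng.integers(len(leaf))] for leaf in leaves])
```

Dense regions of the collection end up holding many small partitions, so they contribute more samples, while sparse regions are still represented.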

The BSP method was used to generate separate day and night designs for both SB and SALSA microphysics. The designs are separate because they contain different variables. For example, the cosine of the solar zenith angle has an impact on updraft velocity only during daytime. The number of samples taken from the source data for each design depends on the number of variables as well as the computational resources available for the LES simulations. Both day and night SB designs have 500 samples, while the SALSA designs have 135 and 150 samples in total for night and day, respectively. Ten samples per variable can be considered as the minimum, at least for the Gaussian process emulator (Loeppky et al., 2009).
The probability density function (PDF) of each design variable is given in Fig. 2. The numbers show the full range of values in the designs from minimum to maximum. The highest cloud droplet and mode number concentration values are excluded from the figures for clarity. The distributions of the design variables were calculated with kernel density estimation (Scott, 1992), which smooths the PDF and was used due to the low number of samples. It should be noted that the spikes of H_PBL values in the filtered ECHAM data in Fig. 2e are related to the vertical discretisation of the model. We added noise to the H_PBL values to get a smoother, i.e. more realistic, planetary boundary layer height distribution.
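The jitter-and-smooth treatment of the discretised H_PBL values can be sketched as below. The model-level heights and the noise width here are illustrative stand-ins, not values from the study.

```python
import numpy as np
from scipy.stats import gaussian_kde

# H_PBL values cluster at discrete model levels (hypothetical heights, m);
# adding noise before kernel density estimation yields a smoother,
# more realistic boundary layer height distribution.
rng = np.random.default_rng(0)
levels = np.array([400.0, 700.0, 1000.0, 1300.0])
h_pbl = rng.choice(levels, size=300)                     # spiky, discretised
h_pbl_jittered = h_pbl + rng.normal(0.0, 50.0, size=h_pbl.shape)
pdf = gaussian_kde(h_pbl_jittered)                       # smoothed PDF estimate
```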

Setting up the LES runs
After creating the four designs (SB and SALSA microphysics, day and night), we generate the LES inputs and run the simulations (point 3 in Fig. 1). Initial temperature and humidity profiles for the LES are reconstructed from the design variables (Table 1) while assuming a well-mixed cloud-topped marine boundary layer. With this assumption, the boundary layer total water mixing ratio (q_t) was solved from the LWP, and the boundary layer height was converted from the pressure difference to metres above the sea surface. Figure 3 shows an example of the initial temperature and humidity profiles that were generated using the five meteorological input variables. Here, the inversion layer thickness is based on an assumed liquid water potential temperature gradient of 0.3 K m⁻¹. Above the inversion layer, θ_L increases at a rate of 3 K km⁻¹ and q_t decreases linearly so that it would reach zero 2 km above the top of the inversion layer. Note that the hypothetical total water zero point would be at least 1 km above the top of the LES domain. The model also has options related to initialisation, surface interactions, radiative transfer, large scale forcing, and microphysics. For most of these, default values were used in the simulations, but the following parameters were adjusted. Initial horizontal wind profiles were set to 10 m s⁻¹ in the East-West direction for all altitudes. Surface sensible and latent heat fluxes were set to zero, which is in line with the idealised initial temperature and humidity profiles. The radiative solver needs the surface skin temperature, which was set to be the same as the air temperature in the first model layer above the surface.
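The idealised profile construction can be sketched as a function of height. The gradients (0.3 K m⁻¹ in the inversion, 3 K km⁻¹ above, q_t reaching zero 2 km above the inversion top) follow the text; the argument names and the linear q_t transition across the inversion are illustrative assumptions.

```python
import numpy as np

def initial_profiles(z, theta_l_bl, q_t_bl, h_pbl, d_theta, d_q):
    """Idealised initial profiles: well-mixed layer up to h_pbl, an
    inversion with a 0.3 K/m theta_l gradient, +3 K/km above it, and
    q_t decreasing linearly to zero 2 km above the inversion top.
    d_theta, d_q are the inversion jumps of theta_l (K) and q_t (g/kg)."""
    inv_depth = d_theta / 0.3                   # inversion thickness (m)
    z_top = h_pbl + inv_depth                   # inversion top height (m)
    theta_l = np.where(
        z <= h_pbl, theta_l_bl,
        np.where(z <= z_top, theta_l_bl + 0.3 * (z - h_pbl),
                 theta_l_bl + d_theta + 3.0e-3 * (z - z_top)))
    q_ft = q_t_bl - d_q                         # q_t just above the inversion
    q_t = np.where(
        z <= h_pbl, q_t_bl,
        np.where(z <= z_top, q_t_bl - d_q * (z - h_pbl) / inv_depth,
                 np.maximum(q_ft * (1.0 - (z - z_top) / 2000.0), 0.0)))
    return theta_l, q_t
```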
In the simulations, the horizontal grids cover 10 km in each direction with a 50 m resolution. The vertical grid is case-dependent so that it extends from the sea surface up to a height that is 1.333 times the planetary boundary layer height (H_PBL). Coagulation, cloud-to-rain autoconversion and sedimentation processes were switched off during the first 1.5 h spin-up, which allows enough time for turbulence to develop. The simulation time was limited to 3.5 hours to prevent the model from drifting too far from the state representing the design variables. For the same reason, we nudged the liquid water potential temperature and SALSA aerosol concentrations towards their initial values using a relaxation time τ = 3600 s. This choice for the relaxation parameter is a compromise which aims to eliminate overly large changes in boundary layer height while still allowing radiative cooling to create mixing. Aerosol was nudged only in the first grid layer above the sea surface.
Once the simulations were ready (point D in Fig. 1), we collected standard model outputs, namely cloud top radiative cooling values and updraft velocities, from the last simulation hour and calculated averages over the domain (point 4 in Fig. 1). These simulation statistics are used in creating the updraft parameterisations (point E in Fig. 1).

In this study we created three different updraft parameterisations (point 5 in Fig. 1). The first parameterisation is a simple linear fit (LF) based on cloud top radiative cooling (Zheng et al., 2016). The inputs of the parameterisations are the design variables (Table 1) and/or LES run outputs (i.e. cloud top radiative cooling). The objective of the parameterisations is to reproduce the simulated domain-mean positive updraft velocities at the cloud base (an LES output) and to predict those from the selected input variables. Since the training inputs include both LES inputs and outputs, it is vital that the simulations do not drift too far away from the initial state.
The main statistical measures for the parameterisation intercomparisons are the coefficient of determination (R²) and the root mean square error (RMSE). R² is a measure that tells how well the regression model fits the data: it equals the proportion of variance in the dependent variable (y-axis values) that can be explained by the independent variable (x-axis values). However, R² does not indicate whether there are enough data points to make a solid conclusion. RMSE is frequently used to measure the differences between predicted values (here the parameterisations) and the values regarded as the ground truth (the LES runs). RMSE depends on the scale of the numbers used, i.e. it has the same physical unit as the original data, and it is sensitive to outliers.
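The two measures above can be written compactly; a minimal sketch:

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error between
    ground-truth values (here the LES runs) and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variance * n
    return 1.0 - ss_res / ss_tot, np.sqrt(np.mean(resid ** 2))
```

A perfect prediction gives R² = 1 and RMSE = 0; predicting the mean everywhere gives R² = 0.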

The linear fit approach is inspired by the observational study of Zheng et al. (2016), who found a strong correlation between cloud top radiative cooling (CTRC) and the updraft velocity at the cloud base (W_b), and proposed a linear parameterisation based on it.

The approach used in this study is similar to the approximation error correction method introduced in Lipponen et al. (2013) and Lipponen et al. (2018). In Lipponen et al. (2018), it was shown that predicting and correcting the approximation error of the output of an approximative model often leads to more accurate results than directly predicting the model output.
In this study, we trained a Random Forest (Breiman, 2001) to predict the error of the linear fit, i.e. the difference between the LES-based updraft velocity and the linear fit prediction, as a function of the input variables.
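The combination of the linear fit and its random-forest error correction (LFRF) can be sketched as follows. The class structure and hyperparameters are illustrative assumptions, not the configuration of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class LFRF:
    """Sketch of the linear-fit + random-forest approach: a linear fit
    of updraft velocity on cloud top radiative cooling (CTRC), plus a
    random forest trained on the residual of that fit."""

    def fit(self, X, ctrc, w):
        # w_lin.fit = a * CTRC + b
        self.a, self.b = np.polyfit(ctrc, w, 1)
        resid = w - (self.a * ctrc + self.b)       # approximation error of LF
        self.rf = RandomForestRegressor(n_estimators=100, random_state=0)
        self.rf.fit(X, resid)                      # predict the error from X
        return self

    def predict(self, X, ctrc):
        return self.a * ctrc + self.b + self.rf.predict(X)
```

Outside the range of the training inputs the forest contributes little structure, so the prediction falls back towards the linear CTRC dependency.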

Gaussian process emulator (GPE)
The third method we used is a Gaussian process emulator (O'Hagan, 1978; O'Hagan, 2006) that directly predicts the updraft velocity as a function of the design variables (Table 1). Our implementation is based on the Gaussian Process Regressor in the Scikit-learn machine learning package (Pedregosa et al., 2011), which uses Rasmussen and Williams (2006) as its theoretical foundation. Gaussian process regression models are based on multivariate normal distributions and covariance functions. Given input data, a Gaussian process regression model trained with a training dataset can predict the probability distribution of the outputs corresponding to the input data. For more information on the training and evaluation of Gaussian process regression models see, for example, Rasmussen and Williams (2006). The software used to produce the results is available in GitHub (Ahola et al., 2021a, d).
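A minimal Scikit-learn sketch of such an emulator is given below. The kernel choice and the toy target standing in for LES updraft output are illustrative assumptions, not those of the study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.random((60, 2))                          # scaled design variables
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]        # stand-in for LES updrafts

# RBF covariance plus a white-noise term; the emulator returns a full
# predictive distribution (mean and standard deviation) for new inputs.
gpe = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               normalize_y=True, random_state=0).fit(X, y)
mean, std = gpe.predict(X[:5], return_std=True)
```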

Post-processing the LES runs
In this study we focus on the updraft velocity at the cloud base, where the majority of the cloud droplets form. From the different definitions of cloud base updraft velocity (Romakkaniemi et al., 2009), we found that the domain-mean positive updraft velocity at the cloud base (Eq. 2) gives the best agreement with the Zheng et al. (2016) observations. Cloud top radiative cooling is calculated from Eq. (3), where R_i,base and R_i,top are the net (shortwave + longwave) radiative fluxes at the cloud base and top, respectively. Because net fluxes are defined to be positive upward, negative CTRC means cooling. The sum includes 3D (SB microphysics) or horizontally averaged 1D (SALSA microphysics) model outputs that are averaged over the last simulation hour. For SALSA microphysics, 3D radiation fields were not saved in order to reduce the high number of outputs. In the SB simulations, differences between CTRC values calculated from the 3D and 1D radiation outputs were found to be negligible.
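With the sign convention above (fluxes positive upward, negative CTRC meaning cooling), the cooling term can be sketched as a mean of base-minus-top flux differences. This is an interpretation of the definitions in the text; the exact summation in Eq. (3) may differ.

```python
import numpy as np

def cloud_top_radiative_cooling(net_flux_base, net_flux_top):
    """Mean cloud top radiative cooling from the net (SW + LW) radiative
    fluxes at the cloud base and top of each sample. With fluxes
    positive upward, more flux leaving the top than entering the base
    gives a negative value, i.e. cooling."""
    base = np.asarray(net_flux_base, dtype=float)
    top = np.asarray(net_flux_top, dtype=float)
    return float(np.mean(base - top))
```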
The training data for the parameterisations includes the design variables and the corresponding LES updraft velocity (Eq. 2) and CTRC (Eq. 3) outputs. Some simulations that diverged significantly from the initial conditions produced outliers, which reduced the accuracy of the parameterisations in representing the rest of the cases. Therefore, in all following results, before creating a parameterisation, we filtered out simulations where the cloud fraction was smaller than 0.61 or the cloud top rose more than 10 % (see Table 2). An initial analysis showed that these cases produced most of the outliers. All filtering parameters were evaluated from the last retrievable values of the simulations. The cases where the cloud top rose more than 10 % were mostly related to a deepening boundary layer; nudging was used to suppress the issue, but it was not entirely eliminated.
Results and discussion

Parameterisation intercomparison
In the comparisons of the three parameterisations, the LES runs are regarded as the ground truth, LF is regarded as a simple baseline parameterisation, LFRF improves the LF with a machine learning method, and GPE is a standalone machine learning method (Fig. 4). In nighttime simulations, cloud radiative cooling explains much less of the variance of updraft velocity than during daytime: the Linear Fit nighttime R² values for SB and SALSA are 0.36 (Fig. 4a) and 0.29 (Fig. 4c), respectively, compared to 0.67 for SB (Fig. 4b) and the corresponding daytime value for SALSA (Fig. 4d). For the intercomparison, the training data is shuffled randomly and then a 10-fold cross-validation is used. This means that for each fold, 90 % of the training data is used to train the parameterisation and predicted values are retrieved for the remaining 10 % of the points.
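The shuffled 10-fold cross-validation can be sketched as below; the data and regressor are synthetic stand-ins for the designs and parameterisations.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + rng.normal(0.0, 0.05, 100)

pred = np.empty_like(y)
# shuffle=True randomises the fold assignment; each point is predicted
# exactly once, by a model that never saw it during training.
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    fold_model = RandomForestRegressor(n_estimators=50, random_state=0)
    fold_model.fit(X[train], y[train])
    pred[test] = fold_model.predict(X[test])      # out-of-fold predictions
```

R² and RMSE are then computed between `y` and the out-of-fold `pred`.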

When comparing the parameterisations and simulation sets, R² and RMSE values are in line with each other, meaning that as R² increases (= better fit), RMSE decreases (= smaller error). Figure 5 shows that LFRF performs better than the other methods in all simulation sets based on the R² and RMSE values. GPE is a close second in all but the daytime SALSA simulations. Both LFRF and GPE work generally well. The probable reason why the LFRF method performs so well, outperforming even GPE, is that it is supported by an embedded dependency, the linear model LF. This is in line with Lipponen et al. (2013, 2018), where it was shown that including a dependency (= correcting a rough empirical model with a random forest) improved results compared to a pure random forest learning method. LF is the least accurate method in all simulation sets, except in SALSA day where it outperforms the GPE only slightly. Because LF shows worse results during night than during day (see Fig. 4), there is more room for improvement for the machine learning methods during nighttime. The parameterisations perform slightly better for the simulation sets with SB than for the sets with SALSA microphysics. One reason is that accounting for aerosol-cloud interactions increases the variability of model predictions, which can be difficult to capture in a parameterisation based on a relatively small set of training data. This additional variability can be seen as lower R² values in Fig. 4. The other main reason is that the number of computationally heavy SALSA simulations had to be limited to the lowest possible. A limited training data set will have an impact on the accuracy of the predictions.

Permutation feature importance
We calculated permutation feature importances to reveal the significance of individual input parameters for the LFRF and GPE updraft velocity parameterisations. The permutation feature importance is defined as the decrease in a model score (= R²) when a single feature (= input variable) value is randomly shuffled (Breiman, 2001; Pedregosa et al., 2011). This method breaks the relationship between the feature and the target; thus the decrease in the model score shows how much the model depends on the feature.
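The mechanics of permutation feature importance can be illustrated with a synthetic stand-in model: shuffling an informative feature degrades the score far more than shuffling an irrelevant one.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.05, 300)   # only feature 0 matters

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
# Shuffle each feature in turn and record the drop in the R2 score.
imp = permutation_importance(model, X, y, n_repeats=5, random_state=0)
```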

It should be noted that, as GPE is a standalone method, the variables that physically contribute to the updraft velocity are the most important features also in Fig. 6. On the other hand, when improving LF with the learning method (= LFRF), an important physical aspect (= CTRC) is already incorporated, and the variables that contribute to the error of the LF show up as important features. Hence, GPE is easier to interpret physically, while the results for LFRF tell which variables, beyond radiative cooling, impact the updraft velocity in the selected setup. LFRF can also be considered as a piecewise-defined constant function, where within a certain subset of inputs some features can be more important than outside the subset. As the learning method within LFRF is a random forest, these kinds of subsets can be very small and sharp-edged, meaning that training points outside a specific subset do not affect the outputs of the subset. In contrast, GPE makes predictions based on the whole training point domain. These distinct characteristics of GPE and LFRF indicate that important features for GPE are probably relevant in the whole training domain, while for LFRF, the features may be important only in certain subdomains.
Figure 6 shows that the most important parameters are LWP, cos μ, θ_L, Δq_t, and N_Ait. Of these variables, N_Ait is a microphysical variable and specific only to the SALSA microphysics scheme. LWP, θ_L and Δq_t are meteorological variables and common to all simulation sets, while cos μ is used only in the daytime simulation sets.
Unravelling the meaning of these variables is simplest to start with cos μ, as it affects the updraft velocity quite straightforwardly.
For the GPE parameterisation during daytime, cos μ is the most important feature because it strongly influences cloud top radiative cooling, which indicates that the learning method captures significant physical aspects that affect the updraft velocity.
For LFRF, cos μ is not that significant, since the linear fit for the updraft velocity (w_lin.fit) already incorporates the solar radiation dependency through cloud top radiative cooling. The boundary layer liquid water potential temperature (θ_L) and the boundary layer height (H_PBL) determine the cloud top temperature, so they have a direct impact on radiative cooling rates. This can be seen with GPE. With LFRF, radiative cooling is already incorporated through w_lin.fit, but LF does not work well with high and low θ_L values; hence θ_L is an important feature with LFRF.
A strong humidity jump (Δq_t) means a dry free troposphere, which allows strong radiative cooling. Radiative cooling is an important factor for turbulence, and therefore Δq_t is an important feature for GPE. LF accounts for the radiative cooling, so Δq_t is not an important feature with LFRF.
Regarding aerosol parameters, CDNC is not shown as an important feature with GPE and SB microphysics. This is likely because the bulk model uses saturation adjustment to calculate cloud water mixing ratios. However, with SALSA, a change in droplet concentration affects the condensation/evaporation rate, buoyancy and thus updrafts, which is consequently seen as a high importance of the aerosol parameters. This phenomenon (= high aerosol-related feature importances) is better shown with LFRF (Figs. 6e and 6g), as it is most likely a significant characteristic only within a subset of simulations with a low aerosol concentration, and GPE does not capture it as well because it is not a common feature. Interestingly, with LFRF the relative role of θ_L seems to be smaller when SALSA is employed instead of SB.

Conclusions
In this study we developed three cloud base updraft velocity parameterisations that can be used in global atmospheric models. As can be expected, the simple LF works well for cases where radiative cooling is the main driver for turbulence. The machine learning techniques perform better, because they account for additional variables such as cloud thickness and inversion strength, which have an additional influence on turbulence. Overall, LFRF performs slightly better than GPE.
When choosing between LFRF and GPE, there are some points that should be considered. GPE is purely based on machine learning without any underlying information about physical processes, while LFRF includes the effect of cloud top radiative cooling. Therefore, when extrapolating outside of the range of the training inputs, the GPE prediction is the mean of the training outputs, which may not be a good prediction. LFRF, on the other hand, reduces to the radiative cooling dependency described by the linear fit when outside of the range of the training inputs. The downside of the LFRF method is that it requires the value of the cloud top radiative cooling (CTRC) from the host model, which must be taken from the previous model time step. As a result, any uncertainties in the input CTRC will influence the predictions, while CTRC is not needed for the GPE. On the other hand, using CTRC as an input provides a possibility to extend LFRF also to cases with thin overlaying cloud layers, whereas extending GPE to such cases would require including these cases in the training set.
We developed updraft parameterisations for two different cloud microphysics modules implemented in UCLALES-SALSA.
One microphysics module (SALSA) accounts for aerosol-cloud interactions and the other (SB) uses prescribed values for CDNC. Full prediction of the effect of aerosols on clouds and updraft velocity is complicated by the fact that meteorological parameters such as the inversion strength and the liquid water path dominate over aerosol effects. The increased computational cost of SALSA also limited the size of the training data set, which reduced the accuracy of the predictions. Nevertheless, we were able to show that aerosol number concentration has an impact on updraft velocities. In addition, SALSA could be used to predict properties that are more dependent on aerosol, such as rain formation.

To conclude, machine learning techniques are becoming commonly used methods for parameterising complex dependencies (e.g. Rasp et al., 2018). Our work shows that the dependency of updraft velocity on the boundary layer state can be predicted with reasonable accuracy, and that machine learning methods are also able to capture the effect of variables that are important only in a limited subset of all possible conditions, as is the case with aerosol number concentrations. Our results also show that machine learning is effective when it is supported with an underlying physical dependency, which is in line with previous studies (Lipponen et al., 2013, 2018; Silva et al., 2021; Kashinath et al., 2021). The parameterisations introduced are valid only for marine stratocumulus, but extending the training set to cover the effects of surface heat fluxes and wind shear would improve the physical foundation of updraft parameterisations and increase the applicability of the method also to continental stratiform boundary layer clouds (e.g., Matheou and Teixeira, 2019).