We address the problem of identifying the evaporation rates for neutral molecular clusters from synthetic (computer-simulated) cluster concentrations. We applied Bayesian parameter estimation using a Markov chain Monte Carlo (MCMC) algorithm to determine cluster evaporation/fragmentation rates from synthetic cluster distributions generated by the Atmospheric Cluster Dynamics Code (ACDC) and based on gas kinetic collision rate coefficients and evaporation rates obtained using quantum chemical calculations and detailed balances. The studied system consisted of electrically neutral sulfuric acid and ammonia clusters with up to five molecules of each type. We then treated the concentrations generated by ACDC as synthetic experimental data. With the assumption that the collision rates are known, we tested two approaches for estimating the evaporation rates from these data. First, we studied a scenario where time-dependent cluster distributions are measured at a single temperature before the system reaches a steady state. In the second scenario, only steady-state cluster distributions are measured but at several temperatures. Additionally, in the latter case, the evaporation rates were represented in terms of cluster formation enthalpies and entropies. This reparameterization reduced the number of unknown parameters, since several evaporation rates depend on the same cluster formation enthalpy and entropy values. We also estimated the evaporation rates using previously published synthetic steady-state cluster concentration data at one temperature and compared our two cases to this setting. Both the time-dependent and the two-temperature steady-state concentration data allowed us to estimate the evaporation rates with less variance than in the steady-state single-temperature case.

We show that temperature-dependent steady-state data outperform single-temperature time-dependent data for parameter estimation, even if only two temperatures are used. We can thus conclude that for experimentally determining evaporation rates, cluster distribution measurements at several temperatures are recommended over time-dependent measurements at one temperature.

The formation of molecular clusters, and their subsequent growth to aerosol particles, is an important yet poorly understood process in our atmosphere. Clusters and aerosols affect both climate, air chemistry

Recent developments in mass spectrometers have enabled the detection, quantification and chemical characterization of ionic clusters containing between one and several tens of molecules at atmospherically relevant mixing ratios (around or below 1 part per trillion (ppt))

Even when the atmospheric cluster distributions can be accurately deduced from experimental data, these distributions do not quantify the individual kinetic parameters, such as the cluster collision and evaporation rates

Despite uncertainties involved in computational estimates of collision and evaporation rates, cluster population dynamic models based on Becker–Döring equations have been able to predict the sulfuric acid concentration dependence of cluster concentrations

In mathematical terms, the prediction of cluster concentrations using known collision and evaporation rates is called the forward problem. The associated inverse problem is to use known cluster concentrations to deduce the collision and evaporation rates. The inverse problem can be addressed with Bayesian approaches such as Markov chain Monte Carlo (MCMC) methods. In a recent paper by

In this study, we test which combinations of experimental data and fitted parameters lead to the best identification of the evaporation rates. As experiments are expensive and time consuming to perform, we use synthetic cluster concentration data created from ACDC simulations to test if the use of time-dependent cluster distribution data would significantly improve the accuracy of the evaporation rates. The use of synthetic data also allows us to know for sure if our inverse modelling actually produces the correct kinetic parameters, which would not be possible with experimental concentration data. As in the

For simplicity, we consider the case of neutral sulfuric acid–ammonia clusters containing up to five molecules of each type. Studying neutral clusters has the advantage that we can restrict ourselves to a smaller set of kinetic parameters and ignore uncertainties related to charging and neutralization processes. In situations where a large fraction of the clusters are charged, accurate modelling would require at least 3 times as many parameters, as the negative, positive and neutral cluster populations all interact with each other. The downside of this simplification is that we lose the direct connection to potential real-life experiments, as neutral atmospheric clusters cannot currently be measured without first charging them.

We investigate three different scenarios for estimating evaporation rates. First, we use steady-state concentration measurements determined at a single temperature, similar to the approach used in

We simulated the time evolution of cluster concentrations using collision rates computed from kinetic gas theory and evaporation rates computed from the Gibbs free energies reported by

Neutral molecular clusters included in the model system (16 in total). The first column indicates the number of sulfuric acid molecules in the cluster; the second column indicates the number of ammonia molecules.

Monomer concentrations used in simulations.

Our MCMC results are not specific to the set of molecular clusters considered here. This is supported by the fact that although the size of the system (the number of clusters or, more precisely, the maximum size of the clusters included in the simulations) has an impact on the particle formation rates at high temperatures (

Two data sets were created. In the first set, we generated time-dependent concentrations for each cluster type, measured at 1.5

Finally, we added measurement error (noise) to the cluster concentrations in both data sets. We call the resulting noisy cluster concentrations “synthetic data”. Our measurement error was sampled from a multivariate Gaussian distribution, with the variance depending on cluster type
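As a minimal sketch of such a noise model, the snippet below assumes a diagonal covariance in which each cluster type has a Gaussian error with standard deviation proportional to its noise-free concentration. The function name, the relative noise levels and the example concentrations are illustrative assumptions, not values used in this study.

```python
import random


def add_measurement_noise(concentrations, rel_sigma, seed=None):
    """Return noisy copies of noise-free cluster concentrations.

    Assumption: independent Gaussian errors (diagonal covariance) whose
    standard deviation is rel_sigma[i] * concentrations[i].
    """
    if len(concentrations) != len(rel_sigma):
        raise ValueError("need one relative noise level per cluster type")
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, s * c) for c, s in zip(concentrations, rel_sigma)]


# Hypothetical noise-free concentrations (cm^-3) for three cluster types.
clean = [1.0e6, 2.5e4, 3.0e2]
noisy = add_measurement_noise(clean, rel_sigma=[0.01, 0.01, 0.05], seed=1)
```

Fixing the seed makes a given synthetic data set reproducible, while different seeds give independent noise realizations of the same underlying concentrations.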

We used an MCMC-based approach to estimate the evaporation rates which reproduce the synthetic cluster concentration data. Unlike optimization algorithms which compute a single optimal parameter set, MCMC methods sample from a target distribution which contains the most likely combinations of parameter values for the given data. Multiple samples of possible parameter sets are taken along a random walk in the target distribution and are saved as a parameter “chain”. As the length of the chain increases, the sampled sets converge to a probability (posterior) distribution of parameters, which describes how probable each combination of parameter values is given the data. The particular MCMC-based algorithm we use is delayed rejection adaptive Metropolis (DRAM), which is an extended variant of the classical Metropolis algorithm
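The random-walk sampling idea can be illustrated with a plain Metropolis sampler for a single parameter. This is a simplified sketch, not the DRAM implementation used in this study: the one-parameter forward model (a concentration equal to 10^(-theta), with theta the base 10 logarithm of a rate), the prior bounds, the 5 % noise level and all function names are assumptions made for illustration.

```python
import math
import random


def log_posterior(theta, data, sigma, lo=-4.0, hi=4.0):
    """Flat prior on [lo, hi] plus a Gaussian likelihood.

    Toy forward model (illustrative assumption): the observed
    concentration equals 10**(-theta), where theta = log10 of a rate.
    """
    if not lo <= theta <= hi:
        return -math.inf
    model = 10.0 ** (-theta)
    return -0.5 * ((data - model) / sigma) ** 2


def metropolis(log_post, theta0, step, n_samples, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    chain = []
    for _ in range(n_samples):
        cand = theta + rng.gauss(0.0, step)
        lp_cand = log_post(cand)
        # Accept with probability min(1, posterior ratio).
        if lp_cand >= lp or rng.random() < math.exp(lp_cand - lp):
            theta, lp = cand, lp_cand
        chain.append(theta)  # a rejected move repeats the current value
    return chain


true_theta = 0.5
data = 10.0 ** (-true_theta)   # noise-free toy "measurement"
sigma = 0.05 * data            # assumed 5 % measurement error
chain = metropolis(lambda t: log_posterior(t, data, sigma),
                   theta0=0.0, step=0.03, n_samples=20000)
kept = chain[2000:]            # discard the initial transient (burn-in)
posterior_mean = sum(kept) / len(kept)
```

The retained part of the chain approximates the posterior distribution of theta; its spread, not only its mean, is the quantity of interest when judging how well a rate is identified.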

We emphasize that there are currently no theoretical principles or experimental results which set sound restrictions for even the order of magnitude of the evaporation rates. However, evaporation rates much lower than

For the cluster formation enthalpies, we chose an upper limit of 0

Additional restrictions on the cluster formation enthalpies arising from the requirement that each individual molecule is bound The cluster formation enthalpy of the

The upper limit for the formation entropies was set to 0

We first performed DRAM parameter estimation from both steady-state and time-dependent cluster concentrations at 278 K, treating evaporation rates as the unknown parameters

Next, we performed parameter estimation based on steady-state cluster concentrations at two temperatures (278 and 292 K). The number of output coefficients in this case was

Many evaporation/fragmentation reactions have the same clusters as products, and thus several of the pairs
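The link between thermodynamic parameters and evaporation rates can be sketched with the standard detailed-balance expression, in which the evaporation rate of a parent cluster into two products equals the collision rate coefficient of the products times P_ref/(k_B T) times exp[(ΔG_parent − ΔG_a − ΔG_b)/(RT)], with ΔG = ΔH − TΔS. The snippet below is an illustration under stated assumptions: SI molar units, a reference pressure of 1 atm, and hypothetical numerical values for the collision coefficient and the dimer formation enthalpy and entropy.

```python
import math

R_GAS = 8.314462618   # J mol^-1 K^-1
K_B = 1.380649e-23    # J K^-1
P_REF = 101325.0      # Pa; reference pressure of the free energies (assumed 1 atm)


def gibbs_energy(dh, ds, temperature):
    """Delta G = Delta H - T * Delta S (dh in J/mol, ds in J/(mol K))."""
    return dh - temperature * ds


def evaporation_rate(beta, parent, product_a, product_b, temperature):
    """Detailed-balance evaporation rate (s^-1) for parent -> a + b.

    beta: collision rate coefficient of a + b (m^3 s^-1).
    parent, product_a, product_b: (dH, dS) formation values of each cluster.
    """
    dg = (gibbs_energy(*parent, temperature)
          - gibbs_energy(*product_a, temperature)
          - gibbs_energy(*product_b, temperature))
    return beta * P_REF / (K_B * temperature) * math.exp(dg / (R_GAS * temperature))


# Hypothetical values: monomers are the reference state (dH = dS = 0).
monomer = (0.0, 0.0)
dimer = (-80.0e3, -140.0)   # J/mol and J/(mol K), illustrative only
beta = 5.0e-16              # m^3 s^-1, illustrative only
rate_278 = evaporation_rate(beta, dimer, monomer, monomer, 278.0)
rate_292 = evaporation_rate(beta, dimer, monomer, monomer, 292.0)
```

Because each (ΔH, ΔS) pair enters every evaporation rate in which that cluster appears as parent or product, sampling the thermodynamic parameters determines all evaporation rates at any temperature at once.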

To create a reliable sample from the underlying parameter distribution, the length of the MCMC chain must be “large enough”

In the MCMC simulations, all sets of parameters which produce cluster concentrations within the allotted noise level of the data (0.001 %) are kept in the chain. The sampling procedure is outlined in Fig.

Schematic representation of the study methods.

A graphical representation of the steady-state cluster concentration data at 278 K, as a function of the number of acid molecules in the clusters, is given in Fig.

Steady-state cluster concentrations for the clusters containing sulfuric acid and a varying number of ammonia molecules, as a function of the number of acid molecules, for

Next, we determine the base 10 logarithms of the evaporation rate coefficients from the synthetic data. Since the noise added to the cluster concentrations results in a random bias towards an increase (or decrease) from the original values produced by ACDC, the estimates of parameters derived from synthetic data are likely to be biased. In order to average out the effects of this random bias, we generated three sets of synthetic data by adding random increments to the original concentration values. Utilizing these data sets, three independent MCMC runs were conducted, each run containing 3 million parameter samples. An example of one of the sampled chains is depicted in Figs.

Here, stationary means that the probability of transitioning from the current state at position

For each evaporation rate, we calculate the one-dimensional (that is, depending only on the evaporation rate) marginal posterior distribution as the position-wise average of the stationary parts of the three sampled chains. This procedure is needed to average the bias originating from random noise. The resulting distributions are given in Figs.

One-dimensional marginal posterior distributions (for parameter indexes ranging from 1 to 28) of the base 10 logarithm of the evaporation rates (units given in

One-dimensional marginal posterior distributions (for parameter indexes ranging from 29 to 39) of the base 10 logarithm of the evaporation rates (units given in

All the evaporation rates larger than

Comparison of 95 % confidence intervals (orange box plots) of base 10 logarithms of the evaporation rates determined from

The pairwise marginal posterior distributions for the estimated evaporation rates are illustrated in Figs.

Based on the parameter estimation results, we conclude that single-temperature steady-state cluster concentrations are not enough to estimate the evaporation rates with reasonable accuracy (i.e. to obtain upper and lower limits for the rates that reasonably restrict the cluster kinetics involved in the molecular-level process).

The data set for time-dependent cluster concentrations is much larger than the data set for steady-state cluster concentrations, as it contains the concentration values at multiple time instances.
The time-dependent data also contain information about the time derivatives of the concentrations (see Fig.

One-dimensional marginal posterior distributions (for parameter indexes ranging from 1 to 28) of the base 10 logarithm of the evaporation rates (units given in

From this time-dependent cluster concentration data set, we then conduct MCMC runs as described in Sect. 2.2. As in the steady-state setting, we conduct three independent MCMC runs to determine the base 10 logarithms of the evaporation rates. One of these runs is presented in Figs.

As seen in Figs.

One-dimensional marginal posterior distributions (for parameter indexes ranging from 29 to 39) of the base 10 logarithm of the evaporation rates (units given in

The one-dimensional marginal posterior distributions for the estimated parameters are shown in Figs.

Pairwise marginal posterior distributions for the evaporation rates are plotted in Figs.

In Table

Pairwise marginal posterior distributions (for parameter indexes ranging from 1 to 8) of the cluster formation enthalpies and entropies determined from steady-state cluster concentration measurements at two temperatures (

We determined cluster formation enthalpies and entropies based on two sets of steady-state cluster concentrations, corresponding to two temperatures: 278 and 292 K. These data sets are plotted in Figs.

The one-dimensional marginal posterior distributions of the formation enthalpies and entropies, built from the stationary parts of the three sampled chains merged together, are shown in Fig.

One-dimensional marginal posterior distributions of the cluster formation enthalpies (units given in

Although the posterior distributions of the formation enthalpies and entropies of

We applied Bayesian parameter estimation using a Markov chain Monte Carlo (MCMC) algorithm to identify cluster evaporation/fragmentation rates from synthetic cluster distribution data, assuming that the cluster collision rates are known. We used the Atmospheric Cluster Dynamics Code (ACDC) together with evaporation rates based on quantum chemistry and detailed balance to generate synthetic data for the purpose of optimizing and validating the parameter estimation.

First, we sought to determine the cluster evaporation rates from both steady-state and time-dependent cluster concentration data at one temperature. We were only able to identify a subset of the free parameters (evaporation rates) from the available data using either of these approaches.

Next, we used steady-state concentration data corresponding to two different temperatures. We introduced a reparameterization which expressed the evaporation rates in terms of temperature and cluster formation enthalpies and entropies. Using steady-state concentrations at two temperatures allowed us to apply two general principles of inverse problems/Bayesian estimation to the problem of estimating evaporation rates. First, the two-temperature data set enabled us to reformulate the problem in a numerically effective way (in terms of formation enthalpies and entropies), which reduced the number of unknown parameters we sought to identify. Second, it also lessened the stiffness of the system, as the cluster formation enthalpies and entropies for our system span a much smaller range than the evaporation rates. We demonstrated that steady-state concentration data at two different temperatures could be used to determine all the unknown formation enthalpies and entropies, and thus the evaporation rates, to within acceptable accuracy. In practice, the most important evaporation rates for modelling new particle formation are those which are roughly of the same order of magnitude as the rates at which the clusters collide with the vapour molecules. If we assume that the mixing ratios for the clustering vapours are in the ppt–ppb range and use kinetic gas theory collision rates for small molecules and nanometre-sized clusters, we should obtain evaporation rates approximately in the range of

In general, the accuracy of the MCMC results naturally increases when we include additional data. In particular, including more concentration data measured at different ammonia concentrations will yield better estimates for the evaporation rates. The sensitivity of the estimates to the number of ammonia concentrations, as well as different sulfuric acid source rates, will be considered in future work.

The approach presented here can also be applied to infer evaporation rates from mass spectrometric measurements of molecular cluster concentrations. This naturally requires accounting for the process of charging neutral clusters, with its associated instrumental and data-analysis-related uncertainties. A clear conclusion of our proof-of-concept study is that steady-state data at different temperatures are more useful for determining evaporation rates than time-dependent data at a single temperature. Moreover, reliable steady-state concentrations of clusters at various temperatures are generally easier to obtain experimentally (e.g. in chamber experiments) compared to time-dependent concentrations. This finding demonstrates the more general feature of modelling of the type performed here: it can be used to optimize planning of experiments and thus save both time and resources. Determining very low (below

The kinetics of cluster formation are described by Becker–Döring equations
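To illustrate the structure of such birth–death equations, the sketch below integrates a strongly simplified one-component system in which clusters grow and shrink only by single monomers, the monomer concentration is held constant, and the largest cluster can only grow out of the system. The real model is two-component and includes cluster–cluster collisions and fragmentations; the rate values, arbitrary units and the explicit Euler scheme used here are illustrative assumptions.

```python
def fluxes(c, beta, gamma):
    """Net growth fluxes j[i] from size i+1 to size i+2 (0-based index i).

    beta[i]: monomer collision rate coefficient for a cluster of size i+1.
    gamma[i]: monomer evaporation rate from a cluster of size i+2.
    The largest cluster only grows out of the system (no back-flux).
    """
    n = len(c)
    j = [beta[i] * c[0] * c[i] - gamma[i] * c[i + 1] for i in range(n - 1)]
    j.append(beta[n - 1] * c[0] * c[n - 1])  # outgrowth from the largest size
    return j


def simulate(c0, beta, gamma, dt, n_steps):
    """Explicit Euler integration with the monomer concentration held constant."""
    c = list(c0)
    for _ in range(n_steps):
        j = fluxes(c, beta, gamma)
        for i in range(1, len(c)):
            # dc/dt = flux in from the smaller size - flux out to the larger size
            c[i] += dt * (j[i - 1] - j[i])
    return c


beta = [1.0] * 4    # illustrative collision coefficients (arbitrary units)
gamma = [0.5] * 3   # illustrative evaporation rates (arbitrary units)
c_final = simulate([1.0, 0.0, 0.0, 0.0], beta, gamma, dt=0.01, n_steps=5000)
steady_fluxes = fluxes(c_final, beta, gamma)
```

At steady state all net fluxes are equal, which is why steady-state concentrations at one temperature constrain only certain combinations of collision and evaporation rates. For realistic rate values spanning many orders of magnitude the system is stiff, and an implicit solver would be needed in place of the explicit Euler step.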

We now specify the quantity and type of sinks and sources included in our studies. We assume that the concentration of ammonia monomers is constant, while sulfuric acid monomers are supplied to the system at a constant rate comprising

Let

The cluster evaporation rates

We first select the flat prior distribution from which we will initially sample unknown parameters, as we wish to generate physically reasonable parameter estimates. Therefore, we generate unknown parameters within the chosen minimum and maximum bounds where all the points are equally likely to be sampled. Please see Sect. 2.2.3 and Tables 3 and 4 for more details. From the prior distribution, a starting guess for the parameters

The Metropolis algorithm then requires us to specify how to sample new parameter values

Next, we run the ACDC and Fortran simulations with the parameter values

We remark here that the likelihoods

Our implementation of the DRAM

First, we use the adaptive Metropolis (AM)

Second, we carry out local adaptation of the proposal distribution using the delayed rejection (DR) algorithm
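The delayed rejection idea can be sketched in one dimension as follows: when a first, wide proposal is rejected, a second, narrower proposal is tried from the same point, and its acceptance probability is corrected so that the chain still targets the same distribution. The toy standard normal target, the proposal widths and the shrink factor are illustrative assumptions; this is not the implementation used in this study.

```python
import math
import random


def log_target(x):
    """Toy target: standard normal log-density up to a constant (assumption)."""
    return -0.5 * x * x


def log_gauss(a, b, s):
    """Log-density (up to a constant) of proposing b from a with width s."""
    return -0.5 * ((b - a) / s) ** 2


def accept_prob(lp_from, lp_to):
    """First-stage Metropolis acceptance probability."""
    return 1.0 if lp_to >= lp_from else math.exp(lp_to - lp_from)


def dr_step(x, s1, shrink, rng):
    """One Metropolis step with a single delayed-rejection retry."""
    lp_x = log_target(x)
    y1 = x + rng.gauss(0.0, s1)
    lp_y1 = log_target(y1)
    a1 = accept_prob(lp_x, lp_y1)
    if rng.random() < a1:
        return y1
    # Stage 2: retry from x with a narrower symmetric proposal.
    y2 = x + rng.gauss(0.0, s1 * shrink)
    lp_y2 = log_target(y2)
    a1_rev = accept_prob(lp_y2, lp_y1)  # stage-1 acceptance of the move y2 -> y1
    if a1_rev >= 1.0:
        return x                        # second-stage acceptance probability is zero
    log_ratio = (lp_y2 + log_gauss(y2, y1, s1) + math.log(1.0 - a1_rev)
                 - lp_x - log_gauss(x, y1, s1) - math.log(1.0 - a1))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y2
    return x


rng = random.Random(0)
x, chain = 0.0, []
for _ in range(20000):
    x = dr_step(x, s1=5.0, shrink=0.2, rng=rng)
    chain.append(x)
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
```

The first-stage width is deliberately too large here, so that many first proposals are rejected and the second stage does useful work; the sample mean and variance of the chain should nevertheless approach those of the target.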

In summary, our application of the DRAM algorithm combines the AM procedure with a two-stage DR modification. In the first stage, our algorithm carries out the Metropolis regime with AM adaptation. The proposal covariance at the initialization of DR (denoted as

This DRAM parameter estimation was conducted using the

Parameter chains (for parameter indexes ranging from 1 to 18) of the base 10 logarithm of the evaporation rates (units given in

Parameter chains (for parameter indexes ranging from 19 to 39) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 1 to 8) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 9 to 16) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 17 to 24) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 25 to 32) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 33 to 39) of the base 10 logarithm of the evaporation rates (units given in

Time-dependent cluster concentrations. Simulated time evolution of concentrations for different cluster types at temperature

Parameter chains (for parameter indexes ranging from 1 to 28) of the base 10 logarithm of the evaporation rates (units given in

Parameter chains (for parameter indexes ranging from 29 to 39) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 1 to 8) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 9 to 16) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 17 to 24) of the base 10 logarithm of the evaporation rates (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 25 to 32) of the base 10 logarithm of the evaporation rates (units given in

Evaporation rates (units given in


Pairwise marginal posterior distributions (for parameter indexes ranging from 33 to 39) of the base 10 logarithm of the evaporation rates (units given in

Steady-state cluster concentrations for the clusters containing sulfuric acid and a varying number of ammonia molecules as a function of the number of acid molecules for

Parameter chains of the cluster formation enthalpies (units given in

Pairwise marginal posterior distributions (for parameter indexes ranging from 9 to 16) of the cluster formation enthalpies and entropies determined from steady-state cluster concentration measurements at two temperatures (

Pairwise marginal posterior distributions (for parameter indexes ranging from 17 to 24) of the cluster formation enthalpies and entropies determined from steady-state cluster concentration measurements at two temperatures (

Pairwise marginal posterior distributions (for parameter indexes ranging from 25 to 28) of the cluster formation enthalpies and entropies determined from steady-state cluster concentration measurements at two temperatures (

Thermodynamic parameters identified from steady-state data measured at two temperatures (278 and 292 K). The last column presents the quantum-chemistry-based values from

One-dimensional marginal distributions (for parameter indexes ranging from 1 to 28) of the base 10 logarithm of the evaporation rates (units given in

One-dimensional marginal distributions (for parameter indexes ranging from 29 to 39) of the base 10 logarithm of the evaporation rates (units given in

Evaporation rates at temperature 278 K (units given in


The code is available via the following Zenodo repository:

AS produced the codes and conducted all the computational experiments for the generation of the synthetic data and the MCMC parameter estimation, and prepared all the plots presented in the paper. TB, AS, TK, HV and HH are responsible for writing the manuscript. TO assisted with the generation of the synthetic data, performed sanity checks of the results and gave valuable comments regarding the manuscript. TH and TB actively participated in the development of the methodological approach. ML provided technical assistance with the

The authors declare that they have no conflict of interest.

We thank the European Research Council project 692891-DAMOCLES, Academy of Finland (project no. 307331) and University of Helsinki: Faculty of Science ATMATH project for funding, Helsinki University Library for covering the open access fees and the CSC-IT Centre for Science in Espoo, Finland, for computational resources. We also thank Olli Pakarinen (Institute for Atmospheric and Earth System Research, University of Helsinki, Helsinki, Finland) for advice in plotting the synthetic data used in the present study.

This research has been supported by the European Research Council project 692891-DAMOCLES, the Academy of Finland (grant no. 307331) and University of Helsinki: Faculty of Science ATMATH project. Open-access funding was provided by Helsinki University Library.

This paper was edited by Fangqun Yu and reviewed by two anonymous referees.