Identification of molecular cluster evaporation rates, cluster formation enthalpies and entropies by Monte Carlo method

Shcherbacheva, Anna; Balehowsky, Tracey; Kubečka, Jakub; Olenius, Tinja; Helin, Tapio; Haario, Heikki; Laine, Marko; Kurtén, Theo; Vehkamäki, Hanna

doi:https://doi.org/10.5194/acp-20-15867-2020

Articles | Volume 20, issue 24


Research article

21 Dec 2020


Abstract

We address the problem of identifying the evaporation rates for neutral molecular clusters from synthetic (computer-simulated) cluster concentrations. We applied Bayesian parameter estimation using a Markov chain Monte Carlo (MCMC) algorithm to determine cluster evaporation/fragmentation rates from synthetic cluster distributions generated by the Atmospheric Cluster Dynamics Code (ACDC) and based on gas kinetic collision rate coefficients and evaporation rates obtained using quantum chemical calculations and detailed balances. The studied system consisted of electrically neutral sulfuric acid and ammonia clusters with up to five of each type of molecules. We then treated the concentrations generated by ACDC as synthetic experimental data. With the assumption that the collision rates are known, we tested two approaches for estimating the evaporation rates from these data. First, we studied a scenario where time-dependent cluster distributions are measured at a single temperature before the system reaches a steady state. In the second scenario, only steady-state cluster distributions are measured but at several temperatures. Additionally, in the latter case, the evaporation rates were represented in terms of cluster formation enthalpies and entropies. This reparameterization reduced the number of unknown parameters, since several evaporation rates depend on the same cluster formation enthalpy and entropy values. We also estimated the evaporation rates using previously published synthetic steady-state cluster concentration data at one temperature and compared our two cases to this setting. Both the time-dependent and the two-temperature steady-state concentration data allowed us to estimate the evaporation rates with less variance than in the steady-state single-temperature case.

We show that temperature-dependent steady-state data outperform single-temperature time-dependent data for parameter estimation, even if only two temperatures are used. We can thus conclude that for experimentally determining evaporation rates, cluster distribution measurements at several temperatures are recommended over time-dependent measurements at one temperature.


Received: 08 Nov 2019 – Discussion started: 04 May 2020 – Revised: 14 Oct 2020 – Accepted: 18 Oct 2020 – Published: 21 Dec 2020

1 Introduction

The formation of molecular clusters, and their subsequent growth to aerosol particles, is an important yet poorly understood process in our atmosphere. Clusters and aerosols affect climate, air chemistry (Yu and Turco, 2000), evapotranspiration in forest environments (Yan et al., 2018) and many other atmospheric processes (Lee et al., 2003).

Recent developments in mass spectrometers have enabled the detection, quantification and chemical characterization of ionic clusters containing between one and several tens of molecules at atmospherically relevant mixing ratios (around or below 1 part per trillion (ppt)) (Eisele and Hanson, 2000; Junninen et al., 2010; Zhao et al., 2010; Almeida et al., 2013; Ehn et al., 2014; Bianchi et al., 2016). Molecular clusters in atmospheric conditions are predominantly electrically neutral and must thus be charged prior to mass spectrometric detection. This may affect the measurement results, as only part of the sample molecules or clusters may be charged (Hyttinen et al., 2018), and the charging may also alter cluster compositions. For example, for sulfuric acid–base clusters, negative charging tends to lead to a loss of base molecules and positive charging to a loss of acid molecules (Ortega et al., 2012). Modelling is thus needed to connect measured ion cluster distributions to the original neutral population.

Even when the atmospheric cluster distributions can be accurately deduced from experimental data, these distributions do not quantify the individual kinetic parameters, such as the cluster collision and evaporation rates (Kupiainen-Määttä, 2016). The collision rates may be computed from kinetic gas theory or classical trajectory simulations with reasonable accuracy (Matsugi, 2018), although recent research has shown that long-range attractive interactions may enhance collision rates (Yang et al., 2018), for example, by around a factor of 2–3 for H₂SO₄−H₂SO₄ collisions (Halonen et al., 2019). These relatively minor uncertainties in the collision rates are dwarfed by the error margins of cluster evaporation rates. In computational applications, evaporation rates are usually computed using the detailed balance assumption together with the free energies of cluster formation, which can in turn be computed using quantum chemical (QC) methods (Kurtén et al., 2007; Ortega et al., 2012; Elm et al., 2013; Elm and Kristensen, 2017; Yu et al., 2018). Unfortunately, the evaporation rates depend exponentially on the free energies, and typically observed variations of up to several kcal mol⁻¹ between the different applicable QC methods thus translate into orders of magnitude of differences in evaporation rates (Kupiainen-Määttä et al., 2013; Nadykto et al., 2014).
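The exponential sensitivity described above can be illustrated with a short calculation. The sketch below assumes only that an evaporation rate scales as exp(ΔG/RT); the function name and the example temperature of 298.15 K are illustrative choices and are not taken from this study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def evaporation_rate_factor(delta_g_error_kcal, temperature_k=298.15):
    """Multiplicative change in an evaporation rate caused by an error in
    the formation free energy, assuming rate ~ exp(dG / (R T))."""
    return math.exp(delta_g_error_kcal / (R_KCAL * temperature_k))

if __name__ == "__main__":
    for err in (1.0, 2.0, 3.0):
        factor = evaporation_rate_factor(err)
        print(f"{err:.0f} kcal/mol -> factor {factor:.1f} "
              f"({math.log10(factor):.2f} orders of magnitude)")
```

Under this assumption, each kcal mol⁻¹ of uncertainty corresponds to roughly 0.7 orders of magnitude in the evaporation rate near room temperature, so a spread of a few kcal mol⁻¹ between QC methods readily produces differences of several orders of magnitude.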

Despite uncertainties involved in computational estimates of collision and evaporation rates, cluster population dynamic models based on Becker–Döring equations have been able to predict the sulfuric acid concentration dependence of cluster concentrations (Olenius et al., 2013 a), and even absolute particle formation rates (Almeida et al., 2013) in sulfuric acid–ammonia and sulfuric acid–dimethylamine (DMA) systems, without empirical model calibration or parameter tuning. The Becker–Döring equations are a system of ordinary differential equations (ODEs), which account for cluster birth and death processes (which depend on the collision and evaporation rates), as well as external cluster sinks and sources. In both studies (Olenius et al., 2013 a; Almeida et al., 2013), these equations were implemented through the Atmospheric Cluster Dynamics Code (ACDC) (McGrath et al., 2012), using kinetic gas theory collision rates and standard quantum chemistry techniques for computing cluster formation free energies (and thus evaporation rates).
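As a rough illustration of such birth–death equations, the following sketch integrates a two-species toy system (monomer and dimer only) with a constant monomer source, one collision process, one evaporation process and a size-independent sink. It is not the ACDC model: the collision rate constant, evaporation rate and sink rate are hypothetical values, and a hand-written Runge-Kutta integrator stands in for a production ODE solver. Only the source rate is the value quoted in Sect. 2.1.

```python
import math

Q = 6.3e4       # monomer source rate quoted in Sect. 2.1 (cm^-3 s^-1)
K_COLL = 5e-10  # hypothetical dimer-forming collision rate constant (cm^3 s^-1)
GAMMA = 1e-3    # hypothetical dimer evaporation rate (s^-1)
SINK = 1e-3     # hypothetical size-independent loss rate (s^-1)

def rhs(c):
    """Time derivatives of the (monomer, dimer) concentrations."""
    c1, c2 = c
    formed = K_COLL * c1 * c1  # dimers formed per cm^3 per second
    broken = GAMMA * c2        # dimers evaporated per cm^3 per second
    return (Q - 2.0 * formed + 2.0 * broken - SINK * c1,
            formed - broken - SINK * c2)

def rk4_step(c, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(c)
    k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(c, k1)))
    k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(c, k2)))
    k4 = rhs(tuple(x + dt * k for x, k in zip(c, k3)))
    return tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * d + e)
                 for x, a, b, d, e in zip(c, k1, k2, k3, k4))

def integrate(t_end, dt=1.0):
    """Integrate from zero initial concentrations up to t_end seconds."""
    c = (0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        c = rk4_step(c, dt)
    return c

if __name__ == "__main__":
    c1, c2 = integrate(3600.0)
    print(f"monomer: {c1:.3e} cm^-3, dimer: {c2:.3e} cm^-3")
```

In this toy system the total number of monomer units, C₁ + 2C₂, obeys a linear equation with the solution (Q/S)(1 − e^(−St)), which gives a simple consistency check on the integration.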

In mathematical terms, the prediction of cluster concentrations using known collision and evaporation rates is called the forward problem. The associated inverse problem is to use known cluster concentrations to deduce the collision and evaporation rates. The inverse problem can be addressed with Bayesian approaches such as Markov chain Monte Carlo (MCMC) methods. In a recent paper by Kupiainen-Määttä (2016), differential evolution (DE) MCMC (Braak, 2006) was applied to determine evaporation rates for negatively charged sulfuric acid and ammonia clusters (containing up to five of each type of molecules, with the ${HSO}_{4}^{-}$ ion here defined as an “acid”). This study used steady-state cluster concentrations measured in the CLOUD (Cosmics Leaving OUtdoor Droplets) chamber experiment at constant temperature, with varying sulfuric acid and ammonia concentrations (we refer to Almeida et al., 2013 for details relevant to the experimental data). The collision rates were computed from kinetic gas theory. Kupiainen-Määttä (2016) concluded that these data were insufficient for estimation of all the evaporation rate coefficients. Another recent paper (Kürten, 2019) reported thermodynamic data (cluster formation enthalpies and entropies) for 11 neutral sulfuric acid and ammonia clusters. In the CLOUD experiment, these were deduced from new particle formation (NPF) rates measured at five different temperatures, over a wide range of sulfuric acid and ammonia concentrations. Most of the thermodynamic parameters could not be narrowly constrained, as the ranges of cluster formation enthalpies and entropies that reproduced the measured NPF rates were quite wide. However, for each cluster only one monomer evaporation rate was taken into account (either acid or base). Furthermore, the NPF rates obtained using the fitted parameters were systematically lower than the measured ones for warmer temperatures (≥248 K).

In this study, we test which combinations of experimental data and fitted parameters lead to the best identification of the evaporation rates. As experiments are expensive and time-consuming to perform, we use synthetic cluster concentration data created from ACDC simulations to test whether the use of time-dependent cluster distribution data would significantly improve the accuracy of the evaporation rates. The use of synthetic data also allows us to verify whether our inverse modelling actually recovers the correct kinetic parameters, which would not be possible with experimental concentration data. As in the Kupiainen-Määttä (2016) study, we compute collision rates from kinetic gas theory, while the evaporation rates used to generate our synthetic data are calculated from Gibbs free energies published by Olenius et al. (2013 b). Note that the conclusions of this study are not sensitive to the accuracy of the quantum chemical data, as our focus is on the inverse problem of how to determine evaporation rates from known concentrations rather than on the forward problem.

For simplicity, we consider the case of neutral sulfuric acid–ammonia clusters containing up to five of each type of molecules. Studying neutral clusters has the advantage that we can restrict ourselves to a smaller set of kinetic parameters and ignore uncertainties related to charging and neutralization processes. In situations where a large fraction of the clusters are charged, accurate modelling would require at least 3 times as many parameters, as the negative, positive and neutral cluster populations all interact with each other. The downside of this simplification is that we lose the direct connection to potential real-life experiments, as neutral atmospheric clusters cannot currently be measured without first charging them.

We investigate three different scenarios for estimating evaporation rates. First, we use steady-state concentration measurements determined at a single temperature, similar to the approach used in Kupiainen-Määttä (2016). Next, we test the use of time-dependent cluster concentrations measured before the system has attained a steady state. This is motivated by the fact that time-dependent data should provide additional information about the speed of the processes, which is missing from the steady-state data. Third, we apply the approach of Kürten (2019) and express the evaporation rates as parameterized functions of the temperature, with the cluster formation enthalpies and entropies (assumed here to be temperature independent) as the unknown parameters. This reparameterization is useful for two reasons. First, since the formation enthalpies and entropies of the monomers can be set to zero, and since several evaporation rates depend on the same enthalpy and entropy values, the dimension of the unknown parameter space for our problem is actually reduced, despite the apparent doubling of the number of parameters. Second, utilizing the temperature dependence allows us to produce and use arbitrarily many synthetic data sets at various temperatures, which mathematically has a regularizing effect on the problem. Note that unlike in Kürten (2019), all possible evaporation processes, including cluster fissions into two daughter clusters, are taken into consideration. Also, while Kürten (2019) used steady-state new particle formation rates measured at different temperatures to fit their data, we use cluster concentrations.

2 Simulation methods

2.1 Generation of synthetic data

We simulated the time evolution of cluster concentrations using collision rates computed from kinetic gas theory and evaporation rates computed from the Gibbs free energies reported by Olenius et al. (2013 b). To save computational time, we omitted clusters where the number of acid and base molecules differed by more than two. Based on both fundamental chemical principles and mass spectrometric data (Kirkby et al., 2011; Schobesberger et al., 2015; Elm and Kristensen, 2017; Yu et al., 2018), these clusters are quite unstable and thus have very high evaporation rates, leading to negligibly low concentrations. See Table 1 for a list of the 16 considered clusters. We included four different ammonia monomer mixing ratios between 5 and 200 ppt, corresponding to concentrations between 1.3×10⁸ and 5.0×10⁹ molecules cm⁻³ for the temperature ranges studied here. In each individual case, the ammonia mixing ratio was kept constant throughout the simulation. The source rate of sulfuric acid monomer was kept constant at Q = 6.3×10⁴ cm⁻³ s⁻¹. To reproduce experimental conditions in the CLOUD chamber as closely as possible, the initial sulfuric acid concentration was set to zero in each simulation. See Table 2 for a summary of the concentration settings. Additionally, we considered the losses on the CLOUD chamber walls, which depend on the cluster size (Kürten et al., 2015), and a dilution loss of S = 9.6×10⁻⁵ s⁻¹. For simplicity, we omitted the effect of relative humidity. We generated the birth–death equations using ACDC (McGrath et al., 2012) and then solved for the cluster concentrations using the Fortran ordinary differential equation solver VODE (Variable-Coefficient ODE Solver) (Brown et al., 1989). These equations and all related parameters are explained in Appendix A1.
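The correspondence between mixing ratios and number concentrations quoted above can be checked with the ideal gas law. The sketch below assumes a total pressure of 1 atm, which is not stated in this section; the function name is illustrative.

```python
K_B = 1.380649e-23  # Boltzmann constant (J K^-1)

def mixing_ratio_to_concentration(ppt, temperature_k, pressure_pa=101325.0):
    """Convert a mixing ratio in parts per trillion to molecules per cm^3,
    assuming an ideal gas at the given temperature and pressure."""
    total_per_m3 = pressure_pa / (K_B * temperature_k)  # all gas molecules, m^-3
    return ppt * 1e-12 * total_per_m3 * 1e-6            # target species, cm^-3

if __name__ == "__main__":
    for ppt, temp in ((5.0, 278.0), (200.0, 292.0)):
        conc = mixing_ratio_to_concentration(ppt, temp)
        print(f"{ppt:.0f} ppt at {temp:.0f} K -> {conc:.2e} molecules cm^-3")
```

With these assumptions, 5 ppt at 278 K and 200 ppt at 292 K come out close to the lower and upper concentration limits given in the text.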

Table 1. Neutral molecular clusters included in the model system (16 in total). The first column indicates the number of sulfuric acid molecules; the second column indicates the number of ammonia molecules in the cluster.


Table 2. Monomer concentrations used in simulations.


Our MCMC results are not specific to the set of molecular clusters considered here. This is supported by the fact that although the size of the system (the number of clusters or, more precisely, the maximum size of the clusters included in the simulations) has an impact on the particle formation rates at high temperatures (>278 K), the particle formation rates and cluster concentrations produced using different cluster sets (e.g. 4×4, 5×5 and 6×6 sulfuric acid and ammonia molecules) are qualitatively similar (Besel et al., 2020). Thus, minor changes of the ACDC outputs due to the difference in the sets of considered clusters should not change the MCMC parameter estimation results. Additionally, the boundary conditions for the outgrowing clusters (the choice of the clusters that are considered as formed particles) have only minor influence on the simulation results, as long as the simulated system of clusters is defined in a reasonable way (Besel et al., 2020).

Two data sets were created. In the first set, we generated time-dependent concentrations for each cluster type, measured at 1.5 min time intervals before the system reaches a steady state. This corresponded to a total of 41 time steps. The steady-state single-temperature data correspond to a subset of these data sets. In the second case, we generated steady-state concentrations for all cluster types at two temperatures (278 and 292 K). In both cases, the steady-state cluster concentrations were calculated as the average of the concentrations at t₁:=50 min and t₂:=60 min. Additionally, we include a convergence parameter for assessing the closeness of cluster concentrations to the steady state for every individual ACDC simulation. This is computed as a ratio of concentrations taken at times t₂ and t₁. We then selected the ratio which deviated the most from unity, where the maximum was taken over all cluster types (Kupiainen-Määttä, 2016).
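The steady-state averaging and the convergence parameter described above can be sketched as follows. The function names and the three example concentrations are hypothetical; only the definitions (average of the values at t₁ and t₂, and the ratio deviating most from unity) follow the text.

```python
def steady_state_estimate(conc_t1, conc_t2):
    """Average the concentrations at t1 and t2 for every cluster type."""
    return [0.5 * (a + b) for a, b in zip(conc_t1, conc_t2)]

def convergence_parameter(conc_t1, conc_t2):
    """Return the ratio C(t2)/C(t1) that deviates most from unity,
    taken over all cluster types."""
    ratios = [b / a for a, b in zip(conc_t1, conc_t2)]
    return max(ratios, key=lambda r: abs(r - 1.0))

# Sampling grid implied by the text: 41 points at 1.5 min intervals.
time_grid_min = [1.5 * i for i in range(41)]

if __name__ == "__main__":
    c_t1 = [1.00e6, 2.0e4, 3.0e2]  # hypothetical concentrations at t1 = 50 min
    c_t2 = [1.01e6, 2.0e4, 2.7e2]  # hypothetical concentrations at t2 = 60 min
    print(steady_state_estimate(c_t1, c_t2))
    print(convergence_parameter(c_t1, c_t2))
```

A convergence parameter close to 1 indicates that even the slowest-relaxing cluster type has essentially reached its steady-state concentration.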

Finally, we added measurement error (noise) to the cluster concentrations in both data sets. We call the resulting noisy cluster concentrations “synthetic data”. Our measurement error was sampled from a multivariate Gaussian distribution, with the variance depending on cluster type i, temperature T and time instance t. We assume that the standard deviation of the measurement error is 0.001 % of the original concentration.
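A minimal sketch of this noise model is given below. It assumes that the errors of different clusters are independent, that is, a diagonal covariance matrix, which is a simplification chosen here for illustration; the function name and the example concentrations are hypothetical.

```python
import random

def add_measurement_noise(concentrations, relative_sigma=1e-5, rng=None):
    """Add zero-mean Gaussian noise to each concentration, with a standard
    deviation of relative_sigma times the noise-free value
    (1e-5 corresponds to 0.001 %)."""
    rng = rng or random.Random()
    return [c + rng.gauss(0.0, relative_sigma * c) for c in concentrations]

if __name__ == "__main__":
    clean = [1.0e7, 3.5e5, 4.2e3]  # hypothetical noise-free concentrations
    noisy = add_measurement_noise(clean, rng=random.Random(0))
    for c, n in zip(clean, noisy):
        print(f"{c:.6e} -> {n:.6e}")
```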

2.2 Markov chain Monte Carlo simulations

We used an MCMC-based approach to estimate the evaporation rates which reproduce the synthetic cluster concentration data. Unlike optimization algorithms which compute a single optimal parameter set, MCMC methods sample from a target distribution which contains the most likely combinations of parameter values for the given data. Multiple samples of possible parameter sets are taken along a random walk in the target distribution and are saved as a parameter “chain”. As the length of the chain increases, the sampled sets converge to a probability (posterior) distribution of parameters, which estimates the likelihood of those parameters giving rise to the data. The particular MCMC-based algorithm we use is delayed rejection adaptive Metropolis (DRAM), which is an extended variant of the classical Metropolis algorithm (Metropolis et al., 1953). We chose the DRAM algorithm as it is more efficient than the Metropolis regime at parameter estimation when the parameter space is large (Haario et al., 2006). The two algorithms and their application to our cases are described in Appendices A2 and A3.
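The random-walk idea can be sketched with the classical Metropolis algorithm on a one-parameter toy problem. This is plain Metropolis and not DRAM: it includes neither delayed rejection nor proposal adaptation. The target distribution, step size, chain length and function names are hypothetical choices for illustration.

```python
import math
import random

def metropolis(log_posterior, theta0, step, n_samples, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    chain = [theta0]
    current_lp = log_posterior(theta0)
    n_accepted = 0
    for _ in range(n_samples):
        proposal = chain[-1] + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            chain.append(proposal)
            current_lp = proposal_lp
            n_accepted += 1
        else:
            chain.append(chain[-1])  # rejected: repeat the current value
    return chain, n_accepted / n_samples

def toy_log_posterior(log10_rate):
    """Hypothetical target: flat prior on [-12, 12] times a Gaussian
    likelihood centred at log10(rate) = -3 with standard deviation 0.5."""
    if not -12.0 <= log10_rate <= 12.0:
        return -math.inf
    return -0.5 * ((log10_rate + 3.0) / 0.5) ** 2

if __name__ == "__main__":
    chain, acc = metropolis(toy_log_posterior, 0.0, 1.0, 20000, random.Random(1))
    tail = chain[len(chain) // 2:]  # discard the first half as burn-in
    print(f"posterior mean ~ {sum(tail) / len(tail):.2f}, acceptance {acc:.2f}")
```

The chain starts far from the target and drifts towards it; the retained second half then approximates the posterior distribution of the parameter.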

2.2.1 Selection of minimum and maximum limits for unknown parameters

We emphasize that there are currently no theoretical principles or experimental results which set sound restrictions for even the order of magnitude of the evaporation rates. However, evaporation rates much lower than 10⁻¹⁰ s⁻¹ are irrelevant in practice, since the timescale for evaporation is then much longer than the cluster lifetime with respect to further growth. Similarly, when the evaporation rate is much greater than 10⁺¹⁰ s⁻¹, the cluster will certainly evaporate before it has a chance to grow further. The base 10 logarithm of the evaporation rates was therefore sampled in the interval of −12 to 12.

For the cluster formation enthalpies, we chose an upper limit of 0 kcal mol⁻¹, as a positive ΔH would mean an absence of attractive interactions in the molecular cluster, which is physically incorrect for polar, H-bonding molecules such as H₂SO₄ and NH₃. This same argument also applies for each individual molecule, which gives rise to the requirement that the formation enthalpy of each cluster must be lower (more negative) than that of clusters with fewer acid and/or base molecules. See Table 3 for the full list of restrictions arising from this requirement. As a lower limit for the overall cluster formation enthalpies, we used ΔH = −400 kcal mol⁻¹. Since our largest clusters contain 10 molecules, this would imply that, on average, each H₂SO₄ in all the studied clusters is bound substantially more strongly than in the exceptionally strongly bound HSO₄⁻ · H₂SO₄ cluster (for which recent high-level computational studies indicate a binding enthalpy roughly around −40 kcal mol⁻¹; Elm et al., 2013; Elm and Kristensen, 2017). This in turn implies that the evaporation rate is zero for all practical purposes.

Table 3. Additional restrictions on the cluster formation enthalpies arising from the requirement that each individual molecule is bound. The cluster formation enthalpy of the ith cluster is denoted by ΔH_i. The notation xAyN corresponds to a cluster with x sulfuric acid and y ammonia molecules.


The upper limit for the formation entropies was set to 0 cal K⁻¹ mol⁻¹, as clustering must have a negative formation entropy ΔS, since the number of gas molecules is reduced (and translational and rotational degrees of freedom are thus converted into much more constrained vibrational degrees of freedom). The lower limit of −400 cal K⁻¹ mol⁻¹ can be justified by noting that the typical per-molecule ΔS for clustering is around −30 cal K⁻¹ mol⁻¹, with a typical variation of up to ±10 cal K⁻¹ mol⁻¹ (Kürten, 2019). For a 10-molecule cluster, this would imply a lower bound to ΔS of around −400 cal K⁻¹ mol⁻¹.
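The limits of this section amount to box constraints on the sampled parameters. A minimal sketch, assuming an unnormalized flat prior inside the limits, is shown below; the additional ordering restrictions of Table 3 are not included, and all names are illustrative.

```python
import math

LOG10_RATE_BOUNDS = (-12.0, 12.0)  # log10 of evaporation rate in s^-1
ENTHALPY_BOUNDS = (-400.0, 0.0)    # formation enthalpy in kcal mol^-1
ENTROPY_BOUNDS = (-400.0, 0.0)     # formation entropy in cal K^-1 mol^-1

def log_flat_prior(values, bounds):
    """Unnormalized log-prior: 0 inside the box, -inf outside it."""
    low, high = bounds
    return 0.0 if all(low <= v <= high for v in values) else -math.inf

def entropy_lower_bound(n_molecules, per_molecule=-30.0, spread=10.0):
    """Lower entropy limit from the per-molecule estimate in the text:
    n_molecules times (typical value minus the typical variation)."""
    return n_molecules * (per_molecule - spread)

if __name__ == "__main__":
    print(log_flat_prior([-3.0, 5.0], LOG10_RATE_BOUNDS))
    print(log_flat_prior([-13.0], LOG10_RATE_BOUNDS))
    print(entropy_lower_bound(10))
```

A proposal that falls outside any of the boxes receives a log-prior of −∞ and is therefore always rejected by a Metropolis-type sampler.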

2.2.2 Overview of the MCMC runs

We first performed DRAM parameter estimation from both steady-state and time-dependent cluster concentrations at 278 K, treating evaporation rates as the unknown parameters θ. For the time-dependent synthetic data, the number of output coefficients was $n_{out} = N_{C} \times N_{t} + 1$, where N_C=16 is the number of cluster types included in the simulations, and N_t=41 is the number of time-step measurements available for each of the cluster types.
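The size of the data vector follows directly from this formula and from the analogous one for the two-temperature steady-state case introduced in the next paragraph. The short sketch below evaluates both counts; the function names are illustrative, and the additional term in each formula is taken as given.

```python
def n_out_time_dependent(n_clusters, n_times):
    """Number of outputs for time-dependent data: N_C * N_t + 1."""
    return n_clusters * n_times + 1

def n_out_multi_temperature(n_clusters, n_temperatures):
    """Number of outputs for steady-state data: (N_C + 1) * N_T."""
    return (n_clusters + 1) * n_temperatures

if __name__ == "__main__":
    print(n_out_time_dependent(16, 41))    # 16 clusters, 41 time steps
    print(n_out_multi_temperature(16, 2))  # 16 clusters, 2 temperatures
```

The time-dependent data set thus contains 657 outputs, compared with 34 for the two-temperature steady-state data set.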

Next, we performed parameter estimation based on steady-state cluster concentrations at two temperatures (278 and 292 K). The number of output coefficients in this case was $n_{out} = (N_{C} + 1) \times N_{T}$ , where N_T=2 denotes the number of experiments conducted at different temperatures. We use Eqs. (A4) and (A5) to express the evaporation rates as functions of formation enthalpies, entropies and temperature:

$$\gamma_{i+j \to i,j} = f\left(T, \{\Delta H_{k}, \Delta S_{k}\}_{k \in \{i+j,\, i,\, j\}}\right). \tag{1}$$

In Eq. (1), we set T=278 K or T=292 K. We emphasize that the rates $\gamma_{i+j \to i,j}$ now depend on temperature and six other parameters: the formation enthalpy $\Delta H_{i+j}$ and entropy $\Delta S_{i+j}$ of the evaporating/fragmenting cluster i+j and the formation enthalpies $\Delta H_{i}$, $\Delta H_{j}$ and entropies $\Delta S_{i}$, $\Delta S_{j}$ of the product clusters i and j, respectively. In this setting, θ is the array of quantities $\Delta H_{i+j}$, $\Delta S_{i+j}$, $\Delta H_{i}$, $\Delta H_{j}$, $\Delta S_{i}$, $\Delta S_{j}$ with $i + j \in \{1, 2, \dots, 16\}$. Similar approaches were applied for the inverse problem of chemical kinetics modelled by the Arrhenius equation, where chemical reaction rates are temperature dependent (Vahteristo et al., 2008).

Many evaporation/fragmentation reactions have the same clusters as products, and thus several of the pairs ΔH_i,ΔS_i appear in Eq. (1) for the evaporation rates of multiple different reactant clusters. The formation enthalpies and entropies of monomers are defined in the context of molecular clustering to be zero. The number of distinct unknown formation enthalpies and entropies is thus only 28, compared to 39 unknown evaporation rates. Furthermore, the cluster formation entropy and enthalpy values all lie within 2 orders of magnitude, compared to the evaporation rates which span 24 orders of magnitude. This makes the MCMC method more efficient.
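The reparameterization can be sketched as follows. Eqs. (A4) and (A5) are not reproduced in this section, so the code assumes the detailed-balance form commonly used with ACDC, γ = β (P_ref / k_B T) exp[(ΔG_{i+j} − ΔG_i − ΔG_j) / k_B T] with ΔG = ΔH − TΔS. The reference pressure of 1 atm, the temperature-independent collision coefficient and all numerical values in the usage example are hypothetical.

```python
import math

K_B = 1.380649e-23                      # Boltzmann constant (J K^-1)
KCAL_MOL_TO_J = 4184.0 / 6.02214076e23  # kcal mol^-1 -> J per molecule

def gibbs_free_energy(dh_kcal, ds_cal, temperature_k):
    """Formation free energy in kcal mol^-1 from the enthalpy (kcal mol^-1)
    and entropy (cal K^-1 mol^-1), both assumed temperature independent."""
    return dh_kcal - temperature_k * ds_cal * 1e-3

def evaporation_rate(beta_cm3_s, dg_parent, dg_i, dg_j, temperature_k,
                     p_ref_pa=101325.0):
    """Evaporation rate (s^-1) of the parent cluster into products i and j,
    assuming detailed balance with reference pressure p_ref_pa."""
    c_ref = p_ref_pa / (K_B * temperature_k) * 1e-6  # reference conc., cm^-3
    exponent = ((dg_parent - dg_i - dg_j) * KCAL_MOL_TO_J
                / (K_B * temperature_k))
    return beta_cm3_s * c_ref * math.exp(exponent)

def n_thermo_parameters(n_clusters, n_monomers):
    """One enthalpy and one entropy per cluster; monomer values are zero."""
    return 2 * (n_clusters - n_monomers)

if __name__ == "__main__":
    beta = 5e-10  # hypothetical collision coefficient (cm^3 s^-1)
    for temp in (278.0, 292.0):
        # Hypothetical dimer: dH = -20 kcal/mol, dS = -35 cal/(K mol);
        # the products are monomers with zero formation free energy.
        dg = gibbs_free_energy(-20.0, -35.0, temp)
        rate = evaporation_rate(beta, dg, 0.0, 0.0, temp)
        print(f"T = {temp:.0f} K: dG = {dg:.2f} kcal/mol, gamma = {rate:.3g} s^-1")
    print(n_thermo_parameters(16, 2))
```

Because one (ΔH, ΔS) pair fixes the free energy of a cluster at every temperature, the same 28 parameters determine all evaporation rates at both 278 and 292 K.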

To create a reliable sample from the underlying parameter distribution, the length of the MCMC chain must be “large enough” (Haario et al., 1999, 2001); that is, many different parameter combinations must be tested. In our simulations, the MCMC chain length typically comprised 3 million samples. The MCMC acceptance probabilities (defined below) in each of the cases were about 88.0 %, which is a typical level of acceptance since the forward ACDC model (in which the evaporation and collision rates are known) is deterministic.

In the MCMC simulations, all sets of parameters which produce cluster concentrations within the allotted noise level of the data (0.001 %) are kept in the chain. The sampling procedure is outlined in Fig. 1. We tested that the MCMC chains converge to the “true” values (i.e. the reference parameter values from Olenius et al., 2013 b) when we start sampling the chain from a randomly selected initial guess.


Figure 1. Schematic representation of the study methods.
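The comparison between modelled and synthetic concentrations in Sect. 2.2.2 can be expressed through a Gaussian log-likelihood. The sketch below assumes independent errors with a standard deviation equal to a fixed fraction of each observed (strictly positive) concentration; this is an illustrative simplification, and the function name and example values are hypothetical.

```python
def gaussian_log_likelihood(modelled, observed, relative_sigma=1e-5):
    """Log-likelihood, up to an additive constant, for independent Gaussian
    errors with standard deviation relative_sigma * observed value.
    Observed concentrations must be strictly positive."""
    total = 0.0
    for m, d in zip(modelled, observed):
        sigma = relative_sigma * d
        total -= 0.5 * ((m - d) / sigma) ** 2
    return total

if __name__ == "__main__":
    observed = [1.0e6, 2.0e4]
    print(gaussian_log_likelihood(observed, observed))           # perfect match
    print(gaussian_log_likelihood([1.0e6 + 10.0, 2.0e4], observed))
```

Parameter sets whose modelled concentrations deviate from the data by many standard deviations receive a very low likelihood and are effectively never accepted into the chain.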
