Technical note: Estimating aqueous solubilities and activity coefficients of mono- and α,ω-dicarboxylic acids using COSMOtherm

Abstract. We have used the COSMOtherm program to estimate activity coefficients and solubilities of mono- and α,ω-dicarboxylic acids and water in binary acid–water systems. The deviation from ideality was found to be larger in the systems containing larger acids than in the systems containing smaller acids. The COnductor-like Screening MOdel for Real Solvents (COSMO-RS) underestimates experimental monocarboxylic acid activity coefficients by less than a factor of 2, but experimental water activity coefficients are underestimated more, especially at high acid mole fractions. We found a better agreement between COSMOtherm-estimated and experimental activity coefficients of monocarboxylic acids when the clustering of water with a carboxylic acid and with itself was taken into account using the dimerization, aggregation, and reaction extension (COSMO-RS-DARE) of COSMOtherm. COSMO-RS-DARE is not fully predictive, but fit parameters found here for water–water and acid–water clustering interactions can be used to estimate thermodynamic properties of monocarboxylic acids in other aqueous solvents, such as salt solutions. For the dicarboxylic acids, COSMO-RS is sufficient for predicting aqueous solubility and activity coefficients, and no fitting to experimental values is needed. This is highly beneficial for applications to atmospheric systems, as these data are typically not available for a wide range of mixing states realized in the atmosphere, due to a lack of either feasibility of the experiments or sample availability. Based on effective equilibrium constants of different clustering reactions in the binary solutions, acid dimer formation is more dominant in systems containing larger dicarboxylic acids (C5–C8), while for monocarboxylic acids (C1–C6) and smaller dicarboxylic acids (C2–C4), hydrate formation is more favorable, especially in dilute solutions.

An accurate description of the different aerosol phases is important for determining parameters used in aerosol modeling, such as gas-to-particle partitioning (in particular water uptake) and chemical reactivity. A large number of reactions in the aqueous aerosol phase are strongly pH dependent (Pye et al., 2020; Weber et al., 2016), but accurate predictions of aerosol acidity are highly challenging. One element to resolve is the nature and amount of acidic material dissolved in the aqueous aerosol phase. The aqueous bulk solubilities of mono- and dicarboxylic acids have been measured in multiple studies (Saxena and Hildemann, 1996; Apelblat and Manzurola, 1987, 1989, 1990; Cornils and Lappe, 2000; Song et al., 2012; Romero and Suárez, 2009; Omar and Ulrich, 2006; Brooks et al., 2002). However, acid activity data of carboxylic acid-water systems are much scarcer. Jones and Bury (1927) derived the activity coefficients of formic (n = 1), acetic (n = 2), propanoic (n = 3), and butanoic (n = 4) acids in aqueous solutions at the freezing points of the binary solutions using freezing-point depression measurements. From such measurements, activity coefficients are calculated using Lewis and Randall's equation for non-electrolytes. Hansen et al. (1955) derived activity coefficients of acetic, propanoic, and butanoic acids in water and the activity coefficients of water in acetic, propanoic, butanoic, pentanoic (n = 5), and hexanoic (n = 6) acids, at 298.15 K, using partial pressure measurements. In addition, Hansen et al. (1955) represented the experimental points using self-consistent activity coefficient functions. Activity coefficients of malonic, succinic, and glutaric acids (m = 3, 4, and 5) have been measured by Davies and Thomas (1956) and Soonsin et al. (2010) in bulk and particle experiments, respectively.
Group contribution methods, such as UNIFAC (Fredenslund et al., 1975) and AIOMFAC (Zuend et al., 2008), are often used to estimate activity coefficients of atmospherically relevant compounds. More recently, the quantum-chemistry-based COnductor-like Screening MOdel for Real Solvents (COSMO-RS; Klamt, 1995; Klamt et al., 1998; Eckert and Klamt, 2002) has been used to predict thermodynamic properties of multifunctional compounds. Solubilities and activity coefficients of carboxylic acids have also been estimated using the COSMO-RS theory implemented in the COSMOtherm program (COSMOtherm, 2019). For instance, Schröder et al. (2010) estimated the aqueous solubilities of various polycarboxylic acids using the BP_TZVP_C21_0025 parametrization of COSMOtherm and found that COSMOtherm was able to predict the temperature dependence of the solubilities of dicarboxylic acids (m = 2-8) well, while the absolute solubility estimates were not in good agreement with experiments. Michailoudi et al. (2020) estimated the activity coefficients of monocarboxylic acids with even numbers of carbon atoms (n = 2, 4, 6, 8, 10, 12) at infinite dilution, as well as the solubility of the same acids in pure water and different aqueous electrolyte solutions. They found a good agreement between experimental and estimated aqueous solubilities of the acids with the exception of butanoic acid, which in experiments has been seen to be fully soluble (Saxena and Hildemann, 1996), while COSMOtherm predicted a finite solubility.
Recent work has shown that the absolute COSMOtherm solubility and activity coefficient estimates can be improved by excluding conformers containing intramolecular hydrogen bonds from the COSMOtherm calculation. However, based on the hydrogen bonding definition of COSMOtherm, monocarboxylic acids are not able to form intramolecular hydrogen bonds. Therefore, other methods are needed to improve COSMOtherm estimates of monocarboxylic acids. On the other hand, carboxylic acids are able to form hydrogen-bonded dimers where two molecules are bound by two simultaneous intermolecular hydrogen bonds. Concerted multiple contacts such as those seen in carboxylic acid dimer formation are not captured by COSMO-RS. A dimerization, aggregation, and reaction extension to the COSMO-RS theory (COSMO-RS-DARE) was developed to account for these interactions (Sachsenhauser et al., 2014). For example, Cysewski (2019) was able to improve the agreement between experimental and estimated solubilities of ethenzamide in various organic solvents using COSMO-RS-DARE.
Most atmospherically relevant multifunctional compounds are not readily available for experimental determination of thermodynamic properties. Accurate theoretical estimates are therefore essential for advancing current aerosol process modeling to include more complex compounds and mixtures. Here, we demonstrate the applicability of COSMO-RS theory in calculating condensed-phase properties of atmospherically relevant organic compounds. Carboxylic acids are among the most abundant and well-characterized organic compounds in the troposphere and are, therefore, a good compound class to use to validate the use of COSMO-RS in atmospheric research. We use the newly developed COSMO-RS-DARE, as well as COSMO-RS, to estimate activity coefficients of monocarboxylic acids (n = 1-6) and α,ω-dicarboxylic acids (m = 2-8) with water in binary acid-water mixtures. In addition, we estimate aqueous solubilities and effective equilibrium constants of cluster formation of the acids.

COSMOtherm calculations
We use the COSMOtherm software (release 19 and parametrization BP_TZVPD_FINE_19) (COSMOtherm, 2019) to estimate the solubilities and activity coefficients of linear mono- and dicarboxylic acids in binary aqueous solutions. In addition, we compute the effective equilibrium constants of water and acid dimerization (formation of a hydrogen-bonded cluster containing two water molecules or two acid molecules, respectively) and acid hydration (formation of a hydrogen-bonded cluster containing one acid and one water molecule).

Activity coefficients
COSMOtherm calculates the activity coefficient ($\gamma$) of compound $i$ with mole fraction $x_i$ using the pseudo-chemical potentials at composition $\{x_i\}$ ($\mu_i^{*}(\{x_i\})$) and at the reference state ($\mu_i^{*\circ}(x^{\circ}, T, P)$). By default, the reference state used in COSMOtherm is the pure compound (labeled as convention I; Levine, 2009):

$$\ln \gamma_i^{\mathrm{I}}(\{x_i\}) = \frac{\mu_i^{*}(\{x_i\}) - \mu_i^{*\circ}(x^{\circ}, T, P)}{RT} \tag{1}$$

at $P = 10^{5}$ Pa reference pressure. $T$ is the temperature (K), and $R$ is the gas constant (kJ K⁻¹ mol⁻¹, when $\mu^{*}$ is given in kJ mol⁻¹). Pseudo-chemical potential (Ben-Naim, 1987) is an auxiliary quantity defined using the chemical potential at the reference state $\mu^{\circ}$:

$$\mu_i^{*}(\{x_i\}) = \mu_i^{\circ}(x^{\circ}, T, P) + RT \ln \gamma_i(\{x_i\}) \tag{2}$$

Pseudo-chemical potential has recently been used in molecular level solvation thermodynamics instead of chemical potential (Sordo, 2015). The benefit of pseudo-chemical potential is that it is valid for any concentration and fluid mixture, while the conventional chemical potential cannot necessarily be used to describe infinite dilution ($x_i \to 0$) (Ben-Naim, 1978). By definition, the activity coefficient of a compound at the reference state is unity ($\gamma_i^{\mathrm{I}}(x_i = 1) = 1$), which leads to equal chemical and pseudo-chemical potential at the reference state. At other states ($x_i < 1$), the relation between chemical and pseudo-chemical potentials ($\mu$ and $\mu^{*}$, respectively) can be expressed as

$$\mu_i(\{x_i\}) = \mu_i^{*}(\{x_i\}) + RT \ln x_i \tag{3}$$

Unless otherwise mentioned, the mole fractions $x_i$ correspond to mole fractions of undissociated acid or neutral nonprotonated water.
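The relation between pseudo-chemical potentials and the convention I activity coefficient described above can be illustrated with a minimal sketch. It assumes the standard form ln γ = (µ* − µ*°)/RT and µ = µ* + RT ln x; the function names and the numerical inputs are hypothetical and are not COSMOtherm output.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ K^-1 mol^-1


def activity_coefficient(mu_star_mix, mu_star_ref, temperature=298.15):
    """Convention I activity coefficient from pseudo-chemical potentials (kJ/mol).

    Assumes ln(gamma) = (mu*_mixture - mu*_pure) / (R T).
    """
    return math.exp((mu_star_mix - mu_star_ref) / (R_KJ * temperature))


def chemical_potential(mu_star_mix, x, temperature=298.15):
    """Chemical potential from the pseudo-chemical potential: mu = mu* + R T ln(x)."""
    return mu_star_mix + R_KJ * temperature * math.log(x)


# Hypothetical pseudo-chemical potentials (kJ/mol), for illustration only.
gamma = activity_coefficient(mu_star_mix=-20.0, mu_star_ref=-21.0)
gamma_pure = activity_coefficient(mu_star_mix=-21.0, mu_star_ref=-21.0)
print(f"gamma in mixture: {gamma:.3f}, gamma at reference state: {gamma_pure:.3f}")
```

A pseudo-chemical potential that is higher in the mixture than in the pure compound gives γ > 1, and the pure-compound reference state gives γ = 1 by construction.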

Solubility
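Solubilities are calculated by finding the liquid-liquid equilibrium (LLE) or the solid-liquid equilibrium (SLE) of the binary liquid-water or solid-water systems, respectively. In LLE, the chemical potential ($\mu$) of a compound is equal in both of the liquid phases ($\alpha$ and $\beta$):

$$\mu_i^{\alpha} = \mu_i^{\beta} \tag{4}$$

Combining Eqs. (2) and (3) gives the relation between chemical potential and pseudo-chemical potential at the reference state:

$$\mu_i(\{x_i\}) = \mu_i^{*\circ}(x^{\circ}, T, P) + RT \ln (x_i \gamma_i) \tag{5}$$

Equation (5) can be substituted for chemical potential in Eq. (4), giving

$$\mu_i^{*\circ}(x^{\circ}, T, P) + RT \ln a_i^{\alpha} = \mu_i^{*\circ}(x^{\circ}, T, P) + RT \ln a_i^{\beta} \tag{6}$$

where $a_i^{\alpha}$ and $a_i^{\beta}$ are the activities ($a = x\gamma$) of compound $i$ in phases $\alpha$ and $\beta$, respectively. The liquid-liquid equilibrium condition between the solvent-rich phase ($\alpha$) and the solute-rich phase ($\beta$) becomes

$$x_i^{\alpha} \gamma_i^{\alpha} = x_i^{\beta} \gamma_i^{\beta} \tag{7}$$

The SLE is solved from the solid-liquid equilibrium condition (Eckert and Klamt, 2019):

$$\log_{10}(x_{\mathrm{SOL},i}) = \frac{\mu_i^{*\circ}(x^{\circ}, T, P) - \mu_i^{*}(x_{\mathrm{SOL},i}) - \max\left(0, \Delta G_{\mathrm{fus}}(T)\right)}{RT \ln(10)} \tag{8}$$

where $x_{\mathrm{SOL},i}$ is the mole fraction solubility (SOL) of compound $i$ in the solvent. The free energy of fusion of the solute ($\Delta G_{\mathrm{fus}}(T)$) is calculated from the experimentally determined heat of fusion ($\Delta H_{\mathrm{fus}}$) and melting point ($T_{\mathrm{melt}}$) using the Schröder-van Laar equation (Prigogine and Defay, 1954):

$$\Delta G_{\mathrm{fus}}(T) = \Delta H_{\mathrm{fus}} \left(1 - \frac{T}{T_{\mathrm{melt}}}\right) - \Delta C_{p,\mathrm{fus}} (T_{\mathrm{melt}} - T) + \Delta C_{p,\mathrm{fus}}\, T \ln \frac{T_{\mathrm{melt}}}{T} \tag{9}$$

Here the heat capacity of fusion ($\Delta C_{p,\mathrm{fus}}$) is estimated from the melting point and the heat of fusion:

$$\Delta C_{p,\mathrm{fus}} = \frac{\Delta H_{\mathrm{fus}}}{T_{\mathrm{melt}}} \tag{10}$$

Table 1 shows experimental melting points and heats of fusion of the dicarboxylic acids of this study. Melting points and heats of fusion of the monocarboxylic acids are not used, since all of the monocarboxylic acids studied here are in the liquid phase at 298.15 K.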
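The solid-liquid equilibrium step described above can be sketched numerically. The sketch assumes the Schröder-van Laar expression with the heat capacity of fusion approximated as ΔH_fus/T_melt, and a solubility relation of the form log10(x_SOL) = [µ*° − µ*(x_SOL) − max(0, ΔG_fus)]/(RT ln 10). The function names are hypothetical, and the heat of fusion, melting point, and pseudo-chemical potentials are illustrative placeholders rather than values from Table 1 or from COSMOtherm.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ K^-1 mol^-1


def delta_g_fus(h_fus, t_melt, temperature=298.15):
    """Free energy of fusion (kJ/mol) from heat of fusion (kJ/mol) and melting point (K).

    Uses the assumed approximation dCp_fus = dH_fus / T_melt.
    """
    cp_fus = h_fus / t_melt
    return (h_fus * (1.0 - temperature / t_melt)
            - cp_fus * (t_melt - temperature)
            + cp_fus * temperature * math.log(t_melt / temperature))


def log10_solubility(mu_star_ref, mu_star_solution, dg_fus, temperature=298.15):
    """One evaluation of the assumed SLE condition for the mole fraction solubility.

    In practice mu_star_solution depends on composition, so the condition
    is solved iteratively; this function only evaluates a single step.
    """
    numerator = mu_star_ref - mu_star_solution - max(0.0, dg_fus)
    return numerator / (R_KJ * temperature * math.log(10.0))


# Hypothetical solute: heat of fusion 30 kJ/mol, melting point 400 K.
dg = delta_g_fus(h_fus=30.0, t_melt=400.0)
log_x = log10_solubility(mu_star_ref=-25.0, mu_star_solution=-24.0, dg_fus=dg)
print(f"dG_fus = {dg:.2f} kJ/mol, log10(x_SOL) = {log_x:.2f}")
```

With the heat capacity approximated this way, the first two terms of the Schröder-van Laar expression cancel, so the free energy of fusion reduces to ΔH_fus (T/T_melt) ln(T_melt/T) and vanishes at the melting point.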

Effective equilibrium constants
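COSMOtherm estimates effective equilibrium constants of condensed-phase reactions from the free energy of the reaction ($\Delta G_r^{\mathrm{I}\circ}$):

$$K_{\mathrm{eff}} = \exp\left(-\frac{\Delta G_r^{\mathrm{I}\circ}}{RT}\right) \tag{11}$$

The reaction free energy is calculated from the free energies of the pure reactants ($G_{\mathrm{react}}^{\mathrm{I}\circ}$) and products ($G_{\mathrm{prod}}^{\mathrm{I}\circ}$):

$$\Delta G_r^{\mathrm{I}\circ} = \sum G_{\mathrm{prod}}^{\mathrm{I}\circ} - \sum G_{\mathrm{react}}^{\mathrm{I}\circ} \tag{12}$$

The free energy of compound $i$ is the sum of the energy of the solvated compound ($E_{\mathrm{COSMO}}$), the averaged correction for the dielectric energy (dE; Klamt et al., 1998), and the pseudo-chemical potential of the pure compound:

$$G_i^{\mathrm{I}\circ} = E_{\mathrm{COSMO},i} + \mathrm{d}E_i + \mu_i^{*\circ}(x^{\circ}, T, P) \tag{13}$$

Figure 1 (caption, partially recovered). Color coding of σ surfaces: red is negative partial charge, blue is positive partial charge, green is neutral partial charge, and grey is omitted σ surface.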
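The link between a reaction free energy and an effective equilibrium constant can be sketched as follows, assuming the usual relation K = exp(−ΔG_r/RT) with ΔG_r taken as the free energy of the products minus that of the reactants. The function names and the free energy inputs are hypothetical; the equilibrium constant used in the inverse calculation is the water dimer value quoted later in this note.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ K^-1 mol^-1


def compound_free_energy(e_cosmo, d_e, mu_star_pure):
    """Free energy of one species: solvated energy + dielectric correction + pure mu* (kJ/mol)."""
    return e_cosmo + d_e + mu_star_pure


def effective_equilibrium_constant(g_reactants, g_products, temperature=298.15):
    """K_eff from lists of reactant and product free energies (kJ/mol)."""
    dg_r = sum(g_products) - sum(g_reactants)
    return math.exp(-dg_r / (R_KJ * temperature))


def reaction_free_energy(k_eff, temperature=298.15):
    """Inverse relation: reaction free energy (kJ/mol) implied by a given K_eff."""
    return -R_KJ * temperature * math.log(k_eff)


# Hypothetical free energies (kJ/mol) for A + B -> A.B, for illustration only.
k_example = effective_equilibrium_constant(g_reactants=[-100.0, -50.0],
                                           g_products=[-160.0])
# Free energy implied by the water dimer constant reported in the text (5.71e5).
dg_water_dimer = reaction_free_energy(5.71e5)
print(f"K_eff = {k_example:.3g}, dG(water dimer) = {dg_water_dimer:.1f} kJ/mol")
```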

Concentration-dependent reactions (COSMO-RS-DARE)
In COSMO-RS, the surface of a molecule is divided into surface segments that represent the surface charges of the molecule. The surface is considered an interface between a virtual conductor around the molecule and the cavity formed by the molecule (Klamt and Schüürmann, 1993). Each surface segment has an area (Å²) and a screening charge density (σ, given in units of e Å⁻², where e represents the charge on an electron). Interactions between molecules are described through the interaction between surface segments of the different molecules. Examples of σ surfaces used in COSMOtherm calculations are shown in Fig. 1. The red color of a σ surface signifies a positive screening charge density (negative partial charge) and the blue color a negative screening charge density (positive partial charge). Concerted multiple contacts, such as carboxylic acid dimer formation, are not captured by COSMO-RS.
COSMOtherm is able to consider these hydrogen-bonded clusters using the dimerization, aggregation, and reaction extension (COSMO-RS-DARE; Sachsenhauser et al., 2014). We use the COSMO-RS-DARE method in our activity coefficient and solubility calculations. In our equilibrium constant calculations, the clusters in the system are included as the product of the clustering reactions. The method is described below and the full COSMO-RS-DARE derivation can be found in Sachsenhauser et al. (2014).
A clustering reaction between molecules A and B can be described by the equilibrium:

$$\mathrm{A} + \mathrm{B} \rightleftharpoons \mathrm{A{\cdot}B} \tag{14}$$

In acid-water systems, A and B can be either a carboxylic acid or a water molecule. In COSMO-RS-DARE, the product clusters are included in COSMOtherm calculations by using the σ surfaces of molecule A in the cluster and omitting the part of the σ surface that is assigned to the molecule clustered with A (i.e., molecule B). Similarly, the clustering product of molecule B is included in the calculation by omitting the σ surface assigned to molecule A from the σ surface of A·B. Examples of these partial σ surfaces are shown on the right-hand side of Fig. 1. The formation of hydrogen bonds (in hydrates or dimers) is taken into account using the interaction energy of the two reacting compounds. The formation free energy of the cluster ($G(\mathrm{A}, \mathrm{A{\cdot}B})$) is calculated using fit parameters $c_H$ and $c_S$ (enthalpic and entropic contributions, respectively) to describe the interaction between the monomers A and B in the cluster (A·B):

$$G(\mathrm{A}, \mathrm{A{\cdot}B}) = c_H - c_S T \tag{15}$$

The fit parameters are used because COSMOtherm is unable to calculate the energy of a monomer in a cluster. Instead, the energy of a monomer in a cluster is assumed to be equal to the energy of the lowest-energy conformer of the same compound, and the favorability of the cluster formation is estimated using the fit parameters. Without temperature-dependent experimental data, it is not possible to fit both parameters. We therefore treat the enthalpic parameter $c_H$ as the total formation free energy parameter at 298.15 K and set the entropic parameter $c_S$ to zero. COSMO-RS-DARE was originally developed for systems containing carboxylic acids in nonpolar solvents (Sachsenhauser et al., 2014). In a carboxylic acid-water system, both the carboxylic acid and water are able to form strongly bound clusters. In addition, hydrated acids can be formed. We therefore include the interactions of the clustering reactions for both A and B, even when A = B.
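The statement that both fit parameters cannot be determined without temperature-dependent data can be illustrated with a short sketch. It assumes the interaction free energy takes the form G = c_H − c_S·T; the function name and the second parameter pair are hypothetical and chosen only to show the degeneracy at a single temperature.

```python
import math


def cluster_interaction_free_energy(c_h, c_s=0.0, temperature=298.15):
    """Assumed COSMO-RS-DARE interaction free energy (kJ/mol): G = c_H - c_S * T."""
    return c_h - c_s * temperature


# Parameter pair with c_S = 0, i.e. c_H acts as the total free energy parameter.
g_a = cluster_interaction_free_energy(c_h=-10.5, c_s=0.0)

# A different, hypothetical pair constructed to give the same value at 298.15 K.
c_s_alt = 0.01  # kJ K^-1 mol^-1
c_h_alt = -10.5 + c_s_alt * 298.15
g_b = cluster_interaction_free_energy(c_h=c_h_alt, c_s=c_s_alt)

# The two pairs are indistinguishable at 298.15 K but differ at another temperature.
g_a_cold = cluster_interaction_free_energy(-10.5, 0.0, temperature=280.0)
g_b_cold = cluster_interaction_free_energy(c_h_alt, c_s_alt, temperature=280.0)
print(g_a, g_b, g_a_cold, g_b_cold)
```

Because any pair on the line c_H − 298.15·c_S = constant reproduces the same single-temperature data, fixing c_S = 0 is a convention and the resulting c_H should not be read as a pure enthalpy.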

Input file generation
The ".cosmo" files of water and the monocarboxylic acids with a low number of conformers (< 10) are taken from the COSMObase (COSMObase, 2011) database. For the dicarboxylic acids, acid and water dimers, and the hydrates of pimelic (m = 7) and suberic (m = 8) acids, we use the systematic conformer search approach detailed by Kurtén et al. (2018), as it has been shown to give more consistent results than other conformer sampling approaches. The conformers are found using the systematic conformer search in the Spartan program (Wavefunction Inc., 2014). The conformer set is then used as input to the COSMOconf program (COSMOconf, 2013) (using the TURBOMOLE program; TURBOMOLE, 2010), which runs initial single-point COSMO calculations at the BP/def-SV(P) level of theory to compare the pseudo-chemical potentials of the conformers and remove similar structures. Initial geometry optimizations are performed at the BP/def-SV(P) level of theory, and duplicate structures are removed by comparing the new geometries and pseudo-chemical potentials. Final geometries are optimized at the BP/def-TZVP level of theory, and after a second duplicate removal step, final single-point energies are calculated at the BP/def2-TZVPD-FINE level of theory.
For acid dimers, we use the lowest gas-phase energy structures found by Elm et al. (2019) as starting structures for the systematic conformer search. For hydrated monocarboxylic acids and smaller dicarboxylic acids (m ≤ 6), the clusters are built by adding a water molecule to each conformer of the free acids. For monocarboxylic acids, the water molecule is placed on the carboxylic acid group, forming two intermolecular hydrogen bonds between the molecules. For dicarboxylic acids, a water molecule is added to either end of the acid, forming two hydrate conformers from a single acid conformer. For the dicarboxylic acid conformers with the two acid groups close to each other, additional conformers are created for cases where the water molecule is interacting with both acid groups. Figure 2 illustrates the formation of two different adipic acid hydrate conformers from a single monomer conformer. A cluster conformer where the water molecule is attached to one carboxylic acid group is shown in the top right corner, and in the bottom right corner conformer, the water molecule is bound to both acid groups. Due to the large number of conformers of nonhydrated pimelic (m = 7) and suberic (m = 8) acids (75 and 132, respectively), the monohydrate conformers of those two acids are sampled separately using Spartan.
We use only clusters of two molecules in our calculations. In carboxylic acid dimers, the hydrogen bond donors and acceptors are saturated, which means that carboxylic acids are unlikely to form larger clusters than dimers (Vawdrey et al., 2004; Elm et al., 2014, 2019). Computational studies (Aloisio et al., 2002; Weber et al., 2012; Kildgaard et al., 2018) have shown that, in the gas phase, the energetically most favorable dihydrate is formed by two water molecules attaching to the same carboxylic acid group. Therefore, adding a second water molecule to the cluster does not significantly change the probability distribution of the screening charge density (σ profile) of the acid in the cluster compared to the acid in a monohydrate or dimer.
Conformers containing no intramolecular hydrogen bonds (Kurtén et al., 2018) are used in the COSMO-RS solubility and activity coefficient calculations. Due to the intermolecular hydrogen bonding in the hydrate and dimer clusters, all conformers (up to 40 conformers) of monomers and clusters are used in the effective equilibrium constant calculations. In COSMO-RS-DARE calculations, we use all conformers of the monomers and only the lowest solvated energy conformers of the clusters.

Effective equilibrium constants of clustering reactions
We estimated the effective equilibrium constants of the different clustering reactions (i.e., hydration and dimerization) of the binary acid-water systems. A comparison between the hydration and acid dimerization equilibrium constants in the aqueous phase is given in Fig. 3 and Table 2. The equilibrium constants for both the dimerization and hydration reactions are similar between all of the monocarboxylic acids (not labeled in Fig. 3). For the dicarboxylic acids, we can see larger variation in both the hydration and dimerization reactions. Note that the COSMO-RS-DARE method is not used in the effective equilibrium constant calculations, because the clusters are already included in the calculation as products. For all of the acids, the effective equilibrium constant of dimerization is higher than that of the hydrate formation of the corresponding acid, meaning that acid dimer formation is energetically more favorable than hydrate formation. However, in dilute conditions, water is more abundant, shifting the equilibrium from acid dimerization to hydration. The dimerization-to-hydration ratio is the lowest for oxalic (m = 2) and malonic (m = 3) acids, while monocarboxylic acids and succinic acid (m = 4) have similar (intermediate) ratios, and the larger dicarboxylic acids (m = 5-8) have higher ratios. This means that, in dilute solutions, oxalic, malonic, and succinic acids will most likely interact with water instead of other acid molecules. Vawdrey et al. (2004) calculated the dimerization enthalpies (at the B3LYP/6-31++G(2d,p) level of theory) of monocarboxylic acids (n = 2-6) and found a notable even-odd variation (dimerization of the acids with odd numbers of carbon atoms is more favorable than of acids with even numbers of carbon atoms). The same is seen here in the condensed phase, where the effective equilibrium constants of butanoic and hexanoic acids are lower than those of propanoic and pentanoic acids, respectively. Otherwise, there is a slightly increasing trend in the effective equilibrium constants with increasing number of carbon atoms in the monocarboxylic acids. For larger dicarboxylic acids (m ≥ 4), Elm et al. (2019) found an even-odd alternation in the dimer-to-monomer ratio in the gas phase, calculated at the DLPNO-CCSD(T)/aug-cc-pVTZ//ωB97X-D/6-31++G(d,p) level of theory. We observe a similar increase in the effective equilibrium constants with increasing carbon chain length in the smaller dicarboxylic acids (m = 2-5) and an even-odd alternation in the larger dicarboxylic acids (m = 4-8) in the condensed phase.

Figure 3 (caption, partially recovered). [...] in condensed phase, at 298.15 K. See Table 2 for the values.
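The competition between dimerization and hydration discussed above can be made concrete with a deliberately simplified mass-action sketch. It assumes ideal behaviour (activity coefficients of 1), treats the equilibrium constants on a mole fraction basis, and uses the nominal mole fractions as monomer mole fractions, so the dimer-to-hydrate ratio becomes (K_dim/K_hyd)(x_acid/x_water). The function name and the constants are hypothetical and are not the values of Table 2.

```python
def dimer_to_hydrate_ratio(k_dim, k_hyd, x_acid):
    """Idealized ratio of acid dimer to acid hydrate abundance in a binary mixture.

    Mass action: x_dimer ~ K_dim * x_acid**2 and x_hydrate ~ K_hyd * x_acid * x_water,
    so the ratio is (K_dim / K_hyd) * (x_acid / x_water).
    """
    x_water = 1.0 - x_acid
    return (k_dim / k_hyd) * (x_acid / x_water)


# Hypothetical constants with dimerization ten times more favorable than hydration.
K_DIM, K_HYD = 1.0e7, 1.0e6

for x_acid in (0.5, 0.1, 0.01):
    ratio = dimer_to_hydrate_ratio(K_DIM, K_HYD, x_acid)
    print(f"x_acid = {x_acid:<5} dimer/hydrate = {ratio:.3f}")
```

Even when dimerization has the larger equilibrium constant, the ratio drops below 1 once the acid is dilute enough, which is the qualitative behaviour described in the text.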

Monocarboxylic acids
We calculated the activity coefficients of the monocarboxylic acids and water in the binary acid-water mixtures using the COSMO-RS-DARE method. Hansen et al. (1955) derived the activity coefficients of acetic (n = 2), propanoic (n = 3), and butanoic (n = 4) acids in mixtures with water from partial pressure measurements. In addition, they determined activity coefficients of water in aqueous acetic, propanoic, butanoic, pentanoic (n = 5), and hexanoic (n = 6) acid mixtures. We used these experimental activity coefficients to fit the enthalpic parameters ($c_H$) for each of the acids in the COSMO-RS-DARE calculations. Figure 4 shows a comparison between the estimated and experimentally determined activity coefficients of these monocarboxylic acids, and formic acid (n = 1), for which no experimental activity coefficient data are available.
The reactions included in the calculations are water dimer ($\mathrm{H_2O \cdot H_2O}$) and acid hydrate ($\mathrm{RCOOH \cdot H_2O}$) formation. A comparison between COSMO-RS-estimated activity coefficients and COSMO-RS-DARE-estimated activity coefficients, with different clusters included in the calculation, is shown for acetic acid in Fig. S1 of the Supplement. For acetic acid, we found the best fit between experimental and estimated activity coefficients using $c_H = 0$ kJ mol⁻¹ for both the water dimerization and acid hydration reactions. Decreasing the $c_H$ of either clustering reaction leads to stronger deviation from ideality, which in our case leads to a worse fit for the water activity coefficient, and positive parameter values cannot be used to lower the interaction enthalpy. The effective equilibrium constant for water dimer formation ($5.71 \times 10^{5}$) is below that of acetic acid hydration ($4.36 \times 10^{6}$), which explains why the fit parameter of water dimer formation should be higher (or equal, since positive values are not possible) than the parameter for acid hydrate formation. Additionally, we calculated UNIFAC predictions of acid and water activity coefficients using AIOMFAC-web (AIOMFAC-web, 2020; Zuend et al., 2008, 2011). These calculations (without inorganic ions) correspond to modified UNIFAC calculations by Peng et al. (2001). From Fig. S1 we see that, for acetic acid, the UNIFAC model underestimates the experimental activity coefficients even more than the COSMO-RS estimate. Similar to COSMO-RS, UNIFAC is not able to predict the increasing trend of water activity coefficients with increasing acid mole fraction.

Figure 4 (caption, partially recovered). [...] Hansen et al. (1955), and the markers are the experimental points from the same study. For the studied acids with finite aqueous solubilities at 298.15 K (pentanoic and hexanoic acid), water activity coefficients were measured using acid-rich solutions (Hansen et al., 1955). The water activity coefficients at high $x_\mathrm{acid}$ are not shown in the figure, because COSMO-RS-DARE overpredicts the experiments by several orders of magnitude. All activity coefficient values are given in Tables S3 and S4 of the Supplement.
For the other monocarboxylic acids studied here, we used the same $c_H$ value for water dimerization that was found for the acetic acid-water system, and we fitted the $c_H$ of acid hydrate formation to the experimental activity coefficients of water and the acids in the corresponding acid-water systems (Hansen et al., 1955). The enthalpic parameter values of acid hydration used to estimate the activity coefficients shown in Fig. 4 are 0.0, 0.0, −10.5, −14.6, −9.2, and −8.4 kJ mol⁻¹ for formic (n = 1), acetic (n = 2), propanoic (n = 3), butanoic (n = 4), pentanoic (n = 5), and hexanoic (n = 6) acid, respectively. For formic acid, we used the same $c_H$ parameter as for acetic acid due to the lack of experimental activity coefficients. If the enthalpic parameters in COSMO-RS-DARE calculations are not fitted to experimental activity coefficients and instead are set to zero, the estimated activity coefficients of both acid and water underestimate the experimental activity coefficients of Hansen et al. (1955) (see Fig. S2 of the Supplement). If no experimental activity coefficients are available for fitting the COSMO-RS-DARE parameters, COSMO-RS estimates are in overall better agreement with experiments than COSMO-RS-DARE or UNIFAC estimates. COSMO-RS-estimated acid activity coefficients are close to the measured values in all mixing states, and for water activity coefficients, the agreement between COSMO-RS and experiments is good in mixing states with $x_\mathrm{acid} < 0.75$. Sachsenhauser et al. (2014) used the COSMO-RS-DARE method for binary systems containing either acetic (n = 2) or propanoic (n = 3) acid and a nonpolar organic solvent. Their calculations show that the dimerization parameter (equivalent to $c_H$ in our calculations) is higher for propanoic acid than for acetic acid. This is opposite to what we observed for the hydration parameters, where $c_H$ was found to be higher for acetic acid than for propanoic acid. This indicates that the fit parameters of one clustering reaction cannot be used to estimate the corresponding fit parameters of another clustering reaction of the same compound.
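The fitting step described above, in which the hydration parameter is adjusted until estimated and experimental activity coefficients agree, can be sketched as a one-dimensional grid search. Everything in this sketch is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a COSMO-RS-DARE calculation (it is not the COSMOtherm model), the "experimental" points are synthetic, and the squared logarithmic error is only one possible objective. The search is restricted to non-positive values because the text states that positive parameter values cannot be used.

```python
import math


def fit_c_h(model, x_points, gamma_exp, candidates):
    """Return the candidate c_H that minimizes the squared log-error against experiment."""
    def squared_log_error(c_h):
        return sum((math.log(model(x, c_h)) - math.log(g)) ** 2
                   for x, g in zip(x_points, gamma_exp))
    return min(candidates, key=squared_log_error)


def toy_model(x_acid, c_h):
    """Hypothetical placeholder for an activity coefficient model with one parameter."""
    return math.exp(-0.1 * c_h * (1.0 - x_acid) ** 2)


# Synthetic "experimental" data generated from the toy model with c_H = -10.5 kJ/mol.
x_points = [0.1, 0.3, 0.5, 0.7]
gamma_exp = [toy_model(x, -10.5) for x in x_points]

# Candidate values from 0.0 down to -20.0 kJ/mol in steps of 0.1 (non-positive only).
candidates = [-0.1 * k for k in range(0, 201)]
best_c_h = fit_c_h(toy_model, x_points, gamma_exp, candidates)
print(f"fitted c_H = {best_c_h:.1f} kJ/mol")
```

In an actual workflow each model evaluation would be a separate COSMOtherm run, so a coarse grid or a bounded scalar minimizer would likely be preferred over a dense scan.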
While COSMO-RS is fully predictive, COSMO-RS-DARE requires parameter fitting using experimental data. Fitted COSMO-RS-DARE parameters from one system can be used in other systems where the same clustering reactions are relevant. For instance, Sachsenhauser et al. (2014) found that the same interaction parameters of acid dimers can be used in systems containing other similar (nonpolar) solvents. This indicates that our interaction enthalpies can be applied to other aqueous systems, e.g., ternary systems containing an inorganic salt, in addition to the carboxylic acid and water. This would allow for extending the findings of this study to atmospherically relevant aerosol solutions.
The increasing length of the acid carbon backbone leads to larger deviation from ideality (γ = 1) for both the acid and water. In convention I, this means the acid and water activity coefficient values are higher in mixtures containing the longer acids than the shorter acids. We observe that COSMO-RS-DARE-estimated activity coefficients agree well with the experiments once the $c_H$ parameter is fitted. However, when the hydrate and water dimer reactions are included, COSMO-RS-DARE is not able to predict realistic activity coefficients for water at high mole fractions ($x_\mathrm{acid} > 0.9$) of the acids. This is likely due to the low concentration of water in the binary solution, leading to errors in the description of the interactions between water molecules. Still, COSMO-RS-DARE estimates agree well with the experiments at least up to 0.9 mole fraction of the monocarboxylic acids. This is an improvement compared to the UNIFAC model, which fails to reproduce experimental water activity coefficients already at acid mole fractions above 0.25. At very high acid mole fractions ($x_\mathrm{acid} > 0.95$), COSMO-RS-DARE predicts activity coefficients several orders of magnitude higher than those seen in experiments.

Figure 5. Activity coefficients of (a, b) malonic acid and (c, d) water in the binary mixtures at 298.15 K calculated using different clustering reactions in the COSMO-RS-DARE calculation. Shown for comparison are activity coefficients of malonic acid by Davies and Thomas (1956) (at 298.15 K given in convention III) and Soonsin et al. (2010) (particle measurements at various temperatures given in convention I) and of water by Maffia and Meirelles (2001), Choi and Chan (2002), Wise et al. (2003), Peng et al. (2001), Marsh et al. (2017), Braban et al. (2003), and AIOMFAC-web (2020).

Dicarboxylic acids
We tested the effect of including different clusters in the activity coefficient calculation of malonic acid (m = 3). A comparison between the experimental, UNIFAC-modeled, and COSMOtherm-estimated activity coefficients is shown in Fig. 5. The malonic acid activity coefficients are compared in convention III (Fig. 5a) and in convention I (Fig. 5b). In convention III, acid activity coefficients are given with respect to a 1 mol kg −1 solution reference state (see the Supplement for more information). The COSMOtherm-estimated water activity coefficients are compared with experimental bulk (Fig. 5c) and particle (Fig. 5d) phase activity coefficients and UNIFAC-estimated activity coefficients.
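Because convention III refers to a 1 mol kg⁻¹ solution while COSMOtherm works with mole fractions, comparing the two requires moving between composition scales. The sketch below covers only that composition conversion for a binary solution of undissociated acid in water; it does not convert activity coefficients between conventions, which is described in the Supplement. The function names are hypothetical, and the molar mass of water is the only physical constant used.

```python
M_WATER = 0.018015  # molar mass of water in kg/mol


def mole_fraction_to_molality(x_acid):
    """Molality (mol acid per kg water) of a binary acid-water solution."""
    x_water = 1.0 - x_acid
    return x_acid / (x_water * M_WATER)


def molality_to_mole_fraction(molality):
    """Acid mole fraction of a binary solution with the given molality."""
    n_water = 1.0 / M_WATER  # moles of water in 1 kg
    return molality / (molality + n_water)


# Mole fraction corresponding to the 1 mol/kg reference composition.
x_reference = molality_to_mole_fraction(1.0)
print(f"1 mol/kg corresponds to x_acid = {x_reference:.4f}")
```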
For malonic acid (and other studied dicarboxylic acids; see Figs. S3-S5 of the Supplement), COSMO-RS-DARE is not able to improve the agreement between experiments and COSMOtherm estimates; the best overall fit is found using COSMO-RS. The water activity coefficients estimated using COSMO-RS are close to the ones estimated using the UNIFAC model (modified UNIFAC; Peng et al., 2001). Similarly to what has been seen with the UNIFAC model, COSMO-RS is able to predict water activity coefficients obtained from bulk and evaporation (supersaturated) measurements. Figure S4 of the Supplement shows comparisons between experimental and COSMOtherm-estimated water activity coefficients in oxalic, adipic, and pimelic acids. For these three acids, only water activities have been determined experimentally (Braban et al., 2003; Maffia and Meirelles, 2001; Marsh et al., 2017; Peng et al., 2001). In addition, water activities in adipic and pimelic acid solutions were only measured in particle solutions (Marsh et al., 2017). We found a good agreement between the particle measurements and COSMO-RS-estimated water activity coefficients, with COSMO-RS slightly overestimating the experiments. This result is in line with previous comparisons of hydroxy carboxylic acids.
The COSMO-RS-estimated activity coefficients of the studied dicarboxylic acids are shown in Fig. 6. We can see that, using convention I, the activity coefficients of the smaller dicarboxylic acids are lower than those of the larger dicarboxylic acids. Comparing COSMO-RS (solid lines) and UNIFAC estimates (dotted lines), there is less variation between the UNIFAC-estimated activity coefficients for the different acids studied than between the COSMO-RS estimates. This indicates that, in COSMO-RS, the number of carbon atoms has a larger effect on activity coefficients than estimated by UNIFAC.
Additionally, we computed activity coefficients for oxalic acid (the most acidic dicarboxylic acid of this study) with its dissociation included in the COSMO-RS calculation. In this case, the system contains neutral oxalic acid (H₂A) and water (H₂O), as well as singly or doubly deprotonated oxalic acid (HA⁻ or A²⁻, respectively) and hydronium ion (H₃O⁺) according to the dissociation equilibrium.
While both acid groups of oxalic acid can dissociate, here we consider only the first deprotonation, because the second pKa of oxalic acid in water (3.81; Rumble, 2018) is higher than the first one (1.25; Rumble, 2018), so the second dissociation has a smaller effect on the equilibrium. Figure S6 of the Supplement shows the difference between activity coefficients in a system where dissociation of oxalic acid is included and the binary system containing only neutral compounds. The calculation procedure is explained in more detail in the Supplement. There is no large difference in water activity coefficients when the ions are added to the system. A small change is seen in the acid activity coefficients, especially in the concentrated solutions where the estimated mole fraction of dissociated acid and hydronium ion is high. For the other carboxylic acids studied here, the effect of including dissociation is likely to be smaller than for oxalic acid, due to the lower mole fractions of ions present in solutions of less acidic compounds.
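To illustrate why the second deprotonation can be neglected, the following minimal Python sketch converts the two quoted values (1.25 and 3.81, which are pKa values) into dissociation constants and estimates the ion concentration produced by the first dissociation step. It is not the procedure used in the COSMO-RS calculations: it assumes an ideal solution (all activity coefficients equal to 1), ignores the second dissociation and water autoionization, and uses hypothetical total acid concentrations chosen only for illustration.

```python
import math

# pKa values of oxalic acid in water quoted in the text (Rumble, 2018).
PKA1 = 1.25
PKA2 = 3.81


def ka_from_pka(pka: float) -> float:
    """Convert a pKa into the acid dissociation constant Ka (mol/L scale)."""
    return 10.0 ** (-pka)


def first_step_ion_concentration(c_total: float, ka: float) -> float:
    """Concentration x of HA- (and H3O+) from H2A + H2O <-> HA- + H3O+.

    Assumption: ideal solution, so the equilibrium condition is
    x**2 / (c_total - x) = ka, solved here for the positive root.
    """
    return (-ka + math.sqrt(ka * ka + 4.0 * ka * c_total)) / 2.0


ka1 = ka_from_pka(PKA1)
ka2 = ka_from_pka(PKA2)
print(f"Ka1 / Ka2 = {ka1 / ka2:.0f}")

# Hypothetical total acid concentrations in mol/L (illustration only).
for c_total in (0.01, 0.1, 1.0):
    x = first_step_ion_concentration(c_total, ka1)
    print(f"c = {c_total} mol/L: [HA-] = {x:.4f} mol/L, "
          f"fraction dissociated = {x / c_total:.2f}")
```

Under these assumptions the first dissociation constant exceeds the second by more than two orders of magnitude, and the absolute ion concentration grows with the total acid concentration, consistent with the larger effect seen in concentrated solutions.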

Aqueous solubility
We estimated the aqueous solubility of the monocarboxylic acids (n = 1-6) using the COSMO-RS-DARE method. Since activity coefficients are used in the equilibrium conditions of the LLE calculations, we used the same c_H parameters that were fitted in the activity coefficient calculations to determine whether the same parameter value can be used in LLE calculations. As a comparison, we computed the same solubilities using COSMO-RS. In previous COSMOtherm calculations, Michailoudi et al. (2020) found a good agreement with experimental aqueous solubilities of fatty acids with even numbers of carbon atoms (n = 2, 4, 6, ..., 12). A comparison between experimentally determined aqueous solubilities and the COSMOtherm estimates of monocarboxylic acids is shown in Fig. 7. We see that when using COSMO-RS-DARE, COSMOtherm is able to predict the miscibility of the smaller monocarboxylic acids (n = 1-4), but the experimental solubilities of pentanoic (n = 5) and hexanoic (n = 6) acids are overestimated to a greater degree than when using COSMO-RS. On the other hand, COSMO-RS underestimates the experimental solubility of butanoic acid by a factor of 18, while COSMO-RS-DARE overestimates the experimental solubilities (upper limit) of pentanoic and hexanoic acids only by factors of 3.4 and 4.1, respectively.
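The role of activity coefficients in an LLE calculation can be sketched with a toy model. The Python example below does not use COSMO-RS or COSMO-RS-DARE; as a stand-in it uses the symmetric two-suffix Margules model, ln γ_i = A x_j², with a hypothetical interaction parameter A that is not fitted to any of the acids studied here. It solves the equal-activity condition between the two liquid phases and reports the mole fraction of a component in the phase where it is dilute, which is the quantity interpreted as a solubility.

```python
import math


def ln_gamma(x_other: float, a: float) -> float:
    """Two-suffix Margules stand-in model: ln(gamma_i) = a * x_j**2."""
    return a * x_other ** 2


def dilute_phase_mole_fraction(a: float, tol: float = 1e-12) -> float:
    """Mole fraction of component 1 in the phase that is poor in component 1.

    For this symmetric model the coexisting phases mirror each other
    (x1 in one phase equals x2 in the other), so the equal-activity
    condition reduces to ln(x / (1 - x)) = a * (2 * x - 1).
    A miscibility gap exists only for a > 2.
    """
    if a <= 2.0:
        raise ValueError("fully miscible: no liquid-liquid split for a <= 2")

    def residual(x: float) -> float:
        return math.log(x / (1.0 - x)) - a * (2.0 * x - 1.0)

    lo, hi = 1e-12, 0.5 - 1e-6
    if residual(hi) <= 0.0:
        raise ValueError("a is too close to 2 for this simple bracket")
    # Bisection: residual is negative near x = 0 and positive just below 0.5.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


A = 3.0  # hypothetical Margules parameter, for illustration only
x_dilute = dilute_phase_mole_fraction(A)
x_rich = 1.0 - x_dilute

# Activities of component 1 in the two phases; they must match at equilibrium.
act_dilute = x_dilute * math.exp(ln_gamma(1.0 - x_dilute, A))
act_rich = x_rich * math.exp(ln_gamma(1.0 - x_rich, A))
print(f"x1 (dilute phase) = {x_dilute:.4f}, x1 (rich phase) = {x_rich:.4f}")
print(f"activities: {act_dilute:.6f} vs {act_rich:.6f}")
```

In this toy model, a parameter A of 2 or less gives complete miscibility, which mirrors how a change in the estimated activity coefficients can switch a prediction between a finite solubility and full miscibility.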
For dicarboxylic acids, we estimated aqueous solubilities using COSMO-RS. The COSMOtherm-estimated and experimental solubilities are shown in Fig. 8. Different experimental heat of fusion and melting point values have been reported for some of the studied dicarboxylic acids. We calculated the lower and upper limit free energies of fusion by combining the different experimental values, and the aqueous solubilities were estimated using the two different free energy of fusion values. The higher free energy of fusion (ΔG_fus) estimate gives a lower aqueous solubility. The variability in the COSMOtherm-estimated solubilities is smaller than in the experimental solubilities.
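The link between the free energy of fusion and the solubility of a solid can be sketched as follows. This Python example is an illustration under stated assumptions, not the COSMOtherm implementation: it assumes a zero heat capacity of fusion, so that ΔG_fus = ΔH_fus (1 − T/T_melt), and the simplified solubility relation x γ = exp(−ΔG_fus/RT) with a constant activity coefficient. All heat of fusion, melting point, and activity coefficient values in the code are hypothetical placeholders, not data for any of the acids studied here.

```python
import itertools
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1
T = 298.15       # temperature, K


def delta_g_fus(delta_h_fus: float, t_melt: float, temperature: float) -> float:
    """Free energy of fusion (J/mol), assuming zero heat capacity of fusion."""
    return delta_h_fus * (1.0 - temperature / t_melt)


def solid_solubility(delta_g: float, gamma: float, temperature: float) -> float:
    """Mole-fraction solubility from x * gamma = exp(-dG_fus / (R * T)).

    Assumption: gamma is a constant. In a real calculation gamma depends
    on composition and the equation is solved iteratively.
    """
    return math.exp(-delta_g / (R * temperature)) / gamma


# Hypothetical literature spread (illustration only, not measured values).
heats_of_fusion = (20_000.0, 25_000.0)  # J/mol
melting_points = (400.0, 410.0)         # K
gamma = 1.0                             # hypothetical activity coefficient

# Combine every reported heat of fusion with every reported melting point.
g_values = [delta_g_fus(dh, tm, T)
            for dh, tm in itertools.product(heats_of_fusion, melting_points)]
g_low, g_high = min(g_values), max(g_values)

x_from_low_g = solid_solubility(g_low, gamma, T)
x_from_high_g = solid_solubility(g_high, gamma, T)
print(f"dG_fus range: {g_low:.0f} to {g_high:.0f} J/mol")
print(f"solubility range: {x_from_high_g:.3f} to {x_from_low_g:.3f} (mole fraction)")
```

Because the solubility depends exponentially on −ΔG_fus/RT, the upper limit of the free energy of fusion gives the lower limit of the solubility.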
The COSMO-RS solubility estimates of most of the dicarboxylic acids (m = 3-7) are within the range of experimentally determined solubilities. Using all lowest-energy conformers (up to 40 conformers), instead of only conformers containing no intramolecular hydrogen bonds, lowers the solubility estimates of all acids by a factor of 1.2 on average. The same effect of including conformers containing intramolecular hydrogen bonds has been previously seen in aqueous solubilities of citric, tartaric, malic, and maleic acids, as well as multifunctional organosulfates.

Conclusions
We compared COSMOtherm-estimated activity coefficients and aqueous solubilities of simple carboxylic acids with experimental values and a commonly used UNIFAC model, and we generally found a good agreement between experiments and COSMO-RS estimates. Using COSMO-RS-DARE, we were able to further improve the agreement between estimated and experimental water activity coefficients in binary monocarboxylic acid-water systems significantly compared to using COSMO-RS or UNIFAC. The COSMO-RS estimates of monocarboxylic acid activity coefficients in aqueous solutions agree with the experiments quite well, and they were further improved by COSMO-RS-DARE when the enthalpic parameters were fitted using experimental activity coefficients. We were also able to estimate activity coefficients of pentanoic and hexanoic acids using only experimental water activity coefficients in the fitting of the COSMO-RS-DARE enthalpic parameters. In addition, COSMO-RS-DARE was able to predict the miscibility of butanoic acid in water (using the fitting parameters of activity coefficient calculations), while COSMO-RS predicted a finite solubility. However, in aqueous solubility calculations of pentanoic and hexanoic acids, COSMO-RS led to a better agreement between the experiments and estimates compared to COSMO-RS-DARE.
For dicarboxylic acid-water systems, COSMO-RS produced better agreement with experiments than COSMO-RS-DARE. The experimental water activity coefficients from different sources have large variations, and COSMO-RS-estimated water activity coefficients fit within the range of experimental water activity coefficients obtained from bulk and evaporation measurements. We also found a good agreement between COSMO-RS-estimated and experimental acid activity coefficients at all acid mole fractions.
COSMO-RS was able to reproduce the same even-odd behavior of the dicarboxylic acid properties that has previously been seen experimentally in vapor pressures (Bilde et al., 2003) and solubilities (Zhang et al., 2013), as well as computationally in gas-phase dimer formation (Elm et al., 2019). The calculated even-odd behavior observed here in aqueous solubilities is likely partially due to the even-odd variation of the melting points and heats of fusion. There is no visible even-odd behavior in the COSMO-RS-estimated activity coefficients of the dicarboxylic acids. However, even-odd variation is seen in the effective equilibrium constants of dimerization of the larger dicarboxylic acids (m ≥ 4), which do not rely on experimental properties.
Mono- and dicarboxylic acids are very common in the atmosphere and are often used as model compounds for oxygenated functionalities in a range of applications, from vapor pressure, condensation-evaporation, cloud condensation nuclei activity, and hygroscopicity to aerosol-phase and heterogeneous reactivity (Prenni et al., 2001; McNeill et al., 2008; Schwier et al., 2012; Rossignol et al., 2016). Solubilities and activity coefficients of these secondary organic aerosol (SOA) constituents are needed to accurately predict their activities and to determine central properties such as composition, phase state, and chemical reactivity. Accurate computational tools are critical to provide this information for systems where experimental data are not readily accessible in the literature or by experimental design. We showed that COSMOtherm provides a good solution for estimating thermodynamic properties of atmospherically relevant organic compounds that are not commercially available for measurements. In addition to the simple binary systems studied here, COSMOtherm can be used to predict liquid-phase properties, such as activity coefficients, in complex, atmospherically relevant systems.
Data availability. The research data have been deposited in a reliable public data repository (the CERN Zenodo service) and can be accessed at https://doi.org/10.5281/zenodo.3842593.