<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">ACP</journal-id><journal-title-group>
    <journal-title>Atmospheric Chemistry and Physics</journal-title>
    <abbrev-journal-title abbrev-type="publisher">ACP</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Atmos. Chem. Phys.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1680-7324</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/acp-26-7631-2026</article-id><title-group><article-title>Machine learning interatomic potentials with accurate long-range interactions for molecular dynamics collision simulations of atmospherically-relevant molecules</article-title><alt-title>Accurate long-range MLIPs</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Neefjes</surname><given-names>Ivo</given-names></name>
          <email>ivo.neefjes@chem.au.dk</email>
        <ext-link>https://orcid.org/0000-0003-4549-0114</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Kubečka</surname><given-names>Jakub</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Elm</surname><given-names>Jonas</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-3736-4329</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Aarhus University, Department of Chemistry, Langelandsgade 140, 8000, Aarhus, Denmark</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>DTU, Department of Chemical and Biochemical Engineering, Søltofts Plads, 2800, Kongens Lyngby, Denmark</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Ivo Neefjes (ivo.neefjes@chem.au.dk)</corresp></author-notes><pub-date><day>29</day><month>May</month><year>2026</year></pub-date>
      
      <volume>26</volume>
      <issue>10</issue>
      <fpage>7631</fpage><lpage>7645</lpage>
      <history>
        <date date-type="received"><day>5</day><month>February</month><year>2026</year></date>
           <date date-type="rev-request"><day>16</day><month>February</month><year>2026</year></date>
           <date date-type="rev-recd"><day>22</day><month>April</month><year>2026</year></date>
           <date date-type="accepted"><day>27</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Ivo Neefjes et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026.html">This article is available from https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026.html</self-uri><self-uri xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026.pdf">The full text article is available as a PDF file from https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e107">Molecular collisions and subsequent clustering events are fundamental to atmospheric cluster formation. Accurately modeling these processes requires interatomic potentials that simultaneously capture the long-range forces governing collision kinetics and the short-range quantum effects driving reactivity. In this work, we evaluate the AIMNet2 and PaiNN machine learning architectures trained on GFN1-xTB and <inline-formula><mml:math id="M1" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c quantum chemical data for molecular collisions involving sulfuric acid.</p>

      <p id="d2e117">The models exhibit low mean absolute errors in energies and forces and accurately reproduce potentials of mean force relative to the GFN1-xTB reference. However, discrepancies are observed for the collision dynamics. While AIMNet2 accurately reproduces reference collision rate coefficients across all systems, PaiNN underestimates the rate coefficient for the charged sulfuric acid–bisulfate system by <inline-formula><mml:math id="M2" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 50 %. This error originates from the model's local atomic environment approximation, which neglects the strong long-range attractive forces at large intermolecular distances. Simulations with the OPLS-AA classical force field demonstrate that simple fixed partial charges are sufficient to describe these interactions.</p>

      <p id="d2e127">Comparing models trained on GFN1-xTB and <inline-formula><mml:math id="M3" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data reveals that while increasing the level of electronic structure theory significantly alters the potential energy surface in the short-range binding region, it generally has less impact on the long-range shoulder and the resulting collision rate coefficients.</p>

      <p id="d2e137">Our results highlight that while local equivariant models like PaiNN offer exceptional accuracy for thermodynamics, correctly simulating collision kinetics in systems with strong long-range interactions requires models that explicitly account for forces beyond the local environment, such as AIMNet2.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>European Research Council</funding-source>
<award-id>101040353</award-id>
</award-group>
<award-group id="gs2">
<funding-source>HORIZON EUROPE Marie Sklodowska-Curie Actions</funding-source>
<award-id>101105506</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e149">Atmospheric aerosol particles influence the climate by affecting cloud formation and scattering sunlight <xref ref-type="bibr" rid="bib1.bibx15" id="paren.1"/>, while also posing risks to human health via inhalation <xref ref-type="bibr" rid="bib1.bibx10" id="paren.2"/>. According to the latest Intergovernmental Panel on Climate Change (IPCC) assessment report, interactions between aerosols and clouds remain one of the largest sources of uncertainty in current global climate models <xref ref-type="bibr" rid="bib1.bibx5" id="paren.3"/>. A major contributor to this uncertainty is the difficulty of accurately modeling the earliest stages of particle formation <xref ref-type="bibr" rid="bib1.bibx42" id="paren.4"/>.</p>
      <p id="d2e164">Most atmospheric aerosol particles form through a gas-to-particle conversion process called new particle formation (NPF), in which gas-phase molecules collide and stick together to form clusters <xref ref-type="bibr" rid="bib1.bibx24" id="paren.5"/>. These clusters then grow further through condensation and coagulation <xref ref-type="bibr" rid="bib1.bibx44" id="paren.6"/>. The earliest stages of this formation are inherently dynamic: molecules approach each other under the influence of long-range attractive forces (e.g., van der Waals or electrostatic interactions), then rearrange and relax to accommodate one another while forming a thermodynamically stable cluster.</p>
      <p id="d2e173">To capture the dynamic nature of these initial steps, researchers increasingly rely on atomistic molecular dynamics (MD) simulations, which offer insight into the molecular-level dynamics governing collisions and the formation of stable clusters. However, accurately capturing the necessary physics at a reasonable computational cost remains challenging. The quality of MD simulations is largely determined by the level of theory at which the interaction potential between the nuclei in the system is obtained. An ideal interatomic potential for particle formation MD must satisfy three competing requirements: it must accurately capture long-range attractive forces to model how molecules initially approach; it must describe short-range quantum effects, such as chemical reactions, to model cluster stabilization; and it must be computationally efficient enough to sample a statistically significant number of events.</p>
      <p id="d2e176">Classical force fields with fixed bonding topologies are computationally efficient and can describe long-range interactions, but they cannot capture short-range quantum effects. These effects include chemical reactions like proton transfers, which play a critical role in stabilizing atmospheric clusters. While semi-empirical quantum chemistry methods such as GFN1-xTB <xref ref-type="bibr" rid="bib1.bibx13" id="paren.7"/> can model bond-breaking, they can exhibit significant errors for complex hydrogen-bonded systems, including quantitative inaccuracies in binding energies <xref ref-type="bibr" rid="bib1.bibx30" id="paren.8"/> and qualitative misidentifications of lowest-energy cluster configurations <xref ref-type="bibr" rid="bib1.bibx21" id="paren.9"/>. Higher levels of theory, such as the DFT composite method <inline-formula><mml:math id="M4" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c <xref ref-type="bibr" rid="bib1.bibx27" id="paren.10"/>, offer the necessary short- and long-range accuracy, but their computational cost makes even short MD simulations of small systems prohibitively expensive. Thus, neither classical nor conventional quantum chemical methods fully satisfy the requirements for large-scale cluster formation MD.</p>
      <p id="d2e199">Recently, several machine learning (ML) architectures have been developed to construct accurate interatomic potentials for molecular systems. These machine learning interatomic potentials (MLIPs) offer a potential solution to this tradeoff, promising to reproduce the accuracy of high-level quantum theory at a reasonable computational cost. For instance, the polarizable atom interaction neural network (PaiNN) is an equivariant message-passing neural network capable of accelerating MD simulations while maintaining accuracy comparable to its reference training data <xref ref-type="bibr" rid="bib1.bibx34 bib1.bibx22" id="paren.11"/>. Similarly, the second-generation atoms-in-molecules neural network (AIMNet2) has demonstrated high predictive accuracy across a wide range of molecular systems with remarkable efficiency, enabling simulations of systems containing up to <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mn mathvariant="normal">5</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> atoms <xref ref-type="bibr" rid="bib1.bibx1" id="paren.12"/>.</p>
      <p id="d2e219">However, MLIPs often rely on a local atomic environment approximation, in which the model encodes the environment around each atom up to a user-defined cutoff radius. This approximation improves transferability and computational efficiency but inherently limits the model to short-range interactions. The PaiNN model addresses this through a message-passing framework, where atoms exchange information with their neighbors via message and update blocks. Through multiple iterations, the effective interaction range grows, allowing atoms to indirectly access information from beyond the immediate cutoff. However, if all atoms in one subsystem (e.g., a molecule) lie beyond the cutoff radius of another, the interaction graph becomes disconnected. Consequently, no messages are exchanged, and the model treats the subsystems as non-interacting. AIMNet2 mitigates this by supplementing message passing with explicit long-range contributions. It predicts partial charges to model analytical Coulomb interactions and adds dispersion effects via the D3(BJ) correction scheme <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx12" id="paren.13"/>.</p>
      <p id="d2e225">The potential of MLIPs for atmospheric modeling has already been demonstrated by several recent studies that simulated the evolution of systems containing tens of particle-forming molecules to observe cluster formation dynamics <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx18 bib1.bibx25" id="paren.14"/>. As the field increasingly adopts these methods for large-scale simulations, it is important to evaluate how well different model architectures capture long-range interactions alongside the necessary short-range accuracy and computational efficiency.</p>
      <p id="d2e231">A rigorous metric for evaluating this long-range capability is the canonical collision rate coefficient. In cluster distribution dynamics models, such as the Atmospheric Cluster Dynamics Code (ACDC) <xref ref-type="bibr" rid="bib1.bibx26" id="paren.15"/>, cluster-forming collisions and cluster-removing evaporations are treated as independent processes, assuming that dissociation prior to thermalization from collisional excitation is negligible <xref ref-type="bibr" rid="bib1.bibx9" id="paren.16"/>. This yields a pressure-independent collision rate coefficient, which represents the frequency of collisions per unit concentration. Traditionally, this coefficient is calculated using kinetic gas theory. In this framework, colliding partners are approximated as hard spheres, and intermolecular interactions are neglected entirely. While analytical approaches like the central field model can account for long-range forces, they require interaction parameters that are significantly more difficult to determine than standard hard-sphere radii <xref ref-type="bibr" rid="bib1.bibx29" id="paren.17"/>.</p>
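      <p>As a minimal illustration of the kinetic gas theory baseline, the hard-sphere coefficient can be written as the product of the geometric cross-section and the mean relative thermal speed. The Python sketch below assumes SI units; the function name <monospace>hard_sphere_rate_coefficient</monospace> and the hard-sphere radius are illustrative placeholders, not values used in this work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg


def hard_sphere_rate_coefficient(radius_1, radius_2, mass_1, mass_2, temperature):
    """Kinetic gas theory collision rate coefficient in m^3/s (SI inputs)."""
    reduced_mass = mass_1 * mass_2 / (mass_1 + mass_2)
    mean_relative_speed = math.sqrt(8.0 * K_B * temperature / (math.pi * reduced_mass))
    cross_section = math.pi * (radius_1 + radius_2) ** 2
    return cross_section * mean_relative_speed


# Hypothetical inputs: the radius is a placeholder, not a value from this study.
mass_sa = 98.08 * AMU      # approximate mass of one H2SO4 molecule
radius_sa = 2.8e-10        # m, assumed hard-sphere radius
beta_hs = hard_sphere_rate_coefficient(radius_sa, radius_sa, mass_sa, mass_sa, 300.0)
print(f"{beta_hs:.3e} m^3/s")
```
]]></preformat>
      <p>Because no attractive interaction enters this expression, it serves only as a reference value. Dividing a coefficient obtained from collision trajectories by this reference gives an enhancement factor.</p>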
      <p id="d2e243">Atomistic MD collision trajectory simulations in the free molecular regime provide a powerful alternative for calculating these coefficients directly from the underlying physical interactions <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx28 bib1.bibx43 bib1.bibx20 bib1.bibx40" id="paren.18"/>. As demonstrated by <xref ref-type="bibr" rid="bib1.bibx14" id="text.19"/>, explicitly capturing long-range interactions via MD using the classical OPLS-AA force field resulted in an enhancement factor of 2.7 relative to kinetic gas theory for sulfuric acid dimerization. Accurately reproducing these enhanced collision rates serves as a robust metric for evaluating the long-range behavior of MLIPs in atmospheric applications.</p>
      <p id="d2e252">In this methodological study, we assess the ability of the PaiNN and AIMNet2 architectures to describe collisions governed by long-range interactions. We sampled training configurations using GFN1-xTB dynamics, subsequently computing energies and forces at both the GFN1-xTB and <inline-formula><mml:math id="M6" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c levels. Additionally, we employed delta-learning to upscale GFN1-xTB simulations with PaiNN corrections to the <inline-formula><mml:math id="M7" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c level of theory. Since sulfuric acid is a key contributor to particle formation <xref ref-type="bibr" rid="bib1.bibx36" id="paren.20"/>, we studied the sulfuric acid dimer, the sulfuric acid–dimethylamine system (to investigate stabilizing proton transfers), and the sulfuric acid–bisulfate system (to examine strong ionic long-range contributions). Following hyperparameter tuning, we evaluated model performance by comparing electronic energy and force predictions against independent test sets. Furthermore, we calculated the potential of mean force (PMF) through umbrella sampling to compare against GFN1-xTB reference data. Finally, we derived collision rate coefficients from MD collision trajectory simulations to evaluate the long-range dynamics of the models and examine how they vary across different levels of theory. By validating these ML models in the context of atmospheric particle formation, this study establishes the necessary groundwork for large-scale MD simulations in this domain.</p>
      <p id="d2e273">The remainder of the paper is organized as follows. Section <xref ref-type="sec" rid="Ch1.S2"/> details the computational framework, including the AIMNet2 and PaiNN architectures, and the methodology of the MD and umbrella sampling simulations. Section <xref ref-type="sec" rid="Ch1.S3"/> presents the hyperparameter tuning and training results, followed by an analysis of the PMFs, collision probabilities, and rate coefficients predicted by each model. Finally, Sect. <xref ref-type="sec" rid="Ch1.S4"/> summarizes our findings and outlines potential future applications.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Theory and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Collision systems</title>
      <p id="d2e297">We investigated three collision systems containing the atmospherically relevant species sulfuric acid (<inline-formula><mml:math id="M8" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>), dimethylamine (<inline-formula><mml:math id="M9" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>), and bisulfate (<inline-formula><mml:math id="M10" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>) in the form of the sulfuric acid dimer (<inline-formula><mml:math id="M11" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M12" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>), the acid–base system <inline-formula><mml:math id="M13" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn 
mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M14" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and the ion–molecule system <inline-formula><mml:math id="M15" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M16" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>. The structures of these species are shown in Fig. <xref ref-type="fig" rid="F1"/>.</p>

      <fig id="F1"><label>Figure 1</label><caption><p id="d2e451">Ball-and-stick representations of sulfuric acid, dimethylamine, and bisulfate. This study considers collision systems of sulfuric acid paired with itself, dimethylamine, and bisulfate. Atom color code: sulfur (yellow), oxygen (red), nitrogen (blue), carbon (gray), and hydrogen (white).</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026-f01.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Machine learning interatomic potentials</title>
<sec id="Ch1.S2.SS2.SSS1">
  <label>2.2.1</label><title>PaiNN</title>
      <p id="d2e475">The polarizable atom interaction neural network (PaiNN) extends the message-passing formalism of the SchNet architecture by incorporating rotation-equivariant features to handle vectorial information <xref ref-type="bibr" rid="bib1.bibx34" id="paren.21"/>. This enables the representation of directional properties, which is essential for accurately describing forces, dipoles, and other tensorial quantities. PaiNN has been trained successfully on relatively small datasets, achieving accuracy competitive with kernel methods <xref ref-type="bibr" rid="bib1.bibx34" id="paren.22"/>.</p>
      <p id="d2e484">The standard PaiNN architecture is not explicitly charge-aware and lacks long-range electrostatic or dispersion corrections. Instead, the model handles charged systems by implicitly learning the local, short-range effects of the charge directly from the energies and forces provided in the training data. However, a limitation of this purely local approach in the context of gas-phase collisions is that interactions cannot be transmitted between atoms separated by more than the cutoff distance without intermediate atoms to mediate the message passing. Increasing the cutoff can capture these long-range effects, but at the expense of higher computational cost and potentially reduced accuracy, as the model must learn to generalize over a significantly larger spatial domain.</p>
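      <p>The cutoff limitation can be illustrated with a toy neighbor graph. The following Python sketch is not part of the PaiNN implementation; it only counts the connected components of a radius-cutoff graph for two hypothetical three-atom fragments. The 5 Å cutoff and all coordinates are assumed values chosen for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def count_graph_components(positions, cutoff):
    """Count connected components of the radius-cutoff neighbor graph."""
    n_atoms = len(positions)
    # Pairwise distance matrix; atoms closer than the cutoff share an edge.
    deltas = positions[:, None, :] - positions[None, :, :]
    adjacency = np.linalg.norm(deltas, axis=-1) <= cutoff
    visited = np.zeros(n_atoms, dtype=bool)
    n_components = 0
    for start in range(n_atoms):
        if visited[start]:
            continue
        n_components += 1
        # Depth-first search marks every atom reachable through graph edges.
        stack = [start]
        visited[start] = True
        while stack:
            atom = stack.pop()
            for neighbor in np.flatnonzero(adjacency[atom] & ~visited):
                visited[neighbor] = True
                stack.append(int(neighbor))
    return n_components


def two_fragments(separation):
    """Two hypothetical three-atom chains; separation is their closest distance."""
    chain = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    shifted = chain + np.array([2.0 + separation, 0.0, 0.0])
    return np.vstack([chain, shifted])


cutoff = 5.0  # Angstrom, illustrative value
print(count_graph_components(two_fragments(4.0), cutoff))   # 1: fragments connected
print(count_graph_components(two_fragments(12.0), cutoff))  # 2: graph is disconnected
```
]]></preformat>
      <p>Once the graph splits into two components, additional message-passing iterations cannot transmit information between the fragments, since every message travels along an edge.</p>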
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <label>2.2.2</label><title>AIMNet2</title>
      <p id="d2e495">AIMNet2 is the second generation of the Atoms-In-Molecules Neural Network developed by <xref ref-type="bibr" rid="bib1.bibx1" id="text.23"/>. Using a message-passing architecture, the model iteratively refines invariant representations of local atomic environments, defined by radial symmetry functions, to build complex “atom-in-molecule” (AIM) embeddings. While directional dependencies are not enforced through explicit equivariance, AIMNet2 captures these effects implicitly through its atom-centered representations and iterative message passing. A key feature of AIMNet2 is its generalized embedding strategy, which avoids element-specific subnetworks and allows the model to flexibly represent highly diverse chemical compositions.</p>
      <p id="d2e501">Beyond its local representations, AIMNet2 explicitly incorporates electronic and long-range physical effects. The architecture is charge-aware, using the total molecular charge as an input parameter to dynamically infer atom-centered partial charges during the message-passing phase. These partial charges are iteratively updated through a neural charge equilibration (NQE) scheme. The total potential energy is then calculated as the sum of the local configurational energy, explicit Coulombic electrostatic interactions derived from the learned partial charges, and a D3(BJ) dispersion correction <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx12" id="paren.24"/>.</p>
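      <p>The energy decomposition described above can be sketched in a few lines of Python. This is a simplified illustration, not the AIMNet2 code: the local and dispersion terms are passed in as plain numbers, the partial charges are hypothetical, and the Coulomb term is the bare point-charge sum without any short-range damping that an actual implementation may apply.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

COULOMB_CONSTANT = 14.399645  # e^2 / (4 pi eps0) in eV * Angstrom


def coulomb_energy(positions, charges):
    """Bare pairwise Coulomb energy (eV) of point charges (positions in Angstrom)."""
    deltas = positions[:, None, :] - positions[None, :, :]
    distances = np.linalg.norm(deltas, axis=-1)
    i, j = np.triu_indices(len(charges), k=1)  # each pair counted once
    return COULOMB_CONSTANT * np.sum(charges[i] * charges[j] / distances[i, j])


def total_energy(local_energy, dispersion_energy, positions, charges):
    """Sum of local, Coulomb, and dispersion terms, mirroring the decomposition."""
    return local_energy + coulomb_energy(positions, charges) + dispersion_energy


# Hypothetical example: a -1 e and a +0.5 e point charge 20 Angstrom apart.
positions = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
charges = np.array([-1.0, 0.5])
print(coulomb_energy(positions, charges))
```
]]></preformat>
      <p>In this toy case the Coulomb term remains nonzero at a separation of 20 Å, which is the kind of contribution a purely local model with a shorter cutoff cannot represent.</p>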
      <p id="d2e507">Designed for generalizability, AIMNet2 natively supports systems with different charge states and spin multiplicities. By explicitly accounting for these varying electronic states and long-range interactions, the model is well-suited for a wide range of chemically complex systems.</p>
</sec>
<sec id="Ch1.S2.SS2.SSS3">
  <label>2.2.3</label><title>Delta-learning</title>
      <p id="d2e519">Rather than training directly on molecular properties (e.g., electronic energies and forces) at a high level of theory, one can train on the difference between the high-level target and a more computationally efficient, lower-level method. In this framework, molecular dynamics (MD) simulations are performed at the lower level of theory but are corrected to approximate the high level of theory <xref ref-type="bibr" rid="bib1.bibx3" id="paren.25"/>. When the two levels of theory are correlated, this delta-learning approach can substantially reduce model errors <xref ref-type="bibr" rid="bib1.bibx33" id="paren.26"/>. The main drawback is that while the evaluation of the machine learning model is typically fast, the overall simulation speed is fundamentally limited by the computational cost of the lower-level baseline. In this work, we applied delta-learning using the PaiNN architecture to learn the correction between GFN1-xTB <xref ref-type="bibr" rid="bib1.bibx13" id="paren.27"/> as the low-level baseline and <inline-formula><mml:math id="M17" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c <xref ref-type="bibr" rid="bib1.bibx27" id="paren.28"/> as the high-level method. We refer to this approach as <inline-formula><mml:math id="M18" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN.</p>
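      <p>The structure of a delta-learning evaluation can be illustrated with a toy example. In the Python sketch below, two harmonic bond potentials stand in for the low-level and high-level methods, and the "trained" correction is simply their exact difference. All names and parameters are hypothetical; an actual Δ-PaiNN calculation would replace <monospace>low_level</monospace> with GFN1-xTB and <monospace>exact_delta</monospace> with the trained network.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def harmonic_bond(force_constant, rest_length):
    """Return a toy two-atom potential mapping positions -> (energy, forces)."""
    def evaluate(positions):
        bond = positions[1] - positions[0]
        length = np.linalg.norm(bond)
        energy = 0.5 * force_constant * (length - rest_length) ** 2
        # A stretched bond pulls atom 0 toward atom 1 and vice versa.
        pull = force_constant * (length - rest_length) * bond / length
        return energy, np.array([pull, -pull])
    return evaluate


def delta_corrected(baseline, correction):
    """Add a learned correction to a low-level baseline at every evaluation."""
    def evaluate(positions):
        e_low, f_low = baseline(positions)
        e_delta, f_delta = correction(positions)
        return e_low + e_delta, f_low + f_delta
    return evaluate


low_level = harmonic_bond(4.0, 1.00)    # stand-in for the cheap baseline
high_level = harmonic_bond(5.0, 0.95)   # stand-in for the expensive target


def exact_delta(positions):
    """Stand-in for a trained model: the exact high-minus-low difference."""
    e_high, f_high = high_level(positions)
    e_low, f_low = low_level(positions)
    return e_high - e_low, f_high - f_low


model = delta_corrected(low_level, exact_delta)
positions = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
energy, forces = model(positions)
```
]]></preformat>
      <p>Each call to <monospace>model</monospace> evaluates the baseline once, which is why the cost of the low-level method bounds the overall simulation speed.</p>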
</sec>
<sec id="Ch1.S2.SS2.SSS4">
  <label>2.2.4</label><title>Data generation</title>
      <p id="d2e557">For each of the three collision systems, we performed collision trajectory simulations at 300 K (structures written every 100 steps) and umbrella sampling simulations at 300 and 500 K (structures written every 250 steps) using the GFN1-xTB method (TBlite, version 0.2.1) <xref ref-type="bibr" rid="bib1.bibx6" id="paren.29"/>, following the methodologies detailed in Sects. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS2"/> and <xref ref-type="sec" rid="Ch1.S2.SS3.SSS3"/>. All structures from a given system were pooled into a unified candidate dataset.</p>
      <p id="d2e567">To construct a comprehensive training set, we employed a sampling strategy guided by the potential of mean force (PMF; see Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS3"/>) along the center-of-mass distance between the collision partners. Candidate structures were binned by center-of-mass distance, and the number of samples selected per bin was determined by a weighted combination of the local PMF curvature and the raw sampling density. This ensures that regions where the potential energy surface (PES) changes rapidly along the collision coordinate are sampled more densely. Collision trajectory data were included to capture both non-interacting configurations and high-energy collision events, while the 500 K umbrella sampling data were incorporated to cover high-energy fluctuations and prevent model instability during thermal excursions.</p>
      <p id="d2e572">Within each bin, we enforced structural diversity using a root-mean-square deviation (RMSD) filter. The RMSD threshold was dynamically relaxed near the PMF minimum to capture equilibrium fluctuations, while a stricter threshold was applied in high-energy regions to maximize configurational coverage. Finally, a global RMSD filter was applied to remove any remaining near-duplicates across the entire dataset, resulting in a final training set of 20 000 structures per system.</p>
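      <p>A possible form of this selection procedure is sketched below in Python. The sketch rests on several assumptions that are not taken from this work: the curvature is a finite-difference estimate, the two weights are combined linearly with an equal default weighting, the RMSD is computed without structural alignment, and the filter is a simple greedy pass. All function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def samples_per_bin(pmf, counts, n_total, curvature_weight=0.5):
    """Allocate a sample quota to each distance bin.

    The equal default weighting and the finite-difference curvature are
    assumptions of this sketch, not the settings used in the study.
    """
    curvature = np.abs(np.gradient(np.gradient(pmf)))
    curvature_share = curvature / curvature.sum()
    density_share = counts / counts.sum()
    weights = curvature_weight * curvature_share + (1.0 - curvature_weight) * density_share
    quota = np.floor(weights * n_total).astype(int)
    return np.minimum(quota, counts)  # a bin cannot supply more than it holds


def rmsd(a, b):
    """RMSD between two structures with identical atom ordering (no alignment)."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))


def greedy_rmsd_filter(structures, threshold):
    """Keep a structure only if it differs from all kept ones by >= threshold."""
    kept = []
    for structure in structures:
        if all(rmsd(structure, other) >= threshold for other in kept):
            kept.append(structure)
    return kept


# Toy PMF with a single well near 3 Angstrom and a flat tail.
distances = np.linspace(2.0, 10.0, 9)
pmf = -np.exp(-(distances - 3.0) ** 2)
counts = np.full(9, 50)
quota = samples_per_bin(pmf, counts, n_total=100)
print(quota)
```
]]></preformat>
      <p>In this toy case the bins around the well receive larger quotas than the flat tail, while the density term keeps the tail from being left out entirely.</p>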
      <p id="d2e575">Atomic forces of the selected structures were obtained through potential energy gradient calculations with respect to the nuclear coordinates using GFN1-xTB (TBlite, version 0.2.1) and <inline-formula><mml:math id="M19" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c (ORCA, version 6.0.1) <xref ref-type="bibr" rid="bib1.bibx31" id="paren.30"/>. Owing to the high computational cost, no direct MD simulations were run at the <inline-formula><mml:math id="M20" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c level. Instead, it was assumed that the GFN1-xTB PES sufficiently overlaps with the relevant regions of the <inline-formula><mml:math id="M21" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c PES. This assumption holds as long as the GFN1-xTB PES has the same topological features as the <inline-formula><mml:math id="M22" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c PES in the relevant regions. Small structural discrepancies are corrected, as the higher-level nuclear gradients provide a net force directing the geometry toward the true <inline-formula><mml:math id="M23" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c minimum. The assumption, however, breaks down if important regions of the <inline-formula><mml:math id="M24" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c PES are entirely missing from the GFN1-xTB PES. While this is unlikely for the relatively simple collision systems studied here, more complex clusters may require the dataset to be augmented with unvisited structures.</p>
      <p id="d2e625">We note that while <inline-formula><mml:math id="M25" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c is used in this study, atomic forces can be calculated using any quantum chemistry method on the GFN1-xTB structures to obtain a training set at that level of theory.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Molecular dynamics simulations</title>
<sec id="Ch1.S2.SS3.SSS1">
  <label>2.3.1</label><title>Force calculations</title>
      <p id="d2e651">In MD simulations, atomistic trajectories are generated by integrating Newton's equations of motion over discrete time steps. At each step, the forces acting on the nuclei are computed, and the system is propagated classically.</p>
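      <p>A minimal sketch of such a propagation loop is given below in Python. The velocity Verlet scheme is used here as a common example of a time-reversible integrator; it is an assumption of this sketch rather than a statement about the integrator used in this work. The harmonic force and reduced units are purely illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Propagate positions and velocities with the velocity Verlet scheme."""
    forces = force_fn(positions)
    for _ in range(n_steps):
        velocities = velocities + 0.5 * dt * forces / masses[:, None]  # half kick
        positions = positions + dt * velocities                        # drift
        forces = force_fn(positions)                                   # new forces
        velocities = velocities + 0.5 * dt * forces / masses[:, None]  # half kick
    return positions, velocities


def harmonic_force(positions):
    """Toy force for a unit-stiffness harmonic well centered at the origin."""
    return -positions


# One particle of unit mass released from rest at x = 1 (reduced units).
x0 = np.array([[1.0, 0.0, 0.0]])
v0 = np.zeros_like(x0)
x, v = velocity_verlet(x0, v0, np.array([1.0]), harmonic_force, dt=0.01, n_steps=1000)
print(x[0, 0], np.cos(10.0))  # numerical and analytical position at t = 10
```
]]></preformat>
      <p>In a production simulation, <monospace>force_fn</monospace> would be replaced by the chosen force evaluation method, which is called once per time step and therefore dominates the computational cost.</p>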
      <p id="d2e654">In this study, we employed several methods for force evaluation. Simulations using the semi-empirical GFN1-xTB method <xref ref-type="bibr" rid="bib1.bibx13" id="paren.31"/> and the trained AIMNet2 and PaiNN models were performed within the Atomic Simulation Environment (ASE) <xref ref-type="bibr" rid="bib1.bibx16" id="paren.32"/>. These calculations were executed through the <monospace>tblite</monospace> <xref ref-type="bibr" rid="bib1.bibx6" id="paren.33"/>, <monospace>aimnet2ase</monospace>, and <monospace>SchNetPack</monospace> <xref ref-type="bibr" rid="bib1.bibx35" id="paren.34"/> calculators, respectively. Furthermore, we employed a <inline-formula><mml:math id="M26" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-learning approach (<inline-formula><mml:math id="M27" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN), wherein baseline GFN1-xTB forces were corrected by a model trained on the difference between GFN1-xTB and the target <inline-formula><mml:math id="M28" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c theory (or GFN1-xTB itself for validation).</p>
      <p id="d2e700">Additionally, classical MD simulations were performed using the OPLS-AA force field <xref ref-type="bibr" rid="bib1.bibx19" id="paren.35"/>. These simulations were carried out with the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) code <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx39" id="paren.36"/>. A detailed description of the OPLS-AA parameters employed in this work can be found in Sect. S1 of the Supplement.</p>
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <label>2.3.2</label><title>Collision trajectory simulations</title>
      <p id="d2e717">Canonical collision rate coefficients are given by

              <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M29" display="block"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:munderover><mml:mo movablelimits="false">∫</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mi mathvariant="normal">∞</mml:mi></mml:munderover><mml:mi mathvariant="normal">d</mml:mi><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:munderover><mml:mo movablelimits="false">∫</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mi mathvariant="normal">∞</mml:mi></mml:munderover><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mi mathvariant="normal">MB</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>b</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msub><mml:mi>P</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">d</mml:mi><mml:mi>b</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is the initial relative velocity between the collision partners, <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi mathvariant="normal">MB</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the Maxwell–Boltzmann relative speed distribution, and <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the collision probability. The impact parameter <inline-formula><mml:math id="M33" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> is defined as the perpendicular distance between the initial velocity vectors of the two partners.</p>
      <p id="d2e858">To obtain collision rate coefficients from MD simulations, we approximated Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) with a Riemann sum. The initial relative velocity <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> was sampled from 50 to 800 m s<sup>−1</sup> in steps of 50 m s<sup>−1</sup>. This velocity range covers 99 % of the Maxwell–Boltzmann distribution for all systems. The impact parameter <inline-formula><mml:math id="M37" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> ranged from 0 to 20 Å in steps of 1 Å for the neutral systems (<inline-formula><mml:math id="M38" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M39" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M40" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M41" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn 
mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>), and was extended to 60 Å for the ionic <inline-formula><mml:math id="M42" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M43" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system. While a maximum impact parameter of 20 Å was sufficient for the neutral pairs <xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx28" id="paren.37"/>, the larger cutoff for the ionic system was necessary because electrostatic attraction maintains non-negligible collision probabilities at much greater distances.</p>
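      <p>The Riemann-sum approximation of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) can be sketched as below. The sketch assumes uniform grids and uses a hard-sphere collision probability as a hypothetical stand-in for the MD-derived values; the function names and the left-endpoint summation rule are illustrative choices, not a description of the analysis scripts used here.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
from typing import Callable, Sequence

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg
ANGSTROM = 1.0e-10       # m


def f_mb(v: float, mu: float, temperature: float) -> float:
    """Maxwell-Boltzmann distribution of the relative speed v (m/s)."""
    a = mu / (2.0 * K_B * temperature)
    return 4.0 * math.pi * (a / math.pi) ** 1.5 * v * v * math.exp(-a * v * v)


def collision_rate(
    speeds: Sequence[float],             # m/s, uniform spacing
    impact_parameters: Sequence[float],  # angstrom, uniform spacing
    p_c: Callable[[float, float], float],
    mu: float,                           # reduced mass, kg
    temperature: float,                  # K
) -> float:
    """Riemann-sum estimate of Eq. (1); returns beta in m^3/s."""
    dv = speeds[1] - speeds[0]
    db = (impact_parameters[1] - impact_parameters[0]) * ANGSTROM
    total = 0.0
    for v in speeds:
        weight_v = v * f_mb(v, mu, temperature) * dv
        for b in impact_parameters:
            total += weight_v * (b * ANGSTROM) * p_c(v, b) * db
    return 2.0 * math.pi * total


def hard_sphere(v: float, b: float, radius_sum: float = 5.5) -> float:
    """Hypothetical P_c: certain collision inside radius_sum (angstrom)."""
    return 1.0 if b < radius_sum else 0.0


# Grids quoted in the text for the neutral systems.
speeds = [50.0 * i for i in range(1, 17)]    # 50 ... 800 m/s
impacts = [1.0 * j for j in range(0, 21)]    # 0 ... 20 angstrom

mu = 0.5 * 98.08 * AMU  # reduced mass of two H2SO4 molecules
beta = collision_rate(speeds, impacts, hard_sphere, mu, 300.0)
```
]]></preformat>
      <p>For the hard-sphere stand-in, the double integral has the closed form β = πR<sup>2</sup>⟨v⟩, with R the sum of the radii and ⟨v⟩ the mean relative speed, which offers a convenient check that a given grid is fine enough.</p>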
      <p id="d2e1007">For each (<inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M45" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula>) pair, we performed 100 independent trajectory simulations. The collision probability <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is estimated as the fraction of trajectories that result in a collision. A collision is defined to occur if the center-of-mass distance between the partners falls below the sum of their hard-sphere radii, derived from their liquid bulk densities, for at least one output frame. 
These sums are 5.5, 5.7, and 5.5 Å for the <inline-formula><mml:math id="M47" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M48" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M49" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M50" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M51" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M52" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> systems, respectively.</p>
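      <p>The collision criterion and the resulting probability estimate can be illustrated with a short sketch. The per-frame distances below are made-up toy values, and the function names are hypothetical; only the criterion itself (center-of-mass distance below the threshold in at least one output frame) is taken from the text.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import Iterable, Sequence


def collided(com_distances: Iterable[float], threshold: float) -> bool:
    """True if any output frame has a center-of-mass distance below threshold."""
    return any(d < threshold for d in com_distances)


def collision_probability(
    trajectories: Sequence[Sequence[float]], threshold: float
) -> float:
    """Fraction of trajectories that satisfy the collision criterion."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    hits = sum(collided(traj, threshold) for traj in trajectories)
    return hits / len(trajectories)


# Toy data (hypothetical): per-frame distances in angstrom for three runs.
runs = [
    [30.0, 12.0, 5.2, 9.0],    # dips below 5.5 angstrom: collision
    [30.0, 18.0, 14.0, 25.0],  # fly-by
    [30.0, 8.0, 5.5, 11.0],    # reaches 5.5 angstrom but never falls below it
]
p_c = collision_probability(runs, threshold=5.5)
```
]]></preformat>
      <p>Because the criterion is evaluated on output frames only, the estimate depends on the output frequency: a close approach that occurs entirely between two saved frames would not be counted.</p>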
      <p id="d2e1150">Prior to the collision simulations, the monomers were individually equilibrated in NVT simulations using the GFN1-xTB method (TBlite, version 0.2.1). Atomic velocities were initialized from a Maxwell–Boltzmann distribution at 300 K, and the temperature was maintained by a Langevin thermostat with a friction constant of 0.1 fs<sup>−1</sup>. Each equilibration run lasted 13 ns with a 1 fs timestep and an output frequency of 1000 steps. Convergence of the total, translational, vibrational, and rotational temperatures was achieved after approximately 3 ns. The remaining 10 ns were sampled every 1000 steps to yield 10 000 equilibrated starting configurations.</p>
      <p id="d2e1166">At the start of each collision trajectory, two monomers were randomly selected from their respective equilibrated structures and placed 30 Å apart along the <inline-formula><mml:math id="M54" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis. This value was chosen as a compromise between minimizing initial interaction forces and limiting computational cost. The collision partners were offset along the <inline-formula><mml:math id="M55" display="inline"><mml:mi>z</mml:mi></mml:math></inline-formula> axis by the impact parameter <inline-formula><mml:math id="M56" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> and assigned opposing velocities of <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M58" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. Each trajectory was propagated in the NVE ensemble for a duration sufficient for a non-interacting particle to traverse the initial separation plus the offset <inline-formula><mml:math id="M60" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula>, with an additional safety margin of 10 Å.</p>
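      <p>The initial conditions and the trajectory length can be sketched as follows. Several details are assumptions made for illustration only: the monomers are placed symmetrically about the origin, the full offset is applied to the second monomer, and the non-interacting traversal time is computed with the relative speed. The names <monospace>CollisionSetup</monospace> and <monospace>collision_setup</monospace> are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
M_PER_S_TO_ANGSTROM_PER_FS = 1.0e-5  # 1 m/s = 1e10 angstrom per 1e15 fs


@dataclass
class CollisionSetup:
    position_a: Vec3    # center of mass of monomer A, angstrom
    position_b: Vec3    # center of mass of monomer B, angstrom
    velocity_a: Vec3    # angstrom/fs
    velocity_b: Vec3    # angstrom/fs
    duration_fs: float  # NVE propagation time, fs


def collision_setup(
    v0: float,                 # relative speed, m/s
    b: float,                  # impact parameter, angstrom
    separation: float = 30.0,  # initial x separation, angstrom
    margin: float = 10.0,      # safety margin, angstrom
) -> CollisionSetup:
    if v0 <= 0.0:
        raise ValueError("v0 must be positive")
    half_sep = 0.5 * separation
    half_v = 0.5 * v0 * M_PER_S_TO_ANGSTROM_PER_FS
    # Assumption: the path (separation + b + margin) is covered at speed v0.
    path = separation + b + margin
    duration = path / (v0 * M_PER_S_TO_ANGSTROM_PER_FS)
    return CollisionSetup(
        position_a=(-half_sep, 0.0, 0.0),
        position_b=(half_sep, 0.0, b),
        velocity_a=(half_v, 0.0, 0.0),
        velocity_b=(-half_v, 0.0, 0.0),
        duration_fs=duration,
    )


# Slowest head-on case on the velocity grid.
setup = collision_setup(v0=50.0, b=0.0)
```
]]></preformat>
      <p>Under these assumptions, the slowest head-on case corresponds to 40 Å covered at 5 × 10<sup>−4</sup> Å fs<sup>−1</sup>, i.e., about 80 ps of propagation.</p>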
</sec>
<sec id="Ch1.S2.SS3.SSS3">
  <label>2.3.3</label><title>Umbrella sampling</title>
      <p id="d2e1241">We performed umbrella sampling simulations along the center-of-mass distance coordinate <inline-formula><mml:math id="M61" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> between the collision partners <xref ref-type="bibr" rid="bib1.bibx41" id="paren.38"/>. The reaction coordinate was discretized into 91 windows centered from 2.0 to 20.0 Å with a step size of 0.2 Å. In each window, the system was restrained by a harmonic bias potential <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi mathvariant="normal">bias</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M63" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">bias</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the window center and <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">bias</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M67" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 100 kcal mol<sup>−1</sup> Å<sup>−2</sup>, consistent with the parameters used by <xref ref-type="bibr" rid="bib1.bibx23" id="text.39"/>.</p>
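      <p>The window layout and the harmonic restraint can be written down compactly. The sketch below only restates the quantities given above (window centers from 2.0 to 20.0 Å in 0.2 Å steps and a force constant of 100 kcal mol<sup>−1</sup> Å<sup>−2</sup>); the function names are hypothetical, and the restraint is not tied to any particular MD engine.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import List


def window_centers(start: float = 2.0, stop: float = 20.0, step: float = 0.2) -> List[float]:
    """Centers r_i of the umbrella windows, in angstrom."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def bias_energy(r: float, r_i: float, k_bias: float = 100.0) -> float:
    """Harmonic bias V = 0.5 * k * (r - r_i)^2, in kcal/mol."""
    return 0.5 * k_bias * (r - r_i) ** 2


def bias_force(r: float, r_i: float, k_bias: float = 100.0) -> float:
    """Radial bias force -dV/dr, in kcal/mol/angstrom."""
    return -k_bias * (r - r_i)


centers = window_centers()
```
]]></preformat>
      <p>With this force constant, a displacement of half the window spacing (0.1 Å) costs 0.5 kcal mol<sup>−1</sup>, which is comparable to the thermal energy at 300 K.</p>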
      <p id="d2e1362">Starting configurations were drawn from the final 10 000 output frames of the equilibration trajectories described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS2"/>. These structures were translated to align with the center of the target umbrella window. For windows at short distances (<inline-formula><mml:math id="M70" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> <inline-formula><mml:math id="M71" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 6.0 Å), where direct placement might result in steric clashes, the collision partners were initially placed 6.0 Å apart. The bias potential was then gradually increased from 0 to 100 kcal mol<sup>−1</sup> Å<sup>−2</sup> over the first 2000 time steps.</p>
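      <p>The gradual activation of the restraint can be sketched as a simple ramp of the force constant. The text does not specify the functional form of the ramp; a linear schedule is assumed here purely for illustration, and <monospace>ramped_force_constant</monospace> is a hypothetical helper.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def ramped_force_constant(
    step: int, ramp_steps: int = 2000, k_final: float = 100.0
) -> float:
    """Force constant (kcal/mol/angstrom^2) at a given MD step.

    Assumption: the ramp is linear in the step index and the force
    constant stays at k_final once the ramp is complete.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    return k_final * min(step / ramp_steps, 1.0)


k_halfway = ramped_force_constant(1000)
```
]]></preformat>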
      <p id="d2e1405">To enhance sampling, ten independent simulations were performed for each window. Each simulation began with 100 000 steps of equilibration in the NVT ensemble (Langevin thermostat, friction 0.1 fs<sup>−1</sup>), followed by a 500 000-step production run using the canonical sampling through velocity rescaling (CSVR) thermostat with a time constant of 25 fs <xref ref-type="bibr" rid="bib1.bibx4" id="paren.40"/>. The output frequency for both thermodynamic data and structural configurations was set to 250 steps.</p>
      <p id="d2e1423">The unbiased free energy profile was reconstructed using the umbrella integration method as implemented in the <monospace>umbrella_integration</monospace> code <xref ref-type="bibr" rid="bib1.bibx37" id="paren.41"/>. Because this profile represents the Helmholtz free energy <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:mi>A</mml:mi><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> of finding particles at a distance <inline-formula><mml:math id="M76" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> in three-dimensional space, it includes an entropic term due to the increasing volume of the spherical shell <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:mn mathvariant="normal">4</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mi>r</mml:mi></mml:mrow></mml:math></inline-formula>. To obtain the effective interaction potential <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:mi>w</mml:mi><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> (the one-dimensional PMF), we subtracted this radial entropic contribution:

              <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M79" display="block"><mml:mrow><mml:mi>w</mml:mi><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mfenced close="]" open="["><mml:mrow><mml:mo>-</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">B</mml:mi></mml:msub><mml:mi>T</mml:mi><mml:mi>ln⁡</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">B</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the Boltzmann constant and <inline-formula><mml:math id="M81" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> the temperature.</p>
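      <p>Equation (<xref ref-type="disp-formula" rid="Ch1.E2"/>) amounts to adding <italic>k</italic><sub>B</sub><italic>T</italic> ln(4π<italic>r</italic><sup>2</sup>) to the free energy profile. A minimal sketch is given below; shifting the result so that it vanishes at the largest distance is a convention assumed here for illustration, since free energy profiles are only defined up to an additive constant.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
from typing import List, Sequence

K_B_KCAL = 0.0019872041  # Boltzmann constant, kcal mol^-1 K^-1


def pmf_from_free_energy(
    r_values: Sequence[float],  # angstrom
    a_values: Sequence[float],  # A(r), kcal/mol
    temperature: float = 300.0,
) -> List[float]:
    """Return w(r) = A(r) + k_B T ln(4 pi r^2), shifted to zero at the last point."""
    kt = K_B_KCAL * temperature
    w = [
        a + kt * math.log(4.0 * math.pi * r * r)
        for r, a in zip(r_values, a_values)
    ]
    return [value - w[-1] for value in w]


# Non-interacting pair: A(r) is purely the shell term, so w(r) is flat.
r_grid = [2.0, 5.0, 10.0, 20.0]
kt_300 = K_B_KCAL * 300.0
a_ideal = [-kt_300 * math.log(4.0 * math.pi * r * r) for r in r_grid]
w_ideal = pmf_from_free_energy(r_grid, a_ideal)
```
]]></preformat>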
</sec>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results and discussion</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Hyperparameter tuning</title>
      <p id="d2e1577">Training efficiency and model performance depend on the choice of hyperparameters. To optimize the PaiNN and AIMNet2 architectures for our systems, we performed hyperparameter tuning using the Weights and Biases (W&amp;B) platform <xref ref-type="bibr" rid="bib1.bibx2" id="paren.42"/>. For PaiNN, we tuned the number of features, batch size, number of blocks, and radial basis size. For AIMNet2, we tuned the AIM size, number of features, batch size, batches per epoch, vector channels, radial basis size, and learning rate. Testing three values for each hyperparameter yields 3<sup>4</sup> = 81 combinations for PaiNN and 3<sup>7</sup> = 2187 for AIMNet2, making a systematic grid search computationally prohibitive. Consequently, we employed a random search.</p>
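      <p>The size of a full grid is the product of the number of tested values per hyperparameter, and a random search draws only a subset of it. The sketch below uses the PaiNN values listed in Table <xref ref-type="table" rid="T1"/>; the dictionary keys, the seed, and the choice to sample distinct configurations uniformly are assumptions for illustration and do not describe the internals of the W&amp;B sweep.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import itertools
import random
from typing import Dict, List, Sequence

SearchSpace = Dict[str, Sequence[float]]

painn_space: SearchSpace = {
    "features": [128, 160, 192],
    "batch_size": [2, 4, 8],
    "blocks": [3, 4, 5],
    "radial_basis": [64, 48, 32],
}


def grid_size(space: SearchSpace) -> int:
    """Number of configurations in the full Cartesian grid."""
    n = 1
    for values in space.values():
        n *= len(values)
    return n


def random_search(space: SearchSpace, n_runs: int, seed: int = 0) -> List[dict]:
    """Draw n_runs distinct configurations uniformly from the grid."""
    rng = random.Random(seed)
    grid = list(itertools.product(*space.values()))
    picks = rng.sample(grid, n_runs)
    return [dict(zip(space.keys(), combo)) for combo in picks]


configs = random_search(painn_space, n_runs=70)
```
]]></preformat>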
      <p id="d2e1583">Tuning runs of 100 epochs were performed on a subset of 2000 structures labeled at the GFN1-xTB level of theory. The objective was to minimize the validation loss, defined as a weighted sum of the mean squared errors (MSE) in potential energies, atomic force components, and, in the case of AIMNet2, atomic partial charges. Given the importance of accurate forces for stable molecular dynamics (MD) simulations, and the fact that force data (<inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>×</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi mathvariant="normal">atom</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> components per structure) vastly outnumber energy data (one value per structure), we assigned a greater weight to the force loss. For PaiNN, the energy:force_components weight ratio was set to <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>:</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">99</mml:mn></mml:mrow></mml:math></inline-formula>. For AIMNet2, which also predicts partial charges, the energy:force_components:atomic_charges weights were set to <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mn mathvariant="normal">9</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>:</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">90</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>:</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
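      <p>The weighted validation loss can be sketched as follows. The weight ratios are those quoted above; normalizing the weights to unit sum and the dictionary keys are assumptions of this sketch, and the actual loss implementations in the two training frameworks may differ in detail.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import Dict, Sequence

PAINN_WEIGHTS = {"energy": 1.0, "forces": 99.0}
AIMNET2_WEIGHTS = {"energy": 9.0, "forces": 90.0, "charges": 1.0}


def mse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean squared error over a flat list of values."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)


def weighted_loss(mse_terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-property MSEs, with weights normalized to one."""
    total_weight = sum(weights.values())
    return sum(weights[key] * mse_terms[key] for key in weights) / total_weight


# Toy numbers (hypothetical): a large energy error is strongly down-weighted.
loss = weighted_loss({"energy": 100.0, "forces": 0.0}, PAINN_WEIGHTS)
```
]]></preformat>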
      <p id="d2e1635">We completed 70 unique tuning runs for PaiNN and 50 for AIMNet2. The results are visualized in the Supplement (Figs. S1 and S2). Tables <xref ref-type="table" rid="T1"/> and <xref ref-type="table" rid="T2"/> summarize the tested values and quantify the impact of each hyperparameter using W&amp;B's hyperparameter importance and Pearson correlation metrics. The importance metric, derived from a random forest algorithm, quantifies the relative impact of each hyperparameter on the validation loss. A negative correlation indicates that increasing the parameter value reduces the validation loss.</p>
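      <p>The sign convention of the correlation metric can be made concrete with a short calculation. The sketch computes a plain Pearson coefficient from per-run values; the loss values are made-up toy numbers, and the random-forest importance metric is not reproduced here.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy tuning runs (hypothetical losses): more features, lower loss.
features = [128, 160, 192, 128, 160, 192]
val_loss = [0.9, 0.7, 0.4, 1.0, 0.6, 0.5]
r_features = pearson(features, val_loss)  # negative by construction
```
]]></preformat>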

<table-wrap id="T1"><label>Table 1</label><caption><p id="d2e1646">Tested hyperparameter values for PaiNN, including the hyperparameter importance and Pearson correlation with respect to the validation loss obtained from Weights and Biases <xref ref-type="bibr" rid="bib1.bibx2" id="paren.43"/>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">PaiNN</oasis:entry>
         <oasis:entry colname="col2">Features</oasis:entry>
         <oasis:entry colname="col3">Batch size</oasis:entry>
         <oasis:entry colname="col4">Blocks</oasis:entry>
         <oasis:entry colname="col5">Radial basis</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">128</oasis:entry>
         <oasis:entry colname="col3">2</oasis:entry>
         <oasis:entry colname="col4">3</oasis:entry>
         <oasis:entry colname="col5">64</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Values</oasis:entry>
         <oasis:entry colname="col2">160</oasis:entry>
         <oasis:entry colname="col3">4</oasis:entry>
         <oasis:entry colname="col4">4</oasis:entry>
         <oasis:entry colname="col5">48</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">192</oasis:entry>
         <oasis:entry colname="col3">8</oasis:entry>
         <oasis:entry colname="col4">5</oasis:entry>
         <oasis:entry colname="col5">32</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Importance</oasis:entry>
         <oasis:entry colname="col2">0.163</oasis:entry>
         <oasis:entry colname="col3">0.626</oasis:entry>
         <oasis:entry colname="col4">0.097</oasis:entry>
         <oasis:entry colname="col5">0.115</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Correlation</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M85" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.402</oasis:entry>
         <oasis:entry colname="col3">0.791</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M86" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.180</oasis:entry>
         <oasis:entry colname="col5">0.287</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e1795">Tested hyperparameter values for AIMNet2, including the hyperparameter importance and Pearson correlation with respect to the validation loss obtained from Weights and Biases <xref ref-type="bibr" rid="bib1.bibx2" id="paren.44"/>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">AIMNet2</oasis:entry>
         <oasis:entry colname="col2">AIM size</oasis:entry>
         <oasis:entry colname="col3">Features</oasis:entry>
         <oasis:entry colname="col4">Batch size</oasis:entry>
         <oasis:entry colname="col5">Batches per epoch</oasis:entry>
         <oasis:entry colname="col6">Vector channels</oasis:entry>
         <oasis:entry colname="col7">Radial basis</oasis:entry>
         <oasis:entry colname="col8">Learning rate</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">512</oasis:entry>
         <oasis:entry colname="col3">32</oasis:entry>
         <oasis:entry colname="col4">16</oasis:entry>
         <oasis:entry colname="col5">1000</oasis:entry>
         <oasis:entry colname="col6">16</oasis:entry>
         <oasis:entry colname="col7">20</oasis:entry>
         <oasis:entry colname="col8">1 <inline-formula><mml:math id="M87" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<sup>−3</sup></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Values</oasis:entry>
         <oasis:entry colname="col2">256</oasis:entry>
         <oasis:entry colname="col3">16</oasis:entry>
         <oasis:entry colname="col4">8</oasis:entry>
         <oasis:entry colname="col5">500</oasis:entry>
         <oasis:entry colname="col6">12</oasis:entry>
         <oasis:entry colname="col7">16</oasis:entry>
         <oasis:entry colname="col8">4 <inline-formula><mml:math id="M89" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<sup>−4</sup></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">128</oasis:entry>
         <oasis:entry colname="col3">8</oasis:entry>
         <oasis:entry colname="col4">4</oasis:entry>
         <oasis:entry colname="col5">100</oasis:entry>
         <oasis:entry colname="col6">8</oasis:entry>
         <oasis:entry colname="col7">12</oasis:entry>
         <oasis:entry colname="col8">1 <inline-formula><mml:math id="M91" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<sup>−4</sup></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Importance</oasis:entry>
         <oasis:entry colname="col2">0.039</oasis:entry>
         <oasis:entry colname="col3">0.087</oasis:entry>
         <oasis:entry colname="col4">0.022</oasis:entry>
         <oasis:entry colname="col5">0.586</oasis:entry>
         <oasis:entry colname="col6">0.087</oasis:entry>
         <oasis:entry colname="col7">0.142</oasis:entry>
         <oasis:entry colname="col8">0.038</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Correlation</oasis:entry>
         <oasis:entry colname="col2">0.001</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M93" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.178</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M94" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.034</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M95" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.621</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M96" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.267</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M97" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.180</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M98" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.214</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2085">For PaiNN, the batch size and number of features were identified as the most important hyperparameters, with a smaller batch size and a higher number of features correlating with improved performance. In contrast, for AIMNet2, the number of batches per epoch was the dominant hyperparameter. This difference stems from how each training framework defines an epoch. In PaiNN, an epoch follows the standard definition of a single, full pass through the training data. Thus, the training set size and batch size strictly determine the number of batches per epoch. However, the AIMNet2 framework decouples the definition of an epoch from the dataset size, operationally defining it as a user-specified number of training steps (batches) before each validation check. Therefore, in AIMNet2, the product of the batch size and the batches per epoch determines the total number of samples processed per operational epoch. When this product is smaller than the training set size, a validation step is triggered before the model has seen all training samples. When the product exceeds the dataset size, the model sees the data multiple times per operational epoch, which further aids convergence within a fixed number of epochs. For example, 1000 batches of size 16 correspond to 16 000 samples per validation cycle, or roughly eight passes over our 2000-structure tuning dataset.</p>
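      <p>The bookkeeping behind the two epoch definitions reduces to simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes for simplicity that all 2000 tuning structures are used for training.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def batches_for_full_pass(n_train: int, batch_size: int) -> int:
    """Batches per epoch when an epoch is one full pass over the data."""
    return math.ceil(n_train / batch_size)


def passes_per_operational_epoch(
    batch_size: int, batches_per_epoch: int, n_train: int
) -> float:
    """Dataset passes per epoch when an epoch is a fixed number of batches."""
    return batch_size * batches_per_epoch / n_train


# Example from the text: 1000 batches of size 16 on 2000 structures.
samples = 16 * 1000
passes = passes_per_operational_epoch(16, 1000, 2000)
```
]]></preformat>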
      <p id="d2e2088">Based on these results, we selected the following hyperparameters for the final production models. For PaiNN (and <inline-formula><mml:math id="M99" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN), we used 256 features, a batch size of 2, 4 interaction blocks, and a radial basis size of 32. For AIMNet2, we selected an AIM size of 128, 16 features, a batch size of 8, 16 vector channels, a radial basis size of 20, and a learning rate of 4 <inline-formula><mml:math id="M100" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<sup>−3</sup>. To account for the larger production dataset, the number of batches per epoch was increased to 4000. The short-range cutoff for AIMNet2 was set to 5 Å. Since PaiNN lacks explicit long-range interactions beyond the local environment, its cutoff was extended to 10 Å. A full list of hyperparameters for each model is provided in Sect. S3.</p>
      <p id="d2e2117">Note that we did not necessarily identify the optimal hyperparameter combination for our systems. For instance, while our 100-epoch tuning procedure offers a reasonable indication of training behavior, some hyperparameter settings might converge more slowly but outperform others during longer training. Identifying the best training settings would require a systematic search over a broader range of values. While automated techniques such as Bayesian optimization <xref ref-type="bibr" rid="bib1.bibx38" id="paren.45"/> could be explored for more complex systems in future work, the chosen hyperparameters provide sufficiently low test errors for the collision systems studied here, as discussed in the following subsection.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Training</title>
      <p id="d2e2131">Using the hyperparameters determined in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>, we trained AIMNet2, PaiNN, and <inline-formula><mml:math id="M102" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN models for each system using either GFN1-xTB or <inline-formula><mml:math id="M103" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c reference data (20 000 structures per system). Training durations were set to 1000 (AIMNet2), 600 (PaiNN), and 400 (<inline-formula><mml:math id="M104" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN) epochs, chosen to balance their respective convergence rates with computational cost. Model performance was evaluated on an independent test set of <inline-formula><mml:math id="M105" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 2000 structures per system, sampled from the 300 K umbrella sampling trajectories across both center-of-mass distance (between 2.0 and 20.0 Å) and potential energy. We report the mean absolute errors (MAEs) for electronic energies (<inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">el</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and component-wise forces (<inline-formula><mml:math id="M107" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>) in Table <xref ref-type="table" rid="T3"/>. 
As a sanity check, we also trained a <inline-formula><mml:math id="M108" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN model using GFN1-xTB as both the baseline and the target reference for the <inline-formula><mml:math id="M109" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M110" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system. Because this model was trained on the difference between two identical levels of theory, it should predict an essentially zero correction.</p>
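      <p>As a minimal illustration of the two error metrics reported in Table <xref ref-type="table" rid="T3"/>, the following Python sketch computes an energy MAE and a component-wise force MAE. It assumes that "component-wise" means averaging the absolute error over every Cartesian force component of every atom in every test structure. The function names and the nested-list data layout are hypothetical and are not taken from the training codes used in this work.</p>
      <preformat><![CDATA[
```python
from typing import Sequence


def energy_mae(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error over structures (one energy per structure)."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def force_mae(pred, ref) -> float:
    """Component-wise MAE over forces.

    pred and ref are nested sequences indexed as
    [structure][atom][component], with components x, y, z.
    Assumption: every component contributes with equal weight.
    """
    if len(pred) != len(ref):
        raise ValueError("structure counts differ")
    total = 0.0
    count = 0
    for struct_p, struct_r in zip(pred, ref):
        for atom_p, atom_r in zip(struct_p, struct_r):
            for comp_p, comp_r in zip(atom_p, atom_r):
                total += abs(comp_p - comp_r)
                count += 1
    if count == 0:
        raise ValueError("no force components supplied")
    return total / count
```
]]></preformat>
      <p>Averaging over individual components rather than over force-vector norms is one common convention. If a different convention is intended, only the innermost loop of the sketch would change.</p>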

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2227">Mean absolute errors (MAE) of the machine learning models relative to the level of theory they were trained on for electronic energies (<inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">el</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and component-wise forces (<inline-formula><mml:math id="M112" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>) across the three studied systems: <inline-formula><mml:math id="M113" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M114" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M115" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M116" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M117" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi 
mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M118" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>. Results are reported for AIMNet2, PaiNN, and <inline-formula><mml:math id="M119" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN trained on GFN1-xTB and <inline-formula><mml:math id="M120" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c training data. Units: <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">el</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> in kcal mol<sup>−1</sup> and <inline-formula><mml:math id="M123" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> in kcal mol<sup>−1</sup> Å<sup>−1</sup>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="center"/>
     <oasis:colspec colnum="6" colname="col6" align="center"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">System</oasis:entry>
         <oasis:entry colname="col2">Model</oasis:entry>
         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1">GFN1-xTB </oasis:entry>
         <oasis:entry rowsep="1" namest="col5" nameend="col6"><inline-formula><mml:math id="M126" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">MAE<sub><italic>E</italic><sub>el</sub></sub></oasis:entry>
         <oasis:entry colname="col4">MAE<sub><italic>F</italic></sub></oasis:entry>
         <oasis:entry colname="col5">MAE<sub><italic>E</italic><sub>el</sub></sub></oasis:entry>
         <oasis:entry colname="col6">MAE<sub><italic>F</italic></sub></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M131" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M132" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">AIMNet2</oasis:entry>
         <oasis:entry colname="col3">0.015</oasis:entry>
         <oasis:entry colname="col4">0.028</oasis:entry>
         <oasis:entry colname="col5">0.056</oasis:entry>
         <oasis:entry colname="col6">0.076</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PaiNN</oasis:entry>
         <oasis:entry colname="col3">0.023</oasis:entry>
         <oasis:entry colname="col4">0.036</oasis:entry>
         <oasis:entry colname="col5">0.039</oasis:entry>
         <oasis:entry colname="col6">0.051</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M133" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>
         <oasis:entry colname="col3">3.4 <inline-formula><mml:math id="M134" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<sup>−6</sup></oasis:entry>
         <oasis:entry colname="col4">1.5 <inline-formula><mml:math id="M136" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 10<sup>−6</sup></oasis:entry>
         <oasis:entry colname="col5">0.011</oasis:entry>
         <oasis:entry colname="col6">0.027</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M138" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M139" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">AIMNet2</oasis:entry>
         <oasis:entry colname="col3">0.026</oasis:entry>
         <oasis:entry colname="col4">0.054</oasis:entry>
         <oasis:entry colname="col5">0.075</oasis:entry>
         <oasis:entry colname="col6">0.102</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PaiNN</oasis:entry>
         <oasis:entry colname="col3">0.020</oasis:entry>
         <oasis:entry colname="col4">0.039</oasis:entry>
         <oasis:entry colname="col5">0.032</oasis:entry>
         <oasis:entry colname="col6">0.062</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M140" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
         <oasis:entry colname="col5">0.016</oasis:entry>
         <oasis:entry colname="col6">0.038</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M141" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M142" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">AIMNet2</oasis:entry>
         <oasis:entry colname="col3">0.020</oasis:entry>
         <oasis:entry colname="col4">0.036</oasis:entry>
         <oasis:entry colname="col5">0.109</oasis:entry>
         <oasis:entry colname="col6">0.092</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PaiNN</oasis:entry>
         <oasis:entry colname="col3">0.233</oasis:entry>
         <oasis:entry colname="col4">0.200</oasis:entry>
         <oasis:entry colname="col5">0.417</oasis:entry>
         <oasis:entry colname="col6">0.241</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M143" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
         <oasis:entry colname="col5">0.037</oasis:entry>
         <oasis:entry colname="col6">0.054</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2858">In general, the models achieve excellent accuracy. All MAEs fall below the standard chemical accuracy thresholds of 1 kcal mol<sup>−1</sup> for energies and 1 kcal mol<sup>−1</sup> Å<sup>−1</sup> for forces. Notably, for the neutral <inline-formula><mml:math id="M147" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M148" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M149" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M150" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> systems, the reproduction errors are exceptionally low, with MAEs of approximately 0.1 kcal mol<sup>−1</sup> and 0.1 kcal mol<sup>−1</sup> Å<sup>−1</sup> or lower. The <inline-formula><mml:math id="M154" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN methodology proves particularly effective. 
It correctly predicts negligible corrections in the GFN1-xTB sanity check and consistently achieves the lowest errors when <inline-formula><mml:math id="M155" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c is the target level. The highest energy error (0.417 kcal mol<sup>−1</sup>) was observed for the <inline-formula><mml:math id="M156" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M157" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system with PaiNN trained on <inline-formula><mml:math id="M158" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data, as expected because PaiNN lacks a description of long-range interactions.</p>
      <p id="d2e3054">To better understand the distribution of these errors, we analyzed the energy deviations as a function of center-of-mass distance (Fig. <xref ref-type="fig" rid="F2"/>) for the <inline-formula><mml:math id="M159" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c target level of theory. For the <inline-formula><mml:math id="M160" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M161" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system, the errors are consistently low across the entire distance range. 
By contrast, the <inline-formula><mml:math id="M162" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M163" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system exhibits larger errors at short distances (<inline-formula><mml:math id="M164" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> <inline-formula><mml:math id="M165" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 3 Å). These deviations correspond to the highly repulsive regime of the potential energy surface (PES), where steric hindrance drives the potential energy tens of kcal mol<sup>−1</sup> above the minimum. Consequently, the probability of visiting these configurations during standard MD is negligible.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e3163">Electronic energy reproduction errors for machine learning models trained on <inline-formula><mml:math id="M167" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data relative to the <inline-formula><mml:math id="M168" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c level of theory, shown as a function of the center-of-mass (COM) distance across the three studied systems: <inline-formula><mml:math id="M169" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M170" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <bold>(a)</bold>, <inline-formula><mml:math id="M171" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M172" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <bold>(b)</bold>, and <inline-formula><mml:math id="M173" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi 
mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M174" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> <bold>(c)</bold>. Results are shown for AIMNet2, PaiNN, and <inline-formula><mml:math id="M175" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026-f02.png"/>

        </fig>

      <p id="d2e3301">In contrast, the ionic <inline-formula><mml:math id="M176" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M177" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system illustrates the inherent limitations of applying a purely local representation to systems with strong long-range interactions. Although the electronic energy MAE for PaiNN appears relatively low, the distance-resolved error plot (Fig. <xref ref-type="fig" rid="F2"/>c) reveals significant deviations at separations <inline-formula><mml:math id="M178" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> <inline-formula><mml:math id="M179" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 10 Å. At these large distances, the training data already capture substantial stabilization energy due to long-range ion–dipole interactions, which also induce slight geometric distortions in the approaching collision partners. However, because PaiNN employs a strict 10 Å spatial cutoff, it evaluates the system as two completely isolated, non-interacting collision partners. During training, the model must map the significantly lowered potential energy of the interacting system to these isolated, slightly distorted molecular structures. Consequently, PaiNN erroneously learns to associate the long-range electrostatic stabilization entirely with these slight internal structural changes. 
In essence, the model is forced to treat this distorted geometry as the lowest-energy conformer of the isolated molecule, creating an artificial global minimum that fundamentally distorts the PES.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Potentials of mean force</title>
<sec id="Ch1.S3.SS3.SSS1">
  <label>3.3.1</label><title>GFN1-xTB training data</title>
      <p id="d2e3364">The potential of mean force (PMF) along the center-of-mass distance represents the effective free energy averaged over all collision orientations accessed during the simulations, showing how the system's stability changes as the collision partners approach (see, e.g., Fig. <xref ref-type="fig" rid="F3"/>). The depth and shape of the well provide information on the binding strength, while the shoulder towards larger distances reflects the strength of long-range interactions.</p>

      <fig id="F3"><label>Figure 3</label><caption><p id="d2e3371">Potentials of mean force (PMF) as a function of center-of-mass (COM) distance for the <inline-formula><mml:math id="M180" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M181" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M182" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M183" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M184" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M185" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn 
mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> systems. The profiles compare the reference GFN1-xTB level of theory with predictions from AIMNet2, PaiNN, and <inline-formula><mml:math id="M186" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN models trained on GFN1-xTB data.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026-f03.png"/>

          </fig>

      <p id="d2e3485">Although all models achieved low mean absolute errors and the training sets included high-energy structures (from simulations at 500 K), the umbrella sampling simulations occasionally explored untrained regions of the PES. This caused a breakdown in dynamics and resulted in unphysical geometries. Including these trajectories introduced large errors into the predicted PMFs. Consequently, we implemented a script to filter these simulations by monitoring the distance between any two hydrogen atoms. We discarded any simulation in which the maximum H–H distance exceeded the window center by more than 8.0 Å or the minimum H–H distance fell below 1.0 Å. While this script does not test for every possible failure, it is a reasonable compromise between filtering out erroneous simulations and efficiently automating the process over the large volume of data generated. Approximately half of the umbrella sampling calculations contained at least one failed simulation. However, no more than three independent simulation runs were ever removed from a single window. This ensured that every window retained at least seven simulations (500 000 steps each, saved every 250 steps), providing a dataset of at least 14 000 structures per window for constructing the PMF. A detailed list of removed simulations is provided in Sect. S4.</p>
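      <p>The hydrogen-distance criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used in this work. It assumes that hydrogen positions are available as Cartesian coordinates in Å for every saved frame, and that a simulation is rejected as soon as any single frame violates either threshold. The function names and the frame layout are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations

MAX_EXCESS = 8.0  # allowed excess of the largest H-H distance over the window center (in Angstrom)
MIN_HH = 1.0      # smallest allowed H-H distance (in Angstrom)


def hh_distance_range(h_positions):
    """Return (min, max) over all pairwise H-H distances in one frame."""
    dists = [math.dist(a, b) for a, b in combinations(h_positions, 2)]
    if not dists:
        raise ValueError("need at least two hydrogen atoms")
    return min(dists), max(dists)


def simulation_failed(frames, window_center):
    """Flag a simulation as failed if any frame breaks either threshold.

    frames: iterable of frames, each a list of hydrogen (x, y, z) positions.
    window_center: center-of-mass distance of the umbrella window.
    """
    for h_positions in frames:
        d_min, d_max = hh_distance_range(h_positions)
        if d_max > window_center + MAX_EXCESS or d_min < MIN_HH:
            return True
    return False
```
]]></preformat>
      <p>Under these assumptions, calling <monospace>simulation_failed(frames, 5.0)</monospace> on a trajectory containing a frame with two hydrogen atoms 14 Å apart would flag the run, because 14 Å exceeds the 13 Å limit for that window.</p>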
      <p id="d2e3489">Figure <xref ref-type="fig" rid="F3"/> shows the PMFs for all three systems calculated using AIMNet2 and PaiNN (trained on GFN1-xTB), compared against the GFN1-xTB reference. For the <inline-formula><mml:math id="M187" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M188" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system, the <inline-formula><mml:math id="M189" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN model trained on GFN1-xTB was again included as a sanity check. While this model should theoretically reproduce the GFN1-xTB reference PMF, small deviations are nonetheless expected due to the inherent uncertainty associated with finite umbrella sampling. Additionally, because the machine learning models lack training data in the highly repulsive, physically inaccessible regimes at short distances, much larger deviations can occur in these regions (see Fig. S3).</p>
      <p id="d2e3533">Table <xref ref-type="table" rid="T4"/> lists the root-mean-square errors (RMSEs) of the predicted PMFs relative to the reference. We report the RMSE only between the point where the PMF drops below zero in the short-range repulsive region and the first point it returns to zero in the long-range non-interacting region. At shorter distances, the steep energies of the repulsive wall lead to sparse training data coverage, which can result in localized high errors. However, because these configurations are physically inaccessible at atmospheric temperatures, excluding this region ensures the reported RMSE reflects the model's performance in the region relevant to the clustering dynamics. Conversely, we exclude the asymptotic long-range tail because the collision partners are essentially non-interacting here. Including an extensive non-interacting region, which is well-sampled and exhibits minimal energy variation, would disproportionately lower the average error, masking the model's performance in the interaction region. While the PMF should theoretically approach zero asymptotically, sampling noise causes the long-range zero-crossing to occur at finite distances (<inline-formula><mml:math id="M190" display="inline"><mml:mo lspace="0mm">&lt;</mml:mo></mml:math></inline-formula> 20 Å) for the systems studied here.</p>
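      <p>The restricted RMSE evaluation can be illustrated with the following Python sketch. It assumes that the model and reference PMFs are tabulated on the same grid of increasing center-of-mass distance, and that the evaluation range is taken from the zero crossings of the reference PMF. Neither assumption is stated explicitly above, and the function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def evaluation_window(pmf_ref):
    """Return (start, end) grid indices of the evaluation range.

    start: first grid point where the reference PMF is below zero.
    end:   first later grid point where it is back at or above zero
           (falls back to the last grid point if it never returns).
    """
    start = next((i for i, v in enumerate(pmf_ref) if v < 0.0), None)
    if start is None:
        raise ValueError("reference PMF never drops below zero")
    end = next(
        (i for i in range(start + 1, len(pmf_ref)) if pmf_ref[i] >= 0.0),
        len(pmf_ref) - 1,
    )
    return start, end


def windowed_rmse(pmf_model, pmf_ref):
    """RMSE between model and reference PMF inside the evaluation range."""
    if len(pmf_model) != len(pmf_ref):
        raise ValueError("PMFs must share the same distance grid")
    start, end = evaluation_window(pmf_ref)
    sq_err = [
        (m - f) ** 2
        for m, f in zip(pmf_model[start:end + 1], pmf_ref[start:end + 1])
    ]
    return math.sqrt(sum(sq_err) / len(sq_err))
```
]]></preformat>
      <p>Grid points on the repulsive wall and in the asymptotic tail fall outside the returned index range, so large deviations there do not affect the value, in line with the motivation given above.</p>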

<table-wrap id="T4" specific-use="star"><label>Table 4</label><caption><p id="d2e3548">Root-mean-square errors (RMSE) in kcal mol<sup>−1</sup> for the potentials of mean force (PMFs) predicted by the machine learning models relative to the GFN1-xTB reference. The RMSE is evaluated between the point where the PMF drops below zero in the short-range repulsive region and the first point it returns to zero in the long-range non-interacting region.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">System</oasis:entry>

         <oasis:entry colname="col2">Method</oasis:entry>

         <oasis:entry colname="col3">Evaluation Range (Å)</oasis:entry>

         <oasis:entry colname="col4">RMSE (kcal mol<sup>−1</sup>)</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2"><inline-formula><mml:math id="M193" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M194" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col2">AIMNet2</oasis:entry>

         <oasis:entry rowsep="1" colname="col3" morerows="2">3.28–11.06</oasis:entry>

         <oasis:entry colname="col4">0.15</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">PaiNN</oasis:entry>

         <oasis:entry colname="col4">0.058</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M195" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>

         <oasis:entry colname="col4">0.042</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2"><inline-formula><mml:math id="M196" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M197" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col2">AIMNet2</oasis:entry>

         <oasis:entry rowsep="1" colname="col3" morerows="2">2.72–8.78</oasis:entry>

         <oasis:entry colname="col4">0.046</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">PaiNN</oasis:entry>

         <oasis:entry colname="col4">0.17</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M198" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>

         <oasis:entry colname="col4">–</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="2"><inline-formula><mml:math id="M199" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M200" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col2">AIMNet2</oasis:entry>

         <oasis:entry colname="col3" morerows="2">2.90–15.00</oasis:entry>

         <oasis:entry colname="col4">0.20</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">PaiNN</oasis:entry>

         <oasis:entry colname="col4">0.090</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M201" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>

         <oasis:entry colname="col4">–</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e3819">All models exhibit excellent agreement with the reference PMF. The highest observed RMSE is 0.20 kcal mol<sup>−1</sup> for AIMNet2 applied to the <inline-formula><mml:math id="M203" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M204" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system, which is five times lower than the standard threshold for chemical accuracy (1 kcal mol<sup>−1</sup>). In other words, the reproduction error introduced by the machine learning models is negligible compared to generally accepted error margins in computational chemistry. 
PaiNN performs better than AIMNet2 for the <inline-formula><mml:math id="M206" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M207" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M208" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M209" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> systems, while the opposite is true for the <inline-formula><mml:math id="M210" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M211" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system. 
Across the three systems, the average RMSE is 0.13 kcal mol<sup>−1</sup> for AIMNet2 and 0.11 kcal mol<sup>−1</sup> for PaiNN. Thus, PaiNN performs slightly better overall, though the difference is marginal.</p>
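      <p>The averages quoted above follow directly from the tabulated RMSE values. As a minimal sketch (the dictionary layout and the function name <monospace>mean_rmse</monospace> are illustrative choices, not part of the original workflow), the arithmetic can be reproduced as follows:</p>
<preformat>
```python
# RMSE values (kcal/mol) transcribed from the table above.
# Delta-PaiNN is omitted because it was only evaluated for one system.
rmse = {
    "AIMNet2": {"H2SO4-H2SO4": 0.15, "H2SO4-NH(CH3)2": 0.046, "H2SO4-HSO4-": 0.20},
    "PaiNN": {"H2SO4-H2SO4": 0.058, "H2SO4-NH(CH3)2": 0.17, "H2SO4-HSO4-": 0.090},
}

CHEMICAL_ACCURACY = 1.0  # kcal/mol, threshold quoted in the text


def mean_rmse(model):
    """Unweighted mean of the per-system RMSE values for one model."""
    values = list(rmse[model].values())
    return sum(values) / len(values)


for model in rmse:
    worst = max(rmse[model].values())
    print(
        f"{model}: mean RMSE = {mean_rmse(model):.2f} kcal/mol, "
        f"worst case = {worst:.2f} kcal/mol "
        f"({CHEMICAL_ACCURACY / worst:.1f}x below chemical accuracy)"
    )
```
</preformat>
      <p>The unweighted mean treats the three systems equally, which is an assumption of this sketch; weighting by the number of umbrella sampling windows would be an equally defensible choice.</p>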
      <p id="d2e3998">The performance of PaiNN on the <inline-formula><mml:math id="M214" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M215" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system is somewhat surprising, given that it cannot capture interactions beyond its 10 Å local atomic environment cutoff. However, Fig. <xref ref-type="fig" rid="F3"/> shows that the PMF effectively decays to zero around 13 Å. As long as any two atoms between the collision partners remain within 10 Å, the message-passing algorithm treats the system as connected. Given that the sum of the hard-sphere radii for <inline-formula><mml:math id="M216" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M217" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is approximately 3.89 Å <xref ref-type="bibr" rid="bib1.bibx28" id="paren.46"/>, PaiNN with cutoff 10 Å can model interactions up to at least 13 Å center-of-mass distance. We note, however, that while the PMF vanishes at 13 Å, this represents an average energy. Individual trajectories with specific orientations may still exhibit longer-range interactions.</p>
</sec>
<sec id="Ch1.S3.SS3.SSS2">
  <label>3.3.2</label><title><inline-formula><mml:math id="M218" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c training data</title>
      <p id="d2e4080">We subsequently trained AIMNet2, PaiNN, and <inline-formula><mml:math id="M219" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN on the higher-level <inline-formula><mml:math id="M220" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data to generate the PMFs presented in Fig. <xref ref-type="fig" rid="F4"/>. The PMFs obtained with the GFN1-xTB method are shown for comparison. First, we observe that all three ML models are in excellent agreement with each other. While obtaining a reference PMF directly using the <inline-formula><mml:math id="M221" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c method is computationally prohibitive, the fact that these models, utilizing distinct algorithms, predict very similar PMFs strongly suggests that the <inline-formula><mml:math id="M222" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c potential energy surface is accurately reproduced.</p>

      <fig id="F4"><label>Figure 4</label><caption><p id="d2e4115">Potentials of mean force (PMF) as a function of center-of-mass (COM) distance for the <inline-formula><mml:math id="M223" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M224" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M225" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M226" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M227" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M228" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn 
mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> systems. The profiles compare the reference GFN1-xTB level of theory with predictions from AIMNet2, PaiNN, and <inline-formula><mml:math id="M229" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN models trained on <inline-formula><mml:math id="M230" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026-f04.png"/>

          </fig>

      <p id="d2e4236">Comparing the PMFs based on <inline-formula><mml:math id="M231" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data to the GFN1-xTB reference, we observe that the shoulder regions are similar between methods, while the minima differ markedly. Most notably, <inline-formula><mml:math id="M232" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c predicts deeper wells: the minima for the <inline-formula><mml:math id="M233" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M234" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M235" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M236" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> systems are lower by 2.3 and 3.8 kcal mol<sup>−1</sup>, respectively, than those obtained with GFN1-xTB. 
For the <inline-formula><mml:math id="M238" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M239" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> complex, the methods differ on both the position (shifted by 0.18 Å) and the depth (difference of 1.3 kcal mol<sup>−1</sup>) of the minimum, indicating distinct lowest free energy geometries.</p>
      <p id="d2e4376">In several of our recent studies, we have integrated the PMF well to obtain binding free energy estimates <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx30" id="paren.47"/>. While GFN1-xTB can capture correct qualitative trends, obtaining quantitatively accurate binding energies requires higher levels of theory. Due to the computational cost of running MD with these methods, ML approaches must be employed.</p>
</sec>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Computational cost</title>
      <p id="d2e4391">We briefly discuss the computational costs of training and evaluating the three models. On an NVIDIA V100-16GB GPU, the AIMNet2, PaiNN, and <inline-formula><mml:math id="M241" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN models, trained on 20 000 <inline-formula><mml:math id="M242" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M243" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> structures, complete approximately 30, 6, and 6 epochs per GPU hour, respectively. For evaluation, we performed umbrella sampling for the <inline-formula><mml:math id="M244" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M245" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system (at the 4.0 Å window) on an Intel Xeon Gold 6248R CPU. 
Under these conditions, AIMNet2, PaiNN, and <inline-formula><mml:math id="M246" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN achieved speeds of approximately 120 000, 130 000, and 65 000 steps per CPU hour, respectively. The performance of <inline-formula><mml:math id="M247" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN is inherently limited by the computational cost of the baseline method (here GFN1-xTB), as both the baseline energy and the ML correction must be evaluated at every step. We note that these timings are highly dependent on the specific system architecture and software implementation. In particular, our group has more extensive experience optimizing PaiNN than AIMNet2, so our PaiNN implementation may be more streamlined. These timings should therefore be considered indicative rather than a rigorous benchmark.</p>
</sec>
<sec id="Ch1.S3.SS5">
  <label>3.5</label><title>Collision probabilities</title>
      <p id="d2e4490">Collision rate coefficients are calculated via Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), where the collision probability <inline-formula><mml:math id="M248" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is obtained from MD collision trajectory simulations over a relevant range of initial relative velocities <inline-formula><mml:math id="M249" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and impact parameters <inline-formula><mml:math id="M250" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula>. Figure <xref ref-type="fig" rid="F5"/> compares the collision probabilities obtained from GFN1-xTB MD simulations with those from the AIMNet2 and PaiNN models trained on GFN1-xTB data. The AIMNet2 results are in excellent agreement with the GFN1-xTB reference, showing only a slightly lower collision probability in the tail towards higher <inline-formula><mml:math id="M251" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula>. Conversely, the PaiNN heat map clearly highlights the limitations of the local environment approximation for modeling collisions. Above the cutoff plus the sum of molecular radii (<inline-formula><mml:math id="M252" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 13.89 Å), the message-passing algorithm no longer detects interactions between the collision partners. Consequently, zero collisions are registered past 14 Å. We also note that even below 14 Å, PaiNN appears to underestimate the collision probability compared to the reference. 
Therefore, for systems with strong long-range interactions, it is necessary to employ a model that accounts for interactions beyond the local atomic environment cutoff, such as AIMNet2.</p>
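      <p>The connectivity argument above can be illustrated with a toy geometry. The sketch below is a simplification under stated assumptions: the two three-atom "molecules" and their coordinates are invented for illustration and do not correspond to any species studied here, the function names are hypothetical, and the test only asks whether at least one cross-molecule atom pair lies within the 10 Å cutoff, which is a reduced picture of how a message-passing model builds its neighbor list.</p>
<preformat>
```python
import math

CUTOFF = 10.0  # angstrom, local-environment cutoff quoted in the text

# Hypothetical linear three-atom molecule centred on the origin (angstrom).
TOY_MOLECULE = [(-1.5, 0.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]


def shifted(molecule, dx):
    """Return a copy of the molecule translated by dx along the x axis."""
    return [(x + dx, y, z) for (x, y, z) in molecule]


def min_intermolecular_distance(mol_a, mol_b):
    """Smallest distance between any atom of mol_a and any atom of mol_b."""
    return min(math.dist(a, b) for a in mol_a for b in mol_b)


def is_connected(mol_a, mol_b, cutoff=CUTOFF):
    """True if at least one cross-molecule atom pair lies within the cutoff."""
    return cutoff >= min_intermolecular_distance(mol_a, mol_b)


# Scan the centre-of-mass separation and report when the pair decouples.
for com_distance in (8.0, 12.0, 14.0):
    partner = shifted(TOY_MOLECULE, com_distance)
    closest = min_intermolecular_distance(TOY_MOLECULE, partner)
    print(
        f"COM distance {com_distance:4.1f} A: closest atom pair {closest:4.1f} A, "
        f"connected = {is_connected(TOY_MOLECULE, partner)}"
    )
```
</preformat>
      <p>In this toy case the pair remains connected at a center-of-mass distance of 12 Å, because the closest atoms are only 9 Å apart, and decouples at 14 Å. The exact crossover in a real simulation depends on molecular size and orientation.</p>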

      <fig id="F5"><label>Figure 5</label><caption><p id="d2e4556">Comparison of collision probabilities derived from molecular dynamics simulations for the <inline-formula><mml:math id="M253" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M254" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system at 300 K. The heat maps show the probability distribution across impact parameter <inline-formula><mml:math id="M255" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> and initial relative velocity <inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> for the reference GFN1-xTB method <bold>(a)</bold> and the AIMNet2 <bold>(b)</bold> and PaiNN <bold>(c)</bold> models trained on GFN1-xTB data.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026-f05.png"/>

        </fig>

      <p id="d2e4622">The <inline-formula><mml:math id="M257" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M258" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M259" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M260" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> systems exhibit similar trends, although the discrepancies are less pronounced due to the weaker long-range interactions in these systems. These results are presented in Sect. S6.</p>
</sec>
<sec id="Ch1.S3.SS6">
  <label>3.6</label><title>Collision rate coefficients</title>
      <p id="d2e4701">Using the collision probabilities, we calculated the corresponding canonical collision rate coefficients via Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>). Table <xref ref-type="table" rid="T5"/> presents these values, comparing the performance of the three ML models against the GFN1-xTB reference and the classical OPLS-AA force field.</p>

<table-wrap id="T5" specific-use="star"><label>Table 5</label><caption><p id="d2e4711">Collision rate coefficients calculated at 300 K using AIMNet2, PaiNN, and <inline-formula><mml:math id="M261" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN compared to GFN1-xTB and OPLS-AA reference values. All values are in 10<sup>−15</sup> m<sup>3</sup> s<sup>−1</sup>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="center"/>
     <oasis:colspec colnum="6" colname="col6" align="center"/>
     <oasis:colspec colnum="7" colname="col7" align="center"/>
     <oasis:thead>
       <oasis:row>

         <oasis:entry colname="col1">System</oasis:entry>

         <oasis:entry rowsep="1" namest="col2" nameend="col3">Reference Methods </oasis:entry>

         <oasis:entry colname="col4">Training Data</oasis:entry>

         <oasis:entry rowsep="1" namest="col5" nameend="col7">ML Models </oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1"/>

         <oasis:entry colname="col2">GFN1-xTB</oasis:entry>

         <oasis:entry colname="col3">OPLS-AA</oasis:entry>

         <oasis:entry colname="col4"/>

         <oasis:entry colname="col5">AIMNet2</oasis:entry>

         <oasis:entry colname="col6">PaiNN</oasis:entry>

         <oasis:entry colname="col7"><inline-formula><mml:math id="M265" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1"><inline-formula><mml:math id="M266" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M267" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry rowsep="1" colname="col2" morerows="1">0.771</oasis:entry>

         <oasis:entry rowsep="1" colname="col3" morerows="1">0.796</oasis:entry>

         <oasis:entry colname="col4">GFN1-xTB</oasis:entry>

         <oasis:entry colname="col5">0.744</oasis:entry>

         <oasis:entry colname="col6">0.760</oasis:entry>

         <oasis:entry colname="col7">0.774</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col4"><inline-formula><mml:math id="M268" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c</oasis:entry>

         <oasis:entry colname="col5">0.708</oasis:entry>

         <oasis:entry colname="col6">0.750</oasis:entry>

         <oasis:entry colname="col7">0.761</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1"><inline-formula><mml:math id="M269" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M270" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry rowsep="1" colname="col2" morerows="1">0.721</oasis:entry>

         <oasis:entry rowsep="1" colname="col3" morerows="1">0.741</oasis:entry>

         <oasis:entry colname="col4">GFN1-xTB</oasis:entry>

         <oasis:entry colname="col5">0.797</oasis:entry>

         <oasis:entry colname="col6">0.725</oasis:entry>

         <oasis:entry colname="col7">–</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col4"><inline-formula><mml:math id="M271" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c</oasis:entry>

         <oasis:entry colname="col5">0.717</oasis:entry>

         <oasis:entry colname="col6">0.724</oasis:entry>

         <oasis:entry colname="col7">0.717</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="1"><inline-formula><mml:math id="M272" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M273" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col2" morerows="1">2.27</oasis:entry>

         <oasis:entry colname="col3" morerows="1">2.60</oasis:entry>

         <oasis:entry colname="col4">GFN1-xTB</oasis:entry>

         <oasis:entry colname="col5">2.24</oasis:entry>

         <oasis:entry colname="col6">1.24</oasis:entry>

         <oasis:entry colname="col7">–</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col4"><inline-formula><mml:math id="M274" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c</oasis:entry>

         <oasis:entry colname="col5">3.22</oasis:entry>

         <oasis:entry colname="col6">1.26</oasis:entry>

         <oasis:entry colname="col7">2.27</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e5054">We note that while pressure-independent canonical collision rate coefficients, as obtained here, are commonly employed in cluster distribution dynamics models, this relies on the assumption that the system is strongly bound and has enough degrees of freedom to effectively distribute excess collisional energy <xref ref-type="bibr" rid="bib1.bibx9" id="paren.48"/>. This assumption is likely valid for larger clusters, given that the number of degrees of freedom strictly increases with cluster size, and binding energies generally do as well. However, dimers with fewer degrees of freedom that are less strongly bound, such as the <inline-formula><mml:math id="M275" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M276" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system, might not effectively thermalize immediately after collision. Consequently, for these collisions, the true collision rate coefficient could be smaller than the coefficient calculated here, due to pressure dependence.</p>
      <p id="d2e5093">To contextualize the obtained collision rate coefficients, it is important to note that the accuracy of particle formation rates in cluster distribution dynamics simulations depends on both the collision and evaporation rate coefficients. Because evaporation rates depend exponentially on binding free energies, uncertainties in binding free energies typically outweigh errors in collision rate coefficients. An error of just 1 kcal mol<sup>−1</sup> in binding free energies introduces a factor of <inline-formula><mml:math id="M278" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 5 uncertainty in the evaporation rate. As such, we consider an error of a factor of 1.5 in the collision rate coefficients acceptable. In the worst-case scenario where collision and evaporation rate coefficient errors compound in the same direction, a factor of 1.5 collision rate coefficient error would still result in an overall uncertainty in the particle formation rates of less than an order of magnitude.</p>
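      <p>The error budget in the preceding paragraph can be checked with a short calculation. This is a minimal sketch under two assumptions that are not spelled out in the text: that the evaporation rate scales as exp(ΔG/RT) with the binding free energy error ΔG, and that the temperature is 300 K, the value used for the collision simulations. The function name <monospace>evaporation_error_factor</monospace> is illustrative.</p>
<preformat>
```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
TEMPERATURE = 300.0   # K, assumed to match the collision simulations


def evaporation_error_factor(dg_error_kcal, temperature=TEMPERATURE):
    """Multiplicative error in the evaporation rate caused by an error
    dg_error_kcal (kcal/mol) in the binding free energy, assuming an
    exp(dG / RT) dependence."""
    return math.exp(dg_error_kcal / (R_KCAL * temperature))


evap_factor = evaporation_error_factor(1.0)  # roughly 5.4 at 300 K
collision_factor = 1.5                       # tolerance adopted in the text
worst_case = evap_factor * collision_factor  # roughly 8.0, below 10

print(f"Evaporation rate error factor: {evap_factor:.2f}")
print(f"Worst-case combined factor:    {worst_case:.2f}")
```
</preformat>
      <p>Because the dependence is exponential, the factor grows at lower temperatures, so the margin below one order of magnitude is specific to the assumed 300 K.</p>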
      <p id="d2e5115">Evaluated against this threshold, PaiNN yields notably lower rate coefficients for the charged <inline-formula><mml:math id="M279" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M280" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system. For both GFN1-xTB and <inline-formula><mml:math id="M281" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c training data, the model underestimates the rates by nearly 50 % (roughly a factor of 2) relative to the GFN1-xTB reference. As discussed in Sect. <xref ref-type="sec" rid="Ch1.S3.SS5"/>, this substantial deviation stems from the model's inability to detect interactions once all atoms of the collision partners are separated by more than its 10 Å cutoff, effectively neglecting the significant contribution of the long-range tail.</p>
      <p id="d2e5156">Conversely, for the neutral systems (<inline-formula><mml:math id="M282" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M283" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M284" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M285" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>), the ML models trained on GFN1-xTB data exhibit excellent agreement with the reference calculations. 
All three architectures reproduce the GFN1-xTB reference rate coefficients closely, with the largest deviation observed for AIMNet2 applied to the <inline-formula><mml:math id="M286" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M287" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system (<inline-formula><mml:math id="M288" display="inline"><mml:mo lspace="0mm">∼</mml:mo></mml:math></inline-formula> 10 % discrepancy).</p>
      <p id="d2e5271">When targeting the higher-level <inline-formula><mml:math id="M289" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c theory, the predicted collision rate coefficients are generally similar to the GFN1-xTB reference values. The notable exception is the <inline-formula><mml:math id="M290" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M291" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system modeled with AIMNet2, where the predicted rate is significantly higher. An analysis of the collision probabilities (Figs. S6c and S4c) shows that this difference results primarily from increased collisions at low initial relative velocities. In the GFN1-xTB reference, collision probabilities in this region are low even at small impact parameters. This behavior has previously been attributed to repulsive electrostatic interactions as the collision partners approach <xref ref-type="bibr" rid="bib1.bibx14" id="paren.49"/>. However, AIMNet2 trained on the higher-level <inline-formula><mml:math id="M292" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c data does not exhibit this repulsion. This suggests that the repulsive feature observed in GFN1-xTB could be an artifact of the semi-empirical method failing to accurately describe the long-range potential.</p>
      <p id="d2e5320">The classical OPLS-AA force field also produces collision rates close to the GFN1-xTB reference values. For the charged <inline-formula><mml:math id="M293" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M294" display="inline"><mml:mrow class="chem"><mml:msubsup><mml:mi mathvariant="normal">HSO</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>-</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> system, the OPLS-AA result lies between the GFN1-xTB reference and the AIMNet2 prediction trained on <inline-formula><mml:math id="M295" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c. Given that OPLS-AA relies on fixed partial charges, this agreement suggests that explicit dynamic electron density reorganization is not strictly necessary to model the collision process, provided the underlying electrostatic potential is sufficiently accurate. Simple fixed partial charges appear sufficient to model the approach, a finding consistent with observations by <xref ref-type="bibr" rid="bib1.bibx20" id="text.50"/>.</p>
      <p id="d2e5362">Despite this accuracy in capturing collisions, classical force fields are insufficient for modeling the full nucleation process, as short-range interactions, particularly proton transfers, require an explicit quantum mechanical treatment. Proton transfers stabilize acid–base clusters (e.g., between sulfuric acid and amines) which play a crucial role in atmospheric particle formation. Classical force fields like OPLS-AA are fundamentally unable to model these proton transfer events. Figure <xref ref-type="fig" rid="F6"/> illustrates the geometry of the <inline-formula><mml:math id="M296" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>–<inline-formula><mml:math id="M297" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> system immediately before a collision and after cluster formation. 
A proton initially bound to <inline-formula><mml:math id="M298" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="normal">SO</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> transfers to <inline-formula><mml:math id="M299" display="inline"><mml:mrow class="chem"><mml:mi mathvariant="normal">NH</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub><mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> during the clustering process, eventually becoming separated by more than 5 Å from its original oxygen atom. Simulating this separation is impossible with classical harmonic bond potentials. In contrast, both the GFN1-xTB method and all three tested ML models successfully capture these dynamic proton transfers.</p>
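      <p>The limitation of harmonic bond terms can be illustrated with a short calculation. The sketch below evaluates a harmonic bond energy of the form E = k(r − r0)² at increasing O–H separations. The force constant and equilibrium length are assumed, illustrative values typical of an O–H bond in OPLS-style force fields; they are not parameters taken from this work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def harmonic_bond_energy(r, k, r0):
    """Harmonic bond energy E = k * (r - r0)**2 in kcal/mol."""
    return k * (r - r0) ** 2


K_OH = 553.0    # kcal mol^-1 A^-2, assumed illustrative O-H force constant
R0_OH = 0.945   # A, assumed illustrative equilibrium O-H bond length

for r in (0.945, 1.5, 3.0, 5.0):
    energy = harmonic_bond_energy(r, K_OH, R0_OH)
    print(f"r = {r:5.3f} A: E = {energy:8.1f} kcal/mol")
```
]]></preformat>
      <p>Because the energy grows quadratically without bound, a separation of 5 Å would cost thousands of kcal mol<sup>−1</sup> under these assumed parameters, so the bonded proton can never leave its original oxygen atom in such a model.</p>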

      <fig id="F6"><label>Figure 6</label><caption><p id="d2e5442">Ball-and-stick representations of two frames from a molecular dynamics collision trajectory simulation of the sulfuric acid–dimethylamine system. Frame 93 shows the system just before collision, with hydrogen H<sub>1</sub> bonded to oxygen O<sub>3</sub> of sulfuric acid (highlighted with green circles). Frame 592 shows a post-collision configuration, where hydrogen H<sub>1</sub> is now bonded to nitrogen N<sub>1</sub> of dimethylamine, while oxygen O<sub>3</sub> of sulfuric acid is more than 5 Å away. Atom color code: sulfur (yellow), oxygen (red), nitrogen (blue), carbon (gray), and hydrogen (white).</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/7631/2026/acp-26-7631-2026-f06.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Conclusions</title>
      <p id="d2e5505">With recent advances in machine learning interatomic potentials, molecular dynamics has rapidly evolved into a field capable of directly modeling cluster formation for large systems over long timescales, all while reproducing the accuracy of high-level quantum chemistry theory. While the low concentrations of cluster-forming vapors in the atmosphere still necessitate approximations (such as artificially increased concentrations), the ability to model the inherently dynamic cluster formation process at this level of theory is a significant breakthrough. However, machine learning models are frequently treated as black boxes. It is crucial that the increased accuracy in modeling complex short-range interactions, such as cluster reconfiguration and proton transfer, does not come at the expense of the long-range interactions that govern the initial collisions.</p>
      <p id="d2e5508">To address this, we evaluated the AIMNet2 and PaiNN machine learning methodologies, as well as a <inline-formula><mml:math id="M305" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>-PaiNN method (using PaiNN to learn the correlation between GFN1-xTB and high-level quantum theory), for their ability to reproduce the collision dynamics of sulfuric acid with sulfuric acid, dimethylamine, or bisulfate. All models achieved low mean absolute errors on test sets and showed excellent agreement with GFN1-xTB reference potentials of mean force.</p>
      <p id="d2e5518">However, when comparing collision rate coefficients, we observe significant differences. For the charged sulfuric acid–bisulfate system, PaiNN predicted collision rate coefficients approximately 50 % lower than the reference method. This error arises from the strictly short-ranged nature of the local atomic environment approximation. PaiNN only considers interactions up to a specific cutoff (here, 10 Å). In charged systems, strong long-range interactions can induce collisions from distances far beyond this cutoff. AIMNet2 avoids this issue by augmenting its local short-ranged modeling with explicit long-range Coulombic interactions (via learned partial charges) and dispersion corrections, allowing it to accurately replicate the reference rates.</p>
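      <p>The difference between a strictly local description and one augmented with explicit electrostatics can be sketched with a toy example. The code below is not the PaiNN or AIMNet2 implementation: it only contrasts a cosine cutoff function, which switches local interactions off at the cutoff radius, with a pairwise Coulomb sum over fixed point charges. The charges and geometry are hypothetical values chosen for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

COULOMB_KCAL = 332.0637  # e^2 / (4 pi eps0) in kcal mol^-1 A


def cosine_cutoff(r, r_cut=10.0):
    """Smooth weight that scales local interactions and vanishes at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def coulomb_energy(charges, positions):
    """Pairwise Coulomb energy (kcal/mol) of point charges, with no cutoff."""
    energy = 0.0
    n_sites = len(charges)
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            r_ij = math.dist(positions[i], positions[j])
            energy += COULOMB_KCAL * charges[i] * charges[j] / r_ij
    return energy


# Hypothetical two-site example: an anion and a positive partial charge
charges = [-1.0, 0.3]
positions = [(0.0, 0.0, 0.0), (15.0, 0.0, 0.0)]

separation = math.dist(positions[0], positions[1])
print("local weight at 15 A:", cosine_cutoff(separation))
print("Coulomb energy at 15 A (kcal/mol):", coulomb_energy(charges, positions))
```
]]></preformat>
      <p>At a separation of 15 Å the local weight is exactly zero, whereas the explicit Coulomb term still contributes several kcal mol<sup>−1</sup> for these assumed charges. This is the kind of contribution that a purely cutoff-based model cannot represent.</p>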
      <p id="d2e5521">Our intention is not to highlight a specific failing of PaiNN, but rather to use it as a case study. We demonstrate that low mean absolute errors on a static test set do not automatically guarantee that a model is fit for purpose in dynamic simulations. A model must always be validated against reference data for the specific physical properties of interest. We also note that while we used a generous 10 Å cutoff, many standard implementations use 5 Å. At such small cutoffs, similar discrepancies would likely appear even for neutral systems. That said, the PaiNN architecture could be adapted to include explicit long-range interactions, similar to AIMNet2.</p>
      <p id="d2e5525">In conclusion, we urge researchers to validate trained machine learning models beyond simple scalar metrics like mean absolute errors and root mean square errors, as these condensed measures can mask specific physical limitations. In future work, we will employ AIMNet2 and PaiNN, applying each where it is most appropriate, to directly study the cluster formation of nucleation precursor vapors with accurate descriptions of both short- and long-range interactions.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e5533">The training datasets, the trained models, the AIMNet2 model definition and training configuration file, and the molecular dynamics collision trajectory submission scripts are available in the Atmospheric Cluster Database (ACDB) (Elm, 2019) at: <uri>https://github.com/elmjonas/ACDB/tree/master/Articles/neefjes26_long_range_NN</uri> (last access: 29 January 2026; <ext-link xlink:href="https://doi.org/10.5281/zenodo.11422835" ext-link-type="DOI">10.5281/zenodo.11422835</ext-link>; Elm and Kubečka, 2024).</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e5542">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/acp-26-7631-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/acp-26-7631-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e5551">Conceptualization: J.E.; methodology: I.N., J.K., J.E.; formal analysis: I.N.; investigation: I.N., J.K.; resources: J.E.; writing – original draft: I.N.; writing – review and editing: I.N., J.K., J.E.; visualization: I.N.; project administration: J.E.; funding acquisition: J.K., J.E.; supervision: J.E.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e5557">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e5563">Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.  Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e5572">The numerical results presented in this work were obtained at the Centre for Scientific Computing, Aarhus (<uri>https://phys.au.dk/forskning/faciliteter/cscaa/</uri>, last access: 4 February 2026). We thank Haide Wu, Olexandr Isayev, and Roman Zubatiuk for their support regarding AIMNet2.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e5580">This project was funded by the European Union (ERC, ExploreFNP, project 101040353, and MSCA, HYDRO-CLUSTER, project 101105506). This work was also supported by the Danish National Research Foundation (DNRF172) through the Center of Excellence for Chemistry of Clouds.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e5586">This paper was edited by John Plane and reviewed by Patrick Rinke and three anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Anstine et al.(2025)</label><mixed-citation>Anstine, D. M., Zubatyuk, R., and Isayev, O.: AIMNet2: a neural network  potential to meet your neutral, charged, organic, and elemental-organic  needs, Chem. Sci., 16, 10228–10244, <ext-link xlink:href="https://doi.org/10.1039/D4SC08572H" ext-link-type="DOI">10.1039/D4SC08572H</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Biewald(2020)</label><mixed-citation>Biewald, L.: Experiment Tracking with Weights and Biases, <uri>https://www.wandb.com/</uri> (last access: 23 January 2026), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Bogojeski et al.(2020)</label><mixed-citation>Bogojeski, M., Vogt-Maranto, L., Tuckerman, M. E., Müller, K.-R., and  Burke, K.: Quantum chemical accuracy from density functional approximations  via machine learning, Nat. Commun., 11, 5223,  <ext-link xlink:href="https://doi.org/10.1038/s41467-020-19093-1" ext-link-type="DOI">10.1038/s41467-020-19093-1</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Bussi et al.(2007)</label><mixed-citation>Bussi, G., Donadio, D., and Parrinello, M.: Canonical sampling through velocity rescaling, J. Chem. Phys., 126, 014101, <ext-link xlink:href="https://doi.org/10.1063/1.2408420" ext-link-type="DOI">10.1063/1.2408420</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Chen et al.(2021)</label><mixed-citation>Chen, D., Rojas, M., Samset, B. H., Cobb, K., Niang, A. D., Edwards, P., Emori, S., Faria, S., Hawkins, E., Hope, P., Huybrechts, P., Meinshausen, M.,  Mustafa, S., Plattner, G.-K., and Tréguier, A.-M.: Framing, Context, and Methods, in: Climate Change 2021: The Physical Science Basis. Working Group I Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, edited by: Masson-Delmotte, V., Zhai, P., Pirani, A., Connors, S., Péan, C., Berger, S., Caud, N., Chen, Y., Goldfarb, L., Gomis, M., Huang, M., Leitzell, K., Lonnoy, E., Matthews, J., Maycock, T., Waterfield, T., Yelekçi, O., Yu, R., and Zhou, B., Cambridge University Press, Cambridge, UK, 147–286, <ext-link xlink:href="https://doi.org/10.1017/9781009157896.003" ext-link-type="DOI">10.1017/9781009157896.003</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Ehlert(2022)</label><mixed-citation>Ehlert, S.: TBlite, Version 0.2.1, GitHub [code], <uri>https://github.com/tblite/tblite</uri> (last access: 29 January 2026), 2022.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Elm(2019)</label><mixed-citation>Elm, J.: An Atmospheric Cluster Database Consisting of Sulfuric Acid, Bases, Organics, and Water, ACS Omega, 4, 10965–10974, <ext-link xlink:href="https://doi.org/10.1021/acsomega.9b00860" ext-link-type="DOI">10.1021/acsomega.9b00860</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Elm and Kubečka(2024)</label><mixed-citation>Elm, J. and Kubečka, J.: Atmospheric Cluster Database (ACDB) (v2.0), Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.11422835" ext-link-type="DOI">10.5281/zenodo.11422835</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Elm et al.(2020)</label><mixed-citation>Elm, J., Kubečka, J., Besel, V., Jääskeläinen, M. J., Halonen, R., Kurtén, T., and Vehkamäki, H.: Modeling the formation and growth of atmospheric molecular clusters: A review, J. Aerosol Sci., 149, 105621,  <ext-link xlink:href="https://doi.org/10.1016/j.jaerosci.2020.105621" ext-link-type="DOI">10.1016/j.jaerosci.2020.105621</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Gan et al.(2013)</label><mixed-citation>Gan, W. Q., FitzGerald, J. M., Carlsten, C., Sadatsafavi, M., and Brauer, M.: Associations of Ambient Air Pollution with Chronic Obstructive Pulmonary  Disease Hospitalization and Mortality, Am. J. Resp. Crit. Care, 187,  721–727, <ext-link xlink:href="https://doi.org/10.1164/rccm.201211-2004oc" ext-link-type="DOI">10.1164/rccm.201211-2004oc</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Grimme et al.(2010)</label><mixed-citation>Grimme, S., Antony, J., Ehrlich, S., and Krieg, H.: A consistent and accurate  <italic>ab initio</italic> parametrization of density functional dispersion correction (DFT-D)  for the 94 elements H-Pu, J. Chem. Phys., 132, 154104,  <ext-link xlink:href="https://doi.org/10.1063/1.3382344" ext-link-type="DOI">10.1063/1.3382344</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Grimme et al.(2011)</label><mixed-citation>Grimme, S., Ehrlich, S., and Goerigk, L.: Effect of the damping function in  dispersion corrected density functional theory, J. Comput. Chem., 32,  1456–1465, <ext-link xlink:href="https://doi.org/10.1002/jcc.21759" ext-link-type="DOI">10.1002/jcc.21759</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Grimme et al.(2017)</label><mixed-citation>Grimme, S., Bannwarth, C., and Shushkov, P.: A Robust and Accurate  Tight-Binding Quantum Chemical Method for Structures, Vibrational  Frequencies, and Noncovalent Interactions of Large Molecular Systems  Parametrized for All spd-Block Elements (<inline-formula><mml:math id="M306" display="inline"><mml:mrow><mml:mi>Z</mml:mi><mml:mo>=</mml:mo></mml:mrow></mml:math></inline-formula> 1–86), J. Chem. Theory Comput., 13, 1989–2009, <ext-link xlink:href="https://doi.org/10.1021/acs.jctc.7b00118" ext-link-type="DOI">10.1021/acs.jctc.7b00118</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Halonen et al.(2019)</label><mixed-citation>Halonen, R., Zapadinsky, E., Kurtén, T., Vehkamäki, H., and Reischl, B.: Rate enhancement in collisions of sulfuric acid molecules due to long-range intermolecular forces, Atmos. Chem. Phys., 19, 13355–13366, <ext-link xlink:href="https://doi.org/10.5194/acp-19-13355-2019" ext-link-type="DOI">10.5194/acp-19-13355-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Haywood and Boucher(2000)</label><mixed-citation>Haywood, J. and Boucher, O.: Estimates of the direct and indirect radiative  forcing due to tropospheric aerosols: A review, Rev. Geophys., 38, 513–543,  <ext-link xlink:href="https://doi.org/10.1029/1999RG000078" ext-link-type="DOI">10.1029/1999RG000078</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Hjorth Larsen et al.(2017)</label><mixed-citation>Hjorth Larsen, A., Jørgen Mortensen, J., Blomqvist, J., Castelli, I. E.,  Christensen, R., Dułak, M., Friis, J., Groves, M. N., Hammer, B., Hargus, C., Hermes, E. D., Jennings, P. C., Bjerre Jensen, P., Kermode, J., Kitchin,  J. R., Leonhard Kolsbjerg, E., Kubal, J., Kaasbjerg, K., Lysgaard, S., Bergmann Maronsson, J., Maxson, T., Olsen, T., Pastewka, L., Peterson, A., Rostgaard, C., Schiøtz, J., Schütt, O., Strange, M., Thygesen, K. S., Vegge, T., Vilhelmsen, L., Walter, M., Zeng, Z., and Jacobsen, K. W.: The atomic simulation environment – a Python library for working with atoms, J. Phys.-Condens. Mat., 29, 273002, <ext-link xlink:href="https://doi.org/10.1088/1361-648X/aa680e" ext-link-type="DOI">10.1088/1361-648X/aa680e</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Jiang et al.(2022)</label><mixed-citation>Jiang, S., Liu, Y.-R., Huang, T., Feng, Y.-J., Wang, C.-Y., Wang, Z.-Q., Ge,  B.-J., Liu, Q.-S., Guang, W.-R., and Huang, W.: Towards fully ab initio  simulation of atmospheric aerosol nucleation, Nat. Commun., 13, 6067,  <ext-link xlink:href="https://doi.org/10.1038/s41467-022-33783-y" ext-link-type="DOI">10.1038/s41467-022-33783-y</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Jiang et al.(2023)</label><mixed-citation>Jiang, S., Liu, Y.-R., Wang, C.-Y., and Huang, T.: Benchmarking general neural network potential ANI-2x on aerosol nucleation molecular clusters, Int. J. Quantum Chem., 123, e27087, <ext-link xlink:href="https://doi.org/10.1002/qua.27087" ext-link-type="DOI">10.1002/qua.27087</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Jorgensen et al.(1996)</label><mixed-citation>Jorgensen, W. L., Maxwell, D. S., and Tirado-Rives, J.: Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids, J. Am. Chem. Soc., 118, 11225–11236,  <ext-link xlink:href="https://doi.org/10.1021/ja9621760" ext-link-type="DOI">10.1021/ja9621760</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Knattrup et al.(2025)</label><mixed-citation>Knattrup, Y., Neefjes, I., Kubečka, J., and Elm, J.: Growth of atmospheric freshly nucleated particles: a semi-empirical molecular dynamics study, Aerosol Research, 3, 237–251, <ext-link xlink:href="https://doi.org/10.5194/ar-3-237-2025" ext-link-type="DOI">10.5194/ar-3-237-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Kubečka et al.(2019)</label><mixed-citation>Kubečka, J., Besel, V., Kurtén, T., Myllys, N., and Vehkamäki, H.:  Configurational Sampling of Noncovalent (Atmospheric) Molecular Clusters:  Sulfuric Acid and Guanidine, J. Phys. Chem. A, 123, 6022–6033,  <ext-link xlink:href="https://doi.org/10.1021/acs.jpca.9b03853" ext-link-type="DOI">10.1021/acs.jpca.9b03853</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Kubečka et al.(2024)</label><mixed-citation>Kubečka, J., Ayoubi, D., Tang, Z., Knattrup, Y., Engsvang, M., Wu, H., and  Elm, J.: Accurate modeling of the potential energy surface of atmospheric  molecular clusters boosted by neural networks, Environ. Sci.: Adv., 3,  1438–1451, <ext-link xlink:href="https://doi.org/10.1039/D4VA00255E" ext-link-type="DOI">10.1039/D4VA00255E</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Kubečka et al.(2025)</label><mixed-citation>Kubečka, J., Knattrup, Y., Trolle, G. B., Reischl, B., Lykke-Møller, A. S.,  Elm, J., and Neefjes, I.: Thermodynamics of molecular binding and clustering  in the atmosphere revealed through conventional and ML-enhanced umbrella  sampling, ChemRxiv [preprint], <ext-link xlink:href="https://doi.org/10.26434/chemrxiv-2025-b5xxr" ext-link-type="DOI">10.26434/chemrxiv-2025-b5xxr</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Kulmala et al.(2013)</label><mixed-citation>Kulmala, M., Kontkanen, J., Junninen, H., Lehtipalo, K., Manninen, H. E.,  Nieminen, T., Petäjä, T., Sipilä, M., Schobesberger, S., Rantala, P.,  Franchin, A., Jokinen, T., Järvinen, E., Äijälä, M., Kangasluoma, J.,  Hakala, J., Aalto, P. P., Paasonen, P., Mikkilä, J., Vanhanen, J., Aalto,  J., Hakola, H., Makkonen, U., Ruuskanen, T., Mauldin, R. L., Duplissy, J.,  Vehkamäki, H., Bäck, J., Kortelainen, A., Riipinen, I., Kurtén, T.,  Johnston, M. V., Smith, J. N., Ehn, M., Mentel, T. F., Lehtinen, K. E. J.,  Laaksonen, A., Kerminen, V.-M., and Worsnop, D. R.: Direct Observations of  Atmospheric Aerosol Nucleation, Science, 339, 943–946,  <ext-link xlink:href="https://doi.org/10.1126/science.1227385" ext-link-type="DOI">10.1126/science.1227385</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Liu and Jiang(2025)</label><mixed-citation>Liu, Y.-R. and Jiang, Y.: Predicting Composition Evolution for a Sulfuric  Acid-Dimethylamine System from Monomer to Nanoparticle Using Machine  Learning, J. Phys. Chem. A, 129, 222–231, <ext-link xlink:href="https://doi.org/10.1021/acs.jpca.4c06062" ext-link-type="DOI">10.1021/acs.jpca.4c06062</ext-link>,  2025.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>McGrath et al.(2012)</label><mixed-citation>McGrath, M. J., Olenius, T., Ortega, I. K., Loukonen, V., Paasonen, P., Kurtén, T., Kulmala, M., and Vehkamäki, H.: Atmospheric Cluster Dynamics Code: a flexible method for solution of the birth-death equations, Atmos. Chem. Phys., 12, 2345–2355, <ext-link xlink:href="https://doi.org/10.5194/acp-12-2345-2012" ext-link-type="DOI">10.5194/acp-12-2345-2012</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Müller et al.(2023)</label><mixed-citation>Müller, M., Hansen, A., and Grimme, S.: <inline-formula><mml:math id="M307" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula>B97X-3c: A composite  range-separated hybrid DFT method with a molecule-optimized polarized valence  double-<inline-formula><mml:math id="M308" display="inline"><mml:mi mathvariant="italic">ζ</mml:mi></mml:math></inline-formula> basis set, J. Chem. Phys., 158, 014103,  <ext-link xlink:href="https://doi.org/10.1063/5.0133026" ext-link-type="DOI">10.1063/5.0133026</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Neefjes et al.(2022)</label><mixed-citation>Neefjes, I., Halonen, R., Vehkamäki, H., and Reischl, B.: Modeling approaches for atmospheric ion–dipole collisions: all-atom trajectory simulations and central field methods, Atmos. Chem. Phys., 22, 11155–11172, <ext-link xlink:href="https://doi.org/10.5194/acp-22-11155-2022" ext-link-type="DOI">10.5194/acp-22-11155-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Neefjes et al.(2025)</label><mixed-citation>Neefjes, I., Reischl, B., and Yang, H.: Comparison of collision rate  coefficient model predictions for different interaction strengths and  temperatures, J. Aerosol Sci., 189, 106638,  <ext-link xlink:href="https://doi.org/10.1016/j.jaerosci.2025.106638" ext-link-type="DOI">10.1016/j.jaerosci.2025.106638</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Neefjes et al.(2026)</label><mixed-citation>Neefjes, I., Knattrup, Y., Wu, H., Trolle, G. B., Elm, J., and Kubečka, J.: Thermodynamic benchmarking of hydrated atmospheric clusters in early particle formation, Aerosol Research, 4, 1–22, <ext-link xlink:href="https://doi.org/10.5194/ar-4-1-2026" ext-link-type="DOI">10.5194/ar-4-1-2026</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Neese(2012)</label><mixed-citation>Neese, F.: The ORCA program system, WIREs Comput. Mol. Sci., 2, 73–78,  <ext-link xlink:href="https://doi.org/10.1002/wcms.81" ext-link-type="DOI">10.1002/wcms.81</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Plimpton(1995)</label><mixed-citation>Plimpton, S.: Fast Parallel Algorithms for Short-Range Molecular Dynamics,  J. Comput. Phys., 117, 1–19, <ext-link xlink:href="https://doi.org/10.1006/jcph.1995.1039" ext-link-type="DOI">10.1006/jcph.1995.1039</ext-link>, 1995.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Ramakrishnan et al.(2015)</label><mixed-citation>Ramakrishnan, R., Dral, P. O., Rupp, M., and von Lilienfeld, O. A.: Big Data  Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach, J. Chem. Theory Comput., 11, 2087–2096, <ext-link xlink:href="https://doi.org/10.1021/acs.jctc.5b00099" ext-link-type="DOI">10.1021/acs.jctc.5b00099</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Schütt et al.(2021)</label><mixed-citation>Schütt, K. T., Unke, O. T., and Gastegger, M.: Equivariant message  passing for the prediction of tensorial properties and molecular spectra,  arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2102.03150" ext-link-type="DOI">10.48550/arXiv.2102.03150</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Schütt et al.(2024)</label><mixed-citation>Schütt, K. T., Hessmann, S. S. P., Gebauer, N. W. A., Lederer, J., and  Gastegger, M.: SchNetPack 2.0: A neural network toolbox for atomistic machine  learning, Version 2.1.1, GitHub [code], <uri>https://github.com/atomistic-machine-learning/schnetpack</uri>  (last access: 29 January 2026), 2024.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Sipilä et al.(2010)</label><mixed-citation>Sipilä, M., Berndt, T., Petäjä, T., Brus, D., Vanhanen, J., Stratmann,  F., Patokoski, J., Mauldin III, R. L., Hyvärinen, A.-P., Lihavainen, H., and Kulmala, M.: The Role of Sulfuric Acid in Atmospheric Nucleation, Science, 327, 1243–1246, <ext-link xlink:href="https://doi.org/10.1126/science.1180315" ext-link-type="DOI">10.1126/science.1180315</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Stroet and Deplazes(2016)</label><mixed-citation>Stroet, M. and Deplazes, E.: Umbrella Integration: Initial Version, Version v0.1, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.164996" ext-link-type="DOI">10.5281/zenodo.164996</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Stuke et al.(2021)</label><mixed-citation>Stuke, A., Rinke, P., and Todorović, M.: Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization, Mach. Learn.: Sci. Technol., 2, 035022, <ext-link xlink:href="https://doi.org/10.1088/2632-2153/abee59" ext-link-type="DOI">10.1088/2632-2153/abee59</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Thompson et al.(2022)</label><mixed-citation>Thompson, A. P., Aktulga, H. M., Berger, R., Bolintineanu, D. S., Brown, W. M., Crozier, P. S., in 't Veld, P. J., Kohlmeyer, A., Moore, S. G., Nguyen, T. D., Shan, R., Stevens, M. J., Tranchida, J., Trott, C., and Plimpton, S. J.: LAMMPS – a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comput. Phys. Commun., 271, 108171, <ext-link xlink:href="https://doi.org/10.1016/j.cpc.2021.108171" ext-link-type="DOI">10.1016/j.cpc.2021.108171</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Tikkanen et al.(2025)</label><mixed-citation>Tikkanen, V., Yang, H., Vehkamäki, H., and Reischl, B.: Gas-phase collision rate enhancement factors for acid–base clusters up to 2 nm in diameter from atomistic simulation and the interacting hard-sphere model, Atmos. Chem. Phys., 25, 17237–17251, <ext-link xlink:href="https://doi.org/10.5194/acp-25-17237-2025" ext-link-type="DOI">10.5194/acp-25-17237-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Torrie and Valleau(1977)</label><mixed-citation>Torrie, G. and Valleau, J.: Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling, J. Comput. Phys., 23, 187–199, <ext-link xlink:href="https://doi.org/10.1016/0021-9991(77)90121-8" ext-link-type="DOI">10.1016/0021-9991(77)90121-8</ext-link>, 1977.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Tröstl et al.(2016)</label><mixed-citation>Tröstl, J., Chuang, W. K., Gordon, H., et al.: The Role of Low-Volatility Organic Compounds in Initial Particle Growth in the Atmosphere, Nature, 533, 527–531, <ext-link xlink:href="https://doi.org/10.1038/nature18271" ext-link-type="DOI">10.1038/nature18271</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Yang et al.(2023)</label><mixed-citation>Yang, H., Neefjes, I., Tikkanen, V., Kubečka, J., Kurtén, T., Vehkamäki, H., and Reischl, B.: Collision-sticking rates of acid–base clusters in the gas phase determined from atomistic simulation and a novel analytical interacting hard-sphere model, Atmos. Chem. Phys., 23, 5993–6009, <ext-link xlink:href="https://doi.org/10.5194/acp-23-5993-2023" ext-link-type="DOI">10.5194/acp-23-5993-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Zhang et al.(2012)</label><mixed-citation>Zhang, R., Khalizov, A., Wang, L., Hu, M., and Xu, W.: Nucleation and Growth of Nanoparticles in the Atmosphere, Chem. Rev., 112, 1957–2011, <ext-link xlink:href="https://doi.org/10.1021/cr2001756" ext-link-type="DOI">10.1021/cr2001756</ext-link>, 2012.</mixed-citation></ref>

  </ref-list></back>
</article>
