26 Jan 2021

Review status: this preprint is currently under review for the journal ACP.

Predicting Gas-Particle Partitioning Coefficients of Atmospheric Molecules with Machine Learning

Emma Lumiaro¹, Milica Todorović¹, Theo Kurten², Hanna Vehkamäki³, and Patrick Rinke¹
  • ¹Department of Applied Physics, Aalto University, P.O. Box 11100, 00076 Aalto, Espoo, Finland
  • ²Department of Chemistry, Faculty of Science, PO Box 55, FI-00014 University of Helsinki, Finland
  • ³Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, PO Box 64, FI-00014 University of Helsinki, Finland

Abstract. The formation, properties and lifetime of secondary organic aerosols in the atmosphere are largely determined by the gas-particle partitioning coefficients of the participating organic vapours. Since these coefficients are often difficult to measure and to compute, we developed a machine learning model that predicts them from molecular structure. Our data-driven approach is based on the dataset of Wang et al. (Atmos. Chem. Phys., 17, 7529 (2017)), who used the COSMOtherm program to compute the partitioning coefficients and saturation vapour pressures of 3414 atmospheric oxidation products from the Master Chemical Mechanism. We trained a kernel ridge regression (KRR) machine learning model on the saturation vapour pressure (P_sat) and on two equilibrium partitioning coefficients: between a water-insoluble organic matter phase and the gas phase (K_WIOM/G), and between an infinitely dilute solution with pure water and the gas phase (K_W/G). We tested different descriptors as the input representation of each organic molecule's atomic structure. We find that the many-body tensor representation (MBTR) works best for our application, but the topological fingerprint (TopFP) approach is almost as good and is significantly more cost-effective. Our best machine learning model (KRR with a Gaussian kernel + MBTR) predicts P_sat and K_WIOM/G to within 0.3 logarithmic units and K_W/G to within 0.4 logarithmic units of the original COSMOtherm calculations. This is equal to or better than the typical accuracy of COSMOtherm predictions compared to experimental data (where available). To evaluate the model's performance for polyfunctional compounds with potentially low P_sat, we then applied it to a dataset of 35,383 molecules that we generated from a C10 backbone functionalized with 0 to 6 carboxyl, carbonyl or hydroxyl groups. The resulting saturation vapour pressure and partitioning coefficient distributions were physico-chemically reasonable, and the volatility predictions for the most highly oxidized compounds were in qualitative agreement with experimentally inferred volatilities of atmospheric oxidation products with similar elemental composition.
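
As a rough illustration of the regression step described in the abstract, the following minimal sketch implements kernel ridge regression with a Gaussian kernel in plain Python. It is not the authors' code. The three-component random vectors are hypothetical stand-ins for MBTR or TopFP descriptors, the target is a synthetic function standing in for a log10 property value, and the hyperparameters sigma and lam are arbitrary assumptions rather than values from the paper.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def krr_fit(X, y, sigma, lam):
    """Return the regression weights alpha = (K + lam * I)^-1 y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam  # ridge regularisation on the diagonal
    return solve_linear(K, y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Predict as a kernel-weighted sum over the training points."""
    return [
        sum(a * gaussian_kernel(x, xt, sigma) for a, xt in zip(alpha, X_train))
        for x in X_new
    ]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic toy data (assumption): random 3-component "descriptors" and a
# smooth made-up target standing in for a log10 property value.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(100)]
y = [-2.0 * x[0] + math.sin(3.0 * x[1]) + 0.5 * x[2] for x in X]

X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

# sigma and lam are illustrative choices, not tuned values.
alpha = krr_fit(X_train, y_train, sigma=0.5, lam=1e-3)
y_pred = krr_predict(X_train, alpha, X_test, sigma=0.5)

# Compare against a trivial baseline that always predicts the training mean.
mae = mean_absolute_error(y_test, y_pred)
train_mean = sum(y_train) / len(y_train)
baseline_mae = mean_absolute_error(y_test, [train_mean] * len(y_test))
print(f"KRR test MAE: {mae:.3f}   mean-predictor MAE: {baseline_mae:.3f}")
```

Because the toy target plays the role of a log10 value, the mean absolute error printed here is in logarithmic units, the same kind of figure the abstract quotes. In a real application the kernel width and regularisation strength would typically be selected by cross-validation rather than fixed by hand.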

Status: open (until 23 Mar 2021)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on acp-2020-1258', Frank Wania, 13 Feb 2021
  • RC2: 'Comment on acp-2020-1258', Anonymous Referee #2, 16 Feb 2021

Data sets

Atmospheric C10 dataset: E. Lumiaro, M. Todorovic, T. Kurten, H. Vehkamaki, and P. Rinke

Model code and software

KRR for Atmospheric Molecules: E. Lumiaro, M. Todorovic, T. Kurten, H. Vehkamaki, and P. Rinke

Short summary
The study of climate change relies on climate models, which require an understanding of aerosol formation. In this study, we train a machine learning model to predict the partitioning coefficients of atmospheric molecules, which govern their condensation into aerosols. The model can make instant predictions based on molecular structures with accuracy surpassing that of standard computational methods. This will allow the screening of low-volatility molecules that contribute most to aerosol formation.