01 Sep 2020

Review status: a revised version of this preprint was accepted for the journal ACP and is expected to appear here in due course.

Eight years of sub-micrometre organic aerosol composition data from the boreal forest characterized using a machine-learning approach

Liine Heikkinen1, Mikko Äijälä1, Kaspar R. Daellenbach1, Gang Chen2, Olga Garmash1, Diego Aliaga1, Frans Graeffe1, Meri Räty1, Krista Luoma1, Pasi Aalto1, Markku Kulmala1, Tuukka Petäjä1, Douglas Worsnop1,3, and Mikael Ehn1
  • 1Institute for Atmospheric and Earth System Research /Physics, Faculty of Science, University of Helsinki, Helsinki, 00014, Finland
  • 2Laboratory of Atmospheric Chemistry, Paul Scherrer Institute, Villigen, Switzerland
  • 3Aerodyne Research Inc., Billerica, MA, USA

Abstract. The Station for Measuring Ecosystem Atmosphere Relations (SMEAR) II is unique in the world owing to its wide range of long-term measurements tracking the Earth-atmosphere interface. In this study, we characterize the composition of organic aerosol (OA) at SMEAR II by quantifying its driving constituents. We utilize a multi-year data set of OA mass spectra measured in situ with an Aerosol Chemical Speciation Monitor (ACSM) at the station. To our knowledge, this mass spectral time series is the longest of its kind published to date, and its detailed analysis required the development of a new methodology. For this purpose, we developed an efficient and robust data analysis framework utilizing machine learning tools. These included unsupervised feature extraction and classification stages to manage and process the large amounts of data. The extensive chemometric analysis was conducted with a combination of Positive Matrix Factorization (PMF), rolling window analysis, bootstrapping, K-Means clustering, data weighting, and diagnostics-based algorithmic choice-making, among others. This combination of statistical tools provided a data-driven analysis methodology to achieve robust solutions with minimal subjectivity.
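
The rolling window and bootstrapping steps named in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' framework: the window length, step size, and function names are hypothetical, and the factorization (PMF) and clustering stages that would run on each resampled window are omitted.

```python
import random


def rolling_windows(n_samples, window, step):
    """Yield (start, stop) index pairs of fixed-length windows sliding by `step`."""
    for start in range(0, n_samples - window + 1, step):
        yield start, start + window


def bootstrap_indices(start, stop, rng):
    """Resample the row indices of one window with replacement (same size as the window)."""
    return sorted(rng.randrange(start, stop) for _ in range(stop - start))


# Hypothetical sizes chosen only for illustration.
rng = random.Random(0)  # fixed seed so the resampling is repeatable
n_samples = 10
windows = list(rolling_windows(n_samples, window=4, step=2))

# One bootstrap replicate per window; a factorization would be run on each.
resamples = {w: bootstrap_indices(w[0], w[1], rng) for w in windows}
```

Overlapping windows let a factor solution adapt slowly over time, while repeated resampling within a window indicates how stable that solution is.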

Following the extensive statistical analyses, we were able to divide the 2012–2019 SMEAR II OA data (mass concentration interquartile range (IQR): 0.7, 1.3, 2.6 µg m−3) into three sub-categories: low-volatility oxygenated OA (LV-OOA), semi-volatile oxygenated OA (SV-OOA), and primary OA (POA). LV-OOA was the dominant OA type (organic mass fraction IQR: 49, 62, and 73 %). The seasonal cycle of LV-OOA was bimodal, with peaks both in summer and in February. We associated the wintertime LV-OOA with anthropogenic sources and assumed biogenic influence in LV-OOA formation in summer. Through a brief trajectory analysis, we estimated summertime natural LV-OOA formation of tens of ng m−3 h−1 over the boreal forest. SV-OOA was the second highest contributor to OA mass (organic mass fraction IQR: 19, 31, and 43 %). Because SV-OOA peaked clearly in summer, we consider biogenic processes the main drivers of its formation. Unlike for LV-OOA, the highest SV-OOA concentrations were detected in stable summertime nocturnal surface layers. However, the nearby sawmills likely also played a significant role in SV-OOA production, as exemplified by previous studies at SMEAR II. POA, taken as a mix of two different OA types reported previously, hydrocarbon-like OA (HOA) and biomass burning OA (BBOA), made up only a minor OA mass fraction (IQR: 2, 6, and 13 %). Both the POA organic mass fraction and the POA mass concentration peaked in winter. The appearance of POA at SMEAR II was linked to strong southerly winds. The high wind speeds probably enabled POA transport to SMEAR II from faraway sources in a relatively fresh state. At lower wind speeds, POA likely evaporated or aged into oxidized organic aerosol before detection. The POA organic mass fraction was significantly lower than reported by aerosol mass spectrometer (AMS) measurements two to four years prior to the ACSM measurements.
While the co-located long-term measurements of black carbon supported the hypothesis of higher POA loadings prior to 2012, it is also possible that the ACSM captured short-term POA pollution plumes less efficiently. Despite the length of the ACSM data set, we did not focus on quantifying long-term trends of POA (or of the other components) because OA composition is highly sensitive to meteorological anomalies, the occurrence of which is likely not normally distributed over the eight-year measurement period.
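
The three-number IQR values quoted above are read here as the 25th, 50th, and 75th percentiles. As a minimal sketch of how such a summary can be computed, the example below uses made-up mass fractions, not SMEAR II data; the function name and the choice of the "inclusive" interpolation method are assumptions.

```python
from statistics import quantiles


def quartile_summary(values):
    """Return the (25th, 50th, 75th) percentiles of a sample."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3


# Made-up organic mass fractions in percent, for illustration only.
fractions = [40, 50, 60, 70, 80]
summary = quartile_summary(fractions)
print("IQR: %.0f, %.0f, and %.0f %%" % summary)
```

Reporting all three quartiles rather than a mean and standard deviation suits skewed quantities such as mass concentrations.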

We hope that our successfully applied methodology will encourage other researchers holding multi-year time series of similar data to tackle the data analysis with similar semi-supervised or unsupervised machine learning approaches. In this way, aerosol chemometric analysis procedures could be developed further towards more streamlined and autonomous workflows.

Status: closed

Total article views: 480 (including HTML, PDF, and XML)
  • HTML: 322
  • PDF: 155
  • XML: 3
  • Total: 480
  • Supplement: 34
  • BibTeX: 10
  • EndNote: 8

Viewed (geographical distribution)

Total article views: 699 (including HTML, PDF, and XML) Thereof 694 with geography defined and 5 with unknown origin.
Latest update: 14 Jun 2021
Short summary
In many locations worldwide, aerosol particles have been shown to be made up largely of organic aerosol (OA). The boreal forest is a region where aerosol particles possess a high OA mass fraction. Here, we studied OA composition using the longest time series of OA composition ever obtained from a boreal environment. For this purpose, we developed a new analysis framework and discovered that most of the OA was highly oxidized, with strong seasonal behaviour reflecting different sources in summer and winter.