Eight years of sub-micrometre organic aerosol composition data from the boreal forest characterized using a machine-learning approach
- ¹Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, Helsinki, 00014, Finland
- ²Laboratory of Atmospheric Chemistry, Paul Scherrer Institute, Villigen, Switzerland
- ³Aerodyne Research Inc., Billerica, MA, USA
Abstract. The Station for Measuring Ecosystem Atmosphere Relations (SMEAR) II is unique worldwide owing to its wide range of long-term measurements tracking the Earth-atmosphere interface. In this study, we characterize the composition of organic aerosol (OA) at SMEAR II by quantifying its driving constituents. We use a multi-year data set of OA mass spectra measured in situ at the station with an Aerosol Chemical Speciation Monitor (ACSM). To our knowledge, this mass spectral time series is the longest of its kind published to date, and its detailed analysis required a new methodology. To this end, we developed an efficient and robust data analysis framework based on machine learning tools, including unsupervised feature extraction and classification stages to manage and process the large amount of data. The extensive chemometric analysis combined Positive Matrix Factorization (PMF), rolling window analysis, bootstrapping, K-Means clustering, data weighting and diagnostics-based algorithmic decision-making, among other techniques. This combination of statistical tools provided a data-driven analysis methodology that achieves robust solutions with minimal subjectivity.
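The abstract names the building blocks of the framework without detailing how they are chained. The sketch below shows one plausible arrangement, under stated assumptions, and is not the study's implementation: a weighted non-negative matrix factorization solved with multiplicative updates stands in for PMF (it minimises the same uncertainty-weighted objective Q, but it is not a PMF solver), fits are repeated in rolling windows on bootstrap-resampled rows, and the resulting factor profiles are grouped with a plain K-Means. The function names, window length, uncertainty model and synthetic two-factor data are all hypothetical, and the data weighting and diagnostics-based choices mentioned above are not reproduced.

```python
import numpy as np


def weighted_nmf(X, S, n_factors, n_iter=500, seed=0, eps=1e-12):
    """Factor X ~ G @ F with G, F >= 0 by minimising the PMF-style objective
    Q = sum(((X - G @ F) / S) ** 2) with weighted multiplicative updates.
    (Stand-in for a real PMF solver; assumption for illustration only.)"""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[0], n_factors)) + eps   # factor time series
    F = rng.random((n_factors, X.shape[1])) + eps   # factor mass spectra
    W = 1.0 / S**2                                  # weights from uncertainties
    WX = W * X
    for _ in range(n_iter):
        G *= (WX @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ WX) / (G.T @ (W * (G @ F)) + eps)
    q = float(np.sum(((X - G @ F) / S) ** 2))
    return G, F, q


def rolling_bootstrap_profiles(X, S, n_factors, window, step, n_boot, seed=0):
    """Fit each rolling window n_boot times on rows resampled with replacement
    and return every factor profile, normalised to unit sum."""
    rng = np.random.default_rng(seed)
    profiles = []
    for start in range(0, X.shape[0] - window + 1, step):
        Xw, Sw = X[start:start + window], S[start:start + window]
        for _ in range(n_boot):
            idx = rng.integers(0, window, size=window)  # bootstrap resample
            _, F, _ = weighted_nmf(Xw[idx], Sw[idx], n_factors,
                                   seed=int(rng.integers(2**31)))
            profiles.append(F / F.sum(axis=1, keepdims=True))
    return np.vstack(profiles)


def kmeans(P, k, n_iter=100):
    """Lloyd's K-Means with deterministic farthest-first initialisation."""
    centres = [P[0]]
    for _ in range(1, k):
        d = np.min([((P - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(P[d.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        d = ((P[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


# Synthetic check: two made-up "spectra" with disjoint m/z supports.
F_true = np.array([[0.4, 0.3, 0.2, 0.1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0.1, 0.2, 0.3, 0.4]])
rng = np.random.default_rng(1)
G_true = rng.random((120, 2))
G_true[::5, 1] = 0.0    # some rows contain only factor 0
G_true[2::5, 0] = 0.0   # some rows contain only factor 1
X = G_true @ F_true
S = 0.01 + 0.05 * X     # hypothetical uncertainty model
P = rolling_bootstrap_profiles(X, S, n_factors=2, window=30, step=30, n_boot=5)
labels, centres = kmeans(P, k=2)
```

Clustering the profiles from all windows and bootstrap runs is one way to give factors a consistent identity across the series, since each independent fit returns its factors in arbitrary order.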
Following the extensive statistical analyses, we were able to divide the 2012–2019 SMEAR II OA data (mass concentration interquartile range (IQR), given here and below as 25th percentile, median and 75th percentile: 0.7, 1.3, 2.6 µg m−3) into three sub-categories: low-volatility oxygenated OA (LV-OOA), semi-volatile oxygenated OA (SV-OOA), and primary OA (POA). LV-OOA was the dominant OA type (organic mass fraction IQR: 49, 62, and 73 %). The seasonal cycle of LV-OOA was bimodal, with peaks in both summer and February. We associated the wintertime LV-OOA with anthropogenic sources and assumed a biogenic influence on LV-OOA formation in summer. Through a brief trajectory analysis, we estimated summertime natural LV-OOA formation of tens of ng m−3 h−1 over the boreal forest. SV-OOA was the second-highest contributor to OA mass (organic mass fraction IQR: 19, 31, and 43 %). Given SV-OOA's clear summer peak, we consider biogenic processes the main drivers of its formation. Unlike LV-OOA, SV-OOA reached its highest concentrations in stable summertime nocturnal surface layers. However, nearby sawmills likely also played a significant role in SV-OOA production, as exemplified by previous studies at SMEAR II. POA, taken as a mix of two OA types reported previously, hydrocarbon-like OA (HOA) and biomass burning OA (BBOA), made up only a small fraction of the OA mass (IQR: 2, 6, and 13 %). Both the POA organic mass fraction and mass concentration peaked in winter. The appearance of POA at SMEAR II was linked to strong southerly winds. The high wind speeds probably enabled POA to be transported to SMEAR II from distant sources in a relatively fresh state; at lower wind speeds, POA likely evaporated or aged into oxidized OA before detection. The POA organic mass fraction was significantly lower than that reported from aerosol mass spectrometer (AMS) measurements two to four years before the ACSM measurements.
While co-located long-term black carbon measurements supported the hypothesis of higher POA loadings before 2012, it is also possible that the ACSM captured short-term (POA) pollution plumes less efficiently. Despite the length of the ACSM data set, we did not focus on quantifying long-term trends in POA (or in other components), owing to the high sensitivity of OA composition to meteorological anomalies, whose occurrence is likely not normally distributed over the eight-year measurement period.
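Each "IQR" above is reported as three numbers, read as the 25th percentile, the median and the 75th percentile. A minimal sketch of how such summaries could be computed from factor time series follows, assuming the factor mass concentrations share a common time base and that mass fractions are evaluated per time step before the percentiles are taken (an assumption; percentiles of concentrations converted to fractions afterwards would give different numbers). The function names and toy numbers are hypothetical and are not SMEAR II data.

```python
import numpy as np


def quartiles(values):
    """Return (25th percentile, median, 75th percentile), ignoring NaNs."""
    q1, q2, q3 = np.nanpercentile(values, [25, 50, 75])
    return float(q1), float(q2), float(q3)


def mass_fraction_quartiles(factor_conc):
    """Quartiles of each factor's organic mass fraction (%), per time step.

    factor_conc maps a factor name to its mass concentration time series
    (ug m-3); all series are assumed to share one time base.
    """
    total = np.sum(list(factor_conc.values()), axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Time steps with zero total OA become NaN and are skipped.
        return {
            name: quartiles(np.where(total > 0, 100.0 * conc / total, np.nan))
            for name, conc in factor_conc.items()
        }


# Hypothetical toy input (four time steps), not SMEAR II data.
conc = {
    "LV-OOA": np.array([0.6, 1.0, 2.0, 0.5]),
    "SV-OOA": np.array([0.3, 0.8, 0.5, 0.4]),
    "POA": np.array([0.1, 0.2, 0.5, 0.1]),
}
total_q = quartiles(sum(conc.values()))
frac_q = mass_fraction_quartiles(conc)
```

Because the percentiles are taken separately for each factor, the three median fractions need not add up to exactly 100 %, which is consistent with the medians reported above (62, 31 and 6 %) summing to 99 %.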
We hope that our successfully applied methodology encourages other researchers with several-year-long time series of similar data to tackle the analysis with similar semi- or unsupervised machine learning approaches. In this way, aerosol chemometric analysis procedures could be developed further towards more streamlined and autonomous workflows.
Liine Heikkinen et al.