Articles | Volume 21, issue 13
Atmos. Chem. Phys., 21, 10081–10109, 2021

Special issue: Pan-Eurasian Experiment (PEEX) – Part II

Atmos. Chem. Phys., 21, 10081–10109, 2021

Research article 06 Jul 2021

Research article | 06 Jul 2021

Eight years of sub-micrometre organic aerosol composition data from the boreal forest characterized using a machine-learning approach

Eight years of sub-micrometre organic aerosol composition data from the boreal forest characterized using a machine-learning approach
Liine Heikkinen1,a, Mikko Äijälä1, Kaspar R. Daellenbach1,b, Gang Chen2, Olga Garmash1, Diego Aliaga1, Frans Graeffe1, Meri Räty1, Krista Luoma1, Pasi Aalto1, Markku Kulmala1, Tuukka Petäjä1, Douglas Worsnop1,3, and Mikael Ehn1 Liine Heikkinen et al.
  • 1Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, Helsinki, 00014, Finland
  • 2Laboratory of Atmospheric Chemistry, Paul Scherrer Institute, Villigen, Switzerland
  • 3Aerodyne Research Inc., Billerica, MA, USA
  • anow at: Department of Environmental Science and Bolin Centre for Climate Research, Stockholm University, Stockholm, Sweden
  • bnow at: Laboratory of Atmospheric Chemistry, Paul Scherrer Institute, Villigen, Switzerland

Correspondence: Liine Heikkinen ( and Mikael Ehn (


The Station for Measuring Ecosystem–Atmosphere Relations (SMEAR) II, located within the boreal forest of Finland, is a unique station in the world due to the wide range of long-term measurements tracking the Earth–atmosphere interface. In this study, we characterize the composition of organic aerosol (OA) at SMEAR II by quantifying its driving constituents. We utilize a multi-year data set of OA mass spectra measured in situ with an Aerosol Chemical Speciation Monitor (ACSM) at the station. To our knowledge, this mass spectral time series is the longest of its kind published to date. Similarly to other previously reported efforts in OA source apportionment from multi-seasonal or multi-annual data sets, we approached the OA characterization challenge through positive matrix factorization (PMF) using a rolling window approach. However, the existing methods for extracting minor OA components were found to be insufficient for our rather remote site. To overcome this issue, we tested a new statistical analysis framework. This included unsupervised feature extraction and classification stages to explore a large number of unconstrained PMF runs conducted on the measured OA mass spectra. Anchored by these results, we finally constructed a relaxed chemical mass balance (CMB) run that resolved different OA components from our observations. The presented combination of statistical tools provided a data-driven analysis methodology, which in our case achieved robust solutions with minimal subjectivity.

Following the extensive statistical analyses, we were able to divide the 2012–2019 SMEAR II OA data (mass concentration interquartile range (IQR): 0.7, 1.3, and 2.6 µg m−3) into three sub-categories – low-volatility oxygenated OA (LV-OOA), semi-volatile oxygenated OA (SV-OOA), and primary OA (POA) – proving that the tested methodology was able to provide results consistent with literature. LV-OOA was the most dominant OA type (organic mass fraction IQR: 49 %, 62 %, and 73 %). The seasonal cycle of LV-OOA was bimodal, with peaks both in summer and in February. We associated the wintertime LV-OOA with anthropogenic sources and assumed biogenic influence in LV-OOA formation in summer. Through a brief trajectory analysis, we estimated summertime natural LV-OOA formation of tens of ng m−3 h−1 over the boreal forest. SV-OOA was the second highest contributor to OA mass (organic mass fraction IQR: 19 %, 31 %, and 43 %). Due to SV-OOA's clear peak in summer, we estimate biogenic processes as the main drivers in its formation. Unlike for LV-OOA, the highest SV-OOA concentrations were detected in stable summertime nocturnal surface layers. Two nearby sawmills also played a significant role in SV-OOA production as also exemplified by previous studies at SMEAR II. POA, taken as a mix of two different OA types reported previously, hydrocarbon-like OA (HOA) and biomass burning OA (BBOA), made up a minimal OA mass fraction (IQR: 2 %, 6 %, and 13 %). Notably, the quantification of POA at SMEAR II using ACSM data was not possible following existing rolling PMF methodologies. Both POA organic mass fraction and mass concentration peaked in winter. Its appearance at SMEAR II was linked to strong southerly winds. Similar wind direction and speed dependence was not observed among other OA types. The high wind speeds probably enabled the POA transport to SMEAR II from faraway sources in a relatively fresh state. In the event of slower wind speeds, POA likely evaporated and/or aged into oxidized organic aerosol before detection. The POA organic mass fraction was significantly lower than reported by aerosol mass spectrometer (AMS) measurements 2 to 4 years prior to the ACSM measurements. While the co-located long-term measurements of black carbon supported the hypothesis of higher POA loadings prior to year 2012, it is also possible that short-term (POA) pollution plumes were averaged out due to the slow time resolution of the ACSM combined with the further 3 h data averaging needed to ensure good signal-to-noise ratios (SNRs). Despite the length of the ACSM data set, we did not focus on quantifying long-term trends of POA (nor other components) due to the high sensitivity of OA composition to meteorological anomalies, the occurrence of which is likely not normally distributed over the 8-year measurement period.

Due to the unique and realistic seasonal cycles and meteorology dependences of the independent OA subtypes complemented by the reasonably low degree of unexplained OA variability, we believe that the presented data analysis approach performs well. Therefore, we hope that these results encourage also other researchers possessing several-year-long time series of similar data to tackle the data analysis via similar semi- or unsupervised machine-learning approaches. This way the presented method could be further optimized and its usability explored and evaluated also in other environments.

1 Introduction

Despite the small sizes of atmospheric aerosol particles, they play an important role in the climate system. They interfere with solar radiation via direct absorption and scattering (direct aerosol radiative effect) and participate in cloud formation and processing, thereby influencing the interactions between clouds and radiation (indirect aerosol radiative effect). In addition to the size of aerosol particles, their chemical composition plays an important role determining their direct or indirect radiative effects via composition-linked parameters such as aerosol hygroscopicity (water affinity), volatility, and reflectivity.

The number concentrations of aerosol particles in the atmosphere range from a few particles per cubic centimetre to even millions, so they cannot be considered individually but are typically divided into populations, groups, or classes based on for example some above-mentioned characteristics. Thus, the classification of aerosol particles is a necessary and critical task preceding their further understanding. Real aerosol populations are spatially mixed, overlapping, and smeared in the atmosphere, and their physical and chemical characteristics are for the most part not discretely distributed but continuous. Therefore, practically all classifications of atmospheric aerosol are simplifications due to their complex interactions and change processes in the atmosphere, and any divisions between classes are to some extent arbitrary and debatable selections. Nevertheless, various statistical methods can be used to perform objective, well-founded aerosol classifications and construct aerosol models which strike a good balance between mathematical robustness, complexity (or simplicity), and usability for various purposes. In the following, some common classifications are discussed.

Organic aerosol (OA) is a major sub-micrometre aerosol constituent (Zhang et al., 2007). OA can be emitted directly as primary OA (POA) or it can form in the atmosphere via condensation or uptake of oxidized organic vapours. The latter OA fraction is termed as secondary organic aerosol (SOA). Various combustion processes are the main sources of POA. These combustion processes include for example diesel combustion in car engines, which emits hydrocarbon-like OA (HOA), or biomass burning in forms of residential heating or wild/agricultural fires, both of which emit biomass burning OA (BBOA). The number of SOA precursors in the ambient air is immense, making the linking of ambient SOA observations to SOA precursors and detailed formation processes extremely challenging.

The utilization of positive matrix factorization (PMF, Sect. 4.1) on OA mass spectra recorded by aerosol mass spectrometers (AMS; Aerodyne Research Inc., MA, USA; Canagaratna et al., 2007) has linked SOA to two oxygenated organic aerosol (OOA) groups characterized by volatility: semi-volatile oxygenated OA, i.e. SV-OOA, and low-volatility oxygenated OA, i.e. LV-OOA. These groups are alternatively also named by their degree of oxygenation: less-oxygenated OA, i.e. LO-OOA, and more-oxygenated OA, i.e. MO-OOA. In reality, atmospheric oxidation of aerosols is a continuum process, and therefore such a division is mathematical, not clear cut, and to some extent arbitrary. Due to the prominent link between OA degree of oxygenation and volatility, the SV-OOA and LO-OOA and the LV-OOA and MO-OOA usually describe the same OA fractions, respectively (Jimenez et al., 2009; Ng et al., 2011a). LV-OOA is typically identified by an AMS OA mass spectrum dominated by a CO2+ (at m/Q 44 Th in LV-OOA mass spectrum) OA fragment (Jimenez et al., 2009; Ng et al., 2010). SV-OOA in turn typically has lower CO2+ mass fraction but a high C2H3O+ (at m/Q 43 Th in the SV-OOA mass spectrum) fragment (Jimenez et al., 2009; Ng et al., 2010). The CO2+ fragment has been linked to various organic acids (Duplissy et al., 2011), whereas the C2H3O+ has been thought of as a marker of non-acid oxygenates (Ng et al., 2011a). Importantly, a large amount of evidence suggests that photochemical ageing of OA leads to an increasingly significant contribution of CO2+ in the OA mass spectrum (Alfarra, 2004; De Gouw et al., 2005; Aiken et al., 2008; Kleinman et al., 2008; Jimenez et al., 2009; Ng et al., 2010, 2011a). This indicates OA transformation to more oxygenated forms upon atmospheric ageing, which ultimately yields OA of low volatility. Such OA processing (ageing scheme) has been shown to apply for several SOA and POA types.

While the direct POA emissions can nowadays often be quite well distinguished from SOA, perhaps due to the limitations in chemical information provided by AMS-type instruments and/or the overall similarity of SOA mass spectra regardless of the source, ambient SOA source apportionment is rarely successfully conducted. Source apportionment is also generally difficult due to complexity of atmospheric aerosol chemistry, meteorological and atmospheric transport processes, and inherent methodological (both experimental and data analytical) limitations. However, SOA formation from various precursors has been a topic of numerous laboratory studies giving insights into the most dominant ambient SOA formation pathways. Biogenic volatile organic compounds (BVOCs) have been shown to have a high SOA formation potential upon oxidation (Hallquist et al., 2009). Although the number of different organic species in the atmosphere is enormous (104–105) (Goldstein and Galbally, 2007), isoprene and monoterpenes clearly distinguish themselves as the most emitted biogenic VOC (Guenther et al., 2012). While isoprene-derived SOA formation is hampered by the relatively high volatility distribution of isoprene oxidation products (Hallquist et al., 2009; Surratt et al., 2010; Shrivastava et al., 2017), monoterpenes stand out as one of the major biogenic SOA precursors, due to the production of readily condensable vapours upon oxidation (Donahue et al., 2011; Ehn et al., 2014). The boreal biome, which represents ∼15 % of the Earth's terrestrial area, making up ∼30 % of the world's forests (Prăvălie, 2018), serves an example of a region with relatively high monoterpene emissions (Guenther et al., 2012; Rinne et al., 2009). Measurements from the boreal forests also provide evidence of high content of naturally produced biogenic SOA (Tunved et al., 2006; Yttri et al., 2011).

The current study is targeted on the analysis of OA composition at the well-established Station for Measuring Ecosystem–Atmosphere Relations (SMEAR II; Sect. 2.1) located in the monoterpene-rich boreal forest of Finland. What makes this station unique is the large amount of long-term measurements conducted at the site. We recently reported the long-term phenomenology of sub-micrometre aerosol chemical composition seasonality at the site (Heikkinen et al., 2020). We reported a high OA mass fraction of the sub-micrometre particulate matter, ranging between 50 % and 80 %. The current work specifically focuses on this sub-micrometre particulate matter mass fraction with a goal to gain understanding of OA composition and specifically its seasonal variability at SMEAR II, which has never been reported for the site before. The data analysis includes PMF on the OA mass spectra recorded by an Aerosol Chemical Speciation Monitor (ACSM, Sect. 2.2), but due to the near-decade-long mass spectral input from a rather remote measurement site, handling the data retrieved via PMF analyses required also the utilization of new analysis tools. Inspired by our previous work regarding statistical analyses of OA mass spectra (Äijälä et al., 2017, 2019), we tackled the analysis problem by combining and applying various advanced statistical methods and machine-learning tools. After the extensive analyses, we not only report OA composition variability at SMEAR II, but equally highlight the development of the new framework for long-term OA mass spectral analysis.

2 Measurements

This section contains a brief description of the boreal SMEAR II measurement site and the ACSM measurements conducted. For a more comprehensive measurement and station meteorology descriptions, we direct the reader to Heikkinen et al. (2020).

2.1 Station for Measuring Ecosystem–Atmosphere Relations (SMEAR II)

The measurements were conducted at the SMEAR II station described in detail previously (Hari and Kulmala, 2005; Williams et al., 2011; Heikkinen et al., 2020). SMEAR II is well known due to the broad variety of measurements taking place at the station, tracking more than 1000 different environmental parameters within the Earth–atmosphere interface (Hari and Kulmala, 2005). The station is located in Southern Finland (6151 N, 2417 E; 181 m above sea level) in a ca. 60-year-old Scots pine (Pinus sylvestris) dominated forest. The station, recognized as a rural site, has low anthropogenic emissions, apart from two nearby sawmills situated 6–7 km to the southeast from SMEAR II. In the event of south-easterly winds, both monoterpene and OA concentration are elevated at SMEAR II (Eerdekens et al., 2009; Liao et al., 2011; Äijälä et al., 2017; Heikkinen et al., 2020). The dominant sources of air pollutants at SMEAR II are air masses travelling from industrialized areas in Southern Finland, St. Petersburg (Russia), and continental Europe (Patokoski et al., 2015; Riuttanen et al., 2013; Yttri et al., 2011; Tunved et al., 2006). The surrounding forest emits multiple biogenic non-methane VOCs, dominantly monoterpenes (Hakola et al., 2012; Barreira et al., 2017). Monoterpenes have been recognized to yield condensable vapours at SMEAR II (Yan et al., 2016; Rose et al., 2018; Ehn et al., 2012) known to efficiently form SOA (Ehn et al., 2014).

2.2 Aerosol Chemical Speciation Monitor (ACSM)

The Aerosol Chemical Speciation Monitor (ACSM; Aerodyne Research Inc., USA), described in detail by Ng et al. (2011c), serves as the key instrument in this study. The ACSM measurements at SMEAR II, together with the data processing techniques, are documented in detail in our earlier work (Heikkinen et al., 2020). Here, we utilize ACSM data recorded between April 2012 and September 2019. The 2019 measurements and data preparation were performed exactly the same way as for the 2012–2018 data (Heikkinen et al., 2020).

The ACSM, which is developed following the same technology as the AMS (Canagaratna et al., 2007), samples ambient air with a flow rate of 1.4 cm3 s−1 through an aerodynamic lens having ∼100 % transmission of ca. 75–650 nm particles in vacuum aerodynamic diameter (Dva) but further passes through particles up to ca. 1 µm in Dva, albeit less efficiently (Liu et al., 2007). The particles are flash vaporized at 600 C under high vacuum and ionized with 70 eV electron impact ionization. The resulting ions and their fragments are guided to a mass analyser that is a residual gas analyser (RGA) quadrupole, which scans through different mass-to-charge ratios (m/Q). The particulate matter detected by the ACSM is referred to as non-refractory (NR) sub-micrometre particulate matter (PM1). The term “non-refractory” is attributed to the instrument limitation to detect only material flash evaporating at 600 C and being unable to reliably measure extremely-heat-resistant chemical components such as sea salt and black carbon. The term “PM1” is linked to the aerodynamic lens approximate cut-off at 1 µm.

The NR-PM1 reported from ACSM measurements is a difference (diff) between the signal of particle-laden air and signal recorded when the sampling flow passed a particle filter (filtered air). In addition to the diff measurement style, which is measured using a chopper instead of a filter in the AMS, the lack of particle sizing and the cheaper detector model are the major differences between the AMS and the ACSM. Indeed, while the AMS utilizes a multichannel plate detector (MCP) gaining high signal-to-noise (SNR) ratios, the ACSM employs a secondary electron multiplier (SEM) that provides a longer lifetime at the cost of SNR. To improve the SNR, the ACSM data utilized here were 3 h averages instead of the original sampling resolution of 30 min.

As explained previously (Heikkinen et al., 2020), the ACSM was measuring through the roof of an air-conditioned container. The inlet system contained a PM2.5 cyclone and a 3 L min−1 overflow to avoid inlet losses. From summer 2013 onwards, a Nafion drier was included in the sampling line, which kept the sample flow relative humidity (RH) below 30 %. The instrument provides the NR-PM1 chemical species' mass concentration every 30 min. The mass concentration calculations, namely the conversion from amperes to µg m−3, were based on ionization efficiencies, routinely calibrated using size-selected ammonium sulfate and ammonium nitrate particles and a TSI Condensation Particle Counter (CPC; TSI 3772) as a reference instrument. A final collection efficiency (CE) correction was applied based on a 2-month moving median comparison with a collocated differential mobility particle sizer as the commonly used composition-based CE correction (Middlebrook et al., 2012) was not applicable due to ammonium concentration being most of the time below the detection limit. A detailed description of the CE correction is presented in Heikkinen et al. (2020). The CE correction was applied to the OA mass spectra prior to the PMF analyses.

3 The openair and time-over-land (TOL) analyses

This section provides a brief description of wind and air mass trajectory analyses coupled to the analysis of OA composition at SMEAR II.

3.1 The openair polar plots

The openair polar plots are used in the paper to show how OA composition varied under different wind direction and speed combinations (openair polar plots using the R-based package presented by Carslaw and Ropkins, 2012). The concentration fields were calculated by binning the OA component concentration data into different wind direction and speed bins. The field was then smoothed by interpolation, which was performed between grid centres. These openair polar plots are drawn utilizing the ZeFir pollution tracker (Petit et al., 2017), which is an Igor Pro (WaveMetrics Inc., USA) graphical interface for producing openair polar plots (among other functionalities). The wind data used for openair polar plots were recorded at the SMEAR II mast, above the forest canopy (16.8 to 67.2 m a.g.l.), with Thies 2D Ultrasonic anemometers. The wind roses are presented in Fig. S1.

3.2 HYSPLIT trajectories and TOL

The time each air mass spent over land before reaching SMEAR II was calculated hourly using 96 h long HYSPLIT (Stein et al., 2015) air mass back trajectories, with arrival heights of 100 m above ground level. The model was run with NCEP/GDAS (Kanamitsu, 1989) archive data as the meteorological input, with the 1 horizontal resolution data set used for years 2012–2013 and the 0.5 resolution data set for 2014–2018. Trajectories were grouped into three different source regions: clean sector, Europe sector, and Russia sector (Fig. S2). A source region criterion resembling our clean sector classification was used before by Tunved et al. (2006), with similar calculations on the time spent over land. The Europe and Russia sectors are considered polluted as mentioned earlier in Sect. 2.1. The grouping criterion was that the trajectory had to spend a minimum of 90 % of the time in a sector. This means that all the trajectories grouped into the clean sector have spent minimum 90 % of the time in the clean sector before arriving at SMEAR II. If the trajectory did not reach this criterion in any of the sectors, it was discarded and not considered in any further analyses. Time spent over islands, other than the British Isles, is not considered in the time-over-land (TOL) value.

4 Statistical methods

This section provides an introduction to the statistical methods utilized in this study. The application of these tools is explained later in Sect. 5. Here, we provide the basics of the main statistical tools utilized: PMF and its application in aerosol mass spectrometry as well as k-means clustering.

4.1 Positive matrix factorization (PMF) and the Multilinear Engine (ME-2)

Positive matrix factorization (PMF) (Paatero and Tapper, 1993; Paatero, 1997) is a widely used algorithm in chemometrics, which helps sort complex measurement data into factors with altering abundances, with static factor profiles without prior knowledge regarding the factor features. More precisely, PMF approximates the measurement data matrix (X) as a linear combination of these constant factor profiles (F) and their temporal proportions (G), with both F and G containing only non-negative elements (gi,k0, fk,j0). The PMF model iteratively minimizes uncertainty-weighted model residuals (Q) using a least-squares algorithm, directing the model solution towards combinations of F and G best describing X. The PMF equation in matrix notation can be written as follows:


where E equals to the model residual matrix. If written element-wise, this equation becomes

(1) x i , j = k = 1 p g i , k f k , j + e i , j .

Here, the subscript i is the time column index, j the variable row index, and k the factor index in the PMF solution containing p factors (p defined by user). The following equation for Q,

(2) Q = i = 1 m j = 1 n e i , j σ i , j 2 ,

can then be written as

(3) Q = i = 1 m j = 1 n x i , j - k = 1 p g i , k f k , j σ i , j 2 ,

where σ equals the measurement uncertainty.

Importantly, the PMF algorithm is frequently solved in robust mode, in which outliers are dynamically reweighted to prevent the PMF model fits from being pulled towards outliers. The outliers are defined as data cells where the ratio between the model residual and uncertainty exceeds a user-defined threshold, α, usually set as α=4 (Paatero, 1997). The Q values given by the PMF model are calculated using the robust mode.

The reliability of one modelled Q minimum is not usually enough. Indeed, sometimes the PMF solutions are representative of only a local Q minimum instead of the global Q minimum. To avoid interpretations of a PMF solution representing a local Q minimum, it is recommended to start PMF from multiple different starting points, e.g. seeds. Increasing the number of seeds, preferably together with random resampling (bootstrap) (Efron, 1979), helps map the stability of the PMF solution. In the bootstrapping approach, the different PMF seeds have slightly different input matrices, which contain randomly chosen rows of the original matrix. Bootstrapping is a suitable tool for PMF statistical uncertainty evaluation if sufficient amounts of resamples are conducted (Norris et al., 2008; Paatero et al., 2014).

Multilinear Engine (ME-2) is a popular PMF solver to reduce rotational ambiguity of PMF. One advantage of it is the possibility to introduce known F rows (or G columns) to PMF model during model initialization (Paatero and Hopke, 2009). This approach is traditionally conducted in three ways: via techniques named chemical mass balance (CMB), a value, and pulling techniques (Paatero and Hopke, 2009). In CMB (Watson et al., 1984), all of the rows in F (i.e. all factor profiles) are known beforehand. It can be considered as a far extreme from the traditional PMF, where none of the factor profiles is known. The a-value approach falls somewhere between CMB and PMF. Now, certain elements of F or G can be constrained to the PMF, and the model output variability from the constraint is given by a scalar, a. a can be applied to the entire F row (or G columns) or alternatively to their individual elements. The more constraints and the tighter they are (a→0), the closer the a-value approach is to CMB. Indeed, the case of having all p rows of F constrained with an a value of zero equals the CMB method. If pulling equations are introduced to the PMF model, PMF pulls the fj,k (or gi,k in the event of G pulling) towards a user-defined anchor during the iterative steps.

The evaluation of the appropriate number of factors in the PMF solution (p) can be (for example) estimated by observing the decrease in Q and the ratio between Q and the expected Q (Qexp, which is the Q normalized by the degrees of freedom of the model solution) (Paatero and Tapper, 1993). The decrease in Q/Qexp as a function of p can be, to some extent, used to understand what the optimal number of factors in the solution could be. While Q/Qexp tends to always drop as a function of p, the optimal p is typically where the Q/Qexp drops stop being significant.

PMF application in aerosol mass spectrometry

The application of PMF was first utilized with the organic aerosol data matrix obtained via AMS measurements in 2007 (Lanz et al., 2007) and has since then become a widely used and popular method in OA source apportionment. PMF is conducted so that F equals the mass spectral profiles and G the time series, usually in µg m−3. A comprehensive overview of AMS PMF studies and methodologies utilized between 2007–2011 has been given previously by Zhang et al. (2011). Ulbrich et al. (2009) introduced thorough AMS PMF interpretation guidelines and Crippa et al. (2014) introduced guidelines for the ME-2 a-value approach. Since 2011, PMF with ME-2 has also been applied successfully to ACSM data (e.g. Fröhlich et al., 2015; Canonaco et al., 2013; Zhang et al., 2019).

Preparation of the PMF input (organic aerosol data matrix and a corresponding error matrix) for both AMS and ACSM data can be done with their data processing software. The preparations are based on PMF Evaluation Tool (PET) WaveMetrics Igor Pro functions (Ulbrich et al., 2009). Before initializing any PMF solver (such as the ME-2), certain preparations are often necessary for optimal modelling. The m/Q values with low SNR (i.e. m/Q having more noise than signal) are downweighted by increasing their error. Paatero and Hopke (2003) suggested that m/Q having SNR <0.2 should be downweighted heavily or removed from the analysis and m/Q with 0.2 < SNR < 2 downweighted by a factor of 2–3. Another noisy data downweighting approach was suggested by (Visser et al., 2015), where the errors are downweighted continuously with a penalty function SNR−1, when SNR < 1. These downweightings have been done either based on the average SNR across the data set or cell-wise. Another data input modification prior to PMF initialization should be performed regarding CO2+ (m/Q 44 Th)-related variables (i.e. m/Q 16–20 and 28 Th) because the information stored at these m/Q is directly estimated from m/Q 44 Th. Such high correlation between these variables would be considered in the PMF modelling with importance that is too high. To avoid this, CO2+-related variables are typically excluded or downweighted accordingly.

PMF analysis has become easily accessible for the whole AMS and ACSM community upon the development of Igor Pro (WaveMetrics Inc., USA) based user-friendly PMF analysis tools, such as the Source Finder (SoFi, Paul Scherrer Institute and Datalystica Ltd., Switzerland) (Canonaco et al., 2013, 2021) and PET (Ulbrich et al., 2009). Recently, after the launch of the commercial SoFi Pro software (Datalystica Ltd., Switzerland) (Canonaco et al., 2021), many advanced PMF methods also became available. These methods include rolling PMF (Parworth et al., 2015) and PMF resampling (bootstrap).

The assumption of static factor profiles serves one of the questions of the atmospheric representativeness of the PMF output. A rolling PMF approach was suggested (Parworth et al., 2015) to account for such factor profile temporal variability. In the rolling PMF approach, a PMF run is conducted a short time window at a time (the timescale for which the static factor profile is assumed valid). This time window is shifted across the data set in even smaller time steps, creating overlap between PMF windows. In practice this means choosing an n day time window in an m day data set (nm) and shifting the window q days at a time (q<n) chronologically along the time axis, until all the m days are covered.

As the rolling PMF approach results in a large amount of PMF runs, and the amount grows even larger in the event of incorporating bootstrapping (typically 100–1000 seeds per PMF window), manual investigation and conclusion making becomes very challenging. The challenge of sorting as well as accepting good rolling PMF runs and/or rejecting unrealistic rolling PMF runs is addressed in SoFi Pro via criteria-based selection of PMF runs (Canonaco et al., 2021). The user-defined criteria, best describing each PMF factor (for example correlation between NOx and HOA, which both are emitted from traffic), are evaluated for each PMF run, and their scores (for example the Pearson correlation coefficient R between NOx and HOA) are presented. The user can then select all the PMF runs above certain thresholds (for example R>0.5) or select all of the PMF runs. Such criteria-based selection of PMF runs was first introduced by Daellenbach et al. (2017) and Visser et al. (2015). Selection and the averaging of all of the PMF runs without criteria-based sorting would work only in the case of having all factors, or all but one factor, constrained. In the case of having two or more free PMF factors, it is likely that their positions in the PMF output matrices are frequently changing, i.e. being situated in different columns in G. In the case of constrained PMF factors, they will always appear in their pre-designated G columns.

4.2k-means clustering

k-means clustering (Ball and Hall, 1965; MacQueen, 1967; Steinhaus, 1956; Jain, 2010) is the most popular unsupervised machine-learning approach utilized in data classification. It works particularly well (and is computationally efficient) for large data sets with a small number of well-definable clusters (k). The k-means algorithm works as follows:

  1. picking k number of centroids (i.e. cluster centre points) and assigning each sample (for example a mass spectrum) to its nearest centroid based on a selected distance metric, usually the squared Euclidean distance – this step is nowadays performed following the k-means++ algorithm (Arthur and Vassilvitskii, 2007), proven to not only speed up the clustering process, but also significantly improve its accuracy;

  2. moving the centroids to represent the new mean of the cluster;

  3. reassigning the all the points to their closest centroids (this sometimes moves points from one cluster to another);

  4. repeating steps 2 and 3 until convergence is achieved (i.e. data points stop moving between clusters and the centroids stabilize).

The goal of the k-means clustering algorithm is to minimize the following objective function:

(4) J = j = 1 k i = 1 n x i - c j 2 ,

where k is the number of clusters, n the number of data points, xi the ith data point, and cj the centroid of cluster j, and xi-cj2 represents the Euclidean squared distance function. Hence, this makes the object function, J, the average squared Euclidean distance between points in the same cluster.

Silhouette score

The silhouette score (Rousseeuw, 1987) is one of the many metrics available for evaluating the number of clusters present in the data set. It is calculated based on both intra-cluster distances of data points (cohesion, a) and their distances to points assigned in other clusters (separation, b). The silhouette score for the ith sample can be expressed as

(5) s i = ( b i - a i ) max ( a i , b i ) .

The silhouette scores range between [-1,1]. The scores for the ith sample can be interpreted as follows:

  1. si=-1 – the sample is (likely) assigned to a wrong cluster;

  2. si=0 – the sample is at the decision boundary between clusters;

  3. si=1 – the sample is well clustered.

The silhouette score (si) is calculated for an individual sample in Eq. (5) but can also be defined for clusters (s) as the average over all silhouette scores of samples belonging to the cluster, or for the entire solution (average over all samples), yielding diagnostic information on point, cluster, and solution level. Kaufman and Rousseeuw (2009) further suggested an average cluster silhouette s=0.25 as a lower limit for weak structure and s=0.50 as a lower limit for strong cluster structures. Strong structures indicate of a good clustering result, where the samples in the cluster are very similar to each other while being very different from the samples assigned to other clusters.

5 The application of PMF and k-means clustering in the current study

The current study focuses on conducting rolling PMF on 8 years of OA data recorded by an ACSM at the SMEAR II station. First, we performed unconstrained rolling PMF runs. We used these runs to determine the common OA factor profiles through k-means clustering. The ultimate goal of the unconstrained PMF and k-means clustering was to provide mass spectral profiles as a priori input for a PMF run in which all of the profiles are constrained with reported intra-cluster mass spectral variabilities. This PMF approach is therefore termed as rolling relaxed CMB, i.e. rolling rCMB. This section contains a detailed description of this framework. A written overview of the method is given below and the workflow is summarized Fig. 1.

Figure 1A pyramid flow chart roughly describing the steps from data collection to the final OA model (i.e. the time series of OA components making up the total OA signal). The statistical analysis steps (in green) are explained in further detail in Sects. 4 and 5 in the paper as well as listed in Appendix A.


5.1 Rolling PMF

The initial rolling PMF was conducted using the 2012–2019 ACSM data (Fig. 2a), prepared with the ACSM data processing software, i.e. the WaveMetrics (USA) Igor Pro-based ACSM Local toolkit, as PMF input. No downweighting based on low SNR or relation to CO2+ was conducted with the ACSM Local software. The data matrices were imported to an Igor experiment with the SoFi Pro (6.A1) toolkit and averaged from the initial 30 min time resolution to 3 h time resolution in order to improve the SNR. The error propagation was accounted for during averaging (linear terms of the squared Taylor series expansion on the measurement data). Upon the initialization of the PMF matrices, all the CO2+-related variables (i.e. m/Q 16, 17, 18, and 28 Th) were excluded from the analysis. Then, the errors of the noisy variables (SNR < 1) were weighted cell-wise by SNR−1.

Figure 2(a) The 3 h averaged time series of OA measured at SMEAR II and utilized in the current study. The y axis represents OA mass concentration in µg m−3 and the x axis the time. The figure also depicts the data coverage within the 8 years. The yellow shaded region represents the first 2 months of measurement data, which are further shown in panel (b). (b) Schematic figure visualizing the rolling window approach. Now, the x axis spans from 1 April to 1 June 2012, and the six OA time series represent the time spans of successive rolling PMF windows. With the settings used in the current study, this 2-month period would be part of six rolling PMF windows.


Only the m/Q range of 12–100 Th was included in the rolling PMF. This mass range has been typically chosen for the ACSM PMF analysis, and it avoids introducing the ACSM internal standard, naphthalene at m/Q 128, to the PMF run. m/Q 29, 31, and 38 Th were excluded from the analysis due to unknown interferences, likely from air and instrumental issues from time to time affecting these signals and yielding mass spectra not resembling any known aerosol type.

The rolling PMF was initialized with a constant factor number of three. The decision was made based on several (standard, i.e. rolling mechanism disabled) PMF runs, having time series lengths ranging from few months to years. Three factors were considered as an upper limit of the number of factors, as a greater number would not significantly reduce Q/Qexp or produce meaningful factor profiles. This step required a subjective decision.

The rolling window width was set to 30 d with 10 d window shifts. Previous studies conducted by Parworth et al. (2015) and Canonaco et al. (2021) set the window width to 2 weeks and the shift to 1 d, which is much shorter than selected here. However, as shown by Canonaco et al. (2021) the PMF solutions were seemingly equally good for window widths higher than 2 weeks (tests up to window width of 28 d). Only window widths shorter than 2 weeks led to a less good PMF result. As the time span of our data was nearly 8 times greater than utilized in the previous studies, we sped up the PMF modelling process by choosing a longer window width and shift. More testing could be conducted on appropriate lengths. However, if the number of PMF runs were to increase significantly from the amount performed here, it would be feasible to perform the PMF modelling on a server. With the current settings, the rolling PMF run performed in this study using a PC lasted 48 h.

Finally, also, the bootstrap mechanism (resampling) was enabled, and a hundred iterations were conducted at each window. A subset of the rolling PMF input is visualized in Fig. 2b. The rolling PMF yielded 62 700 factor profiles (20 900 three-factor solutions) and time series, respectively, distributed in 209 PMF windows.

5.2k-means clustering PMF profiles

Selecting and sorting the rolling PMF output via various criteria into three factors would have required a significant understanding of the PMF output beforehand. Choosing solid criteria can be straightforward near known pollution sources, but in the event of multiple unknown factors and distant sources it becomes complicated. SMEAR II represents a station with minimal anthropogenic sources. To exemplify the challenges in correlation-based criteria at SMEAR II, we can take the correlation between NOx and HOA as an example. Both of these species are emitted from traffic and known to correlate well near traffic sources. However, in the case of transported traffic emissions, many things can affect the lifetime of the emitted species, which affects the correlation between the emissions at SMEAR II. If we pick the effect of wet deposition as an example, it will remove the particulate HOA much more efficiently than gaseous NOx. If HOA and BBOA were constrained within a SMEAR II OA PMF run, it would not be surprising that the PMF output would suggest that 10 % of the OA mass was made up of HOA and 30 % of BBOA. As shown later on in this paper, these numbers are highly unrealistic. Due to the difficulty in interpreting correlations between HOA and BBOA and their markers, correlation analyses do not directly answer when constraining HOA or BBOA would have been appropriate. This is why traditional rolling PMF techniques would prevent us from HOA/BBOA quantification. This complexity motivated us to (1) use mass spectral clustering to explore the types of OA resolved within the unconstrained rolling PMF runs (i.e. answering when HOA/BBOA were present) and (2) perform rolling rCMB (Sect. 5.3) to explore the temporal behaviour of these OA types. The bootstrap iterations for each PMF window, respectively, were clustered using k-means (Phase I; see detailed description in Sect. 5.2.1). This step was followed by exploring the number of clusters across all PMF windows by further clustering all the Phase I cluster centroids (Phase II; see detailed description in Sect. 5.2.2). All the clustering procedures conducted in this study were performed within MATLAB 2017a using the kmeans algorithm, which utilizes k-means++. k-means was selected as the clustering algorithm due to previous successful OA mass spectral classification performed by Äijälä et al. (2017, 2019). Future work could be conducted in exploring the potential of other clustering algorithms.

5.2.1 Solutions for rolling windows (k-means clustering Phase I)

The rolling PMF output was uploaded into MATLAB from Hierarchical Data Format (HDF) files created for each PMF window, respectively, during the ME-2 modelling process. Prior to clustering, we scaled the PMF output with the following function suggested by Stein and Scott (1994):

(6) weight m Q = m Q s m ,

where m/Q equals the mass-to-charge ratio ranging from 12–100 Th, and sm=1.36 (recommendation by Äijälä et al., 2017). We previously showed the information value gains of mass scaling in conjunction with AMS data (Äijälä et al., 2017). Indeed, if not applied, several OA types could not be classified (Äijälä et al., 2017). Following Eq. (6), each signal at each m/Q was multiplied by its m/Q corresponding weight value. As recommended by Äijälä et al. (2017), the usage of this scaling factor gives gradually more weight to the patterns at the end of mass spectrum, containing a lot of information regarding OA sources.

Importantly, the following clustering of bootstrap iterations one rolling window at a time was conducted using cosine (dis)similarity (Sokal and Sneath, 1963) as the k-means distance metric as opposed to the commonly used squared Euclidean distance. This decision was again based on our earlier work in which various k-means distance metric alternatives were explored, and the best classification outcomes (i.e. highest number of mathematically-well-structured clusters, the centroids of which resembled well-known OA types found in the literature) resulted from clustering efforts utilizing cosine angles along with correlations (Äijälä et al., 2017). While nearly equally good clustering outcomes were achieved between these two metrics, we decided to report the cosine (dis)similarity results due to the popularity of cosine angles in mass spectral comparisons (Stein and Scott, 1994). Cosine (dis)similarity describes the similarity between two n-dimensional (n, i.e. the number of m/Q, which was 70 in our study) vectors (A and B in the equation below) via the cosine of the angle between them. Hence, the metric is not magnitude but orientation dependent. In our case this also meant that normalization of the weighted mass spectra was not necessary. The cosine (dis)similarity is defined as follows:

(7) Cosine (dis)similarity = A B A B ,

where A and B are n-dimensional vectors, which in the current case would correspond to two mass spectra.

Silhouette values were utilized to evaluate the clustering outcome similarly to Äijälä et al. (2017). Other metrics were not tested within this work as they would operate only by using squared Euclidean distance measures within our analysis software, MATLAB 2017a.

Finally, the PMF window-by-window clustering of bootstrap iterations was conducted as follows:

  1. clustering (MATLAB 2017a kmeans function using cosine (dis)similarity as the distance metric) and calculating mean silhouette values (MATLAB 2017a silhouette function using cosine (dis)similarity as the distance metric) for 2–4 clusters per PMF window – this step was performed using the 300 mass-scaled (Eq. 6) mass spectral profiles (3 factor profiles, 100 iterations) given by the 30 d rolling window;

  2. finding the number of clusters achieving the highest mean silhouette value in the PMF window – only this clustering result was used in the following steps as it was considered the best solution;

  3. undoing the mass scaling and calculating silhouette-weighted cluster centroids (with the median of all mass spectra belonging to the cluster, each multiplied by their spectra-specific silhouettes) for each PMF window; the weighting of the cluster centroid calculation by silhouette scores was performed similarly to Äijälä et al. (2017, 2019) studies – all mass spectra possessing a negative silhouette score were discarded from the cluster centroid calculation and the rest of the mass spectra were multiplied by their spectra-corresponding silhouettes; this way, the spectra with the highest silhouette scores would influence the cluster centroid the most, and the spectra with the lowest silhouette score were either discarded (if the silhouette score is zero or negative) or have minimal weight on the final cluster centroid; this step helps to alleviate possible k-means susceptibility to outliers in clusters;

  4. appending the silhouette-weighted cluster centroids in a matrix (FI) – if the PMF window was clustered with three factors in the third step listed here, then FI would gain three new rows, one for each cluster centroid mass spectrum;

  5. moving to the next PMF window and repeating steps 1–6 until all PMF window are clustered and matrix FI contains all the silhouette-weighted centroids from each PMF window.

All the steps presented above were done programmatically in MATLAB. The final number of mass spectra stored in FI was 479. The overall mean silhouette values for 2–4 clusters were high, strongly indicating segregation of strong cluster structures in the PMF window-by-window clustering of bootstrap iterations (Fig. 3a). The optimal number of clusters in the PMF windows was 2 in ca. 80 % of the PMF windows (Fig. 3b), which meant that only ca. 20 % of the PMF windows contained three to four different resolvable PMF factors.

Figure 3(a) Box-and-whisker diagram displaying the silhouette score distribution for k (number of factors) = [2, 4] representing all 209 PMF windows (Phase I). The green and red shadings indicate the ranges of strong and weak cluster structures, respectively. (b) Fraction of PMF windows achieving the highest silhouette score when the number of clusters (k) was 2, 3, or 4. (c) Silhouette score distribution for k= [2, 10] for Phase II (i.e. clustering the 479 profiles obtained from the 209 PMF windows in Phase I). (d) Evolution of the median silhouettes in Phase II in k space as a function of the mass scaling (Eq. 6) factor, sm, which gives dynamically more weight to the end of the mass spectrum. The colour scale presents the sm value for each line. (e) Cumulative distribution function (CDF) of the k=3 Phase II silhouette scores for the three clusters (named LV-OOA, SV-OOA, and POA), respectively. This subplot shows that POA has the weakest cluster structure and LV-OOA the strongest. (f) Temporal behaviour of the median silhouette score of each cluster in the k=3 Phase II solution. Here, each month displayed must contain a minimum of 30 d of cluster appearance, explaining the gap in the POA seasonal cycle, as it is not as frequently resolved as the other clusters.


5.2.2 Overall classification of mass spectra (k-means clustering Phase II)

The next step was to explore the dominant mass spectral clusters in the whole data set. Phase II contained the following steps:

  1. performing mass scaling (Eq. 6) for FI mass spectra, as performed earlier in the PMF window-by-window clustering of bootstrap iterations (Phase I; Sect. 5.2.1);

  2. calculating mean silhouette scores (MATLAB 2017a silhouette function using cosine (dis)similarity as the distance metric) for 2–10 clusters;

  3. exploring how many clusters are needed to gain the highest mean silhouette score; in the event of a vague difference between silhouettes (as shown in Fig. 3c), the step is followed by performing steps 1 and 2 again with different mass scaling sm values; the optimal number of clusters should preserve the high silhouette score even at high sm values; we explored k= [3, 6] solution space with different sm values (sm= [0, 5]); by increasing sm, the silhouette value for k=3 increased to the same level as k=2, while k>3 solution silhouettes decreased below the strong cluster limit (Fig. 3d); we thus selected three clusters for the following steps;

  4. clustering (MATLAB 2017a kmeans function using cosine (dis)similarity as the distance metric) the mass weighted mass spectra (sm=1.36) with the number of clusters defined in the previous step;

  5. undoing the mass scaling and calculating silhouette-weighted, normalized cluster centroids (cluster median) and the cluster mass spectral variability (lower and higher quartiles); these cluster centroids represent the prevailing OA types in SMEAR II sub-micrometre aerosol.

The three different OA clusters found by this method were named low-volatility oxygenated organic aerosol (LV-OOA), semi-volatile oxygenated organic aerosol (SV-OOA), and primary organic aerosol (POA). The LV-OOA and SV-OOA clusters had generally high silhouette scores, whereas the POA cluster had a weaker structure (Fig. 3e). More discussion on the mass spectral features is provided in the results section (Sect. 6.1).

5.3 Rolling rCMB

After gaining the prevailing OA types mass spectral features via the above-explained clustering processes, we wanted to gain understanding of the temporal features and mass loading of each OA type. As the HDF files for each rolling PMF window also contain time series information for each factor profile, we were able to calculate cluster-specific time series utilizing these time series connected to each cluster member spectra. The time series of the OA types were discontinuous since factors were not resolved in every window. Therefore, we utilized the silhouette-weighted cluster interquartile ranges (IQRs) gained in Sect. 5.2.2 to constrain a rolling rCMB run to gain continuous time series for each OA type. These cluster-specific time series extracted from the initial PMF were afterwards used to evaluate the rolling rCMB run (Sect. 5.3.1) but also enabled us to explore the silhouette score temporal behaviour. The silhouette score monthly medians are visualized in Fig. 3f. Only SV-OOA showed some seasonality, which could hint that SV-OOA composition has some, yet little, inter-annual variability. Due to the stability in the monthly median silhouettes, we consider the mass spectral classification robust.

The rolling rCMB run was conducted via rolling PMF using the cluster centroids of the OA factor profiles as a priori information. After extracting the governing mass spectral features across the data set, we exported the silhouette-weighted and normalized mass spectra to SoFi Pro 6B. We set up a PMF run with three factors, all of them constrained with our silhouette-weighted cluster centroids (median factor profiles). However, differing from the traditional CMB approach, we passed the ME-2 allowed limits within which the factor profiles should vary. These limits were the 25th percentile (lower limit) and 75th percentile (higher limit) of the silhouette-weighted cluster centroid spectra. The rolling rCMB was otherwise initialized exactly like the initial rolling PMF run. The CO2+ related variables were excluded, and the errors of the weak variables were treated similarly (cell-wise SNR−1 penalty function). The rolling window length was again 30 d with a 10 d shift, and resampling was enabled with 100 seeds. m/Q 29, 31, and 38 Th were still discarded from the analysis. The final rolling rCMB results for each factor, respectively, were obtained by averaging over the 20 900 PMF runs for each time point (in total 3×20900=62700 factor profiles and time series). As all the factor positions in rolling rCMB were fixed (LV-OOA profile was constrained at the F matrix first row, SV-OOA at the second, and POA at the third), such averaging was appropriate.

5.3.1 Rolling rCMB residual analysis and output evaluation

To evaluate the averaged rolling rCMB output, we first compared the Q/Qexp values between the initial rolling PMF and rolling rCMB. The comparison of the Q/Qexp retrieved from each iteration in each rolling window is visualized in Fig. S3. As expected, the mean rolling rCMB Q/Qexp value was higher (38 % increase) than that of the initial rolling PMF Q/Qexp. This is typical as Q/Qexp tends to increase whenever constraints are added to the PMF run. However due to the relaxed approach, the Q/Qexp increase is for example much less dramatic than shown in Canonaco et al. (2013) CMB tests. We find the observed Q/Qexp increase acceptable, considering the higher information value (interpretability) provided by the rCMB solution.

To continue the rolling rCMB result evaluation via residuals, we investigated rolling rCMB model uncertainty-scaled residuals (R matrix, ri,j in cell notation in Eq. 8). R elements were calculated with SoFi Pro using the following equation:

(8) r i , j = e i , j σ i , j ,

where σi,j indicates the measurement error provided in the initial PMF input error matrix and ei,j the model residual (i.e. the difference between model input and model output: xi,j (measurement) – xi,j (modelled)). A normalized scaled residual histogram is presented in Fig. 4. The scaled residual histogram, presented in Fig. 4 in green, is fairly unimodal and spreads between [-4,4] (most data between [-3,3]) as desired (Paatero and Hopke, 2003) but tends to have high frequency of slightly negative, near-zero readings. We connected this behaviour to periods with high SNR (i.e. summers; Fig. S4). As downweighting of the noisy and weak variables made as a function of SNR−1 (Sects. 4.1 and 5.1), which further influences σij in Eq. (8), the seasonality in SNR was seemingly driving the scaled residual seasonal cycles. This was visible, yet to a lesser extent, in a test rCMB run conducted without downweighting (Fig. 4 in purple) and with a more traditional average stepwise downweighting procedure (not shown), which further brings us to the conclusion that the PMF input matrix errors are also SNR dependent (Ulbrich et al., 2009) and could perhaps be further optimized. However, it should be kept in mind that the scaled residuals in general speak for a good performance of rolling rCMB in modelling the input data, and the scaled residual time series shown in Fig. S5 reveal no evident patterns/trends except the negative values in summers. An additional investigation into the real unexplained variation within the data (shown later on in Fig. 8) revealed no correlations with temperature or sub-micrometre PM components.

Figure 4Normalized histograms (probability density function, PDF) of the scaled residuals obtained from rolling rCMB. The effect of downweighting weak/bad variables is visible by the high scaled residual frequencies at negative near-zero readings in green. If rolling rCMB is conducted without downweighting, the scaled residual distribution behaves in a highly normal manner (shown in purple).


Annual median scaled residual mass spectra are visualized in Fig. S5. Even clustering attempts on the scaled residual matrix do not reveal clear structures in the scaled residual matrix, although an overall median scaled residual mass spectrum calculated using the negative residuals alone would hint towards some resemblance with POA at m/Q>50 Th. We note that this could indicate minor POA overestimation in the rolling rCMB and speculate whether introducing time-dependent profile variation limits to ME-2 could help us overcome the issue. With the method presented here, we could easily extract time-dependent limits for ME-2 variability. However, introducing such limits to a dynamic approach to the ME-2/SoFi Pro analysis software is not yet possible.

The comparisons between rolling rCMB time series (Fig. S6) to the cluster-specific time series serve as the final step in rolling rCMB validation. The overall Pearson correlation coefficient between the mean-cluster time series and the sum of rolling rCMB factor time series is approaching unity (R=0.99), and the correlations between different OA classes are 0.97, 0.94, and 0.78 for LV-OOA, SV-OOA, and POA, respectively (Fig. S6). In fact, such a high degree of agreement indicates very good rolling rCMB performance in retrieving time series for the different OA classes. As a final note, as discussed previously, the POA appearance in the time series retrieved after the Phase II clustering was likely dependent on the POA mass fraction in different PMF windows. We evaluated that 95 % (3σ) of the PMF windows where POA was not classified had a POA mass fraction (i.e. the mass fraction of POA in relation to the total rolling rCMB OA mass; fPOA) of 6 % (Fig. S7a), when the POA-explained variation (i.e. rolling rCMB-derived variability explained by POA compared to the total measurement variability) was 7 % (Fig. S7b). Such numbers resemble the PMF rule-of-thumb detection limit of ca. 5 % estimated by Ulbrich et al. (2009). This final note indicates simply that the POA cluster was not found when the POA concentration was near-zero in rolling rCMB. Such behaviour is certainly a factor explaining the slopes between the cluster-specific time series and rolling rCMB time series presented in Fig. S6.

6 Results and discussion

In this section, we introduce the key features of the LV-OOA, SV-OOA, and POA clusters' mass spectra (Sect. 6.1). After the detailed mass spectral investigation, which explains the naming of each cluster, we further discuss the temporal behaviour of these OA classes (data retrieved via rolling rCMB; Sect. 6.2). The section then includes a brief analysis of wind direction and speed dependences of the OA classes (Sect. 6.3.1) via openair polar plots (Carslaw and Ropkins, 2012; Petit et al., 2017). Finally, we explore LV-OOA, SV-OOA, and POA loading as a function of time over land in the clean sector (Sect. 6.3.2) to yield understanding on natural OOA production over the NW quadrant of Europe.

6.1 Mass spectral features of OA clusters

The cluster centroids resulting from the overall classification of SMEAR II mass spectra serve as one of the key results of the current study (Fig. 5). The three OA classes were previously named as low-volatility oxygenated organic aerosol (LV-OOA), semi-volatile oxygenated organic aerosol (SV-OOA), and primary organic aerosol (POA), but we start this subsection by motivating the decisions behind each OA cluster name.

Figure 5The left panels (a, c, e) represent silhouette-weighted median cluster centroid mass spectra for low-volatility oxygenated organic aerosol (LV-OOA), semi-volatile oxygenated organic aerosol (SV-OOA), and primary OA (POA) obtained when the number of clusters (k) equals 3 in Phase II k-means clustering (final result). Here, y axis indicates the relative signal intensity and the x axis the mass-to-charge ratio (m/Q). The panel titles include the mean ± standard deviation of the cluster silhouette score (s) and the number of spectra belonging to each cluster (N). The error bars visualize the 25th and 75th percentiles (i.e. the lower and higher quartiles). The right panels (b, d, f) show the mean LV-OOA, SV-OOA, and POA mass spectra obtained from rolling rCMB. The error bars visualize the standard deviation of each m/Q signal fraction.


The naming of LV-OOA was based on the dominance of m/Q 44 Th in the mass spectrum, and the naming of SV-OOA was done due to the high m/Q 43 Th (higher than m/Q 44 Th). The naming of the POA was motivated based on the resemblance of the POA mass spectrum with both hydrocarbon-like OA (HOA) and biomass-burning OA (BBOA). The cosine (dis)similarities between POA and HOA or BBOA (both references from Ng et al., 2011b; spectra downloaded from, last access: 3 June 2020; Ulbrich et al., 2009) were 0.85 and 0.80, respectively. If a mass scaling (Eq. 6 with various sm) was applied to all spectra, the cosine (dis)similarities between POA and HOA and BBOA, respectively, approach 0.90 with high sm values. This possibly happened because less weight was given to m/Q 44 (and 43 Th), which is higher in our POA than in typical fresh HOA or BBOA spectra (see for example Ng et al., 2011b), likely meaning that our POA cluster is more oxidized than fresh POA. As we expect HOA and BBOA to be primary in origin, and our cluster centroid spectrum resembles both of them, we decided to call this OA class POA.

To further motivate our selection of names for the three clusters (as well as to visualize the cluster structures for the readers), we displayed all the different mass spectra belonging to each cluster in an m/Q 43 Th vs. m/Q 44 Th organic signal contribution space (f44 vs. f43 space; Fig. 6a). Ng et al. (2010) first introduced this projection, also called the “triangle plot”. This perspective separates well the LV-OOA, SV-OOA, and POA clusters. They are placed in each corner of the triangle in Fig. 6a. LV-OOA lies on the top of the triangle, exhibiting the highest OA mass fraction of m/Q 44 Th (i.e. f44; hereafter this same nomenclature logic is used also for other OA mass fractions of various m/Q values), whereas SV-OOA and POA lie at the bottom of the graph, possessing nearly equally low f44. The f43, on the other hand, is highest for SV-OOA and lowest for POA (nearly equally low as for LV-OOA).

Figure 6(a) A triangle plot visualizing the mass spectra distribution in each cluster in f44 vs. f43 space, (b) Van Krevelen diagram visualizing the mass spectra in H:C vs. O:C space for LV-OOA and SV-OOA, (c) mass spectra in f44 vs. f60 space for indications of fresh BBOA, and (d) f55×f57 vs. f60 space for indications of HOA and BBOA.


By using a parametrization provided by Canagaratna et al. (2015), we converted the f44 vs. f43 plot into a hydrogen-to-carbon ratio (H:C=1.12+6.74×f43-17.77×f432) vs. oxygen-to-carbon ratio (O:C=0.079+4.31×f44) space (Van Krevelen (VK) diagram (Van Krevelen, 1950); Fig. 6b). The bulk OA data from AMS measurements have been shown to follow a −1 slope on the VK diagram (Heald et al., 2010), where the most fresh OA has the highest H:C and lowest O:C and the aged OA the opposite. The evolution of OA in the VK space following different lines results mainly from OA functionalization. In the event of a slope of 0, OA functionalization would occur mostly by addition of alcohol or peroxide groups. In the event of a slope of −1, carboxylic acid groups are added and the slope of −2 would indicate additions of ketone or aldehyde groups. Factorized OA data were previously visualized in the VK diagram by Ng et al. (2011a), where the slope for OOA data was ca. −0.5. They suggested that ambient OOA ageing would result from addition of alcohol and peroxide functional groups without introducing fragmentation and/or the addition of carboxylic acid groups with fragmentation. Here, we visualize only SV-OOA and LV-OOA, as they provide better statistics than POA as the number of objects in POA cluster was small, and these points would be highly scattered in the VK diagram. Furthermore, it is also mentioned in Canagaratna et al. (2015) that the parameterization works less well for POA.

Before interpretation of the VK diagram, we revisit results from European ACSM inter-comparisons conducted at the Aerosol Chemical Monitor Calibration Center (ACMCC). A large variability within f44 was observed between different ACSM units (Crenn et al., 2015; Fröhlich et al., 2015; Freney et al., 2019). Furthermore, the observed f44 were systematically higher than the f44 measured with a co-located high-resolution AMS, which was shown to give consistent O:C for a suite of organic samples with known O:C values. While the f44 variability was not significantly propagated in OA class mass fractions retrieved with PMF analyses of co-located ACSM data sets (Fröhlich et al., 2015), the O:C ratios of different classes were naturally affected (as O:C parameterization for AMS-type instruments is directly f44 dependent). The f44 variability has been to some extent explained by an AMS/ACSM vaporizer artefact, which leads to a release of CO2+ in the presence of high nitrate mass fractions (Pieber et al., 2016; Freney et al., 2019). Even though the presence of m/Q 44 Th has been minor in our ammonium nitrate calibrations, and the nitrate mass fraction is generally low at SMEAR II, we cannot be sure whether the f44 and thus the O:C ratios presented in the VK are overestimated. Thus, the absolute O:C values should be interpreted with caution. However, if comparing the VK diagram to the VK diagram drawn by Ng et al. (2011a) representing 43 ambient AMS data sets, we can see that our SV-OOA O:C is similar to the SV-OOA O:C retrieved by Ng et al. (2010), but our O:C for LV-OOA is higher. Still, our LV-OOA values do resemble those retrieved by Äijälä et al. (2019) with an AMS.

In general, the separation of SV-OOA and LV-OOA in the VK is distinct: the O:C of SV-OOA is ca. 30 % of the LV-OOA O:C. The SV-OOA H:C is highest and stays rather constant in the SV-OOA cluster data cloud (slope = 0, slope of adding alcohol or peroxide groups), whereas the H:C decreases as a function of O:C in the LV-OOA cluster data cloud. Due to the scatter in the LV-OOA data cloud we do not aim to quantify a slope for it.

The second row of projections visualized in Fig. 6 focuses on visualizing key POA characteristics. The f44 vs. f60 visualization used in Fig. 6c is common to distinguish fresh BBOA from aged OA (Cubison et al., 2011). The lower the f44 is, the more fresh the OA is expected to be, and the higher the f60 is, the higher the fresh BBOA fraction. The POA captured most of the high f60 cases (i.e. cases with an f60 above-determined background of 0.003, Cubison et al., 2011), and the rest (which also had the highest f44) were included in the LV-OOA cluster. These were clear LV-OOA cluster outliers as these spectra silhouette scores were all below 0.20. Owing to their high f60, these outlier spectra likely originate from biomass burning but are mixed within the LV-OOA cluster due to the high humic-like substance content of the BBOA (e.g. Ng et al., 2010). If moving to Fig. 6d, i.e. an f55×f57 vs. f60 diagram, we can see that these high f60-containing LV-OOA points are situated at the bottom of the plot, and all POA objects score a much higher f55×f57. f57 has been associated with HOA (Zhang et al., 2005), while f55 is present in HOA mass spectra usually at equally high contributions. However, f55 is not a good HOA marker alone, as it is present in all of the mass spectra (Fig. 5). Thus, the y axis in Fig. 6d was chosen to be a product of the two instead of a sum of the two, as in this way a high f55 (often the case with biogenic SOA) with marginal f57 would not be classified as a HOA marker. To conclude, Fig. 6c and d visualize how POA contains both HOA and BBOA features.

6.2 Temporal variability of OA composition

This section contains the analysis of the OA components' time series retrieved via rolling rCMB. These time series are visualized in monthly resolution in Fig. 7. While some of the OA composition variability could be visually extracted from Fig. 7, we focus on the description of Figs. 8–10, which summarize the temporal behaviour of each OA component. The three components explained ca. 70 %–80 % of the OA variation at SMEAR II (Fig. 8a). The unexplained variation can be split into data with low SNR (noisy) and data with high SNR. The unexplained fraction due to high noise (low SNR) was lowest in summer at ca. 10 % and otherwise ca. 20 %. The rest of the unexplained OA variability (data with high SNR) was nearly constant at 10 %–12 %. This fraction is termed as the “real unexplained variation” and includes only the variation made up by variables having the unexplained variation fraction less than 25 % (Paatero, 2004). As mentioned before, the unexplained variation did not correlate with any external data or show seasonal or diel patterns.

Figure 7Monthly resolution time series of LV-OOA (a), SV-OOA (b), and POA (c) mass concentrations obtained with rolling rCMB. The light shadings indicate the area between the 10th and 90th percentiles, and the dark shadings indicate the area between the 25th and 75th percentiles. The solid line represents the monthly medians for each month of measurements in 2012–2019. Note the different y-axis scales (grid lines are drawn every 1 µg m−3).


Figure 8Panel (a) depicts the variability of the rolling rCMB compared to measurement variability (scaled by uncertainty). The unexplained fraction is ca. 30 % outside summer while in summer months it drops to ca. 25 %. This variation in the unexplained variation is due to increased noisy fraction (light grey) outside summer. The real unexplained fraction (in black) stays at rather constant at ca. 11 %. Panel (b) shows fLV-OOA, fSV-OOA, and fPOA in different months. This panel only visualizes their variability in rolling rCMB.


Figure 9The monthly mass concentrations of LV-OOA (a), SV-OOA (b), and POA (c) obtained with rolling rCMB. The light shadings indicate the area between the 10th and 90th percentiles, and the dark shadings indicate the area between the 25th and 75th percentiles. The narrow dotted lines represent monthly medians for individual years, and the dark lines with circle markers represent the overall monthly mean concentrations. Note the different y-axis scales (grid lines are drawn every 0.2 µg m−3).


Figure 10The median diel cycles of LV-OOA (a), SV-OOA (b), and POA (c) obtained via rolling rCMB. The y axes represent the local time of day (UTC+2) and the x axes the month. The colour scales represent the mass concentration of each OA type. Note the different scales for each plot. Each grid point represents a 14 d× 3 h period, visualized with the MATLAB 2017a contourf function.


6.2.1 LV-OOA

LV-OOA was always the dominating OA type at SMEAR II, both in terms of OA mass fraction (fLV-OOA; Fig. 8b) and absolute concentration (Figs. 7a and 9a). LV-OOA is understood to form as a result of OA ageing in the atmosphere (e.g. Jimenez et al., 2009). Indeed, several OA types have been shown to chemically transform to LV-OOA in relatively short timescales (e.g. Jimenez et al., 2009). This makes the dominance of such aged OA product perfectly reasonable at a rural background site, such as SMEAR II. LV-OOA made up ca. 60 % of OA mass concentration, and the median absolute LV-OOA loading was 0.74 µg m−3 (overall LV-OOA IQR 0.35, 0.74, and 1.46 µg m−3).

LV-OOA loading had a bimodal seasonal cycle. The first peak occurred in February (February LV-OOA IQR: 0.30, 0.64, and 1.28 µg m−3), similarly to previously reported SMEAR II NR-PM1 inorganics (Heikkinen et al., 2020). We previously speculated that this February peak of NR-PM1 inorganics could result from a combination of meteorology-driven phenomena, such as more southerly winds compared to other winter months, the enhanced amount of solar radiation enabling photochemistry, or relatively dry conditions (in terms of less precipitation) diminishing wet deposition of aerosol particles upon transport from more polluted areas. Similar phenomena could certainly favour also higher LV-OOA loading in February. While the LV-OOA mass spectrum does not offer insights of possible LV-OOA sources (spectrum comprises mostly of m/Q 44 Th; Fig. 5a), we can still assume the wintertime LV-OOA sources to be mostly anthropogenic due to reduced biogenic activity in the wintertime boreal environment. Wintertime LV-OOA could be to a large extent for example aged wood-burning organic aerosol as wood burning is expected to be the most dominant wintertime OA source in Europe (Jiang et al., 2019). Also anthropogenic SOA formation in urban plumes is a potentially high source of wintertime OOA (Shah et al., 2019). Despite the less efficient oxidation (OH radical concentration much lower in wintertime compared to summer), the cold wintertime temperatures enable condensation of less oxidized organic vapours (e.g. Stolzenburg et al., 2018), which could favour wintertime SOA formation. Due to ageing processes, it is likely that such wintertime (anthropogenic) SOA would be detected as LV-OOA at SMEAR II due to OOA ageing during transport from the faraway urban plumes. The diel cycle of wintertime LV-OOA showed no diel pattern (Fig. 10a). Such behaviour is typical for long-range transported, i.e. not locally produced, air pollutants, as boundary layer dynamics will not influence their concentration in the surface layer. More discussion on LV-OOA sources, supporting the above-mentioned statements on the anthropogenic and biogenic influences on LV-OOA, is presented later in the paper in conjunction with wind and air mass trajectory analyses (Sect. 6.3).

The second, yet most significant, peak of LV-OOA loading occurred in summer (summertime LV-OOA IQR: 0.65, 1.18, and 2.01 µg m−3; Fig. 9a), when biogenic emissions rapidly produce SOA in ambient air. It is likely that in summertime biogenic processes were the dominating sources of LV-OOA. LV-OOA possessed a diel cycle clearly only in summer, where the LV-OOA reached a maximum concentration during daytime (Fig. 10a). It is likely that in contrast to wintertime, LV-OOA was produced also locally via photochemical pathways during daytime.

6.2.2 SV-OOA

The highest SV-OOA OA mass fraction (fSV-OOA Fig. 8b) and loading (Figs. 7b and 9b) were observed in summer (unimodal seasonal cycle). The summertime fSV-OOA was ca. 40 % (summertime SV-OOA IQR: 0.33, 0.59, and 1.07 µg m−3), and otherwise it was ca. 25 %–30 % (wintertime SV-OOA IQR: 0.10, 0.17, and 0.28 µg m−3). The seasonal cycle of SV-OOA could be explained by the surrounding forest's enhanced biogenic activity in summer months, which leads to biogenic SOA formation. However, we are not able to confirm whether all of the SV-OOA is of biogenic origin. This is because the nearby sawmills in Korkeakoski (ca. 7 km NE of SMEAR II; Sect. 2.1) represent significant SV-OOA sources (e.g. Äijälä et al., 2017). It is likely that SV-OOA production from terpenes emitted from the Korkeakoski sawmills also expresses seasonality following the air's oxidation capacity. In addition, it is possible that terpene emissions from the Korkeakoski sawmills are also temperature dependent.

SV-OOA possessed a diel cycle in all months but December and January. The SV-OOA diel cycle was typical for semi-volatile species: the maximum loading was achieved in early mornings (Fig. 10b), when atmospheric mixing layer is typically the shallowest and temperature the lowest. We previously reported a similar seasonal cycle for NR-PM1 nitrate at SMEAR II (Heikkinen et al., 2020). The SV-OOA formation is likely strongly linked to the accumulation of monoterpenes in these shallow nocturnal boundary layers in forests. During calm, stable nights radiative cooling promotes formation of inversion layers hindering vertical dispersion of the forest's emissions. The cooling of the air enables partitioning of less-oxygenated gaseous species yielded from monoterpene oxidation to the condensed phase enhancing also SV-OOA formation. SV-OOA formation via condensation of highly oxidized organic molecules (HOMs, which commonly originate from monoterpene oxidation; Bianchi et al., 2019) has been previously suggested to occur at SMEAR II's nocturnal boundary layer(s) (Hao et al., 2018).

It is important to mention here that if these ACSM measurements were conducted in a higher altitude, perhaps even a few tens of metres above ground level, such a strong diel cycle would likely not have been captured. In addition, upon the development of the turbulent daytime boundary layer, the SV-OOA yielded during the night likely does not play any major role in the SV-OOA loading within this daytime boundary layer. The BVOC oxidation in the boreal forest is more efficient during daytime compared to night-time (e.g. Peräkylä et al., 2014), which would mean a higher production of condensable vapours potentially forming SV-OOA during daytime.

When summing up SV-OOA and LV-OOA, we can see that summertime OA was nearly exclusively OOA (which is typically a good approximation of SOA), and even in wintertime the OOA organic mass fraction was ca. 80 %. High OA mass fractions of OOA in PM1 have been observed all over the northern mid-latitudes (Zhang et al., 2007).

6.2.3 POA

The fPOA seasonal cycle was opposite to that of SV-OOA, with highest fPOA achieved in wintertime (13 %; Fig. 8b). The summertime fPOA was 3 % and the overall median ca. 6 %. Interestingly, when comparing the overall median to fPOA estimated previously at SMEAR II, we observe much lower fractions. For example, Äijälä et al. (2019) report a HOA OA mass fraction of 6 % and BBOA OA mass fraction of 21 %. The sum of them, which should somewhat represent POA, is 21 percentage points higher than the mean fPOA reported here. As the Äijälä et al. (2019) study was conducted with an AMS, the data set should certainly better capture short-term pollution plumes compared to the ACSM, which has significantly lower time resolution and higher noise level. Another important fact to consider is that the Äijälä et al. (2019) study period is situated between years 2008 and 2010. It is possible that POA emissions have reduced since then or the emissions were for some reason higher than usual between 2008 and 2010. Hints of such long-term reduction or higher concentrations in 2008–2010 at SMEAR II can be observed in the equivalent black carbon (eBC) concentrations. The eBC concentration between years 2008 and 2011 was nearly twice as high as between years 2013–2018 (Luoma et al., 2021). This could certainly explain some of the discrepancy between these studies.

In addition to the fPOA, also the absolute POA concentration peaked in winter (Figs. 7c and 9c). The seasonal cycle resembles that of NOx shown in our previous work (Heikkinen et al., 2020), which in turn follows the cycles of atmospheric boundary layer height and temperature. Several phenomena can explain a larger wintertime POA loading: wintertime POA dispersed in a shallower atmospheric mixing layer compared to summer, and sources of POA are possibly greater in winter due to enhanced need for residential heating and less POA evaporation due to cold temperatures. In addition, POA wintertime ageing to LV-OOA is possibly hindered compared to summertime, due to less efficient photochemical oxidation. The wintertime POA diel cycle showed most of the time a minor afternoon maximum, and a minor night-time elevation was slightly visible only in late January/early February (Fig. 10c). Typical HOA diel cycles in populated areas show an extremely distinct diel pattern following morning and evening rush hours (e.g. Zhang et al., 2005). In residential areas, BBOA in turn typically clearly peaks in the evening, when domestic heating takes place, and the emissions are dispersed in the nocturnal boundary layer (e.g. Canonaco et al., 2013). Due to SMEAR II's distance from major HOA and BBOA sources, we did not observe such clear POA diel cycles in either summer or winter. The summertime POA diel cycle resembled a diel cycle of the sum of LV-OOA and SV-OOA. As discussed earlier in Sect. 5.3.1, it is likely that summertime POA loading was overestimated by the rolling rCMB model (Fig. S3).

6.3 Wind and air mass trajectory influence on OA composition

In this section we will discuss the wind direction and speed dependencies of OA composition, which provide useful insights in estimating whether OA is locally produced or transported. After this analysis we briefly examine the OA types' behaviour as a function of time over land (Sect. 3) to understand the potential magnitude of natural aerosol formation over the boreal forest.

6.3.1 Wind direction and speed dependency of OA composition

The openair polar plot for LV-OOA is displayed in Fig. 11a. Based on this figure, elevated LV-OOA concentrations could be expected from SE (polluted sectors) regardless of the wind speed. In the event of easterly winds, the LV-OOA concentrations were generally the highest if wind speeds stayed below 20 km h−1 (ca. 5.6 m s−1). On the contrary, in the case of NW winds (winds from the clean sector) with wind speeds exceeding 20 km h−1, the LV-OOA concentration approached zero, implying clean air transport. The LV-OOA openair polar plot resembles greatly the overall NR-PM1 organics' openair polar plot visualized previously in Heikkinen et al. (2020), which was also expected due to LV-OOA being the dominant OA component. The LV-OOA openair polar plot had more southerly influence in wintertime (Fig. 11b), and significantly less LV-OOA was detected with SE winds compared to the overall picture (Fig. 11a). The summertime LV-OOA openair polar plot (Fig. 11c) in turn was nearly identical to the median plot including all months.

Figure 11The openair polar plots (Carslaw and Ropkins, 2012) for LV-OOA (a–c), SV-OOA (d–f), and POA (g–i) obtained via the ZeFir pollution tracker WaveMetrics Igor Pro toolkit (Petit et al., 2017). The first column represents the median over all seasons, the second column the median over wintertime, and the third column the median summertime. The distances from the circle origins indicate wind speeds (in km h−1). Wind speed grid lines are presented with dark grey dashed lines. The colour scales represent the mass concentration of each OA type modelled via rolling rCMB during the specific wind direction and speed combinations. Note that the scales are different among the subplots. As these figures do not indicate any likelihood of these wind direction and speed combinations, Fig. S1 is important to keep in mind while interpreting them. Briefly, N–NE–E is the least likely direction of wind, and S–SW–W is the most likely. Wind speeds rarely exhibit 20 km h−1. The wind direction and speed data are collected above the boreal forest canopy.


The SV-OOA concentration was highest with low wind speeds (below 10 km h−1, i.e. ca. 2.8 m s−1; Fig. 11d). In addition, SE winds favoured SV-OOA presence. As SV-OOA loading peaked at night (Fig. 9c), the low wind speed dependence of SV-OOA indicates that calm nights are most suitable for SV-OOA detection. Low nocturnal wind speeds promote the formation of shallow nocturnal boundary layers, as the mixing is not enhanced by mechanically produced eddies. Thus, both the SV-OOA diel cycle and the SV-OOA formation boost at low wind speeds support the hypothesis that SV-OOA is produced locally and it builds up in the night-time surface air. However, the Korkeakoski sawmills probably explain why the SV-OOA concentration field is darker at the SE side of the openair plot origin (Fig. 11d). The wintertime SV-OOA openair polar plot still showed highest SV-OOA loading with low wind speeds but having less SE influence in the concentration field (Fig. 11e). The summertime polar plot (Fig. 11f) again resembled the overall plot (Fig. 11f). This summertime concentration field of SV-OOA greatly resembled the summertime LV-OOA concentration field (Fig. 11c). The Pearson correlation coefficient between these fields was R=0.87. This similarity supports the previously stated hypothesis that summertime LV-OOA was likely of biogenic origin (also with possible sawmill influence).

Finally, the POA openair polar plot (Fig. 11g) exemplifies how specific wind direction and speed combinations were required for POA detection: POA was resolvable only if the wind direction was S–SE and wind speed ca. 20 km h−1 (rarely the case at SMEAR II; Fig. S1). While such high wind speeds ultimately reduce the time the air masses spend over populated areas with potentially high POA emissions, the high wind speeds also enable fast transport of the POA types making their detection at fresh state possible (before POA has evaporated and/or aged).

The wintertime POA openair polar plot also had SE influence with lower wind speeds (Fig. 11h). It greatly resembled the wintertime LV-OOA openair polar plot (Fig. 11b). The Pearson correlation coefficient between the wintertime POA concentration field and LV-OOA concentration field was R=0.93. The high agreement between these concentration fields supports the previously stated hypothesis that wintertime LV-OOA was likely of anthropogenic origin. The summertime POA openair polar plot (Fig. 11i) did not differ greatly from the other POA openair polar plots, which gives some confidence in summertime POA quantification: if most summertime POA was overestimated, the summertime POA openair polar plot would likely have a similar wind dependence to OOA.

6.3.2 Time-over-land analysis

Tunved et al. (2006) showed how (organic) aerosol mass concentration increased as a function of time over land (TOL; i.e. the number of hours the air mass spent over the forested land surface upwind of SMEAR II) when the land surface had little anthropogenic influence (e.g. in the clean north-westerly sector; Fig. S2). This increase was attributed to natural (biogenic) OA production in the boreal boundary layer. Here, we observe a similar increase in the clean sector (Fig. 12a), with LV-OOA loading being the most sensitive to TOL (Fig. 12). The lower increase shown for SV-OOA (Fig. 12) in comparison to LV-OOA supports our hypothesis of SV-OOA sources being also local and SV-OOA ageing into LV-OOA. The relationship between POA and TOL was not significant (Fig. 12). The increase in LV-OOA loading as a function of TOL indicates OA formation in the boreal boundary layer, its build-up in the air mass, and ageing into LV-OOA prior to arrival at SMEAR II. Such a phenomenon is not visible when investigating the OA types' behaviour as a function of TOL in polluted sectors (Fig. S8). Indeed, none of the OA-types indicate links between OA loading and TOL in either air masses of European (southerly sector) or Russian (easterly sector) origin. We are not surprised by such a lack of correlation between OA and TOL as the picture is greatly hampered by anthropogenic emissions. As the anthropogenic emissions are minor in the clean sector, and as suggested by Tunved et al. (2006), the OA production in the clean sector is dominated by biogenic SOA formation.

Figure 12(a) The different rolling rCMB factors (y axis in µg m−3) vs. TOL (x axis in hours) for the clean sector (least polluted north-western sector as defined by Tunved et al., 2006; see Fig. S2 for a more precise sector definition). The data are binned to 5-hourly TOL bins. The shaded areas represents the concentration interquartile ranges (25th to 75th percentile) and the square markers the median concentrations. (b) The OA type vs. TOL scatter plot slopes (y axis) for each month (x axis) colour coded by the Pearson correlation coefficient for each scatter plot. The slopes (in µg m−3 h−1) are calculated for a linear fit in the TOL range of [20, 70] h. (c) The OA type concentration in a TOL bin divided by the median OA type concentration when TOL was <25 h as a function of TOL. The plot visualizes how the SV-OOA and LV-OOA have similar OA type enhancements as a function of TOL until TOL = 40 h.


The biogenic SOA hypothesis is supported also by the seasonality of the OA vs. TOL relationship (Fig. 12b): the highest correlation between the two and the steepest OA increase as a function of TOL are observed in July, which held the greatest temperatures during the measurement period (Heikkinen et al., 2020). Such temperature dependence is typically associated with biogenic SOA production (e.g. Daellenbach et al., 2017; Stefenelli et al., 2019) as the emission rates of several SOA precursors (such as monoterpenes) increase as a function of temperature (Guenther et al., 1993). The linear regression slopes for a LV-OOA vs. TOL scatter plot would suggest LV-OOA formation of ca. 42 ng m−3 h−1 in July, which is twice the SV-OOA vs. TOL slope (Fig. 12b). To exemplify these numbers, 3 d over the boreal forest in July would yield ca. 3 µg m−3 of LV-OOA and 1.6 µg m−3 of SV-OOA. The slopes for LV-OOA stay below 10 ng m−3 h−1 between October and April (values similar to the slopes for SV-OOA at the same time; Fig. 12b), when there is less of biogenic plant activity. These slopes were similar in magnitude to those derived previously for SMEAR II data (Tunved et al., 2006; Liao et al., 2014). Another interesting feature extracted from this analysis was that if the OA type vs. TOL slopes were calculated using data only below TOL = 40 h, the SV-OOA and LV-OOA slopes would be identical, and only after TOL exceeds 40 h, LV-OOA loading keeps increasing while the SV-OOA loading shows a minor decreasing trend (Fig. 12c). More analysis and perhaps investigations of similar plots from other boreal research stations could give us insights into whether the figure informs more on the timescales of OA chemistry or whether it is linked to meteorology and/or distance to the ocean from the measurement station. Additionally, cloud processing and subsequent precipitation will influence aerosol size distribution during the transport to the observation site. However, in this study we did not take these interactions and precipitation processes into account. Our aim was to explore the net effect of TOL in sub-micrometre aerosol chemistry at a fixed site. Therefore a need to explore these features in a systematic manner in the future also exists.

7 Conclusions

Organic aerosol (OA) mass spectra are recorded continuously with an Aerosol Chemical Speciation Monitor (ACSM) since 2012 at SMEAR II station, located within the boreal forest in Southern Finland. The goal of the current paper was to yield understanding of the main OA components: their mass spectral features and temporal behaviours. The large extent of input data (8 years) and the relatively remote measurement location required us to develop a new framework for conducting OA chemical characterization, as to our knowledge there are no previous studies where equally long or longer time series of OA mass spectra from similar environments have been characterized. We approached the OA characterization via positive matrix factorization (PMF; Paatero and Tapper, 1994). However, due to the length of the data set, we conducted the PMF with a 30 d rolling window approach, which enabled factor profile variability across the 8 years (Canonaco et al., 2021; Parworth et al., 2015). The rolling PMF yielded an extremely large number of PMF solutions (20 900 solutions, 62 700 factor profiles). We explored the PMF profiles across the solution space using k-means clustering to gain understanding of the dominant OA types at the station. We revealed and identified three significantly different OA clusters – low-volatility oxygenated OA (LV-OOA), semi-volatile oxygenated OA (SV-OOA), and primary OA (POA) – from these data. To attain their temporal variabilities, we performed a rolling relaxed chemical mass balance (rolling rCMB) run, anchored by the observed clusters and their intra-cluster variabilities as opposed to the more conventional methods introduced by for example Canonaco et al. (2021). The selection of k-means and rolling rCMB combination instead of a conventional rolling PMF enabled us to quantify POA at SMEAR II. The rCMB run explained ca. 70 % of the observed OA at SMEAR II, and nearly two-thirds of the unexplained variation was due to high noise level of the data, leaving the real unexplained variation at only 11 %. The analysis method utilized here turned out to be robust, and it required little analyst interference. Therefore, our framework presents a technique to effectively analyse long-term AMS or ACSM data sets while reducing subjective bias upon analysis. However, more work is potentially needed in the future to optimize the analysis stages proposed.

With equal importance to the tested data analysis framework, we also presented the OA composition and its variability at SMEAR II. The main conclusion to be drawn from the OA composition at SMEAR II is that this boreal OA is nearly exclusively oxidized organic aerosol, mostly highly oxidized LV-OOA. The result was well in line with previous studies from the Northern Hemisphere showing the ubiquity of OOA especially at rural measurement sites (Zhang et al., 2007). The LV-OOA seasonal cycle was bimodal, culminating in February and summer. The wintertime LV-OOA was likely anthropogenic, and the February peak coincided with NR-PM1 inorganics (Heikkinen et al., 2020). The summertime LV-OOA had enhanced biogenic influence, and it linearly increased the longer the air mass spent over the boreal forest. We estimated natural LV-OOA production of several tens of ng m−3 per hour. These numbers were well in line with previous studies investigating the natural aerosol production in the boreal forest (Tunved et al., 2006; Liao et al., 2014). SV-OOA was the second most abundant OA type, and the maximum SV-OOA concentration was detected in early mornings during summer. Both biogenic processes and emissions from the nearby sawmill contribute to the SV-OOA mass as also exemplified in previous studies (e.g. Äijälä et al., 2017). The highest SV-OOA loadings were observed when sampling from shallow nocturnal surface layers, but it is possible that the production of SV-OOA was highest during daytime when most BVOC oxidation takes place. Finally, the POA, the mass spectrum of which resembled both hydrocarbon-like OA and biomass burning OA, attained significant OA mass fractions only in winter. Still, those OA mass fractions were significantly lower compared to earlier long-term descriptions of SMEAR II OA composition (Äijälä et al., 2019). This discrepancy could be for example linked to a decrease in POA emissions as hinted by decreasing BC trends at the site (Luoma et al., 2021) or the ACSM limited capability in detecting short-term (pollution) plumes, which average out even more due to the 3 h averaging applied to the PMF input data, which was necessary to improve the SNR at this rural background site. More generally, due to OA composition sensitivity to meteorological conditions and anomalies, even longer time series need to be accumulated in order to reliably estimate trends of POA and other OA constituents at SMEAR II based on ACSM data.

Appendix A: Statistical analysis workflow

Figure A1Workflow describing the machine-learning analysis approach utilized in the current study. In a nutshell, this method describes how k-means clustering can be used to classify OA mass spectral profiles from a large number of unconstrained rolling PMF runs and how this information can be further utilized in a relaxed CMB run to gain insight into the OA classes' temporal behaviours. The method comprises four main phases: (1) performing rolling PMF (Sect. 5.1), (2) performing window-by-window (file-by-file) clustering of rolling window iterations (Phase I clustering; Sect. 5.2.1), (3) conducting overall classification of the centroids calculated for all PMF windows (Phase II clustering; Sect. 5.2.2), and finally (4) performing rolling relaxed chemical mass balancing using the centroids retrieved in the previous step as CMB anchors (Sect. 5.3). Sections 4 and 5 in the paper introduce all the vocabulary needed for understanding this figure. These sections also contain detailed descriptions of each step in the method.


Data availability

The ACSM NR-PM1 OA concentration data are available at the EBAS database under the EMEP ACTRIS framework (, last access: 30 June 2021) (NILU, 2021) as well as upon request from the corresponding authors. The PMF matrices and OA classes' mass spectral profiles and time series are available upon request from the corresponding authors. The wind direction and speed data are available at the SmartSMEAR data repository (, last access: 30 June 2021) (Junninen et al., 2009). The contact information of the original data contributors can be requested from


The supplement related to this article is available online at:

Author contributions

LH, MÄ, ME, TP, MK, and DW designed the study. LH, MÄ and FG performed the ACSM measurements. PA provided the size distribution data needed for 2019 ACSM data processing. MR performed time-over-land calculations with HYSPLIT trajectories. KL provided BC data for 2019 ACSM data processing. The ACSM data processing was performed by LH. KRD, GC, and MÄ assisted LH with (rolling) PMF. MÄ and DA assisted LH with the k-means clustering. LH performed the overall analysis and data visualization and wrote the paper with comments from the co-authors. ME supervised all the steps in this process.

Competing interests

The authors declare that they have no conflict of interest.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Special issue statement

This article is part of the special issue “Pan-Eurasian Experiment (PEEX) – Part II”. It is not associated with a conference.


We thank the SMEAR II staff for keeping the measurements running; COST COLOSSAL for valuable guidance and discussions; Francesco Canonaco (Datalystica Ltd.) for PMF support; and Santtu Mikkonen, Jean-Eudes Petit, Christopher Cappa, and the COALA group for useful discussions. We gratefully thank also the anonymous reviewers for their valuable insight. We acknowledge the NOAA Air Resources Laboratory (ARL) for the provision of the HYSPLIT transport and dispersion model used in this publication.

Financial support

This research has been supported by the European Research Council (COALA (grant no. 638703), ERA-PLANET (689443), and FORCeS (821205)), the Academy of Finland (grant nos. 317380, 320094, 307537, 324259, 333397, 334792, 329274, and 328616), and the Swiss National Science Foundation in the framework of COST (SAMSAM (grant no. IZCOZ0_177063)).

Open-access funding was provided by the Helsinki University Library.

Review statement

This paper was edited by Hang Su and reviewed by two anonymous referees.


Aiken, A. C., Decarlo, P. F., Kroll, J. H., Worsnop, D. R., Huffman, J. A., Docherty, K. S., Ulbrich, I. M., Mohr, C., Kimmel, J. R., Sueper, D., Sun, Y., Zhang, Q., Trimborn, A., Northway, M., Ziemann, P. J., Canagaratna, M. R., Onasch, T. B., Alfarra, R. M., Prevot, A. S. H., Dommen, J., Duplissy, J., Metzger, A., Baltensperger, U., and Jimenez, J. L.: O/C and OM/OC ratios of primary, secondary, and ambient organic aerosols with high-resolution time-of-flight aerosol mass spectrometry, Environ. Sci. Technol., 42, 4478–4485, 2008. 

Äijälä, M., Heikkinen, L., Fröhlich, R., Canonaco, F., Prévôt, A. S. H., Junninen, H., Petäjä, T., Kulmala, M., Worsnop, D., and Ehn, M.: Resolving anthropogenic aerosol pollution types – deconvolution and exploratory classification of pollution events, Atmos. Chem. Phys., 17, 3165–3197,, 2017. 

Äijälä, M., Daellenbach, K. R., Canonaco, F., Heikkinen, L., Junninen, H., Petäjä, T., Kulmala, M., Prévôt, A. S. H., and Ehn, M.: Constructing a data-driven receptor model for organic and inorganic aerosol – a synthesis analysis of eight mass spectrometric data sets from a boreal forest site, Atmos. Chem. Phys., 19, 3645–3672,, 2019. 

Alfarra, M. R.: Insights into the atmospheric organic aerosols using an aerosol mass spectrometer, PhD thesis, University of Manchester, Manchester, UK, 2004. 

Arthur, D. and Vassilvitskii, S.: k-means++: The Advantages of Careful Seeding, in: Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, 7–9 January 2007, pp. 1027–1035, 2007. 

Ball, G. H. and Hall, D. J.: ISODATA, a novel method of analysis and pattern classification, DTIC Document, Technical report, Stanford Research Institute, Menlo Park, CA, USA, 1965. 

Barreira, L. M. F., Duporté, G., Parshintsev, J., Hartonen, K., Jussila, M., Aalto, J., Bäck, J., Kulmala, M., and Riekkola, M.-L.: Emissions of biogenic volatile organic compounds from the boreal forest floor and understory: a study by solid-phase microextraction and portable gas chromatography-mass spectrometry, Boreal Env. Res., 22, 393–413, 2017. 

Bianchi, F., Kurtén, T., Riva, M., Mohr, C., Rissanen, M. P., Roldin, P., Berndt, T., Crounse, J. D., Wennberg, P. O., Mentel, T. F., Wildt, J., Junninen, H., Jokinen, T., Kulmala, M., Worsnop, D. R., Thornton, J. A., Donahue, N., Kjaergaard, H. G., and Ehn, M.: Highly Oxygenated Organic Molecules (HOM) from Gas-Phase Autoxidation Involving Peroxy Radicals: A Key Contributor to Atmospheric Aerosol, Chem. Rev., 119, 3472–3509,, 2019. 

Canagaratna, M. R., Jayne, J. T., Jimenez, J. L., Allan, J. D., Alfarra, M. R., Zhang, Q., Onasch, T. B., Drewnick, F., Coe, H., Middlebrook, A., Delia, A., Williams, L. R., Trimborn, A. M., Northway, M. J., DeCarlo, P. F., Kolb, C. E., Davidovits, P., and Worsnop, D. R.: Chemical and microphysical characterization of ambient aerosols with the aerodyne aerosol mass spectrometer, Mass Spectrom. Rev., 26, 185–222,, 2007. 

Canagaratna, M. R., Jimenez, J. L., Kroll, J. H., Chen, Q., Kessler, S. H., Massoli, P., Hildebrandt Ruiz, L., Fortner, E., Williams, L. R., Wilson, K. R., Surratt, J. D., Donahue, N. M., Jayne, J. T., and Worsnop, D. R.: Elemental ratio measurements of organic compounds using aerosol mass spectrometry: characterization, improved calibration, and implications, Atmos. Chem. Phys., 15, 253–272,, 2015. 

Canonaco, F., Crippa, M., Slowik, J. G., Baltensperger, U., and Prévôt, A. S. H.: SoFi, an IGOR-based interface for the efficient use of the generalized multilinear engine (ME-2) for the source apportionment: ME-2 application to aerosol mass spectrometer data, Atmos. Meas. Tech., 6, 3649–3661,, 2013. 

Canonaco, F., Tobler, A., Chen, G., Sosedova, Y., Slowik, J. G., Bozzetti, C., Daellenbach, K. R., El Haddad, I., Crippa, M., Huang, R.-J., Furger, M., Baltensperger, U., and Prévôt, A. S. H.: A new method for long-term source apportionment with time-dependent factor profiles and uncertainty assessment using SoFi Pro: application to 1 year of organic aerosol data, Atmos. Meas. Tech., 14, 923–943,, 2021. 

Carslaw, D. C. and Ropkins, K.: openair – an R package for air quality data analysis, Environ. Modell. Softw., 27, 52–61, 2012. 

Crenn, V., Sciare, J., Croteau, P. L., Verlhac, S., Fröhlich, R., Belis, C. A., Aas, W., Äijälä, M., Alastuey, A., Artiñano, B., Baisnée, D., Bonnaire, N., Bressi, M., Canagaratna, M., Canonaco, F., Carbone, C., Cavalli, F., Coz, E., Cubison, M. J., Esser-Gietl, J. K., Green, D. C., Gros, V., Heikkinen, L., Herrmann, H., Lunder, C., Minguillón, M. C., Močnik, G., O'Dowd, C. D., Ovadnevaite, J., Petit, J.-E., Petralia, E., Poulain, L., Priestman, M., Riffault, V., Ripoll, A., Sarda-Estève, R., Slowik, J. G., Setyan, A., Wiedensohler, A., Baltensperger, U., Prévôt, A. S. H., Jayne, J. T., and Favez, O.: ACTRIS ACSM intercomparison – Part 1: Reproducibility of concentration and fragment results from 13 individual Quadrupole Aerosol Chemical Speciation Monitors (Q-ACSM) and consistency with co-located instruments, Atmos. Meas. Tech., 8, 5063–5087,, 2015. 

Crippa, M., Canonaco, F., Lanz, V. A., Äijälä, M., Allan, J. D., Carbone, S., Capes, G., Ceburnis, D., Dall'Osto, M., Day, D. A., DeCarlo, P. F., Ehn, M., Eriksson, A., Freney, E., Hildebrandt Ruiz, L., Hillamo, R., Jimenez, J. L., Junninen, H., Kiendler-Scharr, A., Kortelainen, A.-M., Kulmala, M., Laaksonen, A., Mensah, A. A., Mohr, C., Nemitz, E., O'Dowd, C., Ovadnevaite, J., Pandis, S. N., Petäjä, T., Poulain, L., Saarikoski, S., Sellegri, K., Swietlicki, E., Tiitta, P., Worsnop, D. R., Baltensperger, U., and Prévôt, A. S. H.: Organic aerosol components derived from 25 AMS data sets across Europe using a consistent ME-2 based source apportionment approach, Atmos. Chem. Phys., 14, 6159–6176,, 2014. 

Cubison, M. J., Ortega, A. M., Hayes, P. L., Farmer, D. K., Day, D., Lechner, M. J., Brune, W. H., Apel, E., Diskin, G. S., Fisher, J. A., Fuelberg, H. E., Hecobian, A., Knapp, D. J., Mikoviny, T., Riemer, D., Sachse, G. W., Sessions, W., Weber, R. J., Weinheimer, A. J., Wisthaler, A., and Jimenez, J. L.: Effects of aging on organic aerosol from open biomass burning smoke in aircraft and laboratory studies, Atmos. Chem. Phys., 11, 12049–12064,, 2011. 

Daellenbach, K. R., Stefenelli, G., Bozzetti, C., Vlachou, A., Fermo, P., Gonzalez, R., Piazzalunga, A., Colombi, C., Canonaco, F., Hueglin, C., Kasper-Giebl, A., Jaffrezo, J.-L., Bianchi, F., Slowik, J. G., Baltensperger, U., El-Haddad, I., and Prévôt, A. S. H.: Long-term chemical analysis and organic aerosol source apportionment at nine sites in central Europe: source identification and uncertainty assessment, Atmos. Chem. Phys., 17, 13265–13282,, 2017. 

De Gouw, J. A., Middlebrook, A. M., Warneke, C., Goldan, P. D., Kuster, W. C., Roberts, J. M., Fehsenfeld, F. C., Worsnop, D. R., Canagaratna, M. R., Pszenny, A. A. P., Keene, W. C., Marchewka, M., Bertman, S. B., and Bates, T. S.: Budget of organic carbon in a polluted atmosphere: Results from the New England Air Quality Study in 2002, J. Geophys. Res.-Atmos., 110, D16305,, 2005. 

Donahue, N. M., Trump, E. R., Pierce, J. R., and Riipinen, I.: Theoretical constraints on pure vapor-pressure driven condensation of organics to ultrafine particles, Geophys. Res. Lett., 38, L16801,, 2011. 

Duplissy, J., DeCarlo, P. F., Dommen, J., Alfarra, M. R., Metzger, A., Barmpadimos, I., Prevot, A. S. H., Weingartner, E., Tritscher, T., Gysel, M., Aiken, A. C., Jimenez, J. L., Canagaratna, M. R., Worsnop, D. R., Collins, D. R., Tomlinson, J., and Baltensperger, U.: Relating hygroscopicity and composition of organic aerosol particulate matter, Atmos. Chem. Phys., 11, 1155–1165,, 2011. 

Eerdekens, G., Yassaa, N., Sinha, V., Aalto, P. P., Aufmhoff, H., Arnold, F., Fiedler, V., Kulmala, M., and Williams, J.: VOC measurements within a boreal forest during spring 2005: on the occurrence of elevated monoterpene concentrations during night time intense particle concentration events, Atmos. Chem. Phys., 9, 8331–8350,, 2009. 

Efron, B.: Bootstrap methods: another look at the jackknife, Annual Statistics, 20, 393–403, 1979. 

Ehn, M., Kleist, E., Junninen, H., Petäjä, T., Lönn, G., Schobesberger, S., Dal Maso, M., Trimborn, A., Kulmala, M., Worsnop, D. R., Wahner, A., Wildt, J., and Mentel, Th. F.: Gas phase formation of extremely oxidized pinene reaction products in chamber and ambient air, Atmos. Chem. Phys., 12, 5113–5127,, 2012. 

Ehn, M., Thornton, J. A., Kleist, E., Sipilä, M., Junninen, H., Pullinen, I., Springer, M., Rubach, F., Tillmann, R., Lee, B., Lopez-Hilfiker, F., Andres, S., Acir, I.-H., Rissanen, M., Jokinen, T., Schobesberger, S., Kangasluoma, J., Kontkanen, J., Nieminen, T., Kurtén, T., Nielsen, L. B., Jørgensen, S., Kjaergaard, H. G., Canagaratna, M., Maso, M. D., Berndt, T., Petäjä, T., Wahner, A., Kerminen, V.-M., Kulmala, M., Worsnop, D. R., Wildt, J., and Mentel, T. F.: A large source of low-volatility secondary organic aerosol, Nature, 506, 476–479,, 2014. 

Freney, E., Zhang, Y., Croteau, P., Amodeo, T., Williams, L., Truong, F., Petit, J.-E., Sciare, J., Sarda-Esteve, R., Bonnaire, N., Arumae, T., Aurela, M., Bougiatioti, A., Mihalopoulos, N., Coz, E., Artinano, B., Crenn, V., Elste, T., Heikkinen, L., Poulain, L., Wiedensohler, A., Herrmann, H., Priestman, M., Alastuey, A., Stavroulas, I., Tobler, A., Vasilescu, J., Zanca, N., Canagaratna, M., Carbone, C., Flentje, H., Green, D., Maasikmets, M., Marmureanu, L., Minguillon, M. C., Prevot, A. S. H., Gros, V., Jayne, J., and Favez, O.: The second ACTRIS inter-comparison (2016) for Aerosol Chemical Speciation Monitors (ACSM): Calibration protocols and instrument performance evaluations, Aerosol Sci. Tech., 53, 830–842,, 2019. 

Fröhlich, R., Crenn, V., Setyan, A., Belis, C. A., Canonaco, F., Favez, O., Riffault, V., Slowik, J. G., Aas, W., Aijälä, M., Alastuey, A., Artiñano, B., Bonnaire, N., Bozzetti, C., Bressi, M., Carbone, C., Coz, E., Croteau, P. L., Cubison, M. J., Esser-Gietl, J. K., Green, D. C., Gros, V., Heikkinen, L., Herrmann, H., Jayne, J. T., Lunder, C. R., Minguillón, M. C., Močnik, G., O'Dowd, C. D., Ovadnevaite, J., Petralia, E., Poulain, L., Priestman, M., Ripoll, A., Sarda-Estève, R., Wiedensohler, A., Baltensperger, U., Sciare, J., and Prévôt, A. S. H.: ACTRIS ACSM intercomparison – Part 2: Intercomparison of ME-2 organic source apportionment results from 15 individual, co-located aerosol mass spectrometers, Atmos. Meas. Tech., 8, 2555–2576,, 2015. 

Goldstein, A. H. and Galbally, I. E.: Known and Unexplored Organic Constituents in the Earth's Atmosphere, Environ. Sci. Technol., 41, 1514–1521,, 2007. 

Guenther, A. B., Zimmerman, P. R., Harley, P. C., Monson, R. K., and Fall, R.: Isoprene and monoterpene emission rate variability: model evaluations and sensitivity analyses, J. Geophys. Res.-Atmos., 98, 12609–12617, 1993. 

Guenther, A. B., Jiang, X., Heald, C. L., Sakulyanontvittaya, T., Duhl, T., Emmons, L. K., and Wang, X.: The Model of Emissions of Gases and Aerosols from Nature version 2.1 (MEGAN2.1): an extended and updated framework for modeling biogenic emissions, Geosci. Model Dev., 5, 1471–1492,, 2012. 

Hakola, H., Hellén, H., Hemmilä, M., Rinne, J., and Kulmala, M.: In situ measurements of volatile organic compounds in a boreal forest, Atmos. Chem. Phys., 12, 11665–11678,, 2012. 

Hallquist, M., Wenger, J. C., Baltensperger, U., Rudich, Y., Simpson, D., Claeys, M., Dommen, J., Donahue, N. M., George, C., Goldstein, A. H., Hamilton, J. F., Herrmann, H., Hoffmann, T., Iinuma, Y., Jang, M., Jenkin, M. E., Jimenez, J. L., Kiendler-Scharr, A., Maenhaut, W., McFiggans, G., Mentel, Th. F., Monod, A., Prévôt, A. S. H., Seinfeld, J. H., Surratt, J. D., Szmigielski, R., and Wildt, J.: The formation, properties and impact of secondary organic aerosol: current and emerging issues, Atmos. Chem. Phys., 9, 5155–5236,, 2009. 

Hao, L., Garmash, O., Ehn, M., Miettinen, P., Massoli, P., Mikkonen, S., Jokinen, T., Roldin, P., Aalto, P., Yli-Juuti, T., Joutsensaari, J., Petäjä, T., Kulmala, M., Lehtinen, K. E. J., Worsnop, D. R., and Virtanen, A.: Combined effects of boundary layer dynamics and atmospheric chemistry on aerosol composition during new particle formation periods, Atmos. Chem. Phys., 18, 17705–17716,, 2018. 

Hari, P. and Kulmala, M.: Station for measuring ecosystem-atmosphere relations (SMEAR II), Boreal Environ. Res., 10, 315–322, 2005. 

Heald, C. L., Kroll, J. H., Jimenez, J. L., Docherty, K. S., DeCarlo, P. F., Aiken, A. C., Chen, Q., Martin, S. T., Farmer, D. K., and Artaxo, P.: A simplified description of the evolution of organic aerosol composition in the atmosphere, Geophys. Res. Lett., 37, L08803,, 2010. 

Heikkinen, L., Äijälä, M., Riva, M., Luoma, K., Dällenbach, K., Aalto, J., Aalto, P., Aliaga, D., Aurela, M., Keskinen, H., Makkonen, U., Rantala, P., Kulmala, M., Petäjä, T., Worsnop, D., and Ehn, M.: Long-term sub-micrometer aerosol chemical composition in the boreal forest: inter- and intra-annual variability, Atmos. Chem. Phys., 20, 3151–3180,, 2020. 

Jain, A. K.: Data clustering: 50 years beyond K-means, Pattern Recogn. Lett., 31, 651–666, 2010. 

Jiang, J., Aksoyoglu, S., El-Haddad, I., Ciarelli, G., Denier van der Gon, H. A. C., Canonaco, F., Gilardoni, S., Paglione, M., Minguillón, M. C., Favez, O., Zhang, Y., Marchand, N., Hao, L., Virtanen, A., Florou, K., O'Dowd, C., Ovadnevaite, J., Baltensperger, U., and Prévôt, A. S. H.: Sources of organic aerosols in Europe: a modeling study using CAMx with modified volatility basis set scheme, Atmos. Chem. Phys., 19, 15247–15270,, 2019. 

Jimenez, J. L., Canagaratna, M. R., Donahue, N. M., Prevot, A. S. H., Zhang, Q., Kroll, J. H., DeCarlo, P. F., Allan, J. D., Coe, H., Ng, N. L., Aiken, A. C., Docherty, K. S., Ulbrich, I. M., Grieshop, A. P., Robinson, A. L., Duplissy, J., Smith, J. D., Wilson, K. R., Lanz, V. A., Hueglin, C., Sun, Y. L., Tian, J., Laaksonen, A., Raatikainen, T., Rautiainen, J., Vaattovaara, P., Ehn, M., Kulmala, M., Tomlinson, J. M., Collins, D. R., Cubison, M. J., Dunlea, E. J., Huffman, J. A., Onasch, T. B., Alfarra, M. R., Williams, P. I., Bower, K., Kondo, Y., Schneider, J., Drewnick, F., Borrmann, S., Weimer, S., Demerjian, K., Salcedo, D., Cottrell, L., Griffin, R., Takami, A., Miyoshi, T., Hatakeyama, S., Shimono, A., Sun, J. Y., Zhang, Y. M., Dzepina, K., Kimmel, J. R., Sueper, D., Jayne, J. T., Herndon, S. C., Trimborn, A. M., Williams, L. R., Wood, E. C., Middlebrook, A. M., Kolb, C. E., Baltensperger, U., and Worsnop, D. R.: Evolution of Organic Aerosols in the Atmosphere, Science, 326, 1525–1529,, 2009. 

Junninen, H., Lauri, A., Keronen, P., Aalto, P., Hiltunen, V., Hari, P., and Kulmala, M.: Smart-SMEAR: on-line data exploration and visualization tool for SMEAR stations, Boreal Environ. Res., 14, 447–457, 2009 (data available at:, last access: 30 June 2021). 

Kanamitsu, M.: Description of the NMC Global Data Assimilation and Forecast System, Weather Forecast., 4, 335–342,<0335:DOTNGD>2.0.CO;2, 1989. 

Kaufman, L. and Rousseeuw, P. J.: Finding groups in data: an introduction to cluster analysis, John Wiley & Sons, Hoboken, New Jersey, 2009. 

Kleinman, L. I., Springston, S. R., Daum, P. H., Lee, Y.-N., Nunnermacker, L. J., Senum, G. I., Wang, J., Weinstein-Lloyd, J., Alexander, M. L., Hubbe, J., Ortega, J., Canagaratna, M. R., and Jayne, J.: The time evolution of aerosol composition over the Mexico City plateau, Atmos. Chem. Phys., 8, 1559–1575,, 2008. 

Lanz, V. A., Alfarra, M. R., Baltensperger, U., Buchmann, B., Hueglin, C., and Prévôt, A. S. H.: Source apportionment of submicron organic aerosols at an urban site by factor analytical modelling of aerosol mass spectra, Atmos. Chem. Phys., 7, 1503–1522,, 2007. 

Liao, L., Dal Maso, M., Taipale, R., Rinne, J., Ehn, M., Junninen, H., Äijälä, M., Nieminen, T., Alekseychik, P., Hulkkonen, M., Worsnop, D., Kerminen, V.-M., and Kulmala, M.: Monoterpene pollution episodes in a forest environment: indication of anthropogenic origin and association with aerosol particles, Boreal Environ. Res., 16, 288–303, 2011. 

Liao, L., Kerminen, V.-M., Boy, M., Kulmala, M., and Dal Maso, M.: Temperature influence on the natural aerosol budget over boreal forests, Atmos. Chem. Phys., 14, 8295–8308,, 2014. 

Liu, P. S., Deng, R., Smith, K. A., Williams, L. R., Jayne, J. T., Canagaratna, M. R., Moore, K., Onasch, T. B., Worsnop, D. R., and Deshler, T.: Transmission efficiency of an aerodynamic focusing lens system: Comparison of model calculations and laboratory measurements for the Aerodyne Aerosol Mass Spectrometer, Aerosol Sci. Tech., 41, 721–733, 2007. 

Luoma, K., Niemi, J. V., Aurela, M., Fung, P. L., Helin, A., Hussein, T., Kangas, L., Kousa, A., Rönkkö, T., Timonen, H., Virkkula, A., and Petäjä, T.: Spatiotemporal variation and trends in equivalent black carbon in the Helsinki metropolitan area in Finland, Atmos. Chem. Phys., 21, 1173–1189,, 2021. 

MacQueen, J.: Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 21 June–18 July 1965 and 27 December 1965–7 January 1966, Statistical Laboratory of the University of California, Berkeley, USA, 281–297, 1967. 

Middlebrook, A. M., Bahreini, R., Jimenez, J. L., and Canagaratna, M. R.: Evaluation of composition-dependent collection efficiencies for the aerodyne aerosol mass spectrometer using field data, Aerosol Sci. Tech., 46, 258–271, 2012. 

Ng, N. L., Canagaratna, M. R., Zhang, Q., Jimenez, J. L., Tian, J., Ulbrich, I. M., Kroll, J. H., Docherty, K. S., Chhabra, P. S., Bahreini, R., Murphy, S. M., Seinfeld, J. H., Hildebrandt, L., Donahue, N. M., DeCarlo, P. F., Lanz, V. A., Prévôt, A. S. H., Dinar, E., Rudich, Y., and Worsnop, D. R.: Organic aerosol components observed in Northern Hemispheric datasets from Aerosol Mass Spectrometry, Atmos. Chem. Phys., 10, 4625–4641,, 2010. 

Ng, N. L., Canagaratna, M. R., Jimenez, J. L., Chhabra, P. S., Seinfeld, J. H., and Worsnop, D. R.: Changes in organic aerosol composition with aging inferred from aerosol mass spectra, Atmos. Chem. Phys., 11, 6465–6474,, 2011a. 

Ng, N. L., Canagaratna, M. R., Jimenez, J. L., Zhang, Q., Ulbrich, I. M., and Worsnop, D. R.: Real-Time Methods for Estimating Organic Component Mass Concentrations from Aerosol Mass Spectrometer Data, Environ. Sci. Technol., 45, 910–916,, 2011b. 

Ng, N. L., Herndon, S. C., Trimborn, A., Canagaratna, M. R., Croteau, P. L., Onasch, T. B., Sueper, D., Worsnop, D. R., Zhang, Q., Sun, Y. L., and Jayne, J. T.: An Aerosol Chemical Speciation Monitor (ACSM) for Routine Monitoring of the Composition and Mass Concentrations of Ambient Aerosol, Aerosol Sci. Tech., 45, 780–794, 2011c. 

Norris, G., Vedantham, R., Wade, K., Brown, S., Prouty, J., and Foley, C.: EPA Positive Matrix Factorization (PMF) 3.0 Fundamentals & user guide, U.S. Environmental Protection Agency Office of Research and Development Washington, DC, 2008. 

Norsk institutt for luftforskning (NILU): EBAS, available at:, last access: 30 June 2021. 

Paatero, P.: Least squares formulation of robust non-negative factor analysis, Chemometr. Intell. Lab., 37, 23–35,, 1997. 

Paatero, P.: User's guide for positive matrix factorization programs PMF2 and PMF, University of Helsinki, Helsinki, Finland, 2004. 

Paatero, P. and Hopke, P. K.: Discarding or downweighting high-noise variables in factor analytic models, Anal. Chim. Acta, 490, 277–289,, 2003. 

Paatero, P. and Hopke, P. K.: Rotational tools for factor analytic models, J. Chemometr., 23, 91–100,, 2009. 

Paatero, P. and Tapper, U.: Analysis of different modes of factor analysis as least squares fit problems, Chemometr. Intell. Lab., 18, 183–194,, 1993. 

Paatero, P. and Tapper, U.: Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 5, 111–126,, 1994. 

Paatero, P., Eberly, S., Brown, S. G., and Norris, G. A.: Methods for estimating uncertainty in factor analytic solutions, Atmos. Meas. Tech., 7, 781–797,, 2014. 

Parworth, C., Fast, J., Mei, F., Shippert, T., Sivaraman, C., Tilp, A., Watson, T., and Zhang, Q.: Long-term measurements of submicrometer aerosol chemistry at the Southern Great Plains (SGP) using an Aerosol Chemical Speciation Monitor (ACSM), Atmos. Environ., 106, 43–55,, 2015. 

Patokoski, J., Ruuskanen, T. M., Kajos, M. K., Taipale, R., Rantala, P., Aalto, J., Ryyppö, T., Nieminen, T., Hakola, H., and Rinne, J.: Sources of long-lived atmospheric VOCs at the rural boreal forest site, SMEAR II, Atmos. Chem. Phys., 15, 13413–13432,, 2015. 

Peräkylä, O., Vogt, M., Tikkanen, O.-P., Laurila, T., Kajos, M. K., Rantala, P. A., Patokoski, J., Aalto, J., Yli-Juuti, T., Ehn, M., Sipilä, M., Paasonen, P., Rissanen, M., Nieminen, T., Taipale, R., Keronen, P., Lappalainen, H. K., Ruuskanen, T. M., Rinne, J., Kerminen, V.-M., Kulmala, M., Bäck, J., and Petäjä, T.: Monoterpenes' oxidation capacity and rate over a boreal forest, Boreal Environ. Res., 19, 293–310, 2014. 

Petit, J.-E., Favez, O., Albinet, A., and Canonaco, F.: A user-friendly tool for comprehensive evaluation of the geographical origins of atmospheric pollution: Wind and trajectory analyses, Environ. Modell. Softw., 88, 183–187, 2017. 

Pieber, S. M., El Haddad, I., Slowik, J. G., Canagaratna, M. R., Jayne, J. T., Platt, S. M., Bozzetti, C., Daellenbach, K. R., Fröhlich, R., Vlachou, A., Klein, F., Dommen, J., Miljevic, B., Jiménez, J. L., Worsnop, D. R., Baltensperger, U., and Prévôt, A. S. H.: Inorganic Salt Interference on CO2+ in Aerodyne AMS and ACSM Organic Aerosol Composition Studies, Environ. Sci. Technol., 50, 10494–10503,, 2016. 

Prăvălie, R.: Major perturbations in the Earth's forest ecosystems. Possible implications for global warming, Earth-Sci. Rev., 185, 544–571, 2018. 

Rinne, J., Bäck, J., and Hakola, H.: Biogenic volatile organic compound emissions from the Eurasian taiga: current knowledge and future directions, Boreal Environ. Res., 14, 807–826, 2009. 

Riuttanen, L., Hulkkonen, M., Dal Maso, M., Junninen, H., and Kulmala, M.: Trajectory analysis of atmospheric transport of fine particles, SO2, NOx and O3 to the SMEAR II station in Finland in 1996–2008, Atmos. Chem. Phys., 13, 2153–2164,, 2013. 

Rose, C., Zha, Q., Dada, L., Yan, C., Lehtipalo, K., Junninen, H., Mazon, S. B., Jokinen, T., Sarnela, N., Sipilä, M., Petäjä, T., Kerminen, V.-M., Bianchi, F., and Kulmala, M.: Observations of biogenic ion-induced cluster formation in the atmosphere, Science Advances, 4, eaar5218,, 2018. 

Rousseeuw, P. J.: Silhouettes – a Graphical Aid to the Interpretation and Validation of Cluster-Analysis, J. Comput. Appl. Math., 20, 53–65,, 1987. 

Shah, V., Jaeglé, L., Jimenez, J. L., Schroder, J. C., Campuzano-Jost, P., Campos, T. L., Reeves, J. M., Stell, M., Brown, S. S., Lee, B. H., Lopez-Hilfiker, F. D., and Thornton, J. A.: Widespread Pollution From Secondary Sources of Organic Aerosols During Winter in the Northeastern United States, Geophys. Res. Lett., 46, 2974–2983,, 2019. 

Shrivastava, M., Cappa, C. D., Fan, J., Goldstein, A. H., Guenther, A. B., Jimenez, J. L., Kuang, C., Laskin, A., Martin, S. T., Ng, N. L., Petaja, T., Pierce, J. R., Rasch, P. J., Roldin, P., Seinfeld, J. H., Shilling, J., Smith, J. N., Thornton, J. A., Volkamer, R., Wang, J., Worsnop, D. R., Zaveri, R. A., Zelenyuk, A., and Zhang, Q.: Recent advances in understanding secondary organic aerosol: Implications for global climate forcing, Rev. Geophys., 55, 509–559,, 2017. 

Sokal, R. R. and Sneath, P. H.: Principles of numerical taxonomy, Principles of numerical taxonomy, Taxon, 12, 190–199, 1963. 

Stefenelli, G., Pospisilova, V., Lopez-Hilfiker, F. D., Daellenbach, K. R., Hüglin, C., Tong, Y., Baltensperger, U., Prévôt, A. S. H., and Slowik, J. G.: Organic aerosol source apportionment in Zurich using an extractive electrospray ionization time-of-flight mass spectrometer (EESI-TOF-MS) – Part 1: Biogenic influences and day–night chemistry in summer , Atmos. Chem. Phys., 19, 14825–14848,, 2019. 

Stein, A. F., Draxler, R. R., Rolph, G. D., Stunder, B. J. B., Cohen, M. D., and Ngan, F.: NOAA's HYSPLIT Atmospheric Transport and Dispersion Modeling System, B. Am. Meteorol. Soc., 96, 2059–2077,, 2015. 

Stein, S. E. and Scott, D. R.: Optimization and testing of mass spectral library search algorithms for compound identification, J. Am. Soc. Mass Spectr., 5, 859–866,, 1994. 

Steinhaus, H.: Sur la division des corp materiels en parties, Bulletin de l'académie polonaise des sciences, 1, 801–804, 1956. 

Stolzenburg, D., Fischer, L., Vogel, A. L., Heinritzi, M., Schervish, M., Simon, M., Wagner, A. C., Dada, L., Ahonen, L. R., Amorim, A., Baccarini, A., Bauer, P. S., Baumgartner, B., Bergen, A., Bianchi, F., Breitenlechner, M., Brilke, S., Buenrostro Mazon, S., Chen, D., Dias, A., Draper, D. C., Duplissy, J., El Haddad, I., Finkenzeller, H., Frege, C., Fuchs, C., Garmash, O., Gordon, H., He, X., Helm, J., Hofbauer, V., Hoyle, C. R., Kim, C., Kirkby, J., Kontkanen, J., Kürten, A., Lampilahti, J., Lawler, M., Lehtipalo, K., Leiminger, M., Mai, H., Mathot, S., Mentler, B., Molteni, U., Nie, W., Nieminen, T., Nowak, J. B., Ojdanic, A., Onnela, A., Passananti, M., Petäjä, T., Quéléver, L. L. J., Rissanen, M. P., Sarnela, N., Schallhart, S., Tauber, C., Tomé, A., Wagner, R., Wang, M., Weitz, L., Wimmer, D., Xiao, M., Yan, C., Ye, P., Zha, Q., Baltensperger, U., Curtius, J., Dommen, J., Flagan, R. C., Kulmala, M., Smith, J. N., Worsnop, D. R., Hansel, A., Donahue, N. M., and Winkler, P. M.: Rapid growth of organic aerosol nanoparticles over a wide tropospheric temperature range, P. Natl. Acad. Sci. USA, 115, 9122–9127,, 2018. 

Surratt, J. D., Chan, A. W. H., Eddingsaas, N. C., Chan, M., Loza, C. L., Kwan, A. J., Hersey, S. P., Flagan, R. C., Wennberg, P. O., and Seinfeld, J. H.: Reactive intermediates revealed in secondary organic aerosol formation from isoprene, P. Natl. Acad. Sci. USA, 107, 6640–6645,, 2010. 

Tunved, P., Hansson, H.-C., Kerminen, V.-M., Ström, J., Maso, M. D., Lihavainen, H., Viisanen, Y., Aalto, P. P., Komppula, M., and Kulmala, M.: High Natural Aerosol Loading over Boreal Forests, Science, 312, 261–263, 2006. 

Ulbrich, I. M., Canagaratna, M. R., Zhang, Q., Worsnop, D. R., and Jimenez, J. L.: Interpretation of organic components from Positive Matrix Factorization of aerosol mass spectrometric data, Atmos. Chem. Phys., 9, 2891–2918,, 2009. 

Van Krevelen, D. W.: Graphical-statistical method for the study of structure and reaction processes of coal, Fuel, 24, 269–284, 1950. 

Watson, J. G., Cooper, J. A., and Huntzicker, J. J.: The effective variance weighting for least squares calculations applied to the mass balance receptor model, Atmos. Environ. (1967), 18, 1347–1355,, 1984. 

Williams, J., Crowley, J., Fischer, H., Harder, H., Martinez, M., Petäjä, T., Rinne, J., Bäck, J., Boy, M., Dal Maso, M., Hakala, J., Kajos, M., Keronen, P., Rantala, P., Aalto, J., Aaltonen, H., Paatero, J., Vesala, T., Hakola, H., Levula, J., Pohja, T., Herrmann, F., Auld, J., Mesarchaki, E., Song, W., Yassaa, N., Nölscher, A., Johnson, A. M., Custer, T., Sinha, V., Thieser, J., Pouvesle, N., Taraborrelli, D., Tang, M. J., Bozem, H., Hosaynali-Beygi, Z., Axinte, R., Oswald, R., Novelli, A., Kubistin, D., Hens, K., Javed, U., Trawny, K., Breitenberger, C., Hidalgo, P. J., Ebben, C. J., Geiger, F. M., Corrigan, A. L., Russell, L. M., Ouwersloot, H. G., Vilà-Guerau de Arellano, J., Ganzeveld, L., Vogel, A., Beck, M., Bayerle, A., Kampf, C. J., Bertelmann, M., Köllner, F., Hoffmann, T., Valverde, J., González, D., Riekkola, M.-L., Kulmala, M., and Lelieveld, J.: The summertime Boreal forest field measurement intensive (HUMPPA-COPEC-2010): an overview of meteorological and chemical influences, Atmos. Chem. Phys., 11, 10599–10618,, 2011. 

Visser, S., Slowik, J. G., Furger, M., Zotter, P., Bukowiecki, N., Canonaco, F., Flechsig, U., Appel, K., Green, D. C., Tremper, A. H., Young, D. E., Williams, P. I., Allan, J. D., Coe, H., Williams, L. R., Mohr, C., Xu, L., Ng, N. L., Nemitz, E., Barlow, J. F., Halios, C. H., Fleming, Z. L., Baltensperger, U., and Prévôt, A. S. H.: Advanced source apportionment of size-resolved trace elements at multiple sites in London during winter, Atmos. Chem. Phys., 15, 11291–11309,, 2015. 

Yan, C., Nie, W., Äijälä, M., Rissanen, M. P., Canagaratna, M. R., Massoli, P., Junninen, H., Jokinen, T., Sarnela, N., Häme, S. A. K., Schobesberger, S., Canonaco, F., Yao, L., Prévôt, A. S. H., Petäjä, T., Kulmala, M., Sipilä, M., Worsnop, D. R., and Ehn, M.: Source characterization of highly oxidized multifunctional compounds in a boreal forest environment using positive matrix factorization, Atmos. Chem. Phys., 16, 12715–12731,, 2016. 

Yttri, K. E., Simpson, D., Nøjgaard, J. K., Kristensen, K., Genberg, J., Stenström, K., Swietlicki, E., Hillamo, R., Aurela, M., Bauer, H., Offenberg, J. H., Jaoui, M., Dye, C., Eckhardt, S., Burkhart, J. F., Stohl, A., and Glasius, M.: Source apportionment of the summer time carbonaceous aerosol at Nordic rural background sites, Atmos. Chem. Phys., 11, 13339–13357,, 2011.  

Zhang, Q., Worsnop, D. R., Canagaratna, M. R., and Jimenez, J. L.: Hydrocarbon-like and oxygenated organic aerosols in Pittsburgh: insights into sources and processes of organic aerosols, Atmos. Chem. Phys., 5, 3289–3311,, 2005. 

Zhang, Q., Jimenez, J. L., Canagaratna, M., Allan, J., Coe, H., Ulbrich, I., Alfarra, M., Takami, A., Middlebrook, A., Sun, Y., Dzepina, K., Dunlea, E. J., Docherty, K. S., DeCarlo, P. F., Salcedo, D., Onasch, T., Borrmann, S., Weimer, S., Demerjian, K., Williams, P., Bower, K., Bahreini, R., Cottrell, L., Griffin, R., Rautiainen, J., Sun, J. Y., Zhang, Y. M., and Worsnop, D.: Ubiquity and dominance of oxygenated species in organic aerosols in anthropogenically-influenced Northern Hemisphere midlatitudes, Geophys. Res. Lett., 34, L13801,​​​​​​​, 2007. 

Zhang, Q., Jimenez, J. L., Canagaratna, M. R., Ulbrich, I. M., Ng, N. L., Worsnop, D. R., and Sun, Y.: Understanding atmospheric organic aerosols via factor analysis of aerosol mass spectrometry: a review, Anal. Bioanal. Chem., 401, 3045–3067,, 2011. 

Zhang, Y., Favez, O., Petit, J.-E., Canonaco, F., Truong, F., Bonnaire, N., Crenn, V., Amodeo, T., Prévôt, A. S. H., Sciare, J., Gros, V., and Albinet, A.: Six-year source apportionment of submicron organic aerosols from near-continuous highly time-resolved measurements at SIRTA (Paris area, France), Atmos. Chem. Phys., 19, 14755–14776,, 2019. 

Short summary
In many locations worldwide aerosol particles have been shown to be made up of organic aerosol (OA). The boreal forest is a region where aerosol particles possess a high OA mass fraction. Here, we studied OA composition using the longest time series of OA composition ever obtained from a boreal environment. For this purpose, we tested a new analysis framework and discovered that most of the OA was highly oxidized, with strong seasonal behaviour reflecting different sources in summer and winter.
Final-revised paper