Source identification of atmospheric organic vapors in two European pine forests: Results from Vocus PTR-TOF observations

Atmospheric organic vapors play essential roles in the formation of secondary organic aerosol. Source identification of these vapors is thus fundamental to understanding their emission sources and chemical evolution in the atmosphere and their further impact on air quality and climate change. In this study, a Vocus proton-transfer-reaction time-of-flight mass spectrometer (PTR-TOF) was deployed in two forested environments, the Landes forest in southern France and the boreal forest in southern Finland, to measure atmospheric organic vapors, including both volatile organic compounds (VOCs) and their oxidation products. For the first time, we performed binned positive matrix factorization (binPMF) analysis on the complex mass spectra acquired with the Vocus PTR-TOF and identified various emission sources as well as oxidation processes in the atmosphere. Based on separate analysis of low- and high-mass ranges, fifteen PMF factors in the Landes forest and nine PMF factors in the Finnish boreal forest were resolved, showing a high similarity between the two sites. Factors representing monoterpenes dominate the biogenic VOCs in both forests, with smaller contributions from the isoprene factors and sesquiterpene factors. In particular, various terpene reaction products were separated into individual PMF factors with varying oxidation degrees, such as lightly oxidized compounds from both monoterpene and sesquiterpene oxidations, monoterpene-derived organic nitrates, and monoterpene more oxidized compounds. These factors display similar mass profiles and diurnal variations between the two sites, revealing similar terpene reaction pathways in these forests.
With the distinct characteristics of VOCs and oxygenated VOCs measured by the Vocus PTR-TOF, this study identified various primary emission sources and secondary oxidation processes of atmospheric organic vapors in the European pine forests, providing a more comprehensive understanding of gas-phase atmospheric chemistry.
https://doi.org/10.5194/acp-2020-648 Preprint. Discussion started: 16 July 2020 © Author(s) 2020. CC BY 4.0 License.

Therefore, the identification of these organic vapors and their sources is fundamental for understanding the effects of atmospheric aerosols on climate change and air quality (Schell et al., 2001; Maria et al., 2004). Large amounts of VOCs with varying physicochemical properties are emitted from both biogenic and anthropogenic sources (Friedrich et al., 1999; Kesselmeier et al., 1999), and their oxidation processes in the atmosphere can lead to the formation of thousands of structurally distinct products containing multiple functional groups (Hallquist et al., 2009; Wennberg et al., 2018). Due to the enormous challenge in characterizing these organic vapors at the molecular level, knowledge of their sources or formation pathways has remained lacking.
Globally, SOA production from biogenic sources is much larger than that from anthropogenic sources (Tsigaridis and Kanakidou, 2003). Terpenes, a group of highly reactive gases that typically contain one or more C=C double bonds and include isoprene, monoterpenes, and sesquiterpenes, make up a major fraction of biogenic VOCs (Guenther et al., 1995). In the atmosphere, they react with various oxidants, i.e., hydroxyl radical (OH), ozone (O3), and nitrate radical (NO3), and produce a large variety of oxygenated molecules. Isoprene is the most emitted biogenic VOC on the global scale but has a relatively small SOA yield (Ahlberg et al., 2017; McFiggans et al., 2019). Monoterpenes are important sources of SOA (Ehn et al., 2014; Zhang et al., 2018) and their oxidation processes have been found to play important roles in new particle formation (Kirkby et al., 2016; Simon et al., 2020). High ambient concentrations of monoterpenes have been observed in numerous pine forests (Hakola et al., 2012; Noe et al., 2012; Bourtsoukidis et al., 2014). While the concentrations of sesquiterpenes are generally much smaller than those of isoprene and monoterpenes (Sakulyanontvittaya et al., 2008; Sindelarova et al., 2014), sesquiterpenes could contribute significantly to SOA formation because of their reactivity and high aerosol yields (Calogirou et al., 1999; Khan et al., 2017). Previous studies found that sesquiterpenes contributed to the O3 and OH reactivity in forest environments (Kim et al., 2011; Hellén et al., 2018).
The recently developed Vocus proton-transfer-reaction time-of-flight mass spectrometer (PTR-TOF) enables the real-time detection of both VOCs and their oxidation products. With a new chemical ionization source called Vocus, the instrument has a significantly improved detection efficiency for product ions compared with conventional PTR instruments (Krechmer et al., 2018). Based on a laboratory comparison of different chemical ionization techniques, Riva et al. (2019) revealed that the Vocus PTR-TOF is sensitive to a large range of oxygenated VOCs. With the deployment of a Vocus PTR-TOF in the French Landes forest, Li et al. (2020) observed various terpenes and terpene oxidation products, including a range of organic nitrates.
The state-of-the-art capability of the Vocus PTR-TOF to detect hundreds to thousands of molecules brings a great challenge: analyzing a complicated dataset in which emission sources and atmospheric physical and chemical processes are mixed together. Analysis based solely on individually identified compounds cannot give the full picture of the measurements. Factor analytical techniques, e.g., positive matrix factorization (PMF), have been utilized to extract information from mass spectrometer data by resolving co-varying signals with common sources or atmospheric processes into a single factor (Paatero and Tapper, 1994). For example, PMF analysis has been widely applied by the research community using aerosol mass spectrometers to identify multiple primary organic aerosol sources and SOA aging processes (Lanz et al., 2007; Ulbrich et al., 2009; Zhang et al., 2011). Yan et al. (2016) successfully applied PMF to unit-mass-resolution (UMR) nitrate ion-based chemical ionization mass spectrometer (NO3-CIMS) data to differentiate mainly monoterpene highly oxygenated organic molecules (HOMs) formed from different formation pathways in the boreal forest. The application of PMF to high-resolution (HR) NO3-CIMS data by Massoli et al. (2018) identified more HOM factors at an isoprene-dominated forest site in Alabama, USA. Recently, mass spectral binning combined with PMF (binPMF) was proposed as a novel and simple method for analyzing high-resolution mass spectra datasets (Zhang et al., 2019a).
This approach divides the full mass spectra into small bins as input data to PMF, thus avoiding the time-consuming and complicated peak identification. Zhang et al. (2019b) further applied binPMF to sub-ranges of ambient NO3-CIMS mass spectra and separated more meaningful factors related to chemical processes yielding HOMs.
In this work, we present the first factor analysis on Vocus PTR-TOF datasets to identify and apportion the contribution of different sources and formation pathways to atmospheric organic vapors. The measurements were conducted in two forest ecosystems in Europe, the French Landes forest and the boreal forest in southern Finland. Due to orders of magnitude differences in the signal intensities of ions between the lower and higher mass ranges, we divided the mass spectra into two sub-ranges (51-200 Th and 201-320 Th) and performed binPMF analysis on these ranges separately. The resolved factors were linked to possible sources or chemical processes by examining their mass profiles, time series, diurnal cycles, and correlation with molecular markers. Spatial comparisons are discussed for the common sources apportioned in both forests.

Site description and measurement period
The measurement data were obtained during summertime in two forest environments in Europe, the Landes forest in southwestern France and the boreal forest research station SMEAR (Station for Measuring Forest Ecosystem-Atmosphere Relations) Ⅱ in southern Finland. The field campaign in the Landes forest was conducted from 8 to 20 July 2018 as part of the Characterization of Emissions and Reactivity of Volatile Organic Compounds in the Landes forest (CERVOLAND) campaign.
An overview of the Vocus PTR-TOF measurements in the Landes forest has been presented earlier by Li et al. (2020). The ambient observations at the SMEAR Ⅱ station were performed from 18 June to 18 July 2019.
The Landes forest (44°29'N, 0°57'W) is the largest man-made pine forest in Europe, consisting mainly of maritime pine trees (Pinus pinaster Aiton). The sampling site is situated at the European Integrated Carbon Observation System (ICOS) station at Bilos. The nearest urban area of the Bordeaux metropole is around 40 km to the northwest. A more detailed description of the measurement site can be found elsewhere (Kammer et al., 2018; Bsaibes et al., 2020; Li et al., 2020). Ambient meteorological parameters (temperature, relative humidity (RH), wind speed and direction, solar radiation, and pressure) and mixing ratios of nitrogen oxides (NO and NO2) and O3 were continuously monitored at the station throughout the campaign.
The SMEAR Ⅱ station (61°51'N, 24°17'E) is located in a boreal mixed-coniferous forest in Hyytiälä, southern Finland (Hari and Kulmala, 2005). The site is dominated by a rather homogeneous Scots pine (Pinus sylvestris L.) stand and represents a rural background measurement station. The nearest large city, Tampere, located about 60 km to the southwest, has approximately 200 000 inhabitants. The station is equipped with extensive facilities to measure forest ecosystem-atmosphere interactions. Ambient meteorological parameters (i.e., global radiation, UVA, UVB, temperature, RH, pressure, and wind speed and direction), mixing ratios of various trace gases (i.e., carbon dioxide, carbon monoxide, sulfur dioxide, NOx, and O3), and particle concentration and size distribution are continuously measured at the station.

Instrumentation
A Vocus PTR-TOF was deployed in both forest ecosystems to characterize atmospheric organic vapors. Equipped with a new chemical ionization source with a low-pressure reagent-ion source and focusing ion-molecule reactor (FIMR), the Vocus PTR-TOF is able to measure organic vapors with a wide range of volatilities (Krechmer et al., 2018; Riva et al., 2019; Li et al., 2020). A quadrupole radio frequency (RF) field inside the FIMR focuses ions to the central axis and improves the detection efficiency of product ions. Compared with conventional PTR instruments, the sensitivity and detection efficiency of the Vocus PTR-TOF are significantly improved (detection limit < 1 pptv). With a high water mixing ratio (10%-20% v/v) in the FIMR, the instrument shows no humidity dependence for sensitivity. More instrumental details have been provided elsewhere (Krechmer et al., 2018; Li et al., 2020).
During both campaigns, we operated the Vocus ionization source at a pressure of 1.5 mbar. Sample air was drawn in through a ~1-m-long PTFE tube (10 mm o.d., 8 mm i.d.). A sample air flow of 4.5 L min⁻¹ was used to reduce inlet wall losses and sampling delay. Around 100-150 ccm of this flow was sampled into the Vocus and the remainder was directed to the exhaust. The mass resolving power of the long TOF mass analyzer was 12 000-13 000 m Δm⁻¹ during our measurements.
Data were recorded with a time resolution of 5 s. During the campaign in the Landes forest, background checks were automatically performed every hour using ultra-high-purity nitrogen (UHP N2). The instrument was calibrated twice a day using a mixture of terpenes (α-/β-pinene+limonene; p-cymene). For measurements at the SMEAR Ⅱ station, background measurements by injection of zero air using a built-in active carbon filter and quantitative calibrations with a multi-component standard cylinder were automatically conducted every three hours. All the m/z ratios mentioned in this work include the contribution from the charger ion (H+, mass of 1 Th) unless stated otherwise.
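As a rough plausibility check on the inlet design described above, the residence time of sample air in the tube can be estimated from the stated tube dimensions and flow. This is a minimal sketch only: the function name is illustrative, and it assumes ideal plug flow and treats 4.5 L min⁻¹ as the volumetric flow at inlet conditions, neither of which is stated in the text.

```python
import math


def inlet_residence_time_s(length_m, inner_diameter_mm, flow_l_per_min):
    """Plug-flow residence time (s) of air in a cylindrical tube (illustrative helper)."""
    radius_cm = inner_diameter_mm / 10.0 / 2.0        # mm -> cm, diameter -> radius
    volume_cm3 = math.pi * radius_cm ** 2 * length_m * 100.0
    flow_cm3_per_s = flow_l_per_min * 1000.0 / 60.0   # L/min -> cm3/s
    return volume_cm3 / flow_cm3_per_s


# Values from the text: ~1 m tube, 8 mm inner diameter, 4.5 L/min sample flow.
residence_s = inlet_residence_time_s(1.0, 8.0, 4.5)
print(f"Estimated inlet residence time: {residence_s:.2f} s")
```

Under these assumptions the estimate is below one second, i.e., shorter than the 5 s time resolution of the recorded data.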

binPMF data preparation and analysis
As described by Zhang et al. (2019a), binPMF divides the mass spectra into small bins and then takes advantage of PMF analysis to separate different sources or formation processes. binPMF allows utilization of the high-resolution information of the complex mass spectra without the time-consuming and potentially error-prone steps of peak identification and peak fitting before the factorization. Selected peaks of interest can be analyzed after binPMF, based on the output factors. PMF assumes that factor profiles are constant and unique, and that the measured signal of a chemical component is a linear combination of different factors. This approach does not require a priori information about the factors. The detailed working principle of PMF has been provided in numerous previous studies (Paatero and Tapper, 1994; Zhang et al., 2011; Yan et al., 2016).
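The linear-combination assumption can be illustrated with a small forward model. The sketch below uses synthetic numbers and an illustrative function name (not taken from PET or any PMF package); it only shows how constant factor profiles and time-varying factor contributions combine into the modeled signal, not how PMF solves for them.

```python
def mix_factors(contributions, profiles):
    """Return the modeled signal matrix (time x bin) as sum_k G[i][k] * F[k][j]."""
    n_bins = len(profiles[0])
    return [
        [sum(g * profile[j] for g, profile in zip(row, profiles)) for j in range(n_bins)]
        for row in contributions
    ]


# Synthetic example: two factor profiles over three mass bins, each summing to 1.
profiles = [
    [0.7, 0.3, 0.0],   # hypothetical factor A
    [0.0, 0.2, 0.8],   # hypothetical factor B
]
# Factor contributions at three time steps (rows) for factors A and B (columns).
contributions = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 2.0],
]

modeled = mix_factors(contributions, profiles)
for row in modeled:
    print([round(value, 3) for value in row])
```

Because each synthetic profile sums to one, the total modeled signal at a time step equals the sum of the factor contributions at that time step.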
To prepare the data and error matrices for PMF input, the Vocus PTR-TOF data were processed using the software package "Tofware" (v3.2.0; Tofwerk), which runs in the Igor Pro environment (WaveMetrics, OR, USA). The detailed data processing routines have been presented elsewhere (Stark et al., 2015). Signals were averaged over 30 min for data processing.
Unlike traditional UMR or HR fitting of the mass spectra, in binPMF analysis the mass spectra were divided into small bins after mass calibration. Due to the greater mass resolving power of the TOF mass analyzer compared with former binPMF studies (Zhang et al., 2019a, 2019b), a bin width of 0.01 Th was applied in this study. At a nominal mass N, signals between N-0.15 and N+0.35 Th were included for binning. The error matrix was calculated to include uncertainty from counting statistics following a Poisson distribution and instrument electronic noise, as described by Yan et al. (2016) and Zhang et al. (2019a). The electronic noise was estimated as the median of the standard deviation of binned noise signals between two nominal masses, with the noise range between N+0.4 and N+0.6 Th.
Figure 1 shows the average mass spectra of the measurements in the Landes forest as an example. Since the signal intensity of larger molecules is generally much lower than that of low-mass molecules, we divided the mass spectra into two sub-ranges, the low mass range (51-200 Th) and the high mass range (201-320 Th). Factor analysis was separately performed on these two sub-ranges using an Igor-based open-source PMF Evaluation Tool (PET; http://cires1.colorado.edu/jimenez-group/wiki/index.php/PMF-AMS_Analysis_Guide#PMF_Evaluation_Tool_Software). We ran the PMF up to ten factors for both sub-ranges. For the low mass range of 51-200 Th, the signals at m/z 81 Th (C6H8H+, monoterpene fragment) and 137 Th (C10H16H+, monoterpenes) were markedly higher than the others. In the initial PMF results, the mass profiles of all resolved factors were dominated by these ions. Therefore, the major mass bins of these ions were excluded from further PMF analysis, but their corresponding isotopes were retained, effectively downweighting their contributions to the PMF result.
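The binning window and the two error terms described above can be sketched as follows. The bin width and mass windows are taken from the text; the function names, the synthetic noise values, and in particular the simple addition of the Poisson and electronic-noise terms are assumptions made for illustration. The exact expression should be taken from Yan et al. (2016) and Zhang et al. (2019a).

```python
import math
import statistics

BIN_WIDTH_TH = 0.01      # bin width from the text
LOWER_OFFSET_TH = -0.15  # signal window starts at N - 0.15 Th
UPPER_OFFSET_TH = 0.35   # signal window ends at N + 0.35 Th


def bin_edges(nominal_mass):
    """Edges of the signal bins around one nominal mass N."""
    n_bins = round((UPPER_OFFSET_TH - LOWER_OFFSET_TH) / BIN_WIDTH_TH)
    return [round(nominal_mass + LOWER_OFFSET_TH + i * BIN_WIDTH_TH, 2)
            for i in range(n_bins + 1)]


def poisson_sigma(signal_cps, averaging_time_s):
    """Counting-statistics uncertainty (cps) of a count rate averaged over a time window."""
    return math.sqrt(max(signal_cps, 0.0) / averaging_time_s)


def electronic_noise(noise_bin_series):
    """Median over noise bins of each bin's standard deviation in time."""
    return statistics.median([statistics.pstdev(series) for series in noise_bin_series])


def bin_uncertainty(signal_cps, averaging_time_s, sigma_noise):
    """Assumed combination: Poisson term plus electronic noise term."""
    return poisson_sigma(signal_cps, averaging_time_s) + sigma_noise


edges = bin_edges(137)
print(len(edges) - 1, edges[0], edges[-1])

# Synthetic noise bins (three bins, two time steps each), for illustration only.
sigma_noise = electronic_noise([[0.0, 2.0], [1.0, 1.0], [0.0, 4.0]])
print(bin_uncertainty(1800.0, 1800.0, sigma_noise))
```

With a 0.01 Th bin width and a window from N-0.15 to N+0.35 Th, each nominal mass contributes 50 signal bins in this sketch.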
To quantify the relative contribution of different factors, however, the signals of these removed mass bins were counted back into their corresponding factors. More details can be found in Sect. 4.4.

Dataset overview
Figure 2 shows the temporal behaviors of temperature, global radiation, concentrations of O3 and NOx, and concentrations of isoprene and monoterpenes in the Landes forest and at the SMEAR Ⅱ station. In the Landes forest, the weather was mainly sunny during the observation period (global radiation > 400 W m⁻²), indicating strong photochemical activity. The air mass in the forest was largely influenced by local sources, with wind speeds below the canopy lower than 1 m s⁻¹ over the whole campaign. The O3 concentration fluctuated dramatically between day and night, with the average daytime concentration peaking at up to 50 ppb and the average nighttime level falling below 2 ppb. The low O3 concentration at night was probably to some extent caused by its titration by monoterpenes (Fig. 2a; Kammer et al., 2018, 2020). The Landes forest is known for strong monoterpene emissions (Simon et al., 1994). During our measurements, the average mixing ratios of isoprene and monoterpenes were 0.6 ppb and 6.0 ppb, respectively. More details about this dataset can be found in Li et al. (2020). All data in the Landes forest are reported in local time and all data at the SMEAR Ⅱ station in Finnish winter time (both equal to UTC + 2).
During the measurements at the SMEAR Ⅱ station, 84% (26 out of 31) of the days had strong photochemistry (global radiation > 400 W m⁻²), with the rest being cloudy days. The diurnal variation in O3 concentration was not as dramatic as that in the Landes forest. In the daytime, the O3 concentration sometimes reached up to 50 ppb. At night, the O3 level still largely remained high, above 20 ppb, in contrast to the observations in the Landes forest. A possible explanation is less nighttime O3 consumption by terpenes at the SMEAR Ⅱ station. On average, the mixing ratios of isoprene and monoterpenes were 0.2 ppb and 0.8 ppb, respectively, during the measurements, much lower than those in the Landes forest.

Choice of PMF solution and factor interpretation
To interpret the PMF results, the most critical decision is to choose the best number of factors. More factors introduce more degrees of freedom to explain variations in the data, but too many factors may cause splitting of real factors and lead to mathematical artifacts without physical meaning (Ulbrich et al., 2009). The factor interpretation results in this work are summarized in Table 1. In the factor name, L means the Landes forest and S means the SMEAR Ⅱ station.
For the low mass range of the Landes forest dataset, the Q/Qexp varied from 15.5 to 6.0 for two to ten factors (Q is the total sum of the squares of the scaled residuals for PMF solutions). A large Q/Qexp indicates underestimation of the errors or high residuals for some bins that cannot be simply modeled by the solution (Ulbrich et al., 2009). After seven factors, increasing the factor number does not significantly decrease the Q/Qexp. The optimal solution of seven factors was chosen after a detailed evaluation following the procedures proposed by Ulbrich et al. (2009) and Zhang et al. (2011). Figure S1 shows the distribution of scaled residuals as a function of m/z. For some bins the residuals are still high. The seven factors include Factor L1 closely related to the C4H8H+ ion, Factor L2 attributed to a plume event occurring on a single night during the campaign, Factor L3 mainly containing lightly oxidized compounds with six or seven carbon atoms ("C6" or "C7"), Factor L4 representing monoterpenes, Factor L5 indicative of isoprene and its oxidation products, Factor L6 identified as an unknown source with large contributions from unknown peaks, and Factor L7 dominated by monoterpene lightly oxidized compounds. The direct comparison of the mass spectra, time series, and diurnal cycles of the six-factor and eight-factor solutions is shown in Fig. S2 and Fig. S3. In the six-factor case, the C4H8H+ ion-related factor cannot be separated. With eight-factor results, the factor representing isoprene and its oxidation products is split into two components with similar time series.
For the high mass range of the Landes forest dataset, the Q/Qexp decreased from 2.5 to 0.9 for two to ten factors. After evaluation, we chose the eight-factor solution to explain the data. The Q/Qexp value of the eight-factor solution was 1.1 and the decreasing trend in Q/Qexp clearly slowed down after eight factors. The distribution of scaled residuals as a function of m/z for the eight-factor solution is shown in Fig. S4. The eight factors are interpreted as Factor L8 dominated by lightly oxygenated compounds containing 13 carbon atoms ("C13"), Factor L9 attributed to a plume event occurring on a single night during the campaign, Factor L10 mainly related to sesquiterpene lightly oxidized compounds, Factor L11 representing more oxidized products mainly from monoterpene oxidations, Factor L12 indicating sesquiterpenes, Factor L13 largely composed of monoterpene-derived organic nitrates, Factor L14 mainly containing oxidized compounds with twelve, fourteen, or sixteen carbon atoms ("C12", "C14", or "C16"), and Factor L15 as an unknown source largely contributed by siloxane compounds. Figure S5 and Figure S6 display the mass spectra, time series, and daily variations of the seven-factor and nine-factor solutions. In the seven-factor case, monoterpene more oxidized products and monoterpene-derived organic nitrates are mixed together into a single factor. In the nine-factor solution, however, the unknown factor mainly composed of siloxane compounds is split into two factors with similar mass profiles and similar diurnal trends.
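The Q/Qexp diagnostic quoted in this section can be sketched as follows. Q is the sum of squared scaled residuals, as defined above. For Qexp the sketch assumes the commonly used degrees-of-freedom approximation m·n − p·(m + n) for an m × n data matrix and p factors, which is not stated explicitly in this text. The function name and the residual values are illustrative.

```python
def q_over_qexp(scaled_residuals, n_factors):
    """Q/Qexp for a matrix of scaled residuals (time x bin) and a given factor number."""
    n_rows = len(scaled_residuals)
    n_cols = len(scaled_residuals[0])
    q = sum(r * r for row in scaled_residuals for r in row)
    # Assumed approximation of Qexp: degrees of freedom of the fitted data.
    q_exp = n_rows * n_cols - n_factors * (n_rows + n_cols)
    return q / q_exp


# Synthetic check: 10 time steps x 20 bins, every scaled residual equal to 1.
residuals = [[1.0] * 20 for _ in range(10)]
ratio = q_over_qexp(residuals, n_factors=2)
print(round(ratio, 3))
```

In practice the ratio would be evaluated for each candidate factor number, and the point at which it stops decreasing appreciably is one of several criteria for choosing a solution.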
For the SMEAR Ⅱ dataset, the optimal solutions of five and four factors were chosen for the low and high mass ranges, respectively. The Q/Qexp varied from 7.2 to 2.5 for two to ten factors in the low mass range and from 2.0 to 1.0 for two to ten factors in the high mass range. The five factors for the low mass range are identified as Factor S1 (C4H8H+ ion-related), Factor S2 (monoterpenes), Factor S3 (lightly oxidized compounds with six to nine carbon atoms), Factor S4 (isoprene and its oxidation products), and Factor S5 (monoterpene lightly oxidized compounds). The mass spectra, time series, and diurnal profiles of the four-factor and six-factor solutions for the low mass range are presented in Fig. S7 and Fig. S8. For the four-factor solution, monoterpene lightly oxidized products are not separated as a single factor and are mixed into the others. In the six-factor case, the factor indicative of monoterpene lightly oxidized products is split into two factors. The four factors for the high mass range include Factor S6 (sesquiterpene lightly oxidized products), Factor S7 (sesquiterpenes), Factor S8 (more oxidized compounds), and Factor S9 (unknown source). The direct comparison of the mass spectra, time series, and diurnal variations of the three-factor and five-factor solutions is shown in Fig. S9 and Fig. S10. The three-factor solution does not identify a factor representing sesquiterpenes. In the five-factor case, the factor of unknown source mainly contributed by siloxane compounds is split into two factors with similar mass profiles.
Factor L1 shows irregular diurnal variations with spiky peaks in the time series (Fig. 4b). The major bins that are largely distributed into this factor are C4H8H+ and C4H10O2H+. Factor L1 closely correlates with these fingerprint peaks. Considering the high signal intensity of the C4H8H+ ion and its large contribution to this factor, we name Factor L1 as C4H8H+ ion-related. According to the discussions by Li et al. (2020), the observation of C4H8H+ in the Landes forest can be attributed to several sources. For instance, butene, which is emitted by biogenic or anthropogenic sources (Hellén et al., 2006; Zhu et al., 2017), may contribute to the C4H8H+ signal in its protonated form. Another possible explanation is that the C4H8H+ ion is produced during the fragmentation of many VOCs in PTR instruments (Pagonis et al., 2019). Green leaf volatiles (GLVs), a group of six-carbon aldehydes, alcohols, and their esters released by plants, have been found to fragment at m/z 57 Th inside PTR instruments. Furthermore, butanol can easily lose an OH during the PTR source ionization and produce prominent C4H8H+ peaks (Spanel and Smith, 1997). Therefore, the condensation particle counters (CPCs) using butanol for aerosol measurements at the site could also be an important source of C4H8H+ ions, although the exhaust air from these instruments was filtered using a charcoal denuder.

Factor L2: A plume event
Factor L2 is identified as a plume event occurring on a single night during the campaign. As shown in Fig. 4a, the time series of this factor is characterized by much higher intensities at midnight on 9 July 2018 than on the other days. Fingerprint peaks in this factor are aromatic compounds such as C6H6H+, C7H6H+, and C6H6OH+. Factor L2 is well correlated with benzene and phenol (r² = 0.88; Fig. 5), indicating the major influence of anthropogenic sources. As mentioned above, the air masses in the forest were relatively stable during our observations, with wind speeds below the canopy < 1 m s⁻¹. Therefore, the influence of long-range regional transport on the atmosphere in the forest is expected to be minor. We speculate that Factor L2 is a result of local anthropogenic disturbances favored by the lower boundary layer height at night.
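Correlations of this kind, between a factor time series and a marker compound, can be computed as sketched below. The sketch assumes that r² denotes the squared Pearson correlation coefficient, which the text does not state explicitly; the function name and the input series are synthetic and purely illustrative.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


# Synthetic series for illustration only (not campaign data).
factor_signal = [1.0, 2.0, 3.0, 4.0]
marker_signal = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(factor_signal, marker_signal), 2))
```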

Factor L3: C6 and C7 lightly oxidized products
The diurnal cycle of Factor L3 exhibits a small morning peak at 9:00 and significantly elevated intensities during nighttime, peaking at around 22:00 (Fig. 4b). As illustrated in the mass profile of Factor L3, this factor is mainly composed of lightly oxidized compounds containing six or seven carbon atoms such as C6H10OH+, C7H10OH+, C6H10O2H+, and C7H12O2H+.
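A diurnal cycle such as the one described here can be derived from a factor time series by pooling all days and averaging per hour of day. The sketch below is one simple way to do this, assuming hourly means of 30-min data; the text does not say whether means or medians were used, and the function name and the synthetic series are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def diurnal_cycle(timestamps, values):
    """Mean value for each hour of day, pooled over all days in the series."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stamp, value in zip(timestamps, values):
        sums[stamp.hour] += value
        counts[stamp.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}


# Synthetic 30-min series over two days: signal is 1 between 22:00 and 23:00, else 0.
start = datetime(2018, 7, 8, 0, 0)
timestamps = [start + timedelta(minutes=30 * i) for i in range(96)]
values = [1.0 if stamp.hour == 22 else 0.0 for stamp in timestamps]

cycle = diurnal_cycle(timestamps, values)
print(max(cycle, key=cycle.get))
```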

Factor L4: monoterpenes
The mass profile of Factor L4 is clearly dominated by a monoterpene peak (¹³CC9H16H+) and its major fragments formed inside the instrument (i.e., ¹³CC5H8H+ and C7H8H+). As shown in Fig. 4b, the diurnal variation of this factor follows a similar pattern to that of monoterpenes. The signal intensity of the factor starts to increase at 20:00, peaks at midnight, and then drops to around the detection limit during daytime. Monoterpene emissions are mainly influenced by temperature (Hakola et al., 2006; Kaser et al., 2013). Therefore, with the continuous emissions of monoterpenes and the shallow boundary layer at night, the signal intensities of monoterpenes are observed to be elevated. The lower signal of Factor L4 in the daytime is likely a combination of enhanced atmospheric mixing after sunrise and the rapid photochemical consumption of monoterpenes. The signal of C10H16OH+ is also mostly resolved into this factor. C10H16O could be primary emissions of oxygenated monoterpenes or monoterpene oxidation products (Kallio et al., 2006; McKinney et al., 2011). Previous ambient observations have demonstrated that the atmospheric behavior of C10H16O is highly similar to that of monoterpenes.

Factor L5: isoprene and its oxidation products
The marker peaks in Factor L5 are highly dominated by isoprene and its major oxidation products in the atmosphere, i.e., C5H8H+ and C4H6OH+ (Wennberg et al., 2018). Isoprene emissions strongly depend on light intensity (Monson and Fall, 1989; Kaser et al., 2013) and generally show high concentrations during the day. Similarly, the daily variations of Factor L5 display maximum signal during daytime and minima at night.

Factor L6: unknown source
Factor L6 is characterized by increased signals in the afternoon. The major peaks in its factor profile are C6H4O2H+, C6H6O3H+, and numerous unidentified peaks with negative mass defect. As this factor is clearly separated as a single source with high signals during our observations and the molecular markers remain unidentified, we name this factor as an unknown source.

Factor L7: monoterpene lightly oxidized products
Fingerprint peaks in this factor are monoterpene oxidation products with one to three oxygen atoms, such as C9H14OH+, C10H14OH+, C10H16O2H+, and C10H16O3H+. This factor displays clear morning and evening peaks, similar to the behavior of these lightly oxidized compounds. By calculating the reaction rates of monoterpenes with OH and O3, Li et al. (2020) demonstrated that both OH- and O3-initiated oxidation processes contribute to the formation of these compounds in the Landes forest.

High mass range (201-320 Th)
The mass spectra of the eight factors identified in the high mass range are shown in Fig. 6, and their time series and daily variations in Fig. 7. Figure 5 includes the correlations of these eight factors with fingerprint molecules.

Factor L8: C13 lightly oxidized products
The mass profile of Factor L8 is characterized by high peaks of lightly oxidized compounds containing 13 carbon atoms, such as C13H18O2H+ and C13H20O3H+. Similar to C6 and C7 lightly oxidized compounds, this factor shows a morning peak at 9:00 and an evening peak at around midnight (Fig. 7b). The time series of this factor correlates well with those of Factor L3 and Factor L7 (r² = 0.64 and 0.40; Fig. S11), indicating potentially similar formation mechanisms of these lightly oxygenated compounds. Therefore, the C13 oxidized compounds are speculated to be produced through the dimer formation mechanisms of C6 and C7 peroxy radicals (Valiev et al., 2019). In addition, C13H20O3 can be directly emitted as methyl jasmonate (MeJA), a typical green leaf volatile used in plant-plant communications for defensive purposes (Cheong and Choi, 2003). However, given the similar temporal behaviors to Factor L3 and Factor L7, we conclude that these C13 lightly oxidized compounds are formed from atmospheric oxidation processes, not direct plant emissions.

Factor L9: A plume event
Similar to Factor L2, Factor L9 is characterized by much higher intensities on a single night (9 July 2018) during the campaign (Fig. 7a). Fingerprint peaks in the mass profile of Factor L9 are numerous unidentified ions. The time series of Factor L9 correlates tightly with those of Factor L2 (r² = 0.93) and the aromatic compounds C6H6 and C6H6O (r² = 0.75). Therefore, we define Factor L9 as a plume event similar to Factor L2, resolved in the high mass range.

Factor L10: sesquiterpene lightly oxidized products
The fingerprint peaks identified in this factor are C15H22OH+, C15H24OH+, C15H22O2H+, C15H24O2H+, and C15H24O3H+, which are typical reaction products from sesquiterpene oxidations (Fu et al., 2009; Yee et al., 2018). The signal intensity of this factor is generally high during nighttime, but shows an additional morning peak at 8:00. In addition to the production from sesquiterpene oxidation processes, C15H22O and C15H24O can be oxygenated sesquiterpene alcohols and aldehydes directly emitted from vegetation (Kännaste et al., 2014).

Factor L11: monoterpene more oxidized products
The mass spectrum of this factor is mainly characterized by more oxidized compounds from monoterpene oxidations such as C10H16O4H+, C10H14O5H+, C10H16O5H+, and C10H16O6H+. As shown in Fig. 5, the time series of Factor L11 shows good correlations with these compounds. Compared with monoterpene lightly oxidized compounds, the diurnal cycle of this factor shows a broad daytime distribution peaking between 14:00 and 20:00, caused by strong and complicated photochemical reactions during the day.

Factor L12: sesquiterpenes
The mass spectrum of Factor L12 is clearly dominated by a single large peak of C15H24H+, indicating the influence of sesquiterpenes. Sesquiterpene emissions from plants are found to exhibit a strong dependence on temperature. Therefore, similar to the diurnal cycle of Factor L4, this factor shows prominently enhanced signals during nighttime.
As shown in Fig. S11, Factor L12 and Factor L4 correlate quite well with each other (r² = 0.69).

Factor L13: monoterpene-derived organic nitrates
The signal intensity of this factor starts to increase in the early morning (around 7:00) and presents a distinct morning peak at 9:00. In addition, a much smaller evening peak is observed at 21:00. The daily variations of this factor are quite similar to those of monoterpene-derived organic nitrates measured in the Landes forest. Consistently, the major peaks in the factor profile are C10H15NO4H+, C10H15NO5H+, C9H13NO6H+, and C10H15NO6H+, indicating the dominant contribution of organic nitrates formed from monoterpene oxidation processes. According to the calculated reaction rates of monoterpenes with the OH radical, O3, and the NO3 radical, the large morning peak came from O3- and OH-initiated monoterpene oxidation in the presence of NOx, while for the small evening peak the additional contribution of NO3 radical-induced monoterpene oxidation should be included. C10H17NO4, C10H15NO5, C10H17NO5, and C10H15NO6 have been

Factor L15: unknown source
The mass profile of Factor L15 is predominantly characterized by high peaks of cyclic volatile methyl siloxanes (VMSs) and some unidentified peaks (Fig. 6). The major cyclic VMSs are protonated D3 siloxane, D4 siloxane, and their H3O+ cluster ions. These siloxanes have been widely used in cosmetics and personal care products (Buser et al., 2013; Yucuis et al., 2013). The diurnal cycle of this factor shows slightly higher intensity during daytime but also a high background signal at night. A similar factor has also been identified at the SMEAR Ⅱ station. More detailed discussions can be found in Sect. 4.3.2.

Low mass range (51-200 Th)
The factor profiles, time series, and diurnal cycles of the five-factor solution for the low mass range are presented in Fig. 8 and Fig. 9. Figure 10 illustrates the correlation of the five factors with various molecules.

Factor S1: C4H8H+ ion-related
Similar to the source identification in the Landes forest, a factor related to the C4H8H+ ion is clearly resolved at the SMEAR Ⅱ station. The major peaks in this factor are C4H8H+, C4H12O2H+, and C4H14O3H+. As discussed in Section 4.2.1, several sources could contribute to the detection of the C4H8H+ ion. However, at this site, the bivariate polar plot, in which concentrations are shown as a function of wind speed (WS) and wind direction (WD), indicates that high signals of C4H8H+ generally occur when the wind comes from the north (Fig. S12). North of the measurement container is a particle measurement cottage that houses several CPCs using butanol. A previous study at this station also found that C4H8H+ signals detected by PTR-TOF mainly come from butanol used by aerosol instruments (Schallhart et al., 2018). Therefore, Factor S1 at the SMEAR Ⅱ station is expected to arise mainly from butanol that is emitted by nearby aerosol instruments and fragments inside the instrument.
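The idea behind the bivariate polar plot can be sketched by averaging the signal within cells of wind speed and wind direction. The following Python snippet is a simplified illustration under stated assumptions: it uses plain binned means, whereas published polar plots are often smoothed, and the bin widths, function name, and data are hypothetical.

```python
from collections import defaultdict


def polar_bin_means(ws, wd, signal, ws_step=1.0, n_sectors=8):
    """Mean signal in each (wind-speed bin, wind-direction sector) cell.

    ws in m/s, wd in degrees clockwise from north; sector 0 is centred on north.
    """
    sector_width = 360.0 / n_sectors
    cells = defaultdict(list)
    for speed, direction, s in zip(ws, wd, signal):
        ws_bin = int(speed // ws_step)
        # Shift by half a sector so that 350 deg and 10 deg share the north cell.
        shifted = (direction % 360.0) + sector_width / 2
        sector = int(shifted // sector_width) % n_sectors
        cells[(ws_bin, sector)].append(s)
    return {cell: sum(v) / len(v) for cell, v in cells.items()}


# Hypothetical observations: two northerly samples with high signal, one southerly.
means = polar_bin_means(
    ws=[2.5, 2.2, 3.1],
    wd=[350.0, 10.0, 180.0],
    signal=[10.0, 20.0, 1.0],
)
print(means)
```

A high mean confined to the northern sectors across several wind-speed bins would be consistent with a nearby local source in that direction, as argued for the butanol-using CPCs.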

Factor S2: monoterpenes
A factor representing monoterpenes is also identified at the SMEAR Ⅱ station, with fingerprint peaks of ¹³CC5H8H+, C7H10H+, and ¹³CC9H16H+. Monoterpenes undergo some degree of fragmentation within PTR instruments, and C6H8H+ and C7H10H+ have been observed to be the major fragments of monoterpenes (Tani et al., 2003; Tani, 2013; Kari et al., 2018). The signal intensity of monoterpenes at the SMEAR Ⅱ station is much lower than that in the Landes forest.

The mass profile of Factor S3 is characterized by lightly oxygenated compounds with six to nine carbon atoms (C6-C9), such as C6H10OH+, C6H12OH+, C7H10OH+, C8H14OH+, and C9H18OH+. The signal intensity of this factor is high at night and low during daytime. As discussed in Section 4.2.1, these lightly oxygenated molecules can be directly emitted from anthropogenic and biogenic sources or formed by the oxidation of various VOC precursors (Conley et al., 2005; Pandya et al., 2006; Rantala et al., 2015; Hartikainen et al., 2018). For instance, C7H10O has been attributed to direct soil emissions (Abis et al., 2020) and to the oxidation of 1,2,4-trimethylbenzene (Mehra et al., 2020). Therefore, we expect the molecules in this factor to be either directly emitted or formed as oxidation products of forest emissions.

Factor S4: isoprene and its oxidation products
At the SMEAR Ⅱ station, a factor largely composed of isoprene and its oxidation products is also resolved. The prominent peaks in the factor profile are C5H8H+, C4H6OH+, C4H8O2H+, and C5H8O2H+. The signal intensity of this factor is around ten times lower than that of Factor L5 measured in the Landes forest. Similar to previous isoprene observations at the sampling site (Hakola et al., 2012), this factor shows a broad daytime peak and low signals at night.

Factor S5: monoterpene lightly oxidized products
Similar to Factor L7 identified in the Landes forest, this factor is characterized by major peaks of monoterpene lightly oxidized compounds, as shown in Fig. 8. The signal intensity of this factor starts to increase at 20:00 and presents an obvious morning peak at 7:00.

Figure 11 and Figure 12 present the mass spectra, time series, and daily variations of the four factors identified in the higher mass range at the SMEAR Ⅱ station. The correlation coefficients between each factor and various fingerprint compounds can be found in Fig. 10.

Factor S6: sesquiterpene lightly oxidized products
This factor is identified as sesquiterpene lightly oxidized compounds with high peaks of C14H22OH+, C14H24OH+, C15H22OH+, and C15H24OH+, similar to Factor L10 in the Landes forest. The time series of this factor show strong correlations with the lightly oxidized products of sesquiterpenes (Fig. 10; r² > 0.88).

Factor S7: sesquiterpenes
Similar to Factor L12 in the Landes forest, this factor is characterized by the large peak of C15H24H+, demonstrating the dominance of sesquiterpenes in the factor. Figure 10 shows that this factor closely correlates with monoterpenes and sesquiterpenes, with r² being 0.73 and 0.85, respectively. Compared with Factor L12, which represents sesquiterpenes in the Landes forest, the signal intensity of this factor at the SMEAR Ⅱ station is approximately three times lower. Together with the lower signals of monoterpenes and isoprene, the results indicate weaker biogenic VOC emissions in the

Factor S8: monoterpene more oxidized products including organic nitrates
Factor S8 is mainly composed of more oxidized compounds, particularly from monoterpene oxidation processes, including monoterpene-derived organic nitrates. The major peaks are shown in Fig. 11. Because it is mixed with monoterpene-derived organic nitrates, this factor of more oxidized compounds displays a small morning peak at 8:00 and generally high signals during daytime (Fig. 12). Applying non-negative matrix factorization to iodide-adduct CIMS data from the SMEAR Ⅱ station, Lee et al. (2018) found that the gas-phase organic species subgroup C6-10HyO≥7 showed distinct daytime diel trends. Yan et al. (2016) conducted source apportionment of HOMs at the SMEAR Ⅱ station and separated various HOM formation pathways, such as monoterpene ozonolysis and monoterpene oxidation initiated by the NO3 radical. Unfortunately, because the time series of monoterpene more oxidized compounds and monoterpene-derived organic nitrates are similar, these different formation mechanisms cannot be separated in this study. For example, the time series of C10H15NO5H+ correlate well with those of C10H16O4H+ and C10H16O5H+ (r² > 0.61).

Factor S9: unknown source
The marker peaks of Factor S9 are mainly cyclic volatile methyl siloxanes (VMSs), i.e., protonated D3 siloxane, D4 siloxane, and their H3O+ cluster ions, together with unidentified compounds (Fig. 11). In addition to cosmetics and personal care products, siloxanes can also be emitted by silicone oils (Schweigkofler et al., 1999), which have been widely used in instrument pumps (Gonvers et al., 1985). In this study, the time series of Factor S9 are dominated by high background signals and present a very regular diurnal cycle, with higher signal intensities during daytime and lower ones at night, which basically follows the variations in ambient temperature. Therefore, we speculate that Factor S9 is mainly caused by emissions from silicone oil pumps used by several instruments in the container, and that these emissions are influenced by daily temperature changes.

Comparison between the two forests
To give an overview of the source distributions in the two forest ecosystems, we calculated the mass fraction of each factor based on its average signal intensity. We acknowledge that this is not a perfect method to quantify the contributions of various sources and formation processes. The sensitivities of different VOCs measured by PTR instruments may vary by a factor of 2-3 (Yuan et al., 2017). Further uncertainty comes from the difficulty of converting signal intensities to atmospheric concentrations, because calibration is problematic when many unknown molecules exist in the mass spectra. The major bins at m/z 81 Th and 137 Th, which were initially excluded from the PMF analysis, were counted into their corresponding factors. Specifically, the signals of the discarded bins at m/z 81 Th and 137 Th were estimated by multiplying their isotope signals by the corresponding scale number and added to the factor representing monoterpenes. The average mass fractions of the various PMF factors in the total measured organic vapors are shown in Fig. 13.

While the atmospheric environment and ecosystem processes differ markedly between the Landes forest and the southern Finnish boreal forest, the results of this study reveal similar biogenic sources and oxidation processes in these forest environments. For instance, the biogenic VOCs at the two sites are both dominated by monoterpenes, with the average fractions of 29% in the Landes forest and at the SMEAR Ⅱ station. These two forests are both characterized by pine trees, with dominant emissions of α-pinene and β-pinene (Riba et al., 1987; Simon et al., 1994; Hellén et al., 2018). According to the PMF results, isoprene and its major oxidation products in these environments (mainly C4H6O) contribute 14% and 21% in the two ecosystems, respectively.
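The bookkeeping behind the mass fractions, including the re-inclusion of the excluded bins at m/z 81 Th and 137 Th, can be sketched as follows. This is a minimal illustration under stated assumptions and not the procedure used in this study: the "scale number" is assumed here to be the inverse of the natural abundance ratio of the singly ¹³C-substituted ion to the all-¹²C ion, contributions from other isotopes are ignored, and all function names and signal values are hypothetical.

```python
# Assumed natural 13C/12C abundance ratio (about 1.07 % 13C); not from the source.
C13_TO_C12 = 0.0107 / 0.9893


def parent_from_isotope(isotope_signal, n_carbon):
    """Estimate the all-12C ion signal from its singly 13C-substituted peak.

    For n carbon atoms the expected ratio I(M+1)/I(M) is n * C13_TO_C12,
    so the scale factor applied to the isotope signal is its inverse.
    """
    scale = 1.0 / (n_carbon * C13_TO_C12)
    return isotope_signal * scale


def mass_fractions(avg_signals):
    """Fraction of the total average signal carried by each factor."""
    total = sum(avg_signals.values())
    return {name: s / total for name, s in avg_signals.items()}


# Hypothetical average signals (arbitrary units) for three factors.
avg = {"monoterpenes": 50.0, "isoprene": 120.0, "sesquiterpenes": 10.0}

# Hypothetical isotope signals at 82 Th (six carbons) and 138 Th (ten carbons),
# scaled up and added to the monoterpene factor.
avg["monoterpenes"] += parent_from_isotope(6.0, n_carbon=6)
avg["monoterpenes"] += parent_from_isotope(9.0, n_carbon=10)

print(mass_fractions(avg))
```

Because the fractions are computed from raw signals, they inherit the sensitivity differences between compounds noted above and should be read as signal shares rather than concentration shares.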
Factors indicative of sesquiterpenes are identified in the high mass range at both sites. The average contribution of sesquiterpenes is much smaller than that of monoterpenes and isoprene. Figure 14 compares the factor profiles of the common sources identified at both sites, where the fractions of the different bins in the mass spectra of the factors are plotted. As shown in Fig. 14c, d, and f, the mass spectra of the factors indicative of monoterpenes, isoprene and its oxidation products, and sesquiterpenes match quite well between the two forests, particularly for the dominant mass bins in the factor profiles.
Terpenes undergo varying degrees of oxidation in the atmosphere and produce a large variety of organic compounds with different volatilities (Donahue et al., 2012; Ehn et al., 2014). With the sub-range PMF analysis performed in this study, terpene reaction products with varying oxidation degrees are successfully separated. The sources of monoterpene lightly oxidized products, sesquiterpene lightly oxidized products, more oxidized compounds, and monoterpene-derived organic nitrates are identified in both forests with distinct characteristics. The lightly oxygenated compounds from monoterpene or sesquiterpene reactions present similar temporal behaviors at the two sites, with a small morning peak and increased signal intensities at night. The mass spectra of these factors show high similarities between the two forests (Fig. 14e and g).
Monoterpene-derived organic nitrates are mainly characterized by a distinct morning peak at 9:00. With the active photochemical processes during daytime, more oxidized reaction products have a broad, high distribution throughout the day.
At the SMEAR Ⅱ station, more oxidized compounds are mixed with monoterpene organic nitrates and resolved into a single factor. Therefore, the mass spectra comparison between the two forests is more scattered for the factors of monoterpene-derived organic nitrates and more oxidized compounds. Overall, these common sources and their similar characteristics indicate similar atmospheric chemical processes in the two forest ecosystems.

Concluding remarks
In this study, we conducted Vocus PTR-TOF measurements in two forest environments and performed binPMF analysis on these complex mass spectra. In addition to VOC species, the Vocus PTR-TOF is able to measure large amounts of oxygenated VOCs with enhanced detection efficiency. According to the results of this work, factor analysis of Vocus PTR-TOF mass spectra separated VOC precursors and their reaction products with varying oxidation degrees into different factors. These factors showed distinct characteristics in the atmosphere. In comparison, conventional PTR instruments and gas chromatography-mass spectrometry (GC-MS) largely detect low-mass VOC precursors (Dewulf et al., 2002; de Gouw et al., 2007). Previous source apportionment studies on such datasets mainly identified primary biogenic and anthropogenic emission sources (Vlasenko et al., 2009; Patokoski et al., 2014; Baudic et al., 2016; Debevec et al., 2017; Sarkar et al., 2017; Wang et al., 2020). Recently, factorization methods have been applied to NO3-CIMS datasets to identify various atmospheric formation pathways of HOMs (Yan et al., 2016; Massoli et al., 2018; Zhang et al., 2019b). Here, for the first time, source apportionment of Vocus PTR-TOF data identified various primary emission sources and secondary formation pathways of atmospheric organic vapors, highlighting the novelty of the Vocus PTR-TOF in measuring both VOCs and oxygenated VOCs and providing new perspectives for understanding gas-phase chemical processes.
Compared with VOC species, VOC reaction products are generally present in much smaller amounts in the atmosphere. Therefore, utilizing a sub-range PMF analysis, or another similar weighting method, is particularly important for Vocus PTR-TOF observations, where differences of several orders of magnitude are expected between VOC precursors and their oxidation products. Compared with the low mass range, the average contributions of the high mass range to the total signal are significantly smaller, 2% and 9% in the Landes forest and at the SMEAR Ⅱ station, respectively. However, the sources identified in the high mass range, such as sesquiterpenes, sesquiterpene lightly oxidized products, monoterpene-derived organic nitrates, and more oxidized compounds, provide crucial insights into atmospheric physicochemical processes.
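The motivation for the sub-range analysis can be illustrated by splitting a binned spectrum at the boundary between the two mass ranges and computing each range's share of the total signal. The sketch below is illustrative only: the 200 Th boundary follows the low mass range (51-200 Th) stated above, while the function names and the example spectrum are hypothetical.

```python
def split_sub_ranges(mz, signal, boundary=200.0):
    """Split (m/z, signal) pairs into a low and a high mass range."""
    low = [(m, s) for m, s in zip(mz, signal) if m <= boundary]
    high = [(m, s) for m, s in zip(mz, signal) if m > boundary]
    return low, high


def range_share(sub_range, total_signal):
    """Fraction of the total signal carried by one sub-range."""
    return sum(s for _, s in sub_range) / total_signal


# Hypothetical average spectrum: two strong low-mass bins, two weak high-mass bins.
mz = [81.0, 137.0, 205.0, 253.0]
signal = [600.0, 380.0, 15.0, 5.0]

low, high = split_sub_ranges(mz, signal)
total = sum(signal)
print(range_share(low, total), range_share(high, total))
```

When the high mass range carries only a few percent of the signal, a single factorization over the full spectrum tends to be driven by the strongest low-mass bins, which is why each sub-range is factorized separately.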
To summarize, this study successfully performed binPMF analysis on sub-ranges of a mass spectrometry dataset acquired with a Vocus PTR-TOF in two European forest ecosystems, the Landes forest and a southern Finnish boreal forest. Similar sources and formation pathways of organic vapors were identified in the two environments, particularly for terpenes and their reaction products with varying oxidation degrees (including organic nitrates). With the broad coverage of various organic vapors measured by the Vocus PTR-TOF, this study provides a more comprehensive picture of gas-phase source identification in the European forest ecosystems, covering both primary emissions and secondary oxidation processes.
Data Availability. The time series of the measured trace gases, meteorological parameters, and the concentrations of isoprene and monoterpenes in the Landes forest and at the SMEAR Ⅱ station are available from https://doi.org/10.5281/zenodo.3946644 (Li, 2020).

Author contributions. HL and ME conceived the study. HL, MR, ST, LH, PMF, EV, and EP conducted the field measurements. HL carried out the data analysis. MC, YZ, ME, and FB participated in the discussions on data analysis. HL wrote the paper with inputs from all coauthors.