Atmospheric organic vapors in two European pine forests measured by a Vocus PTR-TOF: insights into monoterpene and sesquiterpene oxidation processes

Atmospheric organic vapors play essential roles in the formation of secondary organic aerosol. Source identification of these vapors is thus fundamental to understanding their emission sources and chemical evolution in the atmosphere and their further impact on air quality and climate change. In this study, a Vocus proton-transfer-reaction time-of-flight mass spectrometer (PTR-TOF) was deployed in two forested environments, the Landes forest in southern France and the boreal forest in southern Finland, to measure atmospheric organic vapors, including both volatile organic compounds (VOCs) and their oxidation products. For the first time, we performed binned positive matrix factorization (binPMF) analysis on the complex mass spectra acquired with the Vocus PTR-TOF and identified various emission sources as well as oxidation processes in the atmosphere. Based on separate analysis of low- and high-mass ranges, fifteen PMF factors in the Landes forest and nine PMF factors in the Finnish boreal forest were resolved, showing a high similarity between the two sites. In particular, terpenes and various terpene reaction products were separated into individual PMF factors with varying oxidation degrees, such as lightly oxidized compounds from both monoterpene and sesquiterpene oxidations, monoterpene-derived organic nitrates, and monoterpene more oxidized compounds. Factors representing monoterpenes dominated the biogenic VOCs in both forests, with smaller contributions from the isoprene factors and sesquiterpene factors. Factors of the lightly oxidized products, more oxidized products, and organic nitrates of monoterpenes/sesquiterpenes accounted for 8-12% of the measured gas-phase organic vapors in the two forests. Based on the interpretation of the results relating to oxidation processes, further insights were gained regarding monoterpene and sesquiterpene reactions.
For example, a strong relative humidity (RH) dependence was found for the behavior of sesquiterpene lightly oxidized compounds. High concentrations of these compounds occur only at high RH, yet similar behavior was not observed for monoterpene oxidation products. These findings highlight the need for further studies to delve into gas-phase atmospheric processes of monoterpenes and sesquiterpenes.
Therefore, the identification of these organic vapors and their sources is fundamental for understanding the effects of atmospheric aerosols on climate change and air quality (Schell et al., 2001; Maria et al., 2004). Large amounts of VOCs with varying physicochemical properties are emitted from both biogenic and anthropogenic sources (Friedrich et al., 1999; Kesselmeier et al., 1999), and their oxidation processes in the atmosphere can lead to the formation of thousands of structurally distinct products containing multiple functional groups (Hallquist et al., 2009; Wennberg et al., 2018). Because characterizing these organic vapors at the molecular level is an enormous challenge, knowledge of their sources and formation pathways remains limited.
Globally, SOA production from biogenic sources is much larger than that from anthropogenic sources (Tsigaridis and Kanakidou, 2003). As a group of highly reactive gases, typically with one or more C=C double bonds, terpenes make up a major fraction of biogenic VOCs, including isoprene, monoterpenes, and sesquiterpenes (Guenther et al., 1995). In the atmosphere, they react with various oxidants, i.e., hydroxyl radical (OH), ozone (O3), and nitrate radical (NO3), and produce a large variety of oxygenated molecules. Isoprene is the most emitted biogenic VOC on the global scale but has a relatively small SOA yield (Ahlberg et al., 2017; McFiggans et al., 2019). Monoterpenes are important sources of SOA (Ehn et al., 2014; Zhang et al., 2018) and their oxidation processes have been found to play important roles in new particle formation (Kirkby et al., 2016; Simon et al., 2020). High ambient concentrations of monoterpenes have been observed in numerous pine forests (Hakola et al., 2012; Noe et al., 2012; Bourtsoukidis et al., 2014). While the concentrations of sesquiterpenes are generally much smaller than those of isoprene and monoterpenes (Sakulyanontvittaya et al., 2008; Sindelarova et al., 2014), sesquiterpenes could contribute significantly to SOA formation because of their reactivity and high aerosol yields (Calogirou et al., 1999; Khan et al., 2017). Previous studies found that sesquiterpenes contributed to the O3 and OH reactivity in forest environments (Kim et al., 2011; Hellén et al., 2018).
The recently developed Vocus proton-transfer-reaction time-of-flight mass spectrometer (PTR-TOF) enables the real-time detection of both VOCs and their oxidation products. With a new chemical ionization source called Vocus, the instrument achieves a significantly higher detection efficiency for product ions than conventional PTR instruments. Based on a laboratory comparison of different chemical ionization techniques, Riva et al. (2019) showed that the Vocus PTR-TOF is sensitive to a large range of oxygenated VOCs. With the deployment of a Vocus PTR-TOF in the French Landes forest, Li et al. (2020) observed various terpenes and terpene oxidation products, including a range of organic nitrates.

Characterization of Emissions and Reactivity of Volatile Organic Compounds in the Landes forest (CERVOLAND) campaign.
An overview of the Vocus PTR-TOF measurements in the Landes forest has been presented earlier by Li et al. (2020). The ambient observations at the SMEAR Ⅱ station were performed from 18 June to 18 July 2019.
The Landes forest (44°29'N, 0°57'W) is the largest man-made pine forest in Europe, mainly populated by maritime pine trees (Pinus pinaster Aiton). The sampling site is situated at the European Integrated Carbon Observation System (ICOS) station at Bilos. The nearest urban area of the Bordeaux metropole is around 40 km to the northwest. A more detailed description of the measurement site can be found elsewhere (Kammer et al., 2018; Bsaibes et al., 2020; Li et al., 2020). Ambient meteorological parameters, including temperature, relative humidity (RH), wind speed and direction, solar radiation, and pressure, and mixing ratios of nitrogen oxides (NO and NO2) and O3 were continuously monitored at the station throughout the campaign.
The SMEAR Ⅱ station (61°51'N, 24°17'E) is located in a boreal mixed-coniferous forest in Hyytiälä, southern Finland (Hari and Kulmala, 2005). The site is dominated by a rather homogeneous Scots pine (Pinus sylvestris L.) stand and represents a rural background measurement station. The nearest large city, Tampere, located about 60 km to the southwest, has approximately 200 000 inhabitants. The station is equipped with extensive facilities to measure forest ecosystem-atmosphere interactions. Ambient meteorological parameters (i.e., global radiation, UVA, UVB, temperature, RH, pressure, and wind speed and direction), mixing ratios of various trace gases (i.e., carbon dioxide, carbon monoxide, sulfur dioxide, NOx, and O3), and particle concentration and size distribution are continuously measured at the station.

Instrumentation
A Vocus PTR-TOF was deployed in both forest ecosystems to characterize atmospheric organic vapors. Equipped with a new chemical ionization source consisting of a low-pressure reagent-ion source and a focusing ion-molecule reactor (FIMR), the Vocus PTR-TOF is able to measure organic vapors with a wide range of volatilities (Riva et al., 2019; Li et al., 2020). A quadrupole radio frequency (RF) field inside the FIMR focuses ions to the central axis and improves the detection efficiency of product ions. Compared with conventional PTR instruments, the sensitivity and detection efficiency of the Vocus PTR-TOF are significantly improved (detection limit < 1 pptv). With a high water mixing ratio (10%-20% v/v) in the FIMR, the instrument shows no humidity dependence for sensitivity. More instrumental details have been provided elsewhere (Li et al., 2020).
During both campaigns, we operated the Vocus ionization source at a pressure of 1.5 mbar. Sample air was drawn in through ~1 m of PTFE tubing (10 mm o.d., 8 mm i.d.). A sample air flow of 4.5 L min⁻¹ was used to reduce inlet wall losses and sampling delay. Around 100-150 ccm of this flow was sampled into the Vocus and the remainder was directed to the exhaust. The mass resolving power of the long TOF mass analyzer was 12 000-13 000 m Δm⁻¹ during our measurements.
Data were recorded with a time resolution of 5 s. During the campaign in the Landes forest, background checks were automatically performed every hour using ultra-high-purity nitrogen (UHP N2). The instrument was calibrated twice a day using a mixture of terpenes (α-/β-pinene+limonene; p-cymene). For measurements at the SMEAR Ⅱ station, background measurements by injection of zero air using a built-in active carbon filter and quantitative calibrations with a multi-component standard cylinder were automatically conducted every three hours. All the m/z ratios mentioned in this work include the contribution from the charge-carrying ion (H+, mass of 1 Th) unless stated otherwise.
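As an illustration of the m/z convention stated above, the following minimal sketch computes the exact m/z of a protonated ion from a neutral formula. The monoisotopic masses are standard reference values rather than values from this work, and the function name `protonated_mz` is hypothetical.

```python
import re

# Monoisotopic masses in u (standard reference values, not from the paper).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858

def protonated_mz(formula: str) -> float:
    """Return the m/z (Th) of [M+H]+ for a neutral formula such as 'C10H16O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MASS[element] * (int(count) if count else 1)
    # Add one hydrogen atom and remove one electron to form the cation.
    return mass + MASS["H"] - ELECTRON

for name in ("C6H8", "C10H16", "C15H24"):
    print(name, round(protonated_mz(name), 4))
```

For monoterpenes (C10H16) this gives an m/z of about 137.13 Th, i.e., the ion written as C10H16H+ at nominal mass 137 in the text.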

binPMF data preparation and analysis
As described by Zhang et al. (2019a), binPMF divides the mass spectra into small bins and then takes advantage of PMF analysis to separate different sources or formation processes. The binPMF allows utilization of the high-resolution information of the complex mass spectra without the time-consuming and potentially error-prone steps of peak identification and peak fitting before the factorization. Selected peaks of interest can be analyzed after binPMF, based on the output factors. PMF assumes that factor profiles are constant and unique, and that the measured signal of a chemical component is a linear combination of different factors. This approach does not require a priori information about the factors. The detailed working principle of PMF has been provided in numerous previous studies (Paatero and Tapper, 1994; Zhang et al., 2011; Yan et al., 2016).
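The linear-combination assumption can be sketched with synthetic data. The example below is not a PMF solver and does not reproduce the analysis in this work. It only builds a data matrix X from made-up factor time series G and profiles F, then evaluates the sum of squared scaled residuals Q that PMF minimizes. All sizes, values, and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_times, n_bins, n_factors = 200, 50, 3  # hypothetical matrix sizes

# Synthetic non-negative factor time series G and factor profiles F.
G = rng.uniform(0.0, 10.0, size=(n_times, n_factors))
F = rng.uniform(0.0, 1.0, size=(n_factors, n_bins))
F /= F.sum(axis=1, keepdims=True)  # normalise each profile to unit sum

sigma = np.full((n_times, n_bins), 0.05)  # assumed constant uncertainty
X = G @ F + rng.normal(0.0, sigma)        # synthetic "measured" matrix

def q_value(X, G, F, sigma):
    """Sum of squared scaled residuals, Q = sum(((X - G F) / sigma)^2)."""
    scaled = (X - G @ F) / sigma
    return float(np.sum(scaled ** 2))

Q = q_value(X, G, F, sigma)
print(Q / X.size)  # near 1 when the uncertainties match the actual noise
```

In a real analysis G and F are unknown and are found by minimizing Q under non-negativity constraints; here they are known by construction, so Q per data point stays close to one.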
To prepare the data and error matrices for PMF input, the Vocus PTR-TOF data were processed using the software package "Tofware" (v3.2.0; Tofwerk), which runs in the Igor Pro environment (WaveMetrics, OR, USA). The detailed data processing routines have been presented elsewhere (Stark et al., 2015). Signals were averaged over 30 min for data processing.
Unlike traditional UMR or HR fitting of the mass spectra, in binPMF analysis, the mass spectra were divided into small bins after mass calibration. Due to the greater mass resolving power of the TOF mass analyzer compared with former binPMF studies (Zhang et al., 2019a, 2019b), a bin width of 0.01 Th was applied in this study. At a nominal mass N, signals between N-0.15 and N+0.35 Th were included for binning. The error matrix was calculated to include uncertainty from counting statistics following a Poisson distribution and instrument electronic noise, as described by Yan et al. (2016) and Zhang et al. (2019a). The electronic noise was estimated as the median of the standard deviation of binned noise signals between two nominal masses, with the noise range between N+0.4 and N+0.6 Th.
Figure 1 shows the average mass spectra of the measurements in the Landes forest as an example. Since the signal intensity of larger molecules is generally much lower than that of low-mass molecules, we divided the mass spectra into two sub-ranges, the low mass range (51-200 Th) and the high mass range (201-320 Th). Factor analysis was separately performed on these two sub-ranges using an Igor-based open-source PMF Evaluation Tool (PET; http://cires1.colorado.edu/jimenezgroup/wiki/index.php/PMF-AMS_Analysis_Guide#PMF_Evaluation_Tool_Software). We ran the PMF up to ten factors for both sub-ranges. For the low mass range of 51-200 Th, the signals at m/z 81 Th (C6H8H+, monoterpene fragment) and 137 Th (C10H16H+, monoterpenes) were markedly higher than the others. Because PMF assumes that the data matrix can be explained by a linear combination of different factors, if even a very small fraction of these high peaks is split into a factor, they may dominate the mass profile of that factor. With the inclusion of these ions, the mass profiles of several factors were quite similar and dominated by these peaks (Fig. S1).
Therefore, the major mass bins of these ions were excluded from further PMF analysis, but their corresponding isotopes were retained, effectively downweighting their contributions to the PMF result. This simple approach removes only the main peaks of the largest signals. For the relative contribution of different factors, the signals of these removed mass bins were counted back into their corresponding factors. More details can be found in Sect. 4.4.
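The binning window and error estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tofware or PET implementation: raw samples are simply summed into bins, and the error model `a * sqrt(I / t_avg) + sigma_noise` with `a = 1` is an assumed form of a Poisson term plus electronic noise. All function names are hypothetical.

```python
import numpy as np

BIN_WIDTH = 0.01                 # Th, bin width used in the text
SIG_LOW, SIG_HIGH = -0.15, 0.35  # signal window relative to nominal mass N

def bin_nominal_mass(mz, signal, nominal):
    """Sum raw signal into 0.01 Th bins between N-0.15 and N+0.35 Th."""
    n_bins = int(round((SIG_HIGH - SIG_LOW) / BIN_WIDTH))
    edges = nominal + SIG_LOW + BIN_WIDTH * np.arange(n_bins + 1)
    binned, _ = np.histogram(mz, bins=edges, weights=signal)
    return edges, binned

def electronic_noise(noise_bins):
    """Median over bins of the temporal standard deviation of noise signals.

    noise_bins: array (n_times, n_noise_bins) taken from N+0.4 to N+0.6 Th.
    """
    return float(np.median(np.std(noise_bins, axis=0)))

def error_matrix(binned, sigma_noise, t_avg, a=1.0):
    """Assumed error model: a * sqrt(I / t_avg) + sigma_noise."""
    binned = np.clip(np.asarray(binned, dtype=float), 0.0, None)
    return a * np.sqrt(binned / t_avg) + sigma_noise

# Toy spectrum around nominal mass 137; two points fall outside the window.
mz = np.array([136.80, 137.1325, 137.1330, 137.50])
signal = np.array([1.0, 2.0, 3.0, 4.0])
edges, binned = bin_nominal_mass(mz, signal, nominal=137)
print(len(binned), binned.sum())
```

With a mass resolving power of about 12 000, a peak at 137 Th is roughly 0.011 Th wide, which is comparable to the 0.01 Th bin width chosen in the text.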

Measuring NO3 concentrations is challenging. The concentration of the NO3 radical was therefore calculated by assuming a steady state between its production from the reaction of O3 with NO2 and its removal by oxidation reactions and other losses in the atmosphere.
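The steady-state assumption can be written as [NO3] = k [O3][NO2] / L, where L is the total first-order loss frequency of NO3. The sketch below illustrates this relation only. The Arrhenius parameters are an assumed literature-style expression and should be checked against a current kinetics evaluation, and the loss frequency is left as a user-supplied input because the loss terms used in this work are not specified here.

```python
import math

def k_no2_o3(temp_k: float) -> float:
    """Rate coefficient of NO2 + O3 -> NO3 + O2 in cm3 molecule-1 s-1.

    Assumed Arrhenius expression (not taken from the paper).
    """
    return 1.2e-13 * math.exp(-2450.0 / temp_k)

def no3_steady_state(o3, no2, loss_rate, temp_k=298.0):
    """Steady-state NO3 in molecules cm-3.

    o3, no2: number concentrations in molecules cm-3.
    loss_rate: total first-order NO3 loss frequency in s-1 (user-supplied).
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive")
    return k_no2_o3(temp_k) * o3 * no2 / loss_rate

# Hypothetical inputs, for illustration only.
print(no3_steady_state(o3=7.4e11, no2=2.5e10, loss_rate=0.1))
```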
Details can be found in Allan et al. (2000) and Peräkylä et al. (2014).

Dataset overview
Figure 2 shows the temporal behaviors of temperature, global radiation, concentrations of O3 and NOx, and concentrations of isoprene and monoterpenes in the Landes forest and at the SMEAR Ⅱ station. In the Landes forest, the weather was mainly sunny during the observation period (global radiation > 400 W m⁻²), indicating strong photochemical activity. The air mass in the forest was largely influenced by local sources, with below-canopy wind speeds lower than 1 m s⁻¹ over the whole campaign. The O3 concentration fluctuated dramatically between day and night, with the average daytime concentration peaking at up to 50 ppb and the average nighttime level falling below 2 ppb (Li et al., 2020). The low O3 concentration at night was probably caused, to some extent, by its titration by monoterpenes (Fig. 2a; Kammer et al., 2018, 2020). The Landes forest is known for strong monoterpene emissions (Simon et al., 1994). During our measurements, the average mixing ratios of isoprene and monoterpenes were 0.6 ppb and 6.0 ppb, respectively. More details about this dataset can be found in Li et al. (2020). All data in the Landes forest are reported in local time and all data at the SMEAR Ⅱ station in Finnish winter time (both equal to UTC+2).
During the measurements at the SMEAR Ⅱ station, 84% (26 out of 31) of the days had strong photochemistry (global radiation > 400 W m⁻²), with the rest being cloudy days. The diurnal variation in O3 concentration was not as dramatic as that in the Landes forest. In the daytime, the O3 concentration sometimes reached up to 50 ppb. At night, the O3 level largely remained high, above 20 ppb, in contrast to the observations in the Landes forest. A possible explanation is less nighttime O3 consumption by terpenes at the SMEAR Ⅱ station. On average, the mixing ratios of isoprene and monoterpenes were 0.2 ppb and 0.8 ppb, respectively, during the measurements, much lower than those in the Landes forest.

Choice of PMF solution and factor interpretation
To interpret the PMF results, the most critical decision is to choose the best number of factors. More factors introduce more degrees of freedom to explain variations in the data, but too many factors may cause splitting of real factors and lead to mathematical artifacts without physical meaning (Ulbrich et al., 2009). The factor interpretation results in this work are summarized in Table 1. In the factor name, L means the Landes forest and S means the SMEAR Ⅱ station.
For the low mass range of the Landes forest dataset, Q/Qexp varied from 15.5 to 6.0 for two to ten factors (Q is the total sum of the squares of the scaled residuals for PMF solutions). A larger Q/Qexp indicates underestimation of the errors or high residuals for some bins that cannot be simply modeled by the solution (Ulbrich et al., 2009). After seven factors, increasing the factor number does not significantly decrease Q/Qexp (step change < 7%). The optimal solution of seven factors was chosen by evaluating the variation of Q/Qexp with factor number, the distribution of the scaled residuals for each m/z, the sum of the squares of scaled residuals, the factor mass profiles, the factor time series and diurnal cycles, and signs of split factors (Ulbrich et al., 2009; Zhang et al., 2011). Figure S2 shows the distribution of scaled residuals as a function of m/z. For some bins the residuals are still high (scaled residuals as high as ±200). The seven factors include Factor L1 closely related to the C4H8H+ ion, Factor L2 attributed to a plume event occurring on a single night during the campaign, Factor L3 mainly containing lightly oxidized compounds with six or seven carbon atoms ("C6" or "C7"), Factor L4 representing monoterpenes, Factor L5 indicative of isoprene and its oxidation products, Factor L6 identified as an unknown source with large contributions from unknown peaks, and Factor L7 dominated by monoterpene lightly oxidized compounds. Direct comparisons of the mass spectra, time series, and diurnal cycles of the six-factor and eight-factor solutions are shown in Fig. S3 and Fig. S4. In the six-factor case, the C4H8H+ ion-related factor cannot be separated. With eight factors, the factor representing isoprene and its oxidation products is split into two components with similar time series.
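The step-change criterion can be sketched as below. Only the end points (15.5 at two factors and 6.0 at ten factors) come from the text; the intermediate Q/Qexp values are made up for illustration, and the helper names are hypothetical. The expression Qexp = mn - p(m + n) is the commonly used approximation for an m x n matrix with p factors. In practice this check is only one of several diagnostics listed above.

```python
def q_expected(n_rows: int, n_cols: int, n_factors: int) -> int:
    """Approximate degrees of freedom: Qexp = m*n - p*(m + n)."""
    return n_rows * n_cols - n_factors * (n_rows + n_cols)

def relative_steps(q_ratio: dict) -> dict:
    """Fractional decrease of Q/Qexp when going from p-1 to p factors."""
    steps = {}
    for p in sorted(q_ratio):
        if p + 1 in q_ratio:
            steps[p + 1] = (q_ratio[p] - q_ratio[p + 1]) / q_ratio[p]
    return steps

def first_flat_solution(q_ratio: dict, threshold: float = 0.07) -> int:
    """Smallest p after which every further step change is below threshold."""
    steps = relative_steps(q_ratio)
    for p in sorted(q_ratio):
        later = [s for q, s in steps.items() if q > p]
        if all(s < threshold for s in later):
            return p
    return max(q_ratio)

# Hypothetical Q/Qexp values; only the values at p=2 and p=10 are from the text.
example = {2: 15.5, 3: 12.0, 4: 10.0, 5: 8.6, 6: 7.6,
           7: 6.8, 8: 6.5, 9: 6.2, 10: 6.0}
print(first_flat_solution(example))
```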
For the high mass range of the Landes forest dataset, Q/Qexp decreased from 2.5 to 0.9 for two to ten factors. After evaluation, we chose the eight-factor solution to explain the data. The Q/Qexp value of the eight-factor solution was 1.1 and the decreasing trend in Q/Qexp clearly slowed down after eight factors. The distribution of scaled residuals as a function of m/z for the eight-factor solution is shown in Fig. S5. The eight factors are interpreted as Factor L8 dominated by lightly oxygenated compounds containing 13 carbon atoms ("C13"), Factor L9 attributed to a plume event occurring on a single night during the campaign, Factor L10 mainly related to sesquiterpene lightly oxidized compounds, Factor L11 representing more oxidized products mainly from monoterpene oxidations, Factor L12 indicating sesquiterpenes, Factor L13 largely composed of monoterpene-derived organic nitrates, Factor L14 mainly containing oxidized compounds with twelve, fourteen, or sixteen carbon atoms ("C12", "C14", or "C16"), and Factor L15 as an unknown source largely contributed by siloxane compounds. Figure S6 and Figure S7 display the mass spectra, time series, and daily variations of the seven-factor and nine-factor solutions. In the seven-factor case, monoterpene more oxidized products and monoterpene-derived organic nitrates are mixed together into a single factor. In the nine-factor solution, however, the unknown factor mainly composed of siloxane compounds is split into two factors with similar mass profiles and similar diurnal trends.
For the SMEAR Ⅱ dataset, the five-factor and four-factor solutions were chosen as optimal for the low and high mass ranges, respectively. Q/Qexp varied from 7.2 to 2.5 for two to ten factors in the low mass range and from 2.0 to 1.0 for two to ten factors in the high mass range. The five factors for the low mass range are identified as Factor S1 (C4H8H+ ion-related), Factor S2 (monoterpenes), Factor S3 (lightly oxidized compounds with six to nine carbon atoms), Factor S4 (isoprene and its oxidation products), and Factor S5 (monoterpene lightly oxidized compounds). The mass spectra, time series, and diurnal profiles of the four-factor and six-factor solutions for the low mass range are presented in Fig. S8 and Fig. S9. For the four-factor solution, monoterpene lightly oxidized products are not separated as a single factor and are mixed into the others. In the six-factor case, the factor indicative of monoterpene lightly oxidized products is split into two factors. The four factors for the high mass range include Factor S6 (sesquiterpene lightly oxidized products), Factor S7 (sesquiterpenes), Factor S8 (more oxidized compounds), and Factor S9 (unknown source). Direct comparisons of the mass spectra, time series, and diurnal variations of the three-factor and five-factor solutions are shown in Fig. S10 and Fig. S11. The three-factor solution does not identify a factor representing sesquiterpenes. In the five-factor case, the factor of unknown source mainly contributed by siloxane compounds is split into two factors with similar mass profiles.
The rotational freedom of the PMF solutions was explored using the FPEAK parameter. For each of the optimal solutions, we varied the FPEAK value between -1 and +1 in steps of 0.2. For the low mass ranges of the Landes and SMEAR Ⅱ datasets, varying FPEAK did not change the factor profiles and time series much, indicating that FPEAK values from -1 to +1 did not affect the overall results of the PMF analysis. For the high mass range of the Landes measurements, we saw variations, especially in the factor profiles, when varying FPEAK. After a detailed evaluation, however, we found no evidence that solutions with FPEAK values away from zero were preferable. For the high mass range of the SMEAR Ⅱ measurements, in contrast, the solutions with positive values of FPEAK worked better than that with FPEAK = 0 in terms of factor profiles. The factor time series were similar when FPEAK values varied. For the factor profiles with positive FPEAK values, however, the factor of monoterpene more oxidized products including organic nitrates contained fewer traces of siloxanes and showed elevated fractions of the corresponding fingerprint peaks (Fig. S12). As discussed in Sect. 4.3, these siloxanes can come from cosmetics and personal care products, and from silicone oils used in instrument pumps. The temporal variations of these siloxanes differed significantly from those of monoterpene more oxidized products. After evaluation, we chose the solution with FPEAK = +0.6 for the high mass range of the SMEAR Ⅱ dataset, where siloxanes feature more in one factor.
Fig. 6. The high-resolution peak fitting was further performed on the mass profile to identify the fingerprint peaks in the factors. Fingerprint peaks are defined largely based on their distribution in the factors rather than their absolute intensity in the mass profile. The correlation map of each factor with various compounds is shown in Fig. S13.
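A correlation map of this kind can be computed as in the following minimal sketch, which returns r² between each factor time series and each compound time series. The function name and the array layout are assumptions for illustration, and the toy inputs are synthetic.

```python
import numpy as np

def r_squared_map(factors, tracers):
    """r^2 between every factor time series and every tracer time series.

    factors: array (n_times, n_factors); tracers: array (n_times, n_tracers).
    Returns an array of shape (n_factors, n_tracers).
    """
    factors = np.asarray(factors, dtype=float)
    tracers = np.asarray(tracers, dtype=float)
    n_f = factors.shape[1]
    # np.corrcoef treats rows as variables, so stack the transposed inputs.
    corr = np.corrcoef(np.vstack([factors.T, tracers.T]))
    return corr[:n_f, n_f:] ** 2

# Synthetic example: the first factor is perfectly linear in both tracers.
t = np.arange(10.0)
factors = np.column_stack([t, t ** 2])
tracers = np.column_stack([2.0 * t + 1.0, -t])
print(r_squared_map(factors, tracers).round(3))
```

Because r² is the square of the Pearson coefficient, an anticorrelated tracer (second column above) also yields a value close to one, so the sign should be checked separately when interpreting such a map.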

Low mass range (51-200 Th)
Factor L1: C4H8H+ ion-related
Factor L1 shows irregular diurnal variations with spiky peaks in the time series (Fig. 4b). The major bins that are largely distributed into this factor are C4H8H+ and C4H10O2H+. Factor L1 closely correlates with these fingerprint peaks. Considering the high signal intensity of the C4H8H+ ion and its large contribution to this factor, we name Factor L1 C4H8H+ ion-related.

Factor L2: A plume event
Factor L2 is identified as a plume event occurring on a single night during the campaign. As shown in Fig. 4a, the time series of this factor is characterized by much higher intensities at midnight of 9 July 2018 than on the other days. Fingerprint peaks in this factor are aromatic compounds such as C6H6H+, C7H6H+, and C6H6OH+. Factor L2 is well correlated with benzene and phenol (r² = 0.88; Fig. S13).

Factor L3: C6 and C7 lightly oxidized products
The diurnal cycle of Factor L3 exhibits a small morning peak at 9:00 and significantly elevated intensities during nighttime, peaking at around 22:00 (Fig. 4b). As illustrated in the mass profile of Factor L3, this factor is mainly composed of lightly oxidized compounds containing six or seven carbon atoms such as C6H10OH+, C7H10OH+, C6H10O2H+, and C7H12O2H+.

Factor L4: monoterpenes
The mass profile of Factor L4 is dominated by a monoterpene peak (13CC9H16H+) and the major fragments of monoterpenes formed inside the instrument (i.e., 13CC5H8H+ and C7H8H+). As shown in Fig. 4b, the diurnal variation of this factor follows a similar pattern to that of monoterpenes (Li et al., 2020). The signal intensity of the factor starts to increase at 20:00, peaks at midnight, and then drops to around the detection limit during daytime. Monoterpene emissions are mainly influenced by temperature (Kaser et al., 2013). Therefore, with the continuous emissions of monoterpenes and the shallow boundary layer at night, the signal intensities of monoterpenes are observed to be elevated. The signal of C10H16OH+ is also mostly resolved into this factor. C10H16O could be primary emissions of oxygenated monoterpenes or monoterpene oxidation products (Kallio et al., 2006; McKinney et al., 2011). Previous ambient observations have demonstrated that the atmospheric behavior of C10H16O is highly similar to that of monoterpenes (Li et al., 2020).

Factor L5: isoprene and its oxidation products
The marker peaks in Factor L5 are dominated by isoprene and its major oxidation products in the atmosphere, i.e., C5H8H+ and C4H6OH+ (Wennberg et al., 2018). Isoprene emissions strongly depend on light intensity (Monson and Fall, 1989; Kaser et al., 2013) and generally show high concentrations in the day. Similarly, the daily variations of Factor L5 display maximum signals during daytime and minima at night.

Factor L6: unknown source
Factor L6 is characterized by increased signals in the afternoon. The major peaks in its factor profile are C6H4O2H+, C6H6O3H+, and numerous unidentified peaks with negative mass defect. As this factor is clearly separated as a single source with high signals during our observations and the molecular markers remain unidentified, we name this factor an unknown source.

Factor L7: monoterpene lightly oxidized products
Fingerprint peaks in this factor are monoterpene oxidation products with oxygen numbers from one to three, such as C9H14OH+, C10H14OH+, C10H16O2H+, and C10H16O3H+. This factor displays clear morning and evening peaks, similar to the behavior of these lightly oxidized compounds (Li et al., 2020).

High mass range (201-320 Th)
Factor L8: C13 lightly oxidized products
The mass profile of Factor L8 is characterized by high peaks of lightly oxidized compounds containing 13 carbon atoms, like C13H18O2H+ and C13H20O3H+. Similar to C6 and C7 lightly oxidized compounds, this factor shows a morning peak at 9:00 and an evening peak at around midnight (Fig. 6b).

Factor L9: A plume event
Factor L9 is characterized by much higher intensities on a single night (9 July 2018) during the campaign (Fig. 6a). Fingerprint peaks in the mass profile of Factor L9 are numerous unidentified ions. The time series of Factor L9 correlates tightly with the aromatic compounds C6H6 and C6H6O (r² = 0.75).

Factor L10: sesquiterpene lightly oxidized products
The fingerprint peaks identified in this factor are C15H22OH+, C15H24OH+, C15H22O2H+, C15H24O2H+, and C15H24O3H+, which are typical reaction products from sesquiterpene oxidations (Fu et al., 2009; Yee et al., 2018). The signal intensity of this factor is generally high during nighttime, but shows another morning peak at 8:00. In addition to the production from sesquiterpene oxidation processes, C15H22O and C15H24O can be oxygenated sesquiterpene alcohols and aldehydes directly emitted from vegetation (Kännaste et al., 2014).

Factor L11: monoterpene more oxidized products
The mass spectrum of this factor is mainly characterized by more oxidized compounds from monoterpene oxidations such as C10H16O4H+, C10H14O5H+, C10H16O5H+, and C10H16O6H+. As shown in Fig. S13, the time series of Factor L11 shows good correlations with these compounds. Compared with monoterpene lightly oxidized compounds, the diurnal cycle of this factor shows a broad daytime distribution peaking between 14:00 and 20:00, caused by strong and complex photochemical reactions during the day.

Factor L12: sesquiterpenes
The mass spectrum of Factor L12 is clearly dominated by a single large peak of C15H24H+, indicating the influence of sesquiterpenes. Sesquiterpene emissions from plants are found to exhibit a strong dependence on temperature. Therefore, similar to the diurnal cycle of Factor L4, this factor shows prominently enhanced signals during nighttime.

Factor L13: monoterpene-derived organic nitrates
The signal intensity of this factor starts to increase in the early morning (around 7:00) and presents a distinct morning peak at 9:00. In addition, a much smaller evening peak is observed at 21:00. The daily variations of this factor are quite similar to those of monoterpene-derived organic nitrates measured in the Landes forest (Li et al., 2020). Consistently, the major peaks in the factor profile are C10H15NO4H+, C10H15NO5H+, C9H13NO6H+, and C10H15NO6H+, indicating the dominant contribution of organic nitrates formed from monoterpene oxidation processes.

Factor L15: unknown source
The mass profile of Factor L15 is predominantly characterized by high peaks of cyclic volatile methyl siloxanes (VMSs) and some unidentified peaks (Fig. 5). The major cyclic VMSs are protonated D3 siloxane, D4 siloxane, and their H3O+ cluster ions. These siloxanes have been widely used in cosmetics and personal care products (Buser et al., 2013; Yucuis et al., 2013). The diurnal cycle of this factor shows slightly higher intensity during daytime but also large background signals at night. A similar factor has also been identified at the SMEAR Ⅱ station. More detailed discussions can be found in Sect. 4.3.2.

Source identification in the southern Finnish boreal forest
The factor profiles, time series, and diurnal cycles of the five-factor solution for the low mass range are presented in Fig. 7 and Fig. 8. Figure 9 and Figure 10 present the mass spectra, time series, and daily variations of the four factors identified in the high mass range at the SMEAR Ⅱ station. The correlation coefficients between each factor and various fingerprint compounds can be found in Fig. S14.

Low mass range (51-200 Th)
Factor S1: C4H8H+ ion-related
Similar to the source identification in the Landes forest, a factor related to the C4H8H+ ion is clearly resolved at the SMEAR Ⅱ station. The major peaks in this factor are C4H8H+, C4H12O2H+, and C4H14O3H+.

Factor S2: monoterpenes
A factor representing monoterpenes is also identified at the SMEAR Ⅱ station, with fingerprint peaks of 13CC5H8H+, C7H10H+, and 13CC9H16H+. Monoterpenes undergo some degree of fragmentation within PTR instruments, and C6H8H+ and C7H10H+ have been observed to be the major fragments of monoterpenes (Tani et al., 2003; Tani, 2013; Kari et al., 2018). The signal intensity of monoterpenes at the SMEAR Ⅱ station is much lower than that in the Landes forest.

Factor S3: C6-C9 lightly oxygenated compounds
The mass profile of Factor S3 is characterized by lightly oxygenated compounds with six to nine carbon atoms (C6-C9) such as C6H10OH+, C6H12OH+, C7H10OH+, C8H14OH+, and C9H18OH+. The signal intensity of this factor shows high peaks at night and low values during daytime. These lightly oxygenated molecules can be directly emitted from anthropogenic and biogenic sources or come from oxidation processes of various VOC precursors (Conley et al., 2005; Pandya et al., 2006; Rantala et al., 2015; Hartikainen et al., 2018). For instance, C7H10O has been found in direct soil emissions (Abis et al., 2020) and in the oxidation of 1,2,4-trimethylbenzene (Mehra et al., 2020). Therefore, we expect the molecules in this factor to be either directly emitted or formed as oxidation products of forest emissions.

Factor S4: isoprene and its oxidation products
At the SMEAR Ⅱ station, a factor largely composed of isoprene and its oxidation products is also resolved. The most prominent peaks in the factor profile are C5H8H+, C4H6OH+, C4H8O2H+, and C5H8O2H+. The signal intensity of this factor is around ten times lower than that of Factor L5 measured in the Landes forest. Similar to previous isoprene observations at the sampling site (Hakola et al., 2012), this factor shows a broad daytime peak and low signals at night.

Factor S5: monoterpene lightly oxidized products
Similar to Factor L7 identified in the Landes forest, this factor is characterized by major peaks of monoterpene lightly oxidized compounds, as shown in Fig. 7. The signal intensity of this factor starts to increase at 20:00 and presents a pronounced morning peak at 7:00.

Factor S6: sesquiterpene lightly oxidized products
This factor is identified as sesquiterpene lightly oxidized compounds with high peaks of C14H22OH+, C14H24OH+, C15H22OH+, and C15H24OH+, similar to Factor L10 in the Landes forest. The time series of this factor shows strong correlations with the lightly oxidized products of sesquiterpenes (Fig. S14; r² > 0.88).

Factor S7: sesquiterpenes
Similar to Factor L12 in the Landes forest, this factor is characterized by the large peak of C15H24H+, demonstrating the dominance of sesquiterpenes in the factor. Figure S14 shows that this factor closely correlates with monoterpenes and sesquiterpenes, with r² of 0.73 and 0.85, respectively. Compared with Factor L12, which represents sesquiterpenes in the Landes forest, the signal intensity of this factor at the SMEAR Ⅱ station is approximately three times lower. Together with the lower signals of monoterpenes and isoprene, the results indicate weaker biogenic VOC emissions in the Hyytiälä boreal forest than in the Landes forest.

Factor S8: monoterpene more oxidized products including organic nitrates
Factor S8 is mainly composed of more oxidized compounds, particularly from monoterpene oxidation processes, including monoterpene-derived organic nitrates. The major peaks are shown in Fig. 9. Mixed with monoterpene-derived organic nitrates, this factor of more oxidized compounds displays a small morning peak at 8:00 and generally high signals during daytime (Fig. 10).

Factor S9: unknown source
The marker peaks of Factor S9 are mainly high cyclic volatile methyl siloxanes (VMSs) and unidentified compounds (Fig. 9), i.e., protonated D3 siloxane, D4 siloxane, and their H3O+ cluster ions. In addition to cosmetics and personal care products, siloxanes can also be emitted by silicone oils (Schweigkofler et al., 1999), which have been widely used in instrument pumps (Gonvers et al., 1985). In this study, the time series of Factor S9 is characterized by high background signals and presents a very regular diurnal cycle, with higher signal intensities during daytime and lower ones at night, which basically follows the variations in ambient temperature. Therefore, we speculate that Factor S9 is mainly caused by emissions from silicone oil pumps used by several instruments in the container, and that these emissions are influenced by daily temperature changes.

Comparison among different factors
The monoterpene factor and sesquiterpene factor correlate well with each other at both sites (Fig. 11; r² = 0.69 in the Landes forest and r² = 0.59 at the SMEAR Ⅱ station). The emissions of monoterpenes and sesquiterpenes are both strongly influenced by temperature. Their signals peak at night under the influence of the shallow boundary layer. In the daytime, the low signals of the monoterpene and sesquiterpene factors are likely caused by a combination of enhanced atmospheric mixing after sunrise and the rapid photochemical consumption of monoterpenes and sesquiterpenes. The signal of the monoterpene factor is around 15 times higher than that of the sesquiterpene factor at the SMEAR Ⅱ station, while it is around 60 times higher in the Landes forest. Previous studies found that sesquiterpene emissions from pines, spruces, and birches under normal conditions were 5-15% of total monoterpene emissions by mass (Rinne et al., 2009 and references therein), in line with our observations.
In the Landes forest, a factor of C6 and C7 lightly oxidized products (Factor L3) was resolved in the low mass range and a factor representative of C13 lightly oxidized products (Factor L7) was identified in the high mass range. Interestingly, these two factors show a close correlation with each other (r² = 0.64). C6 oxygenated compounds have been observed during the oxidation of benzene, and C7 oxygenated compounds during the oxidation of toluene (Sato et al., 2012; Zaytsev et al., 2019). These compounds can also be directly emitted from biogenic or anthropogenic sources (Conley et al., 2005; Pandya et al., 2006; Rantala et al., 2015). The temporal behaviour of Factor L7 is similar to that of Factor L3, indicating potentially similar formation pathways of these lightly oxygenated compounds. Therefore, the C13 oxidized compounds are speculated to be produced through the dimer formation mechanisms of C6 and C7 species (Valiev et al., 2019). In addition, C13H20O3 can be directly emitted as methyl jasmonate (MeJA), which is a typical green leaf volatile used in plant-plant communication for defensive purposes (Cheong and Choi, 2003). But considering the close correlation between Factor L3 and Factor L7, we conclude that these C13 lightly oxidized compounds are formed from atmospheric oxidation processes, not direct plant emissions.
Monoterpene lightly oxidized products and sesquiterpene lightly oxidized products were resolved as individual factors at both sites (Factor L7 vs. Factor L10 in the Landes forest and Factor S5 vs. Factor S6 at the SMEAR Ⅱ station). While the diurnal variations of monoterpene lightly oxidized products are similar to those of sesquiterpene lightly oxidized products, their time series do not track each other closely, suggesting that the formation pathways, or the factors influencing the atmospheric processes of monoterpenes and sesquiterpenes, probably differ. Further discussion can be found in Sect. 4.6.
In this study, the source apportionment analysis was performed separately on two subranges of the mass spectra. The same factor can therefore be identified in both subranges. For example, both Factor L2 and Factor L9 are attributed to the plume event during the measurements. The time series of Factor L2 and Factor L9 show a high correlation coefficient of 0.93 and correlate tightly with aromatic compounds, indicating the major influence of anthropogenic sources. As mentioned above, the air masses in the Landes forest were relatively stable during our observations, with below-canopy wind speeds < 1 m s⁻¹. Therefore, the influence of long-range regional transport on the atmosphere in the forest is expected to be minor. We speculate that the plume event is a result of local anthropogenic disturbances favored by the lower boundary layer height at night.

Comparison between the two forests
To give an overview of the source distributions in the two forest ecosystems, we calculated the mass fraction of each factor based on its average signal intensity. We acknowledge that this is not a perfect method to quantify the contributions of various sources and formation processes. The sensitivities of different VOCs measured by PTR instruments may vary by a factor of 2-3 (Yuan et al., 2017). Further uncertainty comes from the difficulty of converting signal intensities to atmospheric concentrations, because calibration is problematic, especially given that many unknown molecules exist in the mass spectra. The major bins at m/z 81 Th and 137 Th, which were initially excluded from the PMF analysis, were counted into their corresponding factors. Specifically, the signals of the discarded bins at m/z 81 Th and 137 Th were estimated by multiplying their isotope signals by the corresponding scale number and added to the factor representing monoterpenes. The average mass fractions of the various PMF factors in the total measured organic vapors are shown in Fig. 12.
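The isotope-based reconstruction and the mass-fraction calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function names, the approximate natural ¹³C abundance (1.07%), the binomial scale formula, and all signal values are hypothetical, and the text does not report the scale numbers that were applied.

```python
# Illustrative sketch; names, abundance value, and signals are assumptions.
C13_ABUNDANCE = 0.0107  # approximate natural 13C fraction (assumed)


def isotope_scale(n_carbon, c13=C13_ABUNDANCE):
    """Ratio of the all-12C signal to the singly 13C-substituted signal.

    From binomial statistics: P(no 13C) / P(one 13C) = (1 - p) / (n * p).
    Contributions from 2H are neglected.
    """
    return (1.0 - c13) / (n_carbon * c13)


def restore_parent_signal(isotope_signal, n_carbon):
    """Estimate an excluded parent-bin signal from its 13C isotope bin."""
    return isotope_signal * isotope_scale(n_carbon)


def mass_fractions(factor_signals):
    """Convert average factor signals into fractions of the total."""
    total = sum(factor_signals.values())
    return {name: value / total for name, value in factor_signals.items()}


# Hypothetical average factor signals (arbitrary units).
factors = {"monoterpenes": 20.0, "isoprene": 30.0, "other": 50.0}
# Add the excluded bins back to the monoterpene factor:
# m/z 81 (C6H8H+, 6 carbons) and m/z 137 (C10H16H+, 10 carbons).
factors["monoterpenes"] += restore_parent_signal(1.0, n_carbon=6)
factors["monoterpenes"] += restore_parent_signal(2.0, n_carbon=10)
fractions = mass_fractions(factors)
```

Under these assumptions the scale number depends only on the carbon count of the ion, so the same helper covers both excluded bins.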
While the atmospheric environment and ecosystem processes differ markedly between the Landes forest and the southern Finnish boreal forest, the results of this study reveal similar biogenic sources and oxidation processes in these forest environments. For instance, the biogenic VOCs at the two sites are both dominated by monoterpenes, with the average fractions of 29% in the Landes forest and at the SMEAR Ⅱ station. These two forests are both characterized by pine trees, with dominant emissions of α-pinene and β-pinene (Riba et al., 1987; Simon et al., 1994; Hellén et al., 2018). According to the PMF results, isoprene and its major oxidation products in these environments (mainly C4H6O) contribute 14% and 21% in the two ecosystems, respectively. Factors indicative of sesquiterpenes are identified in the high mass range at both sites. The average contribution of sesquiterpenes (0.5% in the Landes forest and 1.7% at the SMEAR Ⅱ station) is much smaller than that of monoterpenes and isoprene. Factors of the lightly oxidized products, more oxidized products, and organic nitrates of monoterpenes/sesquiterpenes in total contribute 8% and 12% of the measured organic vapors in the Landes forest and at the SMEAR Ⅱ station, respectively.
The factor related to the C4H8H+ ion was resolved at both sites and contributes 10% in the Landes forest and 16% at the SMEAR Ⅱ station. According to the discussion by Li et al. (2020), the C4H8H+ observed in the Landes forest can originate from biogenic or anthropogenic sources (Hellén et al., 2006; Zhu et al., 2017). Another possible explanation is that the C4H8H+ ion is produced during the fragmentation of many VOCs in PTR instruments (Pagonis et al., 2019). Green leaf volatiles (GLVs), a group of six-carbon aldehydes, alcohols, and their esters released by plants, have been found to fragment to m/z 57 Th inside PTR instruments. Furthermore, butanol can easily lose an OH during PTR ionization and produce prominent C4H8H+ peaks (Spanel and Smith, 1997). Therefore, the condensation particle counters (CPCs) using butanol for aerosol measurements at the site could also be an important source of C4H8H+ ions, although the exhaust air from these instruments was filtered using a charcoal denuder. At the SMEAR Ⅱ station, the bivariate polar plot, in which the concentrations of air pollutants are shown as a function of wind speed (WS) and wind direction (WD), indicates that high signals of C4H8H+ generally occur when the wind comes from the north (Fig. S15). To the north of the measurement container is a particle measurement cottage housing several butanol-based CPCs. A previous study at this station also found that C4H8H+ signals detected by PTR-TOF mainly come from butanol used by aerosol instruments. Therefore, Factor S1 at the SMEAR Ⅱ station is expected to be mainly contributed by butanol from nearby aerosol instruments that fragments inside the instrument.
Figure 13 compares the mass spectra of the common sources identified at both sites, with the x and y axes showing the mass fractions of different bins in the factor profiles. The scatter in the plots is mainly caused by mass bins with much lower mass fractions. However, the dominant bins with high mass contributions in the factor profiles generally correlate well and lie close to the 1:1 line. This shows that the mass spectra of the common sources match well between the two forests, and that the sources and processes are indeed similar despite the quite different regions in which the forests are located.
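A profile comparison of this kind can be sketched as below: each factor profile is normalized to mass fractions, the two sites are aligned on their shared bins, and the distance of each dominant bin from the 1:1 line is expressed as a log ratio. The function names, the 1% threshold used to call a bin "dominant", and the example profiles are assumptions made for illustration only.

```python
import math


def to_mass_fractions(profile):
    """Normalize a {bin_label: signal} profile so that its values sum to 1."""
    total = sum(profile.values())
    return {label: s / total for label, s in profile.items()}


def compare_profiles(profile_a, profile_b, dominant_threshold=0.01):
    """Return (label, frac_a, frac_b, log10 ratio) for dominant shared bins.

    A log10 ratio near 0 means the bin lies close to the 1:1 line.
    """
    fa = to_mass_fractions(profile_a)
    fb = to_mass_fractions(profile_b)
    rows = []
    for label in sorted(set(fa) & set(fb)):
        a, b = fa[label], fb[label]
        if a >= dominant_threshold and b >= dominant_threshold:
            rows.append((label, a, b, math.log10(a / b)))
    return rows


# Hypothetical isoprene-factor profiles at two sites (arbitrary units).
site_1 = {"C5H8H+": 50.0, "C4H6OH+": 30.0, "C4H8O2H+": 19.5, "C5H8O2H+": 0.5}
site_2 = {"C5H8H+": 5.2, "C4H6OH+": 2.9, "C4H8O2H+": 1.8, "C5H8O2H+": 0.1}
rows = compare_profiles(site_1, site_2)
```

Normalizing first removes the difference in absolute signal between sites, so only the shape of the profile is compared.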

Insights into terpene oxidation processes
Terpenes undergo varying degrees of oxidation in the atmosphere and produce a large variety of organic compounds with different volatilities (Donahue et al., 2012; Ehn et al., 2014). With the sub-range PMF analysis performed in this study, terpene reaction products with varying oxidation degrees are successfully separated. Factors representing monoterpene lightly oxidized products, sesquiterpene lightly oxidized products, monoterpene more oxidized compounds, and monoterpene-derived organic nitrates are identified in both forests with distinct characteristics. These factors account for 8-12% of the measured organic vapors in the two forests. This provides an opportunity to gain insights into terpene oxidation processes. Because some environmental parameters, for example, the UVB measurements used to estimate OH concentrations, are not available in the Landes forest, only the results from the SMEAR Ⅱ station are presented below.

Monoterpene oxidations
The oxidation processes of monoterpenes at the SMEAR Ⅱ station have been investigated by several previous studies, mostly based on the highly oxidized compounds. Utilizing non-negative matrix factorization analysis on iodide-adduct CIMS data at daytime diel trends. Yan et al. (2016) conducted source apportionment of HOMs at the SMEAR Ⅱ station and separated various HOM formation pathways, such as monoterpene ozonolysis and monoterpene oxidation initiated by the NO3 radical. In this study, three types of monoterpene reaction products were detected: monoterpene lightly oxidized compounds, monoterpene more oxidized compounds, and monoterpene-derived organic nitrates. The latter two were not clearly separated into different factors at the SMEAR Ⅱ station due to the similarities in their overall time trends. For example, the time series of C10H15NO5H+ correlates well with those of C10H16O4H+ and C10H16O5H+ (r² > 0.61).
Consistent with previous observations, monoterpene more oxidized products (i.e., C10H16O4 and C10H14O5) show broadly elevated signals throughout the day due to the active photochemical processes during daytime. Monoterpene-derived organic nitrates (i.e., C10H17NO4, C10H15NO5, and C9H13NO6) are mainly characterized by a distinct morning peak at around 8:00, approximately 2 h after the NO peak, but their intensities are also elevated at night. PMF analysis of a NO3⁻-CIMS dataset revealed similar diurnal variations of a terpene organic nitrate factor at a forest site in the southeastern US (Massoli et al., 2018). Compared with β-pinene and most other monoterpenes, the overall organic nitrate yield from α-pinene + NO3 is rather low (Fry et al., 2014; Kurtén et al., 2017). Laboratory studies using an iodide-adduct FIGAERO-HR-ToF-CIMS found that C10H15NO6 is the most abundant organic nitrate in both gas- and particle-phase measurements of α-pinene + NO3 reactions (Nah et al., 2016). Boyd et al. (2015) Fig. 14; Fig. S16). Comparatively, C10H17NO5 and C10H15NO6 correlate better with the products of [OH] × [monoterpenes] and [O3] × [monoterpenes] during daytime (9:00-18:00). However, for the product of [NO3] × [monoterpenes], the correlation coefficients with C10H17NO5 and C10H15NO6 are somewhat higher at night (20:00 to 4:00 of the next day). These results indicate that monoterpene-derived organic nitrates can be mainly formed by NO3-initiated oxidation at night, but in daytime by OH- and O3-initiated oxidation followed by NO termination of the RO2. It should be noted that C10H17NO5 and C10H15NO6 are used as examples because both are fingerprint peaks of the factor, but in real environments these molecules may not always be produced through the above formation routes.
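The day/night comparison above amounts to correlating an oxidation-rate proxy, [oxidant] × [monoterpenes], with a product signal inside a chosen time-of-day window. A minimal sketch is given below. The function names, the half-open treatment of the window edges, and the synthetic data are assumptions, and the sketch reports Pearson r without claiming this is the exact statistic used in Fig. 14.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def in_window(hour, start, end):
    """True if hour is in [start, end); windows may wrap past midnight."""
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def windowed_correlation(hours, oxidant, terpene, product, start, end):
    """Correlate [oxidant] x [terpene] with a product signal in a window."""
    proxy, signal = [], []
    for h, ox, tp, s in zip(hours, oxidant, terpene, product):
        if in_window(h, start, end):
            proxy.append(ox * tp)  # oxidation-rate proxy
            signal.append(s)
    return pearson_r(proxy, signal)


# Synthetic hourly data: NO3 is high at night, and the nitrate signal is
# constructed to follow the proxy exactly.
hours = list(range(24))
no3 = [5.0 if (h >= 20 or h < 4) else 0.5 for h in hours]
monoterpenes = [2.0 + 0.1 * h for h in hours]
nitrate = [0.3 * n * m + 1.0 for n, m in zip(no3, monoterpenes)]
r_night = windowed_correlation(hours, no3, monoterpenes, nitrate, 20, 4)
```

The wrap-around check in `in_window` is what allows the night window (20:00 to 4:00 of the next day) to be handled with the same function as the daytime window.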

Sesquiterpene oxidations
The lightly oxygenated compounds from sesquiterpene reactions present a large morning peak and elevated signal intensities at night, similar to the diurnal variations of monoterpene lightly oxidized products. Hellén et al. (2018) showed that at the SMEAR Ⅱ station, O3 oxidation dominated the first step of sesquiterpene reactions throughout the year. It has also been observed in central Amazonia that sesquiterpenes contributed the most to total O3 reactivity, although sesquiterpene concentrations were much lower than those of monoterpenes and isoprene (Yee et al., 2018). At the SMEAR Ⅱ station, emissions of sesquiterpenes are dominated by β-caryophyllene (Hellén et al., 2018). Photooxidation of β-caryophyllene in chamber experiments resulted in a high aerosol yield and is expected to strongly influence SOA formation (Jaoui et al., 2013). Using mass spectrometric techniques, Jokinen et al. (2016) observed the production of highly oxidized organic compounds from β-caryophyllene ozonolysis, i.e., monomers C15H24O7,9,11 and C15H22O9,11, and dimers C29H46O12,14,16 and C30H46O12,14,16. However, due to instrumental limitations, only the lightly oxidized products from sesquiterpene reactions were identified in this study.
Interestingly, a strong RH-dependence was observed for the correlations between sesquiterpene lightly oxidized compounds and the product of [OH] × [sesquiterpenes] or [O3] × [sesquiterpenes]. These products represent the oxidation rates of sesquiterpenes by the OH radical and O3. As shown in Fig. 15, the corresponding correlation coefficients vary significantly with RH. In addition, the signal intensities of sesquiterpene lightly oxidized products also show a strong dependence on RH. At lower RH (RH<40%), the signal intensities of sesquiterpene lightly oxidized products are relatively low and correlate closely with the product of [ (Fig. S17). These findings have not been reported in previous studies, and the underlying reasons remain unclear. High-RH conditions typically occur during nights with temperature inversion (Zha et al., 2018), while RH below 40% generally only occurs at the station during sunny days. The controlling role of temperature cannot be ruled out, because temperature is strongly anti-correlated with RH and is known to influence terpene emissions and terpene reaction rates. Future studies are needed to investigate the atmospheric processes of sesquiterpenes and monoterpenes in more detail.
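An RH-resolved correlation analysis of this type can be sketched by grouping the samples into RH bins and computing the correlation within each bin. In the sketch below, only the 40% edge comes from the text. The other bin edges, the minimum of three points per bin, the function names, and the data are assumptions chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson r, or None when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_rh_bin(rh, proxy, signal, edges=(0, 40, 60, 80, 100)):
    """Pearson r between a proxy and a signal within each RH bin.

    Bins are [lo, hi); the last bin also includes its upper edge.
    Bins with fewer than three samples are reported as None.
    """
    results = {}
    n_bins = len(edges) - 1
    for b in range(n_bins):
        lo, hi = edges[b], edges[b + 1]
        is_last = b == n_bins - 1
        xs, ys = [], []
        for r, p, s in zip(rh, proxy, signal):
            if lo <= r < hi or (is_last and r == hi):
                xs.append(p)
                ys.append(s)
        results[(lo, hi)] = pearson_r(xs, ys) if len(xs) >= 3 else None
    return results


# Synthetic samples: RH (%), oxidation-rate proxy, product signal.
rh = [30, 35, 38, 70, 75, 85, 90, 95, 100]
proxy = [1.0, 2.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 0.2, 0.3, 0.5, 0.4, 2.0, 1.0, 3.5, 3.0]
results = correlation_by_rh_bin(rh, proxy, signal)
```

Because temperature and RH are strongly anti-correlated at the site, the same binning applied to temperature would be a natural cross-check, although the sketch does not attempt it.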

Concluding remarks
In this study, we conducted Vocus PTR-TOF measurements in two forest environments and performed binPMF analysis on these complex mass spectra. In addition to VOC species, the Vocus PTR-TOF is able to measure large amounts of oxygenated VOCs with enhanced detection efficiency. According to the results in this work, factor analysis on Vocus PTR-TOF mass spectra separated VOC precursors and their reaction products with varying oxidation degrees into different factors. These factors showed distinct characteristics in the atmosphere. In comparison, conventional PTR instruments and gas chromatography-mass spectrometry (GC-MS) largely detect VOC precursors of low-mass molecules (Dewulf et al., 2002; de Gouw et al., 2007). Previous source apportionment studies on these datasets mainly identified primary biogenic and anthropogenic emission sources (Vlasenko et al., 2009; Patokoski et al., 2014; Baudic et al., 2016; Debevec et al., 2017; Sarkar et al., 2017; Wang et al., 2020). Recently, factorization methods have been applied to NO3⁻-CIMS datasets to identify various atmospheric formation pathways of HOMs (Massoli et al., 2018; Zhang et al., 2019b). Here, for the first time, source apportionment of Vocus PTR-TOF data identified various primary emission sources and secondary formation pathways of atmospheric organic vapors, highlighting the strength of the Vocus PTR-TOF in measuring both VOCs and oxygenated VOCs.
The relative abundances of organic precursors, the lightly oxidized products, and the more oxidized products can be utilized by modellers to evaluate simulation output, improve model performance, and provide new perspectives to understand gas-phase physicochemical processes.
Compared with VOC species, VOC reaction products are generally present in much smaller amounts in the atmosphere. Therefore, utilizing a sub-range PMF analysis, or another similar weighting method, is particularly important for Vocus PTR-TOF observations, where differences of several orders of magnitude are expected between VOC precursors and their oxidation products. Compared with the low mass range, the average contributions of the high mass range to the total signals are significantly smaller, 2% and 9% in the Landes forest and at the SMEAR Ⅱ station, respectively. However, the identified factors in the high mass range, such as sesquiterpenes, sesquiterpene lightly oxidized products, monoterpene-derived organic nitrates, and more oxidized compounds, can provide crucial insights into atmospheric physicochemical processes. For example, we found that the correlations between sesquiterpene lightly oxidized compounds and the products of [OH] × [sesquiterpenes] or [O3] × [sesquiterpenes] show strong dependences on RH. High signal intensities of sesquiterpene lightly oxidized compounds only occur under high-RH conditions. Such a strong RH-dependence was not observed for monoterpene lightly oxidized compounds.
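The motivation for treating the sub-ranges separately can be illustrated by splitting a spectrum at the upper edge of the low mass range (200 Th, as given in the section heading above) and comparing the two signal shares. The sketch below does not perform PMF itself. The function names and the example spectrum are hypothetical, with values chosen so that the high mass range carries 2% of the total signal.

```python
def split_by_mass_range(spectrum, boundary=200.0):
    """Split a {mz: signal} spectrum into low (mz <= boundary) and high parts."""
    low = {mz: s for mz, s in spectrum.items() if mz <= boundary}
    high = {mz: s for mz, s in spectrum.items() if mz > boundary}
    return low, high


def range_contributions(spectrum, boundary=200.0):
    """Fractions of the total signal carried by the low and high sub-ranges."""
    low, high = split_by_mass_range(spectrum, boundary)
    total = sum(spectrum.values())
    return sum(low.values()) / total, sum(high.values()) / total


# Hypothetical average spectrum (arbitrary units), keyed by m/z in Th.
spectrum = {69.07: 600.0, 137.13: 380.0, 205.20: 15.0, 221.19: 5.0}
low_frac, high_frac = range_contributions(spectrum)
```

With such a small share, the high-mass bins would contribute little to a fit over the full range, which is why each sub-range is factorized on its own.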
To summarize, this study successfully performed binPMF analysis on sub-ranges of a mass spectrometry dataset acquired with a Vocus PTR-TOF in two European forest ecosystems, the Landes forest and a southern Finnish boreal forest. Both primary emission sources and secondary oxidation processes of organic vapors were identified in the two environments, particularly for terpenes and their reaction products with varying oxidation degrees (including organic nitrates). Factors of the lightly oxidized products, more oxidized products, and organic nitrates of monoterpenes/sesquiterpenes accounted for 8-12% of the measured gas-phase organic vapors in the two forests. Further interpretations show a strong RH-dependence for the behaviour of sesquiterpene lightly oxidized products but not for that of monoterpene lightly oxidized products, the reasons for which need further investigation in the future.
Data Availability. The time series of the measured trace gases, meteorological parameters, and the concentrations of isoprene and monoterpenes in the Landes forest and at the SMEAR Ⅱ station are available from https://doi.org/10.5281/zenodo.3946644 (Li, 2020 (850614)) and the Academy of Finland (grant nos. 317380 and 320094). We thank the SMEAR II station staff for their help during field measurements in Hyytiälä. The authors also would like to thank the PRIME-QUAL program for financial support (ADEME, convention#1662C0024) and the French National Research Agency (ANR) in the frame of the "Investments for the Future" program, within the Cluster of Excellence COTE (ANR-10-LABX-45) of the
Villenave, E., Perraudin, E., Ehn, M., and Bianchi, F.: Terpenes and their oxidation products in the French Landes forest: insights from Vocus PTR-TOF measurements, Atmos. Chem. Phys., 20, 1941-1959, https://doi.org/10.5194/acp-20-1941-2020, 2020.
Li, H.: Data for "Source identification of atmospheric organic vapors in two European pine forests: Results from Vocus PTR-TOF observations", http://doi.org/10.5281/zenodo.3946644, 2020.