Molecular composition of biogenic secondary organic aerosols using ultrahigh resolution mass spectrometry: comparing laboratory and field studies



Introduction
Biogenic volatile organic compounds (BVOCs) play an important role in atmospheric chemistry and give rise to secondary organic aerosols (SOA), which affect climate (Hallquist et al., 2009) and human health (Pope and Dockery, 2006). SOA is formed in the atmosphere from gaseous precursors through gas-to-particle conversion processes. Laboratory chamber experiments have been performed for decades in an attempt to mimic atmospheric SOA formation. However, it is still unclear how closely the aerosol particles generated in laboratory experiments resemble atmospheric SOA in their detailed chemical composition. One of the major challenges is the identification of the organic constituents of SOA, which comprises thousands of organic compounds (Kanakidou et al., 2005). These compounds generally cover a wide range of polarities, volatilities and masses (Goldstein and Galbally, 2007), and it is therefore difficult to find a single analytical technique for their detailed chemical analysis at the molecular level. Conventional chromatographic methods (gas chromatography (GC) and liquid chromatography (LC)) cannot fully resolve such highly complex mixtures, whose components span a wide variety of physico-chemical properties. Moreover, the mass spectrometers commonly used as detectors following chromatographic separation do not have sufficient mass-resolving power to distinguish all the compounds present in the complex mixture of organic aerosol. Ultrahigh resolution mass spectrometers (UHR-MS) (i.e., Fourier transform ion cyclotron resonance MS and Orbitrap MS) have a mass-resolving power at least one order of magnitude higher (≥ 100 000) than conventional MS and thus have the potential to solve this problem.
Direct infusion electrospray ionisation (ESI)-UHR-MS has been successfully applied to the analysis of both ambient and laboratory-generated SOA and has facilitated the characterisation of hundreds of species with individual molecular formulae. Despite the high analytical throughput of direct infusion MS, this method is prone to artefacts such as changes in the ionisation efficiency of an analyte due to the presence of "matrix" compounds in complex organic mixtures (Pöschl, 2005). For instance, sulphates, nitrates and ammonium salts are important constituents of atmospheric aerosols (Pöschl, 2005) and, once injected into the ESI source, can cause ion suppression, adduct formation and a rapid deterioration of instrument performance (Dettmer et al., 2007). NanoESI-MS, which generally produces smaller droplets and uses lower analyte flow rates in the electrospray (Schmidt et al., 2003), can substantially reduce interference effects from inorganic salts. Moreover, compared with conventional ESI sources, it provides better sensitivity towards a variety of analytes in samples containing relatively high levels of salts (Juraschek et al., 1999; Schmidt et al., 2003) and decreases source contamination (Schmidt et al., 2003).
To date, most laboratory experiments reproducing atmospheric SOA formation have been performed using a single organic precursor (e.g., α- or β-pinene or isoprene), whereas in the atmosphere a wide range of precursors contribute to SOA, resulting in a more complex SOA composition than in one-precursor laboratory systems. Although there are a few studies in which mixtures of volatile organic compounds (VOCs) were oxidised, their main goal was to investigate SOA formation and yields (VanReken et al., 2006; Jaoui et al., 2008; Hao et al., 2009, 2011; Kiendler-Scharr et al., 2009; Mentel et al., 2009; Hatfield and Hartz, 2011; Waring et al., 2011) or specific products (Jaoui et al., 2003; Amin et al., 2013) rather than detailed molecular composition.
The main objective of this work is to compare the detailed molecular composition of laboratory-generated SOA from the oxidation of a single BVOC (α-pinene) and of a mixture of four BVOCs with that of ambient aerosol samples from urban and remote locations, using chip-based direct infusion nanoESI-UHR-MS. In a preceding study we examined aerosol samples from the boreal forest site Hyytiälä, Finland, and determined that a dominant fraction of the detected compounds are reaction products of a multi-component mixture of BVOCs (Kourtchev et al., 2013a). In the present study we compare the composition of these field samples with SOA generated in chamber experiments from the ozonolysis of α-pinene and of BVOC mixtures containing four species (α- and β-pinene, Δ³-carene and isoprene) that are the most abundant in the Hyytiälä environment. The laboratory experiments were performed under conditions (e.g., relative humidity (RH), aerosol seed and VOC ratios) resembling those at the boreal sampling site during summer 2011. To the best of our knowledge, this is the first direct comparison of the molecular composition of laboratory-generated SOA from BVOC mixtures with that of ambient samples.

2 Materials and methods

Atmospheric simulation chamber
Experiments were carried out in an atmospheric simulation chamber described in detail elsewhere (Thüner et al., 2004). Briefly, the chamber is a cylinder made of fluorine-ethene-propene (FEP) Teflon® foil with a volume of 3.91 m³. It was operated at 296 ± 2 K using purified air at 0.1-1 mbar above atmospheric pressure. The experiments were performed at 60-68 % relative humidity, produced by bubbling purified air through heated water. Humidity and temperature were measured using a dew point meter (DRYCAP® DM70, Vaisala). Fans installed at both ends of the chamber were used during the first 5 min of the reaction to provide rapid and uniform mixing of the reactants and products. Between experiments, the chamber was cleaned by introducing about 1 ppm of ozone and flushing with purified air at a flow rate of 0.15 m³ min⁻¹. The experiments were performed with neutral ammonium sulphate ((NH₄)₂SO₄, Sigma Aldrich, 99.99 %) seed particles, produced using an atomiser and dried before introduction into the chamber. The seed particles were passed through a krypton-85 (Kr-85) charge neutraliser before entering the chamber. Seed particle concentrations for each experiment are shown in Table 1. Cyclohexane, at a molar concentration 1000 times higher than that of the VOC precursors, was used to scavenge OH radicals produced from the ozonolysis of the reactants. BVOCs (i.e., α-pinene, β-pinene, Δ³-carene and isoprene) were introduced into the chamber by flowing purified air over known amounts of the compounds in a gently heated Pyrex impinger. The BVOC concentrations are shown in Table 1. Ozone (ca. 200 ppbv) was introduced at the beginning of the reaction over a period of 1 min using an electric discharge generator. Ozone decay was monitored with an automated analyser (Thermo Model 49i).
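The 1000-fold cyclohexane excess can be rationalised with a simple competition-kinetics estimate: the fraction of OH captured by the scavenger is k_S[S]/(k_S[S] + k_VOC[VOC]), so only the concentration ratio and the rate constants matter. The minimal Python sketch below assumes approximate room-temperature literature rate constants for OH + cyclohexane (about 7.0 × 10⁻¹²), OH + α-pinene (about 5.3 × 10⁻¹¹) and OH + isoprene (about 1.0 × 10⁻¹⁰ cm³ molecule⁻¹ s⁻¹); these values are illustrative assumptions and were not reported in this study.

```python
"""Sketch: fraction of OH radicals intercepted by a cyclohexane scavenger.

Rate constants are approximate literature values near 298 K, assumed here for
illustration only (units: cm3 molecule-1 s-1).
"""

K_OH_CYCLOHEXANE = 6.97e-12  # assumed value
K_OH_VOC = {
    "alpha-pinene": 5.3e-11,  # assumed value
    "isoprene": 1.0e-10,      # assumed value
}


def fraction_oh_scavenged(excess_ratio: float, k_scavenger: float, k_voc: float) -> float:
    """Fraction of OH reacting with the scavenger when [scavenger] = excess_ratio * [VOC].

    The OH loss rate to each reactant is k * concentration, so the absolute VOC
    concentration cancels and only the ratio of concentrations matters.
    """
    scavenger_rate = k_scavenger * excess_ratio
    return scavenger_rate / (scavenger_rate + k_voc)


if __name__ == "__main__":
    for voc, k_voc in K_OH_VOC.items():
        f = fraction_oh_scavenged(1000.0, K_OH_CYCLOHEXANE, k_voc)
        print(f"{voc}: {f:.3f} of OH scavenged by cyclohexane")
```

With these assumed constants the scavenger intercepts roughly 99 % of the OH in competition with α-pinene and slightly less against the faster-reacting isoprene.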
A scanning mobility particle sizer (TSI model 3081) was used to measure particle number-size distributions in the range 11-478 nm (mobility diameter) every 3 min. Particle mass concentrations were calculated assuming a density of 1 g cm⁻³. Dilution and wall-loss corrections were calculated by treating particle loss as a first-order rate process. The background NOₓ concentration, measured using a NOₓ analyser (Thermo Model 42i), was below 2 ppbv in all experiments. The aerosol samples were collected on quartz fibre filters (Pallflex Tissuquartz 2500QAT-UP, 47 mm diameter) 20-30 min after the maximum SOA concentration was observed. Before use, the quartz fibre filters were preheated at 650 °C for 12 h to remove any possible organic impurities. A charcoal denuder was placed upstream of the filter pack to remove gas-phase species formed during the ozonolysis reaction. The aerosol sampling flow rate was approximately 12 L min⁻¹ and the sampling time was 40 min. In addition, "blank" chamber samples were collected by drawing "clean" air from the chamber for 40 min. The filter samples were wrapped in baked aluminium foil and stored at −20 °C prior to analysis.
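The mass calculation and the first-order loss correction described above can be sketched as follows. This is an illustrative Python version, not the authors' processing code; it assumes spherical particles, SMPS bin-midpoint diameters in nm, number concentrations per bin in cm⁻³, the stated density of 1 g cm⁻³, and a single lumped rate constant k (min⁻¹) covering both dilution and wall loss. The helper names are hypothetical.

```python
"""Sketch: aerosol mass from an SMPS number-size distribution and a
first-order (dilution + wall) loss correction. Assumptions as stated above.
"""
import math


def mass_concentration_ug_m3(diameters_nm, numbers_cm3, density_g_cm3=1.0):
    """Sum spherical particle volumes over size bins and convert to ug m-3."""
    volume_nm3_per_cm3 = sum(
        n * math.pi / 6.0 * d**3 for d, n in zip(diameters_nm, numbers_cm3)
    )
    # 1 nm3 = 1e-21 cm3; g per cm3 of air -> ug per m3 of air is a factor of 1e12
    return density_g_cm3 * volume_nm3_per_cm3 * 1e-21 * 1e12


def fit_first_order_rate(times_min, values):
    """Least-squares slope of ln(values) against time; returns k = -slope (min-1)."""
    logs = [math.log(v) for v in values]
    t_mean = sum(times_min) / len(times_min)
    y_mean = sum(logs) / len(logs)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    den = sum((t - t_mean) ** 2 for t in times_min)
    return -num / den


def wall_loss_corrected(times_min, mass_obs, k_per_min):
    """Add back mass lost by first-order removal (trapezoidal integral of k * M)."""
    corrected = [mass_obs[0]]
    lost = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        lost += 0.5 * k_per_min * (mass_obs[i] + mass_obs[i - 1]) * dt
        corrected.append(mass_obs[i] + lost)
    return corrected
```

In this sketch k would be fitted to the decay of seed particles before SOA formation, and particles removed from suspension are assumed not to take up further condensable vapour; other wall-loss treatments make different assumptions.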

Ambient samples
Biogenic ambient samples were collected at the boreal forest site SMEAR II in Hyytiälä, southern Finland (61°51′ N, 24°17′ E), as previously described in detail (Kourtchev et al., 2013a). The forest around the station is dominated by conifers (mainly Scots pine and Norway spruce) with some deciduous trees, such as aspen and birch, and has a tree density of about 2500 ha⁻¹. Detailed descriptions of the site, instrumentation, meteorological data collection and sampling are given elsewhere (Kulmala et al., 2001; Hari and Kulmala, 2005). In total, 10 separate day and night atmospheric aerosol PM₁ samples, each representing 12 h of sampling, were collected over the period 16 to 25 August 2011.
Anthropogenic ambient samples were collected during 9-17 September 2011 at the Tivoli Industrial Estate and Docks (TIED), Cork, Ireland (51°54′5″ N, 8°24′38″ W). A detailed description of the TIED site is given elsewhere (Healy et al., 2009; Hellebust et al., 2010; Kourtchev et al., 2011). The site is located approximately 3 km east of … if they do exist, they are likely to be spruce (Picea sitchensis). PM₂.₅ aerosol samples were collected on quartz fibre filters (Pallflex Tissuquartz 2500QAT-UP, 150 mm diameter, pre-fired for 24 h at 650 °C) using a high-volume sampler (Digitel DHA-80, Switzerland) with a flow rate of 500 L min⁻¹.

All ambient filters were analysed for organic carbon (OC) and elemental carbon (EC) using a thermal-optical transmission (TOT) technique (Birch and Cary, 1996). For each sample, a portion of the quartz fibre filter (6-30 cm², depending on the OC loading for ambient samples or the total aerosol loading for laboratory samples) was extracted three times with 5 mL of methanol (Optima® grade, Fisher Scientific) under ultrasonic agitation for 30 min. The three extracts were combined, filtered through a Teflon filter (0.2 µm) and reduced in volume to approximately 200 µL under a gentle stream of nitrogen. The final extracts were analysed using an ultrahigh resolution LTQ Orbitrap Velos mass spectrometer (Thermo Fisher, Bremen, Germany) equipped with a TriVersa Nanomate robotic nanoflow chip-based ESI source (Advion Biosciences, Ithaca, NY, USA). The Orbitrap MS instrument was calibrated using an Ultramark 1621 solution (Sigma-Aldrich, UK). The mass accuracy of the instrument was below 1.5 ppm and was routinely checked before analysis. The instrument mass resolution was 100 000 at m/z 400. Negative ionisation mass spectra were collected in three replicates over the ranges m/z 100-650 and m/z 200-900 and processed using Xcalibur 2.1 software (Thermo Scientific).
A mixture of camphorsulphonic acid (20 ng µL⁻¹) in methanol and Ultramark 1621 solution was used to optimise the ion transmission settings. The direct infusion nanoESI parameters were as follows: the ionisation voltage and back pressure were set at −1.4 kV and 0.8 psi, respectively. To assess possible matrix effects of inorganic salts on the detection of organic compounds in the direct infusion analysis, the methanolic extracts of the laboratory-generated samples were mixed with a 30 % aqueous solution of ammonium sulphate (to mimic the ambient concentration ratios in the boreal samples; see discussion below). Control samples were mixed with water in the same proportions. These modified samples were analysed in the same way as the unaltered aerosol extracts.

Aerosol sample analysis
For the LC/(−)ESI-MS analysis, owing to the relatively low OC loading of the filter samples, all day and night samples were pooled into one day and one night sample, evaporated to dryness and resuspended in 0.1 % formic acid. LC/(−)ESI-MS analysis was performed using an Accela system (Thermo Scientific, San Jose, USA) coupled with the LTQ Orbitrap Velos MS and an Atlantis T3 C18 column (3 µm; 2.1 × 150 mm; Waters, Milford, USA), as described in Kourtchev et al. (2013a). The mobile phases consisted of 0.1 % formic acid (v/v) (A) and methanol (B). The applied gradient was as follows: 0-3 min 3 % B; 3-25 min from 3 % to 50 % B (linear); 25-43 min from 50 % to 90 % B (linear); 43-48 min from 90 % to 3 % B (linear); then held for 12 min at 3 % B (total run time 60 min). MS spectra were collected in full scan mode, using the deprotonated formic acid dimer at m/z 91.00368 as the lock mass, with a resolution of 100 000 and mass ranges of m/z 50-650 and m/z 150-900. Based on pre-scan information from the full scan MS, parallel data-dependent collision-induced dissociation (CID) multistage mass spectrometry (MSⁿ) (n = 1, 2, 3 and 4) was performed on the most intense precursor ion in three scans at a resolution of 30 000.
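The LC gradient above is piecewise linear, so the mobile-phase composition at any retention time can be recovered by interpolating between its breakpoints. The short Python sketch below encodes the breakpoints exactly as listed in the text; the function name is hypothetical and the helper is an illustration, not instrument method code.

```python
"""Sketch: % B (methanol) versus time for the LC gradient described in the text."""

# (time in min, % B): 3 % hold, three linear ramps, then 12 min re-equilibration
GRADIENT = [(0.0, 3.0), (3.0, 3.0), (25.0, 50.0), (43.0, 90.0), (48.0, 3.0), (60.0, 3.0)]


def percent_b(t_min: float) -> float:
    """Return % B at time t_min by linear interpolation between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 60 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


if __name__ == "__main__":
    for t in (0, 10, 25, 34, 45.5, 55):
        print(f"{t:>5} min: {percent_b(t):5.1f} % B")
```

For example, 34 min falls halfway along the 25-43 min ramp, giving 70 % B.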

Ultrahigh MS resolution data analysis
The ultrahigh resolution mass spectral data were interpreted using a procedure described elsewhere (Kourtchev et al., 2013a). For each direct infusion sample analysis, 70-80 mass spectral scans were averaged into one mass spectrum. Molecular assignments were performed using Xcalibur 2.1 software applying the following constraints: … S ≤ 2, ³⁴S ≤ 1. All mathematically possible elemental formulae within a mass tolerance of ±5 ppm were calculated. Data filtering was performed using in-house Mathematica 8.0 (Wolfram Research Inc., UK) code that employed several conservative rules and constraints similar to those used in previous studies (Koch et al., 2005; Wozniak et al., 2008; Lin et al., 2012). Only ions with intensities at least ten times the noise level were kept for the data analysis. The mass tolerance range for keeping mathematically assigned elemental formulae was set to approximately ±0.5 ppm and varied within the ±5 ppm tolerance window. This range was determined from the average difference between the theoretical and experimental masses of nine compounds with known elemental composition determined by LC/MS analyses (Kourtchev et al., 2013a). All molecular formulae with O/C ≥ 1.2, H/C ≤ 0.3 or H/C ≥ 2.5, N/C ≥ 0.5, or S/C ≥ 0.2 were eliminated to remove compounds that are unlikely to be observed in nature. Moreover, neutral formulae with either a non-integer or a negative double bond equivalent (DBE) were removed from the list of possible molecules. Double bond equivalents were calculated using Xcalibur 2.1 software. The assigned formulae were additionally checked against the "nitrogen rule" and the isotopic pattern as described elsewhere (Kourtchev et al., 2013a). The background spectra obtained from the procedural blanks were also processed using the rules mentioned above.
The formula lists of the background spectra were subtracted from those of the ambient (or chamber) samples, and only formulae with a sample/blank peak intensity ratio ≥ 10 were retained. All molar ratios, DBE values and chemical formulae presented in this paper refer to neutral molecules.
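The assignment and filtering steps described above can be illustrated with a brute-force search over CHNOS formulae for a deprotonated ion [M−H]⁻. The Python sketch below is a swapped-in illustration, not the authors' Xcalibur/Mathematica workflow: it uses standard monoisotopic masses, the ±5 ppm window, the elemental-ratio and DBE rules and the sample/blank ratio ≥ 10 from the text, while the element limits (C ≤ 30, N ≤ 2, S ≤ 1, O ≤ 20) are assumptions and isotope-pattern checks and noise thresholds are omitted. All function names are hypothetical.

```python
"""Sketch: elemental formula assignment for [M-H]- ions with ratio/DBE filters
and blank subtraction. Monoisotopic masses only; element limits are assumed.
"""

MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
PROTON = 1.007276467  # mass removed when forming [M-H]-


def formula_string(c, h, n, o, s):
    parts = [("C", c), ("H", h), ("N", n), ("O", o), ("S", s)]
    return "".join(el + (str(k) if k > 1 else "") for el, k in parts if k > 0)


def dbe(c, h, n):
    """Double bond equivalent of a neutral CcHhNnOoSs formula (O and S are divalent)."""
    return c - h / 2 + n / 2 + 1


def ppm_error(measured, theoretical):
    return (measured - theoretical) / theoretical * 1e6


def passes_rules(c, h, n, o, s):
    """Elemental-ratio and DBE filters listed in the text."""
    if o / c >= 1.2 or h / c <= 0.3 or h / c >= 2.5:
        return False
    if n / c >= 0.5 or s / c >= 0.2:
        return False
    d = dbe(c, h, n)
    return d >= 0 and d == int(d)  # odd H + N (non-integer DBE) is rejected


def assign(mz, tol_ppm=5.0, max_c=30, max_n=2, max_s=1, max_o=20):
    """Return (formula, ppm error) pairs within tol_ppm of mz, best match first."""
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for s in range(max_s + 1):
                for o in range(max_o + 1):
                    for h in range(1, 2 * c + n + 3):
                        neutral = (c * MASS["C"] + h * MASS["H"] + n * MASS["N"]
                                   + o * MASS["O"] + s * MASS["S"])
                        err = ppm_error(mz, neutral - PROTON)
                        if abs(err) <= tol_ppm and passes_rules(c, h, n, o, s):
                            hits.append((formula_string(c, h, n, o, s), err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


def subtract_blank(sample, blank, min_ratio=10.0):
    """Keep formulae absent from the blank or at least min_ratio times more intense."""
    return {f: i for f, i in sample.items()
            if f not in blank or i / blank[f] >= min_ratio}


if __name__ == "__main__":
    for formula, err in assign(185.08193):
        print(f"{formula:>12s} {err:+.2f} ppm")
```

As a sanity check on the mass table, twice the formic acid mass minus a proton gives 91.00368, the lock mass used for the LC/MS runs; C9H14O4 (e.g., cis-pinic acid, DBE 3) is among the candidates returned for m/z 185.08193.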

Hierarchical agglomerative cluster analysis
Aerosol samples were classified by hierarchical agglomerative cluster analysis (Lukasová, 1979). The data were organised in a two-way table X_nm, where n is the number of samples (six smog chamber samples and two ambient samples) and m is the number of compounds detected by UHR-MS in the mass range 100-300 Da (451 molecular formulae). Each element of X_nm is a binary value indicating the presence or absence of compound m in sample n. The cluster analysis was performed using Statistica 10 (StatSoft Inc., Tulsa, OK, USA), based on the unweighted pair-group average linkage method (or average linkage method) and using the percent disagreement distance measure (Georgieva et al., 2005). The metric used in this study is analogous to Jaccard's dissimilarity measure, which is commonly applied in the analysis of binary patterns (Sneath and Sokal, 1973; Anthony et al., 2002; Cordeiro et al., 2003; Kosman and Leonard, 2005). The unmodified percent disagreement (simple mismatch) metric counts the joint absence of a compound from the compared spectra as a match. This might give misleading results, because two samples could be considered close to each other just because they share many absent compounds (Kosman and Leonard, 2005). For this reason, results obtained using the unmodified percent disagreement or other metrics, i.e. Euclidean distance and Pearson's r correlation coefficient, which would have the same drawbacks for binary data, were not considered (Kosman and Leonard, 2005). Instead, the percent disagreement metric was modified to calculate distances on the basis of the percentage of ions common to the compared samples. The linkage distance between two samples is calculated using the following equation:

$$LD_{ij} = \frac{N - c_{ij}}{N}$$

where LD_ij is the linkage distance between sample i and sample j, N is the total number of ions considered in the cluster analysis (451 ions) and c_ij is the number of ions common to sample i and sample j.
The robustness of the applied technique was evaluated by repeating the cluster analysis with different linkage methods, i.e. single linkage (nearest neighbour) and complete linkage (farthest neighbour), which gave exactly the same results.
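The clustering procedure can be sketched in a few lines of Python, assuming the linkage distance takes the form LD_ij = (N − c_ij)/N, where N is the total number of ions considered and c_ij the number of ions shared by samples i and j, so that joint absences never count as agreement. This is an illustration, not the Statistica 10 implementation used in the study; the function names and the toy formula sets are hypothetical. The `method` switch mirrors the robustness check with single and complete linkage.

```python
"""Sketch: agglomerative clustering of presence/absence formula lists using
LD_ij = (N - c_ij) / N (assumed form). Toy data are hypothetical.
"""
from statistics import mean


def linkage_distance(a: set, b: set, n_total: int) -> float:
    """Distance based only on shared presences; joint absences do not count."""
    return (n_total - len(a & b)) / n_total


def agglomerate(samples: dict, n_total: int, method: str = "average"):
    """Merge the closest clusters until one remains; returns (members, distance) steps."""
    combine = {"average": mean, "single": min, "complete": max}[method]
    clusters = [[name] for name in samples]
    steps = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = combine([linkage_distance(samples[p], samples[q], n_total)
                             for p in clusters[x] for q in clusters[y]])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        steps.append((merged, d))
    return steps


if __name__ == "__main__":
    toy = {  # hypothetical formula sets, not measured data
        "alpha-pinene": {"C9H14O4", "C10H16O3", "C8H12O4", "C10H16O4"},
        "VOC mix": {"C9H14O4", "C10H16O3", "C8H12O4", "C5H8O4"},
        "urban": {"C7H6O3"},
    }
    for members, dist in agglomerate(toy, n_total=6):
        print(members, round(dist, 3))
```

In the toy example the two chamber-like samples share three of six ions (distance 0.5) and merge first, while the "urban" sample joins only at distance 1.0 under all three linkage rules.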


Results and discussion
The VOC mixture used in the laboratory experiments contained four of the most abundant SOA precursors emitted at the boreal forest site in Hyytiälä (Hakola et al., 2003; Aaltonen et al., 2011; Bäck et al., 2012): the monoterpenes α-pinene, Δ³-carene and β-pinene, and isoprene. The emissions of α-pinene and Δ³-carene were found to account for up to 97 % of the total monoterpenes in both branch emissions from Scots pine trees (the dominant species at Hyytiälä) and ambient samples from the boreal forest site in Hyytiälä (Bäck et al., 2012). In the present study, ambient SOA samples were collected below the canopy, 5 m above the forest floor; therefore, the VOC composition is expected to be additionally influenced by emissions from ground vegetation. On the boreal forest floor, monoterpenes were also found to be the most abundant compound group, with α-pinene (average 2.975 µg m⁻² h⁻¹), Δ³-carene (average 1.305 µg m⁻² h⁻¹), camphene (average 0.442 µg m⁻² h⁻¹) and β-pinene (average 0.191 µg m⁻² h⁻¹) accounting for 90 % of the monoterpene fluxes (Aaltonen et al., 2011). Previous studies (Kourtchev et al., 2005, 2008) indicated that SOA from Hyytiälä contained a number of isoprene oxidation products, implying that isoprene plays a role in SOA formation at the boreal site. Isoprene was therefore added to the VOC mixture in proportions estimated from the fluxes at the sampling site (Hakola et al., 2003; Aaltonen et al., 2011). Although the total concentrations of the VOC mixture used in our chamber experiments exceeded those observed at the Finnish site, their molar … Table 1. The average SOA yields (corrected for wall losses) for the α-pinene and VOC mixture experiments were 0.16 ± 0.01 (n = 3) and 0.11 ± 0.01 (n = 3), respectively.
The yields obtained for the α-pinene-only experiments are in reasonable agreement with those reported in the literature for similar VOC concentration ranges (Pathak et al., 2007; Hatfield and Hartz, 2011). Surprisingly, in the present study, SOA yields for the VOC mixture were significantly lower than for the single-VOC (α-pinene) system. α-Pinene and Δ³-carene accounted for a major fraction (∼ 70 %) of the total VOC mixture and are thus expected to make the major contributions to the SOA mass. Δ³-Carene is reported to have an SOA yield similar to that of α-pinene (Jonsson et al., 2006; Hatfield and Hartz, 2011) and therefore cannot be responsible for the low yield observed from the precursor mixture. β-Pinene and isoprene account for 20 % and 10 % of the total VOC mixture, respectively. Their ozonolysis generally results in a lower yield for β-pinene than for α-pinene (Jonsson et al., 2006) and a very low yield for isoprene (∼ 0.014) (Kleindienst et al., 2007). However, because these two VOCs account for a small fraction of the reaction mixture, their contribution to the total SOA mass is expected to be rather low. The addition of β-pinene to the α-pinene/O₃ system has been shown not to affect the SOA yield significantly (Hatfield and Hartz, 2011). Therefore, β-pinene is unlikely to explain the lower yield, and the possibility that isoprene suppresses SOA formation from the precursor mixture cannot be ruled out. Furthermore, as shown in Table 1, ozone was present in excess in all experiments, and thus the differences in yield are not expected to be due to limited oxidant availability.
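The yield argument above can be made concrete with the mass-yield definition Y = ΔM_o/ΔVOC (wall-loss-corrected SOA mass formed per mass of VOC reacted) and a simple linear-additivity baseline in which each precursor forms SOA independently. The Python sketch below is illustrative only: it treats the approximate shares quoted in the text (∼70 % α-pinene plus Δ³-carene, 20 % β-pinene, 10 % isoprene) as mass fractions, assumes Δ³-carene has the same yield as α-pinene (0.16), and sets the β-pinene yield to zero to obtain a lower bound. These assumptions, the 1 atm pressure and the function names are not from the study.

```python
"""Sketch: SOA mass yield and an additive (non-interacting) baseline for a
precursor mixture. Weights, yields and pressure are illustrative assumptions.
"""
R = 8.314462618  # J mol-1 K-1
P = 101325.0  # Pa, assumed 1 atm


def ppb_to_ug_m3(ppb: float, molar_mass_g_mol: float, temp_k: float = 296.0) -> float:
    """Convert a gas-phase mixing ratio in ppbv to ug m-3 (ideal gas)."""
    mol_per_m3 = P / (R * temp_k)
    return ppb * 1e-9 * mol_per_m3 * molar_mass_g_mol * 1e6


def soa_yield(delta_mass_ug_m3: float, reacted_voc_ug_m3: float) -> float:
    """Mass yield: SOA mass formed per mass of VOC reacted (same units)."""
    return delta_mass_ug_m3 / reacted_voc_ug_m3


def additive_yield(weights_and_yields):
    """Yield expected if each precursor formed SOA independently of the others."""
    return sum(w * y for w, y in weights_and_yields)


if __name__ == "__main__":
    print(f"1 ppbv alpha-pinene at 296 K = {ppb_to_ug_m3(1.0, 136.23):.2f} ug m-3")
    baseline = additive_yield([
        (0.70, 0.16),   # alpha-pinene + Delta3-carene, yield assumed 0.16
        (0.20, 0.0),    # beta-pinene, set to zero as a lower bound
        (0.10, 0.014),  # isoprene, ~0.014 as quoted in the text
    ])
    print(f"additive lower-bound yield: {baseline:.4f} (measured mixture: 0.11 +/- 0.01)")
```

Under these assumptions the additive estimate is about 0.113 even with no β-pinene contribution, at or slightly above the measured 0.11 ± 0.01; such a comparison is only as reliable as the assumed weights and single-precursor yields.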

UHR-MS analysis
Representative (−) nanoESI high resolution mass spectra for ambient summer aerosol from the boreal forest site Hyytiälä, Finland, and for laboratory-generated SOA from the ozonolysis of α-pinene and of the VOC mixture are shown in Fig. 1. The molecular composition of the organic aerosol at Hyytiälä was found to be strongly affected by air mass origin. Depending on the sampling day, 460-730 molecular formulae were identified in the 10 ambient samples (Kourtchev et al., 2013a). The nanoESI mass spectra of the ambient samples are mainly composed of low molecular mass compounds (i.e. peaks below m/z 350, Fig. 1a … Mazzoleni et al., 2012) environments. This is in contrast to laboratory-generated SOA from both α-pinene (Fig. 1c) and the VOC mixture (Fig. 1d), which contain high molecular weight compounds with distinguishable groups of dimers. Similar observations have been reported for laboratory-generated SOA from biogenic or anthropogenic VOCs, whose UHR mass spectra often contain a large number of oligomers (Reinhardt et al., 2007; Walser et al., 2007; Putman et al., 2012). Figure 1b shows a mass spectrum containing only those ions that were observed in all 10 Hyytiälä samples (hereafter referred to as "common ions"). Because "common ions" exclude all species that occurred only on individual days, they are potentially characteristic of locally formed and emitted OA, as their presence is independent of air mass origin.
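The "common ions" defined above are simply the intersection of the formula lists of all ambient samples. The Python sketch below shows that operation, together with the percentage of an ambient formula list that also occurs in a laboratory sample, which is one way to express overlap between chamber and field data. The function names and formula strings are hypothetical placeholders, not measured data.

```python
"""Sketch: "common ions" as the intersection of formula lists across samples,
and the share of an ambient formula list also found in a laboratory sample.
"""


def common_ions(formula_sets):
    """Formulae present in every sample (independent of day-to-day air mass)."""
    sets = [set(s) for s in formula_sets]
    return set.intersection(*sets) if sets else set()


def percent_shared(lab_formulae, ambient_formulae):
    """Percentage of ambient formulae that also occur in the laboratory sample."""
    ambient = set(ambient_formulae)
    return 100.0 * len(ambient & set(lab_formulae)) / len(ambient)


if __name__ == "__main__":
    days = [  # hypothetical daily formula lists
        {"C9H14O4", "C10H16O3", "C8H12O5", "C16H32O2"},
        {"C9H14O4", "C10H16O3", "C16H32O2"},
        {"C9H14O4", "C10H16O3", "C16H32O2", "C7H10O5"},
    ]
    common = common_ions(days)
    print(sorted(common))
    print(percent_shared({"C9H14O4", "C10H16O3"}, common))
```

Only presence or absence is compared here, consistent with ion intensities not being used for the spectral comparison.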
The VOC mixture samples have fewer peaks in the dimeric region than the α-pinene samples. The total numbers of assigned formulae in the α-pinene and VOC mixture mass spectra were on average 632 ± 84 and 501 ± 54, respectively (where ± describes the variability between three replicate chamber experiments). A higher number of formulae (about 900) was identified from the negative electrospray ultrahigh resolution FT-ICR mass spectra of SOA from α-pinene ozonolysis in the earlier study of Putman et al. (2012). However, that study identified formulae in the range 100 < m/z < 850, whereas we only considered ions below m/z 650. The number of possible empirical formula assignments increases significantly with mass, especially above 400 Da. Because no common ions above m/z 300 are present in the ambient samples, only ions from the monomeric region of the laboratory-generated SOA were used for further comparison with the ambient samples.
In the monomeric region (below m/z 300), the numbers of formulae in SOA from α-pinene and from the VOC mixture were comparable, on average 199 ± 29 and 215 ± 17, respectively (Fig. 1a and b). At first, such a small difference was somewhat puzzling. However, because three of the four VOC precursors used in the mixture (i.e., α-pinene, β-pinene and Δ³-carene) are structural isomers, their oxidation with O₃ is expected to yield products with similar elemental composition but different structures, which cannot be separated using the analytical technique employed here. For instance, the mass spectra from both the chamber experiments and ambient OA were dominated by an ion at m/z 185.08. While in the α-pinene experiments this ion corresponded to cis-pinic acid, in the VOC mixture experiments and the Hyytiälä ambient samples it was related to three (i.e. cis-pinic acid, homoterpenylic acid and cis-caric acid) and five (i.e. cis-pinic acid, homoterpenylic acid, limonic acid, ketolimononic acid and cis-caric acid) different compounds, respectively. These compounds were separated and identified using LC/MS analysis.

The ionisation of organic compounds can be affected by the presence of inorganic salts in the analyte solutions, potentially leading to a decrease in MS signal intensity when direct infusion mass spectrometry methods are used. We therefore tested whether atmospherically abundant salts (e.g., ammonium sulphate) in our filter extracts could cause such a matrix effect and whether this could be responsible for the lack of dimers observed in the ambient samples. Laboratory-generated samples were spiked with ammonium sulphate at atmospherically realistic proportions (30 % of the total aerosol mass).
The addition of salts suppressed the intensities of all ions across the entire mass range but did not selectively decrease the intensity of ions in the dimeric region (Supplement Fig. S1). However, because of competitive ionisation of analytes in the direct infusion ESI analysis of aerosol samples, which are known to have a very complex matrix, the ion intensities do not directly reflect the concentrations of the molecules in the sample. Signal intensities should therefore be interpreted with caution and were not considered for the mass spectral comparison in this study. In contrast, LC/MS, which is a quantitative technique, showed significant differences in the abundances of peaks associated with higher-molecular-weight (HMW) compounds between ambient and laboratory-generated samples (Fig. 2). While a number of HMW species associated with m/z 337.16, 357.15, 367.18 and 377.14 were observed in the chromatogram of laboratory-generated SOA (Fig. 2a), only one of these species (m/z 357.15) was detected in the ambient samples, with an intensity just above the chromatographic noise (Fig. 2b). It should be noted that a chromatographic peak associated with m/z 357.15 … Yasmeen et al. (2010), who suggested that the HMW compound at m/z 357 is a possible esterification product of cis-pinic and diaterpenylic acid. Both of these acids were found to be very abundant in our ambient and laboratory-generated samples; however, as outlined above, their dimer was present only in the latter samples (Fig. 2).
These results rule out the possibility that the observed differences between the direct infusion nanoESI mass spectra of the ambient and laboratory-generated samples are due to either matrix effects or methodological artefacts. We can thus conclude that the dimer concentration in the boreal forest OA is negligible compared to that in the laboratory SOA.
The Van Krevelen (VK) diagram, in which the H/C ratio is plotted as a function of the O/C ratio for each formula in a sample, is often used to describe the evolution of organic mixtures. VK diagrams can also be used to visualise differences in the elemental composition of different samples. Figure 3 shows overlaid VK diagrams for SOA from (a) α-pinene, the VOC mixture and a boreal forest sample from Hyytiälä and (b) α-pinene, the VOC mixture and a sample from the TIED site, which is heavily influenced by anthropogenic emissions. As indicated above, the elemental composition of the boreal forest sample only included "common ions" because they are potentially characteristic of locally emitted OA, their presence being independent of air mass origin. The elemental ratios for the TIED site included "common ions" from 3-4 September 2011, associated with westerly air masses. The composition of the latter samples is discussed in detail in a separate article (Kourtchev et al., 2013b). Fig. 3 shows that the distribution of elemental ratios of laboratory-generated SOA from α-pinene is very similar to that of the VOC mixture. Moreover, the elemental distributions of laboratory SOA generated from a single precursor and from a mixture of VOCs represent that of the ambient SOA from Hyytiälä fairly well, except that the latter sample contained an additional cluster of molecules in the upper left part of the diagram. In general, this region is associated with the most reduced/saturated species (Lin et al., 2012), and these molecules could therefore possibly be fatty acids emitted from a local biogenic source (Kourtchev et al., 2013a).
In contrast, the VK diagrams of the laboratory-generated SOA were very different from that of the anthropogenic aerosol from the TIED site, which contained a large cluster of ions with low H/C (≤ 1.0) and O/C (≤ 0.5) ratios, possibly corresponding to oxidised aromatic hydrocarbons (Mazzoleni et al., 2012). These differences were also apparent when the data were expressed as DBE vs. mass-to-charge ratio (m/z) (Fig. 4). The samples from α-pinene and the VOC mixtures had very similar DBE distributions, with values in the range 1 to 7. A small number of species observed in the laboratory samples with DBE values of 5-7 were possibly dimers formed through accretion reactions (Putman et al., 2012). The DBE distribution of molecules from the Hyytiälä samples clearly resembled those of the α-pinene and VOC mixtures, except that the ambient sample contained an additional cluster of ions with DBE 0-1. Based on MS/MS analysis, these ions were attributed to unsaturated and saturated fatty acids (Kourtchev et al., 2013a). The DBE plot for OA from TIED was very different from those of the other samples and contained an additional large cluster of molecules with DBE between 7 and 13 (Fig. 4b), once more indicating the presence of oxidised aromatic species. Aromatic compounds are typically associated with anthropogenic sources (Henze et al., 2008), whereas aliphatic compounds can be of both anthropogenic and biogenic origin.
The average O/C and H/C ratios for SOA from α-pinene (0.55 and 1.46, respectively) and the VOC mixture (0.58 and 1.40) were fairly comparable to those for OA from Hyytiälä (0.52 and 1.48) (Kourtchev et al., 2013a) but higher than those from TIED (0.36 and 1.1). The H/C values for laboratory-generated SOA indicate that the identified SOA molecules are aliphatic and alicyclic in nature (Putman et al., 2012). The elemental O/C ratios found in this study are within the range obtained for SOA generated … It has been demonstrated that the O/C ratio, as measured by the Aerodyne High Resolution Time-of-Flight Aerosol Mass Spectrometer (HR-ToF-AMS), is positively correlated with the hygroscopicity parameter of the organic fraction (Wu et al., 2013), which in turn is related to the cloud condensation nucleus (CCN) activity of aerosol particles (Petters and Kreidenweis, 2007). Therefore, considering that the O/C ratio for SOA generated from the α-pinene-only system is very similar to those for the VOC mixture and OA from Hyytiälä, we suggest that this simplified VOC system could possibly be used for the parameterisation of OA at the boreal site.
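The Van Krevelen coordinates and average ratios discussed above follow directly from the neutral formulae. The Python sketch below parses formula strings, computes (O/C, H/C) for each, averages them with equal weight per formula (an assumption, since ion intensities were not used for the comparison), and flags the low H/C, low O/C region that the text associates with possible oxidised aromatics. Function names and example formulae are hypothetical.

```python
"""Sketch: Van Krevelen coordinates, number-averaged O/C and H/C, and a flag for
the H/C <= 1.0, O/C <= 0.5 region. Equal weighting per formula is assumed.
"""
import re


def parse_formula(formula: str) -> dict:
    """Parse strings like 'C9H14O4' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen(formula: str):
    """Return (O/C, H/C) for a neutral formula."""
    el = parse_formula(formula)
    c = el["C"]
    return el.get("O", 0) / c, el.get("H", 0) / c


def mean_ratios(formulae):
    points = [van_krevelen(f) for f in formulae]
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def low_hc_low_oc(formula: str) -> bool:
    """True for the region flagged in the text as possibly oxidised aromatics."""
    o_c, h_c = van_krevelen(formula)
    return h_c <= 1.0 and o_c <= 0.5


if __name__ == "__main__":
    sample = ["C9H14O4", "C10H16O3", "C7H6O3"]  # hypothetical formula list
    print(mean_ratios(sample))
    print([f for f in sample if low_hc_low_oc(f)])
```

Here C7H6O3 (H/C ≈ 0.86, O/C ≈ 0.43) falls in the flagged region, while the monoterpene-type acids do not.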
O/C ratios may not accurately describe the degree of oxidation of organics, because non-oxidative processes (e.g., hydration and dehydration) can also affect these parameters (Kroll et al., 2011). The average carbon oxidation state (OS_C), on the other hand, must increase upon oxidation, even though the oxidation states of individual carbon atoms may change in different ways. OS_C could therefore be a useful metric for the degree of oxidation of organic species in the atmosphere and can serve as a key variable to describe organic mixtures. OS_C has been shown to be strongly linked to aerosol volatility and is thus a useful parameter for the classification of SOA (Hao et al., 2011). The carbon oxidation state can be calculated from the following equation:

$$\overline{OS}_C = -\sum_i OS_i \frac{n_i}{n_C}$$

where OS_i is the oxidation state associated with element i and n_i/n_C is the molar ratio of element i to carbon (Kroll et al., 2011). Figure S2 in the Supplement shows the overlaid carbon oxidation state vs. the number of carbon atoms for molecules from laboratory-generated SOA and the ambient boreal samples. The OS_C distribution of laboratory-generated SOA generally resembles that of the ambient samples, ranging from −1.7 to +1, and only a few species have an oxidation state greater than +1. Interestingly, molecules with OS_C greater than +1 were only observed in SOA from the VOC mixture experiments and in ambient aerosol from Hyytiälä. Such compounds are expected to contain several carbonyl groups. However, literature data suggest that the average oxidation state of organic aerosol rarely exceeds this value, because species with several carbonyl groups are highly unstable and will rapidly decompose to smaller molecules (Kroll et al., 2011).
Considering that the studied VOC mixture mainly contained monoterpenes, which are structural isomers of α-pinene, and that highly oxidised molecules were not observed in the SOA generated from the ozonolysis of α-pinene alone, it is likely that the species with OS_C > +1 were produced from the ozonolysis of isoprene. Moreover, cross-reactions between radicals and oxidation products of the different VOCs are expected to occur in the VOC mixture experiments, which may lead to the formation of a complex range of species. These highly oxidised species are worthy of further investigation.

The majority of the species exhibited OS_C values between −1 and +1 with 15 or fewer carbon atoms, suggesting that they are semi-volatile and low-volatility organic compounds corresponding to "fresh" and "aged" SOA produced by multistep oxidation reactions (Jimenez et al., 2009; Kroll et al., 2011). Compared to the chamber samples, the Hyytiälä samples additionally contained ions with OS_C < −1 and more than 7 carbon atoms, which are characteristic of primary biomass burning aerosol (Kourtchev et al., 2013a).
Figure 5 shows the fraction of molecular formulae below 300 Da found in both the laboratory-generated SOA and the ambient samples, relative to the total number of formulae in the ambient samples. Evidently, the molecular composition of SOA from both the VOC mixture and α-pinene represented the overall composition of the ambient sample from the boreal forest site reasonably well, with 72.3 ± 2.5 % (n = 3) and 69.1 ± 3.0 % (n = 3) common ions, respectively. Although the VOC mixture yielded slightly more formulae in common with the boreal forest aerosol than pure α-pinene SOA did, the difference between the mean values of the two treatment groups is not large enough to exclude the possibility that it is due to random sampling variability; according to an ANOVA test, the difference was not statistically significant (p = 0.348). In contrast, the molecular composition of laboratory-generated SOA was substantially different from that of the anthropogenically affected TIED site. The fractions of common molecular formulae from α-pinene and the VOC mixture relative to the total number of ions in the TIED sample were only 16.1 ± 1.7 % and 16.9 ± 1.2 %, respectively, indicating very different sources of organic compounds in these samples.

Comparison using statistical tools
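The comparison of laboratory and ambient formulae described above rests on two simple computations: the percentage of ambient formulae (below 300 Da) that also occur in a laboratory sample, and a one-way ANOVA across replicate percentages. The minimal sketch below assumes each sample is a set of neutral formula strings and applies the 300 Da cut to the neutral monoisotopic mass (an assumption; the cut could equally refer to the measured m/z). The helper names and all example values are hypothetical, not data from this study.

```python
import re
from statistics import mean
from typing import List, Set

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def neutral_mass(formula: str) -> float:
    """Monoisotopic mass of a neutral formula such as 'C10H16O3'."""
    return sum(MONO[el] * (int(n) if n else 1) for el, n in _TOKEN.findall(formula))


def common_fraction(lab: Set[str], ambient: Set[str], max_mass: float = 300.0) -> float:
    """Percentage of ambient formulae below max_mass that also occur in the lab sample."""
    ambient_low = {f for f in ambient if neutral_mass(f) < max_mass}
    return 100.0 * len(ambient_low & lab) / len(ambient_low)


def one_way_anova_f(groups: List[List[float]]) -> float:
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical formula sets, not data from this study.
    ambient = {"C10H16O3", "C9H14O4", "C16H32O2", "C20H30O8"}
    lab = {"C10H16O3", "C9H14O4", "C8H12O5"}
    print(common_fraction(lab, ambient))
    # Hypothetical replicate percentages for two treatment groups.
    print(one_way_anova_f([[70.0, 72.0, 74.0], [67.0, 69.0, 71.0]]))
```

The p-value follows from the F distribution with (k − 1, N − k) degrees of freedom; in practice `scipy.stats.f_oneway` returns both the F statistic and the p-value directly.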
Laboratory-generated and ambient samples were also compared by hierarchical cluster analysis (HCA), which divides samples into groups (clusters) of similar molecular composition. HCA separated the samples into three clusters (Fig. 6): (1) α-pinene (replicates from three different experiments); (2) VOC mixture (three replicates) together with the common ions of the ambient samples from Hyytiälä; and (3) common ions of the ambient samples from TIED. The branches in the tree diagram (dendrogram) represent the average distance between the connected samples. It is evident from the dendrogram that all replicate samples from the α-pinene and the VOC mixture experiments cluster together, implying very good reproducibility of the applied approach (i.e., smog chamber experiments and MS analysis) and its ability to separate the two experimental conditions from each other. Although the α-pinene data are separated from the VOC mixture and Hyytiälä cluster, the linkage distance is not large enough to conclude that their chemical compositions are very different. On the other hand, the data from TIED were classified into a separate cluster, confirming that their molecular composition is very different from that of the rest of the samples. The results from HCA clearly support the findings obtained from the statistical analysis and other visualisation methods (Van Krevelen diagrams, carbon oxidation state, DBE).
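The clustering step can be sketched as agglomerative clustering on a percentage distance. The dendrogram caption defines the linkage distance from the total number of ions considered (451 in this study) and the number of ions common to two samples; the sketch below reads "common" as present in both samples and, because the branches represent average distances, assumes average linkage (UPGMA). Both readings, the helper names and the toy data are assumptions, not the authors' exact procedure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Merge = Tuple[FrozenSet[str], FrozenSet[str], float]


def linkage_distance(a: Set[str], b: Set[str], n_total: int) -> float:
    """Distance in %: (total ions - ions present in both samples) / total ions * 100."""
    return 100.0 * (n_total - len(a & b)) / n_total


def upgma(samples: Dict[str, Set[str]]) -> List[Merge]:
    """Average-linkage (UPGMA) agglomerative clustering; returns the merge history."""
    n_total = len(set().union(*samples.values()))  # all ions considered
    dist = {frozenset(pair): linkage_distance(samples[pair[0]], samples[pair[1]], n_total)
            for pair in combinations(samples, 2)}
    clusters: List[FrozenSet[str]] = [frozenset([name]) for name in samples]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            # Average linkage: mean of all sample-to-sample distances between clusters.
            avg = sum(dist[frozenset((i, j))] for i in x for j in y) / (len(x) * len(y))
            if best is None or avg < best[2]:
                best = (x, y, avg)
        x, y, d = best
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
        merges.append((x, y, d))
    return merges


if __name__ == "__main__":
    # Hypothetical samples; letters stand in for assigned molecular formulae.
    demo = {
        "rep1": {"a", "b", "c", "d"},
        "rep2": {"a", "b", "c", "e"},
        "urban": {"a", "x", "y", "z"},
    }
    for left, right, d in upgma(demo):
        print(sorted(left), sorted(right), d)
```

This naive loop is fine for a handful of samples; for larger data sets, `scipy.cluster.hierarchy.linkage` with `method="average"` on a condensed distance matrix performs the same average-linkage clustering.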

Conclusions
The detailed molecular composition of background ambient aerosol from a boreal forest site (Hyytiälä, Finland), an urban location (Cork, Ireland), laboratory-generated SOA from α-pinene and a mixture of four VOCs were compared using nanoESI-UHRMS. Our results demonstrate that the molecular composition of SOA in the monomeric mass range up to m/z 300 from both the ozonolysis of the VOC mixture and α-pinene represented the overall composition of the ambient sample from the boreal forest site fairly well, with 72.3 ± 2.5 % (n = 3) and 69.1 ± 3.0 % (n = 3) common ions, respectively. Other atmospheric oxidants (e.g., OH radicals and NOx) will certainly influence the composition of SOA, and their reaction products are likely to explain some of the remaining molecules that were not observed in our laboratory-generated SOA.
The elemental (O/C and H/C) ratios of SOA from the α-pinene-only system were very similar to those from the VOC mixture and from ambient aerosol from the boreal forest. Considering that the O/C ratio is positively correlated with the hygroscopicity of the organic fraction, the simplified α-pinene-only system can potentially be useful for the parameterisation of boreal OA. A specific class of CHO compounds identified as fatty acids was present exclusively in the ambient samples, suggesting that the composition of the boreal forest OA is also influenced by primary emissions. In contrast, the overall molecular composition of the urban samples is dominated by a high number of oxidised aromatic hydrocarbons and is very different from the boreal and laboratory-generated OA.

Figure 6 (caption, beginning missing): The linkage distance between two samples, expressed as a percentage, was calculated as the difference between the total number of ions considered in the cluster analysis (451 ions, 100 %) and the number of ions common to the samples (see text for a detailed explanation). Rep 1, 2 and 3 correspond to chamber replicate experiments.