Principal component analysis of summertime ground site measurements in the Athabasca oil sands with a focus on analytically unresolved intermediate-volatility organic compounds

In this paper, measurements of air pollutants made at a ground site near Fort McKay in the Athabasca oil sands region as part of a multi-platform campaign in the summer of 2013 are presented. The observations included measurements of selected volatile organic compounds (VOCs) by a gas chromatograph–ion trap mass spectrometer (GC-ITMS). This instrument observed a large, analytically unresolved hydrocarbon peak (with a retention index between 1100 and 1700) associated with intermediate-volatility organic compounds (IVOCs). However, the activities or processes that contribute to the release of these IVOCs in the oil sands region remain unclear. Principal component analysis (PCA) with varimax rotation was applied to elucidate major source types impacting the sampling site in the summer of 2013. The analysis included 28 variables, including concentrations of total odd nitrogen (NOy), carbon dioxide (CO2), methane (CH4), ammonia (NH3), carbon monoxide (CO), sulfur dioxide (SO2), total reduced-sulfur compounds (TRSs), speciated monoterpenes (including α- and β-pinene and limonene), particle volume calculated from measured size distributions of particles less than 10 and 1 μm in diameter (PM10−1 and PM1), particle-surface-bound polycyclic aromatic hydrocarbons (pPAHs), and aerosol mass spectrometer composition measurements, including refractory black carbon (rBC) and organic aerosol components. The PCA was complemented by bivariate polar plots showing the joint wind speed and direction dependence of air pollutant concentrations to illustrate the spatial distribution of sources in the area. Using the 95 % cumulative percentage of variance criterion, 10 components were identified and categorized by source type. These included emissions by wet tailing ponds, vegetation, open pit mining operations, upgrader facilities, and surface dust.
Three components correlated with IVOCs, with the largest associated with surface mining and likely caused by the unearthing and processing of raw bitumen. Published by Copernicus Publications on behalf of the European Geosciences Union. 17820 T. W. Tokarek et al.: PCA in the Athabasca oil sands


Introduction
The Athabasca oil sands region of Northern Alberta, Canada, has seen extraordinary expansion of its oil sands production and processing facilities (CAPP, 2016) and associated emissions of air pollutants over the last several decades (Englander et al., 2013; Bari and Kindzierski, 2015). Air emissions from these facilities have been impacting surrounding communities, including the city of Fort McMurray and the community of Fort McKay (WBEA, 2013). To assess the impact of these emissions on human health, visibility, climate, and the ecosystems downwind, it is critical to obtain an understanding of the source types from all activities associated with oil sands operations (ECCC, 2016).
Prior to 2013, there had been only a single industry-independent study of trace gas emissions from the Athabasca oil sands mining operations (Simpson et al., 2010; Howell et al., 2014). The data showed elevated concentrations of n-alkanes (30 % of the total quantified hydrocarbon emissions), cycloalkanes (49 %), and aromatics (15 %) in plumes from an oil sands surface mining facility intercepted during a single aircraft flight. These compounds are associated with oil and gas developments including mining, upgrading, and transportation of bitumen (Siddique et al., 2006). Specifically, these activities involve the use of naphtha, a complex mixture of aliphatic and aromatic hydrocarbons in the range of C3 to C14 containing n-alkanes (e.g., n-heptane, n-octane, and n-nonane) and benzene, toluene, ethylbenzene, and xylenes (BTEX).
In August 2013, a comprehensive air quality study as a part of the Joint Oil Sands Monitoring (JOSM) plan (JOSM, 2012), referred to here as the 2013 JOSM intensive study, was conducted. This study was performed in northern Alberta at two ground sites in and near Fort McKay in close proximity (as close as 3.5 km) to oil sands mining operations and from a National Research Council of Canada (NRC) Convair 580 research aircraft to characterize oil sands emissions and their downwind physical and chemical transformations (Gordon et al., 2015; Liggio et al., 2016; Li et al., 2017).
One ground site, located at Wood Buffalo Environmental Association (WBEA) air monitoring station (AMS) 13 (Fig. 1), was equipped with a comprehensive set of instrumentation to measure concentrations of a wide range of trace gases and aerosols (Table 1), yielding a unique and new data set, parts of which are presented in this paper for the first time. As part of this effort, a gas chromatograph equipped with an ion trap mass spectrometer (GC-ITMS) was deployed at AMS 13. When air masses passing over regions with industrial activities were observed (as judged from a combination of local wind direction and tracer measurements), the total ion chromatogram showed an analytically unresolved hydrocarbon signal associated with intermediate-volatility organic compounds (IVOCs) with a saturation concentration (C*) in the range of 10^5 µg m−3 < C* < 10^7 µg m−3 (Liggio et al., 2016).
Emission estimates for analytically unresolved hydrocarbons range from 5 × 10^6 to 14 × 10^6 kg yr−1 for the two facilities that reported such emissions (Li et al., 2017). Using aircraft measurements during the 2013 study, Liggio et al. (2016) showed that IVOCs contributed to the majority of the observed secondary organic aerosol (SOA) mass production in a fashion similar to how anthropogenic VOCs contributed to SOA production during the Deepwater Horizon oil spill (de Gouw et al., 2011) and rivaling the magnitude of SOA formation observed downwind of megacities (Liggio et al., 2016). However, it has remained unclear which activities are associated with IVOC emissions.
In this paper, concurrent measurements of air pollutants at the AMS 13 ground site during the 2013 JOSM intensive study are presented. The analytically unresolved hydrocarbon signal was integrated and is presented as a time series and used as an input variable in a principal component analysis (PCA) to elucidate the origin of IVOCs in the Athabasca oil sands by association. The analysis presented here is a receptor analysis focusing on the normalized variability in pollutants impacting the AMS 13 ground site and hence does not constitute a comprehensive emission profile analysis of the oil sands facilities as a whole, for which aircraft-based measurements and/or direct plume or stack measurements are more suitable. PCA was chosen over the more popular positive matrix factorization (PMF) method (Paatero and Tapper, 1994) because it yields a unique solution and is particularly suited as an exploratory tool for identification of components without a priori constraints (Jolliffe and Cadima, 2016). The PCA was complemented by bivariate polar plots (Carslaw and Ropkins, 2012; Carslaw and Beevers, 2013) to show the spatial distribution of sources in the region as a function of locally measured wind direction and speed. A second PCA was performed to investigate which components correlate with (and generate) secondary pollutants, i.e., pollutants that are formed by atmospheric processes. Potential sources and processes contributing to each of the components identified by PCA are discussed.

Measurement location
Measurements of air pollutants were made at AMS 13 (Fig. 1), which is operated by WBEA. The site is located at −111.6423 longitude and 57.1492 latitude about 3 km from the southern edge of the community of Fort McKay, 300 m west from a public road, and 1 km west of the Athabasca River. The immediate vicinity of the site consisted of mixed-leaf boreal forest with a variety of tree species, including poplar, aspen, pine, and spruce (Smreciu et al., 2013). The site was accessible via a gravel road; traffic on this road was restricted during the study period (August-September 2013).
[Figure 1 caption (partially recovered): (Alberta, 2017). The red star indicates the location of AMS 13.]
The site is impacted by emissions from nearby oil sands facilities (Table 1 and Fig. 1), including a large surface mining site operated by Syncrude Canada, whose northeastern corner is located 3.5 km to the south of AMS 13 (and which is adjacent to the 5 km long Syncrude Mildred Lake (SML) tailing pond), and a large upgrader stack facility operated by Suncor Energy Inc. located to the southeast. There are additional oil sands facilities operated (during the study period) by Canadian Natural Resources Limited, Imperial Oil, and Shell Canada to the north and northeast. A potentially important consideration is the photochemical aging of emissions between the points of emission and observation. During daytime, the average surface wind speed was 7.5 km h−1 (2.1 m s−1). At this speed, the average transit times were 0.5 h to the edge of the closest mining operation, 1.6 h to the 12.2 km distant Mildred Lake Plant site, and 3.2 h to the Muskeg River Mine site located 23.7 km upwind.
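The transit times quoted above follow from dividing each source distance by the mean daytime wind speed. A minimal sketch of that arithmetic is given below; it assumes straight-line advection at constant speed, and the function and variable names are illustrative only and do not come from the study.

```python
def transit_time_hours(distance_km: float, wind_speed_km_per_h: float) -> float:
    """Advection time for an air parcel moving in a straight line at constant speed."""
    if wind_speed_km_per_h <= 0:
        raise ValueError("wind speed must be positive")
    return distance_km / wind_speed_km_per_h


# Mean daytime surface wind speed quoted in the text.
MEAN_DAYTIME_WIND_KM_H = 7.5

# Distances from AMS 13 quoted in the text (km).
DISTANCES_KM = {
    "closest mining operation": 3.5,
    "Mildred Lake Plant site": 12.2,
    "Muskeg River Mine site": 23.7,
}

for name, distance in DISTANCES_KM.items():
    hours = transit_time_hours(distance, MEAN_DAYTIME_WIND_KM_H)
    print(f"{name}: {hours:.1f} h")
```

Actual transit times vary with wind speed and trajectory, so these values are best read as typical daytime estimates.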

Instrumentation
A large number of instruments were deployed for this study; a partial list of instruments whose data were utilized in this paper is given in Table 2. Detailed descriptions of these instruments and operational aspects such as calibrations are given in the Supplement. Sample observations of analytically unresolved hydrocarbons by GC-ITMS and how these data were used in the analysis are described in Sect. "Analytically unresolved hydrocarbon signature" below.

Analytically unresolved hydrocarbon signature
As previously reported (Liggio et al., 2016), the total ion chromatogram of the GC-ITMS occasionally showed elevated and analytically unresolved hydrocarbons in the volatility range of C11-C17 with a saturation vapor concentration (C*) of 10^5 µg m−3 < C* < 10^7 µg m−3. An example is shown in Fig. 2.
An offline analysis of the headspace above ground-up bitumen gave a similarly unresolved hydrocarbon signal (Fig. 2, black trace). In this particular case, the ambient air chromatogram also shows enhancements of lower-molecular-weight hydrocarbons (possibly from naphtha) that were not observed in the bitumen sample. The observed unresolved hydrocarbon feature is qualitatively similar to the "large chromatographic hump of unresolved complex mixtures" reported by Yang et al. (2011) during their analysis of bitumen extracts.
The major ions contributing to the unresolved signals in Fig. 2 are associated with alkanes (i.e., m/z 55, 57, 67, 69; see Fig. S1 in the Supplement). In contrast, counts at masses associated with aromatics (i.e., m/z 115, C9H7+, and m/z 91, C7H7+) as reported by Cross et al. (2013) were negligible in both the bitumen headspace and polluted day samples. The resemblance of the unresolved hydrocarbon feature in ambient air with the bitumen headspace sample in terms of both volatility (i.e., elution time) and electron impact mass fragmentation is consistent with bitumen as the source of IVOCs at this site.
In the interpretation of the integrated IVOC signal, it is assumed that it is of primary origin, i.e., emitted directly from point sources in the vicinity of the measurement site. For the PCA, the unresolved signal was integrated from a retention time of 25 to 45 min (gray area in Fig. 2) in all ambient air chromatograms.
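The integration of the unresolved signal over a fixed retention-time window can be illustrated with a simple trapezoidal sum. The sketch below is an assumption-laden illustration, not the procedure used by the instrument software: the function name and the choice of the trapezoidal rule are hypothetical, and any baseline handling (which the text does not describe) is omitted.

```python
def integrate_window(retention_min, ion_counts, start_min=25.0, end_min=45.0):
    """Trapezoidal integral of a total ion chromatogram between two retention times.

    Only chromatogram segments lying entirely inside [start_min, end_min]
    contribute to the result. No baseline is subtracted.
    """
    if len(retention_min) != len(ion_counts):
        raise ValueError("retention times and ion counts must have equal length")
    area = 0.0
    for i in range(1, len(retention_min)):
        t0, t1 = retention_min[i - 1], retention_min[i]
        if t0 >= start_min and t1 <= end_min:
            area += 0.5 * (ion_counts[i - 1] + ion_counts[i]) * (t1 - t0)
    return area


# Synthetic chromatogram: a flat plateau of 10 counts between 25 and 45 min.
times = [20.0, 25.0, 35.0, 45.0, 50.0]
counts = [0.0, 10.0, 10.0, 10.0, 0.0]
print(integrate_window(times, counts))
```

Applying the same window to every ambient chromatogram yields one integrated value per sample, which can then serve as a single PCA input variable.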
The IVOCs observed in this work likely encompass only a portion of the total that is emitted. For example, IVOCs generated by combustion processes, such as diesel engine exhaust, are composed of aliphatic alkanes, including cyclic and branched alkanes, and aromatics (Gentner et al., 2012; Zhao et al., 2015). The use of a chromatographic column in this work biases the IVOC signal towards hydrocarbon IVOCs since oxygenated compounds (i.e., alcohols and acids) will not elute from the analytical column. Furthermore, the recovery of VOCs from the pre-concentration unit, while reproducible and likely complete for n-alkanes, which bracket the bulk of IVOC emitted and whose calibration curves were linear, is not known for late-eluting compounds but is assumed to be sufficiently reproducible to yield a semiquantitative signal.

Principal component analysis
The PCA was carried out using the Statistical Analysis System (SAS™) Studio 3.4 software (SAS, 2015) using a method similar to that described by Thurston et al. (2011) and Thurston and Spengler (1985). The source-related components and their associated profiles are derived from the correlation matrix of the input trace constituents. This approach assumes that the total concentration of each "observable" (i.e., input variable) is made up of the sum of contributions from each of a smaller number of pollution sources and that variables are conserved between the points of emission and observation.

Selection of variables
A total of 22 variables whose ambient concentrations are dominated by primary emissions or which are formed very shortly after emission (such as the less-oxidized oxygenated organic aerosol (LO-OOA) factor observed by the SP-AMS; see below) were included in the PCA (Table 3). These variables included CO2, CH4, NOy, CO, and SO2, which are known to be emitted in the oil sands region from stacks, the mine fleet and faces, tailing ponds, and fugitive emissions (Percy, 2013). The median ratio of NOx (= NO + NO2) to NOy was 0.85, consistent with the close proximity of the measurement site to emission sources and limited chemical processing. Because NOx constituted a large fraction of NOy, its temporal variation was captured by the latter, and it was not included as a separate variable in the PCA.
For this work, mixing ratios of all non-methane hydrocarbons (NMHCs) that were quantified (i.e., o-xylene, the n-alkanes decane and undecane, the aromatics 1,2,3- and 1,2,4-trimethylbenzene (TMB), and limonene and α- and β-pinene) were included as variables. In addition, the aforementioned unresolved signal associated with IVOCs was included as a variable by integrating total GC-ITMS ion counts (m/z 50-425) over a retention time range of 25-45 min (retention index range of 1100 to 1700).
Gas-phase ammonia was included as a variable because elevated reduced nitrogen concentrations have been observed in the region and were linked to the use of ammonia on an industrial scale, for example as a floating agent and for hydrotreating (Bytnerowicz et al., 2010). Total sulfur and total reduced sulfur were added as tracers of upgrader stack SO2 emissions and of "odors", believed to be emitted from oil sands tailing ponds, which continue to be of concern in surrounding communities (Small et al., 2015; Percy, 2013; Holowenko et al., 2000).
Refractory black carbon was added as a variable since it is present in diesel truck exhaust and in biomass burning plumes and is hence a combustion tracer (Wang et al., 2016; Briggs and Long, 2016). Particle-surface-bound polycyclic aromatic hydrocarbons (pPAHs) were included because of their association with facility stack emissions and combustion particles in the area (Allen, 2008; Grimmer et al., 1987). Hydrocarbon-like organic aerosol (HOA) was included as a surrogate for fossil fuel combustion by vehicles (Jimenez et al., 2009). The LO-OOA factor was included as it appears to form rapidly after emission of precursors (Lee et al., 2018). Supermicron aerosol volume (PM10−1, i.e., the volume of particles between PM10 and PM1) was also included as a tracer of coarse particles from primary sources, which are expected to be dominated by dust emissions.
To assess which components impact secondary product formation, a second PCA was performed that included variables mainly formed through atmospheric chemical processes and whose concentrations more strongly depend on air mass chemical age than those variables selected initially. In this PCA, odd oxygen (Ox = O3 + NO2), submicron aerosol SO4^2−(p), NO3^−(p), NH4^+(p), a second, more-oxidized OOA factor (MO-OOA), and PM1 volume were included, increasing the total number of variables to 28 (Table 4). Furthermore, since oxidation of IVOCs leads to formation of SOA (Robinson et al., 2007; Lee et al., 2018), and the photochemical conversion of IVOC to SOA may adversely affect the PCA, a PCA without secondary and aerosol variables is presented in the Supplement (Table S10).

Treatment of input data
Table notes: (a) Values were determined only from data points included in the PCA, not from the entire campaign. (b) Average and relative standard deviation were calculated before zeros were replaced with 0.5 × LOD. (c) RSD: relative standard deviation. (d) LOD: limit of detection. (e) ppt: parts per trillion by volume (10^−12). (f) NA: data not available. (g) Calculated using 3 × standard deviation at ambient background levels.
Data used in the PCA were averaged to match the time resolution of the GC-ITMS VOC and IVOC measurements, i.e., over 10 min long periods (spaced ∼ 1 h apart) set by the start and stop times of the GC-ITMS pre-concentration period.
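Averaging faster measurements onto the GC-ITMS sampling windows can be sketched as below. This is an illustrative helper under stated assumptions: timestamps are plain numbers (e.g., minutes since the campaign start), windows are half-open intervals, and the function name is hypothetical rather than taken from the study's processing code.

```python
def window_means(times, values, windows):
    """Mean of `values` whose timestamps fall inside each [start, stop) window.

    Returns one entry per window; a window containing no data yields None.
    """
    means = []
    for start, stop in windows:
        inside = [v for t, v in zip(times, values) if start <= t < stop]
        means.append(sum(inside) / len(inside) if inside else None)
    return means


# Example: three 10 min pre-concentration windows spaced about 1 h apart.
sample_times = [0, 5, 9, 10, 60, 65, 69, 200]
sample_values = [1.0, 2.0, 3.0, 100.0, 4.0, 5.0, 6.0, 7.0]
gc_windows = [(0, 10), (60, 70), (120, 130)]
print(window_means(sample_times, sample_values, gc_windows))
```

Data recorded between windows (such as the value at t = 10 in the example) do not enter the averages, which is why short plumes that fall between pre-concentration periods are absent from the PCA input.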
When concentrations were below their respective limit of detection (LOD; values are given in Table 3), half the reported LOD was used to minimize bias (Harrison et al., 1996; Polissar et al., 1998; Zhao et al., 2004; Guo et al., 2004). Prior to PCA, input variables were standardized to eliminate unit differences by subtracting the mean concentration $\bar{C}_i$ of pollutant $i$ from the concentration of sample $k$ ($C_{i,k}$) and dividing by the standard deviation ($s_i$) of all samples included in the PCA:
$$Z_{i,k} = \frac{C_{i,k} - \bar{C}_i}{s_i}$$
Here, $Z_{i,k}$ is the standardized pollutant concentration. In total, 218 data points from all identified species over the period of the campaign were used for the main PCA.
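The below-LOD substitution and the standardization described above can be sketched in a few lines. The example assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the function names are illustrative only.

```python
from statistics import mean, stdev


def replace_below_lod(values, lod):
    """Replace every value below the limit of detection with half the LOD."""
    return [0.5 * lod if v < lod else v for v in values]


def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-scores).

    Assumption: the sample standard deviation (n - 1) is used.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - m) / s for v in values]


raw = [0.0, 2.0, 10.0]               # hypothetical concentrations
filled = replace_below_lod(raw, 4.0)  # values below LOD = 4.0 become 2.0
print(filled)
print(standardize([1.0, 2.0, 3.0]))
```

After standardization every variable has zero mean and unit variance, so variables with large absolute concentrations (e.g., CO2 in ppmv) do not dominate those measured at pptv levels.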

PCA solutions
In this work, the varimax method (Kaiser, 1958) was used to rotate the loading matrix. This method is an orthogonal rotation (i.e., components are not expected to correlate), which maximizes the variance of the squared loadings within each component so that each variable loads strongly on as few components as possible, making the results easier to interpret (Kaiser, 1958). Several criteria (Table S10) were considered for component selection: the latent root criterion, i.e., on the basis that rotated eigenvalues must be greater than unity, the (cumulative) percentage of variance criterion, for which the extracted components account for > 95 % of the variance, and the scree test (Fig. S2) (Thurston and Spengler, 1985; Guo et al., 2004; Hair et al., 1998; Cattell, 1966). For the optimal solution presented in the main paper, the 95 % variance criterion was chosen, providing a 10-component solution for the PCA with only primary variables and an 11-component solution for the PCA with both primary and secondary variables. Components 1 through 4 were consistent regardless of the number of components retained. Solutions with fewer and more components are presented in the Supplement. Time series of each of the components were calculated by multiplying the original standardized matrix by the rotated loading matrix and were used to generate bivariate polar plots (Sect. 2.4).
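The sequence described above (correlation-matrix PCA, component selection by cumulative variance, varimax rotation, and component time series) can be sketched with NumPy. This is a minimal illustration under stated assumptions and not a reproduction of the SAS procedure: the function names are hypothetical, the rotation uses the common SVD-based varimax iteration without Kaiser row normalization, and the input `Z` is assumed to be an already standardized samples-by-variables array.

```python
import numpy as np


def pca_loadings(Z, variance_fraction=0.95):
    """Unrotated loadings from the correlation matrix of Z (samples x variables).

    Keeps the fewest components whose cumulative explained variance
    exceeds `variance_fraction`.
    """
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]               # re-sort to descending
    eigvals = np.clip(eigvals[order], 0.0, None)    # guard tiny negative round-off
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.searchsorted(cumulative, variance_fraction, side="right")) + 1
    n_keep = min(n_keep, eigvals.size)
    return eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (variables x components)."""
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        LR = L @ rotation
        target = LR**3 - LR @ np.diag((LR**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                            # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):    # stop once improvement stalls
            break
        criterion = new_criterion
    return L @ rotation


def component_scores(Z, rotated_loadings):
    """Component time series: standardized data times rotated loadings."""
    return Z @ rotated_loadings
```

Because the rotation is orthogonal, it redistributes variance among the retained components without changing the total variance explained or the communality of any variable.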

Bivariate polar plots
The PCA was complemented by bivariate polar plots showing the wind speed and direction dependence of air pollutant concentrations. The use of these representations implies a linear relationship between local wind conditions and air mass origin, which may not always be the case (for example, during or after stagnation periods). In addition, local topography, such as the Athabasca River valley, complicates regional air flow patterns and limits the interpretability of polar plots in general and in particular to the E of AMS 13, where the river valley is located. The plots were generated with the OpenAir software package (Carslaw and Ropkins, 2012; Carslaw and Beevers, 2013) using the R programming language and the open-source software "RStudio: Integrated development environment for R" (RStudio Boston, 2017). The default setting (100) was used as the smoothing function.
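The idea behind a bivariate polar plot can be illustrated by averaging concentrations in wind-speed and wind-direction bins. The sketch below is a deliberately simplified stand-in written in Python: it computes plain bin means and omits the surface smoothing that the OpenAir package applies, and the function name and bin widths are hypothetical choices.

```python
import math
from collections import defaultdict


def polar_bin_means(wind_speed, wind_dir_deg, conc, speed_step=1.0, dir_step=30.0):
    """Mean concentration in each (wind speed, wind direction) bin.

    Keys of the returned dict are (lower speed edge, lower direction edge).
    Directions are wrapped into [0, 360) before binning.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ws, wd, c in zip(wind_speed, wind_dir_deg, conc):
        key = (
            math.floor(ws / speed_step) * speed_step,
            math.floor((wd % 360.0) / dir_step) * dir_step,
        )
        sums[key] += c
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical samples: two from the NE at about 1-2 m/s, one from the SSW.
speeds = [1.2, 1.8, 2.5]
directions = [45.0, 50.0, 200.0]
signal = [10.0, 20.0, 5.0]
print(polar_bin_means(speeds, directions, signal))
```

Plotting these bin means on polar axes (direction as angle, speed as radius) gives a coarse version of the figures discussed here; sparsely populated bins should be interpreted with caution.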

Overview of the data set
Time series of the 22 pollution tracers chosen for PCA are presented in Fig. 3, grouped approximately by source type. Statistics of the data (i.e., median, average, maxima, minima) are summarized in Table 3. Time series of VOCs of primarily anthropogenic origin (i.e., o-xylene and 1,2,3- and 1,2,4-TMB) as well as the IVOC signature are shown in Fig. 3a. The abundances of these species, as well as the other compounds, were highly variable and varied as a function of time of day (i.e., boundary layer mixing height) and air mass origin, with higher VOC concentrations generally observed during daytime. The VOC concentrations varied between nearly pristine, remote conditions, with concentrations below detectable limits, and mixing ratios of aromatic species exceeding 100 pptv. The concentration range of o-xylene is within the extremes reported by WBEA in their 2013 annual report (WBEA, 2013), exemplifying that the data set is representative of typical pollutant levels in this region.
While there is some obvious covariance among variables (i.e., when the mixing ratios of one particular VOC increase, so do others), the ratios of hydrocarbons varied considerably. For example, on 18 August at 10:50 UTC, the n-decane-to-o-xylene ratio was ∼ 22 : 1, whereas on 24 August at 07:40 UTC it was ∼ 1 : 5.7. The IVOC magnitude also varied greatly and often increased and decreased in tandem with the other VOCs (e.g., on 24 August at 16:30 UTC) but also increased independently from the other VOC abundances (e.g., on 30 August at 01:20 UTC and on the night of 22 August). This behavior suggests the presence of multiple sources with distinct signatures that are being sampled to a varying extent at different times or, perhaps, a single source whose emission profile varies. This, coupled with the intermittency of the highly elevated signals, presents an analysis problem frequently encountered in environmental analysis that is usually investigated through a factor analysis or PCA (Thurston et al., 2011; Guo et al., 2004).
Presented in Fig. 3b are the time series of NOy, rBC, and pPAH abundances, all of which are combustion by-products. For example, rBC is emitted from combustion of fossil fuels, biofuels, open biomass burning, and burning of urban waste (Bond et al., 2004). Similar to the VOCs, the abundances of these species varied greatly, from very low continental background levels (i.e., < 100 pptv of NOy, < LOD for rBC and pPAHs) to polluted concentrations (i.e., > 60 ppbv of NOy, > 1 µg m−3 rBC, > 10 ng m−3 pPAHs) characteristic of polluted urban and industrial areas. When high concentrations of NOy were observed, its main component was NOx (data not shown), which is a combustion by-product usually associated with automobile exhaust. In the Alberta oil sands, emissions from off-road mining trucks as well as the upgrading processes are the main contributors to the NOy burden (Percy, 2013; Watson et al., 2013).
Shown in Fig. 3c are the mixing ratios of the greenhouse gases CH4 and CO2 along with CO. Abundances of CO2 were clearly modulated by photosynthesis and respiration of the vegetation near the measurement site, as judged from the strong diurnal cycle in its concentration (not shown). Maxima typically occurred shortly after sunrise, coincident with the expected breakup of the nocturnal boundary layer. In addition to biogenic emissions from vegetation and soil, CO2 originates from a variety of point and mobile sources in this region, including off-road mining trucks (Watson et al., 2013) and the extraction, upgrading, and refining of bitumen, and on-road vehicle sources in the area (Nimana et al., 2015a, b). Concentrations of CO2 spiked whenever these emissions were transported to the measurement site.
www.atmos-chem-phys.net/18/17819/2018/ Atmos. Chem. Phys., 18, 17819-17841, 2018
Concentrations of CH4 also exhibit a diurnal cycle, with higher concentrations generally observed at night and peaking in the early morning hours. While CH4 and CO2 mixing ratios frequently correlated in plumes, their ratios were variable overall, suggesting they often originated from distinct sources. Potential methane point sources in the region include microbial production in tailing ponds (Siddique et al., 2012) and fugitive emissions associated with the mining and processing of bitumen (Johnson et al., 2016). Indeed, a recent analysis shows tailing ponds and open pit mining sources to be the largest sources of CH4 in the region (Baray et al., 2018).
Similar to the anthropogenic VOCs, the abundances of CH4 and CO2 were highly variable and ranged from minima of 1.88 and 384 ppmv to maxima of 2.96 and 578 ppmv, corresponding to maximum enhancements by factors of 1.63 and 1.47 relative to tropospheric global monthly means of 1.806 ± 0.001 and 394.3 ± 0.1 ppmv for July 2013 (Dlugokencky, 2017a, b), respectively.
Mixing ratios of CO also varied with time but generally were not elevated greatly (median 118 ppbv) above background levels (minimum 91 ppbv), except for occasional spikes in concentration (Fig. 3c). Carbon monoxide is a tracer of biomass burning and fossil fuel combustion, in particular in automobiles with poorly performing or absent catalytic converters, but is also a by-product of the oxidation of VOCs, in particular of methane and isoprene, which are oxidized over a wide area upwind of AMS 13 (Miller et al., 2008).
Time series of sulfur species and PM10−1 volume are shown in Fig. 3d. The total sulfur (TS) and SO2 data are dominated by intermittent plumes containing SO2 mixing ratios exceeding 5 ppbv. The highest mixing ratio observed was 92.5 ppbv (in between the preconcentration periods of the GC-ITMS). Mixing ratios of SO2 exhibited the greatest variability of all pollutants, as judged from the relative standard deviation of each of the measurements (Table 3). TRS levels were generally low (< 1 ppbv) and variable, except for plumes; TRS abundances in plumes, however, are more uncertain since they were calculated by subtraction of two large numbers. When TS and SO2 abundances were low (< 1 ppbv), TRS abundances were variable and occasionally exhibited spikes that did not show any obvious correlation with other variables, suggesting the presence of one or more distinct TRS sources. PM10 volume concentrations also varied considerably and, just like TRS, did not show an obvious correlation with other variables. Fugitive dust emissions likely contributed to much of the PM10 volume in the Athabasca oil sands region (Wang et al., 2015).
Time series of monoterpene mixing ratios are shown in Fig. 3e. α-Pinene was generally the most abundant monoterpene, followed by β-pinene. Their ratio, averaged over the entire campaign, was 1 : 0.85, though occasionally the α-pinene-to-β-pinene ratio was below 1 : 2 (e.g., on 28 August at 14:50 UTC and 5 September at 12:40 UTC). Terpene mixing ratios were generally higher at night than during the day, with maxima of 1.9 and 1.6 ppbv, respectively, a diurnal pattern consistent with what has been observed at other forest locations (Fuentes et al., 1996). Monoterpenes are emitted by plants via both photosynthetic and non-photosynthetic pathways (Fares et al., 2013; Guenther et al., 2012); at night, their emissions accumulate in a shallow nocturnal boundary layer, whereas during daytime, they are entrained aloft (above the canopy) and oxidized by the hydroxyl radical (OH) and O3, which are more abundant during the day than at night (Fuentes et al., 1996). α- and β-pinene mixing ratios were lowest midday (median values at noon of 140 and 133 pptv, respectively). The largest daytime concentrations were observed on 25 August, a cloudy day (as judged from spectral radiometer measurements of the NO2 photolysis frequency): on this particular day, mixing ratios at noon were 687 and 850 pptv, respectively.
Also shown in Fig. 3e is the time series of ammonia. These data were dominated by spikes, which were observed sporadically and did not correlate with other variables, suggesting the presence of nearby ammonia point sources. Ammonia was not as variable as some of the other pollutants (e.g., the anthropogenic VOCs, sulfur species) as judged from its relative standard deviation (Table 3), which suggests a geographically more dispersed source or sources similar to CO or CH4, which have a "background". This is consistent with a recent study by Whaley et al. (2018) that estimated over half (∼ 57 %) of the near-surface NH3 during the study period originated from NH3 bidirectional exchange (i.e., reemission of NH3 from plants and soils), with the remainder being from a mix of anthropogenic sources (∼ 20 %) and forest fires (∼ 23 %).

PCA with primary variables
The loadings of the optimum solution are presented in Table 5. The 10-component solution accounts for a cumulative variance of 95.5 %. The communalities for the analysis, i.e., the fraction of total pollutant observations accounted for by the PCA, are all greater than 85 %, with the lowest communality obtained for the IVOCs (0.86).
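For standardized variables and orthogonal components, the communality of a variable is conventionally computed as the sum of its squared loadings over the retained components. The sketch below illustrates this with made-up loadings; the function name and the numbers are hypothetical and are not values from Table 5.

```python
def communalities(loadings):
    """Sum of squared loadings per variable (one row per variable)."""
    return [sum(value * value for value in row) for row in loadings]


# Two hypothetical variables loading on two components.
example_loadings = [
    [0.6, 0.8],   # fully accounted for by the two components
    [0.5, 0.5],   # only half of its variance is captured
]
print(communalities(example_loadings))
```

A communality of 0.86, as reported for the IVOCs, would mean that the retained components reproduce 86 % of that variable's variance.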
In the following, an overview of the observed components is presented. Associations with r > 0.7, r > 0.3, and r > 0.2 are referred to as "strong", "weak", and "poor", respectively. Hypothesized identifications are given in Sect. 4 and are summarized in Table 6 and Fig. 4.
Components 1 through 4 emerged regardless of the number of components used to represent the data, whereas the structure of components 5 through 10 only fully emerged in the 10-component solution (see Supplement). Hence, components 6 through 10 are somewhat tentative as many (i.e., 7-9) are single-variable components and have eigenvalues close to or below unity, i.e., account for less variance than any single variable. As a result, the interpretations of these components are subject to more uncertainty and are more speculative but are presented in the Supplement for the sake of completeness and transparency. For the purpose of this paper, this is inconsequential as components 6-10 are not associated with IVOCs.
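The labeling convention for association strength stated above can be written as a small helper. This sketch makes two assumptions that the text does not settle: loadings are passed as reported (the treatment of negative loadings is not specified), and values at or below 0.2 receive the hypothetical label "none".

```python
def association_strength(r):
    """Label a loading using the thresholds stated in the text."""
    if r > 0.7:
        return "strong"
    if r > 0.3:
        return "weak"
    if r > 0.2:
        return "poor"
    return "none"  # hypothetical label for loadings at or below 0.2


for loading in (0.92, 0.66, 0.25, 0.10):
    print(loading, association_strength(loading))
```

Under this convention a loading of 0.66 still counts as "weak", which matches how such values are described later in the text.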

Extended PCA with added secondary variables
The loadings of the optimum solution that includes primary and secondary variables are shown in Table 7. In this 11-component solution, the 10 components originally identified were preserved, though their relative order was changed, with the upgrader component moving from the fourth to second position. There was one new component (6), which encompassed only secondary species, including MO-OOA (r = 0.92), Ox (r = 0.33), NO3^−(p) (r = 0.36), PM1 (r = 0.31), and LO-OOA (r = 0.31).
NH4^+(p), SO4^2−(p), and NO3^−(p) are associated with the stack emissions component (2, with r = 0.84, 0.84, and 0.44, respectively), which was also weakly correlated with PM1 (r = 0.44) and Ox (r = 0.36). The association of secondary variables with the primary components suggests rapid formation of these secondary products on a timescale that is similar to the transit time of the pollutants to the measurement site. PM1 correlated strongly with the major IVOC component (component 5, r = 0.80), which was also weakly associated with LO-OOA (r = 0.66) and NO3^−(p) (r = 0.59), as well as NH4^+(p) and SO4^2−(p) (r = 0.32 and 0.33, respectively).

Bivariate polar plots
Bivariate polar plots were generated for all components and their dominant associated variables and are shown in the Supplement (Figs. S2-S11). Winds were predominantly from the SW but were also observed often from the S and N. Figure 5a shows the plot for IVOCs. The highest concentrations were observed when the local wind direction was from the NE, where several facilities including the Aurora North, Muskeg River, and Jackpine mines and large swaths of disturbed and cleared land are located in close proximity to each other (Table 1 and Fig. 1). The second highest IVOC signal intensity was observed when local wind direction was from the SSE. The bivariate polar plots of the three components associated with IVOCs are shown in Fig. 5b-d. These components are associated with winds from the NE, E, SE, and S at low to moderate speeds (1-3 m s−1). Component 5 (Fig. 5b) was the most strongly correlated with IVOCs and shows the most spatial overlap with the distribution of the IVOC source; however, the intensities differ owing to the association of component 5 with other variables such as rBC and LO-OOA.
[…] (Cho et al., 2012), examined VOCs (Bari and Kindzierski, 2018; Bari et al., 2016) and PM2.5 (Bari and Kindzierski, 2017; Landis et al., 2017) impacting the nearby communities of Fort McKay and Fort McMurray, or investigated pollutants such as PAHs as they affect sediments (Jautzy et al., 2013) or lichens (Landis et al., 2012). The measurement suite in this work encompassed a larger variety of collocated analytical instruments closer to oil sands mining operations than these earlier studies and included the first direct observation of airborne IVOCs, which is unique to this area; we have not observed it elsewhere where we have made GC-ITMS measurements, i.e., in Calgary and on Vancouver Island (Tokarek et al., 2017a).
The main objective of this work was to elucidate the origin of the IVOC signature observed at the AMS 13 ground site downwind from the Athabasca oil sands mining operations (Fig. 2) through a PCA. The optimum solution identified 10 components, of which three were associated with the IVOC signature: 1, 2, and 5 (Table 5). Tentative assignments of these components to source types in the oil sands are given in Table 6 and are discussed below. Emission inventories show that the facilities that process the mined bitumen are by far the largest anthropogenic point sources in the oil sands region (NPRI, 2013), consistent with recent aircraft measurements (Baray et al., 2018; Howell et al., 2014; Li et al., 2017; Simpson et al., 2010), which have shown substantial emissions of NOy, SO2, CO, VOCs, CO2, and CH4 from these facilities and associated mining activities. No single component correlates with all of these variables, suggesting that the PCA is able to distinguish among source types within the facilities such as tailing ponds (component 1), stack emissions (component 4), and mining (component 2).
Close-up overflights (Howell et al., 2014; Li et al., 2017; Baray et al., 2018) were able to spatially resolve various oil sands facility emission sources (i.e., tailing ponds from upgrader facilities, fluid coking reactors, hydrocrackers and hydrotreaters). The PCA presented in this paper is not expected to do this in all cases because some emissions would frequently have merged into a single plume by the time of observation at AMS 13. Unless their emissions vary considerably in time, such sources could be interpreted as originating from a single source in the PCA.
The discussion below focuses on components that are associated with IVOCs (Sect. 4.1), followed by those that are not (Sect. 4.2). The PCA that included six secondary products is discussed in Sect. 4.3. Components that are not associated with IVOCs and have only tentatively been identified (i.e., components 6-10) are discussed in the Supplement.
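As a rough illustration of the type of analysis discussed here, the following Python sketch runs a PCA on the correlation matrix of a data set, retains components up to a cumulative variance threshold (95 %, the criterion used in this work), and applies a varimax rotation to the retained loadings. It is not the code used in this study: the function names (`varimax`, `pca_varimax`), the iteration settings, and the choice of NumPy are assumptions made purely for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x components) loading matrix."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    last = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the (raw) varimax criterion with respect to the rotation
        grad = loadings.T @ (
            rotated**3 - rotated * (rotated**2).sum(axis=0) / n_vars
        )
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt  # nearest orthogonal matrix to the gradient
        crit = s.sum()
        if crit < last * (1.0 + tol):  # criterion no longer increasing
            break
        last = crit
    return loadings @ rotation


def pca_varimax(data, variance_cutoff=0.95):
    """Return varimax-rotated loadings and the number of retained components.

    data: array of shape (n_samples, n_variables).
    """
    corr = np.corrcoef(data, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]  # sort by descending explained variance
    eigval = np.clip(eigval[order], 0.0, None)
    eigvec = eigvec[:, order]
    cumulative = np.cumsum(eigval) / eigval.sum()
    # Smallest number of components whose cumulative variance reaches the cutoff
    n_keep = int(np.searchsorted(cumulative, variance_cutoff)) + 1
    # Loadings are correlations between variables and unrotated components
    loadings = eigvec[:, :n_keep] * np.sqrt(eigval[:n_keep])
    return varimax(loadings), n_keep
```

Each entry of the rotated loading matrix can be read as the correlation coefficient r between one measured variable and one component, which is how the r values quoted in this section are interpreted.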

Component 1: tailing ponds (wet tailings)
Component 1 is strongly associated with anthropogenic VOCs (r > 0.87) and weakly with TRS (r = 0.59) and CH 4 (r = 0.59).These pollutants originate from tailing ponds (Small et al., 2015), though it is unclear from this analysis how large a source tailing ponds are compared to fugitive emissions of these pollutants from the nearby processing (e.g., bitumen separation and mining) facilities.
Tailing ponds cover large areas of land and are used to slowly (on a timescale of years to decades) separate solid components, or tailings, from water used in bitumen extraction. Residual bitumen often floats to the top of the settling basins. Most tailing ponds are "wet" (as they contain residual naphtha that is used as a diluent during the transfer of tailings to the ponds) and emit VOCs, CH4, and CO2 (Small et al., 2015). The presence of o-xylene, TMB, and the n-alkanes in component 1 is consistent with the fugitive release of VOCs from residual naphtha, which contains these compounds (Siddique et al., 2008, 2011; Small et al., 2015). Furthermore, the observation of TRS and CH4 from this source is consistent with the presence of anaerobic sulfur-reducing bacteria and methanogens within the ponds, which degrade not only the residual bitumen (Holowenko et al., 2000; Percy, 2013; Quagraine et al., 2005) but also the various components of naphtha (Shahimin and Siddique, 2017; Small et al., 5) and in a recent aircraft study (Baray et al., 2018).
While component 1 correlates with CH 4 (r = 0.59), it does not correlate with CO 2 (r = 0.09).Emissions of CH 4 from tailing ponds due to methanogenic bacterial activity are well documented (Small et al., 2015;Yeh et al., 2010) and hence the correlation with CH 4 is not unexpected.Conversely, the lack of correlation with CO 2 seems inconsistent with emission inventories that generally present tailing ponds as large CO 2 sources (Small et al., 2015).One plausible explanation is that tailing ponds are a relatively small CO 2 source overall in the region and that other, larger CO 2 sources and sinks (such as photosynthesis and respiration by the vegetation surrounding the site) dominate the variance impacting the PCA results.It may also indicate that, at least on aggregate and for the particular ponds detected in this work, the emissions are in a regime in which the release of CH 4 dominates over CO 2 , i.e., the ponds have, perhaps, become more anoxic than believed to be the case in previous studies and hence emit more CH 4 (Holowenko et al., 2000).For example, Small et al. (2015) showed that older tailing ponds (those without the addition of fresh froth or thickening treatments) tended to emit more CH 4 , while newer ponds are associated with higher VOC emissions.It is likely that component 1 is dominated by the nearest pond (the Mildred Lake settling basin, 6-11 km SSE of AMS 13) and other tailings in the SE where the majority of air samples originated from.The Mildred Lake settling basin is one of the oldest in the region and is still actively being used; the correlation with CH 4 and VOC emissions is hence expected.
Component 1 is also associated with NO y , rBC, CO, and HOA, though these correlations are relatively modest (r = 0.27, 0.30, 0.41, and 0.40, respectively).These species typically originate from combustion sources, such as generators and motor vehicles, including diesel-powered engines powering generators or pumps; it is not obvious if and to what extent these are operated on or near tailing ponds, though.Satellite observations have shown elevated concentrations of NO 2 above on-site upgrader facilities, likely a result of emissions from extraction and transport sources (McLinden et al., 2012).In addition, one of the major highways of the region is located adjacent to the Mildred Lake settling basin and other major ponds in the region; highway traffic emissions (of CO, NO y , rBC, and HOA) may hence also be partially included in component 1.
The bivariate polar plot shows that component 1 was observed when local winds were from the SE and E of the measurement site (Fig. 5c), which is consistent with the notion that the Mildred Lake settling basin and emissions along Highway 63 and, potentially, more distant facilities are sources contributing to this component.
Component 1 is associated with the IVOC signature, though to a lesser degree than components 2 and 5. The association of the IVOC signal with component 1 (r = 0.31) is slightly poorer than the association with component 2 (r = 0.39) and significantly poorer than the association with component 5 (r = 0.74). One possible explanation for the association of IVOCs with tailing pond vapor is the presence of bitumen in the ponds that was not separated from the sand during the separation stage (Holowenko et al., 2000). This semi-processed bitumen would be expected to emit the same IVOC vapors as those that were observed in the lab (Fig. 2). Tailing ponds contain anywhere from 0.5 % to 5 % residual bitumen by weight (Chalaturnyk et al., 2002; Holowenko et al., 2000; Penner and Foght, 2010). As illustrated in Fig. 4a, some of this material floats on the ponds' surfaces, where IVOCs can partition to the air. Emission of IVOCs from bitumen floating on tailing ponds would be a function of many variables (e.g., diluent composition, extraction methodology, settling rate, temperature) and is thus not expected to be as persistent as CH4 partitioning from the ponds to the above air or from exposed bitumen on the mine surface, leading to a lower overall correlation.
Component 1 is also weakly associated with LO-OOA (r = 0.45).Liggio et al. (2016) found that the observed SOA is dominated by an OOA factor whose mass spectrum was similar to that of aerosols formed from oxidized bitumen vapors.The organic aerosol budget in this study was also dominated by an OOA factor, the LO-OOA (Lee et al., 2018).The association of LO-OOA with component 1 is thus consistent with its association with IVOCs.

Component 2: mine fleet and vehicle emissions
Component 2 strongly correlates with NO y (r = 0.82), rBC (r = 0.77), pPAH (r = 0.94), and HOA (r = 0.74), which suggests a combustion source such as diesel engines.In the Athabasca oil sands, there is a sizeable off-road mining truck fleet consisting of heavy aggregate haulers.In addition, there are diesel engine sources associated with generators, pumps, and land-moving equipment, i.e., graders, dozers, hydraulic excavators, and electric rope shovels (Watson et al., 2013;Wang et al., 2016).Most of these non-road applications have been exempt from highway fuel taxes, on-road fuel formulation requirements, and after-engine exhaust treatment (Watson et al., 2013).Emissions from the hauler fleet and the stationary sources would fit the profile of component 2. Other diesel engines operated in the region include a commuter bus fleet, pickup and delivery trucks, tractor-trailers, and privately owned diesel-powered automobiles used to commute from the work sites to the major residential areas around Fort McMurray, whose emissions are likely captured by component 2 as well, though the magnitude of these relative to the mining truck fleet is not known.Consistent with component 2 being associated with an anthropogenic source is its poor correlation with undecane (r = 0.27), likely arising from fugitive fuel emissions.
The bivariate polar plot (Fig. 5d) for component 2 and NO y in particular (Fig. S4a) matches the location of Highway 63, which crosses the river to the SE of AMS 13 and bends to the E and is indicative of a line source.At the same time, some of the largest mining operations in the region, the Susan Lake Gravel Pit, Aurora North, Muskeg River, and Millennium mines are located to the NE and SE of AMS 13 as well.NO y , rBC, and HOA (Fig. S4a, b and d) all appear to have dominating point sources to the S and E when wind speeds are 1-2 m s −1 .These directions are the same as the Fort McKay industrial park to the E and the Syncrude Mildred Lake facility parking lot to the S, which would have a higher concentration of vehicles emitting these pollutants in a smaller area, whose emissions would be in addition to those from industrial activities.
Component 2 is associated with the IVOC signature and CH 4 (both r = 0.39).The mining activities bring bitumen to the surface; similar to what we had observed in lab experiments (Fig. 2, black trace), the surface exposure of bitumen during mining and on-site processing is expected to be associated with fugitive emissions of CH 4 (Johnson et al., 2016) and IVOCs.
Fine-fraction pPAHs are associated strongly with component 2, but no other components.Measurements of individual PAHs in snow and moss downwind from the oil sands facilities have identified multiple sources of PAHs in the Athabasca oil sands, which include windblown petroleum coke dust (also referred to as petcoke for short), a carbonaceous residual product from the upgrading of crude petroleum that is stockpiled on mine sites, and emissions from fine tailings, oil sands ore, and naturally exposed bitumen (Zhang et al., 2016;Jautzy et al., 2015;Parajulee and Wania, 2014).Given this diversity of known sources, the associations of PAHs with only a single component is surprising, though it indicates that emissions from the mining fleet (which would include diesel and, perhaps, windblown emissions from petcoke that is being transported) gave rise to most of the variability in surface-bound PAH concentrations in this data set.The petcoke emissions identified in the studies mentioned above are likely mainly associated with larger supermicron-sized particles, whose PAH content would not be detected by the pPAH measurement in this data set.
Component 2 is not associated with LO-OOA (r = 0.11), even though IVOCs are associated with this component.This feature may indicate that the IVOCs emitted in component 2 are qualitatively different from those emitted by components 1 and 5, in that they are less likely to yield organic aerosol on the timescale of transport from emission to observation.One reason for the difference could be that the bitumen that is transported by the mining fleet is relatively freshly exposed, whereas the IVOCs released from tailing ponds or from mine faces (component 5) may have been oxidized to a greater extent and hence more prone to rapid aerosol formation.
There is no association of component 2 with CO 2 (r = 0.08).This is somewhat unexpected as the trucks are expected to release CO 2 (Wang et al., 2016) but could be due to significantly larger CO 2 sources in the area dominating the observed CO 2 variability at AMS 13 (e.g., components 3 and 6).Furthermore, one would expect an association of non-road mining truck emissions with aromatics and alkanes.Component 2 exhibited only poor correlations with decane (r = 0.22) and undecane (r = 0.27) and no correlation with o-xylene (r = 0.08), suggesting that other components (i.e., component 1) explained most of the variability in their concentrations at this site.
One possibility is that these emissions occur on mine faces, where previously unexposed bitumen is brought to the surface as a result of mining. Only a relatively small portion of the mine faces are actively mined; those parts give rise to rBC and NOy emissions from combustion engines in heavy haulers or generators powering equipment. The poor association of component 5 with TRS could be due to sulfur-reducing bacteria found on the surface of bitumen. However, most of the variability in TRS at AMS 13 is attributed to composite or "dry" tailing ponds given their more conducive environment to microbial activity.
Component 5 does not correlate with CO2 (r = −0.03) or with CH4 (r = 0.12), which is somewhat at odds with the notion of mine faces as the main source of IVOCs. The mine faces give rise to substantial fugitive emissions of CO2 and CH4 (Johnson et al., 2016); these emissions are likely captured by component 6 in this analysis (see Supplement). It is unclear to what extent these greenhouse gases are released relatively quickly from "hot spots" (i.e., from a small number of locations) through surface cracks and fissures or by slow release from new material that is exposed and then releases greenhouse gases during material handling, transport, and processing (Johnson et al., 2016). IVOCs from surface-exposed bitumen are likely released by the latter mechanism and are temperature dependent. If the mine faces are indeed the main IVOC source, the analysis results presented here suggest that the IVOC emissions from surface-exposed bitumen on mine faces are decoupled from CH4 emissions in time and appear as a distinct component and hence corroborate the hot spots or fast-release hypothesis, though clearly more work is needed to characterize greenhouse gas emissions from oil sands mine faces.
The association of IVOCs with component 5 may also be a result of fugitive emissions during the hot-water-based extraction of bitumen sand slurries during the separation phase of bitumen treatment.Generally, bitumen is extracted in a weak alkaline environment by aeration of the solution to optimize the separation of sand and bitumen (Masliyah et al., 2004).Unrecovered bitumen and naphtha then end up in tailings.The recovered bitumen and naphtha are moved to upgrader facilities where they undergo further treatment (such as coking or hydrotreatment).The magnitude of fugitive emissions during these downstream extraction processes could be large, considering the bitumen is heated and actively aerated.Future work should investigate IVOC fluxes near extraction plants and on mine faces.
Component 5 correlates strongly with LO-OOA (r = 0.72), which is likely generated in part by photochemical aging of IVOCs. A back-of-the-envelope calculation using a kOH of 1.8 × 10⁻¹¹ cm³ molecule⁻¹ s⁻¹, based on that used for diesel exhaust IVOCs (Zhao et al., 2014), and an estimated midday OH concentration of 7 × 10⁶ molecules cm⁻³ (Liggio et al., 2016) gives a first-order lifetime of 130 min with respect to IVOC oxidation by OH during daytime. The photochemical age during daytime, estimated using relative concentrations of 124-TMB and n-decane and the method described by Borbon et al. (2013), was 1.0 ± 0.4 h; assuming similar photochemical ages, we estimate that between 25 % and 50 % of the emitted IVOC is (potentially) oxidized during daytime (see Supplement). This oxidation will contribute to SOA growth (Kroll et al., 2011). Hence, we expect some formation and growth of organic aerosol associated with component 5.
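The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is a minimal sketch that assumes simple first-order loss at a constant OH concentration over the photochemical age; the calculation in the Supplement may differ in detail, and the function names are hypothetical.

```python
import math

K_OH = 1.8e-11  # cm^3 molecule^-1 s^-1, rate coefficient quoted in the text
OH_CONC = 7e6   # molecules cm^-3, midday OH estimate quoted in the text


def oh_lifetime_min(k_oh=K_OH, oh=OH_CONC):
    """First-order lifetime against OH oxidation, in minutes."""
    return 1.0 / (k_oh * oh) / 60.0


def fraction_oxidized(age_h, lifetime_min):
    """Fraction lost after a given photochemical age (assumes constant OH)."""
    return 1.0 - math.exp(-age_h * 60.0 / lifetime_min)


tau = oh_lifetime_min()             # roughly 130 min
low = fraction_oxidized(0.6, tau)   # lower bound of the 1.0 +/- 0.4 h age
high = fraction_oxidized(1.4, tau)  # upper bound of the 1.0 +/- 0.4 h age
print(f"lifetime: {tau:.0f} min, oxidized: {low:.0%} to {high:.0%}")
```

Under these assumptions the oxidized fraction spans roughly one quarter to one half of the emitted IVOC, in line with the range stated above.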
Finally, it is conceivable that a "natural" background of IVOCs exists in the region (since bitumen can be found at or near the surface in many parts of the region); such a natural background would also be included in component 5.However, this natural bitumen would have been exposed at the surface for geological timescales and, unlike unexposed, buried bitumen, likely would have lost most of its volatile content over that period.Furthermore, the mine faces occupy large swaths of land in the region (as evident from satellite imagery).Thus, the IVOC emissions are more likely due to anthropogenic activity than due to a natural phenomenon.
At AMS 13, CO 2 and the monoterpenes exhibit a very similar diurnal cycle: they are present in higher concentrations during the night than during the day (Fig. 3) due to a decrease in the boundary layer height (BLH) at night coupled with plant respiration of CO 2 and non-photochemical emission of monoterpenes (Fares et al., 2013;Guenther et al., 2012).During the day, mixing ratios of CO 2 are lower due to plant uptake and photosynthesis, and mixing ratios of terpenes are lower due to higher mixing heights and vertical entrainment and due to oxidation by O 3 and OH (Fuentes et al., 1996).Hence, the PCA gives a positive correlation of monoterpenes with CO 2 even though the physical processes, photosynthesis and respiration, work in the opposite direction.
The bivariate polar plots (Fig. S5a-c) show that the monoterpenes and CO 2 were observed in the highest concentrations when the wind speeds were low (< 1 m s −1 ), consistent with formation of a stable nocturnal boundary layer.
To corroborate this interpretation, the PCA was repeated with BLH estimated by a light detection and ranging (lidar) instrument (Strawbridge, 2013) added as a variable (Table S9 in the Supplement). Since BLH is not emitted by any source, it appears as a single variable component (r = 0.90). The only other component that BLH (anti)correlates with is the biogenic component 3 (r = −0.35).
The dominant monoterpene species observed was α-pinene, followed by β-pinene and limonene, though occasionally there was twice as much β-pinene as α-pinene in the sampled air. Some variability in this ratio is expected since emission factors vary considerably among tree species (Geron et al., 2000), which are not homogeneously distributed throughout the region (e.g., Fig. S1 of Rooney et al., 2012). Simpson et al. (2010) observed enhancements of α-pinene and, to a greater extent, β-pinene over the oil sands (up to 217 and 610 pptv) compared to background levels of 20 ± 7 and 84 ± 24 pptv, respectively, during midday overflights (which occurred between 11:00 and 13:00 local time). Similar enhancements were also reported by Li et al. (2017), who observed emissions of biogenic hydrocarbons in the four facilities sampled, three of which showed a higher β- than α-pinene concentration. The PCA (Table 5) showed no significant correlation of α- and β-pinene with any of the anthropogenic components, which implies that the biogenic source strength is simply too large for any anthropogenic emissions of terpenes to be picked up in the analysis, especially considering that terpenes are relatively short lived.
The biogenic source shows poor anticorrelations with NOy (r = −0.26) and NH3 (r = −0.24). Many NOy species (i.e., NO2, HONO, peroxycarboxylic nitric anhydrides or PAN, and HNO3) deposit to the forest canopy (Hsu et al., 2016; Min et al., 2014; Fenn et al., 2015); at night, when mixing heights are lower, their concentrations are expected to decrease faster than during the day and are thus out of phase with the CO2 and terpene concentrations. The poor anticorrelation with NH3 likely arises because the NH3 emissions from plants are mainly stomatal and scale with temperature and are hence larger during the day than at night, anticorrelated with the terpene source (Whaley et al., 2018).

Component 4: upgrader emissions
Component 4 is strongly correlated with SO2 (r = 0.97) and total sulfur (r = 0.93). By far the largest sources of SO2 in the region are upgrader facilities, which emit as much as 6 × 10⁷ kg annually according to emission inventories (ECCC, 2013). Significant SO2 emissions from upgrader facilities have recently been confirmed by aircraft studies (Simpson et al., 2010; Howell et al., 2014; Liggio et al., 2016). Component 4 is also poorly correlated with NOy (r = 0.21) but not with rBC (r = 0.05), consistent with a non-sooty (i.e., lean) combustion source such as upgrader stacks. Strong enhancements in SO2 were only observed intermittently as "spikes", which is expected when sampling emissions from relatively few and discrete point sources.
Component 4 is not associated with CO2 (r = −0.12), even though inventories indicate that the upgrading facilities are the largest CO2 source in the region (Furimsky, 2003; Englander et al., 2013; Yeh et al., 2010). In this data set, the lack of correlation of component 4 with CO2 (and to some extent with PM10−1 as well) likely arises mainly from a sampling bias as stack emissions were only observed during daytime, likely due to diurnal variability in the atmospheric boundary layer structure as explained below.
Most of the variability in CO 2 concentration at AMS 13 is due to surface-based sources that originate from large areas, especially biogenic processes (photosynthesis during the day and respiration at night, component 3) and anthropogenic surface sources such as those captured by component 6 (Sect.4.2.3).Other anthropogenic pollutants, such as SO 2 , NO y , and CH 4 , are not subject to large biogenically driven processes and are less affected than CO 2 .
In contrast to surface sources, emissions from the > 100 m tall stacks are comparatively undersampled and observed mainly during daytime, when vertical mixing brings elevated plumes to the surface, yet CO 2 concentrations are generally much lower than during the night due to uptake by vegetation.At night, pollutants emitted from stacks are injected above the likely very shallow nocturnal surface layer and were hence not observed at the surface.Vertical profile measurements of SO 2 stack plumes by a Pandora spectral sun photometer at Fort McKay during daytime have shown considerable vertical gradients and only occasional transport of SO 2 all the way to the surface (Fioletov et al., 2016).
The association of component 4 with CO 2 is negative because the stack emission source is observed only during the day when the large biogenic sink dominates and effectively masks the relatively small increase due to anthropogenic CO 2 .In contrast, background concentrations of SO 2 are comparatively low, and the increase in SO 2 concentrations is readily picked up by the PCA.
It would be interesting to conduct a future study in winter when biogenic activities decrease; a wintertime PCA of surface measurements might be able to associate CO 2 enhancements with upgrading facilities, though boundary layer mixing heights would decrease as well, which would make a PCA using surface data even more challenging.
Component 4 does not correlate with PM 10−1 volume (r = 0.09).It is clear that the emitted SO 2 will contribute to secondary aerosol formation downwind, such that a correlation of stack emissions with PM 10−1 volume might be expected.However, these secondary contributions will likely mostly be in the submicron aerosol fraction, which adds relatively little to PM 10−1 volume.Further, PM 10−1 volume is dominated by coarse particles from other primary sources, mostly windblown emission of sand from the mine surfaces, roadways, and perhaps bioaerosol (component 7, see the Supplement).These effects make PM 10−1 volume from stacks appear comparatively small, such that the variability in the larger, surface-based sources likely masks the contribution of stack emissions to PM 10−1 variability.
The bivariate polar plot of component 4 (Fig. S6d) shows that the largest magnitudes were observed when local winds were from the SE. The corresponding plot of SO2 (Fig. S6a) reveals two more distinct sources: a larger one from the E and a smaller one from the SSE. However, only two facilities (Sunrise and Firebag) are located to the E at relatively large distances of 37 and 47 km, respectively. The largest known upgrading facilities and SO2 sources in the area (i.e., upgrading facilities located at the Mildred Lake and Suncor base plants) are located to the S and SE of AMS 13. Considering that the stack emissions are only observed intermittently, we speculate that there is a mesoscale transport pattern in the Athabasca River valley that channels emissions, such that the local wind direction and speed may be misleading as to the true location of these sources. For more extensive data sets, such phenomena may very well average out but perhaps did not in this case.
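The bivariate polar plots referred to throughout this discussion summarize a concentration (or component score) as a joint function of wind direction and wind speed. The Python sketch below shows only the underlying binning step, under stated assumptions: wind direction is grouped into 10° sectors, the wind speed bin width (`speed_step`, 0.5 m s⁻¹ here) is a hypothetical choice, and no smoothing is applied. The function name `polar_bin_means` is illustrative and does not correspond to the software used to produce the figures.

```python
import numpy as np


def polar_bin_means(wind_dir_deg, wind_speed, conc, dir_step=10.0, speed_step=0.5):
    """Mean concentration in each (wind direction, wind speed) bin.

    Returns an array of shape (n_direction_bins, n_speed_bins); bins that
    contain no observations are NaN.
    """
    wd = np.asarray(wind_dir_deg, dtype=float) % 360.0
    ws = np.asarray(wind_speed, dtype=float)
    c = np.asarray(conc, dtype=float)

    n_dir = int(round(360.0 / dir_step))
    i = np.minimum((wd // dir_step).astype(int), n_dir - 1)  # direction index
    j = (ws // speed_step).astype(int)                        # speed index
    n_spd = int(j.max()) + 1

    sums = np.zeros((n_dir, n_spd))
    counts = np.zeros((n_dir, n_spd))
    np.add.at(sums, (i, j), c)      # accumulate concentrations per bin
    np.add.at(counts, (i, j), 1.0)  # count observations per bin

    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts        # empty bins become NaN
```

Because such a plot is built from the local wind vector only, it cannot account for channelled or curved transport of the kind speculated about above, which is one reason the apparent source direction of an elevated plume may be misleading.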

Extended PCA with added secondary variables
The extended analysis (Table 7) qualitatively preserves the structure of the original 10-component solution (with the exception of an added "aged" component 6) but allows an assessment of which components contribute most to the formation of secondary products such as SOA, which has implications for health (Bernstein, 2004) and climate (Charlson et al., 1992). Secondary products vary considerably as a function of air mass chemical age (which depends, amongst other factors, on time of day and synoptic conditions, including wind speed) and are hence expected to add considerable noise and scatter to the results, leading to lower correlations. Conversely, the distance between the measurement site and sources is fixed, such that this variability should average out over time. This indeed appears to have happened in this data set in spite of the relatively low sample size.
The analysis indicates that the component with the strongest IVOC source (component 5) also has the highest association with PM 1 (r = 0.70; Table 7).Aircraft measurements combined with a modeling study have required a group of IVOC hydrocarbons to explain the significant SOA formation and growth downwind of the oil sands region (Liggio et al., 2016).The association of IVOCs with PM 1 volume is consistent with the hypothesis that oxidation of IVOCs observed at AMS 13 leads to SOA generation and appears to have a significant impact on the variation in PM 1 mass.The relatively short distance to sources and young photochemical age suggests that IVOCs would experience a relatively small number of oxidation steps.Consistent with this interpretation, a correlation with the MO-OOA is not observed in component 5 (r = 0.10; Table 7).However, component 6, which is (poorly) anticorrelated with IVOCs (r = −0.23), is strongly correlated with MO-OOA (r = 0.92), consistent with the notion that this component is more photochemically processed and that IVOCs contribute to this SOA AMS factor.
The second component influencing PM1 is that from stack emissions (component 4 in the primary PCA; component 2 in the secondary PCA) (Tables 5 and 7). It is well established that the oxidation of SO2 to sulfate will lead to formation of fine particulate matter. This apparently occurs, at least partially, on the timescale between the point of emission and the AMS 13 site (assuming a wind speed of 3 m s⁻¹ and a distance of 11 km, the transit time is about 1 h), though some fraction of SO₄²⁻(p) is likely directly emitted.
A PCA was applied to continuous measurements of 22 primary pollutant tracers at the AMS 13 ground site in the Athabasca oil sands during the 2013 JOSM intensive study to elucidate the origins of airborne analytically unresolved hydrocarbons that were observed by GC-ITMS. The analysis identified 10 components. Three components correlated with the IVOC signature and were tentatively assigned to mine faces and, potentially, hot-water bitumen extraction facilities, the mine hauler fleet, and wet tailing pond emissions. All three are anthropogenic activities that involve the handling of raw bitumen, i.e., the unearthing, mining, and transport of crude bitumen, and the disposal of processed material that contains residual bitumen in wet tailing ponds. The PCA results are consistent with our previous interpretation that the unresolved hydrocarbons originate from bitumen based on the similarity of the chromatograms with those obtained in a headspace vapor analysis of ground-up bitumen in the laboratory. Liggio et al. (2016) showed that these hydrocarbons constitute a group of IVOCs in the saturation vapor concentration (C*) range of 10⁵ µg m⁻³ < C* < 10⁷ µg m⁻³ that contribute significantly to SOA formation and growth downwind of the oil sands facilities. The correlation of LO-OOA with two of the three IVOC components in the main PCA and with PM1 in the extended analysis is consistent with the high SOA formation potential of IVOCs. It also suggests that further differentiation among IVOC sources may be needed and stresses the need for IVOCs to be routinely monitored. In particular, direct measurements of emissions throughout the processing of raw bitumen are needed to pinpoint source contributions more accurately and aid in the development of potential mitigation strategies.
The PCA in this study suffered from several limitations. For instance, PCA does not provide insight into emission factors of individual facilities, though it does capture what conditions change ambient concentrations the most. Further, the receptor nature of PCA did not always discern among large source areas that may have many individual point sources coming together at the point of observation. For example, component 1 contains an obvious tailing pond signature because of its high correlation with anthropogenic VOCs, methane, and TRS, but also includes several combustion sources, making interpretation of this IVOC source location more challenging. A longer continuous data set with a greater number of variables would have perhaps been able to resolve these different sources, including the various tailing ponds, of which there are 19 in the region, all with slightly different emission profiles (Small et al., 2015).
Another limitation is the bias of this (and most) ground site data set towards surface-based emissions and the undersampling of stack emissions.Facility stacks were only observed in the daytime because at night the mixing height is so low that the stacks emit directly into the residual layer.These emissions could be quantified using aircraft-based platforms (Howell et al., 2014;Li et al., 2017;Baray et al., 2018).The PCA struggled most with the allocation of greenhouse gases.Mixing ratios of CO 2 , in particular, were difficult to reconcile in this analysis due to a high background and large attenuation by biogenic activity and boundary layer meteorology.Forests greatly affected CO 2 levels in the region because CO 2 is taken up during the day when plants are photosynthetically active and emitted at night when plants undergo cellular respiration.This CO 2 source and sink appears to dominate the PCA, effectively masking relatively small emissions from tailing ponds, facilities, and tail pipes in particular from the mine hauling fleet.
Finally, there is a need for improved monitoring methods for IVOCs. For instance, future studies should focus on characterizing the VOCs in the abovementioned volatility range using an instrument with greater mass and time resolution, such as a time-of-flight mass spectrometer (TOF-MS), or higher-resolution separation methods (e.g., multidimensional gas chromatography), and also include measurement of speciated aerosol organic composition by, for example, thermal desorption aerosol GC (TAG) analysis (Williams et al., 2006). Future studies should also investigate how IVOC volatility distributions vary with source type and chemical age.
Supplement. The supplement related to this article is available online at: https://doi.org/10.5194/acp-18-17819-2018-supplement.
Author contributions. TWT and HDO carried out the experiments and analysis and wrote the paper (with input from all co-authors). CAOA, JAH, RM, AKYL, MDW, JPDA, CM, AD, RLM, KS, KLH, JSO, EGS, GRW, AGT, and DEJW operated and collected the raw data during the measurement campaign and/or worked up and compiled final data. DKB and FVA analyzed the bitumen sample after the campaign in the laboratory. AKYL and MGA carried out the aerosol PMF analysis. SML, JL, and JRB coordinated the campaign.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Atmospheric emissions from oil sands development and their transport, transformation and deposition (ACP/AMT inter-journal SI)". It is not associated with a conference.

Figure 1. Map of oil sands facilities showing locations of surface mines and tailing ponds, downloaded from the Oil Sands Information Portal (Alberta, 2017). The red star indicates the location of AMS 13.

Figure 2. The top panel shows total ion chromatograms of air samples collected on 27 August 2013 from 18:04 to 18:14 UTC (red) and on 28 August 2013 from 13:43 to 13:53 UTC (blue). The total ion counts (TICs) of a headspace sample of ground-up bitumen collected after the campaign are superimposed (black). The gray area indicates the range over which the IVOC signal was integrated. The bottom panel shows retention times of n-alkanes, determined after the measurement campaign by sampling a VOC mixture containing a C10–C16 n-alkane ladder.

Figure 5. Bivariate polar plots related to IVOCs. (a) IVOCs from the complete data set. (b) Component 5 extracted from the main PCA (Table 5). (c) Component 1 extracted from the main PCA. (d) Component 2 extracted from the main PCA. Wind direction is binned into 10° intervals and wind speed into 30 intervals. The polar axis indicates wind speed (m s−1). a.u. is arbitrary units.
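The binning step behind a bivariate polar plot of this kind can be sketched as follows, assuming 10° wind direction sectors and a fixed number of equal-width wind speed intervals. The function name `polar_bin_means`, its parameters, and the default of 30 speed intervals are illustrative assumptions; any smoothing that plotting software applies to the binned surface is omitted here.

```python
import numpy as np


def polar_bin_means(wind_dir_deg, wind_speed, conc, dir_width=10.0, n_speed_bins=30):
    """Mean concentration in each (wind direction, wind speed) bin.

    Returns a (direction bins x speed bins) array, with NaN for empty bins,
    and the wind speed bin edges.
    """
    wind_dir_deg = np.asarray(wind_dir_deg, dtype=float) % 360.0
    wind_speed = np.asarray(wind_speed, dtype=float)
    conc = np.asarray(conc, dtype=float)

    # Direction sectors: [0, 10), [10, 20), ..., [350, 360) for dir_width=10.
    n_dir_bins = int(round(360.0 / dir_width))
    dir_idx = np.minimum((wind_dir_deg // dir_width).astype(int), n_dir_bins - 1)

    # Equal-width speed bins from 0 to the maximum observed wind speed.
    speed_edges = np.linspace(0.0, wind_speed.max(), n_speed_bins + 1)
    speed_idx = np.clip(np.digitize(wind_speed, speed_edges) - 1, 0, n_speed_bins - 1)

    # Accumulate sums and counts per bin, then divide where data exist.
    sums = np.zeros((n_dir_bins, n_speed_bins))
    counts = np.zeros((n_dir_bins, n_speed_bins))
    np.add.at(sums, (dir_idx, speed_idx), conc)
    np.add.at(counts, (dir_idx, speed_idx), 1)

    means = np.full_like(sums, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means, speed_edges


# Hypothetical example: three observations, two from the north-northeast sector.
means, edges = polar_bin_means([5, 5, 185], [1.0, 1.0, 3.0], [2.0, 4.0, 10.0])
```

Each row of `means` corresponds to one wind direction sector and each column to one wind speed interval. Plotting the array on polar axes, with wind speed as the radial coordinate, gives the type of display shown in the figure.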

Table 1. Oil sands facilities located within 30 km of AMS 13. Distances were estimated using coordinates provided in the National Pollutant Release Inventory (NPRI, 2013) and do not account for the size of each facility, whose boundaries may be considerably closer to (or further away from) AMS 13. PACPRM: petroleum and coal product refining and manufacturing; OGPS: oil and gas pipelines and storage.

Table 2. Instruments used to measure ambient gas-phase and aerosol species during the 2013 JOSM intensive study at AMS 13.

Table 3. Variables observed at the AMS 13 ground site during the 2013 JOSM campaign used for PCA.

Table 5. Loadings for the 10-factor optimal solution (primary variables only). Coefficients with Pearson correlation coefficients r > 0.3 are shown in bold font.

Table 6. Hypothesized identifications of principal components.

Table 7. Loadings for the 11-component solution with the inclusion of variables associated with secondary processes.