Can Positive Matrix Factorization Help to Understand Patterns of Organic Trace Gases at the Continental Global Atmosphere Watch Site Hohenpeissenberg?

Abstract. From the rural Global Atmosphere Watch (GAW) site Hohenpeissenberg in the pre-alpine area of southern Germany, a data set of 24 C2–C8 non-methane hydrocarbons over a period of 7 years was analyzed. Receptor modeling was performed by positive matrix factorization (PMF) and the resulting factors were interpreted with respect to source profiles and photochemical aging. Differing from other studies, no direct source attribution was intended because, due to chemistry along transport, mass conservation from source to receptor is not given. However, at remote sites such as Hohenpeissenberg, the observed patterns of non-methane hydrocarbons can be derived from combinations of factors determined by PMF. A six-factor solution showed high stability and the most plausible results. In addition to a biogenic and a background factor of very stable compounds, four additional anthropogenic factors were resolved that could be divided into two short- and two long-lived patterns from evaporative sources/natural gas leakage and incomplete combustion processes. The volume or mass contribution at the site over the entire period was, in decreasing order, from the following factor categories: background, gas leakage and long-lived evaporative, residential heating and long-lived combustion, short-lived evaporative, short-lived combustion, and biogenic. The importance with respect to reactivity contribution was generally in reverse order, with the biogenic and the short-lived combustion factors contributing most. The seasonality of the factors was analyzed and compared to results of a simple box model using constant emissions and the photochemical decay calculated from the measured annual cycles of OH radicals and ozone. Two of the factors, short-lived combustion and gas leakage/long-lived evaporative, showed winter/summer ratios of about 9 and 7, respectively, as expected from constant source estimations. 
In contrast, the short-lived evaporative emissions were about 3 times higher in summer than in winter, while residential heating/long-lived combustion emissions were about 2 times higher in winter than in summer.


Introduction
Tropospheric ozone is an environmental pollutant that adversely affects vegetation, e.g., by reducing and altering physiological processes and plant growth (Matyssek et al., 2010; Nunn et al., 2005), and humans, e.g., through respiratory diseases linked to ozone. In addition to these effects, ozone has been one of the most important greenhouse gases since the beginning of industrialization, with a large impact on radiative forcing (Gauss et al., 2003). Atmospheric background concentrations of tropospheric ozone are expected to increase in the 21st century (Vingarzan, 2004). In contrast to other greenhouse gases, such as carbon dioxide, methane, and nitrous oxide, it is not emitted directly but produced in the atmosphere by photochemical processes from precursor substances. The main drivers for the production of ozone, besides nitrogen oxides (NO + NO2) and carbon monoxide (CO), are volatile organic compounds (VOCs) and, amongst them, non-methane hydrocarbons (NMHCs) (Atkinson, 2000). NMHCs are important not only for the photochemical formation of tropospheric ozone but also for other secondary air pollutants such as peroxycarboxylic nitric anhydrides (PAN), formaldehyde (HCHO) (Rappenglück et al., 2010), and secondary organic aerosols. In addition, many NMHC species act directly as air toxics or hazardous air pollutants. As a consequence, the Gothenburg protocol required, e.g., Germany to reduce 1990 national emissions by 73 % by 2010. After a new evaluation in 2013, the EU commission proposed a further reduction of 2005 emissions by 43 % by 2030 (European Commission, 2013). Thus, monitoring and modeling of the spatiotemporal distribution of these species and relating them to source sectors are important for mitigation strategies concerning air quality, radiative forcing, and human health.
The most important sources of NMHCs are combustion of fossil fuels from road traffic and industrial processes, handling and evaporation of fuels, solvents, and gases, and plant emissions, amongst others. Stevenson et al. (2005) expected a strong increase of biogenic NMHC emissions, mainly isoprene, monoterpenes, and ethene, caused by temperature stress with a future increase of global temperatures. On a global scale, biogenic emissions dominate total VOC emissions (e.g., Sindelarova et al., 2014), while in many urban areas they play a minor role due to high amounts of anthropogenic emissions; however, this is highly dependent on the type and location of the urban area. For major metropolitan areas with large anthropogenic emissions (e.g., Houston, Atlanta), a large impact of the highly reactive biogenic VOCs on ozone formation and OH chemistry has been shown (e.g., Chameides et al., 1988; Mao et al., 2010; Leuchner and Rappenglück, 2010).
Reliable long-term scientific data of NMHCs were gathered by several groups and various networks. Within the framework of the World Meteorological Organization (WMO), the Global Atmosphere Watch (GAW) program was developed to achieve global measurements of the chemical composition of the atmosphere with high data quality (WMO, 2007).
In order to quantify impacts of biogenic and anthropogenic origin on photochemical production of ozone, aerosols, and other compounds, an apportionment into specific source categories is necessary (Badol et al., 2008). Several receptor models such as principal component analysis/absolute principal component scores (PCA/APCS) (e.g., Chan and Mozurkewich, 2007; Guo et al., 2004, 2006), chemical mass balance (CMB) (e.g., Badol et al., 2008; Na and Kim, 2007), or UNMIX (e.g., Jorquera and Rappenglück, 2004; Olson et al., 2007) were used for source apportionment. In particular, positive matrix factorization (PMF) (e.g., Lingwall and Christensen, 2007; Paatero, 1997, 1999), a multivariate mathematical receptor model, has been shown to be quite reliable at identifying and quantifying source categories. However, most studies investigating NMHC composition were concentrated in urban metropolitan areas with mainly anthropogenic emissions (e.g., Brown et al., 2007; Leuchner and Rappenglück, 2010; Na and Kim, 2007). Yuan et al. (2012) stressed the importance of the different reactivities of the NMHC compounds and of the impact of photochemical aging on the interpretability of the resolved factors as source profiles, aspects that have not been considered in most studies applying PMF. The impact of photochemical processing increases with longer transport times from source to receptor. Only a few studies have applied PMF receptor modeling at remote sites with a focus on the global or continental background. Lanz et al. (2009) and Sauvage et al. (2009) used PMF analysis for reactive species, such as NMHCs, at remote sites in Switzerland and France, despite the PMF assumption of mass conservation from source to measurement site.
In the current work, NMHC data from the GAW global site Hohenpeissenberg, southern Germany, were used to quantify the impact of different source categories as well as their seasonality at this rural site with PMF analysis. One main objective of this study was to interpret and discuss PMF as a statistical tool for reliably identifying sources of reactive trace gases at a rural site, even though source profiles were distorted due to photochemical aging.

Experimental setup
The GAW Observatory Hohenpeissenberg is located about 70 km southeast of Munich (47°48′ N, 11°02′ E) at 980 m a.s.l., on top of a hill about 300 m above the surrounding countryside (approx. 70 % pasture and 30 % forest). Sample air was routinely measured daily at 01:00 (41 % of data) and 13:00 CET (48 %); however, 11 % of the data were measured at other times of the day. At 13:00, the site was generally in a vertically fully developed mixed layer, and local emissions may have affected measurements only slightly to moderately. For this analysis, mainly the 13:00 data were used to minimize the influence of local sources and shallow boundary layer conditions during nighttime as well as to ensure the homogeneity of the data set. However, nighttime data were analyzed as well to identify differences and help interpret the results.
C2–C8 NMHCs were measured with an online gas chromatograph-flame ionization detection (GC-FID) system. It consisted of a 3600 CX Varian gas chromatograph combined with a flame ionization detection (FID) system until January 2008 and was then replaced by a Varian CP-3800 GC-FID. The air intake was 17 m above the ground and 2 m above a flat roof that was about the same height as the nearby forest canopy (> 10 m distance). The intake was a downwards-facing glass funnel connected to a permanently flushed glass manifold (375 L min⁻¹, 8 m length, 4 cm I.D.). The GC sampling unit was connected to a port on the manifold via a 1/16 in. Sulfinert line (2 m length, 0.96 mm I.D., 50 mL min⁻¹, Restek), such that an overall residence time of 3.5 s was achieved. After the port, the sample gas passed through a filter for aerosol and ozone removal (PTFE filter holder, 25 mm I.D., Metron Technology; PTFE-coated glass Fiberfilm filter, Pall Life Sciences, impregnated with sodium thiosulfate (Na2S2O3) and backed by a PTFE-membrane filter with 20-30 µm pores, Metron Technology) and, further downstream, a stainless-steel screen (10 µm pores, VICI AG). A custom-built sampling and gas-flow system was used, comprising a moisture trap at 228 K (0.5 m 1/8 in. Sulfinert, Restek), a VOC trap using cryo-adsorption on glass beads (87 K adsorption, 403 K desorption; SPT-type by Varian, installed in a custom-built LN2 dewar), a sample volume determined by measuring the pressure increase in an evacuated reference volume, and the corresponding Valco (VICI AG) switching valves mounted in a temperature-controlled compartment at 293 K. After sampling for 20 min, the cryotrap was dry purged by helium at 15 mL min⁻¹ for 15 min. The adsorbed NMHCs were then thermally desorbed in a helium carrier gas flow at 5 mL min⁻¹ and injected and separated on a PLOT column (Al2O3/KCl, 50 m × 0.53 mm I.D., Chrompack, Netherlands). After an initial isothermal phase (313 K for 2 min), the GC column was heated in two phases: first to 345 K (4 K min⁻¹) and then, with a rate of 6 K min⁻¹, to 473 K. This temperature was kept for 33.7 min. The end of the column extended to the FID system, where the separated compounds were detected. This system was regularly checked and calibrated with helium (zero gas), calibration gas by NPL (certified mixtures of a few ppb in nitrogen of some 30 NMHCs), and different reference gases holding synthetic and whole air mixtures in pressurized cylinders (see metadata at WDCGG, 2013). The system has participated in several intercomparisons and proved its ability to measure NMHCs at high-quality levels (Hörger et al., 2014; Plass-Dülmer et al., 2006; Rappenglück et al., 2006).
A more detailed description of measurement, integration, and error assessment was published by Plass-Dülmer et al. (2002). For this study, mixing ratios of 24 substances, measured daily over the course of 7 years from 2003 to 2009, were used. An individual uncertainty for each compound and each measurement, comprising systematic uncertainty contributions and random factors, was estimated and assigned to each value. It considers blank values, peak integration errors (including insufficient chromatographic separation) and detection limit, calibration uncertainties, and random fluctuations in the system response (Plass-Dülmer et al., 2002).

Positive matrix factorization (PMF) model description
Next to CMB, UNMIX, and PCA, PMF has become an accepted and regularly used tool for receptor modeling. In this study, factor analysis was conducted with PMF 3.0 (US-EPA, 2011). PMF determines the number of source factors, p, a species profile, f, for each factor, and the amount, g, that each factor contributes to each sample.
In PMF, Eq. (1) is solved by decomposing the matrix Xij of measurement data, with i the number of samples and j the number of the different chemical species, into two matrices: factor contributions and factor profiles. Both factor contributions and factor profiles can then be analyzed.
There are natural and logical physical conditions for such a model: the original data have to be reproduced by the model, the predicted source compositions and contributions must be non-negative, and the sum of the predicted mass contributions for each source must be less than or equal to the total measured mass for each substance (Hopke, 2003).
The multilinear engine ME-2 (Paatero, 1999) gives PMF the ability to solve multilinear problems and implement constraints such as the replacement of missing values and individual weighting of data points by associating an uncertainty value uij to each point. The object function (Eq. 2) is then minimized using these uncertainties (Norris et al., 2008).
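Equations (1) and (2) are not reproduced in this text. As a minimal sketch, assuming the standard PMF formulation (Paatero and Tapper, 1994), each measurement is modeled as x_ij = sum_k g_ik f_kj + e_ij and the object function is Q = sum_ij (e_ij / u_ij)^2. The following Python snippet evaluates Q and the scaled residuals for given matrices. The function name `pmf_q` and the toy numbers are illustrative assumptions and are not part of PMF 3.0; the snippet only evaluates the object function and does not perform the constrained minimization itself.

```python
def pmf_q(X, U, G, F):
    """Evaluate the PMF object function for the bilinear model X ~ G F.

    X: measured data (samples x species)
    U: uncertainties of the same shape as X
    G: factor contributions (samples x factors)
    F: factor profiles (factors x species)
    Returns (Q, scaled_residuals).
    """
    n_factors = len(F)
    q_value = 0.0
    scaled_residuals = []
    for i, sample in enumerate(X):
        row = []
        for j, measured in enumerate(sample):
            modeled = sum(G[i][k] * F[k][j] for k in range(n_factors))
            scaled = (measured - modeled) / U[i][j]  # e_ij / u_ij
            row.append(scaled)
            q_value += scaled * scaled
        scaled_residuals.append(row)
    return q_value, scaled_residuals


# Toy example with two samples, two species, and two factors (hypothetical).
G = [[1.0, 2.0], [0.0, 1.0]]
F = [[10.0, 0.0], [5.0, 5.0]]
X = [[21.0, 10.0], [5.0, 4.0]]
U = [[1.0, 1.0], [1.0, 1.0]]
q, residuals = pmf_q(X, U, G, F)
print(q, residuals)
```

The scaled residuals returned here are the quantities that are later checked against the ±3σ criterion when the number of factors is selected.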
PMF 3.0 also gives the opportunity to test the stability and uncertainty of the computed solutions by using a bootstrap technique. It also provides a tool called Fpeak to control the rotations of the different factors (Norris et al., 2008).
Several studies have compared PMF, CMB, PCA, and UNMIX (Anderson et al., 2002; Miller et al., 2002; Paatero and Tapper, 1994; Willis, 2000) and found limitations and advantages of the different models. Among the advantages of PMF is its good performance (not only with simulated data). Various studies have shown that PMF provides physically reasonable results for source identification of NMHCs in environments located in proximity to the sources (e.g., Buzcu and Fraser, 2006; Lanz et al., 2008; Leuchner and Rappenglück, 2010) or particulate matter (e.g., Santoso et al., 2008; Tauler et al., 2009; Yue et al., 2008). However, as receptor models are only mathematical models, they do not use pollutant emissions, chemical transformation mechanisms, or meteorological data to identify and quantify the sources at a receptor location. It is difficult to use PMF for data from remote sites because, with chemical reactions during the transport of air masses (Atkinson and Arey, 2003) and the effects of mixing (Parrish et al., 2007), the presumed mass conservation from source to measurement site, necessary for using receptor models (Hopke, 2003), is not given. Despite these limitations, PMF has been used for remote VOC data (Lanz et al., 2009; Sauvage et al., 2009) and has obtained reasonable results in terms of plausible factor compositions and contributions of the source categories. However, it needs to be considered that not only emission profiles but also their different aging determine the factor solutions of PMF.

Data treatment
A total of 2335 valid day (13:00) measurements of 24 substances per measurement, hereafter named daytime data, were available for the investigated time span, of which 345 values of individual substances were missing (0.6 % of the data set). For all other times, a total of 2277 valid measurements of 24 substances, hereafter named nighttime data, were used for analysis, of which 325 values were missing (0.6 % of the data set). Both data sets were analyzed separately as well as combined. If not indicated explicitly, results refer to the daytime data.
There are different ways to treat missing data. PMF 3.0 provides the option to exclude the entire sample. In that case a loss of 15 % of the sample data would have occurred. To avoid such a high loss of data, the missing values were replaced by the respective species' geometric mean, and the corresponding uncertainties were set to 4 times the geometric mean according to Sauvage et al. (2009). Five different treatments of missing value replacement and uncertainty assessment were performed for the daytime data as described below and shown in Table 1.
Values greater than zero, but below the specific detection limit, were replaced by half the detection limit. The uncertainty for zero values and values below the detection limit was set to the specific detection limit (Sauvage et al., 2009). Fpeak values, indicating the degree of rotation of the solutions, were varied between 5.0 and −5.0 in steps of 0.1.
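The replacement rules described above can be sketched in a few lines of Python. This is a hedged illustration for a single species, not the code used in the study: the function names are hypothetical, missing values are represented by `None`, and the choice to compute the geometric mean only from values at or above the detection limit is an assumption that the text does not specify.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


def treat_species(values, uncertainties, detection_limit):
    """Apply the replacement rules to one species; None marks a missing value."""
    # Assumption: only values at or above the detection limit enter the mean.
    valid = [v for v in values if v is not None and v >= detection_limit]
    gm = geometric_mean(valid)
    new_values, new_uncertainties = [], []
    for value, uncertainty in zip(values, uncertainties):
        if value is None:
            # Missing: geometric mean, uncertainty 4 times the geometric mean.
            new_values.append(gm)
            new_uncertainties.append(4.0 * gm)
        elif value < detection_limit:
            # Below detection limit: half the limit if positive, zeros kept;
            # the uncertainty is set to the detection limit in both cases.
            new_values.append(0.5 * detection_limit if value > 0.0 else value)
            new_uncertainties.append(detection_limit)
        else:
            new_values.append(value)
            new_uncertainties.append(uncertainty)
    return new_values, new_uncertainties


# Hypothetical mixing ratios (pptv) with a detection limit of 2 pptv.
vals, uncs = treat_species([None, 0.0, 1.0, 4.0, 16.0],
                           [None, 0.1, 0.1, 0.2, 0.5], 2.0)
print(vals, uncs)
```

The large uncertainty assigned to replaced values down-weights them in the object function, so they influence the fitted factors far less than measured values do.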
The impact of different treatments of missing values, of values below the detection limit, and of zero values on the overall results of the PMF analysis was assessed by utilizing five methods of value replacement. In this data set the number of these values was rather low at 2.3 % of all data. Table 1 shows the five different treatments with missing values (treatments 1-5), values below the detection limits (3-5), and zero values (4) replaced by the species median (1) and geometric mean (2-5). The respective uncertainties were fitted accordingly in treatments 1-4 and with additional 20 % to n-hexane in treatment 5 due to peak overlaps with an unknown substance in the 2003 and 2004 data. Statistical differences in the treatments were tested with Levene's test for variances. For the test, data were linearly transformed by normalization with the respective arithmetic mean value of the different treatments. Results are shown in Sect. S4 in the Supplement.
The remote character of the research site at Hohenpeissenberg implies that only a few substances were emitted locally; the others were transported from multiple sources at different distances. During the transport time, photochemical reactions occurred and the original emission pattern was altered due to differing photochemical reactivity of the compounds (Atkinson, 2000, 2008). PMF applied to source attribution, however, requires the substances to be inert and cannot integrate reactivity into the model. Sauvage et al. (2009) proposed a method to consider photochemical changes by enhancing the uncertainty with increasing compound reactivity. For each compound, a potential error Ej (reactivity) was computed with Eq. (3), assuming pseudo first-order reaction kinetics and photochemical reactivity mainly driven by OH radical reactions. In Eq. (3), kj is the second-order rate constant of the reaction between the substance j and OH (Atkinson and Arey, 2003), t is the source-receptor time of transport, and [OH] is the seasonally and spatially averaged OH concentration published by Spivakovsky et al. (2000). The overall uncertainty sij for the PMF modeling was then computed following the ISO guide rule for uncertainty (ISO 13005, according to Sauvage et al., 2009) (Eq. 4) and used for PMF computations.
Thus applied, the PMF results in factors which correspond to the prevailing factors determined at the receptor site and are different from the pure emission profiles. They may be seen as aged source profiles, with aging corresponding to some mean transport time from many diverse source locations to the receptor site. It should be pointed out that this approach reduces the impact of the shorter-lived NMHCs on the achieved results and that obtained factors have high uncertainty in these short-lived compounds. Taking these complications into account (aging, integration over many sources, and discrimination of short-lived compounds), comparisons with emission profiles are considered misleading, especially for the short-lived compounds. Nevertheless, the method was tested for our data set and its applicability is further discussed in Sect. S5 in the Supplement.
Another method to account for photochemical processing is the photochemical age-based parameterization method suggested by Yuan et al. (2012) following de Gouw et al. (2005). It assumes that the main sources at the receptor site are either anthropogenic, originating from one major urban settlement with a defined transport time to the receptor site, or of biogenic origin. It is further assumed that the magnitude of urban emissions is proportional to acetylene emissions, the reaction with OH radicals is the dominant form of removal, and photochemical age is defined by the transport from one urban emission area to the receptor and can be calculated from the ratio of mixing ratios from two NMHCs with different lifetimes. These assumptions, in particular the first and last, do not apply at a remote receptor site with many small anthropogenic emission sources around. Again, this approach and its applicability to Hohenpeissenberg data will be further discussed in Sect. S6 in the Supplement.
There are different guidelines to help determine the number of factors that best model the measured reality. Mathematical variables like Q values or the distribution of residuals and stability of the solution can be taken into account, but interpretation of the computed factors by the analyst is a crucial part of selecting the most appropriate solution (Hopke, 2003; Norris et al., 2008). Comparing computed Q values as a function of the number of factors to theoretical Q values (approximately the number of data points) seems to work only for certain kinds of weighted uncertainties (Hopke, 2003).
PMF solutions with 2 to 20 factors were calculated, but only the four most plausible solutions (five to eight factors) were compared in this work. Selection criteria were mathematical indicators such as the Q value, residual distribution, explained variance, and the plausible explanation of the source categories by expert knowledge of the authors.
At the GAW site Hohenpeissenberg, other trace gases and particulates were also measured (Gilge et al., 2010). Resolved factors can be compared to these independent measurements to verify the apportionment of the factor when the hydrocarbons and the trace gas are emitted from the same source or at the same time but by a different source. Thus, the contributions of factors resolved by PMF were correlated to the additional trace gases NO, NO2, SO2, CO, and NOy. The secondary products ozone and PAN, as well as particulate matter (PM10, PM3, black carbon) data, were also analyzed and included in additional PMF runs. Since correlations can sometimes be coincidental, Lanz et al. (2008) suggested that these correlations can add evidence to the source apportionment of the factors but should not be used as the only basis for the attribution of sources. In addition to simple correlations between the resolved factors and other substances, the single trace gases or aerosols as well as several combinations of those substances were included in the PMF model to test and help interpret the apportionment of single factors. However, including further non-NMHC substances did not show clear results. With more than one or two compounds included, the factor solutions did sometimes show alterations in the source profiles and the occurrence of separate factors. Thus the number of factors needed to be increased to seven or eight to retrieve the six identified NMHC factors. In the following source apportionment (Sect. 3), the most important results from inclusion of these substances are noted for each resolved factor. Only when a resolved factor profile significantly correlated with one or more of the other trace gases is it mentioned in the text.
PMF 3.0 provides a bootstrap function that selects blocks of input samples and creates new input files from them. These files have the same dimensions as the original input files. Then PMF is run and the resulting factors are mapped to the base factor they correlate with best. In this study, 200 bootstrap runs were performed on the basis of the final solution with six factor profiles.
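The two bootstrap steps described above, block resampling and mapping to the best-correlated base factor, can be sketched as follows. This is an illustrative approximation and not the PMF 3.0 implementation: the function names, the block length, and the use of the Pearson correlation between generic factor vectors are assumptions, and the PMF run on each resampled data set is omitted.

```python
import math
import random


def block_bootstrap_indices(n_samples, block_length, rng):
    """Draw blocks of consecutive sample indices until n_samples are collected."""
    indices = []
    while len(indices) < n_samples:
        start = rng.randrange(0, n_samples - block_length + 1)
        indices.extend(range(start, start + block_length))
    return indices[:n_samples]  # same dimension as the original input


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def map_to_base(boot_factors, base_factors):
    """For each bootstrap factor, return the index of the best-correlated base factor."""
    return [
        max(range(len(base_factors)), key=lambda k: pearson(boot, base_factors[k]))
        for boot in boot_factors
    ]


# Hypothetical base factors and a permuted, slightly perturbed bootstrap result.
base = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 1.0, 3.0]]
boot = [[4.1, 2.9, 2.2, 0.8], [1.0, 3.2, 0.9, 3.1], [0.9, 2.1, 3.0, 4.2]]
print(map_to_base(boot, base))
print(block_bootstrap_indices(20, 5, random.Random(0)))
```

A solution is usually regarded as stable when nearly all bootstrap factors map back to the base factor from which they originate.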

Seasonal cycles calculated from a simple box model
Anthropogenic NMHCs generally show pronounced seasonal cycles with maxima in winter and minima in summer. Apart from emissions that change over the course of the year, often in association with temperature (e.g., evaporation generally has a summer maximum and residential heating a winter maximum), the photochemical cycle determines the chemical removal, with OH concentrations at mid-latitudes 1 order of magnitude higher in summer than in winter. As this photochemical signal is often stronger than the seasonal cycle of emissions, we first analyzed the seasonal variation solely due to photochemistry with assumed constant emissions. Then, deviations from this seasonality in the PMF-derived factors gave information on the seasonality of the corresponding emissions.
It is assumed in this simple box calculation that constant NMHC emissions of 5 × 10³ molec cm⁻³ s⁻¹ are fed into the atmosphere. Emissions are balanced by the atmospheric removal due to OH radicals and ozone. The OH and ozone concentrations are taken from the monthly averaged measurements of these compounds at Hohenpeissenberg: OH by chemical ionization mass spectrometry (Berresheim et al., 2000) and ozone by UV absorption (Gilge et al., 2010). Sine fits through the monthly averages are used to drive the removal with 24 h time resolution. The NMHC start concentrations are adjusted such that stable annual cycles are established.
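A box calculation of this kind can be sketched in Python as shown below. This is a simplified illustration under stated assumptions, not the model used in the study: only removal by OH is included, the sinusoidal OH cycle uses placeholder minimum and maximum values chosen to differ by 1 order of magnitude, and the day of the OH maximum is assumed. Only the emission rate is taken from the text.

```python
import math

SECONDS_PER_DAY = 86400.0
EMISSION = 5.0e3    # molec cm^-3 s^-1, constant source as stated in the text
OH_WINTER = 3.0e5   # molec cm^-3, assumed seasonal minimum (placeholder)
OH_SUMMER = 3.0e6   # molec cm^-3, assumed seasonal maximum (placeholder)


def oh_concentration(day_of_year):
    """Sinusoidal OH cycle with an assumed maximum around day 172."""
    mean = 0.5 * (OH_SUMMER + OH_WINTER)
    amplitude = 0.5 * (OH_SUMMER - OH_WINTER)
    return mean + amplitude * math.cos(2.0 * math.pi * (day_of_year - 172) / 365.0)


def annual_cycle(k_oh, spinup_years=30):
    """Integrate dC/dt = E - k_oh [OH] C with 24 h steps; return the final year."""
    # Start near the mean steady state so that a stable cycle is reached quickly.
    conc = EMISSION / (k_oh * 0.5 * (OH_SUMMER + OH_WINTER))
    cycle = []
    for step in range((spinup_years + 1) * 365):
        loss = k_oh * oh_concentration(step % 365)  # first-order loss rate, s^-1
        decay = math.exp(-loss * SECONDS_PER_DAY)
        # Exact solution for a loss rate held constant over one day.
        conc = conc * decay + EMISSION / loss * (1.0 - decay)
        if step >= spinup_years * 365:
            cycle.append(conc)
    return cycle


def percentile_ratio(cycle):
    """Ratio of the 95th to the 5th percentile (nearest rank) of a cycle."""
    ordered = sorted(cycle)
    high = ordered[round(0.95 * (len(ordered) - 1))]
    low = ordered[round(0.05 * (len(ordered) - 1))]
    return high / low


for k in (1.0e-15, 1.0e-13, 1.0e-11):
    print(k, percentile_ratio(annual_cycle(k)))
```

In this sketch, slowly reacting compounds integrate over the OH cycle and show almost no seasonality, whereas fast-reacting compounds stay close to steady state and follow the inverse of the OH cycle.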
The different seasonal signals are demonstrated by the 95-/5-percentile ratios as a function of the OH rate constants in Fig. 1 for compounds which only react with OH radicals (rate constants at 283 K; Atkinson, 2000; Atkinson and Arey, 2003). No annual cycles could be seen at reaction rate constants below 10⁻¹⁴ cm³ molec⁻¹ s⁻¹, and an increase of the winter/summer ratios from 2 to 12 was found for OH reaction rate constants from 10⁻¹³ to 10⁻¹¹ cm³ molec⁻¹ s⁻¹. Accordingly, for ethane with kOH = 2.1 × 10⁻¹³ cm³ molec⁻¹ s⁻¹, a "damped" seasonal variability by a factor of 3.7 could be expected if sources were constant over the year, whereas for reactive compounds like heptane with kOH = 6.9 × 10⁻¹² cm³ molec⁻¹ s⁻¹, the seasonal variation by a factor of 11 had about the same range as OH. Figure 2 shows the resulting seasonal variation for different anthropogenic NMHCs, including their reactions with OH and ozone. Rate constants were computed dependent on the monthly mean temperatures for ozone; for OH, the monthly averaged half-width of the OH diurnal cycle was determined and the corresponding average temperature was used. In this box calculation, the reactive alkenes such as cis-2-butene showed seasonal cycles that were less pronounced than for compounds not reacting with ozone because the seasonal variation of ozone had only a summer/winter ratio of 1.8.
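The contrast between ethane and heptane can be made concrete by converting the rate constants quoted above into lifetimes against OH, using tau = 1 / (kOH [OH]). The short Python sketch below uses the two rate constants from the text; the OH concentration of 1 × 10⁶ molec cm⁻³ is an assumed round value for illustration only and is not the measured Hohenpeissenberg average.

```python
SECONDS_PER_DAY = 86400.0
ASSUMED_OH = 1.0e6  # molec cm^-3, illustrative assumption

# OH rate constants (cm^3 molec^-1 s^-1) as quoted in the text for 283 K.
K_OH = {"ethane": 2.1e-13, "heptane": 6.9e-12}


def oh_lifetime_days(k_oh, oh_concentration):
    """Lifetime against reaction with OH, in days."""
    return 1.0 / (k_oh * oh_concentration) / SECONDS_PER_DAY


for name, k_oh in K_OH.items():
    print(name, round(oh_lifetime_days(k_oh, ASSUMED_OH), 1), "days")
```

Under this assumption ethane persists for roughly two months, long enough to smooth out part of the OH seasonality, while heptane is removed within about two days and therefore tracks the OH cycle closely.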
The winter/summer ratios of all considered NMHCs in this study were calculated in this simple box and compared to the observed ratios (Table 2). The observed ratios were derived from sine fits to the monthly mean concentrations over the last 10 years after compensating for trends in the data. Apparently only a few NMHCs show similar winter/summer ratios as expected from constant emissions.

Determination of the number of factors
The calculated Q values of four different factor solutions with five to eight factors and other diagnostic parameters are shown in Table 3. The values decreased with increasing number of factors due to a better explanation of the variability of the measured NMHC mixing ratios by a higher number of factors and lower global minima of the object functions. All scaled residuals should be within ±3σ (Willis, 2000) and symmetrically distributed. With this data set all solutions from five to eight factors had normally distributed residuals. The number of residuals beyond 3 standard deviations decreased for an increasing number of factors (Table 3) down to four for 20 factors. An Fpeak value of 0.0 showed the lowest Q values for all solutions displayed in Table 3; additional rotation of the factors did not improve the results. Coefficients of determination of modeled to measured mixing ratios and the mean ratios were high for all solutions and increased with the number of resolved factors from 0.86 (R²) and 0.97 (ratio) to 0.89 and 0.98, for five to eight factors, respectively. An indication for an appropriate number of factors is the stability of the factors after performing the analysis at least five times with the same parameters but randomized starting points. No multiple solutions should be found (Hopke, 2003). In this analysis the tested five-, six-, seven-, and eight-factor solutions were very stable and showed the same distribution of factors for all five computations (not shown here).
On the basis of these statistical indicators no final decision about the optimal number of factors could be made. Thus, it remained a process based on plausibility arguments, mainly by checking the resulting factors versus reasonable source or aged source profiles (cf. Sect. 3.2), also in comparison to previous studies (Lanz et al., 2008, 2009; Sauvage et al., 2009). Careful consideration led to the decision of choosing six factors as the most reasonable solution. A comparison to alternative solutions with five, seven, and eight factors can be found in the Supplement (Sect. S1).

Source apportionment
In the following, the six-factor solution (Fig. 3) of the daytime data is presented and discussed. Including nighttime data or use of exclusively nighttime data resulted only in very slight differences in the derived factors (see Sects. S2 and S3 in the Supplement).
Figure 3 shows the absolute and relative contributions of substances to the six-factor solution. Absolute values are the mixing ratios of each substance that PMF apportioned to each factor. Relative contributions are the fractions of each substance attributed to each factor; therefore, the sum of all factors for each substance is 1.
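The relation between the absolute and relative contributions can be illustrated with a short Python sketch. The function name and the two-factor, two-species numbers are hypothetical and serve only to show the normalization: each species' apportioned mixing ratio is divided by the sum over all factors.

```python
def relative_contributions(profiles):
    """Convert absolute factor profiles (factors x species) to fractions per species."""
    n_species = len(profiles[0])
    # Total modeled mixing ratio of each species, summed over all factors.
    totals = [sum(factor[j] for factor in profiles) for j in range(n_species)]
    return [[factor[j] / totals[j] for j in range(n_species)] for factor in profiles]


# Hypothetical absolute profiles in pptv (rows: factors, columns: species).
absolute = [[30.0, 10.0], [10.0, 30.0]]
relative = relative_contributions(absolute)
print(relative)
```

By construction, the fractions of each species sum to 1 across the factors, so a factor can dominate a species on the relative scale even when its absolute mixing ratio is small.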
The apportionment of factors to source categories and the interpretation of chemical aging for the respective factors were performed by comparison to source profiles from the literature. Because of the complexity of the atmospheric system with transport, mixing, and chemistry, each individual factor cannot be attributed exclusively to one source category. The factors should then be seen as aged profiles originating from different sources belonging to similar source categories (Sauvage et al., 2009). In addition to a change of composition patterns during transport, emission profiles in the literature vary due to measurement uncertainty associated with the use of different techniques, e.g., online or canister samples, the number of substances measured, the experimental setup and associated conditions, and the published units. These variations made direct quantitative comparisons of the profiles difficult but allowed a qualitative assessment for identification of possible source profiles.
For a plausibility check of the attribution of the factors to source categories, the annual courses, including the winter/summer ratios of the retrieved factors, are compared to corresponding box model calculations in Sect. 3.3.

Figure 3. Factor profiles for the six-factor solution calculated by PMF. Left: mixing ratio of each species apportioned to each factor (pptv); right: contribution of factor to the species. Note that the scales for each subplot are different due to large variations in absolute mixing ratios.

Biogenic sources
Factor I explained 34 pptv or 94 % of the measured isoprene, which was the only biogenic NMHC included in this analysis. On the absolute scale, ethane (49 pptv), ethene (21 pptv), and isopentane (14 pptv) also contributed to this factor, but on the relative scale this factor only contains about 4 % of the total amounts of ethane and ethene; thus, this factor is apportioned to biogenic sources. The short lifetime of isoprene in the atmosphere excludes distant sources. Small amounts of ethene found in this factor might also be of biogenic origin since it is an important plant hormone (Fall, 1999). Factor I was the only factor with a distinct maximum in summer (Fig. 4). Since isoprene emissions depend on the photosynthesis of plants, temperature, and solar radiation (Fuentes and Wang, 1999), the maximal source strength and thus the maximal mixing ratios were found in July. The ethane contribution to this factor might have been derived from biomass burning (Stein and Rudolph, 2007) with a maximum during the summer.
Similar profiles for this factor, including amounts of ethane, n-pentane, and isopentane, were also found by Sauvage et al. (2009) at three remote sites in France.The alkane contributions to this factor could be attributed to artifacts from the PMF model, temperature-related emissions like evaporation from fuel, or mixing with other sources.Thus, the biogenic factor is possibly slightly overestimated; however, the whole set of biogenically emitted monoterpenes was not considered in this study.The inclusion of additional trace gases and aerosols apportioned up to 25 % of total ozone to this factor, which could be due to correlation of both variables with temperature and radiation over the course of the year.

Short-lived incomplete combustion sources
Factors II and III both showed a large contribution of short-lived substances. Factor II contained rather short-lived alkenes such as ethene (mixing ratio: 133 pptv, fraction of the substance attributed to this factor: 27 %), propene (46 pptv, 79 %), 1,3-butadiene (6 pptv, 90 %), and some butenes, which are typical for incomplete combustion processes. The average lifetime of Factor II was 1.6 days, calculated as the factor-loading-weighted mean of the compound lifetimes at annual mean OH and ozone. This factor is attributed to short-lived combustion sources, mainly vehicular exhaust.
Literature profiles of vehicle emissions show similarly high contributions of ethene and propene (e.g., Badol et al., 2008; Friedrich and Obermeier, 1999; Hellen et al., 2003; Liu et al., 2008; Pang et al., 2014; Sagebiel et al., 1996; Thijsse et al., 1999). Sauvage et al. (2009) reported a very similar vehicle exhaust factor for one of the French remote sites. Factor II correlated well with NO2 (r = 0.84), NO (0.75), NOy (0.73), and CO (0.66), all associated with traffic emissions. When the additional trace gases were included in the PMF runs, only NO2 was apportioned to this factor in a significant fraction (∼ 10 %). This factor alone does not represent the full vehicular emission profile, since longer-lived combustion compounds such as benzene, acetylene, and aromatics contributed to Factor IV, as described below.
The aromatics were the only substances measured in this study that are found in evaporative solvent emission profiles, e.g., in paint or wood coating. The alkanes are found in the gasoline composition and evaporation source. This factor is thus further referred to as short- to medium-lived evaporative sources, additionally containing some proportions of short- to medium-lived combustion compounds (ethane, acetylene, isopentane, alkenes, benzene, and toluene), which also explains its correlation with the combustion tracers NO2 (r = 0.74), NO (0.74), and NOy (0.75). When NO2 was included in the PMF runs, its largest fraction (> 70 %) was apportioned to a factor similar to this one.
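The loading-weighted lifetime quoted above can be illustrated with a minimal Python sketch. It assumes first-order loss by OH (and optionally ozone) and a simple arithmetic mean weighted by species loadings; the exact weighting used in the study is not spelled out here, so this is an assumption. The function names are hypothetical, and the rate constant in the usage note is an approximate room-temperature literature value, not a number from this work.

```python
SECONDS_PER_DAY = 86400.0
OH_ANNUAL_MEAN = 9.4e5  # molec cm^-3, annual average OH quoted in this paper


def lifetime_days(k_oh, oh=OH_ANNUAL_MEAN, k_o3=0.0, o3=0.0):
    """Lifetime (days) against first-order loss by OH and, optionally, O3.

    k_oh, k_o3: rate constants in cm^3 molec^-1 s^-1 (assumed inputs).
    oh, o3: oxidant concentrations in molec cm^-3.
    """
    loss_rate = k_oh * oh + k_o3 * o3  # total first-order loss rate, s^-1
    return 1.0 / loss_rate / SECONDS_PER_DAY


def loading_weighted_lifetime(loadings, lifetimes):
    """Mean lifetime of a factor, weighted by species loadings (e.g. pptv).

    Assumption: plain arithmetic weighting by loading.
    """
    total = sum(loadings.values())
    return sum(loadings[s] * lifetimes[s] for s in loadings) / total
```

For example, with an assumed OH rate constant of about 1.1e-12 cm^3 molec^-1 s^-1 for propane, `lifetime_days(1.1e-12)` gives roughly 11 days, which is at the lower end of the 11-17 day range stated in this paper for propane, acetylene, and benzene.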

Residential heating and long-lived incomplete combustion sources
In comparison to Factors II and III, the compounds in Factors IV and V exhibited longer atmospheric lifetimes: 21 days (IV) and 26 days (V). Large parts of the alkynes acetylene (216 pptv, 49 %) and propyne (8 pptv, 56 %), as well as ethene (287 pptv, 58 %) and benzene (65 pptv, 48 %), were explained by Factor IV, which also contained some C7–C8 aromatics (15-20 %), ethane, and propane (16 % each). This factor could be attributed to residential heating and wood burning and possibly other (incomplete) combustion processes, in particular from road traffic. The major compounds represented in this factor had atmospheric lifetimes of 1-17 days. Only a few source profiles for domestic combustion emissions were found in the literature. A wood-burning profile by Friedrich and Obermeier (1999), adjusted to the substances measured for this study, showed a similar composition to Factor IV, as did a "residential heating" profile by Klemp et al. (2002) and Mannschreck et al. (2002). Ethene and acetylene contributions of the literature profiles differed from those found in Factor IV. These differences indicated that not only wood burning but also other sources contributed to this factor. Similar profiles to Factor IV were found by Lanz et al. (2008) and Sauvage et al. (2009), who attributed the French profiles to hot-water generation and building heating by burning fossil fuels and wood. The benzene/toluene ratio was 2 at those French sites, 3.2 in Mannschreck et al. (2002), and 2.7 for Factor IV here. Evtyugina et al. (2014) determined similar benzene/toluene ratios between 1.7 and 5.0 for emission factors from different woods burned in fireplaces and woodstoves. Aromatics, ethene, and acetylene could, however, also originate from vehicle exhaust (e.g., Pang et al., 2014; Badol et al., 2008). Factor IV correlated well with the combustion tracer CO (r = 0.87). A correlation with SO2 (0.62) was found for this factor, which is reasonable since household emissions contribute to SO2 in the atmosphere (UBA, 2011). In the additional PMF runs, SO2 was almost exclusively apportioned to this factor. The largest fractions of NO and all aerosols (> 80 %), as well as some CO (∼ 20 %), were also apportioned to this factor, supporting residential heating and other long-lived combustion sources as the main contributors.
Although emission profiles and PMF results of urban data sets presented high isopentane/n-pentane ratios, Sauvage et al. (2009) found ratios close to 1 for evaporative sources at remote sites in France, similar to our study. Factor V can be viewed as an aged combined profile of evaporative losses of natural gas, gasoline, and LPG with atmospheric lifetimes of 4-60 days (C2–C4 alkanes). The correlation with CO (r = 0.80) may at first glance indicate a relation to fossil-fuel burning, but it appears more likely that it is due to the similar (large) footprint areas associated with the similar lifetimes and the similar source areas, which are mainly related to high population density.

Background sources
Factor VI can be apportioned to remote sources representing the continental background. It explained most of the measured ethane (881 pptv, 59 %) and quite large amounts of benzene (42 pptv, 31 %), acetylene (29 %), and propane (25 %). Ethane is the most abundant and longest-lived compound measured in this study. Its high abundance in combination with the lack of shorter-lived compounds indicated aged air masses (average lifetime 45 days). Propane, acetylene, and benzene are also very stable substances (lifetimes 11-17 days at the annual average OH concentration of 9.4 × 10⁵ molec cm⁻³), which underlines the background character of this factor. Hellen et al. (2003) [...] factor with high ethane loadings and amounts of propane in France, and Lanz et al. (2008) found an ethane factor in Switzerland. There is also a very high resemblance in compound composition and mixing ratios when compared to other remote data. Levels of ethane and propane similar to those apportioned to this factor in this study were found in ground-based measurements at Pico Mountain, Azores, and Mauna Loa, Hawaii (Helmig et al., 2008), and at Mace Head, Ireland (Yates et al., 2010). Aircraft data from the central North Atlantic (Lewis et al., 2007) showed an average profile over all campaigns similar to our PMF-resolved data. The mixing ratios of the most important substances in this factor from our study and that of Lewis et al. (2007) are 881 and 870 pptv for ethane, 141 and 100 pptv for propane, 126 and 110 pptv for acetylene, 42 and 38 pptv for benzene, and 16 and 10 pptv for toluene, respectively. Factor VI did not correlate well with any of the additional trace gases. When these were included in the PMF model, more than 50 % of CO and the largest part of ozone (> 60 %) were apportioned to this factor.

Seasonality and total contributions
The winter/summer amplitude was already used for better attribution of source categories to the individual factors in the previous sections (Fig. 4). The total monthly variation of factorial NMHC mixing ratios is summarized in Fig. 5. The maximum was encountered in February (7.1 ppbv) and the minimum in July (1.8 ppbv). During fall and winter, chemical reaction rates decreased due to lower OH concentrations as a consequence of less available UV light and lower temperatures. Thus, the NMHCs were not depleted as rapidly as during the summer months. Within this data set, biogenic sources contributed to the measured and modeled NMHCs only in summer but were responsible for 20 % of the total modeled amount of hydrocarbons in July. Due to the high reactivity of isoprene, this overall small factor plays an important role in atmospheric processes during summertime.
In Sect. 2.4 we calculated the expected seasonal variation of the individual NMHCs within a simple box model with constant emissions over the year and losses due to the measured annual cycles of OH radicals and ozone. In order to analyze the impact of different sources on the seasonality, we then calculated the expected seasonal variation of the PMF-derived factors in the constant source scenario by combining the calculated seasonality of the individual NMHCs weighted by their factor loadings. The winter/summer ratios thus determined are compared to the PMF-derived seasonal variations in Table 4. Larger winter/summer ratios than in the constant source scenario indicate stronger sources in winter than in summer, while smaller ratios indicate stronger sources in summer.
The observed and calculated winter/summer ratios were similar for the short-lived combustion (II) and the long-lived gas leakage and evaporative (V) sources, pointing towards constant emissions by these sources over the year. This is reasonable since Factor II is related mainly to traffic exhaust and Factor V to natural gas and LPG losses and gasoline evaporation emissions, both expected to be fairly constant throughout the year. Factor III, the short-lived evaporative factor, is associated with temperature-driven emissions and accordingly expected to be stronger in summer, as seen in the observed ratio of 3.6 compared to a ratio of 9.5 from the constant-emission box calculation. In contrast, the factor associated with residential heating (IV) showed an observed ratio of 15.8, about twice the winter/summer ratio of 7.1 in the constant source scenario, indicating about twofold higher emissions in winter than in summer. The background Factor VI indicated slightly higher emissions in summer than in the constant source scenario (ratios of 3.1 compared to 5.0), which might be due to biomass burning contributions to ethane, which are higher in summer; however, inter-hemispheric transport also reduces the winter maxima of these long-lived gases (Rudolph, 1995).
Though the seasonal comparison to our simple box calculations showed reasonable results, it should be pointed out that this approach is rather simplistic and might not appropriately describe the situation at Hohenpeissenberg. Influences not considered include, among others, different footprint areas for compounds of different lifetimes and in different seasons, the representativeness of OH and ozone for such different footprint areas, and the seasonally changing mixed-layer height and intensity of vertical mixing. However, such influences are considered, in this first approximation, to be of minor importance to the seasonal variability compared to the changing photochemical loss and changing regional emissions.
Over the entire 7-year period, background sources contributed most to the NMHC volume (Fig. 6a) and mass at the remote site (31.3 vol %, 25.8 mass %), followed by gas leakage/long-lived evaporative sources (24.6 vol %, 25.4 mass %), residential heating/long-lived combustion sources (23.4 vol %, 21.0 mass %), short-lived evaporative sources (11.3 vol %, 17.8 mass %), short-lived combustion sources (6.0 vol %, 5.9 mass %), and biogenic sources (3.5 vol %, 4.0 mass %). Weighted with the corresponding OH reaction rate constants, the order of the factors in terms of mean OH reactivity was essentially reversed (Fig. 6b): 24.2 % for biogenic, 22.5 % for short-lived combustion, 20.6 % for residential heating, 15.1 % for short-lived evaporative, 9.0 % for background sources, and 8.5 % for long-lived evaporative. Despite low volume and mass fractions, the biogenic isoprene factor makes the highest contribution to the total reactivity and ozone formation. The impact is even higher during maximum emissions in the summer.
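The constant-emission box calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model used in the study: the OH cycle is replaced by a hypothetical sinusoid around the annual mean (the amplitude is invented), loss by ozone is neglected, and the winter/summer ratio is taken as the 95-/5-percentile ratio of daily values. The source strength of 5 × 10³ molec cm⁻³ s⁻¹ follows the Fig. 2 caption; all function names are hypothetical.

```python
import math


def oh_cycle(day, oh_mean=9.4e5, amplitude=0.8):
    """Hypothetical sinusoidal OH cycle (molec cm^-3): minimum at day 0, maximum in midsummer."""
    return oh_mean * (1.0 - amplitude * math.cos(2.0 * math.pi * day / 365.0))


def seasonal_cycle(k_oh, emission=5e3, years=5, steps_per_day=24):
    """Integrate dC/dt = E - k_OH * [OH](t) * C and return daily values of the last year."""
    dt = 86400.0 / steps_per_day
    c = emission / (k_oh * oh_cycle(0.0))  # start from the steady state at day 0
    n_steps = years * 365 * steps_per_day
    first_kept = (years - 1) * 365 * steps_per_day  # discard spin-up years
    daily = []
    for step in range(n_steps):
        loss = k_oh * oh_cycle(step / steps_per_day)  # first-order loss rate, s^-1
        # Semi-implicit Euler step: stays stable even when loss * dt is large
        c = (c + emission * dt) / (1.0 + loss * dt)
        if step >= first_kept and step % steps_per_day == 0:
            daily.append(c)
    return daily


def winter_summer_ratio(daily):
    """Ratio of the 95th to the 5th percentile of the daily concentrations."""
    s = sorted(daily)
    return s[int(0.95 * (len(s) - 1))] / s[int(0.05 * (len(s) - 1))]
```

In this sketch, a fast-reacting compound follows the inverse of the OH cycle closely and yields a large ratio, while a slowly reacting compound integrates over many weeks and yields a damped ratio, which is the qualitative behavior shown in Fig. 1.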
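The comparison of observed and box-model ratios amounts to simple arithmetic, sketched below in Python with the ratios quoted in this paragraph. Reading the quotient of the two ratios as a measure of seasonal emission change is a simplification assumed here for illustration; the function name is hypothetical.

```python
# (observed PMF winter/summer ratio, constant-emission box model ratio), from the text
RATIOS = {
    "III short-lived evaporative": (3.6, 9.5),
    "IV residential heating": (15.8, 7.1),
    "VI background": (3.1, 5.0),
}


def seasonal_emission_bias(observed, box):
    """Observed/box quotient: > 1 suggests stronger winter sources, < 1 stronger summer sources."""
    return observed / box


for name, (observed, box) in RATIOS.items():
    bias = seasonal_emission_bias(observed, box)
    tendency = "winter-enhanced" if bias > 1.0 else "summer-enhanced"
    print(f"Factor {name}: observed/box = {bias:.2f} ({tendency})")
```

The quotients are about 0.38 for Factor III, 2.23 for Factor IV, and 0.62 for Factor VI, consistent with summer emissions higher by a factor of 2 to 3 for the evaporative factor and roughly twofold higher winter emissions for residential heating.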
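The reactivity weighting can be illustrated with a short Python sketch that multiplies each species' mixing ratio in a factor by its OH rate constant and normalizes over factors. The rate constants are approximate room-temperature literature values assumed for illustration, only a few species are included, and the mixing ratios are the ones quoted in the text for Factors I and VI; the result therefore shows the mechanism of the weighting and does not reproduce the shares in Fig. 6b. The names are hypothetical.

```python
# Approximate OH rate constants (cm^3 molec^-1 s^-1); assumed values, not from this paper
K_OH = {"isoprene": 1.0e-10, "ethene": 8.5e-12, "ethane": 2.4e-13}


def reactivity_shares(profiles, k_oh=K_OH):
    """Fraction of total OH reactivity per factor.

    profiles: {factor: {species: mixing ratio in pptv}}. The conversion from
    pptv to number density is the same for all species and cancels in the shares.
    """
    weighted = {
        factor: sum(k_oh[species] * ratio for species, ratio in species_map.items())
        for factor, species_map in profiles.items()
    }
    total = sum(weighted.values())
    return {factor: value / total for factor, value in weighted.items()}


# Partial profiles using mixing ratios quoted in the text (pptv)
example = {
    "I biogenic": {"isoprene": 34.0, "ethane": 49.0, "ethene": 21.0},
    "VI background": {"ethane": 881.0},
}
shares = reactivity_shares(example)
```

Even though the background entry holds far more volume in this reduced example, the biogenic entry dominates the reactivity because the isoprene rate constant is several hundred times larger than that of ethane.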

Conclusions and outlook
PMF was used for characterizing the impact of source categories and photochemical aging on this data set of NMHC measurements at the remote GAW site Hohenpeissenberg. This new approach does not aim for strict source apportionment and consequently does not require mass conservation or explicit treatment of photochemical aging in the factor profiles; however, the cost is losing a quantitative understanding of the source contributions. For the determination of the number of factors, statistics on the provided results were not decisive. Interpretability of the computed factors was the most important criterion in the analysis and interpretation of the results. Treatment of missing values and uncertainties had no substantial influence on the solutions. It could be interesting for other data sets to evaluate the number of missing values that the model can compensate for before an effect on the result is seen. The stability of the modeled factors depended on the individual uncertainties of the contributing substances. In particular, the short-lived Factors II (incomplete combustion) and III (evaporative) showed lower stability in the bootstrap runs performed by PMF. These two factors were the ones that contained compounds with high reactivity and thus higher variability at the receptor site. The other factors could be attributed to biogenic sources, residential heating including long-lived incomplete combustion, long-lived gas leakages/evaporative sources, and sources that reflect the continental background. The measured anthropogenic NMHCs have maxima in winter when photochemical removal by OH or ozone is lowest. By comparison with simple box calculations assuming constant emissions, it could be demonstrated that the short-lived combustion Factor (II) and the gas leakage/long-lived evaporative Factor (V) indicated constant emissions over the year, as expected. The short-lived evaporative (III) emissions are higher in the summer by a factor of 2 to 3, in line with higher temperatures. The residential heating and wood-burning Factor (IV) indicated about twofold higher emissions in winter. The seasonal cycle of the background (VI) was smoother than in the constant source scenario, indicating additional sources in summer, like biomass burning, or enhanced losses due to inter-hemispheric transport in the Northern Hemispheric winter. The overall influence of biogenic isoprene sources (I) and other short-lived Factors (II and III) on reactivity is substantial. In terms of reactivity and chemical processes like ozone production, these factors dominate with, on average, more than 60 % compared to their low volume contribution of less than 20 %. When considering other biogenic emissions such as monoterpenes, sesquiterpenes, and OVOC, the impact of the biogenic factor on chemical processes in summer during daytime will increase even further.
Fractions of the factor composition profiles were often in agreement with source profiles from the literature and calculated results from PMF studies in urban areas; some of the factors of this study even showed high agreement with those from other studies. However, some of the literature source profiles exhibit large uncertainties and only low substance resolution; thus the comparisons to these profiles leave a lot of room for interpretation (Theloke and Friedrich, 2007). Improvement and updates of emission profiles, particularly in regard to new legal limitations of emissions, are highly needed.
Since Yuan et al. (2012) emphasized the non-negligible influence of photochemical aging on PMF results at urban sites, the pronounced inclusion of aging in the interpretation of the factors (in this case Factors II-V), rather than a typical source apportionment, seems necessary for remote sites that cannot fulfill the assumption of mass conservation during transport. Factors II and IV were thus attributed to incomplete combustion originating mainly from vehicular exhaust and residential heating (lead components: light alkenes, alkynes, benzene); Factors III and V were attributed to evaporative losses from fuel (including fuel not burned, e.g., during cold-start), natural gas, and solvents as they predominantly contain alkanes and aromatics. Factors II and IV, and Factors III and V, respectively, differed in the lifetimes of their compounds of, on average, 1.6 days (II) and 21 days (IV), and 4.9 days (III) and 26 days (V), at annual average OH and ozone. We interpret VOC patterns in the atmosphere as a superposition of a regional background determined by the longer-lived patterns (IV and V) and fresh, local impacts by the short-lived patterns (II and III). The relative compositions then reflect the different degrees of aging and mixing. This interpretation with respect to only two lumped source categories is plausible for two main reasons. Firstly, measurements were made at times of well-mixed conditions (13:00 CET), so that local, more pronounced impacts of individual sources were damped. Secondly, emissions are usually correlated with population density for major source types, especially in the rural surroundings of Hohenpeissenberg with no major local sources but multiple, widespread small sources; for example, the closest motorway is some 30 km east. Thus, signatures of different individual sources were not expected to show up in different air masses in a pronounced way.
The factors resolved for the Hohenpeissenberg data set were very similar to those found by Sauvage et al. (2009) for remote sites in France; PMF seems to be able to calculate reasonable results for reactive species without including reactivity in the uncertainty for the PMF model. It should be emphasized that the factors are not expected to resemble emission profiles but nevertheless are related to emission categories, and they offer a good tool to characterize source impacts and remoteness of different stations and regions.
In terms of implementing photochemical processing by adding to uncertainties or applying photochemical age-based parameterization, the method suggested by Sauvage et al. (2009) did not improve the interpretability, and the photochemical age-based parameterization method suggested by Yuan et al. (2012) was not well applicable to a site like Hohenpeissenberg. The most important underlying restriction for the application of the latter method to our data set is that we cannot calculate photochemical age reliably from the scaling of the ratio of two compounds with different reactivity.
With the inclusion of nighttime data, PMF still resolved the same six factors and only slight differences to the daytime data solution were found, which supports the stability of the PMF solution in extracting profiles reliably at this remote site as well. Using another receptor model like CMB or UNMIX on this data set could further confirm the resolved factors.
The Supplement related to this article is available online at doi:10.5194/acp-15-1221-2015-supplement.

Figure 1. The expected winter/summer ratios (95-/5-percentile) as a function of the OH rate constants for compounds with assumed constant sources in a simplified box model and the mean OH profile determined at Hohenpeissenberg.

Figure 2. The calculated seasonal cycles of different NMHCs assuming a constant source of 5 × 10³ molec cm⁻³ s⁻¹ and the average annual cycles of OH and ozone as measured at Hohenpeissenberg.

Figure 4. Annual pattern of the mean, median, and standard deviation of the factorial mixing ratio (pptv) of the respective source category.

Figure 5. Mean monthly variability of the contributions from the six factors.

Figure 6. Contributions of individual factors to the total amount of modeled NMHCs (pptv) (left) and to the total OH reactivity of modeled NMHCs (s⁻¹) (right).

Table 1. Different treatments of missing values, values below the detection limit, and zero values.

Table 2. Observed winter/summer ratios for the considered NMHCs compared to ratios obtained from box calculations using constant emissions.

Table 3. Mathematical diagnostics for the results of PMF computations for different numbers of factors.

Table 4. Comparison of the calculated winter/summer ratios from the box model with constant emissions and the observed winter/summer ratios for the PMF factors.