Atmospheric Chemistry and Physics Discussions

Abstract. A Positive Matrix Factorization receptor model for aerosol pollution source apportionment was fit to a synthetic dataset simulating one year of daily measurements of ambient PM2.5 concentrations, comprising 39 chemical species from nine pollutant sources. A novel method was developed to estimate model fit uncertainty and bias at the daily time scale, as related to factor contributions. A balanced bootstrap is used to create replicate datasets, with the same receptor model then fit to the data. Neural networks are trained to classify factors based upon chemical profiles, as opposed to correlating contribution time series, and this classification is used to align factor orderings across the model results associated with the replicate datasets. Factor contribution uncertainty is assessed from the distribution of results associated with each factor. Comparing modeled factors with the input factors used to create the synthetic data assesses bias. The results indicate that variability in factor contribution estimates does not necessarily encompass model error: contribution estimates can have small associated variability across results yet also be very biased. These findings are likely dependent on characteristics of the data.


Introduction
Air pollution comprised of particulate matter smaller than 2.5 µm in aerodynamic diameter (PM2.5) has been associated with a significantly increased risk of morbidity and mortality (Dockery et al., 1993; Pope et al., 2002; Peel et al., 2005). Existing regulations have focused on average and peak PM2.5 concentrations (µg m⁻³). To help policy makers design more targeted and cost-effective approaches to protecting public health and welfare, an understanding of the association between PM2.5 sources and morbidity and/or mortality needs to be developed.
The Denver Aerosol Sources and Health study (DASH) has been undertaken to understand the sources of PM2.5 that are detrimental to human health. PM2.5 filter samples are collected and chemically speciated, and receptor models are used to apportion the observed mass to pollution sources. Previous work has characterized variability in factor profiles due to receptor model uncertainty by varying the parameters in models fit to bootstrapped datasets.
The Environmental Protection Agency's Office of Research and Development distributes two software products, EPA PMF 1.1 (Eberly, 2005) and EPA Unmix 6.0 (Norris et al., 2007), which incorporate the bootstrap to analyze receptor model fit results. The software can be used to assess uncertainty in factor profile estimates and has been used by studies such as Chen et al. (2007) and Olson et al. (2007) to characterize sources of PM2.5. Few studies, however, have addressed uncertainty in factor contribution estimates. Two examples are Nitta et al. (1994) and Lewis et al. (2003), though the estimates come from different source apportionment models and pertain to average contribution variability.
The method presented in this paper estimates, at the daily time scale, bias and variability due to random sampling error in factor contribution estimates. Three novel techniques make such estimation possible: the resampling of measurement days using a balanced bootstrap to create replicate datasets; the use of neural networks to match factors across results on those data; and the tracking of all measurement days resampled across the replicate datasets. This discussion describes the method in the context of application to a synthetic PM2.5 dataset, which was designed to simulate DASH data, with pollutant concentrations consistent with the Denver area, fit by the PMF model. Using synthetic data allows assessment of model fit as well as a way to validate the method itself. The solution from applying Positive Matrix Factorization (PMF) can be compared with "known" profiles and contributions, allowing estimates of bias to be computed.

The first novel aspect of the method is to use a balanced bootstrap to generate additional data by resampling, with replacement, from the original synthetic measurement series. Each new dataset, or replicate, is again fit by the PMF model to apportion the PM2.5 mass to factors.
The second novel aspect pertains to how factors are sorted between solutions. For each solution the factors should correspond to the same real-world pollution sources. The factors need to be aligned such that "factor k" in each solution always refers to the same factor. To accomplish this factor alignment, or matching, the standard approach has been to use scalar metrics like linear correlation to match a factor from one solution to the "closest" factor in another solution. This is the approach taken by the EPA PMF 1.1 software, where it is specifically the time series of factor contributions that are matched between solutions. In contrast, the present work takes the novel approach of using Multilayer Feed Forward Neural Networks (NN), trained to perform pattern recognition, to align factors between PMF solutions. Further, using the intuitive notion that pollution sources are characterized best by the chemical species they emit, the matching is based on factors' profiles rather than their contributions. The NN approach is a robust factor matching technique: it avoids the sensitivity to outliers that is problematic when using measures such as linear correlation and replaces it with a method that is capable of capturing linear as well as non-linear relationships.
The third novel aspect in the method presented here is the tracking of the days resampled in each bootstrapped dataset. Through this bookkeeping it is possible to arrive at a collection of PMF results for each factor's contribution on each day. Further, because a balanced bootstrap is employed, over the aggregate of all bootstrapped datasets each original sampling day will be resampled the same number of times. Accordingly, descriptive statistics for each factor contribution on each day can be computed.


Positive Matrix Factorization

In traditional factor analysis approaches, including Principal Components Analysis, the variance-covariance matrix of the observations is used in an eigen-analysis to find the factors that explain most of the variability observed. The uncertainty in the observations, for all variables, is assumed to be independent and normally distributed. These assumptions are often not valid in the context of air pollution measurement data. In contrast, PMF, a receptor-based source apportionment model, offers an alternative technique that is based upon a least squares method, and measurement uncertainties can be specific to each observation, correlated, and non-normal in distribution. Further, the factors resultant from PMF need not be orthogonal, which is an important quality when trying to associate modeled factors to real-world pollution sources that can be highly temporally correlated but are nonetheless important to characterize separately (e.g. diesel versus gasoline fuel combustion).

Given a matrix of observed PM2.5 concentrations, X, PMF attempts to solve

  X = GF + E,   (1)

by finding the matrices G and F that recover X most closely, with all elements of G and F strictly non-negative. G is the matrix of factor contributions (or "scores" in traditional factor analysis terminology), where G_ik is the concentration that factor k contributed to the total PM2.5 observed in sample i. F is the matrix of factor profiles (or "loadings"), where F_kj is the fraction at which species j makes up factor k. Finally, E is the matrix of residuals defined by

  E = X − GF.   (2)

G and F are found through an alternating least squares algorithm that minimizes the sum of the normalized, squared residuals,

  Q = Σ_i Σ_j (E_ij / S_ij)²,   (3)

where E_ij is weighted by S_ij, the uncertainty associated with the measurement of the j-th pollutant species in the i-th sample. The ability to weight specific observations with specific uncertainties allows PMF to handle data that include heterogeneous measurement uncertainty, outliers, values below measurement detection limits, and missing values. As such, PMF can often yield better results than traditional factor analysis methods (Huang et al., 1999).

An algorithm for fitting the model to data is available as a commercial software tool, PMF2, which is used here. PMF2 has numerous optimization parameters that can be set by the user, and methods of choosing these values have been published elsewhere (Paatero, 2000; Paatero et al., 2002; Paatero et al., 2005). Since the focus of this paper is on a method of assessing uncertainty and bias in PMF solutions, the discussion of fine-tuning the numerous algorithm parameters is kept to a minimum.
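To make the objective concrete, the following minimal Python sketch computes Q from Eq. (3) and fits a weighted non-negative factorization by multiplicative updates. This is not the PMF2 algorithm (PMF2 uses alternating least squares with additional rotational controls); it is only an illustration of the model structure, and all function names here are hypothetical.

```python
import numpy as np

def q_value(X, G, F, S):
    """Sum of normalized, squared residuals, Q (Eq. 3)."""
    E = X - G @ F
    return np.sum((E / S) ** 2)

def fit_weighted_nmf(X, S, n_factors, n_iter=500, seed=0):
    """Sketch of a weighted non-negative factorization for Eq. (1).

    Multiplicative updates for the objective sum(W * (X - GF)**2),
    with W = 1/S**2; a stand-in for PMF2, not a reimplementation of it.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = rng.random((n, n_factors))
    F = rng.random((n_factors, m))
    W = 1.0 / S**2
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        G *= ((W * X) @ F.T) / (((W * (G @ F)) @ F.T) + eps)
        F *= (G.T @ (W * X)) / ((G.T @ (W * (G @ F))) + eps)
    return G, F

# toy usage: 365 samples, 39 species, 9 factors
# G, F = fit_weighted_nmf(X, S, n_factors=9); print(q_value(X, G, F, S))
```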

Synthetic data
Given that the results of pollution source apportionment models may ultimately be used as critical components of environmental policy and regulatory decisions, it is especially important to assess their quality. One approach for evaluating receptor models is the use of synthetic data, which is defined as simulated PM2.5 measurements rather than actual observations (Willis, 2000). Predefined sources are used, along with respective contributions and profiles, to create the G and F matrices in Eq. (1). With G and F "known", X can be calculated and then given as input (along with uncertainty estimates) to the PMF2 software, where the resultant G and F matrices can then be compared with the actual values to assess model fit.

The method of creating synthetic datasets followed in this paper is described in detail in Brinkman et al. (2006) and Vedal et al. (2007). Briefly, nine pollutant sources were used (Table 1), which contributed concentrations of 39 chemical species (Table 2), over 365 synthetic sampling days. Table 1 also lists the references used to generate the annual contributions, chemical profile, temporal patterns, and variability for each source. Distinct time series for the contributions from each source were generated by starting with average contribution estimates from preliminary DASH studies and the Northern Front Range Air Quality Study, then adding day-to-day variations reflecting both random variability and hypothesized weekly or seasonal patterns, as appropriate. Daily totals for the nine source contributions were normalized to match actual daily PM2.5 levels observed in Denver in 2003. Oxygen was included as a species for mass closure purposes, assuming that oxygen represented 30% of the organic carbon mass fraction.

The matrix of data uncertainties, S from Eq. (3), is computed as follows. Measurement detection limits, detection limit uncertainty, and measurement uncertainty associated with typical analytical techniques used to speciate PM2.5 filter samples (Ion Chromatography, Thermal Optical Transmission, and Gas Chromatography/Mass Spectrometry) were incorporated into the PMF input via

  S_ij = α_j X_ij + β_j D_j,   (4)

where for species j, α_j is the measurement uncertainty, β_j is the detection limit uncertainty, and D_j is the detection limit. These uncertainties were incorporated into the final data matrix X′ with the formula

  X′_ij = X_ij + S_ij Z_ij,   (5)

where Z_ij is a random number drawn from a standard normal distribution. If X′_ij was less than the detection limit associated with measuring species j, then a value of one-half the detection limit was substituted in the final data matrix.
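A short sketch of how Eqs. (4)-(5) could be applied, assuming per-species vectors alpha, beta, and D; note that the functional form of Eq. (4) is itself a reconstruction above, so this is illustrative rather than a reproduction of the authors' code.

```python
import numpy as np

def perturb(X, alpha, beta, D, rng):
    """Build uncertainties S (Eq. 4) and the noisy data matrix X' (Eq. 5).

    X is n x m (days x species); alpha, beta, D are length-m vectors.
    """
    S = alpha * X + beta * D             # Eq. (4), broadcast over rows
    Z = rng.standard_normal(X.shape)     # standard normal draws
    Xp = X + S * Z                       # Eq. (5)
    # substitute one-half the detection limit for below-detection values
    Xp = np.where(Xp < D, 0.5 * D, Xp)
    return Xp, S

# usage: Xp, S = perturb(X, alpha, beta, D, np.random.default_rng(42))
```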

The Bootstrap
The bootstrap is a computationally intensive method for estimating the distribution of a statistic, the statistic itself being an estimator of some parameter of interest (Efron, 1979). The essence of the method is to create replicate data by resampling, with replacement, from the original observations of a random variable. For each replicate dataset the statistic of interest is computed, and the distribution of these values serves as an estimate for the random sampling distribution of the statistic. The properties of this distribution are then used to make inferences about the parameter of interest. In the present context, each pollutant species' time series represents realizations of a random variable. The F and G matrices resulting from PMF's fitting of these data are functions of these random variables; thus, each element of those matrices may be considered a statistic. Previous studies using PMF have focused on analyzing the F matrix, the matrix of factor profiles. This discussion takes a different tack, with the statistic of interest being each element of the G matrix, the matrix of daily factor contributions.

Much of bootstrap theory is based upon the assumption that the data are comprised of observations of independent and identically distributed (iid) random variables. Time series data, however, are typically serially correlated. Singh (1981) showed that the bootstrap can be inconsistent in estimating the distribution of statistics based upon dependent data. Since then, numerous modifications of the original iid bootstrap have been formulated to better handle dependent data (Carlstein, 1986; Kunsch, 1989; Liu and Singh, 1992). One approach often used for time series data is to resample blocks of successive observations. If the blocks are of sufficient length, l, and the series is only weakly dependent, then the observations within each block may be considered independent of the observations within the other blocks. Further, if the series is stationary then all blocks will share the same l-dimensional joint distribution. These two conditions allow the blocks to be treated as independent and identically distributed observations to which the iid bootstrap can be applied. This approach is currently used by the EPA PMF 1.1 software tool, which uses a 3-day block bootstrap.
In the EPA's bootstrap implementation, as well as this study, measurement days are resampled. In the present case, realizations of a composite random variable comprised of 39 pollutant species are resampled, with replacement, from the original synthetic dataset. To investigate an appropriate bootstrap block size, the serial correlation of each pollutant species in the synthetic dataset, as well as total PM2.5 mass, was examined. An Auto Regressive (AR) time series model was fit to each species' series and the optimal lag parameter, p, was found. Figure 1 shows the distribution of the lag values. Theoretical treatments of the block bootstrap have found the optimal block size critically dependent upon the size of the data as well as the statistic for which the distribution is being estimated (Hall et al., 1995; Lahiri, 2001). Given these findings, and that practical methods for choosing block size are based on examining the lag dependence in the data (Politis and White, 2004), it seems difficult to choose a single block size for resampling measurement days of speciated PM data.

An additional complication in using the block bootstrap with such data is that while all block bootstrap schemes are designed to handle serial correlation, they also assume the data are from a stationary stochastic process. To the authors' knowledge, there are no published results in which a block bootstrap was used on speciated PM time series data that had been tested for, or transformed to, stationarity prior to resampling. It is assumed that this difficulty was simply ignored, but it deserves consideration in future applications of the block bootstrap.
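As an illustration of this kind of diagnostic, the sketch below selects an AR lag order per species by AIC, assuming the statsmodels package is available; the function name and data layout are illustrative, not the authors' actual procedure.

```python
import numpy as np
from statsmodels.tsa.ar_model import ar_select_order

def optimal_ar_lags(X, maxlag=14):
    """Fit AR(p) models to each species' series (columns of X) and return
    the AIC-selected lag order, mirroring the diagnostic behind Fig. 1."""
    lags = []
    for j in range(X.shape[1]):
        sel = ar_select_order(X[:, j], maxlag=maxlag, ic="aic")
        # sel.ar_lags is empty/None when a zero-lag model fits best
        lags.append(max(sel.ar_lags) if sel.ar_lags else 0)
    return np.array(lags)

# a histogram of optimal_ar_lags(X) would correspond to Fig. 1
```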
The present work takes an alternate approach to bootstrapping the pollutant time series data and avoids the block bootstrap. Instead, a variant of the iid bootstrap, called the balanced bootstrap, is employed. The balanced bootstrap ensures that over the course of creating numerous replicate datasets all original observations are resampled the same number of times, which can reduce problems of bias and variance in estimation (Davison et al., 1986), while being only marginally more computationally expensive than an "unbalanced" bootstrap (Gleason, 1988). In the present context, use of the balanced bootstrap violates the assumption that the data are an iid sample; however, there is evidence that in some settings the iid bootstrap can still yield accurate estimation even when applied to dependent data (Kiefer and Vogelsang, 2005; Goncalves and Vogelsang, 2007). Within any single replicate dataset, some measurement days will be resampled multiple times while others are not included at all (recall that since the balanced bootstrap is used here, over the aggregate of all replicate datasets each measurement day is resampled the same number of times). By keeping track of which days are resampled in which replicate dataset it is possible to assess factor contribution uncertainty and bias at the daily time scale.
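A minimal sketch of the balanced bootstrap, following the permutation construction described by Gleason (1988); the function and variable names are illustrative only.

```python
import numpy as np

def balanced_bootstrap_indices(n_samples, n_replicates, seed=0):
    """Balanced bootstrap: concatenate n_replicates copies of the sample
    indices, permute them, and split into n_replicates index sets.  Each
    original observation then appears exactly n_replicates times over the
    aggregate of all replicate datasets."""
    rng = np.random.default_rng(seed)
    pool = np.tile(np.arange(n_samples), n_replicates)
    rng.shuffle(pool)
    return pool.reshape(n_replicates, n_samples)

# usage: 500 replicates of a 365-day dataset
idx = balanced_bootstrap_indices(365, 500)
# replicate k of the data matrix X (days x species) is X[idx[k], :]
```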

Factor matching

The use of the bootstrap yields a collection of factor contribution matrices, G^(k), k = 1, ..., B, where B is the number of bootstrap replicate datasets. The collection of matrices may be considered as a single, three-dimensional matrix G′, with elements G′_ijk, i = 0, ..., N−1; j = 0, ..., P−1; k = 0, ..., B, where N is the number of samples and P is the number of factors (note that the G matrix associated with the original data and "base case" solution is also included in G′). While the nature of the factors that the PMF2 algorithm finds should be stable across the "bootstrap" solutions, the ordering of the factors within those solutions may be different. Before computing statistics on the elements of G′, the dimension indexed by j must be sorted, such that across the B+1 matrices factor j refers to the same real-world pollution source. The typical approach to matching and sorting factors has relied on comparisons between the contribution time series, using linear correlation to match a given factor from a bootstrap solution (or a factor from another analysis method) to the "closest" factor in a base case solution.
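For contrast, the traditional correlation-based matching discussed below can be sketched as an assignment problem; names are illustrative, and the base case contributions must first be indexed by the same resampled days so the series align.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_correlation(G_base, G_boot):
    """Traditional contribution-series matching (the approach this paper
    argues against): correlate time series and pick the factor permutation
    maximizing total correlation.

    G_base : base case contributions indexed by the replicate's days, N x P
    G_boot : bootstrap solution contributions, N x P
    """
    P = G_base.shape[1]
    C = np.corrcoef(G_base.T, G_boot.T)[:P, P:]   # P x P cross-correlations
    rows, cols = linear_sum_assignment(-C)        # maximize summed correlation
    return cols  # cols[j] = bootstrap factor matched to base factor j
```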
There are several concerns with this approach. First, "closeness" is measured with a scalar metric that is highly sensitive to outliers. Second, the bootstrap replicate datasets will not preserve the temporal patterns seen in the original data when viewed over the whole sampling period. Third, two or more bootstrap factors may be most strongly correlated with the same factor in the base case solution. The use of this metric invites "data dredging", where the practitioner must make ad hoc choices to separate and match factors.

The present discussion employs an approach that the authors believe to be novel and robust when applied to aerosol pollution data: neural networks are used to match factors between bootstrap solutions and the base case solution based upon their profiles. Matching on profiles addresses the second issue noted above, while the use of neural networks rather than correlation addresses the first and third issues. The need to classify a measured spectrum (profile) with a known reference spectrum is a problem found in multiple scientific settings, most notably in the analysis of stellar spectra and data from hyperspectral remote sensing. Findings in these fields may be useful in the modeling of aerosol pollution data and are considered briefly. Work by van der Meer (2006) found that a spectral similarity measure based on correlation was more sensitive to noisy data than other traditional measures based on Euclidean distance or spectral angle. Further, Shafri et al. (2007) reported that neural networks were accurate at classifying spectra from remote sensing of tropical forests, especially when compared to measures based on spectral angle. Tong and Cheng (1999) found that using neural networks was superior to using maximum correlation when classifying gas chromatography mass spectrometry data. Based on these findings, as well as the findings presented herein, the authors are confident that using neural networks to match factor profiles allows the bootstrap technique to be better leveraged. The factor matching process can be easily automated, adapted to complex patterns and new, possibly noisy, data, and can avoid subjective "closeness" thresholds required when using less robust measures like correlation.

Neural networks

Artificial Neural Networks are statistical modeling methods capable of characterizing highly non-linear functions, doing so by approximating the behavior of the brain. The term "artificial" is used to distinguish this numerical approximation of biological, adaptive cognition from those biological systems. In general, this is understood in statistical modeling, and Artificial Neural Networks are simply referred to as Neural Networks (NN). Excellent introductions to the subject can be found in Haykin (1998) and Munakata (1998). The specific type of NN used in this study is called a Multilayer Feed Forward Network. This type of network relies on supervised learning, in which the network is given inputs and learns how to transform them into desired outputs. The learning is encoded in numerical weights defining the strength of connection between elements in the network. Weights are found through quasi-Newton optimization incorporated with the backpropagation method, where "backpropagation" refers to the ground-breaking algorithm developed in the 1970s and 1980s (Rumelhart et al., 1986; Werbos, 1974) that allows neural networks to classify linearly inseparable patterns. A trained network, characterized by its structure and its weights, can then be given novel input and transform it to the correct output.
In the present work that transformation is classification: given a new factor profile, the trained neural networks will classify it as a known type, or possibly classify it as unknown.

Neural Network configuration
In the present work, NN software from Visual Numerics' IMSL® C Numerical Library, version 6.0, is used. The structure of the network is three fully connected layers, with 39 nodes in the input layer, five nodes in the hidden layer, and two nodes in the output layer. The values of the two output nodes range between 0 and 1. An output of [1,0] indicates a perfect match between an input factor profile and the profile that particular network was trained to classify as a "Yes". Likewise, an output of [0,1] indicates a perfect non-match between an input factor profile and the learned profile. A "Yes" match is only possible if the first output node has a value of at least 0.95, with the second node having a value no larger than 0.05.

The performance of the network depends heavily upon the data used for training. It is well established that neural networks can be unstable when data used for training varies greatly in scale; therefore, transformation and normalization of data are typical preprocessing steps. In this study, factor profiles are normalized before being learned by the networks. In "raw" form, the rows of the F matrix correspond to factor profiles and each row sums to 1. Viewing factor profiles this way can sometimes result in factors that are difficult to distinguish, since some species will be present in large amounts in many factors (e.g. organic carbon and oxygen). To make plots of factor profiles more visibly distinguishable the following normalization is done:

  F′_kj = F_kj / Σ_k′ F_k′j,   (6)

where F′_kj is the relative weighting species j has in factor k's profile when considering all other factors. When viewing factor profiles under this normalization, species common to many factors are damped and marker species become more pronounced, as compared to viewing the raw profiles.
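The pure-Python sketch below mirrors the 39-5-2 architecture, the normalization of Eq. (6), and the Yes/No decision rule. It is not the IMSL implementation: IMSL trains with a quasi-Newton optimizer, for which plain gradient descent on squared error stands in here, and all names are hypothetical.

```python
import numpy as np

def normalize_profiles(F):
    """Eq. (6): weight each species across factors so markers stand out."""
    return F / F.sum(axis=0, keepdims=True)

class ProfileNet:
    """Illustrative 39-5-2 feed-forward network with sigmoid units."""

    def __init__(self, n_in=39, n_hidden=5, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    @staticmethod
    def _sig(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, x):
        self.h = self._sig(x @ self.W1 + self.b1)   # hidden activations
        return self._sig(self.h @ self.W2 + self.b2)

    def train(self, X, Y, lr=0.5, epochs=5000):
        """Backpropagate squared-error gradients, one pattern at a time."""
        for _ in range(epochs):
            for x, y in zip(X, Y):
                o = self.forward(x)
                d2 = (o - y) * o * (1 - o)                 # output delta
                d1 = (d2 @ self.W2.T) * self.h * (1 - self.h)
                self.W2 -= lr * np.outer(self.h, d2)
                self.b2 -= lr * d2
                self.W1 -= lr * np.outer(x, d1)
                self.b1 -= lr * d1

def is_match(output, hi=0.95, lo=0.05):
    """'Yes' only if node 0 >= 0.95 and node 1 <= 0.05."""
    return output[0] >= hi and output[1] <= lo
```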

Training data
Five datasets were used to train the neural networks. One dataset was the original synthetic data, with the remaining four being bootstrap replicates. PMF was fit to each dataset, with the solution associated with the original dataset being the base case. The base case factor profiles were normalized (as described above), plotted, and visually compared to the normalized factor profile plots associated with the bootstrap solutions. For each bootstrap solution the factors were reordered to match the base case ordering. Figure 2 shows the end result of this process, with the five factor profile plots used to classify each factor for the neural network training, as well as the actual profile used to create the original synthetic dataset.

Figure 2. Plots of five normalized profiles for each factor learned by the neural networks. The thicker, black line represents the profile associated with the "base case" solution, while the thicker red line indicates the actual profile used to create the synthetic data. The remaining four colors correspond to factor profiles for "bootstrap" solutions based on resampled data.

Method steps
Having discussed the major components of the method for analyzing factor contribution uncertainty and bias, it is helpful to summarize their relationship in the following steps:

Step 1: Using the synthetic PM2.5 data and measurement uncertainties, compute a base case PMF model fit that has P factors. In the present work, P=9.

Step 2: Create T bootstrap replicate data matrices, with corresponding uncertainty matrices, and fit each set with PMF. These results, in addition to the base case result, will serve as the neural network training data. In the present work, T=4.
Step 3: For each training replicate dataset, visually compare the normalized bootstrap factor profiles versus the normalized base case profiles, and define the factor matching between the results. Reorder the factors to be consistent with the base case factor ordering.
Step 4: For each factor, train a neural network to learn its normalized profile, as well as what is not its profile (thus, there will be P networks). For each factor there will be T+1 profiles to be learned as "Yes" patterns. The remaining P−1 profiles associated with the base case results are learned as "No" patterns.

Step 5: Create B new bootstrap replicate datasets and fit each one with PMF. These results will be used to assess PMF model fit uncertainty. In the present work, B=500.

Step 6: For each bootstrap solution, allow each of the P neural networks to examine each of the P normalized factor profiles. Each network should identify a unique factor profile as a "Yes", classifying all other profiles as "No". Reorder the factors in the bootstrap solution accordingly.

Step 7: Parse the factor contribution data by day-factor combination. For example, consider examining the bias and variability in the PMF solutions for the 3rd factor on day 126. All B+1 datasets would be searched for where the original day 126 was resampled. This collection of indices would be used to index into the 3rd column of the corresponding solutions' G matrix to get factor 3's contribution on day 126. The distribution of values is then compared with the actual contribution used to create the original synthetic dataset.

Note that Step 3 is what establishes the supervisor for the supervised learning algorithm used to train the neural network. The role of the neural network is to learn the classification defined by an expert human observer, such that when new factor profiles are analyzed, they are classified as the expert would classify them. In Step 6, it is possible that a bootstrap factor can be matched with more than one base case factor, or, perhaps, it cannot be matched to any base case factor. In either case, that particular solution is excluded from the collection of other solutions. In this way, after the last replicate dataset has been fit by PMF, the collection of solutions all correspond to the case where bootstrap factors were matched uniquely to base case factors.
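The bookkeeping in Step 7 can be sketched as follows, reusing the balanced bootstrap index sets from the earlier sketch; variable names are illustrative, and the base case solution can be included by appending it with identity indices.

```python
import numpy as np

def contributions_for_day(day, factor, idx, G_solutions):
    """Gather one factor's resampled contributions for one original day.

    day, factor  : original day index and (aligned) factor column
    idx          : (B, N) balanced-bootstrap index sets
    G_solutions  : list of B aligned contribution matrices, each N x P
    """
    values = []
    for k, G in enumerate(G_solutions):
        rows = np.flatnonzero(idx[k] == day)   # where this day was resampled
        values.extend(G[rows, factor])
    return np.asarray(values)

# e.g. the distribution of factor 3's contribution on day 126:
# vals = contributions_for_day(126, 3, idx, G_solutions)
# np.percentile(vals, [25, 50, 75]) versus the known synthetic contribution
```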

Results
Eight and nine factor PMF models fit to the original synthetic data were comparable, in terms of the sums of the normalized, squared residuals, Q, the residuals associated with specific species, and the physical interpretability of the factors. Seven and ten factor solutions were judged less optimal with respect to these same measures. Accordingly, descriptive statistics are presented for both eight and nine factor simulations (Table 3). However, to facilitate comparison of model fitting results with the contributions of the nine sources used to create the synthetic data, additional results pertain to the simulation using a nine factor solution. For the eight factor solutions, it was found that the meat cooking and wood combustion factors were collapsed into a single factor. As an example of the daily-scale output, consider the distribution of PMF results for a single factor's contribution on day 146. This example represents a "vertical slice" from the contribution time series in Fig. 3 and can be calculated for any factor-day combination.

Factor contribution plots
The results of applying the method to the synthetic PM2.5 data demonstrate several types of PMF solutions. The first is exemplified by the contribution time series plots for factors 0 and 1 (Fig. 3). Here, PMF's solution, over hundreds of resampled datasets, shows low variability and small bias when compared to the actual contribution time series. Factors 3 and 4 exemplify solutions with moderate variance between resampled datasets and also moderate bias (positive in the first case, negative in the latter). Factors 2 and 7 represent solutions in which the temporal structure matches closely with the actual respective contribution time series, but the bias is quite large. Finally, factors 5, 6, and 8 have solutions that match poorly against the known contributions.
If the synthetic data are assumed to be a close approximation of data likely to be actually observed, then the first two cases (factors 0, 1, 3, and 4) represent pollution sources well-modeled by PMF. On the other hand, factors 2, 5, 6, 7, and 8 might represent pollution sources for which the PMF model is suspected to provide biased or generally poor fit. Thus, the presented method could also serve as a way of qualifying future PMF solutions. Factors corresponding to sources not well-modeled in the synthetic data solutions could be given additional scrutiny when identified in solutions corresponding to real data.

Uncertainty, variability, and bias
Application of the method yields estimates of variability and bias in daily factor contributions, which can be used in an uncertainty analysis of the PMF model fit. However, the uncertainty in results brought out by fitting PMF to resampled data is likely different from the uncertainty in the results due to model assumptions. For example, how would the results change if a different number of factors was specified; if different assumptions were made about measurement errors; if a different source apportionment model was used altogether? As Chatfield (1995) notes, "It is indeed strange that we often admit model uncertainty by searching for a best model but then ignore this uncertainty by making inferences and predictions as if certain that the best fitting model is actually true." In the present work, as much as possible, model assumptions have been avoided. Input data was not filtered after seeing preliminary output, and PMF2 parameters were set to avoid assumptions about the distribution or "quality" of the data. Still, the use of PMF as the receptor model, the chemicals included in the analysis, and the number of factors to be characterized were all choices and are clearly subjective. The present work seeks to offer a method for assessing uncertainty in model fit when it is assumed that the model is valid, and this distinction should be kept in mind.

Eight versus nine factor solutions
In simulations using nine factor solutions, the rate at which the neural network factor matching method failed to uniquely match bootstrapped factors to base case factors was generally close to 30%. Typically this was just one bootstrap factor matching with two base case factors for a given bootstrap solution, with the remaining bootstrap factors having a unique match. Specifying only eight factors allowed PMF to collapse two similar factors, meat cooking and wood combustion, leading to generally more distinguishable results; thus, unique factor matching failed at approximately the 5% rate. The nine factor solutions are presented in Figs. 2 and 3 to allow easy comparison with the actual factor information.

It is important to note that, when using the traditional factor matching method based on the maximum linear correlation between the contribution time series, the nine factor bootstrap solutions were difficult to interpret and sort. Consider Fig. 3d and e, the contribution time series plots for factors 3 (diesel fuel combustion) and 4 (paved road dust), respectively. If linear correlation alone was used as the metric for matching, it is easy to imagine how factor 3 might sometimes be labeled factor 4 and vice versa. The time series plots would accordingly show larger interquartile ranges, which would purely be an artifact of the factor matching technique "lumping apples with oranges", misleading the practitioner into inferring that PMF's fit was more uncertain than it in fact was. In contrast, the neural network method uses pattern recognition to classify factor profiles that may differ in some species between datasets, differences that may be enough to throw off measures like linear correlation, but not an expert observer.

The nonparametric bootstrap
The method used here makes use of a nonparametric bootstrap for creating replicate datasets. The term "nonparametric" refers to the fact that the bootstrap resamples the data itself, as opposed to data from a generating process for which parameters would have to be set. The parametric approach assumes some data generating process is an accurate approximation for the data actually in hand. In the present setting, however, there are often dozens of chemical species comprising PM2.5 data, each likely characterized by a different probability density function and cross correlation with other species. Accordingly, the parametric bootstrap does not appear to be a feasible tool unless the practitioner was able to formulate a data generating process for each species.

Another version of a parametric bootstrap resamples residuals from a model fit. In the present context this approach could be outlined as: use PMF2 to find a solution given a data matrix, X; resample the rows of the resulting residuals matrix, E, and add them to the model-fitted data, GF, to create a new dataset; use PMF2 to find a solution using this dataset; repeat the previous steps as desired. A fundamental assumption of this approach is that the model is the true model, and, given this assumption, residuals should be independent and identically distributed. For the simulation presented here, however, the base case solution had associated residuals for eight of the 39 species that failed basic tests of independence (for example, runs up, runs below the mean, and length of runs). While it may be possible to fine tune the PMF2 settings to improve the results, the residuals bootstrap rests upon the assumption that the model from which the residuals come is "true". The authors believe that this is inappropriate given the level of model uncertainty present. In contrast, the nonparametric bootstrap employed here gives focus to the PM2.5 data itself, avoiding assumptions about the validity of the model fit to that data. The surrounding method for assessing that model's quality is equally applicable to PMF results as it is to another source apportionment model, and is applicable to assessing estimates of source profiles as well as estimates of source contributions.
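One of the independence checks named above, runs above and below the mean, can be sketched as a Wald-Wolfowitz style test; this is an illustrative implementation, not necessarily the authors' exact procedure.

```python
import numpy as np

def runs_test_z(e):
    """Runs test on a residual series: count runs above/below the mean and
    return an approximate z statistic (large |z| suggests dependence)."""
    above = e > e.mean()
    n1, n2 = above.sum(), (~above).sum()
    runs = 1 + np.count_nonzero(above[1:] != above[:-1])
    mu = 1 + 2 * n1 * n2 / (n1 + n2)              # expected number of runs
    var = (mu - 1) * (mu - 2) / (n1 + n2 - 1)     # variance under independence
    return (runs - mu) / np.sqrt(var)

# usage: z-scores for each species' residual column of E
# z = np.array([runs_test_z(E[:, j]) for j in range(E.shape[1])])
```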

The method presented here can likely be made even more robust, and the authors propose two options to explore. The first is to consider training the networks on more information than just the scaled factor profiles. For example, additional input-layer nodes could encode information about factor contributions or tracer species. The second option pertains to assessing replicate datasets before fitting the PMF model. In the present discussion replicate datasets generated by bootstrapping were not examined in any way for being "realistic" prior to the PMF model fit. Heidam (1987) presented a bootstrap method in which the replicate datasets were first screened by looking at their associated covariance matrices. If a given covariance matrix was not representative of the covariance structure assumed to be truly in the data, then the bootstrapped dataset was not fit by the source apportionment model. There are numerous accept-reject criteria that could be employed such that non-representative replicate datasets would not be fit by PMF. For example, if certain marker species or "rare event" sampling days were deemed critical to the model fit, replicate datasets could be tested for sufficient representation of those data before use in subsequent analyses. This approach was avoided in the present discussion in order to focus on the method's performance with as few practitioner-defined assumptions as possible. In certain settings, however, such assumptions may be warranted.