Fine particulate matter source apportionment using a hybrid chemical transport and receptor model approach

A hybrid fine particulate matter (PM2.5) source apportionment approach is developed to provide physically and chemically consistent relationships between source emissions and receptor impacts. It combines a receptor model (RM) species balance with species-specific source impacts from a chemical transport model (CTM) equipped with a sensitivity analysis tool. This hybrid approach enhances RM results in two ways. It provides initial estimates of source impacts from a much larger number of sources than RMs typically use, and it provides source–receptor relationships for secondary species. Further, the method addresses issues of source collinearity and accounts for emissions uncertainties. We apply this hybrid approach to conduct PM2.5 source apportionment at Chemical Speciation Network (CSN) sites across the US. Ambient PM2.5 concentrations at these receptor sites were apportioned to 33 separate sources. The hybrid method substantially changed the CTM-estimated impacts of sources such as dust, woodstoves, and other biomass-burning sources, but changed the impacts of other sources only slightly. The refinements reduced the differences between CTM-simulated and observed concentrations of individual PM2.5 species by over 98 %, as measured by a weighted least-squares error. The rankings of source impacts changed from the initial estimates, further demonstrating that CTM-only results should be evaluated with observations. Comparison with RM results at six US locations showed that the hybrid results differ somewhat from RM results for commonly resolved sources. The hybrid method also resolved sources that typical RM methods do not capture without extra measurement information for unique tracers. The method can be readily applied to large domains and long (e.g., multi-year) time periods to provide source impact estimates for management- and health-related studies.


Introduction
Fine particulate matter (PM2.5), with an aerodynamic diameter less than 2.5 µm, is associated with adverse effects on human health (e.g., Dockery et al., 1993). Both linking health effects with air quality and assessing air quality management options require the spatially and temporally resolved impacts of major emission sources. However, quantifying the impacts of individual sources on the ambient concentration of fine particulate matter, known as source apportionment (SA), is challenging. A fundamental issue with any SA method is that source impacts cannot be measured directly, so it is difficult to assess the accuracy of source apportionment results. Tracer gases such as cyclic perfluoroalkanes and SF6 can help quantify source impacts (Martin et al., 2011). However, such an approach is typically limited to assessing a specific source's impact in special studies. Instead, source apportionment results are typically evaluated by comparing simulated concentrations of individual components and total PM2.5 mass with observations (e.g., Watson et al., 2008; Viana et al., 2008b).
Receptor model (RM) approaches have long been used for PM2.5 source apportionment (Chow et al., 1992; Cooper and Watson, 1980; Liu et al., 2006; Martello et al., 2008; Reff et al., 2007; Schauer et al., 1996; Swietlicki et al., 1996; Thurston et al., 2011; Viana et al., 2008b; Watson, 1984; Watson et al., 2008; Xie et al., 2013). These methods include chemical mass balance (CMB) (Watson et al., 1984) and positive matrix factorization (PMF) (Paatero and Tapper, 1994). They use observed PM2.5 species concentrations at one or more receptors and solve a set of species balance equations to estimate source impacts. RM methods typically do not use emissions estimates. Nor do they explicitly account for the chemical and physical processes that govern pollutant transport and transformation after emission from a specific source. Additional approaches have been used to address these limitations (Blanchard et al., 2012; Chen et al., 2011; Lin and Milford, 1994; Roy et al., 2011; Watson et al., 2002; Wittig and Allen, 2008). In addition, receptor modeling typically accounts for a relatively small number of sources (on the order of ten out of hundreds in the inventory), which together comprise about 80 % of the estimated emissions (Baek, 2009). This can bias the results. In RM methods, the common approach for assessing the accuracy of source apportionment results is to compare the calculated PM2.5 component concentrations and total mass to observations. If they agree well, the results are assumed to be reasonable. However, this type of evaluation does not use a set of observations that is completely independent of the one used to obtain the source impacts. Comparisons of non-fitting species and other tests can assist in the evaluation (USEPA, 2004). Further, similar estimated species concentrations, and hence similar performance, can result from very different combinations of source impacts. Results can also be quite sensitive to model inputs (e.g., source profiles for CMB) or to the number of sources (or factors in PMF) used. Competing RM methods applied to similar cases have produced different source apportionment results, which also suggests errors (Held et al., 2005; Laupsa et al., 2009; Lee et al., 2008; Lowenthal et al., 2010; Marmur et al., 2006; Rizzo and Scheff, 2007; Shi et al., 2009; Viana et al., 2008a; Watson et al., 2008). Several studies have tried to reconcile the results by refining source profiles and adding extra constraints (Lee and Russell, 2007; Marmur et al., 2007; Sheesley et al., 2007; Swietlicki et al., 1996; Watson et al., 2008). Extra species have been used in RM modeling to improve accuracy and identify additional sources (Bullock et al., 2008; Lee et al., 2009; Schauer et al., 1996; Zheng et al., 2002). These include organic molecular markers and other unique tracers for certain sources. However, measurements of these markers are not available from routine monitoring networks.

Published by Copernicus Publications on behalf of the European Geosciences Union.
Source-oriented modeling (SM) approaches, such as chemical transport models (CTMs), follow the emission, transport, transformation, and loss of chemical species in the atmosphere to simulate ambient concentrations and source impacts. CTMs can compensate for limitations in RM methods (Burr and Zhang, 2011a, b; Doraiswamy et al., 2007; Held et al., 2005; Henze et al., 2009; Kleeman et al., 2007; Kwok et al., 2013; Lowenthal et al., 2010; Marmur et al., 2006; Russell, 2008; Schichtel et al., 2006; Wagstrom et al., 2008; Wang et al., 2009; Ying et al., 2008) because they describe the processes affecting source–receptor relationships from first principles. For example, compared with RMs, CTMs directly account for secondary PM2.5 formation and nonlinearities in pollutant transformations, and they can quantify a more complete range of sources. CTMs also use knowledge of the specific locations and emission rates of sources in the region, and they can provide spatially resolved source impacts across the modeling domain. An important strength of using CTMs for source apportionment is that model evaluation relies on independent data. However, there are three main sources of uncertainty. Estimates of source strengths and characteristics (e.g., diurnal and day-to-day variations) are viewed as highly uncertain. Meteorological inputs to CTMs contain errors. Uncertainties also remain in how various processes are described. Various approaches have been implemented within CTM frameworks to estimate source impacts. These include, but are not limited to, particulate matter source apportionment technology (PSAT) (Wagstrom et al., 2008), tagged species source apportionment (TSSA) (Wang et al., 2009), the integrated source apportionment method (ISAM) (Kwok et al., 2013), and various source and receptor sensitivity approaches (e.g., Koo et al., 2009; Henze et al., 2009). However, each approach has theoretical limitations in determining source impacts in the complex atmospheric system (Koo et al., 2009; Burr and Zhang, 2011b). These uncertainties and limitations, together with the required level of effort, explain why SM approaches are not as widely used as RM methods for PM2.5 source apportionment.
One way to exploit the advantages of SM approaches is to further improve SM source apportionment results by using species concentration observations in a manner similar to RM approaches. Here, a hybrid SM-RM approach is developed and applied to obtain improved source impact estimates. It integrates measurements with CTM results and includes uncertainty estimates for measurements and emissions. As developed, the method integrates the CMB method with CTM results at monitoring locations and measurement times. It does so by adding information and constraints to a species balance approach similar to CMB. The improved source impact estimates at these sparse locations could then be used to obtain source impact fields. Spatial and temporal interpolation would produce these fields, taking advantage of the initial CTM estimates across the domain and over the time period of interest. In this study, the hybrid approach is applied to a 36 km resolution CTM simulation over North America. Our focus is to demonstrate the hybrid method by examining SM-RM source apportionment results across all sites, and in more detail at selected locations.

CTM simulation and measurement data
Simulated three-dimensional concentration fields of trace chemical species are obtained using the Community Multiscale Air Quality model (CMAQ) (Byun and Schere, 2006) version 4.5 (for using a newer version, see Note S1 in the Supplement). The model is run with strict mass conservation (Hu et al., 2006), the SAPRC-99 chemical mechanism (Carter, 2000), and the aerosol module described in Binkowski and Roselle (2003). The modeling domain (Fig. 1) covers the continental United States (CONUS) as well as portions of Canada and Mexico. It uses 36 km × 36 km horizontal grid cells and 13 vertical layers of variable thickness extending from the surface to 70 hPa. Modeled values are grid-cell volume averages, so CTMs applied at higher horizontal resolution would be expected to compare better with point measurements, especially for particulate matter. However, computational cost increases rapidly with resolution. CTM modeling with 12 km × 12 km grid cells covering the CONUS was computationally restrictive at the time of this research but has recently become more practical.
We used meteorological fields generated by the Fifth-Generation PSU/NCAR Mesoscale Model (MM5) (Grell et al., 1994), run with 35 vertical levels, four-dimensional data assimilation (FDDA), and the Pleim-Xiu land-surface model (Pleim and Xiu, 1995; Xiu and Pleim, 2001). Simulated meteorological fields were evaluated against surface hourly observations from the US and Canada (Table S1 in the Supplement). Performance was well within the typical range for regional air quality modeling (Emery et al., 2001; Hanna and Yang, 2001).
Emission inputs were developed from a 2004 inventory projected from the 2002 National Emissions Inventory (NEI2002, obtained from http://www.epa.gov/ttn/chief/emch/index.html#2002). The 2002 inventory was projected to 2004 using growth factors from the Economic Growth Analysis System (EGAS) version 4.0. Control efficiency data obtained from EPA for existing federal and local control strategies were also applied. In addition, US emissions from large NOx and SO2 point sources for 2004 were obtained from the continuous emissions monitoring (CEM) database (http://ampd.epa.gov/ampd/). The inventory includes emissions of seven criteria pollutants, including PM2.5. The Sparse Matrix Operator Kernel Emissions (SMOKE) model (CEP, 2003) is used to process the emissions inventory and prepare gridded, CMAQ-ready emission inputs. In SMOKE processing, PM2.5 emissions were split into major components (sulfate, nitrate, EC, OC, and other). Source-specific speciation profiles from the SPECIATE program (Simon et al., 2010) were used for this split. The component historically called "unidentified" in the emissions modeling process is called "other" here for two reasons. This portion of PM2.5 is derived from measurements that provide the composition of the emissions, and it includes elemental species that can be used to track source-specific impacts on primary PM2.5. Spatial surrogates provided by the US EPA (http://www.epa.gov/ttn/chief/emch/spatial/) were used in SMOKE to distribute emission subcategories spatially according to their source classification codes (SCCs). These surrogates are derived from census and geographic information such as population, households, road networks, railroads, and land use. Monthly, weekly, and diurnal temporal profiles were used to allocate emissions by hour. Most temporal profiles were applied nationwide, but dozens of state-specific temporal profiles were also used. For example, different diurnal profiles have been developed for prescribed burning emissions in different states. Therefore, emission uncertainties and biases are not expected to be spatially or temporally uniform, especially on a daily basis.
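The hourly allocation described above can be illustrated with a minimal sketch. It assumes a simple multiplicative scheme in which an annual total is split by a monthly fraction, then across the days of the month by weekday weights, then across hours by a diurnal fraction. The function `hourly_emission` and all profile values are hypothetical. SMOKE's actual algorithm and profile formats differ in detail.

```python
import calendar
import datetime


def hourly_emission(annual_total, monthly, weekly, diurnal, year, month, day, hour):
    """Hypothetical multiplicative temporal allocation of an annual total.

    monthly: 12 fractions summing to 1 (January first).
    weekly:  7 relative weights indexed by date.weekday() (Monday = 0).
    diurnal: 24 fractions summing to 1 (hours 0-23).
    Illustrative simplification only, not SMOKE's implementation.
    """
    month_total = annual_total * monthly[month - 1]
    n_days = calendar.monthrange(year, month)[1]
    # Weight each day of the month by its weekday factor, then normalize.
    weights = [weekly[datetime.date(year, month, d).weekday()]
               for d in range(1, n_days + 1)]
    day_total = month_total * weights[day - 1] / sum(weights)
    return day_total * diurnal[hour]


# Illustrative profiles: flat monthly, weekday-heavy weekly, flat diurnal.
monthly = [1.0 / 12] * 12
weekly = [1.2, 1.2, 1.2, 1.2, 1.2, 0.5, 0.5]
diurnal = [1.0 / 24] * 24

# Summing every hour of January 2004 recovers the January share (1200 / 12).
jan_total = sum(hourly_emission(1200.0, monthly, weekly, diurnal, 2004, 1, d, h)
                for d in range(1, 32) for h in range(24))
```

Because the weekday weights are renormalized within each month, the monthly total is preserved exactly. Only its distribution across days changes.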
We apply the above modeling system to simulate PM2.5 and gaseous concentrations for January 2004, with 1–3 January as ramp-up days. The simulated major PM2.5 and gaseous species were compared against measurements from multiple monitoring networks (Table S3 in the Supplement). Performance statistics were well within the normal range for current state-of-the-art CTMs (Boylan and Russell, 2006; Simon et al., 2012; Tesche et al., 2006). We chose to simulate a winter episode for several reasons. (1) Winter provides a complete range of source sectors for a better evaluation of CTM source impact results; a summer episode would miss many important source sectors such as prescribed burns and open fires. (2) PM2.5 pollution episodes occur more frequently in winter, and there were many elevated PM2.5 measurements during the selected month. (3) Secondary nitrate PM2.5 is much more abundant in winter and becomes a major portion of PM2.5, especially on the west coast. Although oxidation rates are lower in winter, sulfate and secondary organic aerosol (SOA) are still formed, especially in areas that are relatively warm during this period.

Y. Hu et al.: Fine particulate matter source apportionment
To further evaluate source impacts, we also use measurements of 35 elements in PM2.5 collected at the Chemical Speciation Network (CSN) sites (Fig. 1). These are used together with measurements of major PM2.5 components and total mass (Table S4 in the Supplement). Detection limits and measurement uncertainties were used to screen for measurements that are invalid or below the detection limit (DL). Values below the DL were set to one-half of the DL, and their uncertainty was set to two-thirds of the DL (Marmur et al., 2006). Organic and elemental carbon measurements were artifact-corrected. They were then converted from thermal optical transmittance (TOT) to thermal optical reflectance (TOR) equivalents using the method of Malm et al. (2011), as recommended by the US EPA (http://www.epa.gov/ttn/naaqs/standards/pm/data/20120614Frank.pdf; see Note S2 in the Supplement). CMAQ, like other CTMs, does not explicitly simulate many elemental species in PM2.5. Compared to version 4.5, CMAQ v5.0 has several additional metal species (Appel et al., 2013). However, its complete list of explicitly modeled elements (Al, Ca, Fe, Mg, Mn, K, Na, Si, and Ti) still does not cover all the measured elements. One way to derive simulated concentrations for elements that are not explicitly modeled is to split the modeled "other" PM2.5 concentration using source contribution and source-specific profile information (detailed in Sect. 2.2).
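The below-DL substitution rule stated above is simple enough to express directly. The following is a minimal sketch with a hypothetical helper, `screen_below_dl`. It assumes that samples flagged invalid have already been removed, so only the DL substitution is shown.

```python
def screen_below_dl(values, uncertainties, detection_limits):
    """Apply the below-DL substitution (Marmur et al., 2006 convention).

    Values below the detection limit (DL) are replaced by DL/2 and their
    uncertainty by 2*DL/3; values at or above the DL pass through unchanged.
    Removal of flagged-invalid samples is assumed to happen elsewhere.
    """
    out_vals, out_unc = [], []
    for v, u, dl in zip(values, uncertainties, detection_limits):
        if v < dl:
            out_vals.append(dl / 2.0)
            out_unc.append(2.0 * dl / 3.0)
        else:
            out_vals.append(v)
            out_unc.append(u)
    return out_vals, out_unc


# Hypothetical concentrations (ug/m3): the first is below its DL, the second is not.
vals, unc = screen_below_dl([0.010, 0.250], [0.005, 0.030], [0.030, 0.030])
```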

CTM source apportionment
Source impacts (and initial and boundary condition impacts) can be estimated using a Taylor series approach (Cohan et al., 2005):

$$SA^{CTM}_{i,j} = \sum_{k=1}^{K} P_{j,k}\,\frac{\partial c_i}{\partial p_{j,k}} - \frac{1}{2}\sum_{k=1}^{K}\sum_{l=1}^{L} P_{j,k}\,P_{j,l}\,\frac{\partial^2 c_i}{\partial p_{j,k}\,\partial p_{j,l}} + \mathrm{HOT}, \tag{1}$$

The symbols in Eq. (1) are defined as follows:

- $SA^{CTM}_{i,j}$ is the CTM-simulated impact (source apportionment result) of source j on PM2.5 species i at the receptor.
- j = 1, ..., $J^{CTM}$, where $J^{CTM}$ is the total number of sources included in the CTM simulation; initial and boundary conditions are treated as "sources".
- i = 1, ..., N, where N is the total number of such species.
- $P_{j,k}$ is either the emission rate of compound k (k = 1, ..., K) from source j, i.e., $E_{j,k}$, or the initial or boundary concentration of compound k. Compound k can differ from species i, which accounts for species transformations.
- l and L are defined in the same way as k and K.
- $c_i$ is the concentration of species i.
- $p_{j,k}$ ($p_{j,l}$) is the sensitivity parameter for $P_{j,k}$ ($P_{j,l}$).
- HOT stands for higher-order terms.

The second-order term carries a minus sign because the impact is evaluated for complete removal of the source (a −100 % perturbation). The total impact of source j on PM2.5 concentration from the CTM method, $SR^{CTM}_j$, is found by summing its impact on each species concentration:

$$SR^{CTM}_j = \sum_{i=1}^{N} SA^{CTM}_{i,j}. \tag{2}$$

Note that this source apportionment approach is a sensitivity method. Sensitivity methods for estimating source impacts have been compared with other approaches such as PSAT (Koo et al., 2009; Burr and Zhang, 2011b). None of the methods was found to be perfect. However, the sensitivity method (with first-order sensitivities) was found to be proficient in determining the impacts of sources that have nonlinear effects among different species, such as motor vehicles, which emit substantial amounts of multiple pollutants.
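To illustrate why truncating Eq. (1) matters, the sketch below evaluates the first- and second-order Taylor estimates of a zero-out impact for a hypothetical scalar response. It compares them with the brute-force difference c(P) − c(0). Derivatives are taken by central finite differences, which is an assumption made for this sketch; DDM computes sensitivities directly inside the CTM. The toy function is made up and is not a CMAQ response.

```python
def zero_out_impact(c, P, h=1e-3):
    """Taylor-series estimates of a source's zero-out impact vs brute force.

    c: concentration as a function of one sensitivity parameter p (toy model).
    P: base-case parameter value; the exact impact is c(P) - c(0).
    Central finite differences stand in for DDM-computed derivatives.
    """
    d1 = (c(P + h) - c(P - h)) / (2.0 * h)
    d2 = (c(P + h) - 2.0 * c(P) + c(P - h)) / h**2
    first_order = P * d1                          # first term of Eq. (1)
    second_order = first_order - 0.5 * P**2 * d2  # adds the second-order term
    brute_force = c(P) - c(0.0)
    return first_order, second_order, brute_force


def toy(p):
    # Hypothetical convex response: c = 2 p + 0.5 p^2.
    return 2.0 * p + 0.5 * p**2


first, second, brute = zero_out_impact(toy, 4.0)
```

For this quadratic toy response, the first-order estimate (24) overstates the true impact (16). Adding the second-order term recovers it exactly. For real, non-quadratic responses, the higher-order terms would also contribute.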
Here, for simplicity, we chose to ignore the higher-order terms (see Note S3 in the Supplement) and used only the first-order terms for source impact estimation:

$$SA^{CTM}_{i,j} \approx \sum_{k=1}^{K} P_{j,k}\,\frac{\partial c_i}{\partial p_{j,k}} = \sum_{k=1}^{K} S^{(1)}_{i,j,k} = S^{(1)}_{i,j}, \tag{3}$$

In Eq. (3), $S^{(1)}_{i,j,k}$ is the semi-normalized first-order sensitivity of species i's concentration to the emission rate (or initial and boundary conditions) of compound k from source j. $S^{(1)}_{i,j}$ is the corresponding first-order sensitivity to the emissions of all compounds from source j. It is defined as the response of species i's concentration $c_i$ to perturbations in a sensitivity parameter $p_j$, i.e., a model parameter or input such as an emission rate, initial condition, or boundary condition. The response is obtained by scaling the local sensitivity $\partial c_i/\partial p_j$ by $P_j$, the unperturbed or "base case" value of the sensitivity parameter, i.e., $S^{(1)}_{i,j} = P_j\,\partial c_i/\partial p_j$. The notation for time and space dependence is dropped for simplicity. CMAQ computes $S^{(1)}_{i,j}$ using the decoupled direct method (DDM) (Dunker, 1981, 1984). DDM has been applied to three-dimensional air quality models (Cohan et al., 2005; Dunker et al., 2002; Hakami et al., 2004; Yang et al., 1997) and extended to follow PM2.5, hereafter called DDM-3D/PM (Boylan et al., 2002, 2006; Koo et al., 2009; Napelenok et al., 2006).
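The identity in Eq. (3), that the per-compound semi-normalized sensitivities sum to the sensitivity for scaling all of a source's emissions together, follows from the chain rule. The sketch below checks it numerically on a hypothetical two-compound toy model using central finite differences. The function names and the toy model are illustrative assumptions; CMAQ's DDM-3D/PM does not work this way internally.

```python
import numpy as np


def semi_normalized(c, P, rel_step=1e-4):
    """S_k = P_k * dc/dp_k for each emitted compound k (central differences)."""
    P = np.asarray(P, dtype=float)
    S = np.zeros_like(P)
    for k in range(P.size):
        if P[k] == 0.0:
            continue  # semi-normalized sensitivity is zero when P_k = 0
        h = rel_step * P[k]
        up, dn = P.copy(), P.copy()
        up[k] += h
        dn[k] -= h
        S[k] = P[k] * (c(up) - c(dn)) / (2.0 * h)
    return S


def whole_source(c, P, eps=1e-4):
    """Sensitivity to scaling all compounds of one source together:
    d c(P * (1 + e)) / de evaluated at e = 0."""
    P = np.asarray(P, dtype=float)
    return (c(P * (1.0 + eps)) - c(P * (1.0 - eps))) / (2.0 * eps)


def toy(P):
    # Hypothetical species concentration driven by two compounds emitted
    # by the same source, including an interaction (nonlinear) term.
    return 0.3 * P[0] + 0.1 * P[0] * P[1] / (1.0 + P[1])


P_base = [10.0, 2.0]
S_k = semi_normalized(toy, P_base)  # per-compound terms S_{i,j,k}
S_j = whole_source(toy, P_base)     # whole-source term S_{i,j}
```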
First-order DDM-3D/PM sensitivities best approximate the response to small perturbations. We therefore group the total emissions into 33 integrated source categories. Table 1 briefly describes the source categories, and Table S2 in the Supplement gives the detailed SCC-based grouping. Most of the categories account for a small portion of total emissions. We computed DDM-3D/PM first-order sensitivity coefficients for each source except SEASALT, and for boundary and initial conditions, whose sensitivity parameters are defined as the sum over all species. The sensitivity coefficients of boundary and initial conditions were found to be minimal and were therefore ignored in the source impact calculations.

For SEASALT, we directly used the simulated concentrations of Na+ and Cl− from sea-salt emissions in the model as the sensitivities of Na+ and Cl− to SEASALT emissions. Sensitivities of other species (including other elements, ions, and total PM2.5 mass) to SEASALT emissions were derived by applying the composition profile (Table S5 in the Supplement) for each species relative to the Na+ sensitivities. For the other 32 sources, some element (metal and mineral) sensitivity coefficients are not explicitly simulated by CMAQ. These are derived by applying the composition profiles (Table S5 in the Supplement) for those elements relative to the modeled, source-specific sensitivities of "other" PM2.5. Similarly, we derived these elements' simulated concentrations from the concentration of "other" PM2.5. The source composition profiles of all 33 categories are assembled from the 86 profiles examined in Reff et al. (2009). Each category profile is an emissions-weighted average of the corresponding member profiles, as determined by the SCC groupings.
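The two profile operations above can be sketched as follows: emissions-weighted averaging of member profiles, and scaling a modeled reference sensitivity by a profile ratio. The sketch assumes that "applying the profile relative to" the reference means multiplying the reference sensitivity by the ratio of the element's mass fraction to the reference component's mass fraction. That reading is our interpretation, not a confirmed detail of the method. The helper names, species columns, and numbers are hypothetical.

```python
import numpy as np


def emissions_weighted_profile(member_profiles, member_emissions):
    """Emissions-weighted average of member source profiles.

    member_profiles: (n_members, n_species) PM2.5 mass fractions.
    member_emissions: PM2.5 emissions of each member (e.g., per SCC group).
    """
    F = np.asarray(member_profiles, dtype=float)
    w = np.asarray(member_emissions, dtype=float)
    return w @ F / w.sum()


def derived_sensitivity(S_reference, profile, ref_idx, target_idx):
    """Scale a modeled reference sensitivity ("other" PM2.5, or Na+ for sea
    salt) by the profile ratio target/reference (assumed mapping)."""
    profile = np.asarray(profile, dtype=float)
    return S_reference * profile[target_idx] / profile[ref_idx]


# Hypothetical two-member category; columns: [OC, EC, other, Fe],
# where Fe is a subset of the "other" fraction.
profile = emissions_weighted_profile([[0.5, 0.2, 0.3, 0.1],
                                      [0.2, 0.2, 0.6, 0.3]], [3.0, 1.0])
# Fe sensitivity derived from a modeled "other" PM2.5 sensitivity of 2.0.
S_fe = derived_sensitivity(2.0, profile, ref_idx=2, target_idx=3)
```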
The result of Eq. (3) can be compared with the CMB method, which apportions each species in proportion to the relative amount of that species in the PM2.5 emissions from a source:

$$SA^{CMB}_{i,j} = f_{i,j}\,SR^{CMB}_j, \tag{4}$$

In Eq. (4), $f_{i,j} = E_{j,i}/E_j$ is the original source profile used by CMB. It is the fraction of the total PM2.5 emitted from source j ($E_j$) that is emitted as species i ($E_{j,i}$). Here j = 1, ..., $J^{CMB}$, where $J^{CMB}$ is the total number of emission sources considered in the CMB approach; these sources can differ from those included in the CTM. $SR^{CMB}_j$ is the CMB-calculated impact of source j on total PM2.5 concentration. By analogy with Eq. (4), the definition of $f_{i,j}$ can be extended for CTMs to include source impacts on condensed secondary pollutants. Hence, an effective $f^{*}_{i,j}$ is found as

$$f^{*}_{i,j} = \frac{SA^{CTM}_{i,j}}{SR^{CTM}_j}. \tag{5}$$

Equation (5) shows that $f^{*}_{i,j}$ can be nonzero even when source j emits no PM2.5 component i, because the source can still contribute to secondary PM2.5 production.
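A minimal sketch of Eqs. (2) and (5) follows. It assumes that the rows of the impact matrix are species that partition PM2.5 mass, so that column sums give total impacts. The impact values are hypothetical. The example shows an effective sulfate fraction that is nonzero for a source whose sulfate impact is, in this toy case, entirely secondary.

```python
import numpy as np


def effective_profiles(SA):
    """f*_{i,j} = SA_{i,j} / SR_j with SR_j = sum_i SA_{i,j} (Eqs. 2 and 5).

    Rows of SA: species assumed to partition PM2.5 mass; columns: sources.
    Sources with zero total impact would cause division by zero and should
    be excluded beforehand.
    """
    SA = np.asarray(SA, dtype=float)
    SR = SA.sum(axis=0)
    return SA / SR, SR


# Hypothetical impacts (ug/m3); rows: [primary OC, sulfate, nitrate],
# columns: [woodstove, coal power plant]. The plant's sulfate is secondary.
SA = [[2.0, 0.0],
      [0.1, 1.5],
      [0.0, 0.5]]
f_star, SR = effective_profiles(SA)
```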

CTM-CMB hybrid source apportionment approach
At monitoring locations, on days with sufficient PM2.5 composition measurements, the following species balance equations can be built for a CMB solution:

$$c^{obs}_i = \sum_{j=1}^{J^{CMB}} f_{i,j}\,SR^{CMB}_j + e^{CMB}_i, \qquad i = 1, \ldots, N, \tag{6}$$

In Eq. (6), $c^{obs}_i$ is the measured concentration of the ith PM2.5 species, and $e^{CMB}_i$ is the concentration prediction error to be minimized. CMB solves the species balance equations for the set of $SR^{CMB}_j$ that minimizes the weighted squared error in the simulated concentrations, given fixed source profiles $f_{i,j}$ (with uncertainties) (Watson, 1984).
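A simplified weighted least-squares solution of Eq. (6) can be sketched with NumPy. In this simplification, only measurement uncertainties weight the fit. The effective-variance CMB of Watson (1984) also propagates source-profile uncertainties and iterates, and that step is omitted here. The function name `cmb_wls` and the toy data are hypothetical.

```python
import numpy as np


def cmb_wls(f, c_obs, sigma_obs):
    """Weighted least-squares solution of Eq. (6) for SR_j.

    Simplification: weights use measurement uncertainty only (no
    effective-variance iteration) and no non-negativity constraint.
    """
    w = 1.0 / np.asarray(sigma_obs, dtype=float)
    A = np.asarray(f, dtype=float) * w[:, None]  # weight each species row
    b = np.asarray(c_obs, dtype=float) * w
    SR, *_ = np.linalg.lstsq(A, b, rcond=None)
    return SR


# Hypothetical profiles (3 species x 2 sources) and noise-free observations.
f = np.array([[0.5, 0.1],
              [0.2, 0.6],
              [0.3, 0.3]])
c_obs = f @ np.array([4.0, 2.0])
SR = cmb_wls(f, c_obs, sigma_obs=[0.1, 0.2, 0.1])
```

With noise-free synthetic observations, the true source impacts are recovered. With real data, the residuals correspond to $e^{CMB}_i$.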
Likewise, similar species balance equations can be built at the same receptors using the initial source apportionments from the CMAQ DDM-3D/PM results:

$$c^{obs}_i = \sum_{j=1}^{J^{CTM}} SA^{CTM}_{i,j} + e^{CTM}_i = \sum_{j=1}^{J^{CTM}} f^{*}_{i,j}\,SR^{CTM}_j + e^{CTM}_i \approx \sum_{j=1}^{J^{CTM}} S^{(1)}_{i,j} + e^{CTM}_i, \tag{7}$$

In Eq. (7), $e^{CTM}_i$ is the CTM prediction error for the ith PM2.5 species. The successive equalities express the balance in terms of the CTM source impacts, the effective profiles of Eq. (5), and the first-order sensitivities of Eq. (3). This equation is applied at specific receptor locations and times. Note that here we used only the first-order DDM-3D/PM results to approximate $SA^{CTM}_{i,j}$. For more accurate estimates of $SA^{CTM}_{i,j}$, higher-order sensitivity results (e.g., Zhang et al., 2012) can be included if they are available and the source is large. Also, the formulation in Eq. (7) (and the following equations) allows $SA^{CTM}_{i,j}$ to be source impact estimates from any other method, including PSAT, TSSA, ISAM, and other sensitivity-based methods.
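Using Eq. (7), we can evaluate the initial source apportionment results for a measurement at a receptor by calculating the squared prediction error:

$$\chi^2 = \sum_{i=1}^{N} \frac{\left(c^{obs}_i - \sum_{j=1}^{J^{CTM}} SA^{CTM}_{i,j}\right)^2}{\sigma^2_{c^{obs}_i}}, \tag{8}$$

In Eq. (8), $\sigma_{c^{obs}_i}$ is the uncertainty in the measured concentration of species i, obtained from the CSN measurement uncertainty.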
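Equation (8) translates directly into a few lines of NumPy. The sketch below is illustrative. The helper name is hypothetical, and the two-species, two-source numbers are made up so that the result is easy to check by hand.

```python
import numpy as np


def chi_square_initial(c_obs, SA_ctm, sigma_obs):
    """Eq. (8): weighted squared error of the summed initial CTM impacts."""
    c_model = np.asarray(SA_ctm, dtype=float).sum(axis=1)  # sum over sources j
    resid = (np.asarray(c_obs, dtype=float) - c_model) / np.asarray(sigma_obs, dtype=float)
    return float(np.sum(resid ** 2))


# Rows: species; columns: sources (hypothetical values, ug/m3).
chi2 = chi_square_initial(c_obs=[2.0, 1.0],
                          SA_ctm=[[1.0, 0.5], [0.2, 0.8]],
                          sigma_obs=[0.5, 0.2])
```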
Equation (8) also suggests an opportunity to further minimize the CTM's prediction error with a least-squares solution that mimics the CMB method. This leads to a new SM-RM hybrid approach to source apportionment. One way to achieve this is to calculate a new set of $SR^{CTM}_j$, using the extended $f^{*}_{i,j}$, that minimizes the weighted squared error in the simulated concentrations:

$$\min_{SR^{CTM}_j}\ \chi^2 = \sum_{i=1}^{N} \frac{\left(c^{obs}_i - \sum_{j=1}^{J^{CTM}} f^{*}_{i,j}\,SR^{CTM}_j\right)^2}{\sigma^2_{c^{obs}_i}}. \tag{9}$$

This approach is similar to CMB, but it accounts for secondary contributions and other atmospheric processing through the extended $f^{*}_{i,j}$. However, Eq. (9) alone would not fully use the information the CTM provides. That information covers the estimated size and location of the various emission sources and their probable impacts on pollutant concentrations at a receptor, i.e., the initial source impact estimates $SA^{CTM}_{i,j}$. As formulated in Eq. (9), this information is used only in the calculation of $f^{*}_{i,j}$, and the magnitudes of the source impacts are lost. Further, collinearity and uniqueness issues, such as different sources sharing similar source profiles, would still affect the solution of the system of equations.
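The collinearity problem noted above can be made concrete. The sketch below uses two hypothetical profile matrices: one with distinct source profiles and one with nearly identical profiles. It compares their condition numbers and how far the least-squares source impacts move when the observations are perturbed by ±2 %. Unit weights and all numbers are illustrative assumptions.

```python
import numpy as np


def wls_shift(f, SR_true, perturbation):
    """Change in least-squares SR when observations are multiplicatively
    perturbed (unit weights, hypothetical setup)."""
    f = np.asarray(f, dtype=float)
    c = f @ SR_true
    base, *_ = np.linalg.lstsq(f, c, rcond=None)
    pert, *_ = np.linalg.lstsq(f, c * perturbation, rcond=None)
    return np.linalg.norm(pert - base)


SR_true = np.array([4.0, 2.0])
noise = np.array([1.02, 0.98, 1.00])  # +/-2 % changes in two species
f_distinct = np.array([[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]])
f_collinear = np.array([[0.60, 0.59], [0.30, 0.31], [0.10, 0.10]])

cond_distinct = np.linalg.cond(f_distinct)    # roughly 1.5
cond_collinear = np.linalg.cond(f_collinear)  # roughly 100
shift_distinct = wls_shift(f_distinct, SR_true, noise)
shift_collinear = wls_shift(f_collinear, SR_true, noise)
```

With nearly collinear profiles, a few-percent change in the data moves the fitted impacts by several µg m−3 here. This is why the hybrid formulation keeps the initial CTM impact estimates in the objective.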
Instead of the above approach, the CMB concept is extended to directly use both the initial estimates $SA^{CTM}_{i,j}$ and the initial simulated concentrations $c^{init}_i$ from the CTM to refine the estimated source impacts. Let $R_j$ be a scale factor applied to the initial estimate of the impact of source j (or of initial or boundary conditions). The refined CTM-simulated impact of source j on species i, $SA^{refined}_{i,j}$, is then

$$SA^{refined}_{i,j} = R_j\,SA^{init}_{i,j}. \tag{10}$$

Here $SA^{init}_{i,j}$ is the initial source impact. It is identical to $SA^{CTM}_{i,j}$ above; the new notation is used from here on to distinguish it from $SA^{refined}_{i,j}$. Refinements to source impacts can then be found in a fashion similar to traditional CMB approaches, by solving for the $R_j$ that minimize

$$\chi^2_c = \sum_{i=1}^{N} \frac{\left(c^{obs}_i - c^{init}_i - \sum_{j=1}^{J^{CTM}} (R_j - 1)\,SA^{init}_{i,j}\right)^2}{\sigma^2_{c^{obs}_i}}. \tag{11}$$

However, without further constraints, $R_j$ can be physically unrealistic. The solution would also ignore the knowledge the CTM provides about the source impacts and the uncertainties in emission estimates. Here, additional constraints are added to find an optimized $R_j$, together with a term that penalizes departures from the initial source impact estimates:

$$\chi^2 = \sum_{i=1}^{N} \frac{\left(c^{obs}_i - c^{init}_i - \sum_{j=1}^{J^{CTM}} (R_j - 1)\,SA^{init}_{i,j}\right)^2}{\sigma^2_{c^{obs}_i} + \sigma^2_{SR^{CTM}_i}} + \Gamma \sum_{j=1}^{J^{CTM}} \frac{(\ln R_j)^2}{\sigma^2_{\ln R_j}}, \tag{12}$$

In Eq. (12), $\sigma_{SR^{CTM}_i}$ is the a priori uncertainty in the CTM-derived total impact of all sources on the ith species. It represents model errors and weights the initial source impact estimates differently for different species. One can estimate $\sigma_{SR^{CTM}_i}$ as proportional to the observed concentration, $\sigma_{SR^{CTM}_i} = \delta_i\,c^{obs}_i$, where $\delta_i$ is the normalized model error. The second term accounts for uncertainties in the CTM-derived individual source impacts due to emission errors. $\sigma_{\ln R_j}$ is the a priori uncertainty of the natural log of source j's scale factor. The logarithmic form treats deviations equally on a relative basis, so $R_j = 2$ and $R_j = 0.5$ are penalized equally. It also naturally constrains $R_j$ to be positive. The weighting factor Γ is introduced to balance the two terms in Eq. (12).
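A minimal sketch of Eq. (12) and its bounded minimization follows, assuming SciPy's `scipy.optimize.minimize` with the bound-constrained `"L-BFGS-B"` method. Whether this matches the authors' L-BFGS implementation is not stated in the text. The helper names and the four-species, two-source test case are hypothetical. In the test case, source 0's initial impact is overestimated 2-fold, so the true $R_0$ is 0.5.

```python
import numpy as np
from scipy.optimize import minimize


def hybrid_objective(R, c_obs, c_init, SA_init, sig_obs, sig_sr, sig_lnR, gamma):
    """Eq. (12): weighted species misfit plus a log-normal penalty on R_j."""
    c_refined = c_init + SA_init @ (R - 1.0)  # refined concentrations via Eq. (10)
    misfit = np.sum((c_obs - c_refined) ** 2 / (sig_obs ** 2 + sig_sr ** 2))
    penalty = np.sum(np.log(R) ** 2 / sig_lnR ** 2)
    return misfit + gamma * penalty


def refine_scale_factors(c_obs, c_init, SA_init, sig_obs, sig_sr, sig_lnR,
                         gamma, bounds=(0.1, 10.0)):
    """Minimize Eq. (12) with box constraints on every R_j."""
    J = SA_init.shape[1]
    result = minimize(hybrid_objective, x0=np.ones(J),
                      args=(c_obs, c_init, SA_init, sig_obs, sig_sr, sig_lnR, gamma),
                      method="L-BFGS-B", bounds=[bounds] * J)
    R = result.x
    return R, SA_init * R  # Eq. (10): column j of SA_init scaled by R_j


# Hypothetical case: 4 species x 2 sources; source 0 is overestimated 2-fold.
SA_init = np.array([[2.0, 0.1], [0.5, 1.0], [0.2, 0.8], [0.3, 0.3]])
c_init = SA_init.sum(axis=1)
c_obs = c_init + SA_init @ (np.array([0.5, 1.0]) - 1.0)
sig = np.full(4, 0.1)
R, SA_refined = refine_scale_factors(
    c_obs, c_init, SA_init, sig_obs=sig, sig_sr=sig,
    sig_lnR=np.log([3.0, 1.5]), gamma=SA_init.shape[0] / SA_init.shape[1])
```

The penalty term pulls $R_0$ slightly toward 1, so the optimum lies just above 0.5 rather than exactly at it. SciPy's `method="SLSQP"` also accepts `bounds` and could be swapped in for comparison.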
The objective function in Eq. (12) can be minimized using various algorithms for constrained nonlinear optimization. We tested multiple algorithms. These included sequential least-squares quadratic programming (SLSQP) (Kraft, 1988, 1994) and L-BFGS, a limited-memory quasi-Newton method (Liu and Nocedal, 1989; Nocedal, 1980). Both SLSQP and L-BFGS allow lower and upper limits to be set on $R_j$ for each individual source. We chose L-BFGS for our demonstration case study. Once $R_j$ is optimized, Eq. (10) gives the refined estimates of individual source impacts by species at a specific location. The remaining error in the refined concentration predictions can be found using Eq. (11).

Application and case study
The hybrid method was applied for January 2004 to calculate PM2.5 source impact scale factors at the 164 CSN monitors for which we had valid speciated PM2.5 data. Using the valid measurements at each of these sites, the initial source impacts were evaluated through Eq. (12) to obtain impact scale factors and refined source impact estimates. The L-BFGS algorithm was used with box constraints limiting $R_j$ to between 0.1 and 10.0. Other sets of limits were also tested, up to a range of 0.02 to 50.0. Two steps were used to find the final optimized $R_j$. First, Γ was initially set to $N/J^{CTM}$ = 41/33 ≈ 1.24 to weigh the two terms in the objective function equally, which yielded initial optimal $R_j$. The choice of Γ was examined using L-curve analysis (Fig. S1 in the Supplement). Then, the initial optimal $R_j$ were used to compute a new Γ, equal to the value of the first term of the objective function divided by $J^{CTM}$. The new Γ keeps the prediction error relatively small while constraining the size of the adjustments (Fig. S2 in the Supplement). It was applied to obtain the final optimized $R_j$.

The $\sigma_{\ln R_j}$ values were determined from the daily emission estimate uncertainties for each source (Table S2 in the Supplement), derived from the literature (Hanna et al., 1998, 2001, 2005). In general, regulated sectors such as industrial, on-road, and nonroad sources have lower uncertainties. Non-regulated sectors such as residential sources, dust, and biomass burning have higher uncertainties. Sources with direct emissions monitoring (e.g., CEMs) are assigned the lowest uncertainties. Because the refinements are applied daily, the uncertainties used account for day-to-day variability in source strengths. For example, prescribed burning events can be quite variable in time. For traffic, in contrast, day-specific emission patterns are used, so the variability in source strength is smaller. To determine $\sigma_{SR^{CTM}_i}$, the $\delta_i$ values (Table S6 in the Supplement) were chosen as the typical normalized prediction errors of PM2.5 species found in regional applications of state-of-the-art CTMs (Appel et al., 2008; Boylan and Russell, 2006; Simon et al., 2012; Tesche et al., 2006). Results were not very sensitive to the ranges of $\sigma_{\ln R_j}$ and $\sigma_{SR^{CTM}_i}$ tested.

We chose six CSN sites, each representing a major US metropolitan area, for close examination of the method and further analysis. These sites are located in the Atlanta, Chicago, Detroit, Los Angeles, New York, and Pittsburgh areas and represent urban and suburban locations across the country. Additional information for these sites is given in Tables S7 (basic site information) and S8 (emission estimates surrounding each site) in the Supplement. For comparison, we also conducted CMB modeling at the Atlanta site using the same measurement data set. For the other five sites, we collected source apportionment results from the literature.
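The two-step choice of Γ described above can be sketched as a small driver around a bounded L-BFGS-B minimization, using SciPy's `scipy.optimize.minimize`. Several inputs are assumptions for illustration. The mapping of a multiplicative emission uncertainty factor to $\sigma_{\ln R_j} = \ln(\text{factor})$ is our assumption, and the factors must exceed 1. The $\delta_i$ values and the synthetic data are made up and are not the values in Tables S2 or S6.

```python
import numpy as np
from scipy.optimize import minimize


def two_step_refine(c_obs, c_init, SA, sig_obs, delta, unc_factor,
                    bounds=(0.1, 10.0)):
    """Two-step Gamma selection with bounded L-BFGS-B (illustrative sketch).

    delta:      normalized model errors, so sigma_SR_i = delta_i * c_obs_i.
    unc_factor: hypothetical multiplicative emission uncertainty per source
                (> 1), mapped to sigma_lnR_j = ln(unc_factor_j) (assumption).
    """
    N, J = SA.shape
    denom = sig_obs ** 2 + (delta * c_obs) ** 2
    sig_lnR = np.log(unc_factor)

    def terms(R):
        resid = c_obs - c_init - SA @ (R - 1.0)
        misfit = np.sum(resid ** 2 / denom)
        penalty = np.sum(np.log(R) ** 2 / sig_lnR ** 2)
        return misfit, penalty

    def solve(gamma, x0):
        def objective(R):
            misfit, penalty = terms(R)
            return misfit + gamma * penalty
        return minimize(objective, x0, method="L-BFGS-B",
                        bounds=[bounds] * J).x

    gamma0 = N / J                  # step 1: weigh both terms equally
    R0 = solve(gamma0, np.ones(J))
    gamma1 = terms(R0)[0] / J       # step 2: misfit at R0 divided by J
    R_final = solve(gamma1, R0)     # final optimized scale factors
    return R_final, gamma0, gamma1


# Hypothetical 4-species, 2-source case (source 0 overestimated 2-fold).
SA = np.array([[2.0, 0.1], [0.5, 1.0], [0.2, 0.8], [0.3, 0.3]])
c_init = SA.sum(axis=1)
c_obs = c_init + SA @ (np.array([0.5, 1.0]) - 1.0)
R, gamma0, gamma1 = two_step_refine(
    c_obs, c_init, SA, sig_obs=np.full(4, 0.1),
    delta=np.full(4, 0.2), unc_factor=np.array([3.0, 1.5]))
```

With the study's dimensions (N = 41, $J^{CTM}$ = 33), the first-step value would be 41/33 ≈ 1.24, as stated above.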

Impact scale factors and refined concentration predictions
The hybrid method was applied to obtain R_j and further refine the initial source impact estimates. An R_j less than 1 means that the refined impact is reduced from the initial estimate. This suggests that the emissions are biased high or that the CTM introduces a high bias in the source-receptor relationship. An R_j larger than 1 means that the impact is increased from the initial simulation. The R_j values obtained for the 33 sources ranged from 0.1 to 10, with means between 0.15 and 1. Sources with higher uncertainties had larger standard deviations (Table 2). In general, sources commonly considered highly uncertain had R_j values deviating the most from 1, while sources considered less uncertain had R_j values near 1. This is expected, in part because the second term in the weighting function penalizes adjustments based on the estimated emissions uncertainty. The scale factors for a given source are also generally consistent (i.e., adjusted in the same direction) across locations and across days at the same location (Table S9 in the Supplement). Most significantly, the cumulative distribution functions of R_j are distinct between sources (Fig. S3 in the Supplement). This holds even among the biomass-burning sources, although most of them have similar emission compositions (Fig. S3a in the Supplement). Dust, lawn waste burning (LWASTEBURN), and woodstove impacts are found to be biased high, with R_j values typically ∼ 0.15. Other biomass-burning sources are also biased high, to a lesser extent. This is consistent with prior studies (Baek, 2009; Chow et al., 2007; Tian et al., 2009) finding that emission rates for these sources were overestimated. Prescribed burning impacts, in contrast, are biased low (R_j values close to 10) a small portion of the time, because of their high day-to-day variability. Prescribed burning emissions are typically distributed uniformly over time in the inventories, whereas in reality burns occur on days with favorable burning conditions. For most other sources (Fig. S3b, c, and d in the Supplement), impact scale factors are typically closer to 1, with most R_j values between 0.8 and 1.1. The exceptions are metal processing, cooking processes, fuel oil combustion, natural gas combustion, on-road gasoline vehicles, and other sources. These six sources have more diverse R_j values among locations and/or between days.
The magnitude of the refinements can be gauged by comparing the initial and refined concentrations of individual species with the observations. It can be quantified using the weighted least-squares error (χ², Eq. (11)). After refining the source impact estimates, simulated concentrations improve substantially relative to the initial simulation for the major individual components and most of the elements (Fig. 2 and Table 3). Several elements with very low ambient concentrations (e.g., near the measurement uncertainty) showed slightly worse agreement with observations (Table 3). Overall, however, the refined error χ²_c,refined was reduced from the initial χ²_c,init by over 98 % on average (Fig. 3). Here χ²_c,refined is Eq. (11) evaluated with the obtained R_j, and it serves as an overall measure of the remaining error. Because the CTM uses the original source speciation, the overall error will not go to zero unless the source fingerprints are correct. Further, the remaining error χ²_c,refined includes the CTM's other input errors, such as meteorological bias. It also includes model limitations, e.g., the uncertainties involved in simulating nitrate or SOA formation. The magnitude of the remaining error can therefore serve as one indicator of the uncertainty of the hybrid results, with a smaller error indicating more accurate results.
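To make the scale-factor refinement and the χ² reduction concrete, the following is a minimal Python/NumPy sketch. It assumes an objective of the general shape described in the text. The first term is an uncertainty-weighted species error standing in for χ²_c of Eq. (11). The second term penalizes deviations of ln R_j from zero, scaled by an assumed emissions uncertainty σ_lnR,j. The exact weights and normalization of Eqs. (11) and (12) are not reproduced here. The coordinate-wise grid search is a simple stand-in optimizer, not the authors' solver. All species, sources, and numbers are hypothetical. The observations were constructed so that R = (0.2, 0.4, 1.0) reproduces them exactly.

```python
import numpy as np

# Hypothetical toy problem: 4 species x 3 sources (dust, woodstove, coal).
# SA[i, j] = initial CTM impact of source j on species i (ug/m3).
SA = np.array([
    [0.0, 0.1, 3.0],  # sulfate-like species, mostly coal
    [0.5, 4.0, 0.2],  # OC-like species, mostly woodstove
    [0.1, 0.8, 0.1],  # EC-like species
    [2.0, 0.1, 0.1],  # crustal tracer, mostly dust
])
c_sim = SA.sum(axis=1)                          # initial simulated concentrations
c_obs = np.array([3.04, 1.90, 0.44, 0.54])      # made-up observations
sigma_obs = np.array([0.30, 0.20, 0.05, 0.06])  # made-up measurement uncertainties
sigma_lnR = np.array([1.0, 1.0, 0.2])           # dust/woodstove uncertain, coal well known


def chi2_c(R):
    """Uncertainty-weighted species error after scaling impacts by R (assumed form)."""
    c_ref = c_sim + SA @ (R - 1.0)
    return float(np.mean(((c_obs - c_ref) / sigma_obs) ** 2))


def objective(R):
    """Species error plus a penalty on ln(R_j) scaled by emissions uncertainty."""
    return chi2_c(R) + float(np.mean((np.log(R) / sigma_lnR) ** 2))


def fit_R(n_sweeps=20):
    """Coordinate-wise grid search over ln R_j, with R_j bounded to [0.1, 10]."""
    grid = np.linspace(np.log(0.1), np.log(10.0), 201)
    x = np.zeros(SA.shape[1])  # start from R_j = 1, i.e., the initial CTM estimate
    for _ in range(n_sweeps):
        for j in range(x.size):
            # Keeping the current x[j] as a candidate means the objective never increases.
            candidates = np.append(grid, x[j])
            scores = []
            for v in candidates:
                trial = x.copy()
                trial[j] = v
                scores.append(objective(np.exp(trial)))
            x[j] = candidates[int(np.argmin(scores))]
    return np.exp(x)


R_fit = fit_R()
chi2_init = chi2_c(np.ones(3))
chi2_refined = chi2_c(R_fit)
reduction_pct = 100.0 * (1.0 - chi2_refined / chi2_init)
print("R_j:", np.round(R_fit, 2), f"chi2_c reduction: {reduction_pct:.1f} %")
```

The fit is done in ln R so that a halving and a doubling are penalized symmetrically. The penalty pulls the tightly constrained coal factor toward 1 while letting the uncertain dust and woodstove factors move far below 1.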

Initial and refined CTM source impacts
Significant day-to-day variations are found in the initial source impact estimates (e.g., Table S10 in the Supplement, where impacts are normalized by the total source impact). These variations are more pronounced for some sources, such as power plants (i.e., coal combustion) and industrial sources. For example, in Atlanta, power plants (coal combustion) can contribute over 30 % on one day but only about 5 % on other days, primarily as secondary sulfate. In Chicago, metal processing contributes 20 % on some days but less than 10 % on others. On-road gasoline impacts also vary significantly from day to day; in Detroit, for example, they range from ∼ 18 % to ∼ 3 %. Biomass-burning sources such as prescribed burns and agricultural burns contribute significantly on some days in Atlanta but have virtually zero impact on other days. Refined source impacts changed significantly from the initial CTM estimates for sources with high uncertainties, such as woodstoves, dust, and other biomass-burning sources. They changed little for other sources (compare the left and right columns in Tables 4 and S10 in the Supplement). Woodstoves and dust were top ranked at all six sites in the initial estimates; however, refinement significantly lowered their impacts (Table 5). Because the adjustments differed between sources, the rankings of the top contributors changed. This indicates that SM-only estimates can yield misleading source apportionment results. Such results are affected by errors in the emissions estimates for a specific day, as well as errors in meteorological fields and model parameters. For example, Marmur et al. (2006) found that the CMAQ-calculated impact of soil dust at Jefferson Street, Atlanta, GA (and other locations), was high compared with two CMB estimates. This further supports evaluating SM source apportionment results against measurements.
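The ranking change described above follows directly from scaling each source's impact by its own R_j and renormalizing by the total. The following minimal Python sketch uses made-up impacts and scale factors, loosely mirroring dust and woodstoves being biased high, to show how top contributors can swap places. The source names and numbers are hypothetical.

```python
def percent_and_rank(impacts):
    """Return percentage contributions (normalized by total impact) and a ranking."""
    total = sum(impacts.values())
    pct = {src: 100.0 * val / total for src, val in impacts.items()}
    ranking = sorted(pct, key=pct.get, reverse=True)
    return pct, ranking


# Hypothetical initial CTM impacts (ug/m3) and scale factors R_j at one site.
initial = {"dust": 3.0, "woodstove": 2.5, "coal": 2.0, "onroad_gasoline": 1.5, "livestock": 0.8}
R = {"dust": 0.15, "woodstove": 0.15, "coal": 1.0, "onroad_gasoline": 1.0, "livestock": 1.0}

# Refined impact of each source = R_j times its initial impact.
refined = {src: R[src] * val for src, val in initial.items()}

pct_init, rank_init = percent_and_rank(initial)
pct_ref, rank_ref = percent_and_rank(refined)
print("initial:", rank_init)
print("refined:", rank_ref)
```

Even though coal's absolute impact is unchanged, its percentage share rises because the sources ranked above it shrink.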
The hybrid method can separate sources with similar compositions, such as woodstoves and prescribed burns. Note the different changes in these two sources between their initial and refined impacts (Table S10 in the Supplement). It can likewise separate on-road and non-road diesel vehicles. This is because it starts from emissions estimated in the inventory, with source-specific spatial and temporal resolution, rather than from source compositions alone as RMs do. In addition, with the hybrid method, secondary pollutants are directly apportioned to specific sources. For example, after refinement, livestock impacts rise in rank among the top contributors in the Midwestern cities of Chicago, Detroit, and Pittsburgh (Table 5). This occurs mostly through the secondary formation of ammonium, and the associated nitrate, from NH3 emissions. Also, the two most common major contributors across the cities become coal combustion (except in Los Angeles; Table 5) and on-road gasoline vehicles. Coal combustion ranks highly mainly due to sulfate formation from SO2 emissions. On-road gasoline vehicles rank highly partially due to nitrate and SOA formation from NOx and VOC emissions.

Comparison of refined source impacts with results from RM methods
To compare with other source apportionment studies, we first reduced the number of sources from 33 to 13 by aggregating the source impacts (Fig. 4 and Table S12 in the Supplement). A comparison with a CTM study's PSAT results is given in Table S11 in the Supplement. The 13 aggregated sources were chosen to cover the range of sources identified at different locations in prior studies. Sources with similar compositions, e.g., the various gasoline and diesel vehicular sources, were merged accordingly. "AllOthers" includes sources typically not resolved in traditional SA studies, e.g., livestock, biogenic sources, and solvents, as well as minor combustion and industrial sources. AllOthers (owing to its large secondary contribution), gasoline vehicles, and diesel vehicles are top ranked in all six cities (Fig. 4 and Table S12 in the Supplement). To make the hybrid results directly comparable to those of RM methods, we further separated the primary and secondary contributions in the aggregated source impacts. We then merged the secondary portions into ammonium sulfates, ammonium nitrate, and secondary organic carbon (details are given in Note S4 in the Supplement). The regrouped hybrid source impacts were compared with RM results obtained at the same locations in this or prior studies (Coutant et al., 2003; Gildemeister et al., 2007; Maranche, 2006; Pham et al., 2008; Rizzo and Scheff, 2007), as shown in Table 6. All the RM results were based on CSN measurements, although the time periods of the other RM studies are typically longer than one year. Details of the RM model applications are given in Note S5 in the Supplement. Because the time periods differ, we compare major features, such as which sources are resolved and the relative contributions of certain sources. The hybrid approach resolved additional sources that are typically missing from RM results (Table 6). The total impacts of these additional sources ranged between 20 and 30 % at the six sites. This is consistent with Baek (2009), who found that ∼ 20 % of emissions were not captured in most RM source apportionment applications.

For example, CMB-LGO is an extended CMB approach using the Lipschitz Global Optimizer (LGO) program (Marmur et al., 2005). It did not capture the aircraft source impact at the Atlanta site (Balachandran et al., 2012), because the aircraft profile is uncertain and similar to that of diesel combustion. However, both measurement (Herndon et al., 2008; Lee et al., 2011) and modeling (Unal et al., 2005) studies have suggested that commercial aircraft engine emissions from the Atlanta airport significantly affect local air quality, including PM2.5 concentrations. Natural gas combustion and cooking are two sources usually not resolved by RM methods using CSN data, because identifying them requires additional measurement information. For instance, CMB using particle-phase organic compounds as tracers was applied to measurements collected at the Jefferson Street site. That analysis found that natural gas combustion contributed 1.1 % of PM2.5 in Atlanta (Zheng et al., 2002). Subramanian et al. (2007) used CMB with molecular markers and found that cooking contributed 1 to 5 % of PM2.5 concentrations in Pittsburgh. Compared with the hybrid results, primary impact estimates of coal combustion from RM methods are either missing or too low. This is because the trace element markers for coal combustion, Se and Sr, were not detected consistently in CSN samples owing to low signal-to-noise ratios (Chen et al., 2010). The hybrid results estimated total vehicle impacts ranging from 14 to 22 %. These are comparable to the RM results at the same urban/suburban locations, with the exception of Chicago (Table S13 in the Supplement). In Chicago, Rizzo and Scheff (2007) also conducted PMF modeling using the same composite data. Their PMF results differ from their CMB results, e.g., for biomass burning (5 % vs. 11 %) and vehicle (23 % vs. 31 %) source impacts. The PMF results were closer to the hybrid findings.

RM methods separated vehicle impacts into diesel and gasoline at four of the sites. At three of these four sites, the hybrid results do not agree with the RM methods on the diesel-gasoline split (Table S13 in the Supplement). The hybrid method found higher impacts from diesel than from gasoline (by a factor of 2.0-2.6), whereas the RMs found the opposite (diesel/gasoline ratios of 0.28-0.49). The ratios of diesel to gasoline emissions surrounding the sites are in the range of 1.7-3.6 (Table S13 in the Supplement). Using molecular markers, Subramanian et al. (2006) found that diesel impacts in Pittsburgh tend to dominate. At the Minnesota CSN sites, the CMB split between diesel and gasoline vehicular impacts has been found to be inaccurate when only routine measurements were used (Chen et al., 2011). Chow et al. (2007) suggested that CMB has difficulty making an accurate gasoline-diesel split without organic marker compounds.
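The regrouping used for Table 6 can be expressed as simple bookkeeping. First, detailed sources are aggregated into groups. Then each source's secondary portion is moved into the ammonium sulfate, ammonium nitrate, and secondary organic carbon categories. The sketch below assumes each source impact has already been split into primary and secondary parts, as described in Note S4 of the Supplement. The detailed source labels, the group mapping, and all numbers are hypothetical. AMSULFT, AMNITR, OTHROC, and AllOthers follow the labels used in this paper.

```python
from collections import defaultdict

# Hypothetical mapping from detailed source labels to aggregated groups.
GROUP = {
    "onroad_gasoline": "GASOLINE",
    "nonroad_gasoline": "GASOLINE",
    "onroad_diesel": "DIESEL",
    "nonroad_diesel": "DIESEL",
    "coal_combustion": "COAL",
    "livestock": "AllOthers",
    "solvents": "AllOthers",
}
SECONDARY = ("AMSULFT", "AMNITR", "OTHROC")


def regroup(impacts):
    """Sum primary impacts by group and pool secondary portions across all sources."""
    out = defaultdict(float)
    for src, parts in impacts.items():
        out[GROUP[src]] += parts.get("primary", 0.0)
        for sec in SECONDARY:
            out[sec] += parts.get(sec, 0.0)
    return dict(out)


# Made-up impacts (ug/m3), each split into primary and secondary components.
impacts = {
    "onroad_gasoline": {"primary": 0.6, "AMNITR": 0.3, "OTHROC": 0.2},
    "nonroad_gasoline": {"primary": 0.2, "OTHROC": 0.1},
    "onroad_diesel": {"primary": 0.9, "AMNITR": 0.4},
    "nonroad_diesel": {"primary": 0.5, "AMNITR": 0.2},
    "coal_combustion": {"primary": 0.3, "AMSULFT": 2.5},
    "livestock": {"AMNITR": 0.6, "AMSULFT": 0.4},
    "solvents": {"primary": 0.05, "OTHROC": 0.15},
}

regrouped = regroup(impacts)
print(regrouped)
```

Total mass is conserved by construction. The regrouping only moves mass between categories, which is why a largely secondary source such as livestock nearly vanishes from the primary AllOthers group after regrouping.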
The hybrid results tend to show lower secondary contributions than the RM methods, except in Chicago and Pittsburgh for this period (Table S14 in the Supplement). Individual sources' contributions to sulfate and secondary organic carbon (SOC) are given in Table S15 in the Supplement. The hybrid method and RMs agree well on ammonium sulfates at all six sites (16-37 % vs. 20-38 %, Table 6). However, the hybrid method estimated lower secondary organic carbon in Atlanta (4.8 % vs. 11.7 %). The two approaches differ most on secondary nitrate impacts (3-27 % vs. 20-44 %, Table 6). The difficulties in simulating particulate nitrate have been noted previously (Chang et al., 2011). The nitrate simulated by CMAQ tended to be biased low in the base simulation at some locations and times. The hybrid method adjusted the nitrate upward to better match the observed value but does not force an exact match. This is because the adjustment is limited by the second term on the right-hand side of Eq. (12), which penalizes over-adjusting the impact based on the estimated uncertainty in the emissions (of NOx in this case). NOx emissions from power plants are considered well estimated, and emissions from mobile sources are less uncertain than, for example, dust emissions. This term therefore limits the adjustment of impacts from these sources more than from other sources. Typical RM methods have no comparable term for secondary contributions; they allow species to be attributed to secondary contributions so as to match the observations exactly.
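Why the penalty keeps the adjusted nitrate from reaching the observation can be seen in a one-species, one-source version of the same trade-off. This Python sketch assumes the penalty has the form (ln R / σ_lnR)², which is an assumption rather than the paper's exact Eq. (12). The concentrations and the two σ_lnR values are invented for illustration.

```python
import math


def best_scale(c_obs, c_sim, impact, sigma_obs, sigma_lnR):
    """Grid-search R in [0.1, 10] minimizing species error plus ln R penalty (assumed form)."""
    best_x, best_obj = 0.0, float("inf")
    for k in range(-2000, 2001):
        x = k * math.log(10.0) / 2000  # ln R from ln 0.1 to ln 10
        resid = (c_obs - c_sim - impact * (math.exp(x) - 1.0)) / sigma_obs
        obj = resid ** 2 + (x / sigma_lnR) ** 2
        if obj < best_obj:
            best_x, best_obj = x, obj
    return math.exp(best_x)


# Hypothetical nitrate case: CTM simulates 2.0 ug/m3 (all from one source); observed is 4.0.
c_obs, c_sim, impact, sigma_obs = 4.0, 2.0, 2.0, 0.4

R_loose = best_scale(c_obs, c_sim, impact, sigma_obs, sigma_lnR=1.0)  # uncertain emissions
R_tight = best_scale(c_obs, c_sim, impact, sigma_obs, sigma_lnR=0.1)  # well-estimated emissions
nitrate_tight = c_sim + impact * (R_tight - 1.0)
print(f"R (loose) = {R_loose:.2f}, R (tight) = {R_tight:.2f}, refined nitrate (tight) = {nitrate_tight:.2f}")
```

An exact match requires R = 2. With a loose σ_lnR the fitted R lands close to 2. With a tight σ_lnR it stops near 1.3, so the refined nitrate stays well below the observation.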

Discussion
The hybrid source apportionment method developed and applied here has been demonstrated to be a novel way to improve SM-only CTM results using observations. It also has advantages over RM methods. First, it addresses several limitations of RM methods (depending on the RM method): (1) the assumption that emissions are inert, with no chemical reactions; (2) the limited number of source categories considered; (3) potential collinearities between source compositions; (4) inconsistent or unrealistic results, because receptor models do not include information on the strength and location of source emissions; and (5) not accounting for physical processes such as complex meteorology. Second, the refinement and evaluation of the source impact estimates use measurement data that are independent of those used to develop the initial source impact estimates. Additionally, the hybrid method can be applied to obtain refined hourly spatial fields of source impacts.
Several sources of uncertainty in the CTM modeling can propagate into the impacts estimated by the hybrid approach. The assumptions used to derive concentrations and sensitivities for elements that are not explicitly simulated in the CTM might not always hold. Missing pathways of secondary organic aerosol formation and inaccurate representation of nitrate formation in the CTM can lead to underestimation of secondary source impacts. Errors in the meteorology may result in errors in the source fingerprints (f*_i,j) and source-receptor relationships. Errors in the initial emissions inventory also introduce potential errors, particularly errors in the spatial and/or temporal variability and in the composition of the emissions. These errors matter most when the model is used to temporally interpolate the impact adjustments, i.e., to provide 1 h impact fields from 24 h speciated PM2.5 measurements. Thus, it is best to apply the results of this approach to 24 h averaged fields.
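The temporal-interpolation concern can be illustrated directly. Suppose a single daily R_j, fitted to 24 h measurements, multiplies every hourly CTM impact. The refined diurnal shape is then exactly the CTM's: the daily mean is corrected, but any error in the inventory's hourly profile passes through unchanged. The following is a minimal Python sketch with a made-up diurnal profile. The helper `refine_hourly` is hypothetical and assumes uniform application of the daily factor across hours.

```python
def refine_hourly(hourly_impacts, daily_scale):
    """Apply one daily scale factor R_j to each hourly CTM impact (hypothetical helper)."""
    return [daily_scale * v for v in hourly_impacts]


def mean(values):
    return sum(values) / len(values)


# Made-up hourly impacts (ug/m3) of one source with morning and evening peaks.
hourly = [0.4] * 6 + [1.2, 1.8, 1.5, 0.9] + [0.7] * 6 + [1.1, 1.6, 1.3, 0.8] + [0.5] * 4
R_day = 0.6

refined = refine_hourly(hourly, R_day)
print(len(refined), round(mean(refined) / mean(hourly), 3))
```

Every hour-to-hour ratio in `refined` equals the corresponding ratio in `hourly`. The 24 h data therefore constrain only the level, not the timing, of the source impact.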
On the other hand, evaluating the hybrid model results on a species basis can help identify errors in the original source profiles. Additionally, including measurements from multiple sites in a region and/or spatially dense satellite retrievals in the process of adjusting emissions can further help stabilize R_j. This would provide more accurate refinements and reduce the risk that measurements taken at a single point are overly influenced by local sources. In this way, the hybrid source impacts can more accurately represent pollutant levels spatially. This is because they integrate estimates of the spatial distribution of emissions with local chemical and physical atmospheric processes.
The Supplement related to this article is available online at doi:10.5194/acp-14-5415-2014-supplement.

Figure 1. Modeling domain and monitoring sites used.

Table 2. Calculated source impact scale factors (R_j) across 164 CSN sites, January 2004: mean and standard deviation.

Table 4. January 2004 average initial and refined absolute (µg m−3) and percentage (%) source impacts on PM2.5 at the six sites.

Table 6a. Refined source impact results regrouped to 13 primary sources and compared with results from RM methods: Atlanta, Chicago, and Detroit.

Table 6b. Refined source impact results regrouped to 13 sources and compared with results from RM methods: Los Angeles, New York, and Pittsburgh. Primary impacts only; the secondary portions of the impacts are removed from these sources and merged into the secondary sources: AMSULFT, ammonium sulfate plus ammonium bisulfate; AMNITR, ammonium nitrate; and OTHROC, secondary organic carbon.