Source apportionment of volatile organic compounds in the north-west Indo-Gangetic Plain using positive matrix factorisation model

In this study we undertook quantitative source apportionment for 32 volatile organic compounds (VOCs) measured at a suburban site in the densely populated north-west Indo-Gangetic Plain using the US EPA PMF 5.0 model. Six sources were resolved by the PMF model, namely “biofuel use and waste disposal”, “wheat-residue burning”, “industrial emissions and solvent use”, “cars”, “two-wheelers” and “mixed daytime sources”. Biofuel use and waste disposal, wheat-residue burning, industrial emissions and solvent use, the combined traffic sources, and mixed daytime sources accounted for 23.2 %, 22.4 %, 11.8 %, 25.1 %, and 15.7 % of the total VOC mass concentration, respectively; 18.1 %, 32.4 %, 7.3 %, 21.9 %, and 20.3 % of the total O3 formation potential, respectively; and 14.9 %, 13.9 %, 10.1 %, 59.0 %, and 2.2 % of the SOA formation potential, respectively. Further, the factors contributed 24.6 %, 8.5 %, 20.1 %, 46.8 %, and 0 %, respectively, to the human class I carcinogen benzene and 18.4 %, 25.4 %, 5.9 %, 13.3 %, and 36.9 %, respectively, to the toxic emerging contaminant isocyanic acid. Evaluation of emission inventories using the in-situ data derived PMF solution revealed that among EDGARv4.2, REASv2.1 and GAINSv5.0, the GAINSv5.0 emission inventory for the year 2010 agreed best with the in-situ data derived PMF results for May 2012.

[...] only in primary emissions can be pulled down in the source composition (f_kj) matrix of the photochemistry factor. A detailed discussion of the use of constraints in a receptor model has been provided in previous studies (Paatero et al., 2002, 2014; Paatero and Hopke, 2009; Norris et al., 2014; Sarkar et al., 2016). Bootstrap model runs (Brown et al., 2015) were performed to assess the model uncertainty. The bootstrap runs used a random seed, 100 bootstraps and the default values for block size (10) and minimum correlation R-value (0.6); there were no unmapped factors. Except for the car and two-wheeler factors (R=0.6), for which a certain degree of co-linearity is expected, none of the factors showed cross-correlation with each other (R<0.3), and even for this factor pair the g-space plot is well filled. The constraint mode was unable to force the PMF model to separate the wheat residue burning factor in a 5-factor solution without imposing a split between the car and two-wheeler factors, indicating that these two indeed represent distinct source profiles.
We performed a conditional probability function analysis (Leuchner and Rappenglück, 2010) by calculating, for every wind direction, the probability of observing mass concentrations above the 75th percentile of a given factor contribution. This aids in identifying the physical locations of different PMF source factors without using back trajectories (Xie and Berkowitz, 2006).
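The conditional probability function described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the common definition CPF = m/n, where n is the number of samples in a wind sector and m is the number of those samples whose factor contribution exceeds the chosen percentile. The function name, the array inputs and the 30° sector width are assumptions made for this example, and no calm-wind filtering is applied.

```python
import numpy as np


def conditional_probability_function(contrib, wind_dir, sector_width=30.0,
                                     percentile=75.0):
    """Return (sector start angles, CPF per sector).

    contrib  : factor contributions, one value per sample (hypothetical input)
    wind_dir : wind direction in degrees for the same samples
    CPF is NaN for sectors that contain no samples.
    """
    contrib = np.asarray(contrib, dtype=float)
    wind_dir = np.asarray(wind_dir, dtype=float) % 360.0

    # Threshold: the chosen percentile of the factor contribution itself.
    threshold = np.percentile(contrib, percentile)
    exceed = contrib > threshold

    # Assign every sample to a wind sector of equal width.
    n_sectors = int(round(360.0 / sector_width))
    sector = np.floor(wind_dir / sector_width).astype(int) % n_sectors

    # n_theta: all samples per sector; m_theta: samples above the threshold.
    n_theta = np.bincount(sector, minlength=n_sectors)
    m_theta = np.bincount(sector, weights=exceed.astype(float),
                          minlength=n_sectors)

    cpf = np.full(n_sectors, np.nan)
    valid = n_theta > 0
    cpf[valid] = m_theta[valid] / n_theta[valid]

    starts = np.arange(n_sectors) * sector_width
    return starts, cpf
```

A sector with a high CPF points towards the direction from which the strongest contributions of that factor tend to arrive. Sectors with very few samples give unstable values, so in practice a minimum sample count per sector is often imposed.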

Calculation of the ozone formation potential and SOA formation potential
Ozone production potential for each of the PMF derived source factors was calculated based on the method used by Sinha and co-workers (Sinha et al., 2012), as described in greater detail in the supplementary text. Secondary organic aerosol (SOA) formation potential was calculated for the PMF source factors using literature SOA yields (Derwent et al., 2010), as described in the supplementary text.

Identification of PMF factors
Six source factors were resolved by the PMF model. These were identified as "biofuel use and waste disposal", "wheat-residue burning", "four-wheelers", "two-wheelers", "industrial emissions and solvent use" and "mixed daytime sources". Factor profiles were cross-correlated with the fingerprints of source samples collected from a number of potential sources, including wheat residue fires (Chandra et al., 2017; Kumar et al., 2018), a busy traffic junction (Chandra et al., 2017), tail-pipes of various vehicles (this study), waste burning (Sharma et al., Under Review), leaf litter burning (this study) and domestic biofuel use (Stockwell et al., 2016), to identify the sources. Figure 3 shows the factor profiles obtained from the PMF run (in dark blue), the percentage of each species explained by the respective PMF factor (red squares) and the source profiles of those sources which best matched the factor profile (in various colors as indicated in the legend). The identification of the factors is further supported by independent tracers such as the criteria air pollutants (NOy, CO, SO2, O3) and MODIS (Moderate Resolution Imaging Spectroradiometer) fire counts, as discussed in detail below.

Factor 1 - Biofuel use & waste disposal
Figure 4 shows that biofuel use & waste disposal contributes 23.2 %, 18.1 % and 14.9 % of the total VOC mass, ozone formation potential and SOA formation potential, respectively.
The factor profile correlates most strongly with the measured VOC source speciation profiles of domestic cooking (R=0.8), leaf-litter burning (R=0.7) and smoldering garbage fires (R=0.6).
As discussed previously for other South Asian atmospheric environments (Sarkar et al., 2017), the source contributions of domestic biofuel use and domestic waste burning are difficult to segregate due to the high spatio-temporal overlap of the two activities. As can be seen in Figure 5, the factor shows a weak bimodal behaviour with an early morning and a late evening peak, as both domestic biofuel use and waste disposal fires peak in the early morning and in the evening hours (Nagpure et al., 2015).
CO serves as the best independent tracer (Figure 5), indicating that this factor represents low-temperature combustion with a low combustion efficiency. Figure 5 shows that the highest conditional probability for this factor is from the N (>0.4), the direction [...] (Wang et al., 2009; Paulot et al., 2011; Stockwell et al., 2016), open waste burning (Sharma et al., Under Review) and the PMF factor results for the residential biofuel use and waste disposal factor in Kathmandu, Nepal (Sarkar et al., 2017). It should be noted that this factor is responsible for approximately 25 % of the total benzene emissions. Since benzene is an identified Group-1 carcinogen (IARC, 1987) and emissions occur within the household itself (domestic cooking) or within close proximity of the house (waste disposal), this factor deserves special attention in programs targeted at emission reductions.
Direct emission of isocyanic acid, a highly toxic emerging contaminant, and its photochemical precursors (alkyl amines and amides) was observed from this source, which explained 18 % of the isocyanic acid mass concentration and 7-15 % of all the alkyl amines and amides in the PMF model.
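The factor identification above rests on correlating a PMF factor profile with measured source fingerprints (for example R=0.8 for domestic cooking). A minimal sketch of such a comparison is given below. It assumes a plain Pearson correlation over species that are present in both profiles and listed in the same order; the paper does not state which correlation measure or pre-processing was used, and the function name and inputs are hypothetical.

```python
import numpy as np


def rank_source_matches(factor_profile, source_profiles):
    """Rank candidate source fingerprints by Pearson R with a factor profile.

    factor_profile  : sequence of species abundances for one PMF factor
    source_profiles : dict mapping a source name to a sequence of abundances
                      for the same species in the same order (hypothetical)
    Returns a list of (source name, R) sorted from best to worst match.
    """
    f = np.asarray(factor_profile, dtype=float)
    scores = {}
    for name, profile in source_profiles.items():
        s = np.asarray(profile, dtype=float)
        if s.shape != f.shape:
            raise ValueError(f"profile '{name}' does not match the species list")
        # Pearson R is insensitive to the overall scale of either profile,
        # so no normalisation to unit mass is needed for this comparison.
        scores[name] = float(np.corrcoef(f, s)[0, 1])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because a few abundant species can dominate a linear correlation, a rank correlation or a comparison of log-transformed profiles may be a useful cross-check; that choice is left open here.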

Factor 2 - Wheat residue burning
Wheat residue burning takes place every year in the NW-IGP in the post-harvest season and generally peaks in the month of May. It has been shown that wheat residue burning has a major impact on both ozone mixing ratios (Kumar et al., 2016) and VOC mixing ratios and hydroxyl radical reactivity (Kumar et al., 2018), resulting in a large suite of unknown (∼ 40 %) and poorly quantified reactive gaseous emissions. Figure 4 shows that wheat residue burning contributes 22.4 % of the total VOC mass concentration, 32.4 % of the total ozone formation potential and 13.9 % of the total SOA formation potential. Figure 3 shows that the factor profile correlates most strongly with flaming wheat residue burning (R=0.9) and Figure [...] the best independent tracer for the average contribution of wheat residue burning to the total NMVOC mass are the daily fire counts, with a cross-correlation of R=0.4 and a lag of 2 days. Since wheat residue burning is an area source and emissions are transported to the receptor site from a large fetch region, often with a significant lag time, there is no strong conditional probability for enhancements from any specific wind direction. [...] demonstrate that more than 55 % of the hydroxyacetone, 37 % of the acetic acid, 32 % of the total methyl ethyl ketone and 28-39 % of the amides/amines as well as 28 % of the isocyanic acid mass in the model can be explained by this factor. This makes wheat residue burning the largest contributor to human exposure to isocyanic acid in the month of May, both through direct emissions of isocyanic acid and by virtue of being the largest source of its photochemical precursors.
[...] Figure 5 shows that the factor contribution of the industrial emissions and solvent use factor correlates with the SO2 time series (R=0.6), indicating that emissions from coal or biofuel burning in industrial units and/or coal-fired power plants may also be contributing to this factor profile.
Figure 5 shows that the highest conditional probability of this factor is to the south-east (120°-150° wind sector). The receptor site is downwind of a 600 MW coal-fired power plant located in Jagadhri (80 km SE) as well as downwind of several industrial areas and brick kiln clusters located around Dera Bassi (15 km), Lalru (20 km) and Jagadhri (80 km) when the wind blows from this direction. In the Kathmandu valley, biofuel co-fired brick kilns explained a significant fraction of the benzene and acetonitrile mass (Sarkar et al., 2017) and hence a combustion contribution from brick kilns to the factor profile cannot be ruled out. The diel profile broadly reflects boundary layer dynamics, with factor contributions increasing continuously throughout the night, indicating a buildup of constant emissions in the nocturnal boundary layer. Factor contributions peak in the early morning (17-26 µg m⁻³) between 5 and 9 am local time and decrease from 9 am onwards after the breakup of the nocturnal boundary layer. At night the average factor contribution is higher than the median because strong plumes (max ∼ 200 µg m⁻³) reach the receptor when it is downwind of the industrial sector, but not during other nights when the wind direction is from rural Punjab (NW) or the urban sector (NE).
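Two of the tracer comparisons above correlate a factor contribution time series with an independent tracer: daily fire counts at a lag of 2 days for wheat residue burning, and SO2 at zero lag for the industrial factor. The sketch below shows one way such a lagged correlation could be computed. It assumes equally spaced (for example daily averaged) series of equal length and a Pearson correlation; the function names and inputs are hypothetical and the paper does not describe its exact procedure.

```python
import numpy as np


def lagged_correlation(tracer, factor, max_lag=5):
    """Pearson R between tracer[t] and factor[t + lag] for lag = 0..max_lag.

    Both inputs are equally spaced series of the same length (assumed).
    Pairs containing NaN are dropped; lags with fewer than 3 pairs give NaN.
    """
    tracer = np.asarray(tracer, dtype=float)
    factor = np.asarray(factor, dtype=float)
    if tracer.shape != factor.shape:
        raise ValueError("tracer and factor must have the same length")

    result = {}
    for lag in range(max_lag + 1):
        # Shift the factor series so that it trails the tracer by `lag` steps.
        x = tracer[: len(tracer) - lag]
        y = factor[lag:]
        ok = ~(np.isnan(x) | np.isnan(y))
        if ok.sum() < 3:
            result[lag] = float("nan")
        else:
            result[lag] = float(np.corrcoef(x[ok], y[ok])[0, 1])
    return result


def best_lag(result):
    """Return the lag with the highest non-NaN correlation."""
    valid = {lag: r for lag, r in result.items() if not np.isnan(r)}
    return max(valid, key=valid.get)
```

A positive best lag means the tracer leads the factor contribution, which is what would be expected when emissions take time to be transported from a large fetch region to the receptor.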
[...] 2-wheelers are observed. As can be seen from Figure 6, the two traffic factors jointly explain 47 %, 80 %, 70 % and 67 % of the total benzene, toluene, C-8 and C-9 aromatic compounds in the model, respectively, consistent with findings from the Kathmandu valley that traffic, not residential biofuel use and waste disposal, is the more important source of aromatic compounds in South Asia. It is also clear that despite stringent regulations, the transport sector in the region is still the largest contributor to human benzene exposure. [...] Mohali. The two traffic factors combined were found to be the strongest contributors to the total VOC mass concentration (25.1 %), followed by the biofuel use and waste disposal factor (23.2 %), wheat-residue burning (22.4 %), the mixed daytime factor (15.7 %) and industrial emissions (11.8 %), with the residual, unapportioned VOC mass amounting to only 1.7 % of the total. Early source receptor modelling studies from India attributed a slightly larger share (26-58 %) of the total VOC mass to traffic related emissions (Srivastava, 2004; Srivastava et al., 2005), suggesting that the progression to the emission norms Bharat [...]
However, Figure S5 shows that despite providing monthly data, the REAS emission inventory has very little seasonality for any of the sources. Overall it appears that GAINS, the emission inventory with the lowest absolute emissions from residential and commercial biofuel use, shows the best agreement with our PMF solution. Our PMF solution suggests that transport sector emissions are underestimated by approximately a factor of 1.5 in GAINS, while the combined effect of residential biofuel use and waste disposal emissions as well as the VOC burden associated with solvent use may be overestimated by a factor of 1.3 in the same emission inventory. Similar results have been reported previously.
Sarkar and co-workers (Sarkar et al., 2017) reported that REAS and EDGAR overestimated residential biofuel use emissions even more than GAINS. EDGAR underestimated transport sector emissions and industrial emissions and solvent use, while REAS overestimated the importance of the same two sources. REAS also fails to include agricultural residue burning as a source.
Our results highlight that for accurate air quality forecasting and modelling it is essential that emissions are attributed only to the months in which the activity actually occurs. This is important both for emissions from crop residue burning (which occur in May and from mid-October to the end of November) and emissions from wildfires (which are restricted to the dry season and peak in April and May). Annually averaged emissions are unlikely to yield accurate air quality forecasts in regions affected by such seasonal events. At present, more specialized fire emission inventories such as FINN (Wiedinmyer et al., 2011) must be used to account for the full seasonality and day-to-day variations of open burning emissions. We also demonstrate that the source profiles obtained as PMF output can be matched against samples collected at the potential sources to validate the factor identification.
We find that the GAINSv5.0 emission inventory for the year 2010 agreed best with the in-situ data derived PMF solution for May 2012.

Conclusions
Six VOC emission sources were extracted via PMF simulations from a dataset comprising 32 VOC species measured online at a primary temporal resolution of 1 minute at a suburban site in Mohali in the summer of 2012. The US EPA PMF 5.0 model was used for source apportionment of VOCs, and the PMF-resolved factors included traffic exhaust, biofuel use and waste disposal, wheat-residue burning, mixed daytime sources (comprising biogenic emissions and photochemical formation) and industrial emissions and solvent use, which along with the residuals accounted for 25.1 %, 23.2 %, 22.4 %, 15.7 %, 11.8 % and 1.7 %, respectively, of the total VOC mass concentration. For the human class I carcinogen benzene, the traffic factor alone contributed 47 % of the total benzene mass at this receptor site, followed by residential biofuel use and waste disposal (25 %) and industrial emissions and solvent use (20 %). Since the annual NAAQS for benzene is exceeded at this receptor [...] exceeds concentrations that can, after dissociation at blood pH, result in blood cyanate ion concentrations (Roberts et al., 2011) high enough to produce significant health effects in humans (Wang et al., 2007) such as atherosclerosis, cataracts and rheumatoid arthritis due to protein damage. Peak mixing ratios of this compound exceed 3 ppb in some night-time wheat residue burning plumes. Wheat residue burning was also the single largest source of the photochemical precursors of isocyanic acid, namely formamide, acetamide and propanamide, indicating that this source must be most urgently targeted to reduce human exposure to isocyanic acid. Our results highlight that for accurate air quality forecasting and modelling it is essential that emissions that are both large in terms of their absolute contribution and display a significant seasonality in their occurrence are attributed only to the months in which the activity actually occurs.
This is important both for emissions from crop residue burning (which occur in May and from mid-October to the end of November) and emissions from wildfires (which are restricted to the dry season and peak in April and May). Annually averaged emissions are unlikely to yield accurate air quality forecasts in regions affected by such seasonal events. We find that the GAINSv5.0 emission inventory for the year 2010 agreed best with the in-situ data derived PMF solution for May 2012, as long as crop residue burning emissions were attributed to 2.5 months of the year only, emissions from domestic biofuel use and solvent use were scaled down by a factor of 1.3 and transport sector emissions were scaled up by a factor of 1.5. The quantitative source apportionment results reported in this study for benzene, isocyanic acid and ozone and SOA precursors will provide much needed information for targeted mitigation efforts to improve the regional air quality.
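The inventory adjustments described above (scaling factors of 1.5 and 1.3, and crop residue burning confined to 2.5 months) can be illustrated with a short sketch. The sector names, the even split of non-seasonal sectors over 12 months and the uniform emission rate within the burning periods are assumptions made for this example only; they are not taken from GAINS or from the paper, and any annual totals passed in are placeholders.

```python
# Month-equivalents of crop residue burning: all of May, the second half of
# October and all of November (2.5 months in total, as stated in the text).
BURNING_WEIGHTS = {5: 1.0, 10: 0.5, 11: 1.0}

# Scaling factors quoted in the text; the sector keys are hypothetical labels.
SCALE = {
    "transport": 1.5,
    "residential_biofuel_and_waste": 1.0 / 1.3,
    "solvent_use": 1.0 / 1.3,
}


def monthly_emissions(annual_by_sector, seasonal_sector="crop_residue_burning"):
    """Distribute annual sector totals over the 12 months of a year.

    annual_by_sector : dict mapping sector name to an annual emission total
    The seasonal sector is placed only in the burning months; every other
    sector is rescaled (if a factor is listed) and split evenly over the year.
    Returns a dict: month number (1-12) -> dict of sector -> emission.
    """
    total_weight = sum(BURNING_WEIGHTS.values())
    monthly = {}
    for month in range(1, 13):
        row = {}
        for sector, annual in annual_by_sector.items():
            scaled = annual * SCALE.get(sector, 1.0)
            if sector == seasonal_sector:
                share = BURNING_WEIGHTS.get(month, 0.0) / total_weight
                row[sector] = scaled * share
            else:
                row[sector] = scaled / 12.0
        monthly[month] = row
    return monthly
```

With this allocation the May emission rate from crop residue burning is 12/2.5 = 4.8 times the rate obtained by spreading the same annual total evenly over the year, which shows why annually averaged inventories are a poor basis for forecasts during the burning season.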
Data availability. Data is available from the corresponding author upon request.
Author contributions. Pallavi performed the analysis and wrote the first draft of the paper. Dr. Baerbel Sinha conceived the analysis and revised the paper draft. Dr. Vinayak Sinha collected the data and commented on the paper draft.