Impacts of household sources on air pollution at village and regional scales in India

. Approximately 3 billion people worldwide cook with solid fuels, such as wood, charcoal, and agricultural residues. These fuels, also used for residential heating, are often combusted in inefﬁcient devices, producing carbonaceous emissions. Between 2.6 and 3.8 million premature deaths occur as a result of exposure to ﬁne particulate matter from the resulting household air pollution (Health Effects Institute, 2018a; World Health Organization, 2018). Household air pollution also contributes to ambient air pollution; the magnitude of this contribution is uncertain. Here, we simulate the distribution of the two major health-damaging outdoor air pollutants (PM 2 . 5 and O 3 ) using state-of-the-science emissions databases and atmospheric chemical transport models to estimate the impact of household combustion on ambient air quality in India. The present study focuses on New Delhi and the SOMAARTH Demographic, Development, and Environmental Surveillance Site (DDESS) in the Palwal District of Haryana, located about 80 km south of New Delhi. The DDESS covers an approximate population of 200 000 within 52 villages. The emissions inventory used in the present study was prepared based on a national inventory in India (Sharma et al., 2015, 2016), an updated residential sector inventory prepared at the University of Illinois, updated cookstove emissions factors from Fleming et al. (2018b), and PM 2 . 5 speciation from cooking ﬁres from Jayarathne et al. (2018). Simulation of regional air quality was carried out using the US Environmental Protection Agency Community Multiscale Air Quality modeling system (CMAQ) in conjunction with the Weather Research and Forecasting modeling system (WRF) to simulate the meteorological inputs for CMAQ, and the global chemical transport model GEOS-Chem to generate concentrations on the boundary of

Abstract. Approximately 3 billion people worldwide cook with solid fuels, such as wood, charcoal, and agricultural residues. These fuels, also used for residential heating, are often combusted in inefficient devices, producing carbonaceous emissions. Between 2.6 and 3.8 million premature deaths occur as a result of exposure to fine particulate matter from the resulting household air pollution (Health Effects Institute, 2018a;World Health Organization, 2018). Household air pollution also contributes to ambient air pollution; the magnitude of this contribution is uncertain. Here, we simulate the distribution of the two major health-damaging outdoor air pollutants (PM 2.5 and O 3 ) using state-of-thescience emissions databases and atmospheric chemical transport models to estimate the impact of household combustion on ambient air quality in India. The present study focuses on New Delhi and the SOMAARTH Demographic, Development, and Environmental Surveillance Site (DDESS) in the Palwal District of Haryana, located about 80 km south of New Delhi. The DDESS covers an approximate population of 200 000 within 52 villages. The emissions inventory used in the present study was prepared based on a national inventory in India (Sharma et al., 2015(Sharma et al., , 2016, an updated residential sector inventory prepared at the University of Illinois, updated cookstove emissions factors from Fleming et al. (2018b), and PM 2.5 speciation from cooking fires from Jayarathne et al. (2018). Simulation of regional air quality was carried out using the US Environmental Protection Agency Community Multiscale Air Quality modeling system (CMAQ) in conjunction with the Weather Research and Forecasting modeling system (WRF) to simulate the meteorological inputs for CMAQ, and the global chemical transport model GEOS-Chem to generate concentrations on the boundary of the computational domain. Comparisons between observed and simulated O 3 and PM 2.5 levels are carried out to assess overall airborne levels and to estimate the contribution of household cooking emissions.
Observed and predicted ozone levels over New Delhi during September 2015, December 2015, and September 2016 routinely exceeded the 8 h Indian standard of 100 µg m −3 , and, on occasion, exceeded 180 µg m −3 . PM 2.5 levels are predicted over the SOMAARTH headquarters (September 2015 andSeptember 2016), Bajada Pahari (a village in the surveillance site; September 2015, December 2015, and September 2016, and New Delhi (September 2015, December 2015, and September 2016. The predicted fractional impact of residential emissions on anthropogenic PM 2.5 levels varies from about 0.27 in SOMAARTH HQ and Bajada Pahari to about 0.10 in New Delhi. The predicted secondary organic portion of PM 2.5 produced by household emissions ranges from 16 % to 80 %. Predicted levels of secondary organic PM 2.5 during the periods studied at the four locations averaged about 30 µg m −3 , representing approximately 30 % and 20 % of total PM 2.5 levels in the rural and urban stations, respectively.

Introduction
Although outdoor air pollution is widely recognized as a health risk, quantitative understanding remains uncertain on the degree to which household combustion contributes to unhealthy air. Recent studies in China, for example, show that 50 %-70 % of black carbon (BC) emissions and 60 %-90 % of organic carbon (OC) emissions can be attributed to residential coal and biomass burning (Cao et al., 2006;Klimont et al., 2009;Lai et al., 2011). Moreover, existing global emissions inventories show a significant contribution of household sources to primary PM 2.5 (particulate matter of diameter less than or equal to 2.5 µm) emissions. The Indo-Gangetic Plain of northern India (23-31 • N, 68-90 • E) has among the world's highest values of PM 2.5 . In this region, the major sources of emissions of primary PM 2.5 and of precursors to secondary PM 2.5 are coal-fired power plants, industries, agricultural biomass burning, transportation, and combustion of biomass fuels for heating and cooking (Reddy and Venkataraman, 2002;Rehman et al., 2011). The southwest monsoon in summer months in India leads to lower pollution levels than in winter months, which are characterized by low wind speeds, shallow boundary layer depths, and high relative humidity (Sen et al., 2017). With the difficulty in determining representative emissions estimates (Jena et al., 2015;Zhong et al., 2016), simulating the extremely high PM 2.5 observations in the Indo-Gangetic Plain has remained a challenge (Schnell et al., 2018).
Approximately 3 billion people worldwide cook with solid fuels, such as wood, charcoal, and agricultural residues (Bonjour et al., 2013;Chafe et al., 2014;Smith et al., 2014;Edwards et al., 2017). Used also for residential heating, such solid fuels are often combusted in inefficient devices, producing BC and OC emissions. Between 2.6 and 3.8 million premature deaths occur as a result of exposure to fine par-ticulate matter from household air pollution (Health Effects Institute, 2018a;World Health Organization, 2018). In India, more than 50 % of households report the use of wood or crop residues, and 8 % report the use of dung as cooking fuel (Klimont et al., 2009;Census of India, 2011;Pant and Harrison, 2012). Residential biomass burning is one of the largest individual contributors to the burden of disease in India, estimated to be responsible for 780 000 premature deaths in 2016 (Indian Council of Medical Research et al., 2017). The recent GBD MAPS Working Group (Health Effects Institute, 2018b) estimated that household emissions in India produce about 24 % of ambient air pollution exposure. Coal combustion, roughly evenly divided between industrial sources and thermal power plants, was estimated by this study to be responsible for 15.3 % of exposure in 2015. Open burning of agricultural crop stubble was estimated annually to be responsible for 6.1 % nationally, although it was higher in some areas.
Traditional biomass cookstoves, with characteristic low combustion efficiencies, produce significant gas-and particle-phase emissions. An early study of household air pollution in India found outdoor total suspended particulate matter (TSP) levels in four Gujarati villages well over 2 mg m −3 during cooking periods (Smith et al., 1983). Secondary organic aerosol (SOA), produced by gas-phase conversion of volatile organic compounds to the particulate phase, is also important in ambient PM levels, yet there is a dearth of model predictions to which data can be compared. Overall, household cooking in India has been estimated by various groups to produce 22 %-50 % of ambient PM 2.5 exposure (Butt et al., 2016;Chafe et al., 2014;Conibear et al., 2018;Health Effects Institute, 2018b;Lelieveld et al., 2015;Silva et al., 2016), andFleming et al. (2018a, b) report characterization of a wide range of particle-phase compounds emitted by cookstoves. In a multi-model evaluation, Pan et al. (2015) concluded that an underestimation of biomass combustion emissions, especially in winter, was the dominant source of model underestimation. Here, we address both primary and secondary organic particulate matter from household burning of biomass for cooking.
Air quality in urban areas in India is determined largely, but not entirely, by anthropogenic fuel combustion. In rural areas, residential combustion of biomass for household uses, such as cooking, also contributes to nonmethane volatile organic carbon (NMVOC) and particulate emissions (Sharma et al., 2015(Sharma et al., , 2018. Average daily PM 2.5 levels frequently exceed the 24 h Indian standard of 60 µg m −3 and can exceed 150 µg m −3 , even in rural areas. The local region on which the present study focuses is the SOMAARTH Demographic, Development, and Environmental Surveillance Site (DDESS) run by the International Clinical Epidemiological Network (INCLEN) in the Palwal District of Haryana (Fig. 1). Located about 80 km south of New Delhi, SO-MAARTH covers an approximate population of 200 000 in 52 villages. Particular focus in the present study is given to the SOMAARTH Headquarters (HQ) and the village of Bajada Pahari within DDESS, coinciding with the work of Fleming et al. (2018b), who studied cookstove nonmethane hydrocarbon (NMHC) emissions and ambient air quality. Demographically, with a coverage of almost 308 km 2 , the DDESS has a mix of populations from different religions and socioeconomic and development statuses.
The climate of the region of interest in the present study is primarily influenced by monsoons, with a dry winter and very wet summer. The rainy season, July through September, is characterized by average temperatures around 30 • C and primarily easterly and southeasterly winds. In a study related to the present one, Schnell et al. (2018) used emission datasets developed for the Coupled Model Intercomparison Project Phases 5 (CMIP5) and 6 (CMIP6) to evaluate the impact on predicted PM 2.5 over northern India, October-March 2015-2016, with special attention paid to the effect of meteorology of the region, including relative humidity, boundary layer depth, strength of the temperature inversion, and low-level wind speed. In that work, nitrate and organic matter (OM) were predicted to be the dominant components of total PM 2.5 over most of northern India.
The goal of the present work is to simulate the distribution of primary and secondary PM 2.5 and O 3 using recently updated emissions databases and atmospheric chemical transport models to obtain estimates of the total impact on ambient air quality attributable to household combustion. With respect to ozone, the present work follows that of Sharma et al. (2016), who simulated regional and urban ozone concentrations in India using a chemical transport model and included a sensitivity analysis to highlight the effect of changing precursor species on O 3 levels. The present work is based on simulating the levels of both O 3 and PM 2.5 at the regional level based on recent emissions inventories using state-ofthe-science atmospheric chemical transport models.

Nonresidential sectors emissions
The present study uses an emissions inventory conglomerated from two primary sources: (1) an India-scale inventory for all nonresidential sectors prepared by TERI (Sharma et al., 2015(Sharma et al., , 2016 and (2) a high-resolution residential sector inventory detailed here. Emissions data from each source were distributed to a 4 km grid for the present study. The TERI national inventory was prepared at a resolution of 36 km × 36 km using the Greenhouse Gas and Air Pollution Interactions and Synergies (GAINS ASIA) emission model (Amann et al., 2011). GAINS ASIA estimated emissions based on energy and nonenergy sources using an emission factor approach after taking into account various fuelsector combinations. Following the approach of Kilmont et al. (2002), the emissions were estimated using the basic equation: E k = l m n A k,l,m ef k,l,m 1 − η l,m,n · X k,l,m,n , where E denotes the pollutant emissions (in kt); k, l, m, and n are region, sector, fuel or activity type, and control technology, respectively; A is the activity rate; ef is the unabated emission factor (kt per unit of activity); η is the removal efficiency (%/100); and X is the application rate of control technology n (%/100) where X = 1. The energy sources considered include coal, natural gas, petroleum products, biomass fuels, and others and are categorized into five sectors -transport, industries, residential, power, and others. The model uses the state-wise energy data and generates emissions of species such as PM, NO x , SO 2 , NMVOCs, NH 3 , and CO. For activity data of source sectors, TERI employed published statistics (mainly population, vehicle registration, energy use, and industrial production) where possible. Energyuse data for industry and power sectors were compiled based on a bottom-up approach, collected from the Ministry of Petroleum and Natural Gas (MoPNG, 2010), the Central Statistics Office (CSO, 2011), and the Central Electricity Authority (CEA, 2011). Transportation activity data were compiled from information on vehicle registrations (Ministry of Road Transport and Highways, 2011), emission standards (MoPNG, 2001), travel demand (CPCB, 2000), and mileage (TERI, 2002). Emission factors for energy-based sources from the GAINS ASIA database were used. Speciation factors are adopted from sector-specific profiles from Wei et al. (2014), primarily developed for China as there is a lack of information for India. In the transportation sector, the Chinese species profiles are dependent on fuel type but not technology.
The TERI inventory was compiled on a yearly basis, with monthly variations for brick kilns and agricultural burning, at a native resolution of 36 km × 36 km then equally distributed to grid resolution of 4 km × 4 km for this study. Emissions for nonresidential sectors have no specified diurnal or daily variations; thus, the inventory for nonresidential sectors is the same for each simulated day. Transportation sector emissions were estimated using population and vehicle fleet data at the district level and distributed to the grid using the administrative boundaries. Industry, power, and oil and gas sector emissions were assigned to the grid by their respective locations. Emissions from agriculture were allocated by croptypes produced by state in India. The inventory was vertically distributed to three layers with the lowest layer extending to 30-43 m, the middle layer to 75-100 m, and the top layer to 170-225 m. Volatile organic compound (VOC) emissions were assumed to occur only in the bottom layer. Industry and power emissions were distributed based on stack heights and allocated to the second and third layers.
We incorporated biogenic emissions by using dailyaveraged emission rates of isoprene (0.8121 moles s −1 ) and terpenes (0.8067 moles s −1 ) per 4 km grid cell, predicted by GEOS-Chem for the region of study. The TERI inventory additionally includes isoprene emissions from the residential sector, so isoprene from natural sources was calculated as the difference of the total rate predicted by GEOS-Chem and the rate of emissions solely from the residential sector. Terpene emissions are assumed to occur only in nonresidential source sectors. Isoprene and terpene emission rates were applied to all computational cells as an hourly average (with no diurnal profile) in the nonresidential inventory.

Residential sector emissions
To examine local and regional impacts of residential sector emissions in greater detail, an update to the TERI inventory was performed using various sources to consider more granular input data specific to the residential sector (Table 1). Bottom-up estimates of delivered energy for cooking, space heating, water heating, and lighting were informed by those used in Pandey et al. (2014) and converted to fuel consumption at the village level using population size and percentage of reported primary cooking and lighting fuels from the 2011 Census of India (2011). Urban areas of the domain were assumed to have the average cooking and lighting fuel use profiles of the average urban areas of their district. Fuel consumption was converted to emission rates using fuel-specific emission factors informed by a review of field and laboratory studies, which was used to update the Speciated Pollutant Emissions Wizard (SPEW) inventory (Bond et al., 2004) and to generate summary estimates by fuel type.
Hourly emissions were generated using source-specific diurnal emissions profiles (Fig. 2). The same diurnal emissions profile is applied to all species from a source category and was informed by real-time emissions measurements taken in homes during cooking reported by Fleming et al. (2018a, b). Profiles for fuel-based lighting were informed by real-time measurements of kerosene lamp usage data reported in Lam et al. (2018). The residential sector inventory represents surface emissions with a native spatial resolution of 30 arcsec (∼ 1 km).
In deriving summary estimates of emission factors, priority was given to emission factor measurements from fieldbased studies. Several studies have shown that laboratorybased measurements of stove and lighting emissions tend to be lower than those of devices measured in actual homes (Roden et al., 2009), perhaps due to higher variation in fuel quality and operator behavior. Field-based emission factors utilized in this study include those for nonmethane hydrocarbons, measured from fuels and stoves within the study domain (Fleming et al., 2018a, b). PM 2.5 speciation from cooking fires was informed by Jayarathne et al. (2018) (Tables 2 and 3). Residential emission rates for PM 2.5 , BC, OC, CO, NO x , CH 4 , CO 2 , and NMHCs were generated from SPEW, which estimates emissions from combustion by fuel type. As such, solvent emissions are not included for lack of specific input data. Additionally, while SPEW incorporates temperature-dependent heating combustion activity, the inventory assumes temperatures too high for this activity to  Figure 2. Fraction of daily household emissions by quantifiable fuel-use activity. Red, green, blue, and purple indicate cooking, space heating, water heating, and lighting, respectively. This represents the fraction of activity-specific daily emissions at each hour. Each species obeys the same profile. While profiles for heating are shown, the inventory assumes temperatures too high for this activity to take effect. take effect. Thus, our inventory has no emissions from heating.
We employed various methods to account for pollutant species not explicitly reported by SPEW (Tables 1 and 2). Gas-phase SO 2 and NH 3 emissions were informed by existing residential emissions in the TERI inventory (Sharma et al., 2015); NO and NO 2 were estimated from NO x emissions, assuming a NO : NO 2 emission ratio of 10 : 1. Total NMHC and PM 2.5 emission factors from SPEW are distributed by fuel type (wood, dung, agriculture residue, or LPG) ( Table 2). Given the low PM 2.5 emission rate of LPG (Shen et al., 2018), emissions from LPG are assumed to be negligible. To further speciate NMHCs, we employed HC species-specific emission factors (Fleming et al., 2018b), differentiated by fuel and stove type (i.e., traditional stove, or chulha, with wood or dung, and simmering stove, or angithi, with dung). We assume that all NMHC emissions in each computational grid cell are produced by either wood or dung, whichever contributes the greater fraction of total PM 2.5 emissions in that cell (Fig. 3). The NMHC emission profile of dung was assumed to be the average of measurements from chulha and angithi stoves. The emission profile for agricultural residue is similar to that of wood; therefore, wood speciation profiles are applied in cells where agricultural residue dominates.
Particle-phase speciation of total PM 2.5 was based on PM mass emissions from wood-and wood-dung-fueled cooking fires as reported by Jayarathne et al. (2018), and primary cooking fuel type distribution data from the 2011 census (Tables 2 and 3). A single PM 2.5 speciation profile, defined as the average of that of wood and that of the wooddung mixture, was applied in all cells for lack of information on pure dung emissions (Table 3). Noncarbon organic particulate matter (P NCOM ) and particulate water (P H 2 O ) were Red indicates cells where dung use dominated emissions and thus was assumed to be the sole fuel type used. Orange indicates cells where wood and agricultural residue use dominated emissions and was thus assumed to be the sole fuel type used. assumed to be negligible owing to a lack of information on these species. Emissions of remaining particle-phase species (i.e., Al, Ca, Fe, Mg, Mn, Si, and Ti) were also assumed to be negligible for lack of information. Unspeciated fine particulate matter (PM othr ) is defined in CMAQ as the portion of total PM 2.5 unassigned to any other species: PM othr = PM 2.5 − P EC + P OC + P Na + P NH 4 + P K (2) +P Cl + P NO 3 + P SO 4 Tables 4 and 5 summarize emission rates for the study domain.

Atmospheric modeling
To study the impact of household emissions on ambient air pollution, we simulated two emission scenarios each for three time periods which coincide with available INCLEN observation data (Tables 6 and 7). A "total" emission scenario represents the overall atmospheric environment by including emissions from all source sectors in the inventory. A "nonresidential" emission scenario represents zeroing-out or "turning-off" of all household emissions. By considering these scenarios independently, we can isolate the effect of the residential sector on the ambient atmosphere. Each scenario was simulated over a region in northern India ( Fig. 1) for those periods when measurements were carried out in the region of interest. Figure 1 shows the 600 km by 600 km domain with 4 km grid resolution. The domain is centered over the Palwal District and the SOMAARTH DDESS and includes New Delhi and portions of surrounding states.  (Jayarathne et al., 2018) Wood, wooddung mix Average profile of wood and wooddung mix applied to all fuel type emissions.
Speciated HCs (Fleming et al., 2018a, b) Wood, dung One profile applied to each cell according to which fuel type dominates emissions in that cell.
Where agricultural residue dominates, wood profile is assumed. Simulation of regional air quality was carried out using the US Environmental Protection Agency Community Multiscale Air Quality modeling system (CMAQ), version 5.2 (Appel et al., 2017;US EPA, 2017). CMAQ is a threedimensional chemical transport model (CTM) that predicts the dynamic concentrations of airborne species. CMAQ includes modules of radiative processes, aerosol microphysics, cloud processes, wet and dry deposition, and atmospheric transport. Required input to the model includes emissions inventories, initial and boundary conditions, and meteorological fields. The domain-specific, gridded emissions inventory provides hourly-resolved total emission rates for each species (not differentiated by source) by cell, time step, and vertical layer. Initial conditions (ICs) and boundary conditions (BCs) are necessary to define the atmospheric chemical concentrations in the domain at the first time step and at the domain edges, respectively. The present study uses the global chemi-  (Skamarock et al., 2008).
Simulations were run for 1 year, after which hourly time series diagnostics were compiled for the CMAQ modeling period. Using the PseudoNetCDF processor, we remapped a subset of the 616 GEOS-Chem-produced species to CMAQ species (https://github.com/barronh/pseudonetcdf, last access: 28 May 2019). The resulting ICs and BCs include 119 gas-and particle-phase species, 80 adapted from GEOS-Chem and the remaining 39 (including OH, HO 2 , ROOH, oligomerized secondary aerosols, coarse aerosol, and aerosol number concentration distributions) from the CMAQ default initial and boundary conditions data (which were developed to represent typical clean-air pollutant concentrations in the United States).

Community Multiscale Air Quality (CMAQ) modeling system
Within the chemical transport portion of CMAQ, there are two primary components: a gas-phase chemistry module and an aerosol chemistry, gas-to-particle conversion module. The present study employs a CMAQ-adapted gas-phase chemical mechanism, CB6R3 (derived from the Carbon Bond Mechanism 06) (Yarwood et al., 2010), and the aerosolphase mechanism, AERO6, which define the gas-phase and aerosol-phase chemical resolution. The present study considers 70 NMHC compounds lumped into 12 groups of VOCs. The emissions inventory provides emission rates for 28 chemical species, including 18 gas-phase species and 10 particle-phase species. The CB6R3 adaptation describes atmospheric oxidant chemistry with 127 gas-phase species and 220 gas-phase reactions, including chlorine and heterogenous reactions. The CMAQ aerosol module (AERO6) describes aerosol chemistry and gas-to-particle conversion with 12 traditional SOA precursor classes, and 10 semivolatile primary organic aerosol (POA) precursor reactions. The majority of the gas-phase organic species are apportioned to lumped groups by their carbon bond characteristics, such as single bonds, double bonds, ring structure, and number of carbons. Some organic compounds are apportioned based on reactivity, and others, like isoprene, ethene, and formaldehyde, are treated explicitly. The secondary organic aerosol module, AERO6, developed specifically for CMAQ, interfaces with the gas-phase mechanism, predicts microphysical processes of emission, condensation, evaporation, coagulation, new particle formation, and chemistry, and produces a particle size distribution comprising the sum of the Aitken, Accumulation, and Coarse log-normal modes (Fig. 4). AERO6 predicts the formation of SOA from anthropogenic and biogenic VOC precursors (properties of which are shown in Table 8), as well as semivolatile POA and cloud processes. CB6R3 accounts for the oxidation of the first-generation products of the anthropogenic lumped VOCs: high-yield aromatics, low-yield aromatics, benzene, PAHs, and long-chain alkanes (Pye and Pouliot, 2012).
In addition to SOA formation from traditional precursors, CMAQv5.2 accounts for the semivolatile partitioning and gas-phase aging of POA using the volatility basis set (VBS) framework independently from the rest of AERO6 (Murphy et al., 2017). The module distributes directly emitted POA (as the sum of primary organic carbon, POC, and noncarbon organic matter, NCOM) from the emissions inventory input into five new emitted species grouped by volatility: LVPO1, SVPO1, SVPO2, SVPO3, and IVPO1 (where LV is low volatility, SV is semivolatile, IV is intermediate volatility, and PO is primary organic). POA is apportioned to these lumped vapor species using an emission fraction and is oxidized in CB6R3 by OH to LVOO1, LVOO2, SVOO1, SVOO2, and SVOO3 (where OO denotes oxidized organics) with stoichiometric coefficients derived from the 2D-VBS model. AERO6 then partitions the semivolatile primary organics and their oxidation products to the aerosol phase (Fig. 4). Thus, the treatment of POA as semivolatile products leads to an additional twenty species, a particle-and vapor-phase component for each primary organic and oxidation product (Murphy et al., 2017).
Emissions inventory modifications were required to match the most recent aerosol module, AERO6, in the CMAQ model. Initially, the lumped emissions of PAR (a lumped VOC group characterized by alkanes) and XYL (a lumped VOC group characterized by xylene) derived from grouping Blue indicates species and processes predicted by CB6R3. All other coloring indicates the AERO6 mechanism where green arrows are two-product volatility distribution, orange arrows are particle-and vapor-phase partitioning, and purple arrows are oligomerization. In AERO6, anthropogenic and biogenic VOC emissions (lumped by category) are oxidized by OH, NO, and HO 2 and OH, O 3 , NO, and NO 3 respectively, to semivolatile products that undergo partitioning to the particle phase (Pye et al., 2015). Semivolatile primary organic pathways in CMAQv5.2 are described by Murphy et al. (2017). The semivolatile reaction products of "long alkanes" (SV_ALK1 and SV_ALK2) are parameterized by Presto et al. (2010). Values for "low-yield aromatics" products (SV_XYL1 and SV_XYL2) are based on xylene, with the enthalpy of vaporization ( H vap ) from studies of m-xylene and 1,3,5-trimethylbenzene. H vap for products of "high-yield aromatics" (SV_TOL1 and SV_TOL2) are based on the higher end of the range for toluene. The products of benzene (SV_BNZ1 and SV_BNZ2) assume the same value for H vap . All semivolatile aromatic products are assigned stoichiometric yield (α) and effective saturation concentration (C * ) values from laboratory measurements by Ng et al. (2007). Remaining parameters for PAH reaction products (SV_PAH1 and SV_PAH2) are taken from Chan et al. (2009). Properties of semivolatile primary organic aerosol precursors are given in Murphy et al. (2017).   Here the yellow lines correspond to CMAQ predictions of the "total" (solid) and "nonresidential" (dotted) simulations. The solid black line represents ambient observations. Standard deviations of the diurnal profiles for observations and predictions are indicated, respectively, by colored shading. Diurnal profiles were averaged over simulation durations (Table 7). Computations were carried out at 4 km resolution.

Surface observational data
Gas-phase air quality data analyzed in the present study come from the Central Pollution Control Board (CPCB) of the Ministry of Environment, Forest & Climate Change of the Government of India at two sites in New Delhi (one in the west, and one in the south) (CPCB, 2019). The particlephase data analyzed come from the SOMAARTH Demographic, Development, and Environmental Surveillance Site (Mukhopadhyay et al., 2012;Pillarisetti et al., 2014;Balakrishnan et al., 2015) managed by INCLEN. Palwal District has a population of ∼ 1 million over an area of 1400 km 2 . In this district, ∼ 39 % of households utilize wood burning as their primary cooking fuel, with dung (∼ 25 %) and crop residues (∼ 7 %) (Census of India, 2011). The specific sites studied are the SOMAARTH HQ in Aurangabad (15 km south of Palwal) and the village of Bajada Pahari (8 km northwest of SOMAARTH HQ). Ambient measurement sites are shown in Fig. 1, and Table 6 details available data for each location. We used meteorological data (hourly surface temperature and near-surface wind speed and direction) from INCLEN and CPCB at the two rural and two urban sites, respectively, to evaluate the WRF simulations performance.

WRF evaluation
We evaluated WRF-simulated meteorology against the available surface observations at different sites during the same periods. Figure 5 shows that there is generally good agreement in surface temperature between WRF and observations for all three months. The surface wind direction is found to be consistent between model and observations for each site and each month ( Table 9). The simulated near-surface wind speeds are overestimated in WRF, with an averaged mean bias (MB) of about +1.5 m s −1 . Such a bias is partly a result Here the green lines correspond to CMAQ predictions of the "total" (solid) and "nonresidential" (dotted) simulations. The solid black line represents ambient observations. Standard deviations of the diurnal profiles for observations and predictions are indicated by gray and green colored shading, respectively. Diurnal profiles were averaged over simulation durations (Table 7). Computations were carried out at 4 km resolution.
of the difference in the definition of "near-surface" between the model and observations.

Particulate matter
Figures 6-9 show measured and predicted total PM 2.5 and the average diurnal profile at each site for the periods with available measurements. The diurnal profile in these figures includes that of both emission scenarios: the total scenario with all emissions and the nonresidential scenario with zeroed-out residential sector. The simulations capture the general trend well and produce significant diurnal profiles (Table 10) Afternoon minima tend to be underestimated in September and December 2015. Diurnal trends of PM 2.5 were weaker in September 2016 than the other months, with lower predictions but overestimated minima. Urban sites show greater overestimation than rural sites. This is likely due in part to the granularity of the primary emissions inventory datasets. The nonresidential sector was prepared from data with a native resolution of 36 km, while the residential sector used data with ∼ 1 km resolution. Underpredictions of peak PM 2.5 concentrations in September could also result because the emission inventory does not account for day-to-day variations, especially in the agricultural burning sector in which emissions can change significantly on a daily basis. Observed and predicted PM 2.5 levels in New Delhi can exceed 300 µg m −3 , Here the pink lines correspond to CMAQ predictions of the "total" (solid) and "nonresidential" (dotted) simulations. The solid black line represents ambient observations. Standard deviations of the diurnal profiles for observations and predictions are indicated by gray and pink colored shading, respectively. Diurnal profiles were averaged over simulation durations (Table 7). Computations were carried out at 4 km resolution.
especially in winter. In this highly populated urban environment, particulate matter levels are more than double those reported in the nearby rural areas. The employed emissions inventory specifies particulate matter surface emissions, which surpass those of Bajada Pahari and SOMAARTH HQ more than 30-fold (Table 5). Biogenic emissions are predicted to be of little importance, accounting for less than 10 % on average of total PM 2.5 concentrations for most stations and months (Table 10). Figure 10 shows CMAQ predictions of secondary organic PM 2.5 (SOA). Like PM 2.5 , SOA is typically predicted to be higher in New Delhi than in the rural sites, due to higher PM 2.5 and precursor VOC emissions and ambient concentrations in urban environments (Tables 5 and 6). Higher levels are similarly attained in December than in September due to longer residence times and more aging during winter. SOA has high day-to-day variability. Values range from below 20 µg m −3 to over 200 µg m −3 in December, with average peaks up to 55 µg m −3 at the rural sites. September months predict lower SOA, ranging from 10 to 130 µg m −3 . Diurnal average SOA maximum in December for the rural stations is nearly double that of September 2016, which can be attributed to temperature inversions and a shallower planetary boundary layer in winter.
The significance of household emissions on outdoor PM 2.5 concentrations is demonstrated by the diurnal profiles in Fig. 11. Figure 11a, b, and c the predicted contribution of the residential sector to anthropogenic PM 2.5 , while Fig. 11 d, e, and f describe the predicted contribution of the residential sector to secondary organic PM 2.5 , as in Eqs. (7) and (8) respectively: Residential anthropogenic PM 2.5 Total anthropogenic PM 2.5 , Residential SOA Total SOA .
(8) Figure 11g, h, and i show the predicted SOA portion of residential PM 2.5 , as Residential SOA Residential PM 2.5 , Here the blue lines correspond to CMAQ predictions of the "total" (solid) and "nonresidential" (dotted) simulations. The solid black line represents ambient observations. Standard deviations of the diurnal profiles for observations and predictions are indicated by gray and blue colored shading, respectively. Diurnal profiles were averaged over simulation durations (Table 7). Computations were carried out at 4 km resolution.
where residential PM is calculated as the difference in predictions from the nonresidential and total emission scenario and averaged over simulation durations (Table 7). The importance of household emissions to ambient PM is strongly correlated with mealtimes. Predicted maximum contributions to anthropogenic PM 2.5 in Bajada Pahari and SOMAARTH HQ are about double that of South and West New Delhi for each month. Household energy use is estimated to account for up to 27 % of anthropogenic PM 2.5 (at SOMAARTH HQ during September 2016), remaining consistently above 10 % for each rural site during all months. Similar behav-ior is predicted for SOA (Fig. 11b, e, and h). An estimated 15 % to 34 % of secondary organic matter is attributable to residential emissions in September and 2016. Again, the impact is smaller in West and South New Delhi (up to 19 % and 21 %, respectively in September 2016), where there are greater emissions of SOA precursors from other sectors. The diurnal profile of the contribution to SOA is subdued for all sites in December, suggesting that SOA generation is less efficient in winter when radiation and temperatures are lower. The aging of VOCs is captured by the phase shift of the impact on SOA daily trend, where peaks consistently occur an Statistics are calculated for average diurnal profiles of predicted parameters. PM 2.5 , O 3 , and SOA are the mass concentrations in micrograms per cubic meter (µg m −3 ) of total fine particulate matter, ozone, and secondary organic matter, respectively. F bio is the fraction of total PM 2.5 that is produced by biogenic emissions, F SOA,res is the fraction of total secondary organic matter attributable to the residential sector, Fan,res is the fraction of total anthropogenic PM 2.5 attributable to the residential sector, and F res,SOA is the fraction of residential PM 2.5 attributable to SOA. PRE is mean predictions, OBS is mean observations, MB is mean bias, ME is mean error, and RMSE is root mean square error. Standard deviation of predictions and observations are noted in parentheses.
hour after the residential sector shows the greatest importance to anthropogenic PM 2.5 . At each measurement site during all months, SOA is predicted to make up more than 40 % of PM 2.5 produced by the residential sector on average ( Fig. 11g-i). SOA is least significant to residential PM 2.5 in the first half of mealtimes (∼ 20 % during breakfast and ∼ 40 % during dinner) at rural sites, when primary particulate matter is largest. The aging of precursor VOCs from cooking emissions, paired with maximum incoming radiation, leads to maximum Residential SOA Residential PM 2.5 values in early afternoon, when SOA accounts for more than 75 % of residential PM 2.5 at both rural and urban sites during each simulated month.
The fractional contribution of total SOA to total PM 2.5 is shown in Fig. 12. While concentrations of SOA depend significantly on the site and time period, their contribution to total PM 2.5 shows little variation. At all stations, SOA is predicted to make up to 55 % of PM 2.5 in September months and to be most significant around midday. However, diurnal variation of the significance of SOA is greater in New Delhi than in Bajada Pahari or SOMAARTH HQ, owing to greater di-versity of energy-use activities and emissions characteristics in the urban environment.

Ozone
The 8 h India Central Pollution Control Board (CPCB) standard for ozone is 100 µ m −3 for an 8 h average. In the alternative unit of ozone mixing ratio, a mass concentration of ozone of 100 µg m −3 at a temperature of 298 K at the Earth's surface equates to a mixing ratio of 51 parts per billion (ppb). A number of atmospheric modeling studies of ozone over India exist (Kumar et al., 2010;Chatani et al., 2014;Sharma et al., 2016). Sharma et al. (2016) carried out baseline CMAQ simulations for 2010 and compared ozone predictions with measurements at six monitoring locations in India (Thumba, Gādanki, Pune, Anantpur, Mt. Abu, and Nainital). Also carried out were sensitivity simulations in which each emissions sector (transport, domestic, industrial, power, etc.) was systematically set to zero. The domestic sector was predicted to contribute ∼ 60 % of the nonmethane volatile organic carbon emissions, followed by 12 % from transportation and 20 % from solvent use and the oil and gas sector. The  (Table 7). Computations were carried out at 4 km resolution. Statistics are shown in Table 10. overall NO x -to-VOC mass ratio in the region simulated by Sharma et al. (2016) was 0.55. This exceptionally low NO xto-VOC ratio was attributed, in part, to the widespread use of biomass fuel for cooking (leading to high VOC emissions), coupled with relatively low NO x emissions. (Although vehicle emissions are high in urban areas, overall vehicle ownership is relatively low at the national level. In addition, Euroequivalent norms have led to a reduction in NO x emissions.) Predicted O 3 levels at the six observation sites tended to exceed measured values, with the ratio of predicted to observed annual average O 3 being in the range of 1.04-1.37 at the six locations. Moreover, the overall low NO x -to-VOC ratios in India lead to NO x -sensitive O 3 formation conditions. Based on emissions inventories, the overall anthropogenic NMVOC / NO x mass emissions ratio in India in 2010 as computed by Sharma et al. (2016) was 1.82. Considering only ground-level sources, the ratio increases to 3.68.
Ozone surface measurements and predicted mass concentrations based on the CMAQ 4 km resolution simulations at two sites in New Delhi over the periods 7-29 September 2015, 7-30 December 2015, and 7-29 September 2016 in the present study are shown in Fig. 13a-c. The predicted O 3 concentrations are reproduced well at the West New Delhi and South New Delhi stations, especially in September (Table 10). However, when NO concentrations are higher due to meteorological inversion conditions, ozone concentrations are underestimated, as local NO + O 3 titration reactions near the monitoring site are not resolved. The performance of the model improves with regard to its prediction of higher values of ozone (as in the case of September), which are of greater importance for assessing exposures. High ozone concentrations in September are quite well reproduced by the model. This shows that, on the larger scale, the model captures photochemistry quite well; however, micro-scale titration is not (d-f), and Residential SOA Residential PM 2.5 (g-i). Bajada Pahari is shown in yellow, SOMAARTH HQ in green, West New Delhi in pink, and South New Delhi in blue. Shading indicates mealtimes. Residential PM is calculated as the difference in predictions from the nonresidential and total emission scenario and averaged over simulation durations (Table 7). Computations were carried out at 4 km resolution. Statistics are shown in Table 10. well represented due to the limitations of inventory resolution. This would require further enhancement of emission inventories at even higher resolution. The results of ozone simulations in the present study are generally consistent with those of previous simulations over India. For example, also using WRF-CMAQ, Kota et al. (2018) showed that the relative bias in ozone simulation ranges from −30 % to +50 % in the major cities of India. In South New Delhi, the bias in O 3 predictions in the present study lies between −2.67 and +7.01 µg m −3 , as compared to the observations of 29.28 to 62.76 µg m −3 .

Conclusions
Air quality in India is determined by a mixture of industrial and motor vehicle emissions, and anthropogenic fuel combustion, that includes residential burning of biomass for household uses, such as cooking. Average daily PM 2.5 levels frequently exceed the 24 h standard of 60 µg m −3 and can exceed 200 µg m −3 , even in rural areas. PM 2.5 is a mixture of directly emitted particulate matter and that formed by the atmospheric conversion of volatile organic compounds to secondary organic aerosol. Here, we assess the extent to which observed O 3 and PM 2.5 levels in India can be predicted using state-of-the-science emissions inventories and atmospheric chemical transport models. We have focused on the 308 km 2 of the SOMAARTH Demographic, Development, and Envi-  (Table 7). Computations were carried out at 4 km resolution.
ronmental Surveillance Site (DDESS) in the Palwal District of Haryana, India.
Atmospheric simulation of particulate matter levels over a complex region like India tends to be demanding, owing to the combination of a wide range of primary particulate emissions and the presence of secondary organic matter from atmospheric gas-phase reactions generating lowvolatility gas-phase products that condense into the particulate phase, forming secondary organic aerosol (SOA). Consequently, the main focus of the present work has been the evaluation of the extent to which ambient particulate matter levels over the current region of India can be predicted. Simulations capture the general trend of observed daily peaks and lows of particulate matter, with PM 2.5 reaching values as high as 250 µg m −3 . Secondary organic matter accounts for 10 % to 55 % of total PM 2.5 mass on average. In India, over 50 % of households report use of wood, crop residues, or dung as cooking fuel; such fuels produce significant gas-and particle-phase emissions. We evaluated the fractional impact of the residential sector emissions on the formation of secondary organic aerosol, as a function of time of day, for New Delhi, SOMAARTH HQ, and Bajada Pahari. The predicted fractional contribution of residential sector emissions to secondary organic PM 2.5 in Bajada Pahari and SOMAARTH HQ reaches values as high as 34 % and, moreover, displays a distinct diurnal profile, with maxima corresponding to the morning and evening mealtimes. In both rural and urban areas, SOA is predicted to account for more than 40 % of residential PM 2.5 , reaching up to 80 % in early afternoon in September months.  (Table 7). Computations were carried out at 4 km resolution.
Simulations of ozone levels in New Delhi reported here are largely in agreement with ambient monitoring data, although the simulations fail to capture several 1-to 2-day ozone episodes that exceed predictions by a factor of 2 or more. The overall agreement between observed and predicted O 3 levels, also demonstrated in the study of Sharma et al. (2016), suggests that gas-phase atmospheric chemistry over India is reasonably well understood. While ozone and particulate matter were simulated for September and December months, we employed a single emissions inventory, regardless of season. Thus, the inventory does not capture December-specific characteristics, including heating combustion. Furthermore, information regarding household solvent use, emissions profiles by fuel type, and speciation of certain emissions (such as semivolatile organic compounds and intermediate volatility organic compounds) is lacking. Variation in the resolution of specific input data additionally contributes to uncertainty.
Air quality studies such as the present one provide a quantification of the elements of atmospheric composition in India, especially that owing to household sources. The importance of replacing traditional household combustion devices with modern technology is evident in studies such as the present one.
Data availability. The gridded data files of PM 2.5 used in this study are available from the authors upon request by email. Surface measurements of various atmospheric chemicals and meteorology are available from the Central Pollution Control Board (CPCB) of the Ministry of Environment and Forests of the Government of India at http://www.cpcb.gov.in/CAAQM/frmUserAvgReportCriteria. aspxTS1 (CPCB, 2019; last access: 28 May 2019). Initial and boundary condition data for WRF meteorological simulations are from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5, generated using Copernicus Climate Change Service Information and available at https://cds.climate.copernicus. eu/cdsapp\T1\textbackslash#!/home (Copernicus Climate Change Services, 2017; last access: 28 May 2019). Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains.
Author contributions. BR carried out the simulations and wrote the paper. RZ, YW and KB assisted with the simulations. AP carried out field measurements. SS, SK, TB, NL, BO, LX, and VG helped formulate the emissions inventory. LF, RW, SM, and DB designed and carried out measurements. SN, RE, AY, and NA performed data analysis. KS designed the research. JS designed the research and wrote the paper.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. This work was supported by EPA STAR grant R835425 Impacts of Household Sources on Outdoor Pollution at Village and Regional Scales in India. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the US EPA. Yuan Wang appreciates the support from the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration, and support from the US National Science Foundation (award no. 1700727).
Financial support. This research has been supported by EPA STAR (grant no. R835425).
Review statement. This paper was edited by Qiang Zhang and reviewed by three anonymous referees.