Supplemental Material Ensemble Predictions of Air Pollutants in China in 2013 for Health Effects Studies Using WRF / CMAQ Modeling System with Four Emission Inventories

1 Jiangsu Key Laboratory of Atmospheric Environment Monitoring and Pollution Control, Jiangsu Engineering Technology Research Center of Environmental Cleaning Materials, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, School of Environmental Science and Engineering, Nanjing University of Information Science & Technology, 219 Ningliu Road, Nanjing 210044, China
2 Zachry Department of Civil Engineering, Texas A&M University, College Station, TX 77843-3136
3 Ministry of Education Key Laboratory for Earth System Modeling, Center for Earth System Science, Tsinghua University, Beijing, China
4 State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing 100084, China
5 Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA 77803


Introduction
A large population in China has been exposed to severe air pollution in recent decades as a consequence of intensive energy use without efficient control measures. Based on ambient air pollution data published by the China National Environmental Monitoring Center (CNEMC), most major cities are in violation of the Chinese Ambient Air Quality Standards Grade II standard (35 μg m⁻³) for annual average particulate matter with a diameter of 2.5 μm or less (PM2.5) (Zhang and Cao, 2015; Wang et al., 2014b), with a mean population-weighted PM2.5 concentration of over 60 μg m⁻³ during 2013-2014. Long-term exposure to such high levels of PM2.5 greatly threatens public health in China. Recent studies have suggested that more than one million premature deaths can be attributed to outdoor air pollution each year in China (Lelieveld et al., 2015; Liu et al., 2016; Hu et al., 2017a).

Accurate exposure estimates are required in health effects studies. Central monitor measurements are usually used in exposure assessment, but the routine central monitoring network in China has only been built up since 2013. It is still limited in spatial coverage and lacks detailed information on chemical composition, size fractions, and source origins. Chemical transport models (CTMs) have been widely used in health effects studies to overcome the limitations of central monitor measurements for exposure estimates (Philip et al., 2014; Lelieveld et al., 2015; Liu et al., 2016; Laurent et al., 2016a; Laurent et al., 2016b; Ostro et al., 2015). However, the accuracy of CTM predictions is largely affected by the accuracy of the emission inventories (Wang et al., 2010), the meteorological fields (Hu et al., 2010), and the numerical solutions to the equations that describe various atmospheric processes (Hu et al., 2006; Yu et al., 2005).

Emission inventories are indispensable tools for a wide range of environmental activities, from the management of chemicals to the prevention of air pollution. Several
emission inventories have been created for China. Different emission inventories focus on specific geographical regions at the urban, regional (Zhao et al., 2012; Zhang et al., 2008), national, or continental (Zhang et al., 2009; Kurokawa et al., 2013) scales, and/or focus on pollutants from individual (Su et al., 2011; Ou et al., 2015) and specific sectors (Zhao et al., 2008; Xu et al., 2017). Despite the great efforts to improve the accuracy of emission inventories in China, large uncertainties remain. Generally, the emissions of pollutants are estimated as the product of activity levels (such as industrial production or energy consumption), unabated emission factors (i.e., mass of emitted pollutant per unit activity level), and the efficiency of emission controls and their fractional penetration into the industries. Large uncertainties are associated with activity levels, emission source fractions, and emission factors (Akimoto et al., 2006; Lei et al., 2011a). The uncertainties are especially significant for some pollutants, such as ammonia (NH3) and volatile organic compounds (VOCs). For example, for a Pearl River Delta (PRD) inventory for 2006, Monte Carlo simulations showed that SO2 emissions from power plant sources have low uncertainties of -16% to 21%. However, NOx has medium to high uncertainties of -55% to 150%, and VOC, CO, and PM have even higher uncertainties (Zheng et al., 2009). For an inventory for the Yangtze River Delta (YRD) region, the overall uncertainties for CO, SO2, NOx, PM10, PM2.5, VOCs, and NH3 emissions are ±47.1%, ±19.1%, ±27.7%, ±117.4%, ±167.6%, ±133.4%, and ±112.8%, respectively (Huang et al., 2011). A comprehensive quantification study by Zhao et al. (2011) using Monte Carlo simulations showed that the uncertainties of Chinese emissions of SO2, NOx, PM2.5, BC, and OC in 2005 are -14% to 13%, -13% to 37%, -17% to 54%, -25% to 136%, and -40% to 121%, respectively.

Atmos. Chem. Phys. Discuss., doi:10.5194/acp-2017-182, 2017. Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 16 May 2017. © Author(s) 2017. CC-BY 3.0 License.

The uncertainties in emission inventories are carried into CTM simulations, leading to biases in air quality predictions, which need to be carefully evaluated to identify the useful information that can be used in health effects studies (Hu et al., 2016b; Hu et al., 2014c; Hu et al., 2014b; Hu et al., 2015b; Tao et al., 2014). An evaluation of one-year air pollutant predictions using the Weather Research and Forecasting (WRF) / Community Multi-scale Air Quality (CMAQ) modeling system with the Multi-resolution Emission Inventory for China (MEIC) has been reported (Hu et al., 2016a). The model predictions of O3 and PM2.5 generally agree with ambient measured concentrations, but the model performance varies among regions and seasons. In some regions, such as the northwest of China, the model significantly under-predicted PM2.5 concentrations.

The ensemble technique is often used to reduce uncertainties in model predictions by combining multiple sets of predictions. This technique has been widely used in climate predictions (Murphy et al., 2004; Tebaldi and Knutti, 2007) and has recently been adopted in air quality predictions (Delle Monache et al., 2006; Huijnen et al., 2010). A recent study compared several anthropogenic emission inventories in China during 2000-2008 (Saikawa et al., 2016), but a detailed evaluation of air quality model results based on these inventories over an extended time period has not been performed. Methods that utilize the strengths of different emission inventories to obtain improved air quality predictions for China have not been reported in the literature.

The aim of this study is to create an improved set of air quality predictions in China by using an ensemble technique. First, four sets of one-year air quality predictions were conducted with the WRF/CMAQ modeling system with four different anthropogenic emission
inventories for China for the entire year of 2013. In addition to MEIC, the three other emission inventories are the Emissions Database for Global Atmospheric Research (EDGAR), the Regional Emission inventory in Asia version 2 (REAS2), and the Emission Inventory for China developed by the School of Environment at Tsinghua University (SOE). The model performance on PM2.5 and O3 concentrations in 2013 with the different emission inventories was then evaluated against available observation data in China. The differences among the air quality predictions were also compared and identified. Finally, an ensemble technique was developed to minimize the bias of model predictions and to create improved exposure predictions. To the authors' best knowledge, this is the first ensemble model study in China using multiple emission inventories. The ensemble predictions of this study are available for public health effects analyses.

Method

Model description
In this study, the applied CMAQ model is based on CMAQ v5.0.1 with changes to improve the model's performance in predicting secondary organic and inorganic aerosol. The details of these changes can be found in previous studies (Hu et al., 2016a; Hu et al., 2017b), so only a brief summary is given here. The gas phase photochemical mechanism SAPRC-11 was modified to better treat isoprene oxidation chemistry (Ying et al., 2015; Hu et al., 2017b). Formation of secondary organic aerosol (SOA) from reactive uptake of dicarbonyls, methacrylic acid epoxide, and isoprene epoxydiol through a surface pathway (Li et al., 2015; Ying et al., 2015) was added. Corrected SOA yields that account for vapor wall-loss (Zhang et al., 2014) were adopted. Formation of secondary nitrate and sulfate through heterogeneous reactions of NO2 and SO2 on particle surfaces (Ying et al., 2014) was also incorporated. It has been shown that these modifications improve the model performance on secondary inorganic and organic PM2.5 components.

Anthropogenic emissions
The CMAQ model was applied to China and surrounding countries in East Asia at a horizontal resolution of 36 km. The anthropogenic emissions are from four different inventories: MEIC, SOE, EDGAR, and REAS2.

MEIC was developed by a research group at Tsinghua University (http://www.meicmodel.org). Compared with other inventories for China, e.g., INTEX-B (Zhang et al., 2009) or TRACE-P (Streets et al., 2003), the major improvements include a unit-based inventory for power plants (Wang et al., 2012) and cement plants (Lei et al., 2011b), a county-level high-resolution vehicle inventory (Zheng et al., 2014), and a novel NMVOC speciation approach (Li et al., 2014). The VOCs were speciated to the SAPRC-07 mechanism. Because the mapping of detailed species to model species in the SAPRC-11 mechanism is essentially the same as in the SAPRC-07 mechanism (Carter and Heo, 2012), the speciated VOC emissions in the MEIC inventory were used directly in the simulation.

The SOE emission inventory was developed using an emission factor method (Wang et al., 2011; Zhao et al., 2013b). The sectoral emissions in different provinces were calculated based on activity data, technology-based uncontrolled emission factors, and penetrations of control technologies. Elemental carbon (EC) and organic carbon (OC) emissions were calculated based on PM2.5 emissions and their ratios to PM2.5. The sectoral activity data and technology distribution were obtained using an energy demand modeling approach with various Chinese statistics and technology reports. More details, including the spatio-temporal distributions and speciation of NMVOC emissions, can be found in previous publications (Zhao et al., 2013b; Wang et al., 2011; Zhao et al., 2013a). Since the MEIC and SOE emission inventories only cover China, emissions from countries and regions outside China were based on REAS2 (Kurokawa et al., 2013).

Version 4.2 of the EDGAR emission inventory (http://edgar.jrc.ec.europa.eu/overview.php?v=42) has a spatial resolution of 0.1° × 0.1°. The EDGAR inventory contains annual emissions from different sectors based on IPCC designations. REAS2 has a spatial resolution of 0.25° × 0.25° for the whole of Asia. The inventory contains monthly emissions of pollutants from different source categories. Saikawa et al. (2016) compared the major features of different anthropogenic emission inventories for China. Detailed information regarding these inventories can be found in the publications presenting them.

Table S1 shows the total emissions of major pollutants within China on a typical workday of each season. In general, large differences exist among the inventories for China. MEIC has the highest CO emissions in January, while REAS2 has the highest in the other three seasons. MEIC has the highest NOx emissions, while REAS2 has the highest emissions of VOCs in all months. EDGAR predicts the highest SO2 emissions, which are approximately a factor of two higher than those estimated by SOE. SOE has the highest NH3 emissions, while EDGAR has much lower NH3 emissions than the other three. EDGAR also has the lowest EC and OC emissions, but its total PM2.5 emissions are the highest. Standard deviations (SD) indicate that January has the largest uncertainties for all species except SO2 and NH3. January has the smallest SO2 uncertainties, while July has the largest NH3 uncertainties.

All the emission inventories were processed with an in-house program and re-gridded onto the 36-km resolution CMAQ domain when necessary. Representative speciation profiles based on the SPECIATE 4.3 database maintained by the U.S. EPA were applied to split NMVOC from EDGAR and REAS2 into SAPRC-11 mechanism species. PM2.5 was also speciated into AERO6 species using profiles from the SPECIATE 4.3 database. Monthly emissions were temporally allocated into hourly files using temporal allocation profiles from previous studies (Chinkin et al., 2003; Olivier et al., 2003; Wang et al., 2010a). More details regarding EDGAR can be found in Wang et al. (2014a), while those for REAS2 can be found in Qiao et al. (2015).
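The emission factor method described for the SOE inventory can be illustrated with a short Python sketch. This is not the inventory developers' code: the function names, the (penetration, efficiency) representation of control technologies, and the numbers in the usage lines are all hypothetical, and the formula assumes that penetrations sum to one with uncontrolled sources listed at zero efficiency.

```python
def controlled_emission(activity, emission_factor, controls):
    """Emission = activity * unabated EF * sum_k penetration_k * (1 - efficiency_k).

    controls: list of (penetration, removal_efficiency) pairs. Penetrations
    are assumed to sum to 1; uncontrolled sources are listed with
    efficiency 0.0 (an assumption of this sketch).
    """
    total_penetration = sum(pen for pen, _ in controls)
    if abs(total_penetration - 1.0) > 1e-9:
        raise ValueError("control penetrations must sum to 1")
    # Fraction of the unabated emission that still reaches the atmosphere.
    remaining = sum(pen * (1.0 - eff) for pen, eff in controls)
    return activity * emission_factor * remaining


def component_emission(pm25_emission, ratio_to_pm25):
    """EC or OC emission as a fixed fraction of the PM2.5 emission."""
    return pm25_emission * ratio_to_pm25


# Hypothetical numbers: 1000 t of fuel, 10 kg PM2.5 per t unabated,
# 60% of capacity with a 90%-efficient control, 40% uncontrolled.
pm25 = controlled_emission(1000.0, 10.0, [(0.6, 0.9), (0.4, 0.0)])
ec = component_emission(pm25, 0.1)  # hypothetical EC/PM2.5 ratio
```

In a real inventory this calculation would be repeated for every sector, province, and pollutant before spatial and temporal allocation.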

Other inputs
The Model for Emissions of Gases and Aerosols from Nature (MEGAN) v2.1 was used to generate biogenic emissions. The 8-day Moderate Resolution Imaging Spectroradiometer (MODIS) leaf area index (LAI) product (MOD15A2) and the plant functional type (PFT) files used in the Global Community Land Model (CLM 3.0) were applied to generate inputs to MEGAN. Readers are referred to Qiao et al. (2015) for more information. Open biomass burning emissions were generated using a satellite observation based fire inventory developed by NCAR (Wiedinmyer et al., 2011). The dust emission module was updated to be compatible with the 20-category MODIS land use data (Hu et al., 2015a) for inline dust emission processing, and sea salt emissions were also generated during the CMAQ simulations.

The meteorological inputs were generated using WRF v3.6.1. The initial and boundary conditions for WRF were downloaded from the NCEP FNL Operational Model Global Tropospheric Analyses dataset. WRF configuration details can be found in Zhang et al. (2012). WRF performance has been evaluated by comparing predicted temperature and relative humidity at 2 m above the surface, and wind speed and wind direction at 10 m, with all available observational data at ~1200 stations from the National Climatic Data Center (NCDC). The model performance is generally acceptable, and detailed evaluation results can be found in a previous study (Hu et al., 2016a). The initial and boundary conditions for CMAQ, representing relatively clean tropospheric concentrations, were generated using CMAQ default profiles.

Model evaluation
Model predictions with the four emission inventories were evaluated against available observation data in China. Hourly observations of PM2.5, PM10, O3, CO, SO2, and NO2 from March to December 2013 at 422 stations in 60 cities were obtained from CNEMC (http://113.108.142.147:20035/emcpublish/). Detailed quality control of the data can be found in previous studies (Hu et al., 2016a; Hu et al., 2014a; Wang et al., 2014b). The statistical metrics of mean normalized bias (MNB), mean normalized error (MNE), mean fractional bias (MFB), and mean fractional error (MFE) were calculated using Equations (E1)-(E4):

$$\mathrm{MNB} = \frac{1}{N}\sum_{i=1}^{N}\frac{C_m - C_o}{C_o} \tag{E1}$$

$$\mathrm{MNE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|C_m - C_o\right|}{C_o} \tag{E2}$$

$$\mathrm{MFB} = \frac{1}{N}\sum_{i=1}^{N}\frac{C_m - C_o}{\left(C_o + C_m\right)/2} \tag{E3}$$

$$\mathrm{MFE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|C_m - C_o\right|}{\left(C_o + C_m\right)/2} \tag{E4}$$

where $C_m$ and $C_o$ are the predicted and observed concentrations, respectively, and $N$ is the total number of observation data. MNB and MNE are commonly used to evaluate model performance for O3, and MFB and MFE are commonly used to evaluate model performance for PM2.5 (Tao et al., 2014).
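As an illustration of the four statistics, the following minimal Python sketch assumes their standard definitions: the bias is normalized by the observation for MNB and MNE, and by the mean of prediction and observation for MFB and MFE. The function name and the sample values are hypothetical, and observations are assumed to be strictly positive.

```python
def performance_stats(pred, obs):
    """Return MNB, MNE, MFB, and MFE for paired predictions and observations."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and of equal length")
    n = len(obs)
    diff = [m - o for m, o in zip(pred, obs)]
    # Denominator of the fractional statistics: mean of prediction and observation.
    pair_mean = [(m + o) / 2.0 for m, o in zip(pred, obs)]
    return {
        "MNB": sum(d / o for d, o in zip(diff, obs)) / n,
        "MNE": sum(abs(d) / o for d, o in zip(diff, obs)) / n,
        "MFB": sum(d / a for d, a in zip(diff, pair_mean)) / n,
        "MFE": sum(abs(d) / a for d, a in zip(diff, pair_mean)) / n,
    }


# Hypothetical concentrations: one under-prediction and one over-prediction.
stats = performance_stats([50.0, 150.0], [100.0, 100.0])
```

Because the fractional statistics are bounded (MFB between -2 and 2, MFE between 0 and 2), they are less dominated by very low observed values than the normalized ones, which is one reason they are preferred for PM.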

Ensemble predictions
The four sets of predictions with different inventories were combined linearly to calculate the ensemble predictions, as shown in Equation (E5):

$$C^{pred,ens} = \sum_{m=1}^{N_m} w_m\, C^{pred,m} \tag{E5}$$

where $C^{pred,ens}$ is the ensemble prediction, $C^{pred,m}$ is the predicted concentration from the $m$th simulation, $N_m$ is the number of simulations in the ensemble ($N_m = 4$), and $w_m$ is the weighting factor of the $m$th simulation. The weighting factor for each set of predictions was determined by minimizing the objective function $Q$ shown in Equation (E6):

$$Q = \sum_{i=1}^{N_{city}} \left[ C_i^{obs} - \sum_{m=1}^{N_m} w_m\, C_i^{pred,m} \right]^2 \tag{E6}$$

where $C_i^{obs}$ is the observed PM2.5 or O3 concentration at the $i$th city, $N_{city}$ is the total number of cities with observations ($N_{city} = 60$), $C_i^{pred,m}$ is the predicted concentration at the $i$th city from the $m$th simulation, and the $w_m$ are the weighting factors to be determined under the constraint $0 < w_m < 1$. The observation data were the same as those used in the model evaluation. Ensemble predictions were performed for PM2.5 and O3 in this study. A MATLAB program was developed to solve the above equation and determine the weighting factors.
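The weight fitting can be sketched in Python as a box-constrained least-squares problem. This is not the authors' MATLAB program: the solver here is a simple cyclic coordinate descent chosen to keep the example dependency-free, the closed bounds [0, 1] stand in for the open constraint 0 < w < 1, and all names and sample data are hypothetical.

```python
def objective(preds, obs, weights):
    """Sum over cities of the squared residual between observation and ensemble."""
    return sum(
        (o - sum(w * p[i] for w, p in zip(weights, preds))) ** 2
        for i, o in enumerate(obs)
    )


def fit_ensemble_weights(preds, obs, lower=0.0, upper=1.0,
                         max_sweeps=500, tol=1e-12):
    """Minimize the squared residual subject to lower <= w_m <= upper.

    preds[m][i] is the prediction of simulation m at city i and obs[i] is
    the observation at city i. Cyclic coordinate descent with clipping is
    monotone for this convex problem but can be slow when the simulations
    are highly correlated.
    """
    n_models = len(preds)
    weights = [1.0 / n_models] * n_models
    for _ in range(max_sweeps):
        largest_change = 0.0
        for m in range(n_models):
            denom = sum(p * p for p in preds[m])
            if denom == 0.0:
                continue
            # Residual with the contribution of simulation m removed.
            partial = [
                o - sum(weights[k] * preds[k][i]
                        for k in range(n_models) if k != m)
                for i, o in enumerate(obs)
            ]
            best = sum(p * r for p, r in zip(preds[m], partial)) / denom
            new_weight = min(upper, max(lower, best))
            largest_change = max(largest_change, abs(new_weight - weights[m]))
            weights[m] = new_weight
        if largest_change < tol:
            break
    return weights


def ensemble_prediction(preds, weights):
    """Linear combination of the individual predictions at every city."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]


# Hypothetical data: two simulations, four cities.
preds = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
obs = [0.3, 0.6, 0.3, 0.6]
weights = fit_ensemble_weights(preds, obs)
combined = ensemble_prediction(preds, weights)
```

Note that the weights are not forced to sum to one, so the ensemble can scale all predictions up or down when every individual simulation is biased in the same direction.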

Model performance on gaseous and particulate pollutants
Table 1 summarizes the overall model performance on O3, CO, NO2, SO2, PM2.5, and PM10 with the different inventories using the averaged observations in 60 cities in 2013. The U.S. EPA previously recommended O3 model performance criteria of within ±0.15 for MNB and less than 0.30 for MNE (as shown in Figure 1) and PM model performance criteria of within ±0.60 for MFB and less than 0.75 for MFE (EPA, 2001). Figure 2 includes the criteria and goals for PM as a function of PM concentration, as suggested by Boylan and Russell (2006), which have been widely used in model evaluation.

Model performance meets the O3 criteria for all inventories. O3 predictions with SOE are 7.2 parts per billion (ppb) lower than the mean observed concentration, while the under-predictions with the other three inventories are less than 2 ppb. CO, NO2, and SO2 are under-predicted with all inventories, indicating potential uncertainties in the inventories. CO predictions from three inventories (the SOE inventory does not include CO) are substantially lower than observations, with the best performance (lowest MNB and MNE) from REAS2. The overall NO2 performance is similar to that of CO; however, MEIC and SOE yield the lowest MNB, and EDGAR yields the highest. SO2 performance is better than that of CO and NO2, and MEIC and SOE yield the lowest MNB, while the MNE values of the four inventories are very similar. PM2.5 and PM10 predictions using all inventories meet the performance criteria with similar MFB and MFE values. REAS2 generally yields slightly better PM2.5 and PM10 performance, but all inventories generally under-predict the concentrations.

The difference in model performance with the four inventories also varies seasonally and spatially. Figure 1 shows the comparison of model performance for hourly gaseous species (O3, CO, NO2, and SO2) in each month from March to December 2013. The MNB values of O3 in most months are within the criteria for all inventories except for SOE, which under-predicts O3 concentrations. March has the worst performance for all inventories, with MNE values larger than 0.4 for MEIC, SOE, and EDGAR. No significant performance differences among inventories are found between months, but large differences exist among regions of China (see the definition of the regions in Figure S1). O3 predicted using MEIC, SOE, and REAS2 meet the criteria for the YRD region by MEIC. O3 predicted using SOE only meets the criteria in the Northwest (NW) and other regions (Other) of China. For CO, NO2, and SO2, model performance in less developed regions such as the central (CNT), NW, and Other regions is worse than in more developed regions.

Figure 2 illustrates the PM2.5 and PM10 performance statistics of MFB and MFE as a function of absolute concentrations in different months of 2013 and in different regions. PM2.5 predictions based on each inventory are within the performance goal for MFB and between the goal and the criteria for MFE in all months. There is no significant difference among inventories. Half of the monthly averaged PM10 MFB values fall within the goal, while the rest are between the goal and the criteria. MFE values of PM10 are all between the goal and the criteria. From the regional perspective, PM2.5 performance in the NE with SOE falls outside the MFB criteria, while that in the Sichuan Basin (SCB) with MEIC, SOE, and REAS2 falls outside the MFE criteria. The MFB of PM10 meets the criteria in all regions except NW, which is largely affected by windblown dust.

Ensemble predictions
The above analyses indicate that model performance with different inventories varies among pollutants and regions. Table 2 shows the observed annual average concentrations of PM2.5 in the 60 cities and the predictions from the four inventories, as well as the weighted ensemble predictions. The weighting factors for predictions using MEIC, REAS2, SOE, and EDGAR are 0.31, 0.36, 0.24, and 0.20, respectively (Table 3). The ensemble predictions greatly reduce the MFB to -0.11, compared to MFB values of -0.25 to -0.16 in the individual simulations. The ensemble predictions also have an MFE value of 0.24, lower than the MFE values of 0.26 to 0.31 in the individual simulations (Figure 6). The ensemble predictions of annual O3-1h have an MNB and MNE of 0.03 and 0.14, improved from MNB of 0.06 to 0.19 and MNE of 0.16 to 0.22 in the individual predictions, respectively.

To further evaluate the ability of the ensemble method to improve predictions at locations where observational data are not available, ensemble predictions were made using a data withholding method. For each city, the observations at the other 59 cities were used to determine the weighting factors in Equation (E6), and the ensemble prediction at the city was calculated. The performance of the ensemble prediction at the city was then evaluated using the withheld observations. The evaluation process was repeated for each of the 60 cities, and the performance was compared to that with the individual inventories (shown in Table 4). The results show that the ensemble predictions are better than those with EDGAR, MEIC, REAS2, and SOE at 36, 37, 32, and 40 cities for PM2.5, and at 39, 39, 43, and 38 cities for O3-1h, respectively. The ensemble predictions are better than two or more of the individual predictions at 45 and 41 cities for PM2.5 and O3-1h, respectively. Of the 15 cities where the ensemble PM2.5 is better than only one or none of the individual predictions, 10 have MFB within ±0.25 and MFE less than 0.25. Of the 19 cities where the ensemble O3-1h is better than only one or none of the individual predictions, 14 still have MNB within ±0.2 and MNE less than 0.2. The results demonstrate that the ensemble can improve the predictions even at locations with no observational data available.

Previous studies have revealed that CTM predictions agree better with observations when averaged over longer times (i.e., annual vs. monthly vs. daily averages) (Hu et al., 2014b; Hu et al., 2015b). Ensemble predictions were also calculated with daily and monthly averages for PM2.5, in addition to the calculation with annual averages discussed above. The weighting factors and the performance of the ensemble predictions are shown in Table 3 and Figure 6, respectively. The weighting factors vary largely with the averaging time, suggesting that the prediction optimization needs to be conducted separately for different time averages. The ensemble predictions improve the agreement with observations in all averaging time cases, with lower MNB and MNE than any of the individual predictions.

Table 5 shows the ensemble prediction performance on PM2.5 and O3-1h in different regions of China using the daily average observations and daily average predictions with individual inventories. The weighting factors vary greatly among regions, reflecting the substantial differences in the spatial distributions of PM2.5 and O3 when using different inventories. The MNB and MNE values of the ensemble predictions are reduced in all regions for both pollutants, suggesting that the ensemble predictions improve the accuracy and are better suited for further health effects studies. Similar findings are obtained with the monthly average observations and predictions (shown in Table S3).

Figure 7 shows the spatial distributions of PM2.5 and its components from the ensemble predictions using the weighting factors of annual averages. The ensemble PM2.5 components were calculated using the same weighting factors as for PM2.5. Over 80 µg m⁻³ annual average PM2.5
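The data withholding evaluation described in this section can be sketched as a leave-one-city-out loop. The Python below is an illustrative sketch only: the compact weight solver (cyclic coordinate descent with weights clipped to [0, 1]) stands in for the authors' MATLAB program, the absolute error at the withheld city is an assumed scoring choice, and all names and data are hypothetical.

```python
def fit_weights(preds, obs, sweeps=200):
    """Least-squares ensemble weights with 0 <= w <= 1 (coordinate descent)."""
    k = len(preds)
    w = [1.0 / k] * k
    for _ in range(sweeps):
        for m in range(k):
            denom = sum(p * p for p in preds[m])
            if denom > 0.0:
                # Residual with the contribution of simulation m removed.
                resid = [o - sum(w[j] * preds[j][i] for j in range(k) if j != m)
                         for i, o in enumerate(obs)]
                step = sum(p * r for p, r in zip(preds[m], resid)) / denom
                w[m] = min(1.0, max(0.0, step))
    return w


def leave_one_out(preds, obs):
    """Fit weights without each city in turn, then predict the withheld city."""
    results = []
    for held in range(len(obs)):
        keep = [i for i in range(len(obs)) if i != held]
        train_preds = [[p[i] for i in keep] for p in preds]
        train_obs = [obs[i] for i in keep]
        w = fit_weights(train_preds, train_obs)
        ens = sum(wm * p[held] for wm, p in zip(w, preds))
        results.append({
            "city": held,
            "weights": w,
            "ensemble": ens,
            "abs_error": abs(ens - obs[held]),
            # Errors of the individual simulations at the withheld city.
            "individual_abs_error": [abs(p[held] - obs[held]) for p in preds],
        })
    return results


# Hypothetical data: two simulations, six cities.
preds = [[1.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]]
obs = [0.4 * a + 0.7 * b for a, b in zip(*preds)]
results = leave_one_out(preds, obs)
```

Counting the cities where `abs_error` is smaller than each entry of `individual_abs_error` gives a tally comparable in spirit to the per-inventory city counts reported above.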

Conclusion
In this study, air quality predictions in China in 2013 were conducted using the WRF/CMAQ modeling system with anthropogenic emissions from four inventories: MEIC, SOE, EDGAR, and REAS2. Model performance with the four inventories was evaluated by comparison with available observation data from 422 sites in 60 cities in China. Model predictions of hourly O3 and PM2.5 with the four inventories generally meet the model performance criteria, but model performance with different inventories varies among pollutants and regions. To improve the overall agreement of the predicted concentrations with observations, ensemble predictions were calculated by linearly combining the predictions from the different inventories. The ensemble annual concentrations show improved agreement with observations for both PM2.5 and O3-1h. The MFB and MFE of the ensemble predictions of PM2.5 at the 60 cities are -0.11 and 0.24, respectively, which are better than the MFB (-0.25 to -0.16) and MFE (0.26 to 0.31) of any individual simulation. The ensemble predictions of annual O3-1h have an MNB and MNE of 0.03 and 0.14, improved from the MNB (0.06 to 0.19) and MNE (0.16 to 0.22) of the individual predictions. The ensemble predictions with the data withholding method at each city show better performance than the predictions with individual inventories at most cities, demonstrating the ability of the ensemble to improve the predictions at locations where observational data are not available. The ensemble predictions agree better with observations at daily, monthly, and annual averaging times in all regions of China. The study demonstrates that ensemble predictions that combine predictions from individual emission inventories can improve the accuracy of the estimated concentrations and spatial distributions of air pollutants.

The products of the current study can be further applied in health effects studies. For example, they can be used to estimate the spatial distribution of excess mortality due to adult (> 30 years old) ischemic heart disease (IHD), cerebrovascular disease (CEV), chronic obstructive pulmonary disease (COPD), and lung cancer (LC) in China caused by PM2.5 exposure (Hu et al., 2017a). Any health study requiring information on human exposure to different pollutants would benefit from this study. The data presented in the paper are available for download upon request.

Figure 3
Figure 3 shows the spatial distribution of annual averaged gas species, 1-hour peak O3 (O3-1h), 8-hour O3 (O3-8h), NO2, and SO2, predicted by MEIC and the differences between SOE, EDGAR, and REAS2 and MEIC. MEIC-predicted annual O3-1h concentrations are ~60 ppb in most parts of China, with the highest values of ~70 ppb in SCB. SOE predicts lower O3-1h values over the whole domain, with differences of about 5 ppb in the SCB, CNT, and North China Plain (NCP) regions and 2-3 ppb in other regions. EDGAR also predicts 2-3 ppb lower O3-1h than MEIC in most regions, but its O3-1h predictions in the Tibet Plateau, NCP, and ocean regions are 2-3 ppb higher than the MEIC predictions. REAS2-predicted O3-1h values are lower than those of MEIC in scattered areas in the NE, NW, and CNT regions, and slightly higher in other regions. MEIC, SOE, and REAS2 have similar results outside China since the simulations used the same emissions for those regions. O3-8h shows similar spatial distributions to O3-1h among inventories, with slightly smaller differences.

NO2 concentrations predicted by MEIC are 10-15 ppb in developed areas of the NCP and YRD regions and greater than 5 ppb in other urban areas. SOE predicts 2-3 ppb lower NO2 concentrations in most areas except the vast NW region. EDGAR predicts lower NO2 (more than 5 ppb difference) in urban areas of the NCP and YRD but higher concentrations, by approximately 1-2 ppb, in the entire west part of China. REAS2 has the closest NO2 to MEIC, as the 1-2 ppb underestimation and overestimation are almost evenly distributed over the whole country.

SO2 concentrations are up to 20 ppb in the NCP, CNT, and SCB regions, while they are less than 5 ppb in other regions. SOE mostly predicts 2-3 ppb lower SO2 in the east half of China, with the largest difference of -10 ppb in the CNT region. EDGAR and REAS2 show very similar differences from MEIC, i.e., more than 5 ppb higher concentrations in the NCP and YRD, ~2 ppb higher concentrations in the PRD, 2-3 ppb lower
concentrations in the NE, and up to 5 ppb lower concentrations in the CNT and SCB.

Figure 4 shows the seasonal distribution of PM2.5 total mass predicted by MEIC and the differences between SOE, EDGAR, and REAS2 and MEIC. In spring, MEIC-predicted PM2.5 concentrations are ~50 µg m⁻³ in the east and south parts of China, and South Asia has the highest value of ~100 µg m⁻³. SOE predicts 5-10 µg m⁻³ lower PM2.5 in north China and <5 µg m⁻³ higher values in south China and along the coastline. EDGAR predicts >20 µg m⁻³ lower values in NCP and ~10 µg m⁻³ lower values in NE, CNT, and SCB, but up to 20 µg m⁻³ higher values in PRD. REAS2 predicts higher PM2.5 values in most parts of China, except for lower predictions in NE and SCB. The higher predictions in YRD and NCP reach 20-30 µg m⁻³.

In summer, the high PM2.5 regions are much smaller than in spring, with concentrations of ~50 µg m⁻³ in NCP, the north part of YRD, and SCB, and 20-30 µg m⁻³ in other parts. Generally, SOE predicts <10 µg m⁻³ lower values in most regions. EDGAR predicts lower values in NCP and SCB and 5-10 µg m⁻³ higher values in the south. REAS2 predicts higher values in almost all regions except some scattered areas in NCP, YRD, and SCB.

In fall, PM2.5 concentrations are larger than 50 µg m⁻³ in most regions except NW and are ~100 µg m⁻³ in parts of NCP, CNT, and SCB. SOE-predicted values are lower in the north and higher in the south. EDGAR predicts up to 30 µg m⁻³ lower values in NCP and SCB, and up to 20 µg m⁻³ higher values in YRD. REAS2 again estimates values close to MEIC, with differences of less than 5 µg m⁻³ in most regions and up to 20 µg m⁻³ higher values in scattered areas in YRD and SCB.

In winter, MEIC-predicted PM2.5 concentrations are up to 200 µg m⁻³ in NCP, CNT, YRD, and SCB, while YRD has concentrations of ~50 µg m⁻³. SOE predicts substantially lower values, by 30 µg m⁻³, in all regions with high PM2.5 concentrations, and only coastal areas have <10 µg m⁻³ higher values. EDGAR also predicts 30 µg m⁻³ lower PM2.5 concentrations in NE, NCP, CNT, and SCB, but the YRD region has 20 µg m⁻³ higher values. The regions with lower values by REAS2 compared to MEIC are much smaller but lie in the same regions of NE, NCP, CNT, and SCB. The south parts of YRD and NCP have higher PM2.5 values than MEIC.

Figure 5 shows the annual averaged concentrations of PM2.5 components predicted by MEIC and the differences between the other inventories and MEIC. Annual averaged particulate sulfate (SO4²⁻) concentrations are 20-25 µg m⁻³ in NCP, CNT, and SCB, and about 10 µg m⁻³ in other regions of southeast China. SOE predicts ~10 µg m⁻³ lower values in high concentration areas and 2-3 µg m⁻³ lower values in other areas. EDGAR predicts ~5 µg m⁻³ higher SO4²⁻ in southeast China and 2-3 µg m⁻³ lower values in SCB. REAS2-predicted SO4²⁻ concentrations are generally 2-3 µg m⁻³ lower than those of MEIC, except in coastal areas. MEIC predicts the highest particulate nitrate (NO3⁻) concentrations of up to 30 µg m⁻³ in NCP and YRD, and values in other regions are 5-10 µg m⁻³ except in northwest China. SOE predicts <5 µg m⁻³ lower values in the high concentration areas and ~2 µg m⁻³ higher values in coastal areas. EDGAR uniformly predicts lower NO3⁻ values than MEIC, with the largest difference of 10 µg m⁻³. REAS2 has similar results to SOE. Particulate ammonium (NH4⁺) concentrations predicted by MEIC have a peak of 15 µg m⁻³ and are mostly less than 10 µg m⁻³ in the east and south parts of China. SOE predicts slightly lower values except for coastal areas in PRD, where 1-2 µg m⁻³ higher values are observed. Elemental carbon (EC) concentrations predicted by MEIC are generally low compared to other components. The highest values are less than 10 µg m⁻³ in NCP, CNT, and SCB. All other three inventories predicted 1-2 µg m⁻³
lower EC values throughout the country. Primary organic aerosol (POA) concentrations predicted by MEIC are 20-30 µg m⁻³ in NCP, CNT, and SCB, and ~10 µg m⁻³ in other areas in the eastern and southern parts of China. SOE predicts up to 5 µg m⁻³ higher values in most areas, with scattered places showing ~2 µg m⁻³ lower values than MEIC. EDGAR and REAS2 predict up to ~10 µg m⁻³ lower values except in coastal areas. Secondary organic aerosol (SOA) concentrations are low in the northern part of China and up to 10 µg m⁻³ across the eastern and southern parts. All three other inventories predict ~2 µg m⁻³ lower SOA values than MEIC. For other implicit components (OTHER), the highest concentrations are ~15 µg m⁻³ in NW and NCP, while other regions have concentrations lower than 5 µg m⁻³. In NW, the major source of OTHER is windblown dust generated online in the CMAQ simulations, so almost no differences are observed among inventories. SOE and EDGAR predict lower OTHER values in the north (~2 µg m⁻³) and slightly higher values in the south and east (~5 µg m⁻³). REAS2 uniformly predicts higher OTHER values across the whole eastern part, with differences of up to 10 µg m⁻³ in the NCP, YRD, and SCB regions. Additional comparisons of model predictions in different regions and some major cities in China are shown in Figures S2-S5 in the Supplemental Material.
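The difference panels discussed above follow a simple sign convention: a negative value means an inventory predicts less than MEIC (for example, the -10 ppb SO2 difference reported for SOE in the CNT region). The following is a minimal sketch of that convention, assuming each inventory's averaged field is available as a 2-D NumPy array on a common grid. The function name `difference_from_reference` and the toy 2×2 values are hypothetical and are not output from the simulations.

```python
import numpy as np

def difference_from_reference(fields, reference="MEIC"):
    """Return {inventory: field - reference field} for every non-reference inventory."""
    ref = fields[reference]
    return {name: field - ref for name, field in fields.items() if name != reference}

# Hypothetical annual-mean SO2 fields (ppb) on a toy 2x2 grid, for illustration only.
fields = {
    "MEIC": np.array([[20.0, 5.0], [4.0, 3.0]]),
    "SOE": np.array([[10.0, 3.0], [4.0, 3.5]]),
}

diffs = difference_from_reference(fields)
print(diffs["SOE"])  # negative cells: SOE lower than MEIC; positive cells: SOE higher
```

The same subtraction applies to seasonal averages or to individual PM2.5 components once the fields have been averaged over the period of interest.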
concentrations are estimated in NCP, CNT, YRD and SCB regions in 2013. Secondary inorganic aerosols (SO4²⁻, NO3⁻, and NH4⁺) account for approximately half of PM2.5 and exhibit similar spatial patterns. Carbonaceous aerosols (EC, POA, and SOA) account for about 30%, but POA and SOA have quite different spatial distributions. High POA concentrations are mainly distributed in NCP, CNT, and SCB, while high SOA concentrations are found in the southern part of China. Based on the spatial distributions of population and ensemble PM2.5, the population-weighted annual averaged PM2.5 concentration in China in 2013 is 59.5 µg m⁻³, which is higher than the value of 54.8 µg m⁻³ estimated by Brauer et al. (2016).
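The population-weighted average quoted above can be illustrated with a small sketch. It assumes the usual definition, sum(population × concentration) / sum(population) over grid cells, which this section does not spell out. The function name `population_weighted_mean` and the three-cell values are hypothetical.

```python
def population_weighted_mean(concentrations, populations):
    """Population-weighted mean concentration over grid cells."""
    if len(concentrations) != len(populations):
        raise ValueError("concentrations and populations must have equal length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population

# Hypothetical grid cells: PM2.5 (µg m^-3) and residents per cell, for illustration only.
pm25 = [80.0, 40.0, 10.0]
population = [5_000_000, 3_000_000, 2_000_000]

pw_mean = population_weighted_mean(pm25, population)
simple_mean = sum(pm25) / len(pm25)
print(pw_mean, simple_mean)
```

Because the most polluted cell in this toy example is also the most populated, the population-weighted mean exceeds the simple area mean, which is why exposure studies report the weighted value.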

Environment Monitoring and Pollution Control of Nanjing University of Information Science and Technology, and Jiangsu Province Innovation Platform for Superiority Subject of Environmental Science and Engineering (No. KHK1201). The authors want to acknowledge the Texas A&M Supercomputing Facility (http://sc.tamu.edu) and the Texas Advanced Computing Center (http://www.tacc.utexas.edu/) for providing computing resources essential for completing the research reported in this paper.

Figure 1. Performance of predicted O3, CO, NO2, and SO2 for different months (top two rows) and regions (bottom two rows) based on simulations with individual inventories. The blue dashed lines on the O3 plots are ±0.15 for MNB and 0.3 for MNE, as suggested by U.S. EPA (2001). Colors indicate the months from March to December in the top two rows and the regions from NCP to Other in the bottom two rows. The same applies to Figure 2.

Figure 2. Performance of predicted PM2.5 and PM10 for different months (a-d) and regions (e-h) based on simulations with individual inventories. The model performance criteria and goal are suggested by Byun and Russell (2006).

Figure 3. Spatial differences of model predicted annual averaged gas species concentrations with different inventories. Units are ppb. The color bars in the first column differ among species to better show their spatial distributions. White indicates zero, while blue, green, yellow, and red indicate concentrations from low to high. The color bars for the other three columns are the same: white indicates zero, blue and green indicate values less than zero, and yellow, purple, and red indicate values larger than zero.

Figure 4. Spatial differences of model predicted seasonal averaged PM2.5 concentrations with different inventories. Units are µg m⁻³. In the first column, white indicates zero, while blue, green, yellow, and red indicate concentrations from low to high. The color bars for the other three columns are the same: white indicates zero, blue and green indicate values less than zero, and yellow, purple, and red indicate values larger than zero. The same applies to Figure 5.

Figure 6. MFB and MFE of predicted PM2.5 with averaging times of 24 hours, 1 month, and 1 year based on the individual inventories and the ensemble.

Figure 7. Spatial distributions of PM2.5 and its components in the ensemble predictions. Units are µg m⁻³. The scales of the panels are different. White indicates zero, while blue, green, yellow, and red indicate concentrations from low to high.

Table 1. Overall model performance for gas and PM species in 2013 using different inventories. Obs is observation, MFB is mean fractional bias, MFE is mean fractional error, MNB is mean normalized bias, and MNE is mean normalized error.
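The four statistics named in this caption are not defined in this part of the manuscript. The sketch below uses their commonly used definitions, which are assumed here to match the ones applied in the paper: the fractional metrics normalize each prediction-observation difference by the pair's mean, and the normalized metrics normalize by the observation. The function name `performance_stats` and the sample values are hypothetical.

```python
def performance_stats(pred, obs):
    """Return MFB, MFE, MNB, and MNE for paired predictions and observations."""
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    n = len(obs)
    pairs = list(zip(pred, obs))
    return {
        # Fractional metrics: difference divided by the mean of prediction and observation.
        "MFB": sum(2.0 * (p - o) / (p + o) for p, o in pairs) / n,
        "MFE": sum(2.0 * abs(p - o) / (p + o) for p, o in pairs) / n,
        # Normalized metrics: difference divided by the observation.
        "MNB": sum((p - o) / o for p, o in pairs) / n,
        "MNE": sum(abs(p - o) / o for p, o in pairs) / n,
    }

# Hypothetical paired values (e.g., daily PM2.5 in µg m^-3), for illustration only.
stats = performance_stats([50.0, 30.0], [40.0, 60.0])
for name, value in stats.items():
    print(f"{name}: {value:+.3f}")
```

Observations of zero would need to be screened out before computing MNB and MNE, since both divide by the observed value.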

Table 3. The weighting factors (w) of each inventory in the ensemble predictions of PM2.5 when using daily, monthly, or annual averages in the objective function (E5).
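The objective function (E5) is not reproduced in this section. As a rough illustration of how inventory weights could be obtained, the sketch below assumes an unconstrained linear least-squares objective, i.e., weights w that minimize the sum of squared differences between observations and the weighted sum of the four inventories' predictions. This is an assumption, not a statement of the paper's exact formulation. The function name `fit_ensemble_weights` and all numbers are hypothetical.

```python
import numpy as np

def fit_ensemble_weights(predictions, observations):
    """Least-squares weights w minimizing sum((observations - predictions @ w) ** 2).

    predictions: array of shape (n_samples, n_inventories)
    observations: array of shape (n_samples,)
    """
    P = np.asarray(predictions, dtype=float)
    o = np.asarray(observations, dtype=float)
    w, *_ = np.linalg.lstsq(P, o, rcond=None)
    return w

inventories = ["MEIC", "SOE", "EDGAR", "REAS2"]

# Hypothetical PM2.5 values (µg m^-3): rows are samples, columns follow `inventories`.
pred = np.array([
    [60.0, 50.0, 45.0, 70.0],
    [80.0, 65.0, 70.0, 85.0],
    [40.0, 38.0, 30.0, 52.0],
    [100.0, 70.0, 90.0, 120.0],
    [55.0, 52.0, 60.0, 58.0],
    [30.0, 25.0, 35.0, 33.0],
])
obs = np.array([58.0, 77.0, 41.0, 95.0, 57.0, 31.0])

w = fit_ensemble_weights(pred, obs)
ensemble = pred @ w

# Sum of squared errors of the ensemble and of each inventory on its own.
sse_ensemble = float(np.sum((obs - ensemble) ** 2))
sse_individual = {
    name: float(np.sum((obs - pred[:, k]) ** 2)) for k, name in enumerate(inventories)
}
print(dict(zip(inventories, np.round(w, 3))), sse_ensemble, sse_individual)
```

Using daily, monthly, or annual averages in the objective would correspond to averaging the rows of `pred` and `obs` over the chosen period before fitting. On the fitting data, the least-squares ensemble cannot have a larger squared error than any single inventory, because each inventory alone is one admissible choice of weights.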

Table 4. Comparison of the data-withholding ensemble predictions of PM2.5 and O3-1h at each city with the predictions of individual inventories. The ensemble predictions at each city are calculated by using the data from the other 59 cities (i.e., withholding the data at that city) to determine the ensemble weighting factors. The symbol '×' indicates that the ensemble prediction performs better than a specific inventory (i.e., both MFB (MNB) and MFE (MNE) values are smaller for PM2.5 (O3-1h)); otherwise, the symbol '-' indicates that the ensemble prediction performs worse.
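The data-withholding procedure described in this caption can be sketched as a leave-one-city-out loop: for each city, weights are fitted using only the other cities and then applied to the withheld city's predictions. The sketch assumes unconstrained least-squares weights, which may differ from the paper's objective function. It uses three hypothetical cities and two hypothetical inventories for brevity, whereas the paper uses 60 cities and four inventories. The synthetic observations are built as an equal-weight mix of the two inventories so that the result is easy to check.

```python
import numpy as np

def withheld_city_predictions(pred_by_city, obs_by_city):
    """For each city, fit least-squares weights on all other cities and apply them to it."""
    results = {}
    for city in pred_by_city:
        others = [c for c in pred_by_city if c != city]
        P_train = np.vstack([pred_by_city[c] for c in others])
        o_train = np.concatenate([obs_by_city[c] for c in others])
        w, *_ = np.linalg.lstsq(P_train, o_train, rcond=None)
        results[city] = pred_by_city[city] @ w
    return results

# Hypothetical predictions: rows are days, columns are two inventories.
pred_by_city = {
    "A": np.array([[60.0, 40.0], [80.0, 70.0], [50.0, 55.0]]),
    "B": np.array([[30.0, 20.0], [45.0, 50.0], [70.0, 60.0]]),
    "C": np.array([[90.0, 100.0], [20.0, 35.0], [65.0, 45.0]]),
}
# Synthetic observations: an equal-weight mix of the two inventories, for checking only.
obs_by_city = {city: p @ np.array([0.5, 0.5]) for city, p in pred_by_city.items()}

withheld = withheld_city_predictions(pred_by_city, obs_by_city)
for city, values in withheld.items():
    print(city, np.round(values, 2))
```

With real data, the withheld predictions at each city would then be scored with the same performance statistics as the individual inventories to fill in a comparison like the one this table reports.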

Table 5. Performance of daily PM2.5 (MFB and MFE) and O3-1h (MNB and MNE) in different regions of China based on individual inventories and the ensemble. The weighting factors (w) used to calculate the ensemble of each region are also included.