Ensemble prediction of air quality using the WRF / CMAQ model system for health effect studies in China

Accurate exposure estimates are required for health effect analyses of severe air pollution in China. Chemical transport models (CTMs) are widely used to provide spatial distribution, chemical composition, particle size fractions, and source origins of air pollutants. The accuracy of air quality predictions in China is greatly affected by the uncertainties of emission inventories. The Community Multiscale Air Quality (CMAQ) model with meteorological inputs from the Weather Research and Forecasting (WRF) model was used in this study to simulate air pollutants in China in 2013. Four simulations were conducted with four different anthropogenic emission inventories, including the Multi-resolution Emission Inventory for China (MEIC), the Emission Inventory for China by School of Environment at Tsinghua University (SOE), the Emissions Database for Global Atmospheric Research (EDGAR), and the Regional Emission inventory in Asia version 2 (REAS2). Model performance of each simulation was evaluated against available observation data from 422 sites in 60 cities across China. Model predictions of O3 and PM2.5 generally meet the model performance criteria, but performance differences exist in different regions, for different pollutants, and among inventories. Ensemble predictions were calculated by linearly combining the results from different inventories to minimize the sum of the squared errors between the ensemble results and the observations in all cities. The ensemble concentrations show improved agreement with observations in most cities. The mean fractional bias (MFB) and mean fractional error (MFE) of the ensemble annual PM2.5 in the 60 cities are −0.11 and 0.24, respectively, which are better than the MFB (−0.25 to −0.16) and MFE (0.26–0.31) of individual simulations.
The ensemble annual daily maximum 1 h O3 (O3-1h) concentrations are also improved, with mean normalized bias (MNB) of 0.03 and mean normalized error (MNE) of 0.14, compared to MNB of 0.06–0.19 and MNE of 0.16–0.22 of the individual predictions. The ensemble predictions agree better with observations with daily, monthly, and annual averaging times in all regions of China for both PM2.5 and O3-1h. The study demonstrates that ensemble predictions from combining predictions from individual emission inventories can improve the accuracy of predicted temporal and spatial distributions of air pollutants. This study is the first ensemble model study in China using multiple emission inventories, and the results are publicly available for future health effect studies.
Published by Copernicus Publications on behalf of the European Geosciences Union.
J. Hu et al.: Ensemble prediction of air quality using the WRF/CMAQ modeling system in China


Introduction
A significant portion of the population in China has been exposed to severe air pollution in recent decades as a consequence of intensive energy use without efficient control measures. Based on ambient air pollution data published by the China National Environmental Monitoring Center (CNEMC), most of the major cities are in violation of the Chinese Ambient Air Quality Standards grade II standard (35 µg m−3) for annual average particulate matter with diameter of 2.5 µm or less (PM2.5; Zhang and Cao, 2015; Y. Wang et al., 2014), with a mean population-weighted PM2.5 concentration of over 60 µg m−3 during 2013-2014. Long-term exposure to such high levels of PM2.5 greatly threatens public health in China. Recent studies have suggested that more than 1 million premature deaths can be attributed to outdoor air pollution each year in China (Lelieveld et al., 2015; Liu et al., 2016; Hu et al., 2017a).
Accurate exposure estimates are required in health effect studies. Ambient air quality is usually measured at monitoring sites and used to represent the exposure of the population in the surrounding areas. A routine central monitoring network in China has been operating since 2013, but it is still limited in spatial coverage and lacks detailed information on the chemical composition, PM size fractions, and source origins of air pollutants. Chemical transport models (CTMs) have been widely used in health effect studies to overcome the limitations in central monitoring measurements for exposure estimates (Philip et al., 2014; Lelieveld et al., 2015; Liu et al., 2016; Laurent et al., 2016a, b; Ostro et al., 2015). However, the accuracy of the predictions from CTMs is largely affected by the accuracies of the emission inventories (Wang et al., 2010), meteorological fields (Hu et al., 2010), and numerical solutions to the equations that describe various atmospheric processes (Hu et al., 2006; Yu et al., 2005). Several emission inventories have been created to cover China. Different emission inventories focus on specific geographical regions at the urban, regional (Zhao et al., 2012; Zhang et al., 2008), and national or continental (Zhang et al., 2009; Kurokawa et al., 2013) scales, and/or focus on specific pollutants (Su et al., 2011; Ou et al., 2015) and specific sectors (Zhao et al., 2008; Xu et al., 2017).
Despite great efforts in improving the accuracy of emission inventories in China, large uncertainties remain. Generally, emissions of pollutants are estimated as the product of activity levels (such as industrial production or energy consumption), unabated emission factors (i.e., mass of emitted pollutant per unit activity level), and a term accounting for the efficiency of emission controls. Large uncertainties are associated with activity levels, emission source fractions, and emission factors (Akimoto et al., 2006; Lei et al., 2011a). For a Pearl River delta (PRD) inventory in 2006, SO2 emissions from power plant sources have low uncertainties of −16 to +21 %, as quantified by Monte Carlo simulations, while NOx has medium-to-high uncertainties of −55 to +150 % and VOC, CO, and PM have even higher uncertainties (Zheng et al., 2009). For an inventory for the Yangtze River delta (YRD) region, the overall uncertainties for CO, SO2, NOx, PM10, PM2.5, VOCs, and NH3 emissions are ±47.1, ±19.1, ±27.7, ±117.4, ±167.6, ±133.4, and ±112.8 %, respectively (Huang et al., 2011). A comprehensive quantification study by Zhao et al. (2011) using Monte Carlo simulations showed that the uncertainties of Chinese emissions of SO2, NOx, PM2.5, BC, and OC in 2005 are −14 to +13, −13 to +37, −17 to +54, −25 to +136, and −40 to +121 %, respectively.
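The emission estimate and the Monte Carlo uncertainty ranges described above can be illustrated with a minimal Python sketch. It assumes the common form E = A × EF × (1 − η), where η is the control efficiency. The function names, the input values, and the sampling distributions are hypothetical and are not taken from any of the cited inventories.

```python
import random


def emission(activity, emission_factor, control_efficiency):
    """Emission = activity x unabated emission factor x (1 - control efficiency)."""
    return activity * emission_factor * (1.0 - control_efficiency)


def percent_deviation(value, central):
    """Deviation of value from the central estimate, in percent."""
    return 100.0 * (value - central) / central


def monte_carlo_interval(n_draws=10000, seed=0):
    """Return (central estimate, lower %, upper %) for a 95 % interval.

    All distributions below are illustrative assumptions.
    """
    rng = random.Random(seed)
    central = emission(100.0, 8.0, 0.7)
    draws = []
    for _ in range(n_draws):
        activity = rng.gauss(100.0, 5.0)              # activity level
        factor = 8.0 * rng.lognormvariate(0.0, 0.3)   # skewed emission factor
        efficiency = rng.uniform(0.6, 0.8)            # control efficiency
        draws.append(emission(activity, factor, efficiency))
    draws.sort()
    low = draws[int(0.025 * n_draws)]
    high = draws[int(0.975 * n_draws) - 1]
    return central, percent_deviation(low, central), percent_deviation(high, central)
```

Because the assumed emission factor distribution is skewed, the resulting interval is asymmetric around the central estimate, which is the same qualitative shape as the asymmetric ranges quoted above.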
The uncertainties in emission inventories are carried into CTM simulations, leading to uncertainties in air quality predictions, which need to be carefully evaluated to identify the useful information for health effect studies (Hu et al., 2017b, 2014c, b, 2015b; Tao et al., 2014). An evaluation of 1-year air pollutant predictions using the Weather Research and Forecasting (WRF) / Community Multi-scale Air Quality (CMAQ) modeling system with the Multi-resolution Emission Inventory for China (MEIC) has been reported (Hu et al., 2016a). The model predictions of O3 and PM2.5 generally agree with ambient measured concentrations, but the model performance varies in different regions and seasons. In some regions, such as Northwest China, the model significantly underpredicted PM2.5 concentrations. A recent study compared a few anthropogenic emission inventories in China during 2000-2008 (Saikawa et al., 2017), but detailed evaluation of model results based on these inventories has not been performed.
Ensemble techniques are often used to reduce uncertainties in model predictions by combining multiple data sets. They have been widely used in climate predictions (Murphy et al., 2004; Tebaldi and Knutti, 2007), and have been adopted recently in air quality predictions (Delle Monache et al., 2006; Huijnen et al., 2010). Methods that combine the strengths of different emission inventories to obtain improved air quality predictions for China have not been reported in the literature. The aim of this study is to create an improved set of air quality predictions in China by using an ensemble technique. First, four sets of 1-year air quality predictions were generated using the WRF/CMAQ modeling system with four different anthropogenic emission inventories for China in 2013. In addition to MEIC, the three other emission inventories are the Emissions Database for Global Atmospheric Research (EDGAR), Regional Emission inventory in Asia version 2 (REAS2), and Emission Inventory for China developed by School of Environment at Tsinghua University (SOE). The model performance of PM2.5 and O3 with different emission inventories was then evaluated against available observation data for 60 cities in China. The differences among air quality predictions with the four inventories were also compared and identified. Finally, an ensemble technique was developed to minimize the bias of model predictions and to create improved exposure predictions. To the authors' best knowledge, this is the first ensemble model study in China using multiple emission inventories. The ensemble predictions of this study are available for public health effect analyses upon request to the corresponding author.
This paper is organized as follows. The CMAQ model, emissions and other inputs for the model, observational data sets used for model performance evaluation, and the method for ensemble calculation are described in Sect. 2. Section 3 discusses the model performance on gaseous and particulate pollutants simulated with the four emission inventories, as well as the performance of the ensemble predictions in different regions/cities and with different averaging times. The major findings are summarized in the Conclusion section.

Model description
In this study, the applied CMAQ model is based on CMAQ v5.0.1 with changes to improve the model's performance in predicting secondary organic and inorganic aerosols. The details of these changes can be found in previous studies (Hu et al., 2016a, 2017c) and the references therein; therefore, only a brief description is given here. The gas-phase photochemical mechanism SAPRC-11 was modified to better treat isoprene oxidation chemistry (Ying et al., 2015; Hu et al., 2017c). Formation of secondary organic aerosol (SOA) from reactive uptake of dicarbonyls, methacrylic acid epoxide, and isoprene epoxydiol through surface pathways (Li et al., 2015; Ying et al., 2015) was added. Corrected SOA yields due to vapor wall loss (Zhang et al., 2014) were adopted. Formation of secondary nitrate and sulfate through heterogeneous reactions of NO2 and SO2 on particle surfaces (Ying et al., 2014) was also incorporated. It has been shown that these modifications improved the model performance on secondary inorganic and organic PM2.5 components.

Anthropogenic emissions
The CMAQ model was applied to study air pollution in China and surrounding countries in eastern Asia using a horizontal resolution of 36 km. The modeling domain is shown in Fig. 1. The anthropogenic emissions are from four inventories: MEIC, SOE, EDGAR, and REAS2. MEIC was developed by a research group at Tsinghua University (http://www.meicmodel.org). Compared with other inventories for China, e.g., INTEX-B (Zhang et al., 2009) or TRACE-P (Streets et al., 2003), the major improvements include a unit-based inventory for power plants (Wang et al., 2012) and cement plants (Lei et al., 2011b), a county-level high-resolution vehicle inventory (Zheng et al., 2014), and a novel non-methane VOC (NMVOC) speciation approach (Li et al., 2014). The VOCs were speciated to the SAPRC-07 mechanism. Because the mapping of detailed species to model species in the SAPRC-11 mechanism is essentially the same as in the SAPRC-07 mechanism (Carter and Heo, 2012), the speciated VOC emissions in the MEIC inventory were directly used in the simulation.
The SOE emission inventory was developed using an emission factor method (Wang et al., 2011; Zhao et al., 2013b). The sectoral emissions in different provinces were calculated based on activity data, technology-based and uncontrolled emission factors, and penetrations of control technologies (fractions of pollutants not collected). Elemental carbon (EC) and organic carbon (OC) emissions were calculated based on PM2.5 emissions and their fractions in PM2.5 in source-specific speciation profiles. The sectoral activity data and technology distribution were obtained using an energy demand modeling approach with various Chinese statistics and technology reports. More details, including the spatiotemporal distributions and speciation of NMVOC emissions, can be found in previous publications (Zhao et al., 2013a, b; Wang et al., 2011). Since the MEIC and SOE emission inventories only cover China, emissions from other countries and regions were based on REAS2 (Kurokawa et al., 2013).
Version 4.2 of EDGAR emissions (http://edgar.jrc.ec.europa.eu/overview.php?v=42) has a spatial resolution of 0.1° × 0.1°. The EDGAR inventory contains annual emissions from different sectors based on IPCC designations. REAS2 has a spatial resolution of 0.25° × 0.25° for all of Asia. The inventory contains monthly emissions of pollutants from different source categories. Detailed information regarding these inventories can be found in the publications presenting them. Table S1 in the Supplement shows the total emissions of major pollutants within China on a typical workday of each season. In general, large differences exist among different inventories for China. MEIC has the highest CO emissions in winter while REAS2 has the highest in other seasons. MEIC has the highest NOx emissions while REAS2 has the highest emissions of VOCs in all seasons. EDGAR has the highest SO2 emissions, which are approximately a factor of 2 higher than those estimated by SOE. SOE has the highest NH3 emissions while EDGAR has much lower NH3 emissions than the other three. EDGAR also has the lowest EC and OC emissions, but its total PM2.5 emissions are the highest. Standard deviations indicate that winter has the largest uncertainties for all species except SO2 and NH3. Winter has the lowest SO2 uncertainties while summer has the largest NH3 uncertainties.
All emission inventories were processed with an in-house program and re-gridded into the 36 km resolution CMAQ domain when necessary. Representative speciation profiles based on the SPECIATE 4.3 database maintained by the U.S. EPA were applied to split NMVOC of EDGAR and REAS2 into the SAPRC-11 mechanism, and PM2.5 of all inventories was split into AERO6 species. Monthly emissions were temporally allocated into hourly files using temporal allocation profiles from previous studies (Chinkin et al., 2003; Olivier et al., 2003; Wang et al., 2010). More details regarding EDGAR can be found in D. Wang et al. (2014), while those for REAS2 can be found in Qiao et al. (2015).
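The temporal allocation step can be illustrated with a short Python sketch. It is not the in-house program mentioned above; the function name and the weekday and diurnal weight profiles passed to it are hypothetical placeholders for the profiles from the cited studies. The only property it demonstrates is that the hourly values are scaled so that they sum back to the monthly total.

```python
import calendar


def allocate_monthly_to_hourly(monthly_total, year, month, weekday_weights, hourly_weights):
    """Split a monthly emission total into hourly values.

    weekday_weights: 7 relative weights, Monday first (assumed profile).
    hourly_weights: 24 relative weights for hours 0-23 (assumed profile).
    Returns a list of (day, hour, emission) tuples that sum to monthly_total.
    """
    n_days = calendar.monthrange(year, month)[1]
    # Relative weight of each calendar day, taken from its day of the week.
    day_weights = [
        weekday_weights[calendar.weekday(year, month, day)]
        for day in range(1, n_days + 1)
    ]
    day_sum = sum(day_weights)
    hour_sum = sum(hourly_weights)
    hourly = []
    for day, day_weight in enumerate(day_weights, start=1):
        daily_total = monthly_total * day_weight / day_sum
        for hour, hour_weight in enumerate(hourly_weights):
            hourly.append((day, hour, daily_total * hour_weight / hour_sum))
    return hourly
```

With flat profiles, a monthly total of 744 units in a 31-day month yields 1 unit in every hour; non-uniform profiles only redistribute the same total.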

Other inputs
The Model for Emissions of Gases and Aerosols from Nature (MEGAN) v2.1 was used to generate biogenic emissions (Guenther et al., 2012). The 8-day Moderate Resolution Imaging Spectroradiometer (MODIS) leaf area index (LAI) product (MOD15A2) and the plant functional type (PFT) files used in the Global Community Land Model (CLM 3.0) were applied to generate inputs for MEGAN. The readers are referred to Qiao et al. (2015) for more information. Open biomass burning emissions were generated using a satellite-observation-based fire inventory developed by NCAR (Wiedinmyer et al., 2011). The dust emission module was updated to be compatible with the 20-category MODIS land use data (Hu et al., 2015a) for in-line dust emission processing, and sea salt emissions were also generated in-line during CMAQ simulations.
The meteorological inputs were generated using WRF v3.6.1 (Skamarock et al., 2008). The initial and boundary conditions for WRF were downloaded from the NCEP FNL Operational Model Global Tropospheric Analyses data set. WRF configuration details can be found in Zhang et al. (2012). WRF performance has been evaluated by comparing predicted temperature and relative humidity at 2 m above the surface, and wind speed and wind direction at 10 m, with all available observational data at ∼1200 stations from the National Climatic Data Center (NCDC). The model performance is generally acceptable and detailed evaluation results can be found in a previous study (Hu et al., 2016a).
The initial and boundary conditions representing relatively clean tropospheric concentrations were generated using CMAQ default profiles.

Model evaluation
Model predictions with the four emission inventories were evaluated against available observation data in China. Hourly observations of PM2.5, PM10, O3, CO, SO2, and NO2 from March to December 2013 at 422 stations in 60 cities were obtained from CNEMC (http://113.108.142.147:20035/emcpublish/); no observations were available for January and February. Observations at multiple sites in one city were averaged to calculate the average concentrations of the city. Detailed quality control of the data can be found in previous studies (Hu et al., 2014a, 2016a; Y. Wang et al., 2014). The statistical metrics of mean normalized bias (MNB), mean normalized error (MNE), mean fractional bias (MFB), and mean fractional error (MFE) were calculated using Eqs. (1)-(4):

$$\mathrm{MNB} = \frac{1}{N}\sum_{i=1}^{N}\frac{C_m - C_o}{C_o}, \tag{1}$$

$$\mathrm{MNE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|C_m - C_o\right|}{C_o}, \tag{2}$$

$$\mathrm{MFB} = \frac{1}{N}\sum_{i=1}^{N}\frac{C_m - C_o}{\left(C_o + C_m\right)/2}, \tag{3}$$

$$\mathrm{MFE} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|C_m - C_o\right|}{\left(C_o + C_m\right)/2}, \tag{4}$$

where $C_m$ and $C_o$ are the predicted and observed city average concentrations, respectively, and $N$ is the total number of observation data. MNB and MNE are commonly used in evaluation of model performance of O3, and MFB and MFE are commonly used in evaluation of model performance of PM2.5 (Tao et al., 2014). The U.S. EPA previously recommended O3 model performance criteria of within ±0.15 for MNB and less than 0.30 for MNE (as shown in Fig. 2), and PM model performance criteria of within ±0.60 for MFB and less than 0.75 for MFE (U.S. EPA, 2001). Figure 3 includes the criteria and goals for PM as a function of PM concentration, as suggested by Boylan and Russell (2006), which have been widely used in model evaluation.
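The four statistics and the fixed screening thresholds quoted above can be written compactly in Python. This is a minimal sketch using the standard definitions of MNB, MNE, MFB, and MFE; the function names are illustrative, and the concentration-dependent goals and criteria of Boylan and Russell (2006) are not implemented.

```python
def _pairs(pred, obs):
    """Validate and pair predicted and observed concentrations."""
    if len(pred) != len(obs) or len(obs) == 0:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return list(zip(pred, obs))


def mnb(pred, obs):
    """Mean normalized bias: mean of (Cm - Co) / Co."""
    pairs = _pairs(pred, obs)
    return sum((m - o) / o for m, o in pairs) / len(pairs)


def mne(pred, obs):
    """Mean normalized error: mean of |Cm - Co| / Co."""
    pairs = _pairs(pred, obs)
    return sum(abs(m - o) / o for m, o in pairs) / len(pairs)


def mfb(pred, obs):
    """Mean fractional bias: mean of (Cm - Co) / ((Co + Cm) / 2)."""
    pairs = _pairs(pred, obs)
    return sum((m - o) / ((o + m) / 2.0) for m, o in pairs) / len(pairs)


def mfe(pred, obs):
    """Mean fractional error: mean of |Cm - Co| / ((Co + Cm) / 2)."""
    pairs = _pairs(pred, obs)
    return sum(abs(m - o) / ((o + m) / 2.0) for m, o in pairs) / len(pairs)


def meets_o3_criteria(pred, obs):
    """O3 screen: MNB within +/-0.15 and MNE below 0.30."""
    return abs(mnb(pred, obs)) <= 0.15 and mne(pred, obs) < 0.30


def meets_pm_criteria(pred, obs):
    """PM screen: MFB within +/-0.60 and MFE below 0.75."""
    return abs(mfb(pred, obs)) <= 0.60 and mfe(pred, obs) < 0.75
```

The fractional statistics are bounded between −2 and +2 and treat over- and underprediction symmetrically, whereas the normalized statistics are divided by the observation alone and therefore grow without bound when observed concentrations are small.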

Ensemble predictions
The four sets of predictions with the different inventories were combined linearly to calculate the ensemble predictions, as shown in Eq. (5):

$$C^{\mathrm{pred,ens}} = \sum_{m=1}^{N_m} w_m\, C^{\mathrm{pred},m}, \tag{5}$$

where $C^{\mathrm{pred,ens}}$ is the ensemble prediction, $C^{\mathrm{pred},m}$ is the predicted concentration from the $m$th simulation, $N_m$ is the number of simulations in the ensemble ($N_m = 4$), and $w_m$ is the weighting factor of the $m$th simulation. The weighting factor for each set of predictions was determined by minimizing the objective function $Q$ in Eq. (6):

$$Q = \sum_{i=1}^{N_{\mathrm{city}}} \left[ C_i^{\mathrm{obs}} - \sum_{m=1}^{N_m} w_m\, C_i^{\mathrm{pred},m} \right]^2, \tag{6}$$

where $C_i^{\mathrm{obs}}$ is the observed PM2.5 or O3 concentration at the $i$th city, $N_{\mathrm{city}}$ is the total number of cities with observations ($N_{\mathrm{city}} = 60$), and $C_i^{\mathrm{pred},m}$ is the predicted concentration at the $i$th city from the $m$th simulation. The weighting factor $w_m$ of the $m$th simulation is constrained to the range [0, 1], where $w = 0$ represents no influence of the individual simulation on the ensemble prediction and $w = 1$ indicates that concentrations of the individual simulation are fully accounted for in the ensemble prediction. The observation data were the same as those used in the model evaluation. Ensemble predictions were performed for PM2.5 and O3 in this study. A MATLAB program was developed to solve the above equation and determine the weighting factors using the linear least squares solver "lsqlin".

Model performance on gaseous and particulate pollutants
The difference in model performance with the four inventories also varies seasonally and spatially. Figure 2 shows the comparison of model performance for hourly gaseous species (O3, CO, NO2, and SO2) in each month from March to December 2013. The MNB values of O3 in most months are within the criteria for all inventories except for SOE, which underpredicts O3 concentrations. March has the worst performance of O3 for all inventories, with MNE values larger than 0.4 for MEIC, SOE, and EDGAR. No significant performance difference among different inventories in different months is found, but large differences exist in various regions of China (see the definition of regions of China in Fig. 1). O3 predicted using MEIC, EDGAR, and REAS2 meets the performance criteria in most regions except for YRD by MEIC and PRD by EDGAR. O3 predicted using SOE only meets the criteria in the Northwest (NW) and other region (Other) of China. CO and NO2 are underpredicted in all regions, with the largest underpredictions in NW and Other. This pattern is similar among the results with all inventories. SO2 is generally underpredicted in all regions but overpredicted in the Sichuan Basin (SCB) by all inventories. SO2 is also overpredicted by EDGAR in the PRD region. SO2 in Northeast (NE) is substantially underpredicted by MEIC and REAS2. In general, model performance is relatively better in the more developed regions, such as YRD and PRD, than in NW and Other.
Figure 3 illustrates the PM2.5 and PM10 performance statistics of MFB and MFE as a function of absolute concentrations in different months of 2013 and in different regions. PM2.5 predictions based on each inventory are within the performance goal of MFB and between the goal and criteria of MFE in all months. There are no significant differences among inventories. Half of the monthly averaged PM10 MFB values fall within the goal while the rest are between the goal and criteria. MFE values of PM10 are all between the goal and criteria. From the regional perspective, PM2.5 performance for NE by SOE fails the MFB criteria, while that for SCB by MEIC, SOE, and REAS2 fails the MFE criteria. MFB values of PM10 in all regions meet the criteria except NW, due to underestimation of windblown dust emissions in NW.

Spatial variations in predicted gaseous and particulate pollutants
Figure 4 shows the spatial distribution of annual average daily maximum 1 h O3 [...] EDGAR and REAS2 have very similar differences in SO2 concentrations from MEIC, i.e., more than 5 ppb higher concentrations in the NCP and YRD than MEIC, ∼2 ppb higher concentrations in the PRD, 2-3 ppb lower concentrations in the NE, and up to 5 ppb lower concentrations in the CNT and SCB.
Figure 5 shows the seasonal distribution of PM2.5 total mass predicted by MEIC and differences between predictions by the other three inventories and those by MEIC. In spring, MEIC-predicted PM2.5 concentrations are ∼50 µg m−3 in eastern and southern China. Southeast Asia has the highest value of ∼100 µg m−3. SOE predicts 5-10 µg m−3 lower PM2.5 than MEIC in north China and < 5 µg m−3 higher values in southern China and along the coastline. EDGAR predicts > 20 µg m−3 lower values in NCP and ∼10 µg m−3 lower values in NE, CNT, and SCB, but up to 20 µg m−3 higher values in PRD. REAS2 predicts higher PM2.5 values in most parts of China, except for lower values in NE and SCB. The difference in PM2.5 in YRD and NCP is up to 20-30 µg m−3. In summer, the high-PM2.5 regions are much smaller than in spring, with ∼50 µg m−3 in NCP, the northern part of YRD and SCB, and 20-30 µg m−3 in other parts. Generally, SOE predicts < 10 µg m−3 lower values in most regions. EDGAR predicts lower values in NCP and SCB but 5-10 µg m−3 higher values in southern China. REAS2 predicts higher values in almost all the regions except some scattered areas in NCP, YRD, and SCB.
In fall, PM2.5 concentrations are larger than 50 µg m−3 in most regions except NW and are ∼100 µg m−3 in parts of NCP, CNT, and SCB. SOE predicts lower values than MEIC in northern China but higher values in southern China. EDGAR predicts up to 30 µg m−3 lower values in NCP and SCB but up to 20 µg m−3 higher values in YRD. REAS2 again estimates similar values to MEIC, with less than 5 µg m−3 differences in most regions and up to 20 µg m−3 higher values in scattered areas in YRD and SCB. In winter, MEIC-predicted PM2.5 concentrations are up to 200 µg m−3 in NCP, CNT, YRD, and SCB, while PRD has concentrations of ∼50 µg m−3. SOE-predicted concentrations are lower by 30 µg m−3 in most regions with high PM2.5 concentrations and higher by < 10 µg m−3 only in coastal areas. EDGAR also predicts 30 µg m−3 lower PM2.5 concentrations in NE, NCP, CNT, and SCB, but 20 µg m−3 higher concentrations in the YRD region. The regions with lower values by REAS2 compared to MEIC are in NE, NCP, CNT, and SCB, similar to EDGAR but with much smaller areas.
Figure 6 shows the annual average concentrations of PM2.5 components predicted by MEIC and the differences between predictions by the other three inventories and those by MEIC. Annual average particulate sulfate (SO₄²⁻) concentrations with MEIC are 20-25 µg m−3 in NCP, CNT, and SCB, and about 10 µg m−3 in other regions in southeastern China. SOE-predicted concentrations are ∼10 µg m−3 lower in the high-concentration areas and 2-3 µg m−3 lower in other areas. EDGAR-predicted SO₄²⁻ concentrations are ∼5 µg m−3 higher in southeastern China and 2-3 µg m−3 lower in SCB. REAS2-predicted SO₄²⁻ concentrations are generally 2-3 µg m−3 lower than those of MEIC in most areas except [...] nitrate (NO₃⁻) concentrations of up to 30 µg m−3 in NCP and YRD, and concentrations in other regions are 5-10 µg m−3 except in northwest China. SOE-predicted nitrate concentrations are < 5 µg m−3 lower in the high-concentration areas and ∼2 µg m−3 higher in coastal areas. EDGAR uniformly predicts lower NO₃⁻ values than MEIC, with the largest difference of 10 µg m−3. REAS2 has similar results to SOE. Particulate ammonium (NH₄⁺) concentrations predicted by MEIC have a peak value of 15 µg m−3 and are mostly less than 10 µg m−3 in eastern and southern China. SOE predicts slightly lower concentrations except for the coastal areas in PRD, where the SOE predictions are 1-2 µg m−3 higher.
EC concentrations are generally low compared to other components as predicted by MEIC. The concentrations are less than 10 µg m−3 in NCP, CNT, and SCB. All three other inventories predict 1-2 µg m−3 lower EC values than MEIC throughout the country. Primary organic aerosol (POA) concentrations predicted by MEIC are 20-30 µg m−3 in NCP, CNT, and SCB, and ∼10 µg m−3 in other areas in eastern and southern China. SOE-predicted concentrations are up to 5 µg m−3 higher in most areas, but in scattered places the SOE predictions are ∼2 µg m−3 lower than MEIC. EDGAR and REAS2 predictions are up to ∼10 µg m−3 lower except for coastal areas. SOA concentrations are low in northern China and are up to 10 µg m−3 in eastern and southern China. All three other inventories predict ∼2 µg m−3 lower SOA concentrations than MEIC. For other implicit components (OTH), the highest concentrations are ∼15 µg m−3 in NW and NCP, while other regions have concentrations lower than 5 µg m−3. In NW, the major sources of OTH are windblown dust generated in-line by CMAQ simulations; thus, almost no differences are observed among the four simulations. SOE and EDGAR predict lower OTH values in northern China (∼2 µg m−3) and slightly higher values in southern and eastern China (∼5 µg m−3). REAS2 predicts uniformly higher OTH values in eastern China, with up to 10 µg m−3 differences in the NCP, YRD, and SCB regions.
Additional comparisons of the model predictions in different regions and some major cities in China are shown in Figs. S1-S4 in the Supplement.

Ensemble predictions
The above analyses indicate that model performance with different inventories varies for different pollutants and in different regions. Table S2 shows the observed annual average concentrations of PM2.5 in the 60 cities and the predictions from the four inventories as well as the weighted ensemble predictions. The weighting factors for predictions using MEIC, REAS2, SOE, and EDGAR are 0.31, 0.36, 0.24, and 0.20, respectively (Table 2). The ensemble predictions greatly reduce MFB to a value of −0.11, compared to the MFB values of −0.25 to −0.16 using the annual average concentrations in the individual simulations. Also, the ensemble prediction yields an MFE value of 0.24, lower than any of the MFE values of 0.26-0.31 based on individual simulations (Fig. 7). The ensemble predictions of annual O3-1h have MNB and MNE of 0.03 and 0.14, respectively, improved from MNB of 0.06-0.19 and MNE of 0.16-0.22 in the individual predictions.
To further evaluate the ability of the ensemble method in improving predictions at locations where observational data are not available, ensemble predictions were made using a data withholding method. For each city, the observations at the other 59 cities were used to determine the weighting factors in Eq. (6) and the ensemble prediction at the city was calculated. The ensemble prediction at the city was then evaluated against the withheld observations. The evaluation process was repeated for each of the 60 cities and the performance was compared to that with individual inventories (shown in Table S3). Previous studies have revealed that CTM predictions agree better with observations when averaging over longer periods of time (i.e., annual vs. monthly vs. daily averages; Hu et al., 2014b, 2015b). Ensemble predictions were also calculated with daily and monthly averages for PM2.5, in addition to the calculation with annual averages discussed above. The weighting factors and the performance of ensemble predictions are shown in Table 2 and Fig. 7, respectively. The weighting factors vary largely with the averaging times, suggesting that the prediction optimization needs to be conducted separately when using different time averages. The ensemble predictions improve the agreement with observations in all averaging time cases, with lower MNB and MNE than any of the individual predictions. In general, EDGAR and REAS2 have large weights for daily and monthly ensemble calculations, and MEIC and SOE have large weights for annual ensemble calculations. This result indicates that the annual total emission rates of MEIC and SOE are likely accurate, but the temporal profiles used to allocate the annual total emission rates to specific days/hours need to be improved.
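The weight fitting of Eq. (6) and the data withholding test can be sketched together in Python. The study itself used the MATLAB solver "lsqlin"; the sketch below substitutes a plain projected gradient iteration as a stand-in for a bounded least squares solver, which is adequate only for small, reasonably well-conditioned problems. All function names, the iteration count, and any data passed in are hypothetical. No sum-to-one constraint is imposed on the weights, only the [0, 1] bounds stated in the method description.

```python
def ensemble(preds, weights):
    """Linear combination of simulations: preds[m][i] is simulation m at city i."""
    n_city = len(preds[0])
    return [sum(w * row[i] for w, row in zip(weights, preds)) for i in range(n_city)]


def fit_weights(preds, obs, n_iter=5000):
    """Minimize sum_i (obs_i - sum_m w_m * preds[m][i])**2 with 0 <= w_m <= 1.

    Projected gradient descent; an illustrative stand-in, not lsqlin.
    """
    n_models = len(preds)
    # The Hessian is 2 * A^T A; its largest eigenvalue is at most 2 * trace(A^T A),
    # so this step size cannot overshoot.
    trace = sum(value * value for row in preds for value in row)
    step = 1.0 / (2.0 * trace)
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        fitted = ensemble(preds, weights)
        resid = [f - o for f, o in zip(fitted, obs)]
        grad = [2.0 * sum(r * p for r, p in zip(resid, row)) for row in preds]
        # Gradient step followed by projection onto the box [0, 1].
        weights = [min(1.0, max(0.0, w - step * g)) for w, g in zip(weights, grad)]
    return weights


def leave_one_out(preds, obs):
    """Withhold each city in turn, fit on the rest, predict the withheld city."""
    preds = [list(row) for row in preds]
    obs = list(obs)
    held_out = []
    for k in range(len(obs)):
        train_preds = [row[:k] + row[k + 1:] for row in preds]
        train_obs = obs[:k] + obs[k + 1:]
        weights = fit_weights(train_preds, train_obs)
        held_out.append(sum(w * row[k] for w, row in zip(weights, preds)))
    return held_out
```

The withheld predictions returned by `leave_one_out` can then be scored against the observations with the same statistics used for the individual simulations. For strongly correlated simulations the iteration converges slowly, and a dedicated bounded least squares solver would be the more robust choice.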
Table 3 shows the ensemble prediction performance on PM2.5 and O3-1h in different regions of China using the daily average observations and daily average predictions with individual inventories. The weighting factors vary greatly among regions, reflecting the substantial difference in the spatial distributions of PM2.5 and O3 when using different inventories. The MNB and MNE values of ensemble predictions are reduced in all regions for both pollutants, suggesting that the ensemble predictions improve the accuracy and are better suited for use in further health effect studies. Similar findings are obtained with the monthly average observations and predictions (shown in Table S4).
Figure 8 shows spatial distributions of PM2.5 and its components from the ensemble predictions using the weighting factors of annual averages. The ensemble of PM2.5 components was calculated using the same weighting factors as for PM2.5.
The results of the current study can be further applied in health effect studies. The first such analysis used annual PM2.5 ensemble predictions to assess the spatial distribution of excess mortality due to adult (> 30 years old) ischemic heart disease (IHD), cerebrovascular disease (CEV), chronic obstructive pulmonary disease (COPD), and lung cancer (LC) in China caused by PM2.5 exposure (Hu et al., 2017a). Any health studies requiring human exposure information for different pollutants would benefit from this study. Even though the weighting factors vary depending on the regions, averaging times, and study years, the ensemble method proposed in this study minimizes the difference between predictions and observations and can be applied in different studies. The way to calculate the weighting factors depends on the objectives of specific studies. In general, however, the more observation data used in the calculation, the more accurate the ensemble prediction will be.

Conclusion
In this study, air quality predictions in China in 2013 were conducted using the WRF/CMAQ modeling system with anthropogenic emissions from four inventories: MEIC, SOE, EDGAR, and REAS2. Model performance with the four inventories was evaluated by comparing against available observation data from 422 sites in 60 cities in China. Model predictions of hourly O3 and PM2.5 with the four inventories generally meet the model performance criteria, but model performance with different inventories varies by pollutant and by region. To improve the overall agreement of the predicted concentrations with observations, ensemble predictions were calculated by linearly combining the predictions from different inventories. The ensemble annual concentrations show improved agreement with observations for both PM2.5 and O3-1h. The MFB and MFE of the ensemble predictions of PM2.5 in the 60 cities are −0.11 and 0.24, respectively, which are better than the MFB (−0.25 to −0.16) and MFE (0.26 to 0.31) of any individual simulation. The ensemble predictions of annual O3-1h have MNB and MNE of 0.03 and 0.14, improved from MNB (0.06 to 0.19) and MNE (0.16 to 0.22) in individual predictions. The ensemble predictions with the data withholding method at each city show better performance than the predictions with individual inventories at most cities, demonstrating the ability of the ensemble to improve the predictions at locations where observational data are not available. The ensemble predictions agree better with observations with daily, monthly, and annual averaging times in all regions of China. The study demonstrates that ensemble predictions from combining predictions from individual emission inventories can improve the accuracy of concentration estimates and the spatial distributions of air pollutants.

Figure 1. The WRF/CMAQ modeling domain and the regions in China. The dots represent the 60 cities where observational data are available for ensemble analysis. The x and y axes show the Lambert projection grid numbers in the west-east and south-north directions. NCP represents the North China Plain region. The provinces included in each region are as follows: NCP: Beijing, Tianjin, Hebei, Shandong, and Inner Mongolia; Northeast: Liaoning, Jilin, and Heilongjiang; YRD: Shanghai, Jiangsu, and Zhejiang; Central China: Shanxi, Henan, Hubei, Anhui, Hunan, and Jiangxi; Northwest: Xinjiang, Qinghai, Ningxia, Gansu, and Shaanxi; Sichuan Basin: Sichuan and Chongqing; Southwest: Tibet, Yunnan, Guizhou, and Guangxi; PRD: Guangdong, Hong Kong, and Macau; Fujian, Hainan, and Taiwan are grouped as the "Other" region.

Figure 2. Performance of predicted O3, CO, NO2, and SO2 for different months (top two rows) and regions (bottom two rows) based on simulations with individual inventories. The blue dashed lines on the O3 plots are ±0.15 for MNB and 0.3 for MNE as suggested by the U.S. EPA (2001). Colors indicate the months from March to December in the top two rows (3 refers to March, 12 to December, etc.) and the regions from NCP to Other in the bottom two rows.

Figure 3. Performance of predicted PM2.5 and PM10 for different months (a-d) and regions (e-h) based on simulations with individual inventories. The x axis is the observed concentrations. The model performance criteria (solid black lines) and goals (dashed blue lines) are suggested by Boylan and Russell (2006). The model performance goals represent the level of accuracy considered to approximate the best a model could be expected to achieve, and the model performance criteria represent the level of accuracy that is considered to be acceptable for modeling applications. Colors indicate the months from March to December in the top two rows (3 refers to March, 12 to December, etc.) and the regions from NCP to Other in the bottom two rows.

Figure 4. Spatial differences of model-predicted annual average gas species concentrations (in the horizontal panels) with different inventories (in the vertical panels). Units are ppb. The color bars of the first column differ among species to better show the spatial distribution of each species. White indicates zero, while blue, green, yellow, and red indicate concentrations from low to high. The color bars for the other three columns are the same; white indicates zero, blue and green indicate values less than zero, and yellow, purple, and red indicate values larger than zero. O3-1h represents daily maximum 1 h O3 and O3-8h represents daily maximum 8 h mean O3.

Figure 5. Spatial differences of model-predicted seasonal averaged PM2.5 concentrations (in the horizontal panels) with different inventories (in the vertical panels). Units are µg m⁻³. In the first column, white indicates zero, while blue, green, yellow, and red indicate concentrations from low to high. The color bars for the other three columns are the same; white indicates zero, blue and green indicate values less than zero, and yellow, purple, and red indicate values larger than zero.

Figure 6. Spatial differences of model-predicted annual PM2.5 components (in the horizontal panels) with different inventories (in the vertical panels). Units are µg m⁻³. "OTHER" represents the other implicit components (OTH). Colors are used as in Fig. 5.

Figure 7. MFB (a) and MFE (b) of predicted PM2.5 with averaging times of 24 h, 1 month, and 1 year based on the individual inventories and the ensemble.

Figure 8. Spatial distributions of PM2.5 and its components in the ensemble predictions. Units are µg m⁻³. The scales of the panels are different. White indicates zero, while blue, green, yellow, and red indicate concentrations from low to high. "OTHER" represents the other implicit components (OTH).
5 in total mass. Concentrations of over 80 µg m⁻³ annual average PM2.5 are estimated in the NCP, CNT, YRD, and SCB regions in 2013. Secondary inorganic aerosols (SO₄²⁻, NO₃⁻, and NH₄⁺) account for approximately half of PM2.5 and exhibit similar spatial patterns. Carbonaceous aerosols (EC, POA, and SOA) account for about 30 %, but POA and SOA have quite different spatial distributions. High POA concentrations are mainly distributed in NCP, CNT, and SCB, while high SOA concentrations are found in southern China. By considering the spatial distributions of population and ensemble PM2.5, the population-weighted annual average PM2.5 concentration in China in 2013 is 59.5 µg m⁻³, which is higher than the estimated value of 54.8 µg m⁻³ by Brauer et al. (2016).
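The population-weighted average mentioned above can be sketched as follows, assuming that the gridded concentration field and the gridded population share the same model grid. The function name and the small 2 × 2 arrays are hypothetical illustrations, not values from this study.

```python
import numpy as np

def population_weighted_mean(conc, pop):
    """Population-weighted mean concentration: sum(pop * conc) / sum(pop).

    conc: gridded concentrations (e.g., annual PM2.5 in µg m^-3)
    pop:  gridded population counts on the same grid (assumption)
    """
    conc = np.asarray(conc, dtype=float)
    pop = np.asarray(pop, dtype=float)
    return float(np.sum(pop * conc) / np.sum(pop))

# Hypothetical 2 x 2 grid: most people live in the most polluted cell.
conc = np.array([[80.0, 40.0],
                 [20.0, 60.0]])
pop = np.array([[3.0, 1.0],
                [0.0, 1.0]])

pw_mean = population_weighted_mean(conc, pop)   # (240 + 40 + 0 + 60) / 5 = 68.0
simple_mean = float(conc.mean())                # 50.0
```

In this toy case the population-weighted value exceeds the simple spatial mean because the population is concentrated in the high-concentration cell, which is the same reason that exposure estimates weight concentrations by population rather than by area.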

Table 1 summarizes the overall model performance for O3, CO, NO2, SO2, PM2.5, and PM10 with different inventories using the averaged observations in 60 cities in 2013. Model performance meets the O3 criteria for all inventories. O3 from SOE is 7.2 parts per billion (ppb) lower than the mean observed concentration, while the underpredictions of the other three inventories are less than 2 ppb. CO, NO2, and SO2 are underpredicted by all inventories, indicating potential emission underestimation of these species in the inventories. CO predictions from three inventories (the SOE inventory does not include CO) are substantially lower than observations, with the best performance (lowest MNB and MNE) from REAS2. The overall performance of NO2 is similar to CO. However, MEIC and SOE yield the lowest MNB, while EDGAR yields the highest MNB for CO. SO2 performance is better than CO and NO2, and MEIC and SOE yield the

Table 1. Overall model performance of gas and PM species in 2013 using different inventories. Obs is observation, MFB is mean fractional bias, MFE is mean fractional error, MNB is mean normalized bias, and MNE is mean normalized error. The indices were calculated with hourly observations and predictions. The best performance is indicated by the bold numbers.
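For reference, the four indices named in the caption can be computed as in the minimal sketch below. It uses the commonly used definitions (the fractional metrics normalize by the mean of prediction and observation, the normalized metrics by the observation) and assumes paired, strictly positive values. The function names and the two-point example are hypothetical.

```python
import numpy as np

def mfb(pred, obs):
    """Mean fractional bias: (2/N) * sum((pred - obs) / (pred + obs))."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(2.0 * np.mean((pred - obs) / (pred + obs)))

def mfe(pred, obs):
    """Mean fractional error: (2/N) * sum(|pred - obs| / (pred + obs))."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(2.0 * np.mean(np.abs(pred - obs) / (pred + obs)))

def mnb(pred, obs):
    """Mean normalized bias: (1/N) * sum((pred - obs) / obs)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean((pred - obs) / obs))

def mne(pred, obs):
    """Mean normalized error: (1/N) * sum(|pred - obs| / obs)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.mean(np.abs(pred - obs) / obs))

# Hypothetical paired values: one overprediction by a factor of 2, one exact match.
pred = [2.0, 4.0]
obs = [1.0, 4.0]

example_mfb = mfb(pred, obs)   # 2 * mean(1/3, 0) = 1/3
example_mfe = mfe(pred, obs)   # 1/3
example_mnb = mnb(pred, obs)   # mean(1, 0) = 0.5
example_mne = mne(pred, obs)   # 0.5
```

The example shows why the fractional metrics are bounded (MFB lies between −2 and 2) while the normalized metrics are not: the same factor-of-2 overprediction contributes 2/3 to the fractional sum but 1 to the normalized sum.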

Table 2. The weighting factors (w) of each inventory in the ensemble predictions of PM2.5 when using daily, monthly, and annual averages in the objective function (Eq. 5).

Table 3. Performance of daily PM2.5 (MFB and MFE) and O3-1h (MNB and MNE) in different regions of China based on individual inventories and the ensemble. The weighting factors (w) used to calculate the ensemble of each region are also included. The best performance is indicated by the bold numbers.
The results show that the ensemble predictions are better than those with EDGAR, MEIC, REAS2, and SOE at 36, 37, 32, and 40 cities for PM2.5, and 39, 39, 43, and 38 cities for O3-1h, respectively. The ensemble predictions are better than two or more of the individual predictions at 45 and 41 cities for PM2.5 and O3-1h, respectively. Out of the 15 cities for which the ensemble PM2.5 is only better than one or none of the individual predictions, 10 cities have MFB within ±0.25 and MFE less than 0.25. Out of the 19 cities for which the ensemble O3-1h is only better than one or none of the individual predictions, 14 cities still have MNB within ±0.2 and MNE less than 0.2. The results demonstrate that the ensemble can improve the predictions even at locations with no observational data available.