Evaluation and uncertainty investigation of the NO2, CO and NH3 modeling over China under the framework of MICS-Asia III


L. Kong et al.: Evaluation and uncertainty investigation of the NO2, CO and NH3 modeling over China

Abstract. Despite the significant progress in improving chemical transport models (CTMs), applications of these modeling endeavors are still subject to large and complex model uncertainty. The Model Inter-Comparison Study for Asia III (MICS-Asia III) has provided the opportunity to assess the capability and uncertainty of current CTMs in East Asian applications. In this study, we have evaluated the multi-model simulations of nitrogen dioxide (NO2), carbon monoxide (CO) and ammonia (NH3) over China under the framework of MICS-Asia III. A total of 13 modeling results, provided by several independent groups from different countries and regions, were used in this study. Most of these models used the same modeling domain with a horizontal resolution of 45 km and were driven by common emission inventories and meteorological inputs. New observations over the North China Plain (NCP) and Pearl River Delta (PRD) regions were also available in MICS-Asia III, allowing model evaluation over highly industrialized regions. The evaluation results show that most models captured the monthly and spatial patterns of NO2 concentrations in the NCP region well, though NO2 levels were slightly underestimated. Relatively poor performance in NO2 simulations was found in the PRD region, with larger root-mean-square error and lower spatial correlation coefficients, which may be related to the coarse resolution or inappropriate spatial allocation of the emission inventories in the PRD region. All models significantly underpredicted CO concentrations in both the NCP and PRD regions, with the ensemble mean underestimating the annual mean concentrations by 65.4 % and 61.4 %, respectively. Such large underestimations suggest that CO emissions might be underestimated in the current emission inventory. In contrast to the good skill in simulating the monthly variations in NO2 and CO concentrations, all models failed to reproduce the observed monthly variations in NH3 concentrations in the NCP region. Most models missed the observed peak in July and showed negative correlation coefficients with the observations, which may be closely related to the uncertainty in the monthly variations in NH3 emissions and the NH3 gas-aerosol partitioning. Finally, model intercomparisons have been conducted to quantify the impacts of model uncertainty on the simulations of these gases, which are shown to increase with the reactivity of the species. Models contained more uncertainty in the NH3 simulations. This suggests that for some highly active and/or short-lived primary pollutants, like NH3, model uncertainty can also contribute substantially to the forecast uncertainty in addition to the emission uncertainty. Based on these results, some recommendations are made for future studies.

Introduction
With the rapid growth of East Asia's economy and the accompanying surge in energy consumption and emissions, air pollution has become an increasingly important scientific topic and political concern in East Asia due to its significant environmental and health effects (Anenberg et al., 2010; Lelieveld et al., 2015). Chemical transport models (CTMs), a critical tool in both scientific research and policy making, have been applied to various air quality issues, such as air quality prediction, long-range transport of atmospheric pollutants, development of emission control strategies and understanding of observed chemical phenomena (e.g., Cheng et al., 2016; Lu et al., 2017; Ma et al., 2019; Tang et al., 2011; Xu et al., 2019; Zhang et al., 2019). Nevertheless, air quality modeling remains a challenge due to the multi-scale and nonlinear nature of the complex atmospheric processes. It still suffers from large uncertainties related to missing or poorly parameterized physical and chemical processes, inaccurate and/or incomplete emission inventories, and poorly represented initial and boundary conditions (Dabberdt and Miller, 2000; Fine et al., 2003; Gao et al., 1996; Mallet and Sportisse, 2006). Understanding such uncertainties and their impacts on air quality modeling is of great importance in assessing the robustness of models for their applications in scientific research and operational use.
Specific techniques exist to assess these uncertainties. Monte Carlo simulations, based on different values of model parameters or input fields sampled from a predefined probability density function (PDF), can provide an approximation to the PDF of possible model output and serve as an excellent characterization of the uncertainties in simulations (Hanna et al., 2001). However, this method is better suited to uncertainty related to continuous variables, such as input data or parameters in parameterizations. The ensemble method, based on a set of different models, is an alternative approach to accounting for the range of uncertainties (Galmarini et al., 2004; Mallet and Sportisse, 2006). For example, the Air Quality Model Evaluation International Initiative (AQMEII) has been implemented in Europe and North America to investigate the model uncertainties of their regional-scale model predictions (Rao et al., 2011). To assess model performance and uncertainty in East Asian applications, the Model Inter-Comparison Study for Asia (MICS-Asia) was initiated in 1998. The first phase of MICS-Asia (MICS-Asia I) was carried out during the period 1998-2002, mainly focusing on the long-range transport and deposition of sulfur in Asia (Carmichael et al., 2002). In 2003, the second phase (MICS-Asia II) was initiated and took into account more species related to regional health and ecosystem protection, including nitrogen compounds, O3 and aerosols.
Launched in 2010, MICS-Asia III has greatly expanded its study scope by covering three individual and interrelated topics: (1) evaluate the strengths and weaknesses of current multi-scale air quality models and provide techniques to reduce uncertainty in Asia; (2) develop reliable anthropogenic emission inventories in Asia and understand the uncertainty of bottom-up emission inventories in Asia; and (3) provide multi-model estimates of radiative forcing and sensitivity analysis of short-lived climate pollutants.
This study addresses one component of topic 1, focusing on the three gas pollutants NO2, CO and NH3. Compared with MICS-Asia II, more modeling results (14 different models with 13 regional models and 1 global model) were brought together within topic 1 of MICS-Asia III, run by independent modeling groups from China, Japan, Korea, the United States of America and other countries/regions. The different models contain differences in their numerical approximations (time step, chemical solver, etc.) and parameterizations, which represent a sampling of uncertainties residing in the air quality modeling. However, it would be difficult to interpret the results from intercomparison studies wherein the models were driven by different meteorological fields and emission inventories. Thus, in MICS-Asia III the models were constrained so that they operated under the same conditions by using common emission inventories, meteorological fields, modeling domain and horizontal resolution. The simulations were also extended from the 4 months in MICS-Asia II to the entire year of 2010.
NO2, CO and NH3 are three important primary gas pollutants that have wide impacts on atmospheric chemistry. As a major precursor of O3, NO2 plays an important role in tropospheric O3 chemistry and also contributes to rainwater acidification and the formation of secondary aerosols (Dentener and Crutzen, 1993; Evans and Jacob, 2005). CO is a colorless and toxic gas ubiquitous throughout the atmosphere, which is of interest as an indirect greenhouse gas (Gillenwater, 2008) and a precursor for tropospheric O3 (Seinfeld and Pandis, 1998). Being the major sink of OH, CO also controls the atmosphere's oxidizing capacity (Levy, 1971; Novelli et al., 1998). As the only primary alkaline gas in the atmosphere, NH3 is closely associated with the acidity of precipitation, and it can react with sulfuric acid and nitric acid to form ammonium sulfate and ammonium nitrate, which account for a large proportion of fine particulate matter (Sun et al., 2013). Assessing model performance for these gases is thus important to help us better understand their environmental consequences and also to help explain the model performance for their related secondary air pollutants, such as O3 and fine particulate matter. In previous phases of MICS-Asia, no specific evaluation and intercomparison work was conducted for these gases, especially for CO and NH3. In MICS-Asia II, model performance for NO2 was evaluated as a species relevant to O3; however, such evaluations were limited to the observation sites from EANET (Acid Deposition Monitoring Network in East Asia). Model evaluations and intercomparisons in industrialized regions of China have not been performed due to the limited number of EANET monitoring sites in China, which has hindered our understanding of the model performance in industrialized regions. Denser observations over highly industrialized regions of China, namely the North China Plain (NCP) and Pearl River Delta (PRD) regions, were included for the first time in MICS-Asia III, allowing model evaluation over highly industrialized regions. Meanwhile, the emission inventories of these three gases are still subject to large uncertainties (Kurokawa et al., 2013), which are a major source of uncertainty in air quality modeling and forecasts. Evaluating these gases' emission inventories from a model perspective is also a useful way to identify the uncertainties in emission inventories (Han et al., 2009; van Noije et al., 2006; Pinder et al., 2006; Stein et al., 2014; Uno et al., 2007).
In all, this paper aims to evaluate NO2, CO and NH3 simulations using the multi-model data from MICS-Asia III. We try to address three questions: (1) what the performance of current CTMs is for simulating NO2, CO and NH3 concentrations over highly industrialized regions of China; (2) what potential factors are responsible for the model deviations from observations and the differences among models; and (3) how large the impacts of model uncertainties are on the simulations of these gases.

Six different chemical transport models have participated in MICS-Asia III, with their major configurations summarized in Table 1. These models included NAQPMS (Wang et al., 2001), three versions of CMAQ (Byun and Schere, 2006), WRF-Chem (Grell et al., 2005), NU-WRF (Peters-Lidard et al., 2015), NHM-Chem (Kajino et al., 2012) and GEOS-Chem (http://acmg.seas.harvard.edu/geos/, last access: 18 December 2019). All models employed the same modeling domain (Fig. 1), with a horizontal resolution of 45 km, except M13 (0.5° latitude × 0.667° longitude) and M14 (64 km × 64 km). Detailed information on each component of these CTMs can be obtained from Chen et al. (2019) and Tan et al. (2019).

Table 1. Major configurations of the participating models. [Table body not recovered; surviving entries include ACM2 (Pleim, 2007), SAPRC99 (Carter, 2000) and Aero6 (Binkowski and Roselle, 2003).]
a Standard represents the reference meteorological field provided by the MICS-Asia III project; WRF/NCEP and WRF/MERRA represent the meteorological field of the participating model itself, which was run by WRF driven by the NCEP and Modern Era Retrospective-analysis for Research and Applications (MERRA) reanalysis datasets. RAMS/NCEP is the meteorological field run by RAMS driven by the NCEP reanalysis dataset.
b Boundary conditions of M10 are from MOZART and GOCART (Chin et al., 2012; Horowitz et al., 2003), which provided results for gaseous pollutants and aerosols, respectively.

Standard model input datasets of raw meteorological fields, emission inventories and boundary conditions were provided by MICS-Asia III for all participants. Raw meteorological fields were generated from a whole-year simulation for 2010 using the Weather Research and Forecasting Model (WRF) version 3.4.1 (Skamarock, 2008) with a horizontal resolution of 45 km and 40 vertical layers from the surface to the model top (10 hPa). Initial and lateral boundary conditions for the meteorological simulation were generated every 6 h by using the 1° × 1° NCEP FNL (Final) Operational Global Analysis data (ds083.2). Real-time, global, sea surface temperature (RTG_SST_HR) analyses were used to generate and update lower boundary conditions for sea areas. Four-dimensional data assimilation nudging (gridded FDDA and SFDDA) was performed during the simulation to increase the accuracy of WRF after the objective analysis with NCEP FNL (Final) Operational Global Analysis data (ds083.2), NCEP Automatic Data Processing (ADP) Global Surface Observation Weather Data (ds461.0), and NCEP ADP Global Upper Air and Surface Weather Data (ds337.0). Detailed configurations of the standard meteorological model are available in Table S1 in the Supplement. The simulated wind speed, relative humidity and air temperature were evaluated against the observations over the NCP and PRD regions, with detailed results shown in Sect. S1. In general, the standard meteorological simulations captured the main features of meteorological conditions in the NCP and PRD regions well, with high correlation coefficients, small biases and low errors for all meteorological parameters (Figs. S1-S3 and Table S2).
Standard emission inventories provided by MICS-Asia III were used by all participants. The anthropogenic emissions were provided by a newly developed anthropogenic emission inventory for Asia (MIX), which integrated five national or regional inventories, including the Regional Emission inventory in Asia (REAS) developed at the Japan National Institute for Environment Studies, the Multi-resolution Emission Inventory for China (MEIC) developed at Tsinghua University, the High-Resolution Ammonia Emission Inventory in China developed at Peking University, the Indian emission inventory developed at Argonne National Laboratory in the United States and the Clean Air Policy Support System (CAPSS) Korean emission inventory developed at Konkuk University. Hourly biogenic emissions for the entire year of 2010 in MICS-Asia III were provided by the Model of Emissions of Gases and Aerosols from Nature version 2.04 (Guenther et al., 2006). The Global Fire Emissions Database 3 (Randerson et al., 2013) was used for biomass burning emissions. Volcanic SO2 emissions were provided by the Asia Center for Air Pollution Research (ACAP) with a daily temporal resolution. Air and ship emissions with an annual resolution were provided by the HTAP version 2 emission inventory for 2010 (Janssens-Maenhout et al., 2015). NMVOC (nonmethane volatile organic compound) emissions were speciated into model-ready inputs for three chemical mechanisms (CBMZ, CB05 and SAPRC-99), and weekly and diurnal profiles for emissions were also provided.
MICS-Asia III provided two sets of top and lateral boundary conditions for the year 2010, which were derived from the 3-hourly global CTM outputs of CHASER (Sudo et al., 2002a, b) and GEOS-Chem (http://acmg.seas.harvard.edu/geos/), run by Nagoya University (Japan) and the University of Tennessee (USA), respectively. GEOS-Chem was run with a 2.5° × 2° resolution and 47 vertical layers, while the CHASER model was run with a 2.8° × 2.8° resolution and 32 vertical layers.
All participants were required to use the standard model input data to drive their model runs so that the impacts of model input data on the simulations could be minimized. However, the models are quite different from each other, and it is difficult to keep all the inputs the same. The majority of models applied the standard meteorological fields, while GEOS-Chem and RAMS-CMAQ utilized their own meteorological models. GEOS-Chem was driven by the GEOS-5 assimilated meteorological fields from the Goddard Earth Observing System of the NASA Global Modeling and Assimilation Office, and RAMS-CMAQ was driven by meteorological fields provided by the Regional Atmospheric Modeling System (RAMS) (Pielke et al., 1992). The WRF-Chem models utilized the same meteorological model (WRF) as the standard meteorological simulation, but two of them considered the two-way coupling effects of pollutants and meteorological fields. The meteorological configurations of these WRF-Chem models were compared with those of the standard meteorological model (Table S1) and show only slight differences. The CTM part of NHM-Chem is coupled with the non-hydrostatic meteorological model (NHM) of the Japan Meteorological Agency (JMA) (Saito et al., 2006), but an interface to convert WRF meteorological output to CTM input was implemented (Kajino et al., 2018). Thus, the standard meteorological field was also used in the NHM-Chem simulation.

Data and statistical methods
All modeling groups performed a base year-long simulation for 2010 and were required to submit their modeling results according to the data protocol designed in MICS-Asia III. Gridded monthly concentrations of NO2, CO, NH3 and ammonium (NH4+) in the surface layer were used in this study. Note that the modeling results from M3 and the NH3 simulations from M8 were excluded because they were implausible; thus only 13 modeling results were used in this study.
Hourly observed concentrations of NO2 and CO were collected over the NCP (19 stations) and PRD (13 stations) regions, obtained from the air quality network over northern China and the Pearl River Delta regional air quality monitoring network (PRD RAQMN), respectively. The air quality monitoring network over northern China was set up by the Chinese Ecosystem Research Network (CERN), the Institute of Atmospheric Physics (IAP) and the Chinese Academy of Sciences (CAS) and has been operational since 2009 within an area of 500 × 500 km2 in northern China. All monitoring stations were selected and set up according to the US EPA method designations. The PRD RAQMN was jointly established by the government of Guangdong Province and the Hong Kong Special Administrative Region and consists of 16 automatic air quality monitoring stations across the PRD region. The 13 stations operated by the Environmental Monitoring Centers in Guangdong Province were used in this study, while the other three, located in Hong Kong and managed by the Hong Kong Environmental Protection Department, were not included. Monthly averaged observations were calculated for comparison with the simulated monthly surface NO2 and CO concentrations. It should be noted that these networks measured NO2 concentrations using a thermal conversion method, which would overestimate the NO2 concentrations due to the positive interference of other oxidized nitrogen compounds (Xu et al., 2013). Long-term NH3 observations are challenging and limited due to the strong spatial and temporal variability of NH3, its quick conversion from one phase to another, and its stickiness to the observational instruments (von Bobrutzki et al., 2010).
Measurements of surface NH3 concentrations in the year 2010 were not available in this study; however, 1 year of surface measurements of monthly NH3 concentrations over China, from September 2015 to August 2016, was used as a reference dataset. These measurements were obtained from the Ammonia Monitoring Network in China (AMoN-China). AMoN-China was established based on the CERN and the Regional Atmospheric Deposition Observation Network in North China Plain; it consists of 53 sites over China and measures monthly ambient NH3 concentrations using the passive diffusive technique. A total of 11 stations located in the NCP region were used in this study. Distributions of the observation sites of NO2, CO and NH3 over the NCP and PRD regions, as well as their total emissions in the year 2010 provided by MICS-Asia III, are shown in Fig. 1. Besides the surface observations, the satellite retrievals of NH3 total columns from IASI (Infrared Atmospheric Sounding Interferometer) were also used in this study to qualitatively evaluate the modeled monthly variations in NH3 concentrations. The ANNI-NH3-v2.1R-I retrieval product (Van Damme et al., 2017) was used in this study, which is the reanalysis version of NH3 retrievals from IASI instruments and provides the daily morning (∼ 09:30 local time) NH3 total columns from 2008 to 2016. More detailed information on the satellite data and their processing is available in Sect. S2.
Mean bias error (MBE), normalized mean bias (NMB), root-mean-square error (RMSE) and correlation coefficient (R) were calculated to assess model performance. The standard deviation of the ensemble models was used to measure the ensemble spread and the impacts of model uncertainty. The coefficient of variation (hereinafter, CV), defined as the standard deviation divided by the average, was also used to measure the impacts of model uncertainty in a relative sense, with a larger value denoting lower consistency among models. However, by this definition, lower concentrations tend to be associated with higher CV values; we therefore did not calculate CV over model grids whose simulated concentrations were lower than 0.1 ppbv for NO2 and NH3 and 0.1 ppmv for CO. March-May, June-August, September-November and December-February were used to define the four seasons: spring, summer, autumn and winter, respectively.
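The metrics named above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the formulas are the commonly used definitions of MBE, NMB, RMSE and Pearson's R, and two details are assumptions made here, namely that the CV threshold is applied to the ensemble mean of a grid cell and that the population (not sample) standard deviation is used. All function names are hypothetical.

```python
import math

def mbe(model, obs):
    """Mean bias error: average of (model - observation)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

def nmb(model, obs):
    """Normalized mean bias as a fraction (multiply by 100 for percent)."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)

def rmse(model, obs):
    """Root-mean-square error."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def pearson_r(model, obs):
    """Pearson correlation coefficient between model and observations."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_m * var_o)

def ensemble_cv(values, threshold):
    """Coefficient of variation across models for one grid cell.

    Returns None when the ensemble mean is below `threshold`
    (assumption: the cutoff is applied to the ensemble mean).
    """
    mean = sum(values) / len(values)
    if mean < threshold:
        return None
    # Population standard deviation (assumption; the text does not specify).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return std / mean

# Season definition used in the text (month number -> season).
SEASON_OF_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    obs = [1.0, 2.0, 3.0]
    model = [2.0, 4.0, 6.0]
    print(mbe(model, obs), nmb(model, obs), rmse(model, obs), pearson_r(model, obs))
    print(ensemble_cv([1.0, 2.0, 3.0], threshold=0.1))
```

Returning None below the threshold, rather than zero, keeps masked grid cells distinguishable from cells where the models agree exactly.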

Evaluating the ensemble models with observations
To facilitate comparisons, the modeling results were interpolated to the observation sites by taking the values from the grid cell where the monitoring stations are located. Model evaluation metrics defined in Sect. 2.2 were then calculated to evaluate the modeling results against the observations.

NO2
Annual mean NO2 concentrations are compared in Fig. 2a-b, with the evaluation metrics summarized in Table 2. M13 is not included in the evaluation of NO2 since it did not submit NO2 concentrations. In general, the majority of models underpredicted NO2 levels in both the NCP and PRD regions. Calculated MBE (NMB) ranges from −6.54 ppbv (−28.4 %) to −2.45 ppbv (−10.6 %) over the NCP region and from −9.84 ppbv (−44.0 %) to −1.84 ppbv (−8.2 %) over the PRD region among these negatively biased models. These underpredicted NO2 concentrations are consistent with the overpredicted O3 concentrations by these models found in Li et al. (2019). O3 production can either increase with NOx under NOx-limited conditions or decrease under NOx-saturated (also called volatile organic compound, VOC, limited) conditions (Sillman, 1999). Both the NCP and PRD regions are industrialized regions in China with high NOx emissions (Fig. 1). Observations also showed that the NCP and PRD regions are falling into or changing into NOx-saturated regimes (Shao et al., 2009; Jin and Holloway, 2015). Therefore, the underestimated NO2 concentrations may contribute to the overpredicted O3 concentrations in these two regions. Detailed results about the O3 predictions can be found in Li et al. (2019). In addition, as mentioned in Sect. 2.2, the negative biases in the simulated NO2 concentrations can also be partly attributed to the positive biases in the NO2 observations. M5, M8, M9 and M11 in the NCP region and M5, M8 and M11 in the PRD region were exceptions that overpredicted NO2 concentrations. M11 performed well in predicting NO2 levels in the NCP region, with the smallest RMSE, while M9 significantly overestimated NO2, with the largest MBE and RMSE values. NO2 predictions by M8 were close to the observations over the PRD region, with the smallest RMSE value. Meanwhile, we also found that models exhibited better NO2 modeling skill in the NCP region than in the PRD region, with smaller biases and RMSE values.
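The station sampling described at the start of this section, in which each station takes the value of the grid cell that contains it, can be sketched as follows. This is an illustration under stated assumptions, not the processing code of the study: it assumes a regular grid in projected coordinates (for example, kilometres on the 45 km model grid), with `x0` and `y0` denoting the lower-left corner of the domain; all names are hypothetical.

```python
import math

def cell_index(coord, origin, spacing, n_cells):
    """Index of the cell containing `coord` along one regular axis.

    `origin` is the lower edge of cell 0 and `spacing` the cell width.
    """
    idx = math.floor((coord - origin) / spacing)
    if not 0 <= idx < n_cells:
        raise ValueError("station lies outside the model domain")
    return idx

def sample_at_station(field, x, y, x0, y0, dx, dy):
    """Value of the grid cell containing the station at (x, y).

    `field[j][i]` holds the concentration of the cell in row j (y direction)
    and column i (x direction). No spatial interpolation is performed.
    """
    j = cell_index(y, y0, dy, len(field))
    i = cell_index(x, x0, dx, len(field[0]))
    return field[j][i]

if __name__ == "__main__":
    # Toy 2 x 2 field on a 45 km grid; values are illustrative only.
    field = [[1.0, 2.0],
             [3.0, 4.0]]
    print(sample_at_station(field, x=50.0, y=10.0, x0=0.0, y0=0.0, dx=45.0, dy=45.0))
```

With this approach, stations that fall in the same cell receive identical model values, which is one reason a 45 km grid may struggle to reproduce station-to-station variability within a compact region.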
According to the spatial correlation coefficients (Table 2), all models reproduced the main features of the spatial variability of NO2 concentrations in the NCP region well, with correlation coefficients ranging from 0.57 to 0.70. However, the models failed to capture the spatial variability of NO2 concentrations in the PRD region, with correlation coefficients ranging only from 0.00 to 0.38. Such low correlation might be attributed to the coarse model resolution (45 km), at which some local impacts on the NO2 concentrations might not be well resolved, and/or to the uncertainties in the emission inventories, which were not well resolved in the PRD region. To investigate this, we conducted an additional 1-year simulation with finer horizontal resolutions (15 and 5 km, Fig. S4) in the PRD region using the NAQPMS model. Detailed experimental settings are presented in Sect. S3. The results indicate that, when using the same emission inventory as the coarse-resolution simulation, the high-resolution simulations still performed poorly in capturing the spatial variability of NO2 concentrations in the PRD region, with calculated correlation coefficients of only 0.03 and 0.02 for the 15 and 5 km resolutions, respectively (Sect. S3, Figs. S5-S6 and Table S3). Thus, the poor model performance in the PRD region could be more related to the coarse resolution and/or inappropriate spatial allocation of the emission inventories. These results also suggest that increasing the model resolution alone may not improve model performance.

Figure 3 presents the monthly time series of the observed and simulated regional mean NO2 concentrations over the NCP (Fig. 3a) and PRD (Fig. 3b) regions from January to December in 2010. The models captured the monthly variations in NO2 concentrations well in both the NCP and PRD regions.
According to Table 2, the correlation coefficient ranges from 0.28 to 0.96 in the NCP region and from 0.52 to 0.95 in the PRD region. M8 showed the largest overestimation among all models in summer, with MBE (NMB) reaching 12.1 ppbv (75.8 %) in the NCP region, which may help explain the low correlation of this model. M9 exhibited a significant overestimation in winter in the NCP region, with MBE (NMB) up to 22.0 ppbv (79.3 %), while there was much less overestimation or even underestimation (summer) in other seasons. This discrepancy may be explained by the fact that M9 is an online coupled model that considers two-way coupling effects between meteorology and chemistry. During periods of heavy haze, radiation can be largely reduced by aerosol dimming effects, leading to weakened photochemistry, a lowered boundary layer height and thus an increase in NO2 concentrations. Severe haze was reported to occur in northern China in January 2010, with maximum hourly PM2.5 concentrations reaching as high as ∼ 500 µg m−3 in urban Beijing. Such high aerosol loadings in the atmosphere could trigger interactions between chemistry and meteorology. Interestingly, M9 did not overestimate NO2 during winter in the PRD region. This might be related to the lower aerosol concentrations and weaker chemistry-meteorology coupling effects in the PRD region.

Figure 2. Boxplot of simulated and observed annual mean NO2, CO and NH3 concentrations sampled from different stations over the NCP (a, c, e) and PRD (b, d) regions. Outliers were defined as values larger than q3 + 1.5 × (q3 − q1) or less than q1 − 1.5 × (q3 − q1), where q3 denotes the 75th percentile and q1 the 25th percentile. This approximately corresponds to 99.3 % coverage if the data are normally distributed.
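The outlier rule in the Figure 2 caption can be checked numerically. The sketch below assumes the conventional boxplot multiplier of 1.5, which is the value consistent with the stated 99.3 % coverage for normally distributed data. The function names are hypothetical, and the quartile method ("inclusive") is an assumption, since the caption does not say how percentiles were computed.

```python
from statistics import NormalDist, quantiles

def outlier_fences(values, k=1.5):
    """Lower and upper fences q1 - k*IQR and q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def normal_coverage(k=1.5):
    """Fraction of a normal distribution lying inside the fences."""
    nd = NormalDist()            # standard normal: mean 0, sigma 1
    q3 = nd.inv_cdf(0.75)        # 75th percentile; q1 = -q3 by symmetry
    upper = q3 + k * (2.0 * q3)  # IQR of the standard normal is 2 * q3
    return nd.cdf(upper) - nd.cdf(-upper)

if __name__ == "__main__":
    print(outlier_fences([1, 2, 3, 4, 5]))
    print(round(normal_coverage(), 4))
```

For a standard normal distribution the upper fence sits near 2.7 standard deviations, so roughly 0.7 % of values would be flagged as outliers by chance.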

CO
Similar analyses were performed for the modeling results of CO. All models significantly underestimated the annual mean CO concentrations in both the NCP and PRD regions (Fig. 2c-d and Table 2). The calculated MBE (NMB) ranges from −1.69 ppmv (−76.2 %) to −1.16 ppmv (−52.0 %) in the NCP region and from −0.67 ppmv (−69.6 %) to −0.50 ppmv (−52.3 %) in the PRD region (Table 2). Such large negative biases in all models are unlikely to be explained by the model uncertainties, suggesting negative biases in the CO emissions over China. This is consistent with the inversion results of Tang et al. (2013), which indicate a significant underestimation of CO emissions over Beijing and the surrounding area in the summer of 2010. In recent decades, global models have also reported CO underestimations in the Northern Hemisphere (Naik et al., 2013; Stein et al., 2014), and a number of global model inversion studies have been conducted to derive optimized CO emissions. Most of these studies have reported a significant underestimation of CO emissions in their a priori estimates (Bergamaschi et al., 2000; Miyazaki et al., 2012; Pétron et al., 2002, 2004). Our findings agree with these studies and indicate that more accurate CO emissions are needed in future studies. Model performance in simulating the spatial variability of CO concentrations was still poor in the PRD region according to Table 2, with most models showing negative correlation coefficients. Time series of the observed and simulated regional mean CO concentrations in the NCP and PRD regions are presented in Fig. 3c-d. They show that the models, except M5, reproduced the monthly variations in CO concentrations well in both the NCP and PRD regions, with high temporal correlation coefficients (Table 2). 
All models, however, underestimated CO concentrations throughout the year and showed the largest underestimations in winter, with the ensemble mean MBE (NMB) reaching −2.1 ppmv (−64.9 %) in the NCP region and −0.75 ppmv (−60.6 %) in the PRD region. Figure 2e shows the comparisons of the observed and simulated annual mean NH 3 concentrations in the NCP region. Since we used NH 3 observations from September 2015 to August 2016, negative biases are expected, given the increasing trend of atmospheric ammonia during the period 2003-2016 detected by recent retrievals from the Atmospheric Infrared Sounder (AIRS) aboard NASA's Aqua satellite (Warner et al., 2016, 2017). Due to the interannual uncertainty, we mainly focused on the disparities among different models rather than the deviation from observations. Large differences can be seen in the simulated NH 3 concentrations from different models. M14 simulated very low concentrations and exhibited the largest negative biases, with MBE (NMB) of −12.2 ppbv (−66.3 %), which may be related to the higher conversion rate of NH 3 to NH 4 + in M14 (discussed later in this section). In contrast, M9 provided much higher NH 3 concentrations than other models, with MBE (NMB) up to 21.8 ppbv (118.7 %). Among the CMAQ models, M1 and M2 exhibited higher NH 3 concentrations and larger spatial variability than the other CMAQ models. Such a discrepancy may be explained by the fact that M1 and M2 are two model runs using CMAQ version 5.0.2. The bidirectional exchange of NH 3 has been integrated into CMAQ since version 5.0. This module can simulate the emission and deposition of NH 3 between the atmosphere and the surface, allowing additional NH 3 emissions to the atmosphere (US EPA Office of Research and Development, 2012).
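The bias statistics quoted in this section (MBE, NMB, RMSE and the correlation coefficient) can be illustrated with a short sketch. It assumes the conventional definitions, which may differ in detail from those used for Table 2, and the input values are hypothetical numbers chosen only to resemble a strongly underestimated CO series. They are not data from this study.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def mbe(model, obs):
    """Mean bias error, in the units of the inputs."""
    return mean([m - o for m, o in zip(model, obs)])


def nmb(model, obs):
    """Normalized mean bias in percent: summed bias over summed observations."""
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)


def rmse(model, obs):
    """Root-mean-square error, in the units of the inputs."""
    return sqrt(mean([(m - o) ** 2 for m, o in zip(model, obs)]))


def pearson_r(model, obs):
    """Pearson correlation coefficient between paired series."""
    mm, om = mean(model), mean(obs)
    cov = sum((m - mm) * (o - om) for m, o in zip(model, obs))
    var_m = sum((m - mm) ** 2 for m in model)
    var_o = sum((o - om) ** 2 for o in obs)
    return cov / sqrt(var_m * var_o)


# Hypothetical paired CO values in ppmv (illustrative only).
obs_co = [2.0, 2.4, 2.2, 2.3]
model_co = [0.7, 0.9, 0.8, 0.8]

print(f"MBE  = {mbe(model_co, obs_co):.3f} ppmv")
print(f"NMB  = {nmb(model_co, obs_co):.1f} %")
print(f"RMSE = {rmse(model_co, obs_co):.3f} ppmv")
print(f"R    = {pearson_r(model_co, obs_co):.2f}")
```

Under these definitions NMB equals MBE divided by the observed mean when both are computed over the same samples. If that holds for Table 2, an MBE of −1.69 ppmv with an NMB of −76.2 % would imply an observed annual mean CO concentration of roughly 2.2 ppmv in the NCP region.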

NH 3
As can be seen in Table 2, the observed spatial variations in NH 3 over the NCP region were reproduced well by all models (R = 0.57-0.71), indicating that the spatial variations in current NH 3 emissions over the NCP region are well represented in emission inventories. However, all models failed to capture the observed monthly variations in NH 3 concentrations, with most models mismatching the observed NH 3 peak (July) and showing negative correlation coefficients. M10 and M13 are exceptions, showing good temporal correlations of 0.64 and 0.65, respectively (Fig. 3e and Table 2). This is quite different from the model behavior in simulating the monthly variations in NO 2 and CO concentrations. As seen in Fig. 3e, the observations showed peak concentrations of NH 3 in the summer months and lower concentrations in autumn and winter, which is consistent with previous NH 3 observations in the NCP region (Shen et al., 2011; Xu et al., 2016; Meng et al., 2011). Newly derived satellite-measured NH 3 at 918 hPa averaged between September 2002 and August 2015 also demonstrated higher concentrations in spring and summer and lower concentrations in autumn and winter (Warner et al., 2016). However, all models predicted a peak concentration in November, except for M10 (August) and M13 (June). We also used the satellite retrievals of NH 3 total columns from IASI to further evaluate the modeled monthly variations in NH 3 concentrations, since evaluating the model results using observations from different years may be inappropriate due to changes in NH 3 emissions. Comparisons of the surface NH 3 observations from AMoN-China and NH 3 total columns from IASI (Fig. S7) suggest that the IASI measurements represent the monthly variations in surface NH 3 concentrations well and can therefore be used to qualitatively evaluate the modeled monthly variations in surface NH 3 concentrations. 
The monthly time series of the regional mean NH 3 total columns over the NCP region from January 2008 to December 2016 are shown in Fig. S8. They show monthly variations similar to those of the surface observations, with the highest value in July, and confirm the poor model performance in reproducing the monthly variations in NH 3 concentrations. The IASI measurements also indicate that the interannual variability of the monthly variations in NH 3 concentrations over the NCP region was small from 2008 to 2016, which suggests that using observations from different years can still provide valuable clues for verifying the modeled monthly variations.
The simulated monthly variations in NH 3 concentrations were closely related to the monthly variations in the NH 3 emissions. Most models predicted three peak values of NH 3 concentrations in June, August and November but exhibited a significant decrease in July, which was in good agreement with the peaks and drops of the NH 3 emission rates in these months (Fig. 4). The strong relationship between the simulated NH 3 concentrations and the emission rates suggests that the poor model performance in reproducing the monthly variations in NH 3 concentrations is probably related to the uncertainties in the monthly variations in NH 3 emissions. This is consistent with the recent bottom-up and top-down estimates of agricultural ammonia emissions in China by Zhang et al. (2018), which show a more distinct seasonality of Chinese NH 3 emissions.
It is worth noting that there are also important uncertainties in the models beyond emission uncertainty. In order to investigate this issue, we analyzed the impact of the gas-aerosol partitioning of NH 3 on the simulated NH 3 concentrations. Figure 5 shows the time series of the simulated total ammonium (NH x = NH 3 + NH 4 + ) in the atmosphere, along with the ratio of gaseous NH 3 to total ammonium. M10 is excluded from Fig. 5 since the GOCART model does not predict NH 4 + concentrations. As a result, the emitted NH 3 would be present only in the gas phase in M10, leading to higher NH 3 predictions. This may also help explain the different monthly variations in NH 3 concentrations seen in M10. Without the consideration of NH 4 + , the monthly variations in NH 3 concentrations in M10 were more consistent with the monthly variations in NH 3 emissions, which highlights the importance of the gas-aerosol partitioning of NH 3 for predicting the monthly variations in NH 3 concentrations. As seen in Fig. 5, there is a large discrepancy in the simulated gas-aerosol partitioning of NH 3 among the models. M7 and M9 showed a higher NH 3 /NH x ratio than other models, which means that these two models tended to retain NH 3 in the gas phase and thus predicted higher NH 3 concentrations than other models. For example, M7 predicted a magnitude of total ammonium comparable to that of most models, while the gaseous NH 3 concentration in M7 accounted for more than 60 % of total ammonium in summer and 90 % in winter. The lower conversion rate of NH 3 to NH 4 + in M9 may be related to the gas-phase chemistry used in the model. M9 used the RADM2 mechanism, which gives lower reaction rates for the oxidation of SO 2 and NO 2 by the OH radical, as compiled by Tan et al. (2019), leading to lower acid production and thus a lower conversion rate of NH 3 to NH 4 + . 
In the case of M7, the hydrolysis of N 2 O 5 was not considered, which leads to a tendency toward lower NO 3 − predictions (Chen et al., 2019) and partly explains the higher NH 3 predictions of M7. In contrast, M14 showed a much lower NH 3 /NH x ratio than most models, which is related to its higher sulfate production rates compared with other models, as seen in Chen et al. (2019). In terms of monthly variations, most models predicted a lower NH 3 /NH x ratio in summer than in other seasons, suggesting higher conversion rates of NH 3 from the gas phase to the aerosol phase in summer. This would be related to the higher yield of ammonium sulfate due to the enhanced photochemical oxidation activity in summer. However, in contrast to the modeling results, the NH 3 and NH 4 + observations over the NCP region indicated a lower NH 3 /NH x ratio, with higher ammonium concentrations, in autumn and winter (Shen et al., 2011; Xu et al., 2016). Although observed NH 4 + was largest in summer at a rural site in Beijing, the observed NH 3 /NH x ratio was still highest in summer according to observations from Meng et al. (2011). These results indicate that there would be large uncertainties in the modeling of seasonal variations in the gas-aerosol partitioning of NH 3 over the NCP region. The formation of NH 4 + mainly depends on the acid gas concentrations, temperature, water availability (Khoder, 2002) and the flux rates of NH 3 (Nemitz et al., 2001). Compared with spring and summer, the lower temperatures and higher SO 2 and NO x emissions in autumn and winter should favor the gas-to-particle conversion of NH 3 and lead to higher NH 4 + concentrations. This contrast indicates that some reaction pathways of acid production (H 2 SO 4 or HNO 3 ) may be missing in current models, such as aqueous-phase and heterogeneous chemistry (Cheng et al., 2016; Wang et al., 2016; Zheng et al., 2015). Such uncertainty may be another important factor contributing to the poor model performance in reproducing the monthly variations in NH 3 concentrations over the NCP region.

Figure 5. Time series of the multi-model-simulated total ammonium (NH x = NH 3 + NH 4 + ) in the atmosphere, along with the ratio of gaseous NH 3 to total ammonium, over the NCP region from January to December in the year 2010.
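The partitioning diagnostic discussed above, total ammonium and the gaseous fraction NH 3 /NH x, can be sketched as follows. The function name `partition` and the two sets of input values are hypothetical and serve only to show how two models with the same total ammonium can yield very different gaseous NH 3 concentrations. Both inputs are assumed to be in the same molar-based unit (for example ppbv).

```python
def partition(nh3, nh4):
    """Return (NHx, NH3/NHx) for gaseous NH3 and particulate NH4+.

    Both inputs must share the same molar-based unit so that they can be summed.
    """
    nhx = nh3 + nh4
    ratio = nh3 / nhx if nhx > 0 else float("nan")
    return nhx, ratio


# Hypothetical values (ppbv): identical total ammonium, different partitioning.
gas_rich_model = partition(nh3=12.0, nh4=8.0)
aerosol_rich_model = partition(nh3=4.0, nh4=16.0)

for name, (nhx, ratio) in [("gas-rich", gas_rich_model),
                           ("aerosol-rich", aerosol_rich_model)]:
    print(f"{name}: NHx = {nhx:.1f} ppbv, NH3/NHx = {ratio:.2f}")
```

In this illustration the threefold difference in gaseous NH 3 arises entirely from the partitioning step, not from emissions, which is the kind of inter-model difference that Fig. 5 is designed to expose.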

Quantifying the impacts of model uncertainty
In this section, we further investigate the discrepancies among the different models to quantify the impacts of model uncertainty on the simulations of these gases. As we mentioned in Sect. 2, most of these models employed common meteorology fields and emission inventories over China under the same modeling domain and horizontal resolutions, which comprised an appropriate set for investigating the model uncertainties.
Figures 6-8 present the simulated annual mean concentrations of NO 2 , CO and NH 3 from different models. The spatial distributions of the simulated NO 2 , CO and NH 3 concentrations from different models agreed well with each other, similar to the spatial distributions of their emissions (Fig. 1). High NO 2 concentrations were mainly located in northern and central eastern China, and several hot spots of NO 2 were also detected in northeastern China and the PRD region. M5, M8, M9 and M11 predicted higher NO 2 concentrations than other models, especially for M8, which also predicted very high NO 2 levels over southeastern China. Similar to NO 2 , high CO concentrations were generally located over northern and central eastern China, as well as east of the Sichuan basin. M8, M9 and M11 predicted higher CO concentrations than other models as well. In terms of NH 3 , although most models shared similar spatial patterns of NH 3 simulations, the simulated NH 3 concentrations varied largely from different models. High NH 3 concentrations were mainly located over northern China and the Indian subcontinent, which was in accordance with the distribution of agricultural activity intensity over East Asia. Among these models, M9 and M10 produced much higher NH 3 concentrations over East Asia, while M4, M5, M6, M13 and M14 produced much lower concentrations.
The impacts of model uncertainty on the simulations of NH 3 (Fig. 9a), CO (Fig. 9b) and NO 2 (Fig. 9c) were then quantified in Fig. 9, denoted by the spatial distributions of the standard deviation (ensemble spread) and the corresponding distributions of CV on the annual and seasonal basis. Note that M13 and M14 were excluded in the calculation of ensemble spread and CV to reduce the influences of the meteorological input data and horizontal resolutions. It seems that the impacts of model uncertainty increase with the reactivity of gases. NH 3 simulations were affected most by the model uncertainty, while CO suffered least from the uncertainty in models.
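The two measures used in Fig. 9, the ensemble spread and the coefficient of variation (CV), can be sketched for a single grid cell as below. This is a minimal illustration that assumes the spread is the population standard deviation across ensemble members and that CV is the spread divided by the ensemble mean. The member values are hypothetical and are not taken from the MICS-Asia III simulations.

```python
from statistics import mean, pstdev


def spread_and_cv(member_values):
    """Return (ensemble mean, ensemble spread, CV) for one grid cell.

    The spread is taken as the population standard deviation (an assumption).
    """
    ens_mean = mean(member_values)
    spread = pstdev(member_values)
    cv = spread / ens_mean if ens_mean > 0 else float("nan")
    return ens_mean, spread, cv


# Hypothetical annual mean NH3 (ppbv) from five ensemble members at two cells.
cells = {
    "high_emission_cell": [30.0, 18.0, 45.0, 12.0, 25.0],
    "low_emission_cell": [1.0, 0.3, 4.0, 0.5, 2.2],
}

results = {name: spread_and_cv(values) for name, values in cells.items()}
for name, (ens_mean, spread, cv) in results.items():
    print(f"{name}: mean = {ens_mean:.1f} ppbv, "
          f"spread = {spread:.1f} ppbv, CV = {cv:.2f}")
```

The example shows why both measures are reported: a low-emission cell can have a small absolute spread but a large CV, whereas a high-emission cell can have a large spread but a smaller CV.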
The ensemble spread of NH 3 simulations exhibited a strong spatial variability, with higher values mainly located in the NCP region. The standard deviation of the annual mean NH 3 concentrations can be over 20 ppbv in Henan province and 15 ppbv in the south of Hebei province, which is about 60 %-80 % and 40 %-60 % of the ensemble mean, respectively, according to the CV distribution. As mentioned in Sect. 3.1.3, these large modeling differences can be partly explained by the differences in the bidirectional exchange and gas-aerosol partitioning of NH 3 in different models. A strong seasonal pattern was also found in the differences of NH 3 simulations over the NCP region. The ensemble spread was smallest in spring and largest in autumn, up to 25 ppbv in most areas of the NCP region. However, in a relative sense, the modeling differences were larger in summer and winter and smaller in spring and autumn. Southeastern China shared a similar magnitude of the ensemble spread (2-5 ppbv) and showed weaker seasonal variability. However, the modeling differences in a relative sense were larger than those in the NCP region, with CV over 1.0 in all seasons except summer. This could be because the simulated concentrations are more influenced by the model processes over areas with low emissions and more constrained by the emissions over areas with high emission rates.
CO was least affected by the model uncertainty among the three gases, which is consistent with its weaker chemical activity and longer lifetime in the atmosphere. The ensemble spread of annual mean CO concentration was about 0.05-0.2 ppmv in eastern China, only about 20 %-30 % of the ensemble mean. Meanwhile, CO modeling differences were more uniformly distributed in eastern China with CV less than 0.3 over most areas of eastern China. However, large modeling differences were visible over Myanmar during spring when there were high CO emissions from biomass burning. Model differences turned out to be larger during winter in the NCP region with ensemble spread and CV about 0.3-0.5 ppmv and 0.3-0.4, respectively.
NO 2 was moderately affected by the model uncertainty among the three gases. The ensemble spread of the annual mean NO 2 concentration was 5-7.5 ppbv in the NCP region and 2.5-5 ppbv in southeastern China, which accounted for about 20 %-30 % of the ensemble mean in the former but more than 70 % in the latter. The ensemble spread was largest in winter, when it was over 10 ppbv in the NCP region (30 %-40 %) and 5-7.5 ppbv in southeastern China (over 70 %). Similar to NH 3 , southeastern China exhibited more modeling differences than the NCP region in a relative sense, with CV higher than 0.7 in most areas of southeastern China.

Summary
In this study, 13 modeling results of surface NO 2 , CO and NH 3 concentrations from MICS-Asia III were compared with each other and evaluated against the observations over the NCP and PRD regions. Three questions have been addressed, related to the performance of current CTMs in simulating the NO 2 , CO, and NH 3 concentrations over the highly industrialized regions of China; potential factors responsible for the model deviations from observations and differences among models; and the impacts of model uncertainty on the simulations of these gases.
Most models showed underestimations of NO 2 concentrations in the NCP and PRD regions, which could be an important potential factor contributing to the overpredicted O 3 concentrations in these regions. According to Xu et al. (2013), such underestimations would also be related to the positive biases in the NO 2 observations. The models performed better for NO 2 in the NCP region than in the PRD region, with smaller biases and RMSE. Most models reproduced the observed temporal and spatial patterns of NO 2 concentrations well in the NCP region, while relatively poor model performance was found in the PRD region in terms of the spatial variations in NO 2 concentrations. A sensitivity test with finer horizontal resolutions was conducted to investigate the potential reasons for the poor model performance in the PRD region. The results show that increasing the model resolution alone cannot improve the model performance in the PRD region, which suggests that the poor model performance there would be related more to the coarse resolution and/or inappropriate spatial allocations of the emission inventories in the PRD region. All models significantly underestimated the CO concentrations in the NCP and PRD regions throughout the year. Such large underestimations by all models are not likely to be fully explained by the model uncertainty, which suggests that CO emissions may be underestimated in current emission inventories. A more accurate estimate of CO emissions is thus needed for the year 2010. Underestimations of CO emissions may have been alleviated in recent years owing to the decreasing trend in Chinese CO emissions (Zhong et al., 2017; Sun et al., 2018; Muller et al., 2018; Zheng et al., 2018, 2019). The inversion results of Zheng et al. (2018) also agree well with the MEIC inventory for CO emissions in China from 2013 to 2015. However, uncertainties still exist in the CO emissions for recent years. According to previous studies, the estimated CO emissions in China range from 134 to 202 Tg yr −1 in the year 2013 (Zhong et al., 2017; Sun et al., 2018; Muller et al., 2018; Zheng et al., 2018, 2019). Zhao et al. (2017) also suggested an uncertainty of −29 % to 40 % for CO emissions from the industrial sector in the year 2012. For NH 3 simulations, in contrast to the good skill in the monthly variations in NO 2 and CO concentrations, all models failed to reproduce the observed monthly variations in NH 3 concentrations in the NCP region, as shown by both the surface and satellite measurements. Most models mismatched the observed peak and showed negative correlation coefficients with observations, which may be closely related to the uncertainty in the monthly variations in NH 3 emissions and also the uncertainty in the gas-aerosol partitioning of NH 3 .

Figure 9. Spatial distribution of the standard deviation of (a) NH 3 , (b) CO and (c) NO 2 multi-model predictions from MICS-Asia III, as well as the corresponding distribution of CV on the annual and seasonal basis.
Several potential factors were found to be responsible for the model deviations and differences, including the emission inventories, chemistry-meteorology coupling effects, the bidirectional exchange of NH 3 and the NH 3 gas-aerosol partitioning, all of which are important aspects for future model improvements. Previous studies also suggest that nitrous acid (HONO) chemistry plays an important role in atmospheric nitrogen chemistry, which influences the simulations of NO 2 and NH 3 (Zhang et al., 2016, 2017). Heterogeneous conversion from NO 2 to HONO (2NO 2 (g) + H 2 O (l) → HONO (l) + HNO 3 (l)) is one of the dominant sources of HONO in the atmosphere and has been considered in most models of MICS-Asia III, including CMAQ since version 4.7, NAQPMS, NHM-Chem and GEOS-Chem. However, some other important sources of HONO may still be underestimated by the models in MICS-Asia III. For example, Fu et al. (2019) suggested that high relative humidity and strong light could enhance the heterogeneous reaction of NO 2 and that the photolysis of total nitrate was also an important source of HONO. These sources have not been included in the models of MICS-Asia III, which would lead to deviations from observations. The intercomparison of the ensemble models quantified the impacts of model uncertainty on the simulations of these gases and shows that these impacts increase with the reactivity of the gases. Models contained more uncertainties in the prediction of NH 3 than in that of the other two gases. Based on these findings, we make the following recommendations for future studies.
1. More accurate estimates of CO and NH 3 emissions are needed in future studies. Both bottom-up and top-down (inversion) methods can help address this problem. The inversion of NH 3 emissions would be more complicated than the inversion of CO emissions due to the larger uncertainties in modeling the atmospheric processes of NH 3 . Nevertheless, it could still provide valuable clues for verifying the bottom-up emission inventories if the models are well validated. In addition, by using ground or satellite measurements, top-down methods could also give valuable information about the spatial and temporal patterns of NH 3 emissions, such as the inversion studies by Paulot et al. (2014) and Zhang et al. (2018). However, more attention should be paid to the validation of the model before the inversion estimation of NH 3 emissions. How to represent the model uncertainties in the current framework of emission inversion is also an important aspect for future studies. The situation is more favorable for CO, considering its small and weakly spatially dependent model uncertainties.
2. For some highly active and/or short-lived primary pollutants, like NH 3 , model uncertainty can also make up a large part of the forecast uncertainty. Emission uncertainty alone may not be sufficient to explain the forecast uncertainty and may cause under-dispersive and overconfident forecasts. Future studies are needed on how to better represent the model uncertainties in the model predictions to obtain better forecast skill. Such model uncertainties also emphasize the need to validate the individual model before using its results to make important policy recommendations.
3. Gas-aerosol partitioning of NH 3 is shown to be an important source of uncertainty in NH 3 simulations. The formation of NH 4 + particles is mainly limited by the availability of H 2 SO 4 and HNO 3 under ammonia-rich conditions, which involves complex chemical reactions, including gas-phase, aqueous-phase and heterogeneous chemistry (Cheng et al., 2016; Wang et al., 2016; Zheng et al., 2015). These processes need to be verified and incorporated into models to better represent the chemistry in the atmosphere.

Author contributions. XT, JZ, ZiW and GRC conducted the design of this study. JSF, XW, SI, KY, TN, HJL, CHK, CYL, LC, MZ, ZT, JL, MK, HL and BG contributed to the modeling data. ZhW performed the simulations of the standard meteorological field. ML and QW provided the emission data. KS provided the CHASER output for boundary conditions. YW, YP and GT provided the observation data. LK and XT performed the analysis and prepared the manuscript with contributions from all authors.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Regional assessment of air pollution and climate change over East and Southeast Asia: results from MICS-Asia Phase III". It is not associated with a conference.