MICS-Asia III: multi-model comparison and evaluation of aerosol over East Asia

Abstract. A total of 14 chemical transport models (CTMs) participated in
the first topic of the Model Inter-Comparison Study for Asia (MICS-Asia)
phase III. These model results are compared with each other and an extensive
set of measurements, aiming to evaluate the current CTMs' ability in
simulating aerosol concentrations, to document the similarities and
differences among model performance, and to reveal the characteristics of
aerosol components in large cities over East Asia. In general, these CTMs
can well reproduce the spatial–temporal distributions of aerosols in East
Asia during the year 2010. The multi-model ensemble mean (MMEM) shows
better performance than most single-model predictions, with correlation
coefficients (between MMEM and measurements) ranging from 0.65 (nitrate,
NO3-) to 0.83 (PM2.5). The
concentrations of black carbon (BC), sulfate
(SO42-), and PM10 are
underestimated by MMEM, with normalized mean biases (NMBs) of −17.0 %,
−19.1 %, and −32.6 %, respectively. Positive biases are simulated
for NO3- (NMB = 4.9 %), ammonium
(NH4+) (NMB = 14.0 %), and PM2.5
(NMB = 4.4 %). In comparison with the statistics calculated from
MICS-Asia phase II, frequent updates of chemical mechanisms in CTMs during
recent years make the intermodel variability of simulated aerosol
concentrations smaller, and better performance can be found in reproducing
the temporal variations of observations. However, a large variation (about a
factor of 2) in the ratios of SNA (sulfate, nitrate, and ammonium) to
PM2.5 is calculated among participant models. A more intense secondary
formation of SO42- is simulated by Community Multi-scale Air Quality (CMAQ)
models, because of the higher SOR (sulfur oxidation ratio) than other
models (0.51 versus 0.39). The NOR (nitric oxidation ratio) calculated by all
CTMs has larger values (∼0.20) than the observations,
indicating that overmuch NO3- is
simulated by current models. NH3-limited condition (the mole ratio of
ammonium to sulfate and nitrate is smaller than 1) can be successfully
reproduced by all participant models, which indicates that a small reduction
in ammonia may improve the air quality. A large coefficient of variation
(CV > 1.0) is calculated for simulated coarse particles,
especially over arid and semi-arid regions, which means that current CTMs
have difficulty producing similar dust emissions by using different dust
schemes. According to the simulation results of MMEM in six large Asian
cities, different air-pollution control plans should be taken due to
their different major air pollutants in different seasons. The MICS-Asia
project gives an opportunity to discuss the similarities and differences of
simulation results among CTMs in East Asian applications. In order to acquire
a better understanding of aerosol properties and their impacts, more
experiments should be designed to reduce the diversities among air quality
models.



Introduction
Urbanization and industrialization have stimulated economic growth and population expansion during the last several decades in East Asia (Spence et al., 2008;Yan et al., 2016; but also brought about noticeable degradation of ecological environment at the same time (Hall, 2002;Han et al., 2014;Yue et al., 2017). Significant increase in atmospheric aerosol loading, especially from anthropogenic emissions, can exert adverse effects on weather (Cowan et al., 2013), climate (H. , air quality (Y. Gao et al., 2016), and human health (Carmichael et al., 2009). For example, aerosols can modify the thermodynamic structure of the atmospheric boundary layer by absorbing and scattering solar radiation Petaja et al., 2016), alter cloud properties and precipitation, by acting as cloud condensation nuclei and ice nuclei (Lohmann and Diehl, 2006;Wang, 2013), deteriorate visibility, and cause haze events (Singh and Dey, 2012;Li et al., 2014). In addition, fine particulate matter with aerodynamic diameters smaller than 2.5 µm (PM 2.5 ) may enter into the alveoli and cause severe cardiovascular diseases, respiratory diseases, and even lung cancer (Pope and Dockery, 2006;. The impacts have attracted considerable attention from the public and policy makers in East Asia, and therefore the research on aerosol has become a hot topic during recent years. In order to better understand the properties of atmospheric aerosols and their impacts, chemical transport models (CTMs) can be a critical tool, and they have been applied to study various air-pollution issues all over the world. For example, a fully coupled online Weather Research and Forecasting model with chemistry (WRF-Chem) was developed by Grell et al. (2005), and it has been widely used to study the aerosol-radiation-cloud feedbacks on meteorology and air quality Qiu et al., 2017); a Community Multi-scale Air Quality (CMAQ) modeling system was designed by the US Environmental Protection Agency (Byun and Ching, 1999), and it has been applied to address acid deposition, visibility and haze pollution issues (Zhang et al., 2006;Han et al., 2014;; a nested air quality prediction model system (NAQPMS) was developed by the Institute of Atmospheric Physics, Chinese Academy of Science (IAP/CAS) (Wang et al., 2001) to reproduce the mechanism of transport and evolution of atmospheric pollutants in Asia ; a global three-dimensional chemical transport model (GEOS-Chem) was first presented by Bey et al. (2001), and researchers use the GEOS-Chem model to study the source sector contribution, long-range transport, and the prediction of future change in ozone and aerosol concentrations (Liao et al., 2006;K. Li et al., 2016b;Zhu et al., 2017).
Although significant advantages can be found in CTMs, how to accurately reproduce or predict the concentrations and the distributions of atmospheric pollutants is still a challenge, with the problems of inaccurate emission inventories, poorly represented initial and boundary conditions, and imperfect physical, dynamical, and chemical parameterizations . Meanwhile, most CTMs are designed to focus on the air quality over developed countries, such as Europe and America, rather than Asia. The assumptions or look-up tables used in CTMs may not be suitable for the simulations of the East Asian environment (Gao et al., 2018). Therefore, before providing meaningful results and answering "what if" questions for policy makers, model performance must be carefully evaluated. Hayami et al. (2008) and Mann et al. (2014) pointed out that different parameterizations used in CTMs can cause large variations in simulation results, and the multi-model ensemble mean (MMEM) tends to show better performance than most single-model predictions when compared with observations (Carmichael et al., 2002;Hayami et al., 2008;Wang et al., 2008;. In order to develop a better common understanding of the performance and uncertainties of CTMs in East Asian applications, and to acquire a more mature comprehension of the properties of atmospheric aerosols and their impacts, a model intercomparison study should be initiated, and Model Inter-Comparison Study for Asia (MICS-Asia) gives an opportunity to investigate these questions. Meanwhile, model intercomparison study in East Asia is very limited (Phadnis et al., 1998;Kiley et al., 2003;Han et al., 2008), and far more efforts are needed in the future.
The MICS-Asia project was initiated in 1998. In the first phase of MICS-Asia (MICS-Asia phase I), the primary target was to study the long-range transport and deposition of SO 2− 4 in East Asia by analyzing the submitted simulation results from eight CTMs. Source-receptor relationships, contributions from removal processes, and the influences of model structures and parameterizations on simulation results were also estimated. More details can be found in Carmichael et al. (2002). As an extension of phase I, MICS-Asia phase II included more chemical species of concern, such as sulfur, nitrogen, and ozone. This broader collaborative study examined four different periods, encompassing two different years and three different seasons (March, July, andDecember in 2001, andMarch in 2002). Simulation results from nine different regional modeling groups were analyzed. Detailed information about this project can be found in the overview paper of Carmichael et al. (2008). In 2010, the MICS-Asia III project was launched. As a part of the Acid Deposition Monitoring Network in East Asia (EANET), additional research activity, and a continuing research of MICS-Asia series, three topics were discussed, including comparison and evaluation of current multi-scale air quality models (topic 1), development of reliable emission inventories for CTMs in Asia (topic 2), and interactions between air quality and climate changes (topic 3). This paper focuses on the first topic of the MICS-Asia phase III and intends to present and summarize the following three objectives, specializing in the topic of aerosols. Firstly, comprehensive evaluations of the strengths and weaknesses of current CTMs for simulating particulate matter (PM) are provided against extensive in situ and satellite measurements, aiming to show the capability of participant models. Secondly, diversities of simulated aerosol concentrations among participant models are analyzed, including possible reasons for the inconsistency. Thirdly, characteristics of aerosol compositions in six metropolitan cities in East Asia are analyzed, which may be helpful to take measures to prevent and control air pollution in the future. The description of model configurations, model inputs, and observations are presented in Sect. 2. The evaluation of model performance and the intercomparison between participant models are shown in Sect. 3. The conclusions and discussions are presented in Sect. 4.

Intercomparison framework
A total of 14 regional models (M1-M14) participated in MICS-Asia phase III topic 1. All models were required to run for all of the year 2010, and provide gridded monthly simulation results of aerosols in the first model layer. These CTMs include the Weather Research and Forecasting model coupled with Community Multi-scale Air Quality (WRF-CMAQ), WRF-Chem, the nested air quality prediction model system (NAQPMS), the non-hydrostatic mesoscale model coupled with chemistry transport model (NHM-Chem), the Goddard Earth Observing System with chemistry (GEOS-Chem), and the Regional Atmospheric Modeling System coupled with CMAQ (RAMS-CMAQ). Among these models, there are three different versions of WRF-CMAQ (v5.0.2 is used by M1 and M2, v5.0.1 is used by M3, and v4.7.1 is used by M4, M5, and M6), four different versions of WRF-Chem (v3.7.1 is used by M7, v3.6.1 is used by M8, v3.6 is used by M9, and v3.5.1 is used by M10), one version of NAQPMS (M11), NHM-Chem (M12), GEOS-Chem (v9.1.3 is used by M13), and RAMS-CMAQ (v4.6 is used by M14). Basic information about the configurations of each model is summarized in Table 1.

Simulation domain
A unified simulation domain was designed by MICS-Asia organizers, which covers the region of (15.4 • S-58.3 • N, 48.5-160.2 • E) with 180 × 170 grid points at 45 km horizontal resolution, but participant models employed different modeling domains ( Fig. 1) with different grid resolutions (e.g., 0.5 • latitude × 0.667 • longitude in M13, 64 km × 64 km in M14, and 45 km × 45 km in the others). In order to minimize the influence from lateral boundary conditions and to cover most areas of interest in East Asia, an analyzed region was chosen in this paper (Fig. 1). For M13 and M14, missing values were used to fill the grids outside their simulation domains. Meanwhile, the analyzed region was divided into five different areas (Region_1 to Region_5). Region_1 contains the Korean Peninsula and Japan. Region_2 only contains China. Region_3 contains Mongolia and parts of Russia. Region_4 covers most countries in southeast Asia. Region_5 contains most countries in South Asia. Therefore, simulation results in each subregion can be analyzed and compared to show the performance of current CTMs.

Gas and aerosol modules
Gas-phase chemistry and aerosol chemistry are important parameterizations in CTMs. Luecken et al. (2008) and Balzarini et al. (2014) pointed out that different settings of chemical mechanisms could influence the simulation results significantly.   Gas-phase chemistry 1. The gas chemistry of SAPRC99 (Statewide Air Pollution Research Center 99) was used in M1, M2, M4, M5, M6, M12, and M14. It is a detailed mechanism for the gas-phase atmospheric reactions of VOCs and NO x in urban and regional atmosphere (Carter, 2000). The SAPRC99 mechanism has already been incorporated into CMAQ v4.6 with about 72 species and 214 reactions. Meanwhile, another three heterogeneous chemistry reactions of N 2 O 5 , HO 2 , and NO 2 are also considered in the SAPRC99 gas-phase chemistry in M12 (Kajino et al., 2018).
2. The Carbon Bond mechanism (CB05) was used in M3. It describes tropospheric oxidant chemistry and provides a basis for computer modeling studies of ozone, particulate matter, visibility, acid deposition, and air toxicity issues, with 51 species and 156 reactions (Yarwood et al., 2005).
3. The second-generation Regional Acid Deposition Model (RADM2) gas-phase chemical mechanism was used in M9 and M10. The inorganic species considered in RADM2 include 14 stable species, four reactive intermediates, and three abundant stable species. The organic chemistry is represented by 26 stable species and 16 peroxy radicals (Stockwell et al., 1990). This module can simulate the concentrations of PAN, HNO 3 , and H 2 O 2 under different environmental conditions (Stockwell et al., 1990).
4. Based on RADM2, the Regional Atmospheric Chemistry Mechanism (RACM) was developed with updated reaction rate constants and product yields according to more recent laboratory measurements. It is capable of simulating the troposphere from the Earth's surface through the upper troposphere and is valid for simulating remote to polluted urban conditions (Stockwell et al., 1997). M7 and M8 selected the RACM module. The rate coefficients were further updated in M7 (Kim et al., 2009). However, heterogeneous hydrolysis of N 2 O 5 is not considered in M7 and M8.
5. The gas chemistry of Carbon Bond mechanism version Z (CBMZ) was used in M11. This lumped-structure mechanism extends the original framework of CBM-IV to function properly at larger spatial and longer timescales, with revised inorganic chemistry, isoprene chemistry, and many other related parameterizations (Zaveri and Peters, 1999).
6. In M13, the NO x -O x -HC-Br tropospheric gas chemistry mechanism was used. It includes about 80 species and 300 chemical reactions (Bey et al., 2001;Zhu et al., 2017). Jimenez et al. (2003), Luecken et al. (2008),  summarized that different gas-phase chemistry mechanisms could predict large variations in reactive species, such as HO 2 and NO 3 , making the production of OH and H 2 O 2 different. In addition to the different number of species and reactions considered in each gas module, the reaction rates of the oxidation of SO 2 , NO x , and some VOCs to condensable SO 2− 4 , NO − 3 , and organic species are also largely different (Pan and . All these would affect the simulated aerosol concentrations, especially under the urban condition.

Aerosol chemistry
1. AERO with ISORROPIA: aerosol modules (AERO5 and AERO6) with thermodynamic equilibrium models (ISORROPIA v1.7 and v2) were used in M1, M2, M3, M4, M5, M6, M11, M12, and M14. Aerosols in AERO were divided into three modes: Aitken, accumulation, and coarse modes. Gas-liquid-solid equilibrium in inorganic aerosol was predicted by the ISOR-ROPIA model. The AERO5 ISORROPIA (v1.7) was mainly used in CMAQ v4, and the updated AERO6 ISORROPIA (v2) has been implemented since CMAQ v5. Overall, nine new PM species (e.g., Ca 2+ , K + , and Mg 2+ ) were added in the new aerosol module of AERO6. In order to support the additional crustal ion emissions introduced in AERO6, ISORROPIA (v1.7) was replaced by ISORROPIA (v2) (Nenes et al, 1998;Fountoukis and Nenes, 2007), and the corresponding modifications could affect the gas-particle partitioning of NO − 3 and NH + 4 . The rate constants for the S (IV) to S (VI) conversion through in-cloud oxidation pathways were also modified, including the catalysis effects through aqueous chemistry from Fe and Mn (Appel et al., 2013). In order to solve the overpredictions of the unspeciated PM 2.5 (also called PM other ) in CMAQ v4, detailed speciation profiles derived from Reff et al. (2009) were adopted in CMAQ v5 to subdivide the emissions of PM other into primary NO + 4 , Na + , Cl − , and other selected trace elements. Comparing with CMAQ v4.6, a new parameterization of heterogeneous N 2 O 5 hydrolysis was included in CMAQ v4.7 to improve the simulation results of NO − 3 . Comparing with CMAQ v5.0.1, a mass balance correction of NO − 3 aerosol under cold conditions was adopted in CMAQ v5.0.2. This adjustment would reduce the concentration of NO 3 and HNO 3 at the surface level.
2. MADE/SORGAM and MADE/VBS: detailed treatments of inorganic aerosol effects in M7, M8, and M9 were simulated by Modal Aerosol Dynamics Model for Europe (MADE). Three log-normal modes (Aitken, accumulation, and coarse modes) were used in this module to present the particle size distribution of submicrometer aerosol, such as SO 2− 4 , NO − 3 , NH + 4 , black carbon (BC), OC, and aerosol water (Ackermann et al., 1998). Aerosols were assumed to be internally mixed in the same mode but externally mixed among different modes (Zhao et al., 2010). The organic chemistry used in M7 and M9 was based on SORGAM (Secondary Organic Aerosol Model). This model was capable of simulating SOA formation including the production of low-volatility products and their subsequent gasparticle partitioning (Schell et al., 2001), but all activity coefficients were assumed to be 1 due to insufficient information. However, when it was coupled with MADE, the biogenic precursors and their resulting particle concentrations were set to be zero. The organic chemistry used in M8 was based on the volatility basis set (VBS) approach (Ahmadov et al., 2012). This module used the volatility basis set framework to simulate primary organic aerosol partitioning between the gas and particulate phases and the gas-phase oxidation of the corresponding vapors (Murphy and Pandis, 2009).

GOCART: the Goddard Chemistry Aerosol Radiation
and Transport (GOCART) model was used in M10 to simulate tropospheric aerosol components, such as SO 2− 4 , dust, BC, OC, and sea-salt aerosols (NO − 3 and NH + 4 are not considered), and all these aerosol species were assumed to be log-normal size distributions (Chin et al., 2000). SO 2− 4 was formed by the oxidation of SO 2 in the atmosphere, but the impacts from in-cloud oxidation pathways were not included (Chin et al., 2002). The source emission of BC and OC was mainly from biomass burning. Dust emission was following Ginoux et al. (2001). Sea-salt emission was highly dependent on wind speed. More details about the simulations of dust and sea-salt aerosols in GOCART will be described in Sect. 2.1.3 and 2.1.4. Different chemical species are considered in numerous aerosol equilibrium models, resulting in different equilibrium partitioning and water uptake during the simulation processes, which can affect the predicted aerosol concentrations (Fountoukis and Nenes, 2007). As Moya et al. (2002) and Wang et al. (2012) classified that the treatment of crustal material in aerosol chemistry could considerably improve model results in predicting the partitioning of NO − 3 and NH + 4 . Different heterogeneous reactions and their activity coefficients used in the thermodynamic equilibrium would also be a major source of uncertainty in simulated aerosol concentrations Kim et al., 2011;D. Chen et al., 2016).

Dust scheme
Natural emissions of windblown dust have been explicitly parameterized since CMAQ v5 (Foroutan et al., 2017), but all the participating WRF-CMAQ models did not turn this option on, which means dust aerosols were not considered in M1-M6. Meanwhile, the dust scheme in M7 and M8 was also turned off.
Dust particles in M10 and M13 were simulated by the GOCART model (Ginoux et al., 2001). This model includes eight size groups of mineral dust ranging from 0.1 to 10 µm. The emission flux for a size group can be expressed as follows: where C is a constant with the value of 1 µ g s 2 m −5 . S means the probability source function, representing the fraction of alluvium available for wind erosion. s p is the fraction of each size group within the soil. u 10 and u t are the wind speed at 10 m and threshold velocity of wind erosion, respectively.
A simplified dust emission parameterization proposed by Shao (2001) was used in M9 (Shao, 2004). Dust emission in Shao (2004) is proportional to streamwise saltation flux, and the proportionality depends on soil texture and soil plastic pressure. The size-resolved dust flux goes into four size bins, with diameters ranging from 1.95 to 20 µm . More details about the dust emission rate and the total dust flux can be found in Shao (2004).
A size-segregated dust deflation module proposed by Wang et al. (2000) was used in M11. It was developed based on three major predictors (friction velocity, surface humidity, and dominant weather system) and has been successfully applied in many dust-related simulations Yue et al., 2010). The dust flux F is calculated as follows: where C is equal to 10 −5 , ρ a indicates air density, and g is gravitational acceleration. E is the weighting factor, representing the uplifting capability of land surface. u * 0 and u * are the fraction and threshold friction velocities, respectively. RH and RH 0 are relative humidity and threshold relative humidity, respectively. According to soil categories and vegetation coverage, the dust emission intensity was further modified by Luo and Wang (2006). Four size bins of dust particles ranging from 0.43 to 10 µm were considered in this emission module. Meanwhile, several heterogeneous reactions on dust particles were also considered (J. . An empirical dust emission mechanism based on the approach of Gillette and Passi (1988) was used in M12 and M14 (Han et al., 2004). Dust flux can be calculated through the following formula: where u and u * are the friction and the threshold friction velocities, respectively. C is the correction coefficient (1.4×10 −15 ). f and R represent the fractional coverage of vegetation and the reduction factor in a model grid. Dust particles with diameters ranging from 0.43 to 42 µm were grouped into 11 bins, with the first eight bins below 11 µm for aerosol sampler, and the additional three bins above 11 µm for larger particles (Han et al., 2004).
Different dust schemes will produce different dust emission fluxes over arid and semi-arid regions (Zhao et al., 2010;Su and Fung, 2015). Several factors, such as potential source regions, threshold friction velocity, size distribution, and other surface and soil-related parameters used in equations, can be the primary causes for the inconsistency, and the differences in simulated dust emissions will affect the characteristics of spatial-temporal variations of atmospheric aerosol particles.

Sea-salt scheme
As one of the major components of primary aerosols, sea-salt aerosols contributes to 20 %-40 % of secondary inorganic aerosols (SIAs) over coastal regions Yang et al., 2016). These particles can provide surface areas for condensation and reaction of nitrogen and sulfur, making the simulated concentrations of SIAs more accurate (Kelly et al., 2010;Im, 2013).
In M12, the method of Clarke et al. (2006) was used to simulate the sea-salt emissions as follows: S 100 = C s ×k×V wind ×h A avg ×L+0.5×w 0 . The sea-salt source function (S 100 ) is defined as the number of sea-salt aerosols generated per unit area of ocean surface completely covered by bubbles (100 % coverage) per unit time. C s is the differences of condensation nuclei concentrations collected at 5 m (impacted by breaking waves) and 20 m (background values). k is the multiplier for tower C s compared to the mean profile. V wind indicates surf zone wind speed. h is the height of plume layer for beach profile. A avg represents mean bubble fractional coverage area between waves. L is the distance wave travels to shore, and w 0 is the initial width of breaking-wave bubble front.
In other participating models (sea-salt emission is not considered in M7 and M8), sea-salt emissions were simulated online by using the algorithm proposed by Gong et al. (2003). The density function dF dr (m −2 s −2 µm −1 ) is calculated as follows: dF dr = 1.373 × u 3.41 10 m × r −A × 1 + 0.057 × r 3.45 × 10 1.607e −B 2 , where u 10 m is the 10 m wind speed, and r is the particle radius at RH of 80 %. A represents an adjustment parameter, which control the shape of submicron size distribution. B = 0.433 − log 10 (r) /0.433, meaning a parameter related to particle radius. In CMAQ model, the sea-salt scheme was updated by Kelly et al. (2010) to enhance the emission of sea salt from the coastal surf zone and to allow dynamic transfer of HNO 3 , H 2 SO 4 , HCl, and NH 3 between coarse particles and gas phase. In GEOS-Chem, it was updated by Jaegle et al. (2011) to improve the simulation of sea salt with dry radii smaller than 0.1 µm.

Model inputs
Based on the experience concluded from phase I and phase II, all 14 models in phase III topic 1, in principle, were required to use the "standard" meteorological fields, emission inventories, and boundary conditions in order to reduce the potential diversities caused by model inputs. But different data were selected by participant models. In this section, some basic information about the model inputs are described.

Meteorological fields
The "standard" hourly meteorological fields were simulated by WRF v3.4.1 with the initial and lateral boundary conditions taken from the National Centers for Environmental Prediction (NCEP) Final Analysis (FNL) data. Fourdimensional data assimilation nudging toward the NCEP FNL data was also adopted to increase the accuracy of simulated meteorological variables. The reference meteorological fields were only used in M1-M6 and M11. For M7, M8, and M9, the standard meteorological simulation was run by the same model (WRF), but feedbacks between meteorological variables and pollutants were also considered in these WRF-Chem models. For M10, the Modern-Era Retrospective analysis for Research and Applications (MERRA) reanalysis was used to drive the WRF (v3.5.1) model. The outputs from the Japan Meteorological Agency (JMA) NHM were used to initialize M12 (Kajino et al., 2012). M13 was driven by assimilated meteorological data from GEOS of NASA's Global Modeling and Assimilation Office (Chen et al., 2009;K. Li et al., 2016b). Although the meteorological initial and lateral boundary conditions were taken from the same NCEP FNL data, three-dimensional meteorological fields used in M14 were simulated by Regional Atmospheric Modeling System (RAMS) (Zhang et al., , 2007Han et al., 2009Han et al., , 2013. Consequently, different meteorological fields used in the 14 participant models will cause different atmospheric circulation characteristics, which can further influence the spatialtemporal variation of air pollutants (Gao et al., 2018).

Emission inventories
All participant models utilized the "standard" emission inventory, including anthropogenic, biogenic, biomass burning, air and ship, and volcano emissions, which was prepared by the emission group in MICS-Asia phase III. The L. Chen et al.: MICS-Asia III: multi-model comparison and evaluation of aerosol over East Asia anthropogenic emission dataset over Asia, named MIX, was developed by harmonizing five regional and national emission inventories with a mosaic approach. These five inventories are REAS2 (REAS inventory version 2.1 for all of Asia; Kurokawa et al., 2013), MEIC (the Multi-resolution Emission Inventory for China developed by Tsinghua University), PKU-NH 3 (a high-resolution NH 3 emission inventory by Peking University; Huang et al., 2012), ANL-India (an Indian emission inventory developed by Argonne National Laboratory; Lu et al., 2011), and CAPSS (the official Korean emission inventory form the Clean Air Policy Support System; Lee et al., 2011). The MIX inventory includes 10 species (SO 2 , NO x , CO, CO 2 , NMVOCs (nonmethane volatile organic compounds), NH 3 (ammonia), BC (black carbon), OC (organic carbon), PM 2.5 , and PM 10 ) in each sector (power, industry, residential, transportation, and agriculture) and is developed for the year 2010 with monthly temporal resolution and 0.25 • spatial resolution. More details can be found in M. . Weekly and diurnal profiles of the anthropogenic emissions provided by the emission group were used in model simulations, including the emission factors for the first seven vertical levels ( Fig. S1 in the Supplement). Biogenic emissions were calculated by the Model of Emissions of Gases and Aerosols from Nature (MEGAN) version 2.04 (Guenther et al., 2006). In MEGAN v2.04, meteorological variables (e.g., solar radiation, air temperature, soil moisture) and land cover information (e.g., leaf area index and plant functional types) were necessary inputs, and these data were obtained from the WRF v3.4.1 simulation results and MODIS (Moderate Resolution Imaging Spectroradiometer) products, respectively. Biomass burning emissions were processed by regridding Global Fire Emissions Database (GFED) version 3 (van der Werf et al., 2010), and the diurnal profile was also provided. The aircraft and shipping emissions were based on the 2010 HTAPv2 (Hemispheric Transport of Air Pollution) emission inventory (0.1 • by 0.1 • ) (Janssens-Maenhout et al., 2015). Daily volcanic SO 2 emissions were collected from the AEROCOM program (https://aerocom.met.no/ DATA/download/emissions/AEROCOM_HC/volc, last access: 11 September 2019, Diehl et al., 2012;Stuefer et al., 2013). The spatial distributions of the merged emissions of SO 2 , NO x , NH 3 , and PM 2.5 from anthropogenic, biogenic, biomass burning, air and ship, and volcano emissions are shown in Fig. S2. Similar spatial patterns can be found among the four species, with high values in eastern China and northern India.

Boundary conditions
Two sets of the chemical initial and boundary conditions (CHASER and GEOS-Chem) were provided by MICS-Asia phase III. The 3-hourly global CTM outputs of CHASER (prepared by Nagoya University; Sudo et al., 2002a, b) were run with 2.8 • × 2.8 • horizontal resolution and 32 vertical lay-ers. The hourly outputs from GEOS-Chem (prepared by University of Tennessee; http://acmg.seas.harvard.edu/geos/, last access: 11 September 2019) was run with 2.5 • × 2 • horizontal resolution and 47 vertical layers. All participant models, except M2, M7, and M10, chose between them. For M2 and M7, the default chemical boundary conditions provided by CMAQ and WRF-Chem were used, respectively. For M10, the global GOCART simulations were used for atmospheric aerosols.

Coupled meteorology and chemistry modeling methods
As is known to all that meteorological fields have significant influences on air quality. Meanwhile, atmospheric compositions can also affect weather and climate. As Gao et al. (2018) pointed out, different coupling methods between aerosols and meteorological variables can cause different simulation results.
In order to simulate the concentrations of air pollutants, meteorological models and chemistry transport models should be implemented either offline or online . Offline modeling implies that the CTM is run after the meteorological simulation is completed, which means the chemical impacts on meteorology are not considered. Online modeling allows coupling and integration of some of the physical and chemical components (Baklanov et al., 2014). According to the extent of online coupling, there are two ways of coupling: (1) online integrated coupling (meteorology and chemistry are simulated simultaneously in the same grid) and (2) online access coupling (meteorology and chemistry are independent, but information can be exchanged between meteorology and chemistry) (Baklanov et al., 2014). Among these participating models, M4, M5, M6, M12, M13, and M14 are offline models. M1, M2, M3, and M11 are online access models. M7, M8, M9, and M10 are online integrated models.
More details about the model configurations can be found in Table 1 and the other MICS-Asia phase III companion papers Li et al., 2019).

Observation data
Monthly observations of SO 2− 4 , NO − 3 , NH + 4 , PM 2.5 , and PM 10 collected from 39 stations of EANET were used to evaluate the simulations. Common quality-assurance and quality-control standards promoted by the ADORC (Acid Deposition and Oxidant Research Center) were adopted among these EANET stations to guarantee a high-quality dataset. More information about the EANET dataset can be found at http://www.eanet.asia/index.html (last access: 11 September 2019). In addition to the EANET data, monthly mean concentrations of air pollutants (e.g., SO 2 , NO 2 , PM 2.5 , and PM 10 ) over the Beijing-Tianjin-Hebei (BTH) region (19 sites) and the Pearl River Delta (PRD) re- gion (13 sites) provided by the China National Environmental Monitoring Center (CNEMC) were also used to compare with the simulation results from participating models.
As is known to all, China has been experiencing heavy air pollution with high concentrations of fine particles. Recent studies highlighted the importance of secondary aerosols in the formation of haze episodes Sun et al., 2016a;Chen et al., 2018). However, observations (e.g., SO 2− 4 , NO − 3 and NH + 4 ) in China were only available at one EANET site (the Hongwen site). In order to make the model evaluation more credible, observed monthly/seasonal/yearly concentrations of BC, SO 2− 4 , NO − 3 , NH + 4 , and PM 2.5 in China were also collected from published literature.
The Aerosol Robotic Network (AERONET), a groundbased remote-sensing aerosol network consisting of worldwide automatic Sun-and sky-scanning spectral radiometers (Holben et al., 1998), provides the aerosol optical depth (AOD) products at 440 and 675 nm, which can be used to calculate the AOD at 550 nm according to the Ångström exponent. The AERONET level 2.0 monthly AOD (cloudscreened and quality-assured) data at 33 sites were utilized in this study. Meanwhile, satellite-retrieved 550 nm AOD products from the Moderate Resolution Imaging Spectroradiometer (MODIS) were also used to compare with simulations. Figures 2 and S3 show the geographical locations of all the observation sites. Most SO 2− 4 , NO − 3 , and NH + 4 monitoring sites are located in China, Japan, and southeast Asia. Three PM 10 sites are located in southeast Asia, while others are in China and Japan. Detailed information about these stations is listed in Tables S1 and S2.
In general, the wide variety of in situ and satellite measurements used in this paper can allow for a rigorous and comprehensive evaluation of model performance.

Model evaluation
According to the objective of MICS-Asia phase III topic 1, comparisons of aerosol concentrations between observations and simulations are presented to evaluate the performance of current multi-scale air quality models in East Asia, including analyzing the similarities and differences between participant models. Simulation results of BC, OC, SO 2− 4 , NO − 3 , NH + 4 , PM 2.5 , PM 10 , and AOD are requested to submit for the project, but no data can be acquired from M10, and extremely large values are predicted by M3. Therefore, only 12 models are actually considered in this paper. Among the 12 models, AOD is missing in M5, M6, and M8, PM 10 is missing in M13, OC is missing in M7, and BC and OC are missing in M9 (Table S3). Figure 3 illustrates the observed and simulated groundlevel annual mean concentrations of BC, SO 2− 4 , NO − 3 , NH + 4 , PM 2.5 , and PM 10 . Multi-model ensemble mean (MMEM), defined as the average of all available participating models (except M3 and M10), is presented to exhibit a composite of model performance. Normalized mean biases (NMBs) between observations and MMEM in each defined subregion (Region_1 to Region_5) and the whole analyzed region (Re-gion_All) are also calculated.

Evaluation for aerosol compositions
Analyzing Fig. 3a, we can find that most models show good skills in simulating the BC concentrations and their spatial distribution characteristics, with relative high values over large emission areas (e.g., north China) (K. Li et al., 2016a). But the NMB for MMEM is −15.8 %. This underestimation may be attributed to the large negative bias at the Gucheng site (site 24) (NMB for MMEM is −38.3 %). This station is located in the industrial province of Hebei, where air pollution is serious and BC emission is large (P. . Due to the low reactivity of BC in the atmosphere, the high uncertainty of BC in current emission inputs  may cause this underestimation. For SO 2− 4 , observations are relative low in Region_1 (mean value is 3.8 µg m −3 ), Region_3 (mean value is 2.5 µg m −3 ), and Region_4 (mean value is 3.5 µg m −3 ), and most models (except M7, M9, and M14) perform well over these areas (NMBs range from −26.3 % to 30.0 %). In Re-gion_2, all the observed concentrations of SO 2− 4 are larger than 10 µg m −3 (mean value is 16.9 µg m −3 ), but models fail to reproduce the high magnitude. As Zheng et al. (2015) and Shao et al. (2019) pointed out, missing sulfate formation mechanisms (e.g., heterogeneous sulfate chemistry) on aerosol in current air quality models may result in this underestimation, especially in China where significant increase of secondary aerosols (such as sulfate) can be observed during polluted periods . A large  Table S1 in the Supplement). Normalized mean biases (NMBs) between observations and MMEM in each defined subregion (shown in black) and the entire analyzed region (shown in red) are also shown. In this figure, the annual mean observations are taken from EANET, CNEMC, and published literature.
For NO − 3 , low concentrations are observed in Re-gion_1 (1.5 µg m −3 ), Region_3 (0.6 µg m −3 ), and Region_4 (1.8 µg m −3 ), but high values are presented in Region_2 (13.4 µg m −3 ), showing the similar spatial distribution characteristics as the observed SO 2− 4 . In CTMs, there are two pathways about the nitrate formation. The dominant pathway is the homogeneous gas-phase reaction between HNO 3 (NO 2 oxidation by OH during the daytime) and NH 3 under ammonia-rich conditions, and the second pathway is the heterogeneous hydrolysis of N 2 O 5 on aerosol surface at night in ammonia-poor environments (Seinfeld and Pandis, 2006;Archer-Nicholls et al., 2014). As NH 4 NO 3 is semi-volatile species, and the equilibrium surface concentration of H 2 SO 4 is set to be zero in CTMs, so (NH 4 ) 2 SO 4 is the preferential species in the completion when H 2 SO 4 and HNO 3 are both present. Only if NH 3 is in excess will NH 4 NO 3 be formed. Analyzing the performance of each participant model, NO − 3 concentration is overpredicted by most models, and the underestimation of SO 2− 4 can be used to explain this overestimation (Chen et al., 2017). Meanwhile, the biases from model-calculated gas-phase oxidation (e.g., NO 2 + OH → HNO 3 ) and/or gas-aerosol phase partitioning (e.g., HNO 3(g) + NH 3(g) ↔ NH 4 NO 3(s, aq) ) may also result in the overestimation Gao et al., 2014).
However, M7 and M8 significantly underestimate the observed NO − 3 concentrations (NMB ∼ −93.4 %). One reason for the extremely low values may result from the incorrect concentrations of NH 3 simulated by M7 and M8 (Fig. S4). As  pointed out, the amount of NH 3 in the atmosphere is a key factor in determining the NO − 3 concentration. Another reason for this underestimation is that M7 and M8 did not consider the impacts of N 2 O 5 heterogeneous reaction (N 2 O 5(g) + H 2 O (aq) → 2HNO 3(aq) ).  pointed out that the hydrolysis of N 2 O 5 can lead up to a 21.0 % enhancement of NO − 3 , especially over polluted regions. Although the NMB calculated in Region_All for MMEM is only −1.1 %, MMEM systematically overpredicts observations in Region_1 (NMB = 45.2 %) and Re-gion_3 (NMB = 38.2 %) but underpredicts them in Region_2 (NMB = −0.7 %) and Region_4 (NMB = −44.9 %).
Simulated NH + 4 concentrations are influenced by the partitioning between gaseous NH 3 and aerosol NH + 4 , and are also associated with the SO 2− 4 and NO − 3 concentrations (Gao et al., 2018). Model predictions (except M7, M8, and M14) can reproduce the measurements relatively well in each defined subregion. But significant overestimation is shown by M14, while significant underestimation is simulated by M7 and M8, especially in Region_2 with NMBs of 72.2 % for M14, −94.9 % for M7, and −81.0 % for M8, respectively. For M14, overestimated SO 2− 4 and NO − 3 make the concentrations of NH + 4 higher, since more ammonium is required to neutralize particle-phase acid. For M7 and M8, extremely low concentrations of NH 3 are simulated, which means less gaseous NH 3 can be converted to aerosol NH + 4 . In general, the calculated NMB in Region_All by MMEM is 4.0 %.
On average, the observed PM 2.5 concentration in Re-gion_2 is larger than 50 µg m −3 , but the mean value in Re-gion_1 is only about 10 µg m −3 . All participating models can generally capture this spatial distribution pattern. However, significant underestimation is simulated at the three remote stations (sites 1, 2, and 7) in Region_1 with the NMB of −39.0 % for MMEM. Similar negative bias can also be found in Ikeda et al. (2013), who compared CMAQ (v4.7.1) simulation results against observations from the same remote monitoring stations (Rishiri and Oki) in 2010. Ikeda et al. (2013) pointed out that the underestimated concentrations of organic aerosols may cause this bias. In Region_2, the NMB for MMEM is −10.0 %.
For PM 10 , the mean observed concentrations in each region are 26.6 µg m −3 (Region_1), 114.4 µg m −3 (Region_2), and 38.1 µg m −3 (Region_4), respectively. But nearly all participant models (except M14) underestimate the PM 10 concentrations. M14 predicts higher concentrations in Region_1, especially at coastal sites, such as site 1 (Rishiri), site 2 (Ochiishi), site 4 (Sadoseki), site 7 (Oki), and site 14 (Cheju). The high-value anomalies in M14 at coastal stations can also be found in Fig. 10, and the positive bias may be caused by the emission and gravitational settling of sea salt. As Monahan and Muircheartaigh (1980) pointed out, sea-salt emissions can be enhanced in the surf zone due to the increased number of wave breaking events, and the degree of the enhancement highly depends on the 10 m wind speed used in the whitecap coverage parameterization. According to the simulation results from published literature, higher wind speed is simulated by M14 (RAMSCMAQ) when compared with observations, especially at coastal stations . Meanwhile, a gravitational settling mechanism of coarse aerosols from upper to lower layers was added in M14, and the net effect of this update could make an increase in the concentrations of coarse particles, especially near coastal areas impacted by sea spray (Nolte et al., 2008). Generally, the NMB for MMEM in Region_All is −31.0 %.
Time series of the monthly observed and simulated aerosol compositions, including BC, SO 2− 4 , NO − 3 , NH + 4 , PM 2.5 , and PM 10 , are shown in Figs. 4 and 5. According to the predefined subregions as illustrated in Fig. 2, all simulations and observations are grouped into the five regions, with the modeling results sampled at the corresponding observation stations before averaging together.
The measured BC concentrations in Region_2 exhibit an obvious seasonal variation, with the minimum (∼ 3.5 µg m −3 ) in spring and summer, and the maximum (∼ 8 µg m −3 ) during late autumn and winter. Participant models can capture this seasonality quite well, and nearly all simulation results are within the standard deviation of the observations, but a large intermodel variation is also simulated, especially in winter when BC concentration is high. Due to its low reactivity in the atmosphere, this variation may be caused by their simulated meteorological conditions, including the impacts of different coupling ways between meteorological and chemical modules (Y. . As Briant et al. (2017) and Huang et al. (2018) concluded, the online integrated models can simulate higher BC concentra-tions than offline models, especially during polluted periods. The correlation coefficient in MMEM is 0.73.
For PM 2.5 , the observed monthly concentrations in Re-gion_2 are higher than those in Region_1. This is because the emissions in China are larger than those in Japan and the Korean Peninsula (Fig. S2). But nearly all models tend to underpredict the concentrations of PM 2.5 in Region_1, with NMBs ranging from −44.3 % (in winter) to −22.7 % (in summer) for MMEM. Comparing with the correlation coefficient (R = 0.40) in Region_1, CTMs can better reproduce the seasonality of the observed PM 2.5 in Region_2, with the R of 0.69 for MMEM. Generally, the R for MMEM in Re-gion_All is 0.83 and the NMB ranges from −2.2 % (in autumn) to 13.9 % (in winter).
Similar temporal-variation characteristics of PM 10 concentrations are observed in Region_1, Region_2, and Re-gion_4, with the maximum occurring in March and November, and the minimum occurring during summer. Most models fall within the standard deviation of the observations. The simulated PM 10 concentrations in Region_2 show less diversity, but nearly all models peak 2 months later. A distinctive seasonality can be found in Region_4, with the highest value (nearly 80 µg m −3 ) observed in March, but most models cannot reproduce this characteristic. This is because GFED substantially underestimates the biomass burning emissions over southeast Asia (Fu et al., 2012), especially during March-April when most intense biomass burning occurred in Myanmar, Thailand, and other southeast Asian countries , and the emission bias is mainly due to the lack of agricultural fires (Nam et al., 2010). Finally, a weak seasonality in PM 10 is simulated by MMEM with R of 0.58 in Region_4. In Region_All, although consistent underestimation is simulated during the whole period, with NMB ranging from −40.8 % to −25.2 % for MMEM, the seasonal cycle can be well reproduced by MMEM with R of 0.78.
The seasonal variation characteristics of observed SO 2− 4 , NO − 3 and NH + 4 in Region_1 are not obvious, with the annual mean of ∼ 4 µg m −3 for SO 2− 4 , 1.5 µg m −3 for NO − 3 , and 1.0 µg m −3 for NH + 4 , respectively. A large intermodel spread of simulated SO 2− 4 is shown in Fig. 5a1, with the maximum variation range in June. Most models significantly overpredict the observed NO − 3 concentrations, especially in summer with the NMB of 164.3 % for MMEM. Simulated monthly NH + 4 concentrations from most models are within the standard deviation of observations, and the R for MMEM is as high as 0.74. In Region_2, the observations are only available at one EANET site (the Hongwen site, located in the eastern coastal area of China), and the seasonality of observed SO 2− 4 , NO − 3 , and NH + 4 from this station is obvious with the maximum in spring and winter, and the minimum in late summer and early autumn. Nearly all models tend to underpredict these concentrations, but the MMEM captures the seasonal cycle relative well with R values of 0.57 for SO 2− 4 , 0.85 for NO − 3 , and 0.86 for NH + 4 , respectively. In Region_3, the observed maximum concentrations of SO 2− 4 and NH + 4 are in winter, but most models cannot reproduce the increasing tendency during the late autumn and the early winter, which means participant models fail to capture the seasonality (R values of 0.20 for SO 2− 4 , 0.34 for NO − 3 , and 0.18 for NH + 4 , respectively). This may be due to the low emission of primary aerosols and their precursors in Region_3. Meanwhile, the Regional Emission Inventory in Asia (REAS v2.1) is used in Region_3, which is calculated based on the emissions from 2000 to 2008 (M. , not extended to the simulation year of 2010. The updated emissions with localized data may increase the accuracy of simulation results. In Region_4, the simulated concentrations of SO 2− 4 , NO − 3 , and NH + 4 are fairly good when compared with the measurements. The R values of MMEM are 0.73 for SO 2− 4 , 0.63 for NO − 3 , and 0.73 for NH + 4 . Meanwhile, the model diversities are small. Generally, in Region_All, MMEM can well reproduce the magnitudes of observed SO 2− 4 , NO − 3 , and NH + 4 during the whole simulation period, as well as the seasonal variation characteristics. As mentioned above, the observed monthly mean concentrations of aerosol compositions in China are only available at one EANET station (site 17, the Hongwen station), with missing values in June and October. In order to make the evaluation more comprehensive, observed seasonal mean concentrations of SO 2− 4 , NO − 3 , and NH + 4 collected from pub-lished literature are also used to compare with simulation results (Fig. S5). M2, M12, and M14 reasonably reproduce the SO 2− 4 concentrations in the four seasons, while others fail to simulate the high observed SO 2− 4 concentrations. The NMBs of SO 2− 4 range from −79.4 % (M7) to 12.8 % (M14). On the contrary, nearly all participant models overestimate the concentrations of NO − 3 (except M4, M7, and M8), with NMBs ranging from 1.7 % (M5) to 50.2 % (M9). The underestimation of SO 2− 4 and the overestimation of NO − 3 may be the general performance in current CTMs (Y. Gao et al., 2014;Huang et al., 2014;Zheng et al., 2015), and some hypotheses should be deeply tested in the future to reduce these deviations, such as (1) missing oxidation mechanisms of SO 2 may lead to low concentrations of SO 2− 4 , which allows for excess NO − 3 in the presence of ammonia, and (2) there is an issue with NO x partitioning and/or missing NO x sink. Meanwhile, Seinfeld and Pandis (2006) pointed out that the chemical production of SO 2− 4 and NO − 3 is mainly from the gas-phase and/or liquid-phase oxidation of SO 2 and NO 2 . Therefore, further comparisons of observed and simulated SO 2 and NO 2 are shown in Figs. S6 and S7. From Fig. S6, participant models can generally reproduce the seasonality of the two gases, with R values of 0.61 for SO 2 and 0.65 for NO 2 , respectively. But overestimations (underestimations) of SO 2 (NO 2 ) are found during most simulation periods, not only in China but also in other defined subre- Figure 5. The same as Fig. 4 but for (a1-a5) SO 2− 4 , (b1-b5) NO − 3 , and (c1-c5) NH + 4 . In this figure, the monthly measurements are taken from EANET. gions (Fig. S7). The overestimated (underestimated) concentrations of SO 2 (NO 2 ) can be used to explain the underestimation (overestimation) of simulated SO 2− 4 (NO − 3 ). However, significant underestimation of NO − 3 is also simulated by M7 and M8. As mentioned above, the extremely low concentrations of NH 3 in M7 and M8 may be the main reason for this negative bias. Analyzing the results from ensemble mean, MMEM shows better performance than participating models, with NMBs of −46.0 % for SO 2− 4 , 1.9 % for NO − 3 , and 13.1 % for NH + 4 , respectively.

Evaluation for aerosol optical depth
Simulated AODs at 550 nm from the nine participant models (M1, M2, M4, M7, M9, M11, M12, M13, and M14) are compared with the measurements from AERONET. From Fig. 6, we can find that most models tend to overpredict AOD values during the whole simulation period in Region_1, Re-gion_2, and Region_3 with NMBs of 74.0 %, 38.8 %, and 107.0 % for MMEM, respectively. In Region_4, an obvious seasonality is observed, with the maximum in spring and the minimum in summer. Models can capture this seasonality well, although underestimation is found in spring.
The R for MMEM is 0.65 and the NMB is −8.7 % in Re-gion_4. Smaller NMB (−4.2 %) is calculated in Region_5 by MMEM, but a quite weak seasonality is shown with underestimated AOD in spring and summer, and overestimated AOD in autumn and winter. Generally, simulated AOD values are within a standard deviation of the observations in Re-gion_All, with a slight overestimation in autumn and winter. The MMEM can reproduce the seasonal cycle with R of 0.68, and the NMB for MMEM is 18.7 %. Figure 7 presents the spatial distributions of the observed and simulated AOD at 550 nm. MODIS AOD is collected from the Terra and Aqua satellites during the year 2010. The observed AODs from AERONET are also shown. In order to quantify the ability of each model to simulate the spatial distribution of aerosol particles, spatial correlation coefficients are also given in the bottom left corner of each panel. Analyzing the observations from MODIS, we can conclude that AOD values are higher in central and eastern China, including the Sichuan province, with the maximum over 1.0. High values can also be observed in the north of India. Due to dust events happening in arid and semi-arid regions, AOD values over the Taklimakan are also large (∼ 0.5). Comparing with MODIS AOD, most models can reproduce the spatial distribution characteristics, with high values in China and India, and low values in other countries. The R values range from 0.78 (M12) to 0.86 (M1, M11 and M13). But most models tend to underestimate the AOD in the eastern coastal regions  of China and the north regions of India (Fig. S8), where anthropogenic emissions are large. Meanwhile, dust particles can be frequently observed. Generally, MMEM captures the AOD spatial variation better with R of 0.87, and the mean bias is −0.08. Table 2 shows the statistics of correlation coefficient (R), normalized mean bias (NMB), and root mean squared error (RMSE) for BC, SO 2− 4 , NO − 3 , NH + 4 , PM 2.5 , PM 10 , and AOD. Simulation results from participant models and MMEM are compared with available observations. Best results are in bold and underlined.

Statistics for aerosol particles and aerosol optical depth
It can be found that participant models are able to capture the variability of BC in China, with R values ranging from 0.65 (M5) to 0.80 (M8), but nearly all models tend to underestimate the BC concentration, except M1 and M2. The maximum negative deviation is simulated by M5 (NMB = −54.9 %), while the maximum positive deviation is from M2 with NMB of 12.7 %. All the RMSEs are less than the observed mean concentration of BC (5.0 µg m −3 ). Comparing to the observed SO 2− 4 , most models fail to reproduce the high values, and the NMB for MMEM is −19.1 %, meaning the underestimation of the simulated SO 2− 4 concentration is a general phenomenon in current CTMs. Implementing more detailed sulfate aerosol formation mechanisms (e.g., heterogeneous reaction and catalytic oxidation) into air quality models may improve the accuracy of simulation results (Huang et al., 2014;Zheng et al., 2015;Fu et al., 2016). But most models can capture the variation of SO 2− 4 with R values ranging from 0.46 (M14) to 0.76 (M13). For NO − 3 , R values vary from 0.29 (M8) to as high as 0.65 (MMEM). M5 shows the largest correlation (0.65) and the smallest NMB (−1.7 %) among models. Although a high value of R (0.64) is calculated by M9, the NMB is the largest (125.7 %). All RM-SEs are larger than the measured NO − 3 (1.7 µg m −3 ), meaning a relative poor performance for current CTMs to simulate the NO − 3 concentrations in East Asia. For NH + 4 , underestimation can be found in M4, M7, and M8, while the others tend to overestimate the NH + 4 concentration. Although all RMSEs are larger than the observed NH + 4 (mean value is 1.1 µg m −3 ), most models can capture the variability, with R values ranging from 0.34 (M8) to 0.75 (M9). Generally, MMEM matches the observations with R of 0.71, NMB of 14.0 %, and RMSE of 1.11 µg m −3 , respectively. Although significant underprediction is found in PM 10 (NMBs range from −55.7 % in M5 to −16.9 % in M9, except M14) and the intermodel spread is large in PM 2.5 (NMBs range from −26.5 % in M13 to 46.0 % in M14), the variations of simulated PM 2.5 and PM 10 are well correlated with measurements (R values > 0.60) and the RMSEs are all smaller than the averaged concentrations (51.4 µg m −3 for PM 2.5 , 80.7 µg m −3 for PM 10 ). For AOD, large positive deviations are simulated

Intercomparison between MICS-Asia phase II and phase III
The main purpose of MICS-Asia phase III topic 1 is to assess the ability of current multi-scale air quality models to reproduce the air-pollutant concentrations in East Asia. In order to reveal the improvements of the simulation ability in current CTMs, statistics (e.g., RMSE and R) for observed and simulated SO 2− 4 , NO − 3 , and NH + 4 from MICS-Asia phase II and phase III are compared in Fig. 8.
The statistics of MICS-Asia phase II are taken from Hayami et al. (2008). The observed monthly mean concentrations are monitored with high completeness at the 14 EANET stations in March, July, andDecember 2001, andMarch 2002, and the model-predicted monthly surface concentrations are from eight regional CTMs. Notably, NO − 3 and NH + 4 used in Hayami et al. (2008) are total NO − 3 (the combination of gaseous HNO 3 and particulate NO − 3 ) and total NH + 4 (the combination of gaseous NH 3 and particulate Figure 8. Intercomparison of model performance between MICS-Asia phase II (blue) and phase III (red) for SO 2− 4 , NO − 3 , and NH + 4 . Detailed information about the observations and simulations used in phase II can be obtained from Hayami et al. (2008). Each box plot exhibits the full range, the interquartile range, and the median for the (a) RMSE and (b) correlation coefficient. Detailed values of the median (the 25th percentile, the 75th percentile) are also listed above each panel.
Analyzing the RMSEs in Fig. 8 Although the medians (except NH + 4 ) are a little larger than that in phase II, the interquartile ranges are quite smaller, indicating similar concentrations can be simulated by current CTMs. Meanwhile, the medians of the correlations of SO 2− 4 , NO − 3 , and NH + 4 in phase III, including the upper and lower quartiles, are all larger than that in phase II, which means current CTMs show better performance in reproducing the spatial-temporal variations of observations.
Although the participating models (8 versus 12 CTMs), observation sites (14 versus 31 EANET stations), and simulation periods (4 months versus 1 year) are different between phase II and phase III, more reasonable statistics are calculated by current CTMs, reflecting better performance in simulating the concentrations of aerosols and their spatialtemporal variations. Figure 9 shows the spatial distributions of simulated PM 2.5 concentrations from each participant model and the MMEM. The coefficient of variation (hereinafter, CV), defined as the standard deviation of the models divided by their mean, is also calculated. The larger the value of CV, the lower the consistency among the participating models Gao et al., 2018). All simulation results can reproduce the high PM 2.5 in the northern India and the eastern China, including the Sichuan province in China. The areas with high PM 2.5 concentrations (> 40 µg m −3 ) are consistent with the regions where CV is low (< 0.3), indicating similar performance of the CTMs in simulating the air pollutants over haze-polluted areas.
SOR and NOR can be used to estimate the degree of secondary formation of SO 2− 4 and NO − 3 (Sun et al., 2006;Zhao et al., 2013). When SOR and NOR are less than 0.1, SO 2− 4 and NO − 3 mainly come from the primary source emissions; otherwise, high oxidation rates of SOR and NOR can result in large fractions of SO 2− 4 and NO − 3 in PM 2.5 (Q. Fu et al., in each defined subregion. The ratio of SNA (sulfate, nitrate, and ammonium) to PM 2.5 , the SOR (sulfur oxidation ratio), the NOR (nitric oxidation ratio), and the PNR (particle neutralization ratio) are also given at the bottom of each panel.
2008). Generally, CMAQ models (M1, M2, M4, M5, M6 and M14) produce 30.7 % higher SOR than others (except M8), which means more intense secondary formation of SO 2− 4 is simulated by CMAQ. Similar NOR is predicted by participant models (∼ 0.24), except M7 and M8. The extremely low value of NOR (∼ 0.02) from M7 and M8 is due to the unreasonably low NO − 3 concentrations. Previous measurements show that the mean value of NOR is about 0.15 (Du et al., 2011;Zhang et al., 2018), which is lower than the predicted one from MMEM (0.20) in this study, indicating more NO − 3 is produced by secondary formation in current CTMs.
PNR is defined as the mole ratio of ammonium to sulfate and nitrate. When PNR is larger than unity, sufficient ammonia can be used to neutralize the acidic sulfate and nitrate; otherwise, there is an incomplete neutralization of acidic species. Analyzing the calculated PNRs from partic- Figure 10. The same as Fig. 9 but for PM coarse (coarse particles, subtract PM 2.5 from PM 10 ).
ipant models, all values are smaller than 1, which means atmospheric conditions are considered to be ammonia deficient. But the mole ratios of nNH + 4 /(2 × nSO 2− 4 ) are all larger than 1 (∼ 1.6, except M7 and M8). All these indicate that acidic sulfate is fully neutralized to form (NH 4 ) 2 SO 4 or NH 4 HSO 4 , and parts of acidic nitrate are changed to NH 4 NO 3 . Meanwhile, under NH 3 -limited conditions, small reductions in ammonia may cause significant reductions in particulate matter (Makar et al., 2009).
However, a large CV (> 1.0) is simulated over arid and semi-arid regions (Fig. 9), such as the Taklimakan Desert and the Gobi Desert, where dust events are often observed, which means current CTMs have difficulty processing dust aerosols, especially in producing a similar amount of dust emissions and in identifying the same potential dust source regions, by using different dust schemes. Large CVs are also shown in simulated coarse particles (subtract PM 2.5 from PM 10 ) in Fig. 10. High concentrations of coarse particles simulated by M9 over arid and semi-arid regions may be caused by the inaccurate physicochemical parameters (e.g., plastic pressure of the soil surface) used in the Shao dust scheme . Large values (> 20 µg m −3 ) over coastal regions from M14 may result from the inadequate simulation results of sea-salt aerosols.
From Table 3 we can conclude that the low consistency (or the large CV) of simulated coarse particles in each defined subregion is mainly caused by the dust particles. Without the impacts of dust aerosols and sea salts (only simulation results from M7 and M8 are considered), the calculated CVs for Region_1 to Region_5 are 0.29, 0.30, 0.33, 0.19, and 0.10, respectively. Without the impacts of dust aerosols (only simulation results from M1, M2, M4, M5, and M6 are considered), similar spatial distributions are found in Fig. 10, and the CVs averaged over each subregion are 0.37 (Re-gion_1), 0.65 (Region_2), 0.48 (Region_3), 0.59 (Region_4), and 0.65 (Region_5), respectively. But when the influences of dust aerosols and sea salts are both considered (simulation results from M9, M11, M12, and M14 are used), larger CVs are obtained with values of 0.97 for Region_1, 1.04 for Re-gion_2, 1.27 for Region_3, 0.95 for Region_4, and 0.88 for Region_5.
Aerosol chemical compositions simulated by each participant model and the MMEM in the six metropolitan cities (Beijing, Shanghai, Guangzhou, Delhi, Seoul, and Tokyo) are shown in Fig. 11. PM 2.5 is composed of SNA (SO 2− 4 + NO − 3 + NH + 4 ) and OTHER1 (BC + OC + OTHER2). PM 10 includes PM 2.5 and PM coarse (coarse particles). Notably, PM coarse cannot be calculated by M13 because PM 10 is missing in M13.
High values of PM 2.5 and PM 10 in Beijing, Shanghai, Guangzhou, and Delhi are simulated by nearly all models, and the annual mean concentrations of PM 2.5 and PM 10 from MMEM are all larger than the IT-1 (interim target 1, 35 µg m −3 for PM 2.5 and 70 µg m −3 for PM 10 ) proposed by WHO. But relatively small concentrations are presented in Tokyo (15.5 and 21.3 µg m −3 for PM 2.5 and PM 10 , respectively) and Seoul (21.7 and 27.6 µg m −3 for PM 2.5 and PM 10 , respectively). For each city, a large spread of concentrations of aerosol compositions can be found among participant models (a factor of ∼ 10 for SNA, a factor of ∼ 2 for PM 2.5 and PM 10 ). This is partly caused by the differences in gas-aerosol partitioning and dust emissions, including the removal processes (e.g., dry and wet depositions).
Analyzing the ratios of aerosol compositions to PM 2.5 in MMEM (Fig. 11b1-b6), the sums of the contributions of BC, OC, SO 2− 4 , NO − 3 , and NH + 4 in Beijing (63.8 %), Shanghai (60.4 %), Guangzhou (63.1 %), and Delhi (65.1 %) are all less than those in Tokyo (87.2 %) and Seoul (75.2 %). Among these components, NO − 3 is the major species in Beijing (20.7 %) and Delhi (23.6 %), while SO 2− 4 is the major species in Guangzhou (22.2 %). Similar contributions of SO 2− 4 and NO − 3 can be found in Shanghai, Seoul, and Tokyo. All these suggest that different air-pollution control plans should be taken in different metropolitan cities.

Conclusion and discussion
This paper mainly focuses on the first topic of the MICS-Asia phase III, and intends to analyze the following objectives: (1) provide a comprehensive evaluation of current air quality models against observations, (2) analyze the diversity of simulated aerosols among participant models, and (3) re-veal the characteristics of aerosol components in large cities over East Asia. Comparisons against monthly observations from EANET and CNEMC demonstrate that all participant models can well reproduce the spatial-temporal distributions of aerosols. The MMEM shows better performance than most singlemodel predictions, with correlation coefficients (R values, between MMEM and measurements) ranging from 0.65 (nitrate, NO − 3 ) to 0.83 (PM 2.5 ). Differences between predictions and observations are also simulated; for instance, sulfate (SO 2− 4 ) is underestimated by participant models (except M12 and M14), with NMBs ranging from −67.7 % (M7) to −1.6 % (M8). The concentrations of nitrate (NO − 3 ) and ammonium (NH + 4 ) are overestimated by most models, with NMBs of 4.9 % for NO − 3 and 14.0 % for NH + 4 in MMEM. The absence of sulfate formation mechanisms (e.g., heterogeneous chemistry) in CTMs can be used to explain the underestimation of SO 2− 4 , and the underestimated SO 2− 4 will result in the overestimation of NO − 3 . However, significant underestimations of NO − 3 and NH + 4 are shown in M7 and M8. This is because extremely low values of NH 3 are simulated by these models. The intermodel spread of simulated PM 2.5 is large, with NMBs ranging from −26.5 % (M13) to 46.0 % (M14), and nearly all models underestimate the PM 2.5 concentrations in Region_1. The underestimation may be the insufficient precursors and formation pathways of organic aerosols in current CTMs. Underestimations of PM 10 are also simulated in each subregion, and the NMB is −32.6 % in MMEM. This may due to the inaccurate emission inventories (e.g., anthropogenic emissions, biomass burning emissions, and natural emissions) considered in CTMs.
In order to reveal the improvements of the simulation ability in current CTMs, statistics for observed and simulated SO 2− 4 , NO − 3 , and NH + 4 from MICS-Asia phase II and phase III are compared. Results obviously show that the spread of RMSEs for each species in phase III is smaller, meaning similar concentrations can be simulated by current CTMs. Meanwhile, the medians of the correlations, including the upper and lower quartiles, are larger, which means current CTMs show better performance in reproducing the temporal variations of observations.
Analyzing the ratio of SNA to PM 2.5 , large variations are simulated by participant models, with values ranging from 31.1 % (M7) to 75.1 % (M5). Different gas-phase and aerosol schemes used in CTMs can explain this inconsistency. Higher SOR (sulfur oxidation ratio) is calculated by CMAQ models, indicating that CMAQ has a more intense secondary formation of SO 2− 4 than other participant models. Similar NOR (nitric oxidation ratio) is predicted by CTMs, but the value (∼ 0.20) is larger than the observed one (∼ 0.15), which means overmuch NO − 3 is simulated by current CTMs. According to the mole ratio of ammonium to sulfate and nitrate, NH 3 -limited condition can be successfully simulated by all participant models, which indicates that a small reduction in ammonia may improve the air quality significantly.
The coefficient of variation (CV) can be used to quantify the intermodel deviation, and a large CV is shown in simulated coarse particles (subtract PM 2.5 from PM 10 ). The poor consistency, especially over the arid and semi-arid regions, is mainly caused by the dust aerosols, which means current CTMs have difficulty reproducing similar dust emissions by using different dust schemes. But the simulated fine particles are in good agreement, especially over the haze-polluted areas.
According to the MMEM simulation results, the highest PM 2.5 concentrations in Beijing, Shanghai, Guangzhou, and Delhi are shown in winter, mainly due to the high emissions and unfavorable weather conditions. But the highest value in Tokyo appears in summer. PM 2.5 concentrations are comparable in the four seasons in Seoul. Analyzing the ratios of each composition to PM 2.5 , NO − 3 is the major component in Beijing and Delhi, SO 2− 4 is the major one in Guangzhou, and similar contributions of SO 2− 4 and NO − 3 are calculated in Shanghai, Seoul, and Tokyo. All these suggest that different air-pollution control plans should be taken in different cities.
The MICS-Asia project gives an opportunity to understand the performance of CTMs in East Asian applications, including the similarities and differences among air quality models. In order to quantify the impacts of different model inputs and model configurations, and to reduce the diversities among simulation results, more detailed sensitivity experiments should be discussed. For example, simulation results from M1 and M2 can be used to assess the impacts of boundary conditions, since the configurations in these two models are similar except the boundary conditions. M1 adopts the downscale results from GEOS-Chem, while M2 uses the default values from CMAQ. From Fig. S9, we can find that positive biases are simulated ((M1 − M2)/M2 · 100 % > 0), especially around the edges of the simulation domain, and the maximum deviation can be over 100 %. This is because the boundary conditions from GEOS-Chem consider the impacts of aerosols outside the domain. All these demonstrate that the impacts of boundary conditions should not be neglected when analyzing the spatial distribution characteristic of simulated aerosols around the edge of the domain. But in most inland regions, differences between M1 and M2 are smaller (< ±10 %). Meanwhile, process analysis techniques (i.e., integrated process rate (IPR) analysis) should be developed and implemented in air quality models. This is because IPR can be used to calculate the contributions of each physical/chemical process to variations in aerosol concentrations ; then it will be easier to draw conclusions about the fundamental problems that cause the differences between model predictions . Fully understanding of the source-receptor relationship in each process for a given aerosol species can also be helpful to revise parameterization schemes for better simulation capability. What is more, extensive observations should be collected and used in the next MICS-Asia project.
Author contributions. LC, YG, and MZ conducted the study design. LC, JZ, HL, JL, KH, BG, XW, YFL, CL, SI, TN, MK, and KY contributed to modeling data. JSF, ZW, and JK provided the emission data and observation data. YG and JZ helped with data processing. MZ, JSF, and JZ were involved in the scientific interpretation and discussion. LC prepared the manuscript with contributions from all co-authors.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Regional assessment of air pollution and climate change over East and Southeast Asia: results from MICS-Asia Phase III". It is not associated with a conference.