Why do models perform differently on particulate matter over East Asia? A multi-model intercomparison study for MICS-Asia III

This study compares the performance of 12 regional chemical transport models (CTMs) from the third phase of the Model Inter-Comparison Study for Asia (MICS-Asia III) in simulating particulate matter (PM) over East Asia (EA) in 2010. The participating models include the Weather Research and Forecasting model coupled with the Community Multiscale Air Quality model (WRF-CMAQ; v4.7.1 and v5.0.2), the Regional Atmospheric Modeling System coupled with CMAQ (RAMS-CMAQ; v4.7.1 and v5.0.2), the Weather Research and Forecasting model coupled with chemistry (WRF-Chem; v3.6.1 and v3.7.1), the Goddard Earth Observing System model coupled with chemistry (GEOS-Chem), a non-hydrostatic model coupled with chemistry (NHM-Chem), the Nested Air Quality Prediction Modeling System (NAQPMS) and the NASA-Unified WRF (NU-WRF). This study investigates three model processes as possible reasons for the differences in model performance on PM. (1) Models perform very differently in the gas–particle conversion of sulfur (S) and oxidized nitrogen (N). The inter-model differences in the sulfur oxidation ratio (50 %) are of the same magnitude as those in SO₄²⁻ concentrations, so gas–particle conversion is one of the main reasons for the different model performances on fine-mode PM. (2) Models without dust emission modules can perform well on PM10 at non-dust-affected sites but substantially underestimate (by up to 50 %) the PM10 concentrations at dust sites. Implementing dust emission modules has substantially improved model accuracy at dust sites (reducing the model bias to −20 %). However, neither the magnitude nor the distribution of dust pollution is fully captured. (3) The modeled deposition amounts vary among models by 75 %, 39 %, 21 % and 38 % for S wet, S dry, N wet and N dry deposition, respectively. Large inter-model differences are found in the washout ratios of wet deposition.


Introduction
Atmospheric pollution by particulate matter (PM) has attracted worldwide attention because of its links to environmental and public health issues (Fuzzi et al., 2015; Nel, 2005). Fine particles (PM2.5) are associated with cardiovascular and respiratory diseases, cancers and premature death (Hoek and Raaschou-Nielsen, 2014; Knol et al., 2009). Outdoor PM2.5 pollution is estimated to cause 2.1-5.2 million premature deaths worldwide annually (Lelieveld et al., 2015; Rao et al., 2012; Silva et al., 2013). It accounted for 8 % of global mortality in 2015 and ranked fifth among global mortality risk factors (Cohen et al., 2017). East Asia (EA) has suffered from severe PM pollution due to anthropogenic emissions and natural dust emissions (Akimoto, 2003). China and India are the two countries most affected by outdoor air pollution; together they accounted for 20 % of global mortality caused by PM2.5 exposure in 2010 (Lelieveld et al., 2015). The mixing of dust with anthropogenic pollutants can further amplify the effects of pollution (Li et al., 2012). However, assessments of the impact of PM pollution remain highly uncertain because the toxicity of individual PM components is unclear (Lippmann, 2014) and PM concentrations are difficult to measure and predict.
To better understand PM pollution, modeling has been combined with measurements to study the spatial distributions of PM. The multi-model ensemble approach, which interprets modeling results by combining information from several models, has been shown to increase the reliability of model predictions (Tebaldi and Knutti, 2007). This method has been widely used in air quality studies in Europe (Bessagnet et al., 2016; Vivanco et al., 2017) and at global scales (Galmarini et al., 2017). The Model Inter-Comparison Study for Asia (MICS-Asia) aims at understanding air quality issues in EA. The first phase (MICS-Asia I) was carried out in the 1990s with eight regional chemical transport models (CTMs) and focused on air pollution related to sulfur (S, including SO₂, SO₄²⁻ and wet SO₄²⁻ deposition). The second phase (MICS-Asia II) was launched in the early 2000s with nine CTMs and covered the chemistry and transport of S, nitrogen (N), PM and acid deposition. Multi-model results for SO₄²⁻, NO₃⁻ and NH₄⁺ (SNA) were evaluated with measurements from 14 sites of the Acid Deposition Monitoring Network in East Asia (EANET) and the Fukue site in Japan. However, a limited evaluation of PM10 concentrations in China, based on scarce datasets, left the models' ability unclear for China, which is recognized as one of the most heavily polluted regions in EA. Meanwhile, the models showed large inconsistencies in simulating both the gas and aerosol phases of S and N. Further efforts are needed to investigate the reasons for these model differences and to improve model accuracy.
This study compares the performance of 12 regional models that participated in the third phase of MICS-Asia (MICS-Asia III) in simulating PM over EA, with the aim of identifying the reasons for their different performances. The models include the Weather Research and Forecasting model (WRF) coupled with the Community Multiscale Air Quality (CMAQ) model (v4.7.1 and v5.0.2), the Regional Atmospheric Modeling System coupled with CMAQ (RAMS-CMAQ), the WRF model coupled with chemistry (WRF-Chem; v3.6.1 and v3.7.1), the Goddard Earth Observing System model coupled with chemistry (GEOS-Chem), a non-hydrostatic model coupled with chemistry (NHM-Chem), the Nested Air Quality Prediction Modeling System (NAQPMS) and the NASA-Unified WRF (NU-WRF). The models' performance in simulating PM has been reported in a companion paper (Chen et al., 2019), and its main findings are summarized in Sect. 3.1. Sections 3.2-3.4 examine how three model processes influence model performance: (1) the formation of fine particles (PMF), i.e., model differences in gas-particle conversion; (2) the formation of coarse particles (PMC), i.e., the improvements gained by implementing dust emission modules and the problems that remain; and (3) the removal of particles from the atmosphere, i.e., uncertainties in the efficiencies of wet and dry deposition. Section 4 summarizes the findings and provides suggestions for further study.

Framework of MICS-Asia
MICS-Asia is a model intercomparison study with contributions from international modeling groups to simulate air quality and deposition over EA. MICS-Asia I focused on air quality issues related to S. The multi-model performance in simulating SO₂ and SO₄²⁻ concentrations and SO₄²⁻ wet deposition was evaluated with observations from 18 stations (Carmichael et al., 2002). A source-receptor relationship for S deposition was developed from sensitivity simulations for seven prescribed receptor regions: Komae, Oki, Fukue, Yangyang, Beijing, Nanjing and Taichung (Carmichael et al., 2002).
MICS-Asia II was initiated in 2003. Nine regional models simulated air quality for 4 months (March, July and December of 2001 and March of 2002) to study the chemistry and transport of air pollutants and acid deposition. All modeling groups were required to use the same emission inventory, namely the Transport and Chemical Evolution over the Pacific (TRACE-P) emissions for 2000, together with common initial conditions (ICs) and boundary conditions (BCs), to facilitate a comparison of the models' physical and chemical mechanisms. The modeled species were expanded to S, N, O₃, PM and acid deposition. Model evaluations and major findings can be found in the literature (Fu et al., 2008; Hayami et al., 2008).
MICS-Asia III was launched in 2010, and its simulations cover the whole year of 2010. All modeling groups are required to use the prescribed anthropogenic emissions and natural inputs, including biogenic emissions, biomass burning emissions and volcanic SO₂ emissions; dust and sea-salt emissions are produced by each model's own modules (Li et al., 2017). The project has three topics. Topic I aims at evaluating the strengths and weaknesses of current multi-scale air quality models in simulating air quality over EA and at providing suggestions to reduce uncertainty in future simulations. Topic II intends to develop a reliable anthropogenic emission inventory for EA. Topic III investigates aerosol-weather-climate interactions using online coupled air quality models. This study focuses on Topic I.

Model configurations
The model setup can be found in Table 1 of Chen et al. (2019). Fourteen modeling groups (M1-M14) participated, but M3 and M9 are not included in this study because their model submissions were incomplete. The M14 model has a smaller simulation domain than the others; therefore, it is not included in the multi-model mean (MMM). The gas and aerosol modules and dust schemes employed by the participating models are described in detail in Sect. 2.1 of Chen et al. (2019). The setups for wet and dry deposition are described below.
Wet deposition removes gases and aerosols from the atmosphere through precipitation, involving both in-cloud scavenging (rainout) and below-cloud scavenging (washout). Gases dissolve in raindrops and are thereby removed from the atmosphere; for nonreactive gases, the removal rate depends on their solubility, as described by Henry's law. Particles act as cloud condensation nuclei in the presence of supersaturated water vapor and then grow into cloud droplets. In this study, only M2, M4, M6, M11 and M12 submitted the main components of S and N deposition, and all of these models use the same wet deposition scheme based on Henry's law. The efficiency of wet deposition is assessed by the so-called washout ratio, calculated as the ratio of the particle concentration in deposition to the particle concentration in surface air (Eq. 1):

$$\lambda_{\mathrm{wet}} = \frac{C_{\mathrm{depo}}}{C_{\mathrm{surface\_air}}} \times 100\,\% \qquad (1)$$
where λ_wet (%) is the washout ratio for wet deposition, C_depo (µg m⁻³) is the concentration of particles in deposition and C_surface_air (µg m⁻³) is the concentration of particles in the near-surface atmosphere. Dry deposition is mainly driven by turbulent and molecular diffusion. All models except M12 use the same dry deposition scheme from Wesely (1989), in which the dry deposition flux is proportional to the pollutant concentration at a reference height, with the deposition velocity as the proportionality factor (Eq. 2):

$$F_c = -V_d \, C_a \qquad (2)$$
where F_c (mg m⁻² yr⁻¹) is the dry deposition flux, V_d (cm s⁻¹) is the deposition velocity and C_a (µg m⁻³) is the concentration of the species at the reference height. The negative sign indicates that the flux is directed downward, toward the surface. V_d is determined by the total resistance (r), which is composed of three terms (Eq. 3): the aerodynamic resistance (r_a), the boundary layer resistance (r_bc) and the canopy resistance (r_surf):

$$V_d = \frac{1}{r} = \frac{1}{r_a + r_{bc} + r_{surf}} \qquad (3)$$

M12 uses the general approach of Wesely (1989) with updates from Zhang et al. (2003), which revise the non-stomatal resistance (r_ns), the component of r_surf related to soil and cuticle uptake. A model evaluation showed that these updates improve the prediction of SO₂ dry deposition velocities (Zhang et al., 2003).
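Equations (1)-(3) can be combined into a small calculator. The sketch below is only an illustration, assuming the resistance-in-series form of Eq. (3) with resistances in s m⁻¹ and a 365-day year for converting the flux to mg m⁻² yr⁻¹; the function names are hypothetical and are not taken from any participating model. It also omits gravitational settling, which particle dry deposition schemes generally include.

```python
"""Minimal sketch of Eqs. (1)-(3); all function names are hypothetical."""

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def washout_ratio(c_depo: float, c_surface_air: float) -> float:
    """Eq. (1): lambda_wet (%) from concentrations in deposition and surface air (ug m-3)."""
    if c_surface_air <= 0:
        raise ValueError("surface-air concentration must be positive")
    return 100.0 * c_depo / c_surface_air


def deposition_velocity(r_a: float, r_bc: float, r_surf: float) -> float:
    """Eq. (3): V_d (cm s-1) from aerodynamic, boundary-layer and canopy resistances (s m-1)."""
    r_total = r_a + r_bc + r_surf          # resistances in series add up
    if r_total <= 0:
        raise ValueError("total resistance must be positive")
    return 100.0 / r_total                 # 1/r is in m s-1; x100 converts to cm s-1


def dry_deposition_flux(v_d: float, c_a: float) -> float:
    """Eq. (2): F_c (mg m-2 yr-1) from V_d (cm s-1) and C_a (ug m-3); negative = downward."""
    flux_ug_m2_s = -(v_d / 100.0) * c_a    # cm s-1 -> m s-1, giving ug m-2 s-1
    return flux_ug_m2_s * SECONDS_PER_YEAR / 1000.0  # ug -> mg and s-1 -> yr-1


if __name__ == "__main__":
    v_d = deposition_velocity(r_a=50.0, r_bc=30.0, r_surf=120.0)  # hypothetical resistances
    print(f"V_d = {v_d:.2f} cm s-1")
    print(f"F_c = {dry_deposition_flux(v_d, c_a=10.0):.1f} mg m-2 yr-1")
    print(f"lambda_wet = {washout_ratio(2.0, 4.0):.0f} %")
```

Because the resistances add in series, the largest one controls V_d; this is why an update to a single component such as r_ns can change the predicted deposition velocity noticeably.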

Observational data
To make the discussion clear, we define the regions used in the following analysis: northern EA (Russia and Mongolia), central EA (China), eastern EA (Japan and the Korean Peninsula) and southern EA (Cambodia, Lao PDR, Myanmar, Thailand, Vietnam, Indonesia, Malaysia and the Philippines). The following monitoring datasets are used in the analysis in Sects. 3.2-3.4. The air pollution index (API) provides monthly average PM10 data from 86 sites (A1-A86 in Fig. 1) (http://www.mee.gov.cn/, last access: 17 June 2020). This dataset has been widely used to study PM pollution in China (Qu et al., 2010; Chen et al., 2008; Deng et al., 2011) and to evaluate models (Xing et al., 2015). It was replaced by the air quality index (AQI) in 2013. The API sites densely cover eastern China, but sites in western China are very limited. EANET (E1-E54) provides monthly average concentrations of PM10 and SNA, as well as S and N deposition, from

Brief results of model performance evaluation
All models submitted monthly average concentrations of PM10, PM2.5 and SNA at the surface layer, except PM10 from M13 and NO₃⁻ and NH₄⁺ from M10. An evaluation of the models' performance on aerosols can be found in our companion paper (Chen et al., 2019); its main findings are as follows. The normalized mean biases (NMBs) of the MMM against observational and satellite data were −32.6 %, 4.4 %, −19.1 %, 4.9 %, 14.0 % and 18.7 % for the surface concentrations of PM10, PM2.5, SO₄²⁻, NO₃⁻ and NH₄⁺ and the column-integrated aerosol optical depth (AOD), respectively. PM10 concentrations were generally underestimated over the simulation domain. PM2.5 concentrations were also underestimated over eastern EA but were simulated well in central EA. The models failed to reproduce the high peaks of SO₄²⁻ concentrations in central EA, probably because of missing SO₄²⁻ formation mechanisms such as heterogeneous SO₄²⁻ chemistry, which has been reported as an important formation pathway of SO₄²⁻ in China. NO₃⁻ concentrations were overpredicted by most models over the simulation domain, which was associated with the underestimation of SO₄²⁻. The M7 and M8 models produced significantly lower NO₃⁻ concentrations than observations and other models because they underestimated NH₃ concentrations, possibly owing to low NH₃ emissions, and lacked N₂O₅ heterogeneous reactions, an important formation pathway of NO₃⁻ (Chen et al., 2019). The spatial distributions of AOD were generally simulated well, but several models underestimated AOD around the Himalayas, the Taklamakan Desert and the Gobi Desert.
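For reference, the two bias statistics used in this evaluation can be written out as follows. This is a sketch assuming the common definitions MB = mean(M − O) and NMB = 100 % × Σ(M − O) / ΣO over paired model and observation values; the exact formulas are not restated in this section, and the function names and example values are hypothetical.

```python
from typing import Sequence


def _check(model: Sequence[float], obs: Sequence[float]) -> None:
    """Require non-empty, equally long model/observation pairs."""
    if len(obs) == 0 or len(model) != len(obs):
        raise ValueError("model and obs must be non-empty and equally long")


def mean_bias(model: Sequence[float], obs: Sequence[float]) -> float:
    """MB: average of (model - observation), in the data's units (e.g. ug m-3)."""
    _check(model, obs)
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def normalized_mean_bias(model: Sequence[float], obs: Sequence[float]) -> float:
    """NMB (%): summed bias normalized by the summed observations."""
    _check(model, obs)
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations must be non-zero")
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / total_obs


if __name__ == "__main__":
    obs = [100.0, 90.0, 60.0]    # hypothetical annual-mean PM10 at three sites, ug m-3
    model = [80.0, 60.0, 40.0]   # hypothetical model values at the same sites
    print(f"MB  = {mean_bias(model, obs):.1f} ug m-3")
    print(f"NMB = {normalized_mean_bias(model, obs):.1f} %")
```

Because NMB weights each site by its observed value, a few high-concentration sites can dominate it, whereas MB keeps physical units, which is why MB is used for the comparison with HTAP-II below.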
We also compare the models' performance with that of a global-scale model study. The Task Force on Hemispheric Transport of Air Pollution (TF HTAP) conducts an intercomparison of global and regional models to assess the impact of hemispheric transport of air pollutants on regional air quality. Its second phase (HTAP-II) involved more than 20 global models simulating air quality in 2010 (Galmarini et al., 2017), most of which used coarse-resolution grids of about 2-3°. HTAP-II and MICS-Asia III share several common features, such as the same emission inventory for East Asia (Li et al., 2017) and the same observational datasets for evaluating PM10 (more than 100 EANET and API sites) and PM2.5 (two EANET sites) (Dong et al., 2018). For the sites used by both studies, the mean bias (MB) of PM10 over EA is −30.7 µg m⁻³ for HTAP-II and −18.6 µg m⁻³ for this study, and the MB of PM2.5 is −1.6 and −4.3 µg m⁻³, respectively. Both studies find an underestimation of PM10 concentrations, while PM2.5 concentrations are reproduced well. The MICS-Asia III models perform slightly better than the HTAP-II models, with a lower PM10 bias, probably owing to their finer grid resolutions.
The so-called "diagnostic evaluation" approach is adopted to attribute model bias to individual processes (Dennis et al., 2010). Although all modeling groups were required to use the prescribed emission inventory, mismatches in how different groups handled the temporal and vertical allocation of the emission files caused differences in the model inputs (Itahashi et al., 2020). To avoid possible impacts on the inter-model comparison, we compare indicators (e.g., the sulfur oxidation ratio, SOR) instead of direct model outputs (e.g., SO₄²⁻ concentrations), so as to focus on the differences caused by model mechanisms. The following three processes are examined.
1. Formation of PMF. Section 3.2 investigates the differences in the gas-particle conversion of S and N among different models.
2. Formation of PMC. Section 3.3 assesses the models' ability to reproduce the spatial and temporal distributions of PM in regions affected by dust storms. A comparison is conducted between models with and without dust emission modules.
3. Removal of particles from the atmosphere. Section 3.4 compares the models' performance in simulating the amounts of deposition and the efficiencies of wet and dry depositions.
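
Gas-particle conversion
The following two indicators are calculated to illustrate the gas-particle conversion of S and N:

$$\mathrm{SOR} = \frac{n\text{-}\mathrm{SO_4^{2-}}}{n\text{-}\mathrm{SO_4^{2-}} + n\text{-}\mathrm{SO_2}}$$

$$C(\mathrm{NO_2}) = \frac{n\text{-}\mathrm{NO_3^{-}}}{n\text{-}\mathrm{NO_3^{-}} + n\text{-}\mathrm{NO_2}}$$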
where n-SO₄²⁻, n-SO₂, n-NO₃⁻ and n-NO₂ (mol m⁻³) are the molar concentrations of SO₄²⁻ particles, SO₂ gas, NO₃⁻ particles and NO₂ gas, respectively. Because of the limited observational data, the C(NO₂) (%) indicator includes only NO₃⁻ and NO₂ in the denominator, but it still portrays the conversion of N between the gas and particle phases. Figures 2 and 3 show the distributions of the modeled SOR and C(NO₂) values. The SOR values are lowest around the HBT region in northeastern China (10 %-40 %) and highest in southwestern China (60 %-80 %) (Fig. 2). The CMAQ models (WRF-CMAQ and RAMS-CMAQ) produce similar SOR patterns, except that the CMAQv5.0.2 models (M1 and M2) predict 10 % higher SOR in the HBT region than the CMAQv4.7.1 models (M4, M5 and M6). CMAQv5.0.2 updated the aqueous-phase production of SO₄²⁻ relative to the older version (Appel et al., 2013; Fountoukis and Nenes, 2007); its explicit treatment of Fe and Mn allows a more consistent representation of the aqueous-phase oxidation of SO₂ to SO₄²⁻. Among the Chem models (WRF-Chem, GEOS-Chem and NHM-Chem), the two WRF-Chem models (M7 and M8) produce similar SOR magnitudes and distributions in all regions except southwestern China (around Tibet in Fig. 1) and the open oceans, while the NHM-Chem (M12) and GEOS-Chem (M13) models produce slightly higher SOR values over the whole simulation domain. The differences between the CMAQ and Chem models are significant over the inland regions of northern and eastern China, Japan and southern EA, where the CMAQ models generally predict 5 %-20 % higher SOR values than the Chem models. Similarly, the CMAQ models generally give 20 % higher C(NO₂) values than the WRF-Chem models, especially in eastern EA (Fig. 3). The C(NO₂) of M8 is extremely low because of its unreasonably low NO₃⁻ concentrations. Figure 4b-e show the results for different regions. In northern EA, the total amount of S is underestimated by all models except M13 and M14.
However, the SOR value (0.12) is reproduced well by most models (0.08-0.20) except M12 (0.25) and M10 (0.32). Only one site is available for central EA. Most models (except M12 and M13) substantially underestimate the SOR value there, while M14 substantially overestimates it. For eastern EA, the total amount of S is captured well by all models except M11, M12 and M14. The observed SOR value (0.55) is generally underestimated by all models except M10 (0.55) and M14 (0.71). For southern EA, the total amount of S is generally overestimated by all models except M13, while the SOR value is underestimated by all models except M13 and M14. Overall, the models show both positive and negative biases in the total amount of S, but they generally underestimate the SOR values in all regions. Furthermore, the modeled SOR values vary widely among models (from 0.12 to 0.57), resulting in a large inter-model difference (1 SD % = 50 %, where the relative standard deviation SD % is calculated as 100 % × 1 SD / MMM). This variation is of the same magnitude as that of the SO₄²⁻ concentrations (1 SD % = 50 %; Supplement Fig. S2), suggesting that differences in gas-particle conversion could largely account for the models' inconsistency in simulating SO₄²⁻ concentrations.
Figure 4f-h compare the gas-particle conversion of N using the C(NO₂) indicator. Only one site in China and one site in Japan have both NO₂ and NO₃⁻ observations. The values are calculated from annual average data, and M8, whose C(NO₂) is extremely low because of its unreasonably low NO₃⁻ concentration, is considered an outlier in this study. At the Hongwen site in China, all models except M5 underestimate the sum of NO₂ and NO₃⁻, but the modeled C(NO₂) values are close to the observation (0.18) except for M5 (0.07), M8 (0.00) and M12 (0.40). Similar to the results for S conversion, the newer version of the WRF-CMAQ model generally produces higher C(NO₂) than the older version, but the difference between the two versions is smaller for C(NO₂) than for SOR. At the Banryu site in Japan, the sum of NO₂ and NO₃⁻ is simulated well by all models except M8, and the C(NO₂) value (0.19) is also simulated well by all models except M8 (0.00), M12 (0.53) and M14 (0.77). Overall, compared with observations, the models reproduce C(NO₂) slightly more accurately than SOR, and they agree more closely with one another on C(NO₂) than on SOR. However, further validation is required because of the limited number of observations for N conversion.
Besides the inter-model differences in the pathways of SO₄²⁻ and NO₃⁻ formation, interactions between aerosols and atmospheric oxidants can also affect aerosol formation (Liao et al., 2003). Aerosols affect the budget of tropospheric oxidants (e.g., HOx) by altering photolysis rates and the uptake of reactive gases (Tie et al., 2005; Li et al., 2018). In turn, the abundance of HOx affects the gas-aerosol conversion of S and N. In addition, the conversion between sulfuric acid and SO₄²⁻ depends on the abundance of neutralizers such as Na⁺ and NH₄⁺.
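The SOR and C(NO₂) indicators defined above use molar rather than mass concentrations, so model output must be converted before the ratios are formed. The sketch below assumes that all four species are supplied as mass concentrations in µg m⁻³ (gas observations are often reported as mixing ratios, which would need a different conversion) and uses rounded molar masses; the function names are hypothetical.

```python
# Rounded molar masses (g mol-1) used to convert mass to molar concentrations.
MOLAR_MASS = {"SO4": 96.06, "SO2": 64.07, "NO3": 62.00, "NO2": 46.01}


def to_mol_per_m3(conc_ug_m3: float, species: str) -> float:
    """Convert a mass concentration (ug m-3) to a molar concentration (mol m-3)."""
    return conc_ug_m3 * 1e-6 / MOLAR_MASS[species]   # ug -> g, then g -> mol


def _ratio(n_particle: float, n_gas: float) -> float:
    """Particle fraction of the particle + gas total."""
    total = n_particle + n_gas
    if total <= 0:
        raise ValueError("particle + gas amount must be positive")
    return n_particle / total


def sulfur_oxidation_ratio(so4_ug_m3: float, so2_ug_m3: float) -> float:
    """SOR = n-SO4 / (n-SO4 + n-SO2), returned as a fraction (x100 for %)."""
    return _ratio(to_mol_per_m3(so4_ug_m3, "SO4"), to_mol_per_m3(so2_ug_m3, "SO2"))


def c_no2(no3_ug_m3: float, no2_ug_m3: float) -> float:
    """C(NO2) = n-NO3 / (n-NO3 + n-NO2); HNO3 and other N species are excluded."""
    return _ratio(to_mol_per_m3(no3_ug_m3, "NO3"), to_mol_per_m3(no2_ug_m3, "NO2"))


if __name__ == "__main__":
    # Hypothetical annual means (ug m-3) at a single site.
    print(f"SOR    = {sulfur_oxidation_ratio(so4_ug_m3=12.0, so2_ug_m3=20.0):.2f}")
    print(f"C(NO2) = {c_no2(no3_ug_m3=5.0, no2_ug_m3=30.0):.2f}")
```

Working in moles matters: equal mass concentrations of SO₄²⁻ and SO₂ give an SOR below 0.5, because a mole of SO₄²⁻ is heavier than a mole of SO₂.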

Implementation of dust emission modules in the models
The surface-layer PMC concentrations are calculated by subtracting PM2.5 from PM10. Figure 5 shows the spatial distribution of annual average PMC in the models. Except for M10, M11 and M14, most models show very low (< 2 µg m⁻³) PMC concentrations around the Taklamakan Desert and the Gobi Desert in northern China. These three models use dust emission modules in their simulations (Chen et al., 2019). M12 also includes dust emissions, but its PM10 concentrations over northern China are much lower than those of the three models. The PMC concentrations predicted by the three models differ widely: the domain-average PMC concentrations are 21, 7 and 12 µg m⁻³ for M10, M11 and M14, respectively. The PMC distributions also differ widely over northwestern China, where the impacts of dust are most significant. These differences mainly come from different parameterizations, such as the source functions, dust-lifting mechanisms and particle size distributions (Chen et al., 2019). Differences in PMC concentrations are also found over the oceans, mainly attributable to sea-salt emissions, which are parameterized in the models with various formulas (Chen et al., 2019). In this study, the WRF-Chem models (M7 and M8) turned off sea-salt emissions; thus, their PMC concentrations over the oceans and seas are not defined. The two WRF-CMAQ models use the in-line sea-salt emission module of Gong (2003) with updates by Kelly et al. (2010) and predict consistent PMC distributions over the oceans. M10 and M11 use the same module as the CMAQ models (Gong, 2003) but produce higher PMC over the oceans. M12 adopts the breaking-wave (surf zone) sea-salt parameterization of Clarke et al. (2006) and produces the highest PMC over the oceans among all models. Implementing dust emissions is expected to improve model performance, but how large is the improvement?
Can models reasonably predict PM concentrations in dust-affected regions with the current dust emission modules? To answer these questions, all sites are grouped into dust and non-dust sites according to their locations. Sites located in regions reported to receive severe dust impacts and rapid dust deposition are marked as dust sites (Shao and Dong, 2006) (gray shaded areas in Fig. 1). Figure 6a-b and Table 1 compare the models' performance at the dust and non-dust sites. For the non-dust sites (Fig. 6b), most models capture the magnitude of PM10 well at the "API non-coastal, non-dust" sites (MB = −8 % and NMB = −8 %). The "API coastal" sites, which are located close to the coast, are all slightly underestimated, by about 25 µg m⁻³ (30 %). Similarly, the PRD and Taiwan sites, which are also near the coast, are all underestimated by about 20 µg m⁻³ (37 %). Bias in sea-salt emissions is a possible reason: sea-salt emissions are reported to contribute 20 %-40 % of SNA and PM10 over coastal regions, and including them in model simulations can improve model accuracy, with an 8 %-20 % increase in PM10, SNA, Na⁺ and Cl⁻ (Kelly et al., 2010; Im, 2013). The influence of sea-salt emissions is not the focus of this study, but further study is strongly recommended.
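The dust/non-dust comparison amounts to computing PMC as PM10 minus PM2.5 and evaluating the bias separately for each group of sites. The sketch below is illustrative only: the `Site` record, its fields and the example values are hypothetical, and the dust flag is assumed to have been assigned beforehand from the dust-affected regions (gray areas in Fig. 1).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Site:
    site_id: str
    is_dust: bool      # True if the site lies in a dust-affected region
    obs_pm10: float    # observed annual-mean PM10, ug m-3
    mod_pm10: float    # modeled annual-mean PM10, ug m-3
    mod_pm25: float    # modeled annual-mean PM2.5, ug m-3


def modeled_pmc(site: Site) -> float:
    """Coarse-mode PM (PMC) = PM10 - PM2.5 (ug m-3)."""
    return site.mod_pm10 - site.mod_pm25


def group_nmb(sites: Iterable[Site], dust: bool) -> float:
    """NMB (%) of modeled PM10 over the dust (dust=True) or non-dust sites."""
    group = [s for s in sites if s.is_dust == dust]
    if not group:
        raise ValueError("no sites in the requested group")
    obs_total = sum(s.obs_pm10 for s in group)
    return 100.0 * sum(s.mod_pm10 - s.obs_pm10 for s in group) / obs_total


if __name__ == "__main__":
    sites = [  # hypothetical values, not taken from the API or EANET data
        Site("D1", True, obs_pm10=150.0, mod_pm10=90.0, mod_pm25=40.0),
        Site("N1", False, obs_pm10=100.0, mod_pm10=92.0, mod_pm25=60.0),
    ]
    print(f"dust NMB     = {group_nmb(sites, dust=True):.1f} %")
    print(f"non-dust NMB = {group_nmb(sites, dust=False):.1f} %")
    print(f"PMC at D1    = {modeled_pmc(sites[0]):.1f} ug m-3")
```

Splitting the statistics by group keeps a good score at the many non-dust sites from masking a large bias at the fewer dust sites.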
For the dust sites (Fig. 6a), most models underestimate PM10 concentrations by 10-40 µg m⁻³ (15 %-50 %). The three models with dust modules perform better than the others at the dust sites, especially at sites A2, A30, A68, A69, R5 and R18. However, they miss the high PM10 concentrations at sites such as R1-R3 and R11 and overestimate PM10 at sites such as A60 and A80. This indicates that the dust emission modules in this study cannot fully capture the magnitude and distribution of dust pollution over EA. In addition, the modeled PMC differs considerably between dust emission modules (Fig. 5): M10 produces very high PMC over the whole of eastern China, while M11 predicts high PMC only around the HBT region. Overall, including dust emission modules can substantially improve the models' PM performance over dust regions; however, the concentrations and distributions are not yet captured well, and the different dust emission modules are largely inconsistent with one another. Figure 6c-d compare the modeled monthly PM10 trends with observations at the dust and non-dust sites, and Fig. 6e shows the correlation coefficients (R) between models and observations. For the non-dust sites (Fig. 6d), most models capture the trends well, with R values close to 0.70 for all models except M7 (0.62), M8 (0.58) and M14 (0.63). The WRF-Chem models (M7 and M8) simulate PM10 concentrations that are too low in winter, and M14 overestimates PM10 from March to May. Most models have much lower R values at the dust sites than at the non-dust sites (Fig. 6e) because they underestimate PM10 concentrations in winter; for instance, the R value of M10 drops from 0.7 at the non-dust sites to 0.11 at the dust sites. Spring (March, April and May), which coincides with the Asian dust storm season (Arimoto et al., 2006), has the largest model biases at the dust sites.
The M10 and M14 models per-form well in most months at both the dust and non-dust sites, taking advantage of their dust emission modules, but their R values at the dust sites are still low. Future study is strongly suggested for a better understanding of the seasonal variations of dust pollution.  Table 2 show the models' performance on simulating wet deposition. For wet SO 2− 4 deposition, despite the fact that the two sites with the highest deposition (E2 and E3) in China are underestimated, the other sites are generally well simulated by MMM with a low MB of −8 %. The individual model bias varies from −22 % to 41 %. The CMAQ models (M2, M4 and M6) all underestimate the wet SO 2− 4 deposition. There are large differences between CMAQv4.7.1 and CMAQv5.0.2 in Japan, where the CMAQv4.7.1 models (M4 and M6) slightly overestimate the wet SO 2− 4 deposition at E19 and E23, while the CMAQv5.0.2 model (M2) slightly underestimates the value at these sites. The M11 model produces a considerably higher wet deposition of SO 2− 4 and NO − 3 than the other models in eastern EA. The possible reasons are discussed later. The MMM underestimates the NO − 3 wet deposition by 29 % due to the large underprediction in southern EA. Southern EA has several sites with very high depositions, such as the E29 site in Malaysia and the E35 and E36 sites in the Philippines, but all models fail to catch those high peaks. The individual model bias varies from −59 % to 30 % among the models. M2, M4, M6 and M12 perform similarly with a high underestimation ranging from 39 % to 59 %. M11 is the only model that succeeds in capturing the high wet NO − 3 deposition at E2 and E3 in China, but it overestimates most sites in China, Japan and the Korean Peninsula. In the case of wet NH + 4 deposition, the MMM generally underestimates the amount at all sites with a bias of −40 % especially at E2-E4 in China, E45 in Thailand, and E35 and E36 in the Philippines. 
The individual model bias varies from −10 % to −37 %. The M2, M4 and M6 models perform similarly, while the M11 and M12 models predict higher depositions at all sites. Overall, large inter-model disagreements are found in eastern EA for wet depositions of SO 2− 4 and NO − 3 and in southern EA for the wet NH + 4 deposition. The observations of dry depositions are composed by the observed concentration of air pollutants and the simulated deposition velocity. Since the EANET network only provides the former, a complete evaluation of the dry deposition is not avail-able in this study (complete observation for dry deposition with velocity is available after 2013). Table 3 lists the domain-total, annual-accumulated amounts of S and N depositions by the models. The total wet S deposition (D Swet ) includes wet depositions of SO 2 , H 2 SO 4 and SO 2− 4 . The total dry S deposition (D Sdry ) includes dry depositions of SO 2 , H 2 SO 4 and SO 2− 4 . The total wet N deposition (D Nwet ) includes wet depositions of NO − 3 , NH + 4 , HNO 3 and NH 3 . The total dry N deposition (D Ndry ) includes dry depositions of NO, NO 2 , NO − 3 , NH + 4 , HNO 3 and NH 3 . D Swet values range from 10.5 to 31.3 Tg (S) yr −1 among models (1 SD % = 75 %). The estimation by the M11 model is 2-fold higher than the other four models. The inter-model difference is significant even among the same type of models with different versions. The CMAQv4.7.1 models (M4 and M6) produce 12.5 Tg (S) yr −1 (M4) and 13.8 Tg (S) yr −1 (M6) of D Swet , while the prediction by the CMAQv5.0.2 model (M2) is 25 % lower. Despite the large discrepancies in the total amount, all five models agree that over 95 % of D Swet is wet SO 2− 4 deposition. The total amounts of D Sdry range from 4.3 to 10.6 Tg (S) yr −1 among models (1 SD % = 39 %). M11 predicts higher D Sdry than the other models, and the CMAQv5.0.2 model (M2) predicts 45 % lower D Sdry than the two CMAQv4.7.1 models (M4 and M6). 
Similar to D Swet , all models have strong agreements on the proportions of the components. D Nwet ranges from 12.2 to 20.0 Tg (N) yr −1 among models (1 SD % = 21 %). The CMAQ models (M2, M4 and M6) simulate close results (12-15 Tg (N) yr −1 ), while M11 (20.0 Tg (N) yr −1 ) and M12 (16.5 Tg (N) yr −1 ) simulate slightly higher amounts. As for the proportion of components, the M2, M4, M6 and M12 models predict high proportions of wet NO − 3 and wet NH + 4 depositions (particle phase), while the M11 model produces higher percentages of wet HNO 3 and wet NH 3 depositions (gas phase). D Ndry ranges from 3.9 to 14.1 Tg (N) yr −1 (1 SD % = 38 %). M12 gives a considerably lower amount than the other models. The models are quite consistent on the proportions of components. The amount of wet deposition is determined by the C surface_air and λ wet (mentioned in Sect. 2.2). In this study, C surface_air may be partially influenced by different model inputs, caused by a mismatch occurring in the vertical and temporal allocation of emission inputs and the employment of different mechanisms to produce dust and sea-salt emissions. Thus, we used λ wet , instead of direct model outputs of wet deposition, as an indicator to reveal the inter-model differences of wet deposition in the following analysis. For the same reason, we used V d as an indicator for the inter-model comparison of dry deposition. Figure 8a-e show λ wet of S deposition (λs wet ) by the models. The CMAQ models (M2, M4 and M6) have similar patterns for λs wet over the inland regions, while the M12 model predicts 30 %-90 % lower ratios in India. The M11 model generally predicts about 20 %-70 % lower λs wet than the other four models except India, where the difference could reach up to 170 %. For λ wet of N deposition (λ Nwet ) (Fig. 8fj), the CMAQv4.7.1 models (M4 and M6) and M12 perform similarly, but the CMAQv5.0.2 model (M2) predicts 30 % lower λ Nwet in India, Japan and the Korean Peninsula. 
M11 generally predicts lower ratios than the CMAQ models in India (60 % lower) and in Indonesia and the Philippines (120 % lower). Figure 9 shows the spatial distributions of V_d. For V_d of S deposition (V_Sd) (Fig. 9a–e), the CMAQ models (M2, M4 and M6) simulate very similar spatial distributions. The M11 and M12 models predict a V_Sd about 0.5 cm s⁻¹ lower than the CMAQ models over all inland regions, especially in eastern China and the Indian peninsula. For V_d of N deposition (V_Nd) (Fig. 9f–j), the CMAQ models (M2, M4 and M6) again predict very similar distributions, while M11 and M12 predict a V_Nd about 0.3 and 1–2 cm s⁻¹ lower, respectively, than the CMAQ models over inland regions. Both λ_wet and V_d of M11 are much lower than those of the other models, especially over eastern EA, which is a possible reason for the biased performance of M11 for wet deposition (Fig. 7). Overall, large inter-model differences are found in predicting both the amounts and the efficiencies of deposition.
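The reasoning for comparing V_d rather than raw deposition amounts can be illustrated by backing out an effective, annual deposition velocity from an annual accumulated deposition and a mean surface concentration, V_d,eff = D_dry / (C̄ × T). This is a simplified sketch under stated assumptions, not the formula of Sect. 2.2: the helper name and inputs are hypothetical, and dividing annual totals ignores the covariance between hourly V_d and C.

```python
# Hypothetical sketch: effective deposition velocity from annual totals,
# V_d,eff = D_dry / (C_mean * T). Not the paper's exact formula.

SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year such as 2010


def effective_vd_cm_per_s(annual_dep_ug_per_m2: float, mean_conc_ug_per_m3: float) -> float:
    """Return the effective dry deposition velocity in cm s-1."""
    if mean_conc_ug_per_m3 <= 0:
        raise ValueError("mean concentration must be positive")
    vd_m_per_s = annual_dep_ug_per_m2 / (mean_conc_ug_per_m3 * SECONDS_PER_YEAR)
    return vd_m_per_s * 100.0  # m s-1 -> cm s-1


# Two hypothetical models: model B simulates twice the surface concentration of
# model A (e.g. because of a different vertical allocation of emissions) and
# therefore twice the deposition, but the same underlying removal efficiency.
vd_a = effective_vd_cm_per_s(1.5768e6, 10.0)
vd_b = effective_vd_cm_per_s(3.1536e6, 20.0)
print(f"model A: {vd_a:.2f} cm s-1, model B: {vd_b:.2f} cm s-1")
```

Although the raw depositions differ by a factor of two, both hypothetical models return the same V_d of 0.5 cm s⁻¹, so a difference in V_d points to the deposition scheme (and meteorology) rather than to the concentration inputs.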

Conclusions
Topic I of MICS-Asia III aims to (i) evaluate the strengths and weaknesses of current multiscale air quality models in simulating concentration and deposition fields over East Asia and (ii) provide suggestions for future model development. This study compares the performance of 12 regional models in predicting PM concentrations over EA. The participating models include WRF-CMAQ (v4.7.1 and v5.0.2), RAMS-CMAQ, WRF-Chem (v3.6.1 and v3.7.1), GEOS-Chem, NHM-Chem, NAQPMS and NU-WRF. Three processes are investigated to identify the causes of the inter-model differences.
For the formation of PMF, SOR and C(NO₂) values are used to demonstrate the inter-model differences in gas–particle conversion. Most models generally underestimate SOR at the EANET sites. In general, the WRF-CMAQv5.0.2 models produce the highest SOR values among all models, followed by the WRF-CMAQv4.7.1 models (10 % lower in the HBT region), the WRF-Chem models and the other models (5 %–20 % lower over inland regions). The inter-model variation of SOR (1 SD % = 50 %) is of the same magnitude as that of SO₄²⁻ concentrations. Similar results are found for C(NO₂), although the models agree more closely on C(NO₂) than on SOR. The different treatments of gas–particle conversion largely account for the different model performances in simulating PMF.
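SOR is not redefined in this section. A definition commonly used in the literature is the molar fraction of sulfur present as sulfate, SOR = n(SO₄²⁻) / [n(SO₄²⁻) + n(SO₂)]. The sketch below assumes that definition and mass concentrations in µg m⁻³; the function name and example values are hypothetical.

```python
# Sketch of the sulfur oxidation ratio (SOR), assuming the common molar definition
# SOR = n(SO4) / (n(SO4) + n(SO2)); inputs are mass concentrations in ug m-3.

M_SO4 = 96.06  # approximate molar mass of SO4(2-), g mol-1
M_SO2 = 64.06  # approximate molar mass of SO2, g mol-1


def sulfur_oxidation_ratio(so4_ug_per_m3: float, so2_ug_per_m3: float) -> float:
    """Return SOR as a fraction between 0 and 1."""
    n_so4 = so4_ug_per_m3 / M_SO4  # umol m-3
    n_so2 = so2_ug_per_m3 / M_SO2  # umol m-3
    total = n_so4 + n_so2
    if total <= 0:
        raise ValueError("total sulfur must be positive")
    return n_so4 / total


# Hypothetical site values: equal molar amounts of sulfate and SO2.
print(round(sulfur_oxidation_ratio(9.606, 6.406), 3))
```

Converting to moles before taking the ratio matters: on a plain mass basis the same inputs would give about 0.6 instead of 0.5, because a sulfate ion is heavier than an SO₂ molecule.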
For the formation of PMC, the models without dust emission modules generate very low (< 2 µg m⁻³) PMC concentrations. They capture the PM10 concentrations well at non-dust-affected sites but underestimate the PM10 concentrations by up to 50 % at sites affected by dust storms. Implementing dust emission modules largely reduces this underestimation (bias reduced to around −20 %). However, neither the magnitude nor the distribution of dust pollution is fully captured. In addition, models employing different dust emission modules disagree strongly on the distribution of PMC.
Figure 8. Washout ratios (λ_wet) of (a–e) S deposition and (f–j) N deposition of the models. Values are calculated with annual accumulated depositions. The unit is percent (%).
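The biases quoted in this study (for example, around −20 % after adding dust modules) are relative biases. Assuming they are normalized mean biases over paired model and observed values, NMB = Σ(M_i − O_i) / Σ O_i × 100 %, the metric can be sketched as follows. The paper's exact statistic is defined elsewhere, and the function name and values below are hypothetical.

```python
# Hypothetical sketch of the normalized mean bias (NMB) in percent,
# NMB = sum(M_i - O_i) / sum(O_i) * 100, over paired model/observation values.


def normalized_mean_bias(model: list[float], obs: list[float]) -> float:
    """Return the normalized mean bias of model against obs, in percent."""
    if len(model) != len(obs) or not obs:
        raise ValueError("model and obs must be non-empty and of equal length")
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("sum of observations must be non-zero")
    return 100.0 * (sum(model) - total_obs) / total_obs


# Hypothetical PM10 pairs (ug m-3) at two dust-affected sites.
observed = [100.0, 200.0]
simulated = [80.0, 160.0]
print(f"NMB = {normalized_mean_bias(simulated, observed):.1f} %")
```

Because NMB sums before dividing, sites with high observed concentrations (such as dust episodes) weigh more heavily than in a mean of per-site relative biases.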
For the removal of PM from the atmosphere, the amounts of atmospheric deposition vary widely among models (1 SD %), by 75 %, 39 %, 21 % and 38 % for D_Swet, D_Sdry, D_Nwet and D_Ndry, respectively. The λ_wet and V_d indicators are used to exclude the influence of model inputs. For λ_wet, the models agree more on D_Swet than on D_Nwet. The largest model inconsistencies are found in India (up to 170 %) and in Indonesia and the Philippines (up to 120 %). For V_d, the models differ more on D_Ndry than on D_Sdry. Inter-model differences are found widely over inland regions for D_Sdry (about 0.5 cm s⁻¹) and D_Ndry (0.3–2 cm s⁻¹).
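The inter-model spread quoted as "1 SD %" is not defined in this section. One common reading, assumed here, is the standard deviation of the model values divided by their multi-model mean. Whether the sample (n − 1) or population (n) standard deviation is used, and whether it is computed from domain totals or gridded fields, is not stated here, so this sketch is illustrative and is not expected to reproduce the values in Table 3; the function name and inputs are hypothetical.

```python
# Hypothetical sketch: inter-model spread as a relative standard deviation (%),
# assuming 1 SD % = stdev(values) / mean(values) * 100 with the sample stdev.
import statistics


def relative_sd_percent(values: list[float]) -> float:
    """Return the sample standard deviation as a percentage of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two model values")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical domain totals from three models (same units for all entries).
print(f"{relative_sd_percent([10.0, 12.0, 14.0]):.1f} %")
```

Replacing `statistics.stdev` with `statistics.pstdev` gives the population version, which is smaller for a small ensemble of models.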
The main contributions of this study are as follows. (1) We compare the modeled conversions of S and N between the gas and particle phases with observations, which makes it possible both to quantify the inter-model differences and to judge which module may be more reasonable.
(2) Several updates to dust modules have been published in recent literature, but few studies have intercompared them. This study provides an opportunity to bring together the new dust emission modules and review their performance in EA.
(3) The study provides a comprehensive view of the total budget of S and N aerosols by including the analysis of removal processes. It turns out that this process contributes significant uncertainty to the inter-model differences.
It should be noted that other factors, such as vertical diffusion, can also contribute to model differences. Moreover, this study focuses on the models' ability to simulate PM in 2010. The chemical regimes may have changed drastically because of rapid changes in emissions and the implementation of control policies in Asia. Studies on more recent years and heavily polluted episodes are in preparation.
Figure 9. Dry deposition velocities (V_d) of (a–e) S deposition and (f–j) N deposition of the models. Values are calculated with annual accumulated depositions. The unit is cm s⁻¹.
Data availability. The observation data are introduced with details in Sect. 2.3 with web links of publicly available datasets. The model data are available upon request.
Author contributions. JT and JSF designed the study. JT processed and analyzed the data. JSF, GRC, SI and ZT contributed to the results and discussions. JSF, ZT, KH, SI, KY, TN, YM, XW, YL, HJL, JEK, CYL, BG, MK, JZ, MZ, LH and ZW provided modeling data. All co-authors provided comments on the paper.