The AeroCom evaluation and intercomparison of organic aerosol in global models

Abstract. This paper evaluates the current status of global modeling of the organic aerosol (OA) in the troposphere and analyzes the differences between models as well as between models and observations. Thirty-one global chemistry transport models (CTMs) and general circulation models (GCMs) have participated in this intercomparison, in the framework of AeroCom phase II. The simulation of OA varies greatly between models in terms of the magnitude of primary emissions, secondary OA (SOA) formation, the number of OA species used (2 to 62), the complexity of OA parameterizations (gas-particle partitioning, chemical aging, multiphase chemistry, aerosol microphysics), and the OA physical, chemical and optical properties. The diversity of the global OA simulation results has increased since earlier AeroCom experiments, mainly due to the increasing complexity of the SOA parameterization in models, and the implementation of new, highly uncertain, OA sources. Diversity of over one order of magnitude exists in the modeled vertical distribution of OA concentrations that deserves a dedicated future study. Furthermore, although the OA / OC ratio depends on OA sources and atmospheric processing, and is important for model evaluation against OA and OC observations, it is resolved only by a few global models. The median global primary OA (POA) source strength is 56 Tg a−1 (range 34–144 Tg a−1) and the median SOA source strength (natural and anthropogenic) is 19 Tg a−1 (range 13–121 Tg a−1). Among the models that take into account the semi-volatile SOA nature, the median source is calculated to be 51 Tg a−1 (range 16–121 Tg a−1), much larger than the median value of the models that calculate SOA in a more simplistic way (19 Tg a−1; range 13–20 Tg a−1, with one model at 37 Tg a−1). The median atmospheric burden of OA is 1.4 Tg (24 models in the range of 0.6–2.0 Tg and 4 between 2.0 and 3.8 Tg), with a median OA lifetime of 5.4 days (range 3.8–9.6 days). 
In models that reported both OA and sulfate burdens, the median value of the OA/sulfate burden ratio is calculated to be 0.77; 13 models calculate a ratio lower than 1, and 9 models higher than 1. For 26 models that reported OA deposition fluxes, the median wet removal is 70 Tg a−1 (range 28–209 Tg a−1), which is on average 85% of the total OA deposition. Fine aerosol organic carbon (OC) and OA observations from continuous monitoring networks and individual field campaigns have been used for model evaluation. At urban locations, the model–observation comparison indicates missing knowledge on anthropogenic OA sources, both strength and seasonality. The combined model–measurements analysis suggests the existence of increased OA levels during summer due to biogenic SOA formation over large areas of the USA that can be of the same order of magnitude as the POA, even at urban locations, and contribute to the measured urban seasonal pattern. Global models are able to simulate the high secondary character of OA observed in the atmosphere as a result of SOA formation and POA aging, although the amount of OA present in the atmosphere remains largely underestimated, with a mean normalized bias (MNB) equal to −0.62 (−0.51) based on the comparison against OC (OA) urban data of all models at the surface, −0.15 (+0.51) when compared with remote measurements, and −0.30 for marine locations with OC data. The mean temporal correlations across all stations are low when compared with OC (OA) measurements: 0.47 (0.52) for urban stations, 0.39 (0.37) for remote stations, and 0.25 for marine stations with OC data. The combination of high (negative) MNB and higher correlation at urban stations when compared with the low MNB and lower correlation at remote sites suggests that knowledge about the processes that govern aerosol processing, transport and removal, on top of their sources, is important at the remote stations. 
There is no clear change in model skill with increasing model complexity with regard to OC or OA mass concentration. However, the complexity is needed in models in order to distinguish between anthropogenic and natural OA as needed for climate mitigation, and to calculate the impact of OA on climate accurately.



Introduction
Atmospheric aerosols are important drivers of air quality and climate. The organic component of aerosols can contribute 30-70 % of the total submicron dry aerosol mass, depending on location and atmospheric conditions (Murphy et al., 2006). The majority of fine aerosol mass (PM1: particulate matter of dry diameter smaller than 1 µm) consists of non-refractory material, and has been found to contain large amounts of organic matter, as measured by the Aerosol Mass Spectrometer (AMS).
Global model estimates of the dry organic aerosol (OA) direct radiative forcing at the top of the atmosphere are −0.14 ± 0.05 W m−2 based on AeroCom phase I experiments, which was decomposed during AeroCom phase II to −0.03 ± 0.01 W m−2 for primary organic aerosol (POA) from fossil fuel and biofuel, −0.02 ± 0.09 W m−2 for secondary organic aerosol (SOA) and 0.00 ± 0.05 W m−2 for the combined OA and black carbon from biomass burning. IPCC (2013) assessed the contribution of anthropogenic primary and secondary organic aerosols to the radiative forcing from aerosol-radiation interactions (RFari) to be −0.12 (−0.4 to +0.1) W m−2. Spracklen et al. (2011) estimated the climate forcing of the anthropogenically driven natural SOA alone (including the presence of water on hydrophilic OA) at −0.26 ± 0.15 W m−2 (direct effect) and −0.6 (+0.24/−0.14) W m−2 (indirect effect). These amounts largely depend on the atmospheric loadings of OA simulated by the models under past, present and future climate conditions, and on the properties they attribute to them. Indeed, Myhre et al. (2013) calculated a SOA load of 0.33 ± 0.32 Tg, while Spracklen et al. (2011) estimated a SOA load of 1.84 Tg, which resulted in an order of magnitude higher radiative forcing. There is therefore an urgent need for a consensus between models and agreement with observations, in order to constrain the large variability between models and, consequently, the OA impact on climate.

Definitions
OA can be emitted directly as POA or formed via gas-phase reactions and subsequent condensation of semi-volatile vapors, resulting in SOA. In addition, multiphase and heterogeneous processes can also contribute to SOA formation. Emissions of volatile organic compounds (VOCs) from terrestrial vegetation are 10 times larger than from anthropogenic sources (Guenther et al., 1995; Kanakidou et al., 2005, and references therein). In addition, the mass of organic carbon emitted in the gas phase exceeds by more than a factor of 10 that emitted directly as primary particulate matter (Goldstein and Galbally, 2007; Kanakidou et al., 2012). VOCs therefore have a large potential to contribute to SOA formation. However, the exact formation processes and composition of OA are poorly understood. Fuzzi et al. (2006) and Hallquist et al. (2009) provided a number of marker compounds and observations that could be used to distinguish the various OA sources. Most OA observational techniques measure the particulate organic carbon content of OA mass, either total (OC) or the water soluble component (WSOC), while some of the variability of OA is accounted for by oxygen, nitrogen and other elements in the organic compounds. Significant discrepancies in OC concentrations determined by different techniques have been identified, and have been addressed by protocols for the definition of OC / EC (elemental carbon) measurements (Cavalli et al., 2010). OC has historically been reported because it is easier to measure. Recently, Aerosol Mass Spectrometer (AMS) observations started providing very high temporal resolution information on the OA mass of the non-refractory PM1. It has to be emphasized that it is the OA mass, not the OC, which determines aerosol properties such as chemical composition, size, hygroscopicity and hygroscopic growth, each of which is an important factor affecting aerosol scattering, absorption and the ability to act as cloud condensation nuclei (CCN).
Therefore, the ratio of OA to OC mass (Turpin and Lim, 2001; Aiken et al., 2008) requires careful investigation. Furthermore, OA compounds differ in their volatility, solubility, hygroscopicity, chemical reactivity and their physical and optical properties. Due to the chemical complexity of the organic component of aerosols (Goldstein and Galbally, 2007), only simplified representations are introduced in global chemistry climate models (Hallquist et al., 2009). As a compromise between simplicity and accuracy, the net effect of the complex mixture of OA is described by only a limited number of representative compounds or surrogates. Kanakidou et al. (2005) reviewed how organic aerosols were incorporated into global chemistry transport models (CTMs) and general circulation models (GCMs), and identified gaps in knowledge that deserved further investigation. The POA sources include fossil fuel, biofuel and biomass burning, as well as the less understood sources of marine OA, biological particles and soil organic matter on dust (Kanakidou et al., 2012, and references therein). Biogenic VOCs (BVOCs) greatly contribute to OA formation (e.g., Griffin et al., 1999b; Kanakidou et al., 2012), implying that significant feedbacks exist between the biosphere, the atmosphere and climate that affect the OA levels in the atmosphere, which was also demonstrated by more recent studies (Arneth et al., 2010; Carslaw et al., 2010; Paasonen et al., 2013). In addition, oxidant and pollutant enhancement by human-induced emissions is expected to increase OA levels, even those chemically formed by BVOC (Hoyle et al., 2011, and references therein); it is therefore conceivable that some portion of the ambient biogenic SOA, which would have been absent under preindustrial conditions, can be removed by controlling emissions of anthropogenic pollutants (Carlton et al., 2010).
Goldstein and Galbally (2007) estimated that SOA formation could be as high as 910 TgC a−1, which is at least an order of magnitude higher than any SOA formation modeling study, as shown here. Spracklen et al. (2011) were able to reconcile AMS observations (mostly from the Northern Hemisphere mid-latitudes during summer) with global CTM simulations by estimating a large SOA source (140 Tg a−1). Of this source, 100 Tg a−1 was characterized as anthropogenically controlled, 90 % of which was possibly linked to anthropogenically enhanced SOA formation from BVOC oxidation. Similar conclusions were reached by Heald et al. (2011) by comparing aircraft AMS observations of submicron OA with the results of another global model, and by Heald et al. (2010) by accounting for the satellite-measured aerosol optical depth that could possibly be due to OA. Recently, Carlton and Turpin (2013) showed that anthropogenically enhanced aerosol water in the eastern USA could lead to an increase in WSOC from BVOC. Although large uncertainties still exist in SOA modeling, there is a need for models to document and improve treatments of solubility, hygroscopicity, volatility and optical properties of the OA from different sources. The SOA formation from anthropogenic VOCs, despite a recent estimate of 13.5 Tg a−1 that makes it a non-negligible SOA source in polluted regions (De Gouw and Jimenez, 2009), is frequently neglected by global models.

Atmospheric processing
Improvement in our understanding and quantification of the emissions of POA and SOA precursors, documented in earlier review studies (Fuzzi et al., 2006), motivated a number of experimental, chamber and field studies that have also significantly enhanced our knowledge on the OA atmospheric cycle. Aging, both physical (e.g., condensation and coagulation) and chemical (in any phase), has been suggested as a significant contributor to the observed OA levels (Fuzzi et al., 2006; Hallquist et al., 2009). It influences the amount and properties of organic material in the aerosol phase, and occurs at different rates and via different mechanisms in the various atmospheric compartments (e.g., urban/rural/marine boundary layers, low/middle/upper troposphere) (e.g., Molina et al., 2004; Ervens et al., 2011). Despite these advances in understanding, such OA processing remains to date either missing or very poorly parameterized in global models, since advances in OA parameterizations are limited by weak observational constraints. Zhang et al. (2007) and  compiled experimental evidence showing that most of the OA in the atmosphere has undergone chemical aging, most likely via SOA formation, and is significantly oxygenated, with lower volatility and higher hygroscopicity than its precursors. To explain these large amounts of oxygenated OA, several chemical pathways have been suggested, which differ in the O / C atomic ratio and in the volatility changes they induce in the parent compounds. Donahue et al. (2006) suggested lumping organic compounds according to their volatility and developed the volatility-basis set (VBS) algorithm to parameterize the many organic compounds present in the atmosphere into several lumped OA species of different volatilities.
In this parameterization, chemical aging via gas-phase reactions changes the volatility of the species; this has been implemented for SOA from VOCs (e.g., Tsimpidi et al., 2010) and also for SOA from semi-volatile and intermediate volatility species (Robinson et al., 2007). However, the implementation of VBS into global models is hindered both by the large number of tracers required, and the underlying uncertainties and free parameters involved. The VBS method was recently expanded to account for the degree of oxidation of OA, by tracking the O / C content of the organics per volatility class; the method, called 2-D VBS, has been successfully used to simulate the evolution of OA in field campaigns (Murphy et al., 2011, 2012). Unfortunately, this new approach needs an even larger number of tracers, which makes it extremely difficult to implement in a global climate model without a large performance penalty. Still, it certainly adds value to our OA understanding, since the ratio of organic aerosol mass (OA) to organic carbon (OA / OC), an alternative way to describe the degree of oxidation of OA, does greatly vary in time and space (Turpin and Lim, 2001). This variability is either neglected or taken into account in a very simplistic way in models. Yu (2011) extended the two-product SOA formation scheme in the GEOS-Chem model by taking into account the volatility changes of secondary organic gases arising from the oxidative aging process as well as the kinetic condensation of low-volatility secondary organic gases. It was shown that, over many parts of the continents, low-volatility secondary organic gas concentrations are generally a factor of ∼ 2-20 higher than those of sulfuric acid gas, and the kinetic condensation of low-volatility secondary organic gases significantly enhances particle growth rates.
Based on this computationally efficient new SOA formation scheme, annual mean SOA mass concentrations in many parts of the boundary layer increase by a factor of 2-10, in better agreement with Aerosol Mass Spectrometer (AMS) SOA measurements (Yu, 2011). Hallquist et al. (2009) also summarized new laboratory data that provided insight into the chemical reaction pathways for the formation of oligomers and other higher molecular weight products observed in SOA. They determined higher production rates of SOA from their precursors' oxidation than earlier measurement studies and linked the dependence of SOA yield from VOC oxidation to the oxidant levels. In chamber experiments, Volkamer et al. (2009) have shown that even small (C2) molecules undergoing aqueous-phase reactions can produce low-volatility material and contribute to SOA formation in the atmosphere, a process that was reviewed by Ervens et al. (2011) and Lim et al. (2013). The global modeling study of Myriokefalitakis et al. (2011) has shown that multiphase reactions of organics significantly increase the OA mass (5-9 % when expressed as OC) and its oxygen content, while Murphy et al. (2012) suggested that these reactions are not enough to explain the observed O / C content of OA. Hallquist et al. (2009) used the VBS concept and estimated the atmospheric deposition of OA to be 150 Tg a−1, higher than earlier estimates and similar to the total particulate OC deposition of 147 Tg a−1 (109 Tg a−1 of WSOC) calculated by Kanakidou et al. (2012). Dry and wet removal of organic vapors that are in thermodynamic equilibrium with SOA becomes increasingly important with atmospheric processing (Hodzic et al., 2013) and was found to lead to 10-30 % (up to 50 %) removal of anthropogenic (biogenic) SOA (Hodzic et al., 2014). Volatilization of OA upon heterogeneous oxidation has been observed for laboratory and ambient particles (George and Abbatt, 2010) and might be a significant OA sink (Heald et al., 2011).
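As an illustration of the volatility-based lumping discussed above, the following minimal Python sketch computes equilibrium gas-particle partitioning for a set of volatility bins, using the standard absorptive-partitioning relation in which the particle-phase fraction of bin i is 1 / (1 + C*_i / C_OA). It is not taken from any of the participating models: the function name `vbs_partition`, the four bins and their loadings are hypothetical, and the fixed-point iteration is only one of several possible solution methods.

```python
def vbs_partition(c_total, c_star, tol=1e-12, max_iter=1000):
    """Equilibrium partitioning for lumped volatility bins (illustrative only).

    c_total: total (gas + particle) mass concentration per bin, ug m-3
    c_star:  effective saturation concentration per bin, ug m-3
    Returns (c_oa, fractions): condensed OA mass and particle-phase fraction per bin.
    """
    if len(c_total) != len(c_star):
        raise ValueError("c_total and c_star must have the same length")
    # Initial guess: everything is condensed; the iteration then decreases monotonically.
    c_oa = float(sum(c_total))
    fractions = [0.0] * len(c_total)
    for _ in range(max_iter):
        if c_oa <= 0.0:
            return 0.0, [0.0] * len(c_total)
        # Particle-phase fraction of each bin for the current OA mass.
        fractions = [1.0 / (1.0 + cs / c_oa) for cs in c_star]
        updated = sum(c * f for c, f in zip(c_total, fractions))
        if abs(updated - c_oa) < tol:
            return updated, fractions
        c_oa = updated
    return c_oa, fractions


if __name__ == "__main__":
    # Hypothetical example: four decadal bins with 1 ug m-3 of material each.
    c_oa, fractions = vbs_partition([1.0, 1.0, 1.0, 1.0], [0.1, 1.0, 10.0, 100.0])
    print(f"condensed OA: {c_oa:.3f} ug m-3")
    print("particle fraction per bin:", [round(f, 3) for f in fractions])
```

Because the particle-phase fraction of every bin depends on the total condensed mass, each additional bin is an additional transported tracer, which is the cost referred to above when VBS schemes are considered for global models.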

Motivation and aim
During the AeroCom phase I modeling experiments, although most of the models considered both primary and secondary OA sources, OA was simulated in a very simplified way in which both primary and secondary OA were treated as non-volatile. OA was only allowed to age via hydrophobic-to-hydrophilic conversion, and was removed from the atmosphere by particle deposition. Comparisons of individual models with OA observations have shown a large underestimation of the organic aerosol component by models, especially in polluted areas (Volkamer et al., 2006, and references therein). Volkamer et al. (2006) showed that the underestimation of SOA by models increases with photochemical age, which can be partially correlated with long-range transport, with the largest discrepancies in the free troposphere, suggesting missing sources or underestimated atmospheric processing of organics in models.
Several global models now treat SOA as semi-volatile, as detailed below, which enables potentially more accurate model calculations. Some models also account for intermediate-volatility organics, multiphase chemistry and semi-volatile POA (e.g., Pye and Seinfeld, 2010; Jathar et al., 2011; Myriokefalitakis et al., 2011; Lin et al., 2012), with encouraging results in reducing the difference between models and observations. Indeed, the modeled SOA concentrations in Mexico City were much closer to observations when intermediate-volatility organics were taken into account in a regional model, although it was unclear if the model-observation gap was reduced for the right reasons (Hodzic et al., 2010). However, OA simulations have many degrees of freedom due to incomplete knowledge of the behavior and fate of OA in the troposphere. Thus, several of the assumptions made translate into model tuning parameters that vary greatly between models.
This organic aerosol AeroCom intercomparison aims to update the evaluation of OA modeling by documenting the current status of global modeling of OA in the troposphere, identifying weaknesses that still exist in models, as well as explaining the similarities and differences that exist between models and observations. It quantifies the uncertainties in surface OA concentrations and attributes them to major contributors. It also attempts to identify and analyze potential model systematic biases. The ensemble of the simulations is used to build an integrated and robust view of our understanding of organic aerosol sources and sinks in the troposphere. The target year of simulations was selected to be 2006, with a free choice for each modeling group on the meteorological conditions and emission inventories to be used.

Terminology
In atmospheric OA research, several naming conventions and abbreviations are used, often ambiguously and inconsistently between authors. To avoid confusion, we clarify here the conventions adopted in this paper, which we use throughout. Note that some aspects of our terminology are different from the very recent VBS-centered attempt by Murphy et al. (2014) to clarify this ambiguity systematically; new model development is required from modelers to adopt the new naming convention in future model simulations.
- Organic aerosol (OA) and the main OA components, i.e., primary and secondary OA (POA and SOA, respectively): we use these terms to refer to the total mass that organic compounds have in the aerosol phase, including H and O, and potentially other elements like N, S and P. Other authors have used the term organic matter (OM), which is synonymous with our OA definition. The units used are µg m−3 for surface mass concentrations at ambient conditions and Tg for burden and budget calculations. OA amounts exclude the water associated with it (assuming that OA is hygroscopic), an important additional component that affects particle size, refractive index and light scattering efficiency.
- Organic carbon (OC), together with other OC components, e.g., primary and secondary OC (POC and SOC, respectively): these terms refer to the mass of carbon present in OA, instead of to the total OA mass. The units used here are µgC m−3 for surface mass concentrations. This is typically the terminology that is used when comparing model results with filter measurements analyzed by thermal-optical methods.
OA mass can increase for constant OC, due to oxidative aging; this is something that very few models calculate, and should be improved in the future. The OA / OC ratio is discussed in more detail in Sect. 1.7. Care should also be taken for the case of methane sulfonic acid (MSA), since the letter A stands for "acid", not "aerosol", as in OA. When reporting MSA results, we refer to the total methane sulfonic acid mass present in OA and not its carbon mass only, unless clearly stated otherwise.

OA / OC and O / C ratios
To calculate the total organic aerosol mass concentration for each model, we apply the following equation:

$$[\mathrm{OA}] = \sum_{i} [\mathrm{OC}]_i \, (\mathrm{OA}/\mathrm{OC})_i$$

where [OC]_i is the organic carbon mass concentration of aerosol tracer i and (OA / OC)_i is the organic aerosol to organic carbon ratio for that tracer (Table 1). OA / OC, frequently termed OM / OC in the literature (OM: organic matter), was found to correlate extremely well with the O / C ratio in Mexico City and chamber data (Aiken et al., 2008), because of low N / C ratios. A low OA / OC ratio is also indicative of "fresh" OA as deduced from observations (Turpin and Lim, 2001; Philip et al., 2014). The OA / OC ratio varies greatly between models, with many of them setting OA / OC = 1.4 as a constant for all OA sources. Some models use different OA / OC ratios for every OA tracer: IMAGES, IMPACT, and the two TM4-ECPL models calculate the specific OA / OC ratio for each of their aerosol tracers, depending on their sources and chemical identity. CAM4-Oslo uses 1.4 for fossil fuel and biofuel, OsloCTM2 and SPRINTARS use 1.6, while all three models use 2.6 for biomass burning. In the case of CAM4-Oslo and SPRINTARS, it is not possible to calculate the OC concentration from the model fields accurately, since they only track one tracer. For this, we used a single value, that of the fossil fuel each model is using, which will lead to an underestimation of their OC concentration (but not of OA) close to biomass burning sources. The remaining models use a constant OA / OC ratio: GEOS-Chem and GEOS-Chem-APM use a specified value of 2.1, GISS-CMU-VBS and GISS-CMU-TOMAS use 1.8, and all other models use 1.4. Observations (Turpin and Lim, 2001; Aiken et al., 2008) suggest that OA / OC values of 1.6 ± 0.2 and 2.1 ± 0.2 are good approximations for urban and non-urban aerosols, respectively, indicating that most models might use OA / OC values that are low.
The study of both the OA / OC and O / C ratios is extremely important and warrants a dedicated investigation; although this will be mentioned in the present work, it will be studied in detail in the future.
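The conversion described in this section (summing, over all aerosol tracers, the carbon mass of each tracer multiplied by its OA / OC ratio) can be sketched in a few lines of Python. This is a minimal illustration, not the processing code used for the intercomparison: the function name `total_oa`, the tracer names and the OC concentrations are hypothetical, while the ratios 1.4 and 2.6 are the fossil fuel/biofuel and biomass burning values quoted above for CAM4-Oslo.

```python
def total_oa(oc_by_tracer, oa_to_oc):
    """Total OA mass concentration (ug m-3) from per-tracer OC (ugC m-3).

    oc_by_tracer: mapping tracer name -> OC mass concentration
    oa_to_oc:     mapping tracer name -> OA / OC ratio for that tracer
    """
    missing = set(oc_by_tracer) - set(oa_to_oc)
    if missing:
        raise KeyError(f"no OA / OC ratio for tracers: {sorted(missing)}")
    # Each tracer contributes its carbon mass scaled by its own OA / OC ratio.
    return sum(oc * oa_to_oc[name] for name, oc in oc_by_tracer.items())


if __name__ == "__main__":
    # Hypothetical surface concentrations in ugC m-3.
    oc = {"fossil_and_biofuel": 1.0, "biomass_burning": 0.5}
    ratios = {"fossil_and_biofuel": 1.4, "biomass_burning": 2.6}
    print(f"OA = {total_oa(oc, ratios):.2f} ug m-3")
```

Keeping the ratio attached to each tracer, rather than applying one global constant, is what allows the same routine to serve both the models with source-specific ratios and those with a single fixed value.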

Organic aerosol speciation
In the present work, we have separated organic aerosols into five categories, as described below and summarized in Table 1. The models are then grouped based on their OA parameterizations in Table 2.
1. tPOA, for terrestrial primary organic aerosol, which includes primary emissions from fossil fuel, biofuel and biomass burning. All models participating in this intercomparison include these three tPOA sources. Several models also consider a biogenic secondary organic aerosol source that is included in tPOA (BCC, CAM4-Oslo, CanAM-PAM, ECHAM5-HAMMOZ, ECHAM5-SALSA, ECMWF-GEMS, EMAC, GISS-CMU-TOMAS, GISS-MATRIX, GISS-TOMAS, GMI, GOCART, LMDz-INCA, SPRINTARS and TM5), as discussed earlier. This is considered to be linked with monoterpene emissions (Guenther et al., 1995), producing non-volatile aerosol mass with a fixed yield as discussed in Sect. 2.2. Some models have a simplified chemistry that produces non-volatile SOA, also included in tPOA: in GISS-CMU-TOMAS and GISS-TOMAS a generic SOA precursor is emitted in the gas phase representing all SOA precursor gases (15 % of the monoterpene emissions, emitted in the gas phase), with a chemical lifetime of 12 h, that forms a non-volatile SOA tracer (which is included in tPOA). In GISS-TOMAS the SOA precursor emissions are based on terpenes, with a 10 % yield, while α-pinene oxidation by all major oxidants (OH, O3, NO3) produces non-volatile SOA (included in tPOA) with a 13 % yield in GLOMAPbin and GLOMAPmode. SPRINTARS has a 9.2 % yield of non-volatile SOA (Griffin et al., 1999a, b) from monoterpene emissions, and considers this tracer as inert and tracks it separately, in contrast to the other models that produce non-volatile SOA and track it together with tPOA. SOA from anthropogenic VOCs is included in only a few models, and is not included in tPOA.
2. mPOA, for primary organic aerosol from marine sources. CAM4-Oslo has a primary marine organic source of 8 Tg a−1 (Spracklen et al., 2008) with the same emissions distribution as sea salt (provided by Dentener et al., 2006) included in tPOA. IMPACT includes a mPOA source of 35 Tg a−1 (Gantt et al., 2009a), which scales with chlorophyll a and sea salt as a proxy of marine biological activity (O'Dowd et al., 2004), while GISS-modelE-G/I and TM4-ECPL-F/FNP include a similar source of submicron mPOA based on Vignati et al. (2010). The GISS-modelE-G/I source is described in Tsigaridis et al. (2013) and the TM4-ECPL-F/FNP mPOA source in Myriokefalitakis et al. (2010). It has to be noted that these two studies have a factor of 10 difference in submicron mPOA source strength, despite having very similar source function parameterizations. This results from differences in sea-spray size distribution assumptions, as discussed in Tsigaridis et al. (2013). In addition to the fine-mode mPOA source, TM4-ECPL-FNP accounts for about 30 TgC a−1 of coarse-mode mPOA (Kanakidou et al., 2012), but that was not taken into account in the present study, since all measurements used here are for fine aerosols.
3. trSOA, for "traditional" secondary organic aerosol, which is produced by gas to particle mass transfer of secondary organic material, either assuming the material has a finite vapor pressure (a gas-particle partitioning process) or that it has zero vapor pressure (a condensation process). The most common precursors of SOA used across models are isoprene and terpenes, although few models have other precursors as well, as presented in Sect. 2. All models have some form of trSOA, either included in tPOA (as explained above), or via an explicit treatment of the semi-volatile oxidation products of the precursor VOCs. For the models other than the ones presented in (a) above that treat SOA as part of tPOA, the approach used and species taken into account differ. CAM5-MAM3 prescribes mass yields from 5 trSOA precursor categories (isoprene, terpenes, aromatics, higher molecular weight alkanes and alkenes, with yields of 6.0, 37.5, 22.5, 7.5, and 7.5 %, respectively), which then reversibly and kinetically partition to the aerosol phase. GISS-CMU-VBS uses the volatilitybasis set, but without aging for the biogenic trSOA. The rest of the models use the two-product model approach to calculate trSOA; see the references column in Table 3 for more details. GEOS-Chem-APM considers the volatility changes of the gaseous semi-volatile compounds arising from the oxidation aging process, as well as the kinetic condensation of low-volatility gases (Yu, 2011). HadGEM2-ES does not calculate trSOA online; instead, it uses an offline 3-D monthly mean trSOA climatology obtained from the STOCHEM CTM (Derwent et al., 2003). The two-product model implemented in IMAGES was modified to account for the effect of water uptake on the partitioning of semi-volatile organics, through activity coefficients parameterized using a detailed model for α-pinene SOA (Ceulemans et al., 2012). 
IMPACT predicts semi-volatile SOA from organic nitrates and peroxides using the gas-particle partitioning parameterization with an explicit gas-phase organic chemistry. These condensed semi-volatile compounds are assumed to undergo further aerosol-phase reactions to form non-evaporative SOA with a fixed 1-day e-folding time. The two TM4-ECPL models account for SOA aging by gas-phase oxidation by OH with a rate of 10⁻¹² cm³ molec⁻¹ s⁻¹, while the conversion of insoluble POA to soluble is parameterized as described by Tsigaridis and Kanakidou (2003) with a decay rate that depends on O₃ concentration and water vapor availability, which corresponds to an approximately 1-day global mean turnover time, with strong spatial variability.
4. ntrSOA, for non-traditional secondary organic aerosol, which comes from a variety of sources, as explained below. GISS-CMU-VBS includes the VBS (Robinson et al., 2007), which allows tPOA to evaporate and age (via oxidation) in the gas phase, producing less volatile gas-phase products, which can again partition between the gas and aerosol phases. This model, which is the only one in the present study that takes into account the intermediate-volatility species as additional sources of OA, enables the application of the partitioning theory to POA and its associated vapors as well, not only SOA. The aerosol phase of these oxidized products is termed ntrSOA. This process strongly affects the chemical composition of SOA and will be discussed later (Sect. 4.3.3). Other models, namely IMAGES, IMPACT, and TM4-ECPL-F/FNP, include an aqueous-phase oxidation pathway of small organic molecules like glyoxal and methylglyoxal that produces low-volatility compounds and oligomers in cloud and aerosol water (Fu et al., 2008, 2009; Stavrakou et al., 2009; Myriokefalitakis et al., 2011), with the two TM4-ECPL models having a primary glyoxal source from the oceans of 4.1 TgC a −1, which is not present in the other two models. Glyoxal and methylglyoxal are highly reactive species in the aqueous phase. The aqueous-phase reactions can occur both in aerosol water and cloud droplets; after droplet evaporation, the residual organic compounds remain in the aerosol phase in the form of OA. By applying a reactive uptake coefficient (γ) of glyoxal and methylglyoxal on aqueous particles and cloud drops (Liggio et al., 2005), IMAGES and IMPACT parameterized the irreversible surface-controlled uptake of these soluble gas-phase species. On the other hand, Myriokefalitakis et al. (2011) applied a much more detailed aqueous-phase chemical scheme in cloud droplets in order to produce oxalate. For IMPACT, 52 % of the total SOA comes from glyoxal and methylglyoxal multiphase chemistry.
IMPACT also includes ntrSOA formation from the uptake of gas-phase epoxides onto aqueous sulfate aerosol (Paulot et al., 2009), which contributes 25.1 Tg a −1 (21 %) to the total SOA formation.
5. MSA, an oxidation product of DMS, is also an SOA component. Although a minor organic aerosol component on the global scale, MSA can be very important in remote oceanic regions, especially when mPOA is relatively low: observations indicate that MSA can be at least 10 % of the total WSOC mass (Sciare et al., 2001; Facchini et al., 2008) at marine locations. Only CAM4-Oslo, GEOS-Chem-APM, GISS-modelE-G/I, IMPACT, LMDz-INCA, TM4-ECPL-F/FNP and TM5 have this tracer, which has typically been neglected from the organic aerosol budget in modeling studies. In CAM4-Oslo, MSA is included in tPOA, in IMPACT it is included in mPOA (which is in turn included in tPOA), whereas in the other models, it is individually tracked.
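The aging timescales quoted in the list above can be put side by side with a short calculation. The following is a minimal sketch, not any model's actual code: it assumes simple first-order kinetics, and the OH concentration of 10⁶ molec cm⁻³ is an illustrative value chosen here, not one taken from the participating models. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400.0

def fraction_remaining(t_days, efolding_days=1.0):
    """Fraction of a tracer left after t_days of first-order conversion."""
    return math.exp(-t_days / efolding_days)

def oh_lifetime_days(k_oh=1.0e-12, oh_conc=1.0e6):
    """Lifetime against OH oxidation in days, 1 / (k [OH]).

    k_oh is in cm3 molec-1 s-1 (value quoted in the text);
    oh_conc is in molec cm-3 (ASSUMED illustrative value).
    """
    return 1.0 / (k_oh * oh_conc) / SECONDS_PER_DAY

# Semi-volatile SOA left after one day with a 1-day e-folding time
remaining = fraction_remaining(1.0)
# Aging lifetime against OH for the quoted rate constant
lifetime = oh_lifetime_days()

print(f"fraction remaining after 1 day: {remaining:.3f}")
print(f"lifetime against OH: {lifetime:.1f} days")
```

Under these assumptions the OH-driven aging is roughly an order of magnitude slower than the 1-day conversions, although the result scales inversely with the assumed OH concentration.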
A summary of the OA processes included in the models is presented in Table 2. The total organic aerosol mass is calculated as follows:
OA = tPOA + mPOA + trSOA + ntrSOA + MSA.
The models that have mPOA, SOA and/or MSA included in tPOA do not track them separately, so there is no risk of double-counting any OA species. In addition to this categorization, in order to compare with AMS data (see Sect. 3) we separate the modeled OA into HOA (hydrocarbon-like OA) and OOA (oxygenated OA) as defined by Zhang et al. (2005), when sufficient information on hydrophobic/hydrophilic speciation from the models is available. We use the terminology HOA / OOA instead of water soluble/insoluble OC (WSOC / WIOC), and compare only with AMS organic aerosol data, in order to contrast with the OC measurements that refer to organic carbon. The separation into HOA and OOA has been provided by only a few models: ECHAM5-HAM2, ECMWF-GEMS, EMAC, GISS-modelE-G, GISS-modelE-I, GISS-TOMAS, GLOMAPbin, GLOMAPmode, IMAGES, LMDz-INCA, TM4-ECPL-F, TM4-ECPL-FNP and TM5. From the AMS perspective, the total OA is calculated as follows:
OA = HOA + OOA.
Further subdivisions into other categories of OOA are neglected in this study. In addition, the term POA used in Zhang et al. (2011) as a surrogate for different HOA categories is also not taken into account here.
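The bookkeeping described above can be sketched in a few lines. This is an illustration only, assuming the total is the plain sum of the five categories introduced in this section; the burden values and the `total_oa` helper are hypothetical and do not come from any participating model.

```python
CATEGORIES = ("tPOA", "mPOA", "trSOA", "ntrSOA", "MSA")

def total_oa(tracers):
    """Sum the OA categories a model reports; missing ones count as zero.

    A model that folds mPOA, SOA or MSA into tPOA simply reports fewer
    keys, so nothing is counted twice.
    """
    unknown = set(tracers) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown OA categories: {sorted(unknown)}")
    return sum(tracers.get(name, 0.0) for name in CATEGORIES)

# Hypothetical global burdens in Tg
lumped = {"tPOA": 1.2}  # everything carried inside tPOA
explicit = {"tPOA": 0.7, "mPOA": 0.1, "trSOA": 0.3, "ntrSOA": 0.05, "MSA": 0.05}

print(total_oa(lumped), total_oa(explicit))
```

Both hypothetical models give the same total even though they resolve very different numbers of categories, which is why the totals remain comparable across models.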

Description of models
The models participating in the present study differ in (a) the spatial resolution, both horizontal and vertical, (b) the underlying model with which the aerosol calculations are coupled, which can be either a CTM or a GCM, and will be named "host model" from now on, (c) the emissions used, both for POA and SOA precursors, as well as for other gaseous and aerosol tracers, (d) the inclusion or not of aerosol microphysics, which are implemented in multiple ways (Mann et al., 2014), and (e) the OA processes simulated, i.e., the chemical and physical processes that change existing OA (such as oxidative aging), and the representation of SOA formation. The complexity of the OA calculations varies greatly between models (Table 3). There are differences in OA emission source strength, both for primary particles (Table 4) and precursors of secondary OA (Table 5), as well as in the total number of OA tracers used (2 to 62; Table 1) and their properties, especially with regard to the temperature dependence of their vapor pressure (Sect. 6). Although a classification is difficult, one can categorize the models in various groups when considering OA modeling from different perspectives. The classification used here will be presented later (Sect. 1.8).
Some models using the same host model have very specific (and not necessarily few) differences. ECHAM5-HAM2, ECHAM5-HAMMOZ, ECHAM5-SALSA and EMAC use the same host model (ECHAM5) but different aerosol parameterizations: the first two use M7 (modal), ECHAM5-SALSA uses SALSA (sectional) and EMAC uses a modified version of M7. ECHAM5-HAMMOZ uses the previous version of the HAM aerosol module, which does not take into account the detailed SOA formation introduced in ECHAM5-HAM2 (O'Donnell et al., 2011). GEOS-Chem and GEOS-Chem-APM use the same host model (GEOS-Chem) but different aerosol representations: the first uses the default bulk aerosol scheme, while the latter uses a size-resolved (bin) advanced particle microphysics (APM) module (Yu and Luo, 2009). GISS-CMU-VBS and GISS-CMU-TOMAS use the same host GCM (GISS-II'), with the only difference being in the calculation of OA: the first one uses a bulk aerosol scheme with the VBS approach (Donahue et al., 2006; Jathar et al., 2011), and the second one the aerosol microphysics scheme TOMAS (Adams and Seinfeld, 2002; Adams, 2010, 2012). Similarly, GISS-MATRIX, the two GISS-modelE models and GISS-TOMAS use the same host GCM (GISS-E2), but they have different aerosol representations: GISS-MATRIX uses the aerosol microphysics module MATRIX (Bauer et al., 2008), the two modelE versions have a bulk aerosol scheme (Miller et al., 2006; Tsigaridis and Kanakidou, 2007), and GISS-TOMAS uses the same aerosol microphysics scheme as GISS-CMU-TOMAS (Lee and Adams, 2012; Lee et al., 2014). GISS-modelE-G and GISS-modelE-I only differ in the emissions used; they both have CMIP5 anthropogenic emissions for all tracers (Lamarque et al., 2010), but GISS-modelE-G uses GFED3 (van der Werf et al., 2010) for biomass burning. GLOMAPbin and GLOMAPmode use the same host CTM (TOMCAT; Chipperfield, 2006), with the only difference being the sectional and modal aerosol microphysics calculations (Mann et al., 2012).
TM4-ECPL-FNP is almost identical to TM4-ECPL-F, but also takes into account the contribution to OA from primary biological particles and soil dust in the fine and coarse modes (Kanakidou et al., 2012). These two models also use different biogenic and anthropogenic VOC emission inventories (Tables 4 and 5).
All model results presented here come from monthly mean data, and measurements are averaged into monthly mean values prior to any comparison with model data.

Meteorology
One major difference between the configurations of the models is the meteorology and meteorological year used. This affects aerosol transport, removal, chemistry (e.g., temperature dependence of reaction rates) and gas-particle partitioning of semi-volatile species. In some models, meteorology also directly affects natural aerosol emissions, like wind-driven sea salt, marine organic aerosol, dust and VOC emissions from the vegetation and oceans. Indirectly, meteorology affects MSA sources, since MSA is produced via dimethyl sulfide (DMS) oxidation, whose source is affected by wind speed and its oxidation depends on chemical rates.
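To illustrate why the temperature field matters for gas-particle partitioning, the sketch below evaluates an Odum-type two-product yield with a Clausius-Clapeyron-like temperature correction of the partitioning coefficients. It is a generic textbook form, not the scheme of any specific model in this study; the stoichiometric coefficients, partitioning coefficients, enthalpy of vaporization and absorbing mass are all assumed values chosen for illustration.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol-1 K-1

def k_at_temperature(k_ref, temp, t_ref=298.0, dh_vap=42.0e3):
    """Scale a partitioning coefficient (m3 ug-1) from t_ref to temp (K).

    dh_vap is an ASSUMED enthalpy of vaporization in J mol-1.
    """
    return k_ref * (temp / t_ref) * math.exp(
        dh_vap / R_GAS * (1.0 / temp - 1.0 / t_ref)
    )

def two_product_yield(m0, alphas, ks):
    """SOA mass yield for an absorbing organic mass m0 (ug m-3)."""
    return m0 * sum(a * k / (1.0 + k * m0) for a, k in zip(alphas, ks))

# Hypothetical two-product parameters at 298 K
ALPHAS = (0.1, 0.3)    # mass-based stoichiometric coefficients
K_REF = (0.1, 0.005)   # partitioning coefficients, m3 ug-1
M0 = 5.0               # absorbing organic aerosol mass, ug m-3

yield_warm = two_product_yield(M0, ALPHAS, K_REF)
yield_cold = two_product_yield(
    M0, ALPHAS, [k_at_temperature(k, 273.0) for k in K_REF]
)
print(f"yield at 298 K: {yield_warm:.3f}, at 273 K: {yield_cold:.3f}")
```

With these assumed parameters the yield rises as temperature drops, so two models with identical chemistry but different meteorology can produce different SOA amounts.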
The remaining models use a variety of prescribed meteorology data sets for the year 2006 (Table 3), except that GISS-CMU-VBS uses 2008, IMPACT uses 1997, and TM4-ECPL-FNP uses 2005.

Emissions
All participating models include POA in their simulations. The sources are both anthropogenic and biogenic, and can be classified as follows:
1. Fuel emissions. These exclusively anthropogenic sources include fossil fuel and biofuel burning. All models include these sources, but the emission inventories used are not always the same (Table 4).
2. Biomass burning. [...] (Table 4), which is the reference year in the present study. Biomass burning is the largest POA source; it has significant interannual and strong seasonal variability and is the most uncertain POA source on a global scale (Andreae and Merlet, 2001), so properly representing this source is extremely important for comparison with measurements, especially at remote sites. Comparisons of several model simulations with the smoke aerosol optical depth (AOD) observed by MODIS have indicated a systematic underestimation when bottom-up emission inventories like GFED, used by several models here, are applied. The underestimation may be as high as a factor of 3 on the global scale (Kaiser et al., 2012, and references therein), and strongly varies by region (Petrenko et al., 2012).
3. Marine sources. Few models take into account marine sources of organic aerosols (see Sect. 1.8); these depend on sea spray emissions. The GISS-modelE-G and GISS-modelE-I source depends on SeaWiFS chlorophyll a measurements from the year 2000, while IMPACT and TM4-ECPL-F/FNP calculations use the MODIS chlorophyll a data from the corresponding simulated year. However, recent observations indicate the presence of marine organic aerosol over oceanic oligotrophic areas; this can be either due to long-range transport, or a missing source not accounted for with the current source parameterizations, or both. CAM4-Oslo also has marine organic emissions, with a global flux based on Spracklen et al. (2008), and a spatial distribution given by the prescribed AeroCom phase I fine-mode sea salt emissions.
TM4-ECPL-FNP (Kanakidou et al., 2012) includes some fine-mode POA sources that do not exist in any other global model in this intercomparison. These consist of primary biological particle emissions from plants (25 Tg a −1) and soil organic matter on dust (0.2 Tg a −1).

"Pseudo" primary non-volatile SOA fluxes. A number of models parameterize SOA chemical production in the atmosphere as a source of non-volatile aerosol emitted directly from vegetation. SOA is then modified similarly to POA by processes like transport, chemical aging, growth, coagulation and condensation, among others, depending on the model. BCC, CanAM-PAM, ECHAM5-HAMMOZ, ECHAM5-SALSA, ECMWF-GEMS, EMAC, GISS-CMU-TOMAS, LMDz-INCA and TM5 use a global source of 19.1 Tg a −1. This source is equivalent to a 15 % yield from the year 1990 monoterpene emissions (Guenther et al., 1995) and is identical to the source used during the AeroCom phase I experiments. GISS-CMU-TOMAS, GISS-TOMAS, GLOMAPbin and GLOMAPmode also use the same approach (based on the Guenther et al. (1995) emissions, except GISS-TOMAS, which is based on Lathière et al., 2005), but with SOA produced according to an assumed molar yield following oxidation (see Sect. 1.8 and Table 1), which results in a calculated SOA source of 19.1, 17.1, 23.1, and 23.0 Tg a −1, respectively. GISS-MATRIX and GISS-TOMAS use a 10 % yield (17.1 Tg a −1) from monoterpene emissions for the year 1990 from Lathière et al. (2005), while GMI and GOCART assume a 10 % yield (12.7 Tg a −1) from the Guenther et al. (1995) monoterpene emissions. In the case of CAM4-Oslo, the strength of the secondary source suggested by Dentener et al. (2006) has been scaled up to 37.5 Tg a −1, based on Hoyle et al. (2007).
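The fixed-yield sources quoted above are simple products of a monoterpene emission total and a yield, which makes them easy to cross-check. The sketch below backs out the emission totals implied by the numbers in the text; the implied totals are derived here for illustration and are not read from the inventories themselves, and the helper names are hypothetical.

```python
def soa_source(emissions_tg, mass_yield):
    """Pseudo-primary SOA source (Tg a-1) from a fixed mass yield."""
    return emissions_tg * mass_yield

def implied_emissions(source_tg, mass_yield):
    """Monoterpene emissions (Tg a-1) implied by a source and a yield."""
    return source_tg / mass_yield

# Guenther et al. (1995)-based sources quoted in the text
guenther_from_15 = implied_emissions(19.1, 0.15)  # 15 % yield case
guenther_from_10 = implied_emissions(12.7, 0.10)  # 10 % yield case
# Lathiere et al. (2005)-based source quoted in the text
lathiere = implied_emissions(17.1, 0.10)

print(f"Guenther-based totals: {guenther_from_15:.1f} and {guenther_from_10:.1f} Tg a-1")
print(f"Lathiere-based total: {lathiere:.1f} Tg a-1")
```

The two Guenther-based values agree to within rounding (about 127 Tg a −1), which suggests that the 19.1 and 12.7 Tg a −1 sources differ only in the assumed yield, while the Lathière-based source reflects a larger emission total.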
In addition to the primary aerosol emissions, the inventories used for the precursors of secondary organic aerosols are also both very diverse and of great importance. These are presented in Table 5.
The IMPROVE and AQSEPA networks cover most of the United States more than adequately. The EMEP monitoring network together with the European Integrated project on Aerosol, Cloud, Climate, and Air Quality Interactions (EUCAARI) and CREATE data sets and other studies found in the literature provide good coverage of a large part of Europe, with stations in 17 countries. Although the spatial and temporal coverage is not as extensive as in the USA, it provides a comprehensive representation of different sources and chemical environments over Europe. There are limited measurements from Asia, with many of them being at urban or urban-influenced locations in India and China. South America, Africa and Oceania have very poor spatial and temporal coverage, despite the importance of the tropical forests of the former two on the global OA budget. Marine areas are almost exclusively covered by short-term measurement campaigns, with the exception of Amsterdam Island in the southern Indian Ocean (Sciare et al., 2009). All OC measurements are PM 2.5 or smaller sizes, e.g., PM 1.8 (Koulouri et al., 2008).
A rapidly increasing number of AMS OA measurements has been reported in the literature since the work of . Most of these AMS measurements are available online, in a web page created and maintained by Q. Zhang and J.-L. Jimenez (http://tinyurl.com/ams-database). We include in this analysis most of the ground-based data available as of January 2013. These data include the only AMS measurements so far available for a whole year (using the ACSM instrument, which is a monitoring version of the AMS; Ng et al., 2011), from Welgegund, South Africa (Tiitta et al., 2014); all other stations were measuring for about a month or less. The geographical coverage of the AMS stations is far less dense than the OC measurement locations, but the number of stations is rapidly increasing. Longer records are also starting to appear in the literature (Tiitta et al., 2014), and are expected to increase in the near future. It is important to note that the OA values provided by the AMS-type instruments have uncertainties (30 %) inherent in quantifying the detection efficiency for the wide range of organic molecules that make up complex SOA material (Middlebrook et al., 2012). Care should be taken when using AMS-type OA data in models that estimate organic aerosol content.
All station data have been classified in three main categories: urban, remote and marine. Urban sites are defined as those that are either in cities or highly influenced by them. AMS stations characterized as "urban downwind" fall in this category. Remote sites are defined as those not influenced by local anthropogenic activities, and include forested regions, mountains, rural areas, etc. Marine sites are all measurements from ships or from coastal stations that are highly influenced by the marine atmosphere. Only two AMS stations fall into this category (Okinawa, Japan, and Mace Head, Ireland), and for simplicity, they were classified in the "remote" category.
The two databases (OC and OA measurements) have been kept separate because of the added complexity related to the OA / OC ratio (Sect. 1.7). Almost all models calculate OA mass concentration, integrated across the fine-mode size distribution where appropriate, which can be compared with AMS measurements without any unit conversion. To compare with filter measurements of OC, we used the models' assumptions about the OA / OC ratio to convert the modeled OA to OC. As mentioned earlier, the importance of the OA / OC ratio will be explored in the future. The cutoff diameter of aerosols can also be an issue (Koulouri et al., 2008), but it is not expected to be significant in the present study, given the assumptions that the models adopt for the primary OA sources. No model adds fine OA mass from coarse-mode sources, and no model allows partitioning of semi-volatile gases to the coarse mode; thus, the difference between the PM 2.5 filter measurements and PM 1 AMS data is not expected to be properly resolved by models, even if they include aerosol microphysics calculations.
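The OA-to-OC conversion used for the comparison with filter measurements amounts to dividing each modeled component by its assumed OA / OC ratio. The sketch below illustrates this with hypothetical concentrations and ratios; the values 1.4 and 2.0 are placeholders chosen for illustration and are not the ratios of any particular model, and `oa_to_oc` is a hypothetical helper.

```python
def oa_to_oc(oa_by_component, oa_oc_ratio):
    """Convert modeled OA mass to OC mass, component by component."""
    return sum(mass / oa_oc_ratio[name] for name, mass in oa_by_component.items())

# Hypothetical surface concentrations (ug m-3) and OA/OC ratios
oa = {"POA": 1.4, "SOA": 1.0}
ratios = {"POA": 1.4, "SOA": 2.0}

oc = oa_to_oc(oa, ratios)               # ugC m-3, comparable with filter OC
effective_ratio = sum(oa.values()) / oc  # bulk OA/OC implied by the mixture

print(f"OC = {oc:.2f} ugC m-3, effective OA/OC = {effective_ratio:.2f}")
```

Because the effective ratio depends on the POA / SOA mix, two models with the same OA can yield different OC, which is one reason the OC and OA databases are kept separate.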

Global budgets
Many global models have evolved significantly since the AeroCom phase I intercomparison studies. During phase I, the first experiment, AeroCom A (ExpA), was designed in a very similar way to the AeroCom phase II model simulations described here. For the second, AeroCom B (ExpB), all models used the same emission inventories. The outcomes of these studies have been summarized by Textor et al. (2006) for ExpA and Textor et al. (2007) for ExpB and are compared with the present study in detail here (Fig. 1). The two AeroCom phase I studies focused on the total aerosol budget, but the individual aerosol components were also studied. Sixteen models participated in ExpA and twelve in ExpB, most of which are earlier versions of the models that participated in the present intercomparison.
The large number of models used in this study adds a significant level of complexity to the interpretation of results, due to the large diversity of inputs and configurations used by the different modeling groups. Despite the large differences between model formulations, on the global scale, several interesting similarities and patterns appear, which are frequently associated with the parameterizations and emission inventories used.

Emissions
Global mean model POA emissions used in the models are in the range of 34-144 Tg a −1. The emissions in most models lie below 80 Tg a −1 (Fig. 2), with a median value of 56 Tg a −1. Notable exceptions are the two GISS-modelE models (G and I), in which about two thirds of the POA emissions come from marine sources; without this source, these two models have the same emissions as GISS-MATRIX (39.5 Tg a −1), which falls below the 25 % quantile. CAM4-Oslo also has the highest terrestrial sources of all models (144 Tg a −1), followed by IMPACT (98 Tg a −1) and EMAC (92 Tg a −1). All models appear to have similar seasonality in POA emissions that are driven by tPOA, with increased emissions during Northern Hemisphere summer due to the enhanced contribution of Northern Hemisphere biomass burning emissions from temperate and boreal forests to the total POA fluxes. In addition, several models include SOA sources in tPOA as explained earlier, scaled by BVOC emissions, which also peak during Northern Hemisphere summer (Guenther et al., 1995); this contributes to a seasonal cycle of tPOA that is caused by the trSOA treatment as part of tPOA, and should not be interpreted as a tPOA seasonality.
Figure 1. [...] (Textor et al., 2006, 2007) results. The boxes represent the first and third quartile range (50 % of the data), the line is the median value, the star is the mean, and the error bars represent the 9/91 % of the data. Outliers are presented with x-symbols, with the corresponding color of the model, and the number of models participating in each bar's statistics is presented with a grey number at the top. The AeroCom phase I outliers are presented with black color, since there is no direct correspondence with the models that participate in the present study. Bar colors are POA (brown), SOA (green), OA (blue), AeroCom A (red; Textor et al., 2006), and AeroCom B (orange; Textor et al., 2007).
Also note that contrary to biomass burning, anthropogenic tPOA sources have no seasonality in their emission inventories. The IMPACT model appears to have the opposite seasonality, with maximum POA emissions during winter and minimum from late spring to early summer, due to the fossil fuel emissions scaling to fit observations. The minimum of the emissions for all models except IMPACT is during Northern Hemisphere spring, when neither biomass burning nor the photochemical trSOA sources (included in tPOA by many models) are high.
The POA emissions variability from phase II is roughly the same as that of the OA variability from ExpA, which indicates that the significant uncertainties in the POA emissions in global models since AeroCom phase I have not been reduced. However, some models have very high POA emissions, due to the recently developed parameterizations of mPOA sources in global models. These highly uncertain sources were absent in AeroCom phase I.

Chemical production
The chemical production of SOA is much more complex than the POA emissions. Firstly, many models include SOA sources as primary emissions, which are included in tPOA (see Sect. 1.8 and Table 1). This type of source was used during AeroCom phase I experiments. The direct consequence of this assumption is that any uncertainties resulting from the OA sources in ExpA are only related to the POA emissions, since the SOA sources were identical across models. For AeroCom phase II, 13 out of 31 models still use this source parameterization (Table 2), while 5 models use a simple SOA production rate based on gas-phase oxidation, which then forms non-volatile SOA. These 18 models have a median SOA source strength of 19.1 Tg a −1 (mean 20 Tg a −1) and a standard deviation of 4.9 Tg a −1 (Fig. 2). Very few models that include this source have provided budget information on the seasonal variability of their SOA source, since it is implicitly included in the tPOA sources and is not tracked separately. However, it has a virtually identical seasonality to that of the monoterpene emissions adopted in each model.
Among the other models that include a more complex calculation of SOA chemical production, there is a large intermodel variability in the source flux, with median 51 Tg a −1 (mean 59 Tg a −1) and 38 Tg a −1 standard deviation, based on 12 out of 14 models that include such parameterizations and have submitted budget information. This is more than twice as high as the models that use the AeroCom phase I parameterization, and with much larger model diversity. The seasonality of OA emissions in all these models peaks during Northern Hemisphere summer (Fig. 2), when VOC fluxes from temperate and boreal forests are at a maximum, while emissions from tropical forests are high year-round. Six models (IMAGES, IMPACT, GISS-CMU-VBS, HadGEM2-ES, OsloCTM2 and TM4-ECPL-F) include very strong SOA sources of 120, 119, 79, 64, 53 and 49 Tg a −1, respectively, followed by CCSM4-Chem (33 Tg a −1) and GEOS-Chem (31 Tg a −1). About 42 % (50 Tg a −1) of the SOA source in IMAGES is due to non-traditional sources (glyoxal and methylglyoxal).
Figure 2. Top row: POA emissions included in models (before POA evaporation in the case of GISS-CMU-VBS); middle row: SOA chemical production (including the pseudo-primary SOA source, where applicable); bottom row: total OA sources (sum of top and middle rows) for the annual mean (left column; short dashes: mean; long dashes: median; dotted lines: 25/75 % of the data) and seasonal variability (right column). Note that not all models have submitted annual budget data, and fewer have submitted seasonal information; thus, their corresponding columns/lines are not shown. The models are grouped based on their complexity, as separated by vertical solid lines in the annual mean budgets. Groups from left to right: SOA is directly emitted as a non-volatile tracer; SOA is chemically formed in the atmosphere, but is considered non-volatile; SOA is semi-volatile; SOA is semi-volatile and also has VBS (GISS-CMU-VBS) or multiphase chemistry sources.
The traditional SOA source in IMAGES accounts for water uptake, which is found to increase the partitioning of semi-volatile intermediates (Müller, 2009). Monoterpenes alone account for about 40 Tg a −1. This large contribution is due to the very high SOA yields (∼ 0.4) in the oxidation of monoterpenes by OH in low-NO x conditions, which are justified by the formation of low-volatility compounds like hydroxy di-hydroperoxides (Surratt et al., 2010). IMPACT has several non-traditional SOA sources from aqueous chemistry, which locally can contribute as much as 80 % of the total OA mass. CAM5-MAM3 and IMPACT also include anthropogenic precursors. CAM5-MAM3 also uses a factor of 1.5 SOA yield increase in order to reduce anthropogenic aerosol indirect forcing, by elevating the importance of SOA during the preindustrial period. As mentioned before, HadGEM2-ES does not calculate SOA production explicitly; instead, it uses the Derwent et al. (2003) climatology from STOCHEM, which calculates an SOA formation of 64 Tg a −1. For comparison, satellite-constrained studies estimate that the total OA formation (primary and secondary) can be as high as 150 Tg a −1, with 80 % uncertainty (Heald et al., 2010); AMS-constrained estimates put the total SOA formation rate between 50 and 380 Tg a −1, with 140 Tg a −1 being the best estimate (Spracklen et al., 2011), while Hallquist et al. (2009) estimated, using a top-down approach, that the best estimate for the total biogenic SOA formation is 88 TgC a −1, out of a total 150 TgC a −1 of OC.
The case of GISS-CMU-VBS deserves focus. This model calculates SOA production based on the VBS approach. Its secondary source of 79 Tg a −1 includes not only newly formed SOA both from POA and intermediate-volatility organics, but also gas-phase chemical conversion of organic mass that has evaporated from emitted POA, to produce less volatile organics, i.e., mass that has undergone aging in the atmosphere. The traditional SOA sources from biogenic VOC are included in this model like in other models that use the two-product model, but also the chemical conversion of intermediate-volatility organics to less volatile OA is taken into account, again with the use of the VBS. Overall, GISS-CMU-VBS presents a similar seasonal pattern of SOA chemical production as other models, but shifted by one month, i.e., peaking in August, when biomass burning is at its maximum in the Northern Hemisphere, instead of maximizing in July, when photochemical activity and biogenic VOC emissions are higher globally. This might be due to the inclusion of the intermediate-volatility compounds as SOA precursors, which also have large biomass burning sources. CCSM4-Chem and GEOS-Chem also have a shift in the seasonal maximum. For CCSM4-Chem this is due to strong production from biomass burning sources, while in the case of GEOS-Chem the seasonal cycle seems to be driven by production from Amazonia, which is related with both biogenic and biomass burning emissions.
The total OA sources during ExpA were very similar to the total sources from the phase II experiments (median 97 Tg a −1 both in ExpA and here), while ExpB had much lower total OA sources, 67 Tg a −1 (Fig. 1). All of these sources include SOA, either as pseudo-emissions (phase I) or from a variety of parameterizations (phase II). The models from phase II present a much higher variability in their total OA sources, which is primarily attributed to the SOA chemical production variability that was not present in ExpA.

Burden
For the models that have submitted POA burden data (also termed load; the mean total mass in the atmosphere), both its seasonality and amplitude largely follow those of the corresponding POA emissions (Fig. 3), with two notable differences. The two GISS-modelE models have much lower POA burdens (but similar seasonality) than their emissions would imply. The reason is that the mPOA fraction of POA has a very short lifetime of ∼ 1.5 days, since mPOA is assumed to be internally mixed with fine-mode sea salt, which is removed efficiently due to wet scavenging. This keeps the overall load of POA fairly low, and comparable with the models that do not have mPOA. The other difference is GISS-CMU-VBS, which also has a much lower POA load than its emissions would suggest. This is due to the POA aging parameterization, which converts POA into SOA, drastically reducing the POA burden. The other models appear to have the expected POA load, given their emissions, including IMPACT, whose different seasonal variability of the emissions is also reflected in its OA load.
For the computed SOA load (Fig. 3), all models assume that SOA is very soluble, with 80-100 % of its total mass considered soluble, which results in similar globally averaged removal rates across the models. This means that the differences in the SOA loads are expected to be driven primarily by the SOA chemical production, similar to how the POA load is driven by emissions. This is indeed the case for almost all models, with GISS-CMU-VBS, IMAGES, IMPACT, CCSM4-Chem and CAM5-MAM3 having the highest loads, exceeding 1 Tg, with the first two models being as high as 2.3 and 2.2 Tg, respectively, and GEOS-Chem being just below 1 Tg. Spracklen et al. (2011) estimated a global SOA burden of 1.84 Tg, similar to the high-end models that participate in the current intercomparison, but for a SOA formation rate of 140 Tg a −1, which is about 20 % higher than IMPACT and IMAGES (the models with the strongest SOA formation here), and about 3 times higher than the median SOA formation rate of the models that have a complex SOA parameterization. ECHAM5-HAM2 calculates an increasing load over the course of 1 year, which is related to the short spin-up time of 3 months, which is not sufficient for the upper tropospheric SOA to reach equilibrium. GEOS-Chem simulates an inverse seasonality when compared with other models, with the maximum load calculated during Northern Hemisphere winter and the minimum during Northern Hemisphere summer. The cycle seems to be dominated by the SOA load over the Southern Ocean; probably the removal processes are slower than other models there, thus SOA may form a uniform band between 30 and 50° S during the whole austral summer.
With regard to the total OA load, a median of 1.4 Tg (mean 1.6 Tg) and standard deviation of 0.8 Tg is calculated; half the models lie within the range of 1-1.6 Tg (Fig. 3). CAM4-Oslo calculates a global burden of 3.8 Tg, reflecting the very high POA emissions, while IMAGES, IMPACT, GISS-CMU-VBS and CCSM4-Chem calculate a burden of 3.7, 2.6, 2.4 and 2 Tg, respectively, as a result of their high SOA production. Overall, the models calculate very similar total OA load seasonality, which peaks during the Northern Hemisphere summer season, when both primary (biomass burning) and secondary (chemical production) OA sources are high, and minimizes during Northern Hemisphere spring, when neither biomass burning nor SOA chemical production is significant in the Northern Hemisphere. The tropical biomass burning and SOA production around December and January both contribute to the secondary maximum that all models calculate during that time. The relative importance of SOA over POA will be discussed in Sect. 4.3.3. The total OA load is calculated to be mostly lower than the sulfate load in the models that reported budget values for both aerosol components, with a median value of the OA / SO₄²⁻ mass load ratio of 0.77 (mean 0.95). The ratio lies in the range 0.26-2.0; CAM4-Oslo, CAM5-MAM3, GEOS-Chem, GISS-modelE-G/I, IMAGES, IMPACT, and TM4-ECPL-F/FNP calculate values above 1, which means that annually on the global scale OA dominates over sulfate aerosols. That was the case for 5 out of 16 models during AeroCom phase I. Note however that AeroCom phase I models were simulating the year 2000, while here we simulate the year 2006; interactive chemistry, new sources (isoprene, mPOA and ntrSOA) and different emission inventories also contribute to significant differences between the two studies. It should be kept in mind that even in AeroCom phase II, many models used some emission inventories from a year other than 2006 (Tables 4 and 5).
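Burden and source strength are linked through the atmospheric lifetime, which at steady state is the burden divided by the total removal rate. The sketch below applies this to the median burden (1.4 Tg) and the median total OA source (97 Tg a −1) quoted in the text, assuming sinks balance sources over the year; the function name is hypothetical. The ratio of two medians is only a rough check and need not equal the median of the per-model lifetimes (5.4 days in the abstract).

```python
DAYS_PER_YEAR = 365.0

def lifetime_days(burden_tg, sink_tg_per_year):
    """Steady-state lifetime in days: burden divided by the removal rate."""
    return burden_tg / sink_tg_per_year * DAYS_PER_YEAR

# Median burden and median total OA source from the text;
# removal is ASSUMED to balance the sources over the year.
tau = lifetime_days(1.4, 97.0)
print(f"implied OA lifetime: {tau:.1f} days")
```

The implied value of roughly 5 days is close to the reported median lifetime, which indicates that the median budget terms are mutually consistent.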

Deposition
Dry deposition is a minor removal pathway for OA, accounting for a median of 13 Tg a −1 (range 2-36 Tg a −1 ) and a mean of 15 Tg a −1 (standard deviation of 10 Tg a −1 ; Fig. 4). On average, dry deposition is responsible for 15 % of the total OA removal across models. The two TOMAS models and TM5 calculate by far the lowest dry deposition flux of all, followed by three of the ECHAM5 models, excluding EMAC. The two TOMAS models use essentially the same aerosol microphysics parameterization in two different host models, GISS-II' for GISS-CMU-TOMAS and GISS-E2 for GISS-TOMAS. GISS-modelE-G/I and GISS-MATRIX use the same host model and identical emissions as GISS-TOMAS, a fact that suggests the TOMAS aerosol module (Adams and Seinfeld, 2002) either is less efficient in scavenging OA via dry deposition, or is more efficient in removing OA from the system via wet deposition, or both. The latter, though, would mean that the OA load (Fig. 4) would be much smaller in GISS-TOMAS in order to have low enough dry deposition fluxes, which does not appear to be the case.
Other than the two TOMAS models, of the remaining models that have submitted dry deposition flux data, three models calculate very low fluxes: ECHAM5-HAM2, ECHAM5-HAMMOZ, and TM5, with the latter already mentioned earlier. The first two models use ECHAM5 as the host model, and all three use the M7 aerosol microphysics module (Vignati et al., 2004). As for the TOMAS case, this is strong evidence that the M7 module does not allow OA to deposit as fast as in most other models; ECHAM5-SALSA, which uses the same host model as ECHAM5-HAM2 and ECHAM5-HAMMOZ, calculates higher dry deposition fluxes than the two ECHAM5 models with M7. The largest difference in dry deposition between the two aerosol microphysics schemes comes from the treatment of external mixing of OA in the accumulation sized particles. ECHAM5-SALSA includes soluble and insoluble OA in the accumulation mode, while ECHAM5-HAMMOZ and ECHAM5-HAM2 include only soluble OA. In addition, EMAC, which uses a sectional version of M7 called GMXe, does not calculate as low a dry deposition as the models that use the modal version of M7. The fact that there are other models with aerosol microphysics parameterizations in this intercomparison, both modal and sectional, that do not calculate such low dry deposition fluxes, suggests that it is not a general aerosol microphysics calculation issue.
Comparisons of phase I model results for ExpA and ExpB strengthen this conclusion, since the model with the lowest OA dry deposition flux of ExpA (MPI_HAM; 5 Tg a−1) and that of ExpB (TM5; 1.7 Tg a−1) both use the M7 aerosol microphysics module. This scheme appears to be responsible for the lowest dry deposition fluxes calculated by the models that participate in the present intercomparison: the updated versions of these two phase I models, ECHAM5-HAM2, ECHAM5-HAMMOZ and TM5, participate in the phase II experiment and simulated the lowest dry deposition fluxes among all phase II models, together with the GISS-CMU-TOMAS and GISS-TOMAS models that did not participate in phase I. Whether the above explanation suffices to explain the low dry deposition, or whether other processes are involved as well (such as very strong wet removal that leaves no time for dry deposition to become effective, the calculated aerosol size distribution, the aerosol properties that impact dry deposition rates, or something else), remains to be explored by dedicated deposition flux model-data comparisons. Also note that we have not assessed this feature of the models against observations, so we do not know which models are closer to observations. CAM4-Oslo has the highest dry deposition flux of 36 Tg a−1, which is due to the high OA load. BCC follows with 33 Tg a−1, which is then followed by the two GISS-modelE models and IMAGES with ∼ 28 Tg a−1. In the case of the two GISS-modelE models, this is due to the strong removal of mPOA, which is internally mixed with sea salt (as explained earlier), while for IMAGES, it is due to the high OA load, as a result of strong trSOA formation. BCC uses a smaller mass mean diameter than the size distribution of POA emissions, which can explain the high dry deposition flux. Despite these large differences between models, the calculated dry deposition fluxes follow the same seasonal pattern as the aerosol load presented earlier (Sect. 4.1.3 and Fig. 4).
The effective dry deposition rate coefficient, defined as the ratio of the dry deposition flux over the aerosol burden that is being deposited, ranges from 0.005 to 0.13 days−1, with a median value of 0.025 days−1, a mean value of 0.029 days−1 and a standard deviation of 0.046 days−1. The diversity (defined as the standard deviation over the mean) has increased since AeroCom phase I, from 0.62 to 0.87. BCC has the largest effective dry deposition rate coefficient, 0.13 days−1, more than double that of any other model. The models with very low dry deposition fluxes are the ones that have the lowest effective dry deposition rate coefficients, all below 0.014 days−1, supporting the hypothesis that their dry deposition flux is probably too low.
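The two definitions used in this paragraph (flux over burden, and standard deviation over mean) can be sketched as follows. This is a minimal illustration, not the code used for the intercomparison: the function names, the 365-day year, the use of the sample standard deviation, and all input numbers are assumptions chosen for the example.

```python
from statistics import mean, stdev

DAYS_PER_YEAR = 365.0  # assumption: annual fluxes converted with a 365-day year


def effective_rate_coefficient(flux_tg_per_year, burden_tg):
    """Deposition flux divided by the burden being deposited, in days^-1."""
    if burden_tg <= 0:
        raise ValueError("burden must be positive")
    return flux_tg_per_year / burden_tg / DAYS_PER_YEAR


def diversity(values):
    """Standard deviation over the mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


# Hypothetical model: a 13 Tg a-1 dry deposition flux acting on a 1.4 Tg burden.
k_dry = effective_rate_coefficient(13.0, 1.4)

# Hypothetical per-model coefficients (days^-1), for illustration only.
example_coefficients = [0.01, 0.02, 0.03]
example_diversity = diversity(example_coefficients)
```

The same helper applies unchanged to the wet deposition flux, which gives the effective wet deposition rate coefficient.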
By far the most important removal mechanism across all models is wet deposition (Fig. 4). Due to similar OA solubility assumptions across all models, the wet deposition flux largely follows the OA load, both in the annual budget and the seasonality. IMPACT has the highest wet deposition flux of all models (209 Tg a−1), followed by IMAGES (163 Tg a−1), CAM4-Oslo (146 Tg a−1), CAM5-MAM3 (134 Tg a−1), OsloCTM2 (128 Tg a−1) and GISS-modelE-G/I (120/125 Tg a−1, respectively). These are the models with the highest OA sources (Fig. 2), thus also with the highest sinks. Wet removal of OA is simulated to range from 28 to 209 Tg a−1 for the 26 models that reported fluxes, with a mean of 86 Tg a−1, a median of 70 Tg a−1 and a standard deviation of 43 Tg a−1; on average, it accounts for 85 % of the total OA deposition.
The effective wet deposition rate coefficient ranges from 0.09 to 0.24 days −1 , with a median value of 0.15 days −1 , a mean value of 0.16 days −1 and a standard deviation of 0.04 days −1 . The diversity since AeroCom phase I has virtually not changed, with a slight increase from 0.27 to 0.28. OsloCTM2 has the highest effective wet deposition rate coefficient, and LMDz-INCA the lowest.
Wet removal, which together with aerosol sources is a major driver of the calculated aerosol lifetime and load, presents a much higher variability in the phase II models (Fig. 1). This is largely due to the consideration of SOA formation, which is responsible for the large variability in OA sources and burden in the models, as well as to differences in the assumptions on SOA solubility and aging.

Lifetime
The combination of all sources and sinks affects the load and lifetime of OA, either directly or indirectly. The lifetime of a species is calculated as the ratio of the species burden over its total removal; in the case of aerosols, the removal is dry and wet deposition. Unfortunately, while most model groups have submitted total OA diagnostics to calculate the OA lifetime, few have submitted the diagnostics required to calculate the global mean POA and SOA lifetimes.
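As a minimal sketch of the lifetime definition above (burden over total removal, with removal being the sum of dry and wet deposition), the calculation could look like this. The function name, the 365-day year and the input numbers are illustrative assumptions, not values from any participating model.

```python
DAYS_PER_YEAR = 365.0  # assumption: annual fluxes converted with a 365-day year


def lifetime_days(burden_tg, dry_dep_tg_per_year, wet_dep_tg_per_year):
    """Global mean lifetime: burden divided by total (dry + wet) removal."""
    total_removal = dry_dep_tg_per_year + wet_dep_tg_per_year
    if total_removal <= 0:
        raise ValueError("total removal must be positive")
    return burden_tg / total_removal * DAYS_PER_YEAR


# Hypothetical budget: 1.5 Tg burden, 15 Tg a-1 dry and 85 Tg a-1 wet deposition.
example_lifetime = lifetime_days(1.5, 15.0, 85.0)
```

Computing separate POA and SOA lifetimes requires the burden and both deposition fluxes for each component, which is why fewer models can be included in those estimates.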
The calculated median POA lifetime from the 13 models that reported relevant data is 4.8 days (mean 4.8 ± 1.4 days). The modeled lifetime ranges from 2.7 days for the two GISS-modelE models to 7.6 days for IMAGES (Fig. 5). The GISS-modelE models have the lowest lifetime, which is consistent with roughly two-thirds of POA being removed rapidly with sea salt (as mPOA). There is no clear seasonal signal on the calculated POA lifetime.
The SOA lifetime calculated by 12 out of 31 models also lacks a clear seasonal signal (Fig. 5). The GISS-modelE-G/I models, CCSM4-Chem, ECHAM5-HAM2 and GISS-CMU-VBS have the highest SOA lifetime of 15/14, 14, 13 and 10 days, respectively, which is related to large amounts of SOA in the upper troposphere, where there is virtually no removal mechanism and therefore SOA lifetime is enhanced, until atmospheric circulation or sedimentation brings it to lower layers where it becomes susceptible to removal. For the remaining models that provide information, the calculated SOA lifetime ranges from 2.4 to 6.8 days. The median SOA lifetime from all models that provide budget information is calculated to be 6.1 days (range 2.4-14.8 days), higher than the median POA lifetime. Anthropogenic POA, which in general is more hydrophobic than SOA, is almost exclusively emitted close to the surface and below clouds, making it more susceptible to dry and wet removal; biomass burning POA can be emitted at higher altitudes, while a significant amount of SOA is formed above clouds in the models, where temperatures are low. For instance, in TM4-ECPL-FNP, about 42 % of the total SOA mass is formed in the free troposphere, while 98 % of POA mass is emitted in the boundary layer. Furthermore, although one might expect that SOA is more soluble, thus more susceptible to removal, this does not appear to be reflected in the model results; the reason is that SOA can be formed above clouds and avoid removal for long periods of time.
Twenty-four models provide sufficient information to calculate the total OA lifetime, which lies in the range of 3.8-9.6 days, with a median of 5.4 days and a mean of 5.7 ± 1.6 days (Fig. 5). GISS-CMU-TOMAS has a very strong seasonality in the calculated OA lifetime, with a maximum during late Northern Hemisphere spring and a minimum during late Northern Hemisphere fall, and GISS-CMU-VBS has the highest OA lifetime of all the models. As in the case of POA and SOA, there is no clear seasonality in the OA lifetime across models.
The high wet removal variability across all AeroCom phase II models is also reflected in the total OA load and lifetime (Fig. 1), where SOA presents a very high variability between models, especially in the case of SOA lifetime. This slightly increases the calculated variability of the total OA by the phase II models compared to phase I. This change is not so pronounced in the OA burden, due to the relatively low contribution of SOA to the OA load calculated by the models. This might change in the future, though, since SOA is believed to be significantly underestimated in global models (Spracklen et al., 2011), as also supported by observations that indicate large amounts of processed OA in the atmosphere.

Optical depth
The aerosol-cloud interactions that comprise the indirect effect have been studied with many of the models used here (e.g., Quaas et al., 2009), and the direct effect has been studied previously, both during AeroCom phase I (Schulz et al., 2006) and phase II (Samset et al., 2013). The impact of the direct and indirect effects of organic aerosols on climate is beyond the scope of the present study. Still, for completeness, we performed a comparison of the OA optical depth at 550 nm (Fig. 6). It has to be noted that this is not always straightforward, or even possible: models that include aerosol microphysics or internally mixed aerosols cannot always separate the aerosol optical depth (AOD) of the organic component of the aerosol alone, and subtracting simulations with and without OA does not give the right answer, due to non-linearities in the aerosol microphysics calculations. Such a distinction is prohibited by the multi-component aerosol mixtures and water uptake that are taken into account, as well as the non-linear response of the aerosol-radiation interactions caused by such mixtures (e.g., Bond and Bergstrom, 2006). The models that use M7 microphysics (ECHAM5-HAM2, ECHAM5-HAMMOZ and TM5) and thus consider internally mixed aerosols for diagnostic purposes calculate an OA AOD assuming external mixing in each aerosol mode, although this is not very accurate for estimating the OA contribution to the total AOD; their results are presented in Fig. 6, but should be interpreted with caution. For models that can calculate the organic AOD and have submitted results for both quantities, the organic AOD presents very similar behavior to the OA load, since it is a strong function of the OA column burden, given that most models use very similar optical properties for OA and water uptake parameterizations.
Excluding CAM4-Oslo, which calculates a global mean organic AOD of 0.06 due to the computed very high OA load, the other models have organic AOD spanning almost an order of magnitude, from 0.004 to 0.023, with a median value of 0.014. This is 8 % of the total AOD calculated by the same models.

Surface distribution
The composite annual mean OC and OA surface air concentrations, defined as the median of the model fields regridded to a 5° × 5° horizontal resolution, exceed 0.5 µg C m−3 (or µg m−3) across most continental regions, with maximum concentrations primarily over biomass burning regions and secondarily over industrialized areas (Fig. 7). The model diversity, defined as the ratio of the standard deviation of all models over their corresponding mean value calculated on the same grid, is smallest over and downwind of continental regions, with ratios below 1 over most continental areas, and above 1 over the remote oceans (Fig. 7).
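The composite and diversity fields described above can be sketched per grid cell as follows, assuming every model field has already been regridded to the same grid. The function name, the nested-list representation and the use of the sample standard deviation are assumptions made for this illustration; the regridding step itself is not shown.

```python
from math import nan
from statistics import mean, median, stdev


def composite_and_diversity(fields):
    """Per-cell median (composite) and std/mean (diversity) across models.

    `fields` is a list of 2-D nested lists, one per model, all on the same
    grid. At least two models are needed for the standard deviation.
    """
    n_rows, n_cols = len(fields[0]), len(fields[0][0])
    composite = [[0.0] * n_cols for _ in range(n_rows)]
    diversity = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            values = [field[i][j] for field in fields]
            composite[i][j] = median(values)
            cell_mean = mean(values)
            # Diversity is undefined where the multi-model mean is zero.
            diversity[i][j] = stdev(values) / cell_mean if cell_mean > 0 else nan
    return composite, diversity


# Three hypothetical models on a 1 x 2 grid (concentrations in ug m-3).
example_fields = [[[1.0, 2.0]], [[2.0, 2.0]], [[3.0, 8.0]]]
example_composite, example_diversity = composite_and_diversity(example_fields)
```

In the second cell of this example a single high model raises the diversity without moving the median, which mirrors how a source present in only a few models inflates diversity while leaving the composite field largely unaffected.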
Diversity that exceeds 2 is evident over most of the oceanic regions south of 30° S and Antarctica, which is a result of the marine OA sources being present in only a few models. Ratios approaching 2 are also found over the northern Pacific and Atlantic oceans, and are also related to the marine OA sources. However, these local maxima are not as pronounced as in the Southern Hemisphere, due to (a) the much stronger seasonality, and (b) the stronger influence of continental aerosol sources in the Northern Hemisphere.
Over and close to the continents, the model diversity is low, except in three areas that present striking differences. Two are located over biomass burning regions, Indonesia and the Pacific borders of the USA and Canada, where the different emissions used by the models produce a large local diversity in concentrations. The third case is off the Pacific coast of Mexico; although this might also be related to biomass burning, the exact reason for the high model diversity is not clear, since this is not over an aerosol source area. Marine sources or different precipitation patterns in the models can also be part of the explanation; however, there are very few measurements (Shank et al., 2012) over that region, which hinders a definite conclusion.
Overall, it appears that the model diversity is low over and downwind of continental source regions, since the primary sources of aerosols are constrained by the availability of only a few different emissions inventories to be selected by the models. In addition, less constrained parameters like SOA and mPOA formation, long-range transport and OA removal (which affects OA lifetime) increase the model diversity over remote areas.

Vertical distribution
The vertical distribution of the mean OC simulated by all except three models (GOCART has only submitted surface data, and GISS-CMU-TOMAS and GISS-CMU-VBS have not submitted all necessary fields for unit conversions) shows concentrations increasing with height up to a mean pressure level of about 800-900 hPa, and then decreasing with altitude (Fig. 8). The increase in concentration is due to (a) a maximum OC concentration over the tropics, where strong convection raises OC from the surface sources to the lower troposphere, (b) the SOA formation that largely takes place above the surface, (c) the biomass burning emissions that some models distribute to more layers than just the surface one, and (d) the absence of dry deposition above the surface (Fig. 9). A local maximum also exists at low altitudes over the industrialized northern mid-latitudes, although less pronounced than the tropical one. From the middle to the upper troposphere, the OC concentrations simulated by most models decline steeply with altitude. Some models show a secondary maximum at around 100-200 hPa, with concentrations much lower than the maximum near the surface, above which the concentrations decline even faster with height: CCSM4-Chem, ECHAM5-HAM2, ECHAM5-HAMMOZ, GISS-modelE-G/I, IMAGES, LMDz-INCA, OsloCTM2 and SPRINTARS present a local minimum in concentrations around 400 hPa, which then increase, before dropping again above 100 hPa. The increase around the tropopause is due to the low temperatures that allow condensation of the semi-volatile SOA precursors that had not condensed at lower layers, or OA accumulation above clouds, where wet deposition does not happen, or both. The models that explicitly calculate SOA seem to have slower removal of SOA from these altitudes than the other models. In addition, uplift of OA (both primary and secondary) in strong convective regions can also explain this local maximum, due to transport of aerosols to layers of the atmosphere with very slow removal.
The modeled vertical distribution of OA presents a diversity that spans over one order of magnitude.
The model diversity is relatively low in the lower troposphere (below 600 hPa) between 30° S and 60° N, but very high over the poles and near the tropopause (Fig. 9). A similar pattern was found for BC, sulfate aerosol and particles larger than 100 µm in dry diameter in another AeroCom phase II intercomparison study that focused on aerosol microphysics (Mann et al., 2014). This points out three important features: (a) the areas directly affected by strong primary and secondary sources around the tropics and northern midlatitudes do not present a large diversity, due to the fairly similar emission inventories in the different models; (b) the primary marine sources of OA however are both highly uncertain and not present in many models, resulting in the high model diversity close to the surface over the Southern Ocean; and (c) the processes that involve low temperatures (which favor condensation of semi-volatile compounds) are not well constrained either, and they are also absent in many models, leading to very high model diversity over the poles and above 200 hPa. The vertical distribution of OA is thus very poorly understood, much less than its surface concentration, and deserves a dedicated study with thorough analysis.

Comparison with measurements
Many model-measurement comparisons can be performed with the extensive data set used here. The focus of the comparisons in the present study is to identify model strengths and weaknesses, and try to explain where and why the models are failing to simulate the measured concentrations. This will provide insight to directions for future model improvements. In parallel, we are also interested in understanding where and why the models successfully reproduce the observations, and focus on these areas in order to understand the role of the different model complexities on simulations with comparable skill. It is not within the scope of this work to identify which model is the "winner" in simulating OA concentrations, especially since one model is unlikely to outperform the others on all metrics, but to provide information on the robustness of the model results. The present study focuses on the surface OC and OA concentrations. The sources and amount of OA in the upper layers of the atmosphere are not explicitly studied here, although accounted for in the OA budget terms discussed above. The detailed analysis of the vertical distribution of OA will be the topic of a future study. Due to the very inhomogeneous spatial variability of measurements (supplementary material), only a general global model performance benchmark is performed here. Most data have been collected in the USA, followed by Europe and China. The rest of the world, including some very important regions with regard to OA, are severely under-represented, or not represented at all. Such regions include all tropical forest areas (the Amazon basin, Africa and Southeast Asia) and the boreal forests of Canada and Russia. Long-term measurements in these areas are extremely scarce, with the only notable exception being Alta Floresta in the Amazon, where OC measurements for more than ten years are available.

Model skill
One of the major challenges when comparing global models with observations is whether the measurement locations are representative of the regional levels of the measured quantity in question. For most urban measurements, this is not the case, since the aerosol concentrations at urban centers are usually much higher than the regional background concentrations. Even a model with a very high horizontal resolution for a global model (like SPRINTARS) is not expected to capture the measurements at urban locations, since its grid cells are of the order of 100 × 100 km, which is still too coarse to accurately resolve urban pollution. Many of the "urban downwind" AMS data are also expected to fall into this category; thus we included them in the "urban" category.
For all stations, there are several instances where more than one measurement location is present in a given grid box for a certain model. When this is the case, we use the arithmetic mean of the measurements for that specific grid box, in order to compare the single aerosol concentration the model is providing with a single measurement value. When discussing the model ensemble results we use the median of all models, while we also analyze the mean normalized bias (MNB) of the models against measurements. The perfect comparison should have a MNB = 0 and correlation r = 1. The normalized bias (NB) at a given grid box is calculated as follows:

$$\mathrm{NB}_i = \frac{C_{\mathrm{model},i} - C_{\mathrm{meas},i}}{C_{\mathrm{meas},i}}$$

where $C_{\mathrm{model},i}$ is the modeled concentration in grid box i, and $C_{\mathrm{meas},i}$ is the measured concentration in the same grid box. If more than one station exists in the same grid box, $C_{\mathrm{meas},i}$ is the arithmetic mean of the individual stations. The model's MNB is derived as the arithmetic mean of all $\mathrm{NB}_i$ values.
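The evaluation procedure described here (averaging co-located stations per grid box, then computing the MNB and the correlation) can be sketched as below. The sketch assumes the normalized bias takes its usual form, (model − measurement) / measurement, and that r is the Pearson correlation coefficient; the function names, grid-box labels and concentrations are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def average_by_grid_box(station_values):
    """Average all stations that fall in the same model grid box.

    `station_values` is an iterable of (grid_box_id, measured_concentration).
    """
    boxes = defaultdict(list)
    for box, value in station_values:
        boxes[box].append(value)
    return {box: mean(values) for box, values in boxes.items()}


def mean_normalized_bias(model, measured):
    """Arithmetic mean of (model - measured) / measured over all grid boxes."""
    return mean((model[b] - measured[b]) / measured[b] for b in measured)


def pearson_r(model, measured):
    """Pearson correlation between modeled and measured grid-box values."""
    boxes = list(measured)
    x = [model[b] for b in boxes]
    y = [measured[b] for b in boxes]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical stations: two of them share grid box "A".
stations = [("A", 2.0), ("A", 4.0), ("B", 1.0), ("C", 2.0)]
measured = average_by_grid_box(stations)
# Hypothetical model that is exactly a factor of 2 too low everywhere.
model = {"A": 1.5, "B": 0.5, "C": 1.0}
example_mnb = mean_normalized_bias(model, measured)
example_r = pearson_r(model, measured)
```

In this example the model is biased low by half in every grid box yet correlates perfectly with the measurements, which illustrates why MNB and r are reported together: a systematic underestimate does not by itself degrade the correlation.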

Urban locations
The models perform poorly at urban locations, as expected. Most models strongly underestimate the measurements, having a median MNB of −0.64 (mean −0.62, range −0.04 to −0.86) for OC (Fig. 10) and −0.51 (mean −0.48, range −0.1 to −0.85) for OA (Fig. 11). CAM5-MAM3 appears in both OC and OA as an outlier, with a slightly negative MNB for OC and +0.24 for OA. As mentioned earlier, CAM5-MAM3 has an enhancement factor of 1.5 for the SOA formation, which might be part of the reason for the generally higher OA concentrations, which result in less bias, compared to the other models. Interestingly, the correlation of model results with measurements is slightly higher for the OA data; a median value of 0.54 is calculated for OA (mean 0.52, range 0.11 to 0.77), compared to 0.47 for OC (mean 0.43, range −0.09 to 0.70). Note though that the locations and temporal resolution of OC and OA measurements differ greatly, making a conclusive comparison between them impossible. In addition, these results are not representative of the overall performance of the models on the global scale; they only represent the models' ability to capture the available measurements, which are very inhomogeneously distributed in space and time (Supplement).

Remote locations
The models show a completely different behavior when compared with measurements of OC (Fig. 12) and OA (Fig. 13) at remote locations. In the case of OC at remote locations, more models have a negative than a positive MNB, with the range spanning from −0.61 to 1.29 (median −0.15, mean −0.02), while most models have a positive MNB in the case of OA, with a range from −0.38 to 2.17 (median 0.51, mean 0.70). It has to be noted, though, that the locations and times of OC and OA measurements are not the same, which means the model performance for OC and OA data are not directly comparable, due to the different spatial and temporal coverage of the stations. Only four models present relatively high positive MNB values when compared with the OC data: CAM5-MAM3 (1.3), EMAC (0.9), ECHAM5-SALSA (0.7) and ECMWF-GEMS (0.6). CAM5-MAM3 has the third highest SOA source of all models, but none of the other three models with strong positive MNB has exceptionally high POA or SOA sources (Fig. 2) and sinks (Fig. 4). All of EMAC, ECHAM5-SALSA and ECMWF-GEMS present a very strong maximum in the OC concentrations at the western border of the USA with Canada; monthly mean concentrations exceeding 200 µgC m−3 in EMAC (Fig. S3 in the Supplement) might be the reason for the positive MNB. Also note that EMAC emits all biomass burning aerosols at the surface, while most other models distribute them to a number of layers above the surface, typically within the boundary layer. The other models that present a positive MNB are all linked with either strong POA sources (CAM4-Oslo) or strong SOA sources (HadGEM2-ES and IMPACT), as presented in Fig. 2, but that is not the case for IMAGES, which has the highest SOA source, but presents a MNB of only +0.1, and TM4-ECPL-FNP, which has the 7th strongest SOA source from the models that submitted their SOA chemical production, but presents the second strongest negative MNB of all the models.
Many models have a lower correlation with remote OC and OA measurements than with urban OC and OA. Although this might appear unexpected, a possible explanation might be that urban pollution probably adds a large offset in the comparison, which does not affect correlation. In remote sites on the other hand, long-range transport adds one additional level of uncertainty in the model calculations, which can result in lower correlation of the model results with measurements. The correlation coefficient against OC remote measurements rarely exceeds 0.5, with the correlation for about half of the models lying below 0.4 (median 0.39, mean 0.40, range 0.11-0.67), while when compared against the remote OA measurements the correlations are slightly lower, with a median and mean value of 0.37 (range 0.07-0.55). It is possible that either a remote source is missing or treated in a too simplistic way, or that the transport and lifetime (which largely depend on solubility, representation of precipitation from clouds, and poorly represented ageing processes) of organic aerosols in the regional and remote atmosphere are not properly calculated in models, or that the seasonality of sources is not represented accurately, or a combination of any of these reasons. High (negative) MNB and high correlation (−0.61 and 0.47, respectively for OC) for the urban stations support the missing sources hypothesis. Low (negative) MNB and low correlation (−0.15 and 0.4, respectively for OC) for remote stations support the conclusion that the knowledge about the processes, on top of the sources, contributes to the OA modeling uncertainty at remote stations.

Marine locations
Since there are only two AMS OA marine stations categorized as remote in the global AMS database, only the OC model results have been compared against the marine OC measurements (Fig. 14). Very few models include a marine organic aerosol source: CAM4-Oslo, the two GISS-modelE models, IMPACT and the two TM4-ECPL models. With or without the primary marine source, rather poor statistics are calculated for most of the models. Most models have a negative MNB (median −0.30, mean −0.15, range −0.64 to +0.90), with a few exceptions: the two GISS-modelE models, with MNB ∼ 0.85-0.90, have a strong mPOA source, the strongest of all models that participate in this intercomparison; HadGEM2-ES, whose strong SOA source that is based on a climatology might be the reason for the high MNB; IMPACT and IMAGES, which have a simplified multiphase chemistry source that might be responsible for the increased remote marine OA; and EMAC, which is among the models with the highest POA sources (Fig. 2).
The GISS-modelE models appear to have worse correlation with measurements than other models. The reason might be the variability of the source of marine organics that may not be captured by the models: both GISS-modelE models that present the lowest correlation with marine OC measurements calculate the marine OC sources as a function of chlorophyll; this might not be the optimal parameterization of the marine POA source. The IMPACT and TM4-ECPL models, which include similar mPOA sources, do not produce such low correlations. These models include aqueous production of OA, which acts as an additional source in the remote atmosphere. IMAGES, which also has an aqueous OA source, produces a rather high correlation with the marine OC measurements and a positive MNB. Although more marine observations are needed to verify this hypothesis, it appears that a multiphase source does improve the model comparison with remote marine measurements, as also discussed by Myriokefalitakis et al. (2011). One cannot rule out, though, that an increase in SOA sources via gas-phase production is the missing source in these locations, which might also be able to improve the correlation there. Note also that IMAGES and IMPACT have a different source parameterization compared with that in TM4-ECPL-F/FNP, which results in a stronger aqueous OA source that degrades correlation, but not MNB, compared to the same model-measurements comparison when excluding the multiphase aerosol contribution (not shown). In TM4-ECPL-F/FNP, the multiphase OA source is weaker (13-29 Tg a−1) than in the other two models, and no statistically significant improvement is seen in the model's performance at the surface when accounting for this source. Additional models able to simulate aqueous-phase OA formation and comparison with targeted observations are needed to consolidate the importance of this process for the OA budget.
The primary marine source also improves the comparison over the oceans (Fig. 23), but further work is needed to constrain this source. Overall, the median and mean correlations are very close (0.25 and 0.24, respectively), and the correlation range is from −0.03 to +0.41.

Importance of model complexity
In the comparisons of model results with urban station data, the correlations with OA observations were higher than those with OC. Urban aerosols are mostly fresh, compared to the more aged ones at remote locations. All models simulate OA, and then the OA / OC ratio is used to convert from OA to OC, in order to compare with OC data. Emission inventories however are frequently in units of carbon, not organic matter, adding an additional conversion, thus uncertainty, in the models. Using the same OA / OC ratio to convert emissions and then the simulated concentrations implies that the OA / OC ratio has not changed with atmospheric processing. This is clearly a weak assumption, since OA / OC is different at emission time and after atmospheric processing. Since all models have some aging parameterization in their calculations, this strongly suggests that the OA / OC ratio in models has to be revisited. As a general rule, models are expected to underestimate OA / OC, since several of them use a constant value of 1.4 throughout the entire troposphere. Three models (CAM4-Oslo, OsloCTM2 and SPRINTARS) use OA / OC ratio of 2.6 for biomass burning aerosol, a value that came from measurements (Formenti et al., 2003), which is above the high-end value recently suggested in the literature for ambient aerosol (2.5; Aiken et al., 2008). Four models account for temporally and spatially variable OA / OC ratios dependent on the OA speciation in the atmosphere, but their results are completely different (Fig. 15). Measurements of OA and OC at the same location have a different seasonality, as presented later (Sect. 4.3.3) for Finokalia, Greece, which is not evident in the model results. This shows that the OA / OC ratio changes with atmospheric processing, and as applied in the model simulations (in most cases by a spatially and temporally fixed ratio), is not appropriate. A dedicated study aiming to tackle the OA / OC ratio is clearly needed.
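The effect of the OA / OC assumption discussed above can be illustrated with a small sketch that converts simulated OA to OC either with one fixed ratio or with a ratio per species. The constant 1.4 is the value mentioned in the text; the species names, their individual ratios and the concentrations are hypothetical and chosen only to make the arithmetic transparent.

```python
def oc_from_oa(oa_ug_m3, oa_to_oc_ratio=1.4):
    """Convert an OA mass concentration to OC using a single fixed ratio."""
    return oa_ug_m3 / oa_to_oc_ratio


def speciated_oc(oa_by_species, ratio_by_species):
    """Convert each OA species with its own OA / OC ratio and sum the OC."""
    return sum(oa / ratio_by_species[name] for name, oa in oa_by_species.items())


# Hypothetical speciation (ug m-3) and hypothetical per-species OA / OC ratios:
# fresh primary OA with a low ratio, aged secondary OA with a higher one.
oa = {"fresh_poa": 1.4, "aged_soa": 2.1}
ratios = {"fresh_poa": 1.4, "aged_soa": 2.1}

total_oa = sum(oa.values())
oc_fixed = oc_from_oa(total_oa)            # single ratio of 1.4 for everything
oc_speciated = speciated_oc(oa, ratios)    # ratio depends on the OA speciation
effective_ratio = total_oa / oc_speciated  # implied bulk OA / OC ratio
```

With these illustrative numbers the fixed ratio yields more OC than the speciated conversion for the same OA mass, so the choice of ratio alone shifts the apparent model bias against OC measurements.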
Overall, the increased model complexity does not improve the comparison with measurements. The MNB of the urban OA comparison appears to be lower in the models that take into account the semi-volatile nature of SOA, but the correlation degrades to values as low as 0.1. The correlation of model results with remote OC data is higher for models that include semi-volatile SOA, but the difference is really small. In all other cases, no change in model skill is observed. However, the complexity is needed in models in order to distinguish between anthropogenic and natural OA and accurately calculate the OA physical, chemical and optical properties, and their impact on climate.
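The two skill metrics used in this discussion can be sketched as follows. This assumes the common definition of the mean normalized bias (MNB) as the mean of (model − observation) / observation over paired values, together with the Pearson correlation coefficient; the exact definitions used in the paper are given elsewhere and are not reproduced in this section. The function names and the sample data are hypothetical.

```python
import math


def mean_normalized_bias(model, obs):
    """Mean of (model - obs) / obs over pairs with a positive observation."""
    pairs = [(m, o) for m, o in zip(model, obs) if o > 0]
    if not pairs:
        raise ValueError("no valid model/observation pairs")
    return sum((m - o) / o for m, o in pairs) / len(pairs)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly means (µg m-3): the model has the right shape
# but only half the observed magnitude.
observed = [2.0, 4.0, 6.0, 8.0]
modelled = [1.0, 2.0, 3.0, 4.0]

print("MNB =", mean_normalized_bias(modelled, observed))
print("r   =", pearson_r(modelled, observed))
```

The example shows why both metrics are reported: a model can correlate perfectly with observations while still underestimating them by a factor of 2, and a low bias does not guarantee a good correlation.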

Seasonality
Most measurements, especially at locations with at least a full year of data, are located in the USA, although recently observations have been made available from the EU-SAAR/ACTRIS observational network in Europe. Throughout the USA, where data availability is the highest, the general finding is that all models have a pronounced seasonal cycle, with minimum concentrations during winter and maximum concentrations during summer, except for the western coast, where agricultural and biomass fuel burning invert the picture, in line with previous results (Bahadur et al., 2009). This seasonal cycle is primarily caused by the presence of SOA, whose chemical production maximizes during summer, due to both elevated precursor emissions and enhanced photochemistry. Biomass burning also contributes to this summertime increase, although some models simulate excessively high monthly mean OA concentrations that can exceed 200 µg m−3, due to biomass burning emissions.
Although a global model is not the best tool to study urban aerosol levels, useful results can be extracted by collective comparison of OC measurements with model results.
In the western states of the USA, as well as in Alaska and Florida, the typical observed urban OC seasonality presents maximum concentrations during winter and minimum during summer. This would have been expected for primary anthropogenic material due to, e.g., enhanced residential emissions from heating during winter, as well as due to enhanced agricultural and biofuel burning during winter on the west coast of the USA, seasonal patterns currently absent from most emission inventories. However, the observed seasonality is opposite of what the models calculate, which compute an OA maximum during summer, following biogenic SOA formation (Fig. 16a). In the southeast, the typical urban measured pattern does not present a pronounced seasonal cycle, with most urban locations showing a fairly flat or noisy seasonality in observed OA with no unique pattern (Fig. 16b). In most other urban cases in USA, either there is no clear seasonal pattern, or the two cases described earlier are repeated, with one unique characteristic: a peak during summer, which distorts the seasonality described above (Fig. 16c, d). Thus, the combined model-measurements analysis, given the limitations global models have when compared against urban data, suggests the existence of increased OA levels during summer due to biogenic SOA formation over large areas of the USA. This summertime OA can be of the same order of magnitude as the anthropogenic OA, even inside cities. The absolute OC values are generally still underestimated, especially during winter.
The reason why this is not the case in the western states, Alaska and Florida, might be that these areas have a strong marine influence, with air masses that do not have very aged SOA. For Alaska, due to its location at very high latitudes, even during summer photochemistry is less intense than at mid-latitudes, resulting in lower SOA formation rates. On the other hand, it is not clear why the OA observations in the southeastern USA do not show a peak during summer; this area is well known for its strong SOA formation potential (Carlton et al., 2010), due to both vicinity of sources and abundance in solar radiation, especially during summer. One explanation might be that wintertime emissions are much stronger there than in other areas in USA, enhancing the wintertime OA levels and masking the summertime SOA contribution. Additionally, enhanced anthropogenic aerosols like sulfate might increase aerosol water content substantially in the southeast USA (Dick et al., 2000), counterbalancing the photochemical production of SOA, an effect currently absent from all models participating in this study that do not take into account aqueous SOA formation. All these hypotheses need to be investigated in the future by both field and modeling studies in more detail.
The absence of seasonality measured at several urban locations might be due to a combination of stronger anthropogenic primary sources and reduced dispersion during winter and enhanced SOA formation during summer, as well as missing processes from the models, flattening the seasonal cycle. The missing processes include the intermediate-volatility organic compounds, which are expected to condense more during winter, and the assumption of semivolatile POA, which will favor POA evaporation during summer. The combination of these two processes will lead to higher OA concentrations during winter and lower during summer when compared with the current OA parameterizations. This is expected to vary spatially, depending on the availability of these species and that of preexisting aerosols, and assuming no seasonality in their sources. Whether SOA dominates over anthropogenic POA appears to be the decisive factor for the seasonal pattern. However, this is only a hypothesis that is driven by the model results and that needs to be explored in the field. The fact that the models appear to be (a) missing an urban source, and (b) underestimating the pollution levels in cities, is also supported by the comparison of the model results with remote stations close to the urban ones presented in Fig. 16, where the models are able to capture both the magnitude and seasonality of measurements much better (Fig. 17).
Figure 17. Same as in Fig. 16, for remote stations: Arizona (a; 114.07° W, 36.02° N, years 2000), Georgia (b; 82.13° W, 30.74° N, years 1993), Colorado (c; 107.80° W, 37.66° N, years 2000), Ohio (d; 81.34° W, 39.94° N, years 1998).
An important thing to note is that the measurements are roughly a factor of 5 lower in these remote stations compared to their urban counterparts, except the case of Ohio, where the remote station appears to be influenced by urban pollution: its levels are only half that of the Ohio urban station, while its seasonality resembles the seasonality present in several urban stations discussed earlier.

Chemical composition
Unfortunately, it is impractical to present and analyze every individual station used in the present study. Instead, a number of stations have been selected, based on a number of criteria: they must be far enough away from each other geographically, have enough data to capture both their seasonality and, where present, their interannual variability, and/or be potentially interesting for any other reason if none of the other criteria are met. Only one station has a full year of AMS data (Welgegund, South Africa, using an ACSM for real-time aerosol composition data), and only one station has both OC and more than a couple of months of AMS data (Finokalia, Greece).
The stations that are analyzed here are the remote stations Finokalia (Greece), Welgegund (South Africa), Alaska (USA), and Manaus (Brazil), as well as the marine station Amsterdam Island (southern Indian Ocean). For clarity, only a few models are presented in the following discussion and in the figures. The remaining models (which have at least both tPOC and trSOC tracers submitted) are presented in the Supplementary Material. In addition, a number of other interesting stations are discussed in the Supplement: the urban and remote Colorado US stations discussed in Sect. 4.3.2, the remote stations LinAn (China), Alta Floresta (Brazil), Melpitz (Germany) and Mace Head (Ireland), and the marine station Okinawa (Japan).
The remote station Finokalia, Greece, has both OC and OA (AMS) measurements. The OC data (Fig. 18) do not exhibit any seasonality, in contrast to all models that underestimate the wintertime measurements by simulating a wintertime minimum and a summertime maximum. The measured OA concentrations (Fig. 19), although from only four out of twelve months, appear to be higher during summer, a feature that is captured both in shape and magnitude by a small number of models. The air masses that arrive at Finokalia are aged, since there are no significant sources upwind for at least 300 km (Mihalopoulos et al., 1997). This is also evident from the GISS-CMU-VBS results, where virtually all POA is calculated to be ntrSOA (aged primary), which means that photochemistry, which is expected to be higher during summer, has already contributed to the aging of the air masses arriving at the station. If this is indeed the case, it means that the OA / OC ratio during summer is higher than the winter value, a fact that is implied by the measurements. Note however that it is not trivial to compare the PM 1.8 OC data with the PM 1 AMS data and calculate an OA / OC ratio (Koulouri et al., 2008); it is also not straightforward to calculate OA / OC from O / C that the AMS provides, without introducing an additional level of uncertainty, due to the small, but not negligible, contribution of other heteroatoms like N, S, and P in OA. In any case, the fact that OA / OC appears to be changing with seasons is something that has to be taken into account by models that use a constant OA / OC ratio in their calculations. The evaluation of OA / OC will be studied in detail in the future; as a first estimation, since many models calculate high SOA during summer at that station, it is anticipated that the modeled OA / OC ratio will also be higher during summer. 
Two of the models that include multiphase chemistry of organics (IMAGES and IMPACT) calculate a significant contribution of ntrSOA to the total OC over Finokalia.
Welgegund, South Africa, is the only station for which we have been able to obtain a full year of AMS data (Fig. 20); unfortunately, no OC measurements in our database are in the same area to perform the same analysis as in Finokalia. Welgegund is a station that is strongly affected by seasonal biomass burning, and occasional anthropogenic pollution (Tiitta et al., 2014). Besides EMAC, which overpredicts the biomass burning seasonal maximum by a factor of more than 3, most models appear to capture both the seasonal variability and levels at that station. EMAC uses the GFED inventory, the same as ECHAM5-SALSA (which lies at the high end of the models but does not stand out) and BCC, which strongly underestimates the biomass burning peak. The reason why the OC load calculated by EMAC is so high, which is evident in comparisons with several stations that are strongly affected by biomass burning, might be the fact that EMAC puts all biomass burning emissions at the first model layer, in contrast to the other models that distribute them between many layers close to the surface. Several models simulate peak OC values during September, in line with a September-October maximum in the measurements, which can be attributed to biomass burning. Caution has to be taken for the exact interpretation of the absolute values or even the peaks in the data set, since the measurements are from the year 2011, and no model has used emissions or meteorology from that year. Since biomass burning has a strong interannual variability, either multi-year data are needed in order to construct a climatology and then compare with a model year that is not exactly the same as that of the data, or the simulations should use emission inventories and meteorology for the specific year that the measurements have been performed.
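The construction of a climatology from multi-year data, as suggested above, can be sketched as a monthly mean across years. This is only a minimal illustration: the function name `monthly_climatology`, the record layout and the sample values are hypothetical and do not correspond to any data set used in this study.

```python
from collections import defaultdict


def monthly_climatology(records):
    """Average values by calendar month across all available years.

    records: iterable of (year, month, value) tuples.
    Returns a dict mapping month (1-12) to the multi-year mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in records:
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}


# Hypothetical September and January OA values (µg m-3) for three years,
# mimicking a strong interannual variability in the burning season.
records = [
    (2009, 9, 4.0), (2010, 9, 10.0), (2011, 9, 7.0),
    (2009, 1, 1.0), (2010, 1, 1.5), (2011, 1, 2.0),
]

climatology = monthly_climatology(records)
for month, mean_value in climatology.items():
    print(f"month {month:2d}: {mean_value:.2f} µg m-3")
```

A single-year measurement (here any of the three September values) can differ strongly from the climatological mean, which is why comparing one measured year against a model driven by another year's emissions has to be interpreted with caution.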
There is agreement between the models that the September maximum is due to POA, while SOA is fairly constant year-round; aqueous chemistry also contributes a small amount to the total OA, which is enhanced during the wet season. GISS-CMU-VBS calculates that most of the POA is already aged, although during the biomass burning season, there is a non-negligible amount that is still fresh.
In Alaska, USA (Fig. 21), many models simulate a summer maximum, in agreement with the measurements; this is due to biomass burning sources. TM4-ECPL-FNP calculates a very strong contribution from primary biological particles to the total OC, resulting in a slight overestimation of measurements throughout the year. The four models that have provided mPOA concentrations (two GISS-modelE and two TM4-ECPL models) suggest that marine organics are present in significant quantities. Multiphase chemistry is also calculated to contribute during the summer months. ECMWF-GEMS shows a very wide peak in OC during summer, in contrast with the other models, resulting in higher concentrations than the measured ones for half of the year. This might be caused by the averaging of biomass burning emissions over six fire seasons that this model uses, which exhibit a large interannual variability and which broaden the biomass burning contribution over many months. The remaining models generally underestimate the measurements, although they capture the observed seasonality rather well; more than half of the models have a correlation coefficient against measurements greater than 0.8. An interesting pattern is that of the two GISS-modelE models, which simulate a significant contribution of trSOA to the total OC, especially during winter. These two models are the only models that include semivolatile SOA, and use the Lathière et al. (2005) VOC emissions, in which strong summer emissions in southern Alaska are present. It is very likely that the distribution of VOC sources (which differs from that of the other models), when combined with the low temperatures in Alaska during winter (which favors partitioning to the aerosol phase), leads to the enhanced trSOA formation.
As expected, only the models that include a marine source of mPOA are able to capture the OA concentrations at remote marine stations. This is particularly true for the two versions of GISS-modelE, which have the strongest source of mPOA of all models that participate in the intercomparison. Although most of the remote marine data we have are single measurements and their seasonality cannot be studied, it is important to note that their chemical composition is dominated by mPOA. Fortunately, there is one station with five years of data in a remote marine environment: Amsterdam Island, in the southern Indian Ocean (Fig. 23). As at Mace Head, the models that include mPOA sources are closer to the measurements, while the rest of the models simulate extremely low OC concentrations. There are three notable exceptions. First, the two GISS-modelE models, which strongly overestimate the measurements, as discussed by Tsigaridis et al. (2013). Second, the ECMWF-GEMS model, which, although it does not have a marine OA source, simulates higher-than-expected OC concentrations there. Third, the IMAGES model, which is able to capture some of the measured data due to high ntrSOA amounts calculated there. Multiphase chemistry appears to contribute significantly to the OC mass calculated at Amsterdam Island in other models as well, which reproduce the long-range transport of biomass burning aerosol from southern Africa from August to October (Fig. 23), which is also seen in the observations (Sciare et al., 2009). The meteorology used appears to affect ntrSOA production in the two TM4-ECPL models significantly, due to differences in the availability of water in aerosols and the distribution of clouds between the years simulated: 2005 for TM4-ECPL-F and 2006 for TM4-ECPL-FNP.
Figure 21. Same as in Fig. 18, for Alaska, USA (remote, years 2002-2006). For the chemical composition in (b-f), brown is tPOC, green is trSOC, cyan is mPOC, blue is ntrSOC, and orange is MSA.
Figure 22. Same as in Fig. 18, for Manaus, Brazil (remote, years 2008). For the chemical composition in (b-f), brown is tPOC, green is trSOC, cyan is mPOC, blue is ntrSOC, and orange is MSA.

Conclusions
This study shows that the diversity of the global OA modeling results has increased since AeroCom phase I, mainly due to both the increased complexity, as well as the increased diversity of the OA parameterizations and sources in the models, which is evident in the different chemical compositions simulated by the models at the various stations analyzed here. Increased number of tracers, however, does not necessarily mean increased complexity of OA parameterizations; models with aerosol microphysics must have a large number of organic aerosol tracers, even when they may simulate OA production in a very simplistic way. At present, about half of the thirty-one participating models include explicit treatment of semi-volatile SOA formation in the atmosphere. Four models also account for multiphase chemistry and six models for natural sources of POA, in particular the marine source, with one model including the emissions of primary biological particles.
The POA sources in the thirty-one AeroCom models range from 34 to 144 Tg a−1 with a median value of 56 Tg a−1. Secondary OA sources show larger model diversity spanning from 12.7 to 121 Tg a−1, with a median value for the 12 out of 14 models that parameterize SOA chemical production of 51 Tg a−1 (mean 59 Tg a−1 with standard deviation of 38 Tg a−1). In the four models that account for multiphase chemistry of organics, its contribution to SOA levels is calculated to be significant (up to 50 % of total SOA formation), at least regionally.
The wet removal of OA is simulated to range from 28 to 209 Tg a−1 for 26 of the models, with median 70 Tg a−1, which is on average 85 % of the total OA deposition. The high wet removal variability, together with the large variability of OA sources, is attributed primarily to the diversity of SOA formation, which affects the total OA load and lifetime. The very high variability of SOA budgets between models is especially evident in the SOA lifetime (2.4 days to 15 days). This slightly increases the calculated variability of the total OA by the phase II models compared to phase I, where the SOA model diversity was essentially zero.
Figure 23. Same as in Fig. 18, for Amsterdam Island, Indian Ocean (marine, years 2003-2007). For the chemical composition in (b-f), brown is tPOC, green is trSOC, cyan is mPOC, blue is ntrSOC, and orange is MSA.
The treatment of aerosol microphysics in the models appears to have a significant impact on the calculated OA load and dry deposition. The range in dry deposition flux for OA (2-36 Tg a−1 in the present study) has been greatly increased since both AeroCom ExpA and ExpB, by a factor of 2 or more, while the M7 and TOMAS aerosol microphysics parameterizations, used by three and two models, respectively, simulate very low dry deposition rates when compared to the other models and thus contribute a lot to this change in diversity.
The annual median atmospheric burden of OA is calculated to be 1.4 Tg by the AeroCom phase II models, with values that vary mostly between 0.6 Tg and 1.8 Tg. Four models simulate loadings higher than 2.0 Tg, up to 3.8 Tg. The models calculate very similar OA load seasonality, which maximizes during Northern Hemisphere summer, when both primary (biomass burning) and secondary (chemical production) OA are high, and minimizes during Northern Hemisphere spring. A median OA lifetime of about 5.4 days (ranging from 3.8 to 9.6 days) is derived from the present study. The median POA lifetime of 4.8 days (ranging from 2.7 to 7.6 days) from this study is slightly shorter than the median SOA lifetime of 6.1 days (range from 2.4 to 14.8 days).
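The relation between burden, removal and lifetime can be sketched under a steady-state assumption, in which the lifetime is the burden divided by the total loss rate. This is a minimal illustration and not necessarily how each participating model diagnoses its lifetime; the function name `lifetime_days` is hypothetical. The inputs are the median values quoted in these conclusions (burden of 1.4 Tg; wet removal of 70 Tg a−1, taken as 85 % of the total deposition).

```python
DAYS_PER_YEAR = 365.0


def lifetime_days(burden_tg, total_loss_tg_per_year):
    """Steady-state lifetime (days) = burden / total loss rate."""
    if total_loss_tg_per_year <= 0:
        raise ValueError("total loss must be positive")
    return burden_tg / total_loss_tg_per_year * DAYS_PER_YEAR


# Median values quoted in the text.
median_burden = 1.4            # Tg
median_wet_removal = 70.0      # Tg a-1
wet_fraction = 0.85            # wet removal as a fraction of total deposition

# Total deposition implied by the wet removal and its average share.
total_deposition = median_wet_removal / wet_fraction

tau = lifetime_days(median_burden, total_deposition)
print(f"implied total deposition: {total_deposition:.1f} Tg a-1")
print(f"implied OA lifetime: {tau:.1f} days")
```

The result, roughly 6 days, is of the same order as the reported median lifetime of 5.4 days. Exact agreement is not expected, because medians taken over different subsets of models do not need to be mutually consistent.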
For many models that reported both OA and SO₄²⁻ loads, the OA load is calculated to be lower than that of SO₄²⁻, with a median value of the OA / SO₄²⁻ mass load ratio of 0.77. Simulated values of this ratio span from 0.25 to 2.0, with 9 models having a value greater than 1, indicating that there is a low level of understanding of the relative importance of OA and SO₄²⁻ aerosol components between models, although modeling studies indicate that this ratio will increase in the future due to sulfur emission controls. This ratio is also affected by multiphase chemistry of organics and deserves further attention in the future.
A significant (up to 45 %) but highly variable contribution of multiphase chemistry to global SOA formation is calculated by models that account for this process. The comparison with observations indicates that the lower estimate of this source might be closer to reality, but this has to be revisited once more models include multiphase SOA formation. In addition, a gas-phase source of SOA, either new or an enhanced pre-existing one, has the potential to improve the comparison with measurements in the same way multiphase chemistry does; OA chemical composition measurements can help identify which one of the two, or both, is the case. Further investigation of the importance of multiphase chemistry on the global scale and evaluation against targeted observations and field campaigns is needed.
The models show a large diversity (about two orders of magnitude) in the free troposphere, pointing to uncertainties in the temperature-dependent partitioning of SOA, uncertainties in free tropospheric sources, and the impact of meteorology and transport. A systematic comparison of model results with the limited available free tropospheric observations would give important insights into the large model differences in the middle and upper troposphere.
Despite the increasing diversity between models since AeroCom phase I experiments, the models are now able to simulate the secondary nature of OA observed in the atmosphere as a result of SOA formation and POA aging, although the absolute amount of OA present in the atmosphere remains underestimated. The median MNB of all models against urban measurements at the surface is calculated to be −0.62 for OC and −0.51 for OA, with correlations of 0.47 and 0.54, respectively, while for remote surface measurements [...]
Atmos. Chem. Phys., 14, 10845-10895
Comparison of model results with OA and OC, where available, shows that the models capture the submicron OA mass better than the PM 2.5 OC mass near the surface. Although this indicates a possible overestimate of the OA / OC ratio by the models, this is not necessarily the case, since virtually all OC and OA measurements were taken at different locations and different times. Most models use a constant value of 1.4, and only four models in this study calculate it prognostically. The limited number of observations that can be used to derive the OA / OC ratio indicates dependence on sources, atmospheric conditions and season; this will be revisited in a future study.
The flat seasonality measured at several urban locations is not reproduced by the models. The comparisons indicate a missing or underestimated source of OA in the models, either anthropogenic primary (for instance domestic wood burning), or secondary, primarily during winter. Improvements in the seasonality and strength of the anthropogenic POA sources in models can reduce the differences between model results and observations, but not eliminate them, since most global models cannot resolve urban pollution due to their large grid size.

Future directions
Available OC and OA observations, and thus model evaluations, are concentrated in the USA and Europe; additional long-term observations from tropical, boreal, Southern Hemisphere and remote marine regions, as well as from the free troposphere, are needed to complement the global OA observational database.
Natural POA sources are important components of the OA global budget; however, among the thirty-one models participating in this intercomparison, only six account for the marine source of OA and one for the primary biogenic particles. Comparison of model results to observations over remote marine locations can provide constraints on our understanding of the marine POA source. The statistics on model performance calculated here are not able to quantify the importance or the understanding of this source because seasonal data from remote marine locations are limited. The magnitude of the marine source and the properties of marine OA remain highly uncertain and are an active area of research.
Primary biogenic particles can also be significant contributors to OA, particularly over land, but are taken into account only in one model. While the parameterization of the primary biogenic source of OA is extremely uncertain, model comparison with measurements is improved when accounting for this source in that model, by reducing the MNB. The correlation of the model results with observations does not change significantly when including or excluding this source. However, station-by-station comparison indicates a low level of understanding of the spatial and seasonal variability of this natural source, which deserves further investigation and improvement.
Both the model diversity that increased with increasing model complexity over the past decade, as well as the comparison of model results with station data, reveal important gaps in our understanding of OA concentrations, sources and sinks in the atmosphere, and point towards the need for better understanding of sources and chemical aging of OA. Although the increasing complexity did not significantly improve the model performance, model complexity is imposed by the need to provide information for future developments that will help quantify the anthropogenic impact to climate via the aerosol direct and indirect effects. The existence of significant secondary sources of OA that are enhanced by interactions of natural with anthropogenic emissions remains an open question that cannot be answered by a simple OA parameterization. Furthermore, the OA impact on climate depends on the OA physical, chemical and optical properties, as well as the OA distribution in the atmosphere, which is affected by continuous evaporation/condensation processes of semi-volatile organic material and consequent change of hygroscopicity.
In this respect, new information from dedicated field campaigns that either occurred over the past few years or are planned to take place soon, is expected to shed light on the OA formation processes and how these are altered in the presence of anthropogenic pollution. The model development related to OA is expected to accelerate in the near future and must be performed in parallel with extensive model evaluation. Important processes currently not included in many models that need to receive high priority from modeling groups include the semi-volatile nature of OA, the temperature-dependent OA formation and aging, which affects their volatility, and an improved parameterization of the OA / OC ratio. Improved laboratory measurements of SOA formation are also crucial for the model improvements.

Isoprene, terpenes, aromatics, higher molecular weight alkanes and alkenes | Prescribed mass yields for the 5 trSOA precursor categories (6.0, 37.5, 22.5, 7.5, and 7.5 %, respectively) that form a single semi-volatile species that then kinetically but reversibly partitions to the OA phase. | 1.4 | Precursor VOCs are lumped species from MOZART. Yields listed include a 1.5 times increase to reduce anthropogenic aerosol indirect forcing. The single semi-volatile gas has a saturation mixing ratio of 0.1 ppbv at 298 K. Includes aerosol microphysics (MAM3; modal).

1.4 | Includes aerosol microphysics (moments).

1.4 | 50 % of anthropogenic and biomass burning OC is emitted as hydrophobic and 50 % as hydrophilic (Cooke et al., 1999); hydrophobic OC becomes hydrophilic in an e-folding time of 2.5 days.

IMAGES | tPOA, trSOA, ntrSOA | 26 | Isoprene, α-pinene, sesquiterpenes, benzene, toluene, xylene | Two-product model | Varying | trSOA includes the effect of water uptake on partitioning. ntrSOA is glyoxal and methylglyoxal from cloud chemistry and aqueous aerosol processing.

IMPACT | tPOA 5, trSOA, ntrSOA | 33 | Isoprene, monoterpenes, aromatics | SOA comes from organic nitrates and peroxides using the traditional gas-particle partitioning with an explicit full chemistry. The condensed SOA is further assumed to form oligomers with a 1 day e-folding time.