Quantifying the structural uncertainty of the aerosol mixing state representation in a modal model

Aerosol mixing state is an important emergent property that affects aerosol radiative forcing and aerosol-cloud interactions, but it has not been easy to constrain this property globally. This study aims to verify the global distribution of aerosol mixing state represented by modal models. To quantify the aerosol mixing state, we used the aerosol mixing state indices for submicron aerosol based on the mixing of optically absorbing and non-absorbing species (χo), the mixing of primary carbonaceous and non-primary carbonaceous species (χc), and the mixing of hygroscopic and non-hygroscopic species (χh). To 5 achieve a spatiotemporal comparison, we calculated the mixing state indices using output from the Community Earth System Model with the modal MAM4 aerosol module, and compared the results with the mixing state indices from a benchmark machine-learned model trained on high-detail particle-resolved simulations from the particle-resolved stochastic aerosol model PartMC-MOSAIC. The two methods yielded very different spatial patterns of the mixing state indices. In some regions, the yearly-averaged χ value computed by the MAM4 model differed by up to 70 percentage points from the benchmark values. 10 These errors tended to be zonally structured, with the MAM4 model predicting a more internally mixed aerosol at low latitudes, and a more externally mixed aerosol at high latitudes, compared to the benchmark. Our study quantifies potential model bias in simulating mixing state in different regions, and provides insights into potential improvements to model process representation for a more realistic simulation of aerosols.

PartMC-MOSAIC tracks the composition of individual particles and therefore resolves aerosol mixing state explicitly (Riemer et al., 2009;Zaveri et al., 2008). However, this modeling approach is computationally very expensive and therefore not practical for large-scale simulations of several months or years of simulation time. To estimate the global spatial distribution of mixing state, we recently developed a machine-learned (ML) model based on high-detail particle-resolved simulations (Zheng et al., 2021) that uses inputs that are known from global model simulations to predict χ. In this paper, we use this ML model 60 to predict the spatial distribution of the mixing index χ and then compare the results with χ-values that are derived from the Community Earth System Model Version 2 (CESM2 version 2.1.0; Danabasoglu et al., 2020) using the 4-mode version of the Modal Aerosol Module (MAM4; Liu et al., 2016). This paper is organized as follows. In Section 2 we introduce the setup of the Earth system model simulations. The definition of mixing state indices and the derivation of aerosol mixing state indices for modal models are given in Section 3. Section 4 65 briefly describes the ML model generated with machine learning and particle-resolved modeling for estimating the benchmark aerosol mixing state indices. Section 5 focuses on the comparison of mixing state indices from the particle-resolved and modal models, and Section 6 summarizes our findings.

Global model simulations
Here we employed CESM2 to provide the global model simulation data. Specifically, we used the component set FHIST to set 70 up the global simulations with aerosols. This component set represents a typical historical simulation in the Community Atmospheric Model (CAM6; Bogenschutz et al., 2018) using an active atmosphere and land with prescribed sea-surface temperatures and sea-ice extent, and a 1-degree finite volume dycore with the forcing data available from 1979 to 2015.
MAM4 is the default aerosol module of this component set, which represents the aerosol size distribution with four lognormal modes (Aitken, accumulation, coarse, and primary carbon modes; Liu et al., 2016). MAM4 tracks six aerosol species, and these 75 are distributed over the four modes as follows. The Aitken mode consists of dust, sulfate, secondary organic aerosol (SOA), and sea salt. The accumulation mode includes sulfate, SOA, sea salt, primary organic matter (POM), black carbon (BC), and dust. The coarse mode contains sulfate, dust, and sea salt. The primary carbon mode contains only BC and POM, which are supplied by primary aerosol emissions.
The choice of modes in MAM4 is motivated by the desire to treat the microphysical aging of the primary carbonaceous 80 aerosols in the atmosphere (Liu et al., 2016), similar to other modal models used in regional or global models (Riemer et al., 2003;Vogel et al., 2009). In MAM4, mass and number concentrations of BC and POM in the primary carbon mode are transferred to the accumulation mode by the processes of intermodal coagulation and condensation of SOA and sulfuric acid onto the primary carbon mode. The accumulation mode then represents aged BC and POM, as these species are internally mixed with other aerosol species. The MAM4 treatment of aging is critical for improving the long-range transport of carbonaceous 85 aerosols to remote regions such as the polar region, which suffered from a low bias in a prior version of the model when only three internally-mixed modes were used. We ran the model for the year 2011 with 6 years (2005-2010) of spinup. The simulation was conducted at a resolution of 0.9 • latitude by 1.25 • longitude along with emission inventories from CMIP6 emissions (Emmons et al., 2020). We stored the instantaneous outputs every three hours during the simulation, which yields 2920 timestamps for each surface-layer grid cell 90 for the entire year of simulation time. The surface layer was chosen to be in line with the PartMC-MOSAIC model scenarios that were used as training data for the ML models of mixing state indices (see Section 3) and which were designed to represent conditions in the planetary boundary layer. The mixing state index χ (Riemer and West, 2013) quantifies where an aerosol population lies on the continuum from external to internal mixing. That is, how "spread out" the chemical species are over an aerosol population. We will focus here on the mixing state of sub-micron aerosols (PM 1.0 ) due to their relevance for light scattering and absorption , and their contribution to CCN formation (Asmi et al., 2011;Pierce et al., 2015;Yu and Luo, 2009).
To summarize, the mixing state index χ is given by the affine ratio of the average particle species diversity, D α , and bulk 100 population species diversity, D γ , as The diversities D α and D γ are calculated as follows. First, the per-particle mixing entropies H i are determined for each particle by

105
Here, A is the number of distinct aerosol species and p a i is the mass fraction of species a in particle i. These values are then averaged (mass-weighted) over the entire population to obtain the average particle species diversity D α by where N p is the total number of particles in the population and p i is the mass fraction of particle i in the population. Finally, 110 the bulk diversity D γ is calculated as where p a is the bulk mass fraction of species a in the population.
Note that the definition of "species" for calculating χ is based on application needs. It can be based on operationally defined 115 chemical species (Healy et al., 2014;Ye et al., 2018), elemental composition (Fraund et al., 2017;O'Brien et al., 2015), or species groups such as volatile and nonvolatile species (Dickau et al., 2016) or hygroscopic and non-hygroscopic species (Ching et al., 2017;Hughes et al., 2018). In this paper we consider three different definitions of χ, which we explain in more detail in Section 3.2.

120
The framework laid out in Section 3.1 can be easily generalized to a modal modeling framework. The bulk mixing entropy, H γ , and the bulk diversity, D γ , can be calculated using the bulk mass fractions, p a , of species a from the MAM4 simulation and Equations (5) and (6). To calculate the average particle mixing entropy, H α , and the average particle species diversity, D α , we use 125 where p a m is the mass fraction of species a in mode m, p m is the mass fraction of mode m in the population, and H m are the per-mode mixing entropies. Finally, the mixing state index, χ, can be calculated using Equation (1). Note that Eqs. (7) and (8) are analogous to Eqs. (2) and (3). A detailed derivation of these equations is provided in the Appendix A.

130
In this study, we consider the mixing states of submicron aerosols including the Aitken, accumulation, and primary carbon modes and we do not include the coarse mode because the coarse particles are above one micron. Since the mixing entropies are mass-weighted (rather then number-weighted), the mixing state index is more representative of the modes with the larger particles, i.e., the accumulation and primary carbon modes.

135
Here we compare and contrast the aerosol mixing state indices defined in three different ways, namely based on the mixing of optically absorbing and non-absorbing species (χ o ), based on the mixing of primary carbonaceous and non-primary carbonaceous species (χ c ), and based on the mixing of hygroscopic and non-hygroscopic species (χ h ). Table 1 shows the definitions of these aerosol mixing state indices.
For χ o , we considered two surrogate species: black carbon (strongly absorbing, assigned a mass absorption coefficient in 140 CESM2 at 533 nm and 0% RH of 8.144 m 2 g −1 ) and the five other aerosol species grouped together (less absorbing or nonabsorbing, with mass absorption coefficients in CESM2 at 533 nm and 0% RH of 0.1442 m 2 g −1 , 9.975 × 10 −2 m 2 g −1 , 4.703 × 10 −2 m 2 g −1 , 2 × 10 −6 m 2 g −1 , 5 × 10 −7 m 2 g −1 , for POM, SOA, dust, sea salt, and sulfate, respectively). Thus, a lower value in χ o refers to the case where the strongly absorbing species black carbon (Yang et al., 2009) and the sum of the other species (termed "non-absorbing" here for convenience) are more externally mixed.

145
The index χ c is motivated by the primary carbon treatment of MAM4, where the primary particulate organic matter and black carbon are assigned to a separate primary carbon mode (Liu et al., 2016). A lower value in χ c refers to the situation where the primary carbonaceous species and all other species exist separately in different particles.
Similarly, χ h was also calculated from two surrogate species. We combined black carbon, primary organic matter and dust as one surrogate species, given their comparatively lower hygroscopicities (kappa values of ∼0, ∼0, and 0.068, respectively).

150
Accordingly, NaCl (1.16), SOA (0.14), and sulfate (0.507) were grouped as the other surrogate species. Here, a lower value in χ h represents the case where hygroscopic and non-hygroscopic species tend to be present in separate particles.

Machine-learned models of mixing state indices
Aerosol mixing state indices can be calculated directly using particle-resolved modeling, but this comes with large computational costs. Alternatively, Zheng et al. (2021) developed ML models, which integrate machine learning and particle-resolved ML models, an ensemble of particle-resolved model scenarios was created using the particle-resolved model PartMC-MOSAIC (Riemer et al., 2009;Zaveri et al., 2008). In brief, PartMC-MOSAIC simulates individual aerosol particles within a representative volume of air, including stochastic coagulation, particle-phase thermodynamics, gas-and particle-phase chemistries, and dynamic gas-particle mass transfer. Thus, the composition of the individual particles within a population evolves dynamically 160 and assumptions about mixing state are not necessary.
The strategy to generate the data was to vary the input parameters (45 in total) for the PartMC-MOSAIC model, including primary emissions of different aerosol types (e.g., carbonaceous aerosol and dust emissions), primary emissions of gas phase species (e.g., SO 2 , NO 2 , and various volatile organic compounds), and meteorological parameters (see Table 1 in Zheng et al. (2021) for more information). For instance, to vary the gas emissions, scaling factors were sampled from 0% to 200% for 165 different gas species, based on the emission rates in Riemer et al. (2009). A Latin hypercube sampling approach was employed to sample the parameter space efficiently for the training and testing data sets.
The ML models were derived by the machine learning algorithm eXtreme Gradient Boosting (XGBoost; Chen and Guestrin, 2016) from 45 000 particle populations. Each ML model was a tree-based ensemble model that could handle complex nonlinear interactions and collinearity among features. The hyperparameters were determined by grid search with 10-fold cross-170 validation. The ML models can be expressed as where χ S (x, y, t) is the mixing state index (χ o , χ c , or χ h ) at location (x, y) in the model layer nearest the surface at time t and f S denotes the function for calculating the corresponding mixing state index χ S . The set names A (aerosol), G (gas), and E (environmental) represent the predictors (features) used for predicting the mixing state index. Aerosol species include 175 black carbon, mineral dust, sea salt, primary organic aerosol, secondary organic aerosol, and sulfate. Of note is that we used the bulk (not the per-mode) concentrations of submicron aerosol species as the features. The gas species include dimethyl sulfide, hydrogen peroxide, sulfuric acid, ozone, semi-volatile organic gas and sulfur dioxide. The environmental variables are air temperature, relative humidity, and solar zenith angle. Table 2 shows the performance of the ML models when predicting the mixing state indices. The mixing state calculation in this study was purely based on the above six aerosol species (excluding 180 other aerosol species) for a fair comparison with the mode-based aerosol mixing state index, which resulted in slightly different performance of the ML model compared to Zheng et al. (2021). The average error of the ML model is about 5% for χ o and 8% for χ c and χ h (measured by mean absolute error).  Table 2. Predictive performance of the ML models using the testing data set. Metrics include the mean absolute error (MAE), root-meansquare error (RMSE), median absolute deviation (MAD), index of agreement (d; Willmott, 1981), Pearson correlation coefficient (PCC), and coefficient of determination (r 2 ).
XGBoost ML Models To compare the annual mean values, we calculated the mean difference (∆χ S ) and the mean absolute difference (|∆χ S |) for 190 each grid cell of the layer closest to the surface: where the subscript S refers to the mixing state index (o, c, or h), and the total number of timestamps is T = 2920. Since it only makes sense to quantify mixing state when at least two species are present in a given location, areas where the mass fraction of 195 any one surrogate species was higher than 99% for χ o (due to the low mass fraction of black carbon) and 97.5% for χ c and χ h were ignored for the calculation and appear as hatched areas in Figure 3. We will first discuss the overall probability density functions of these quantities ( Figure 2) and then their spatial distributions ( Figure 3).    and non-hygroscopic species were more externally mixed at high latitudes and more internally mixed at low latitudes. In contrast, the spatial distribution of χ ML h shows qualitative differences compared to χ ML c in two aspects. First, χ ML h was higher than χ ML c at high latitudes, meaning that hygroscopic species and non-hygroscopic appeared more internally mixed than primary carbonaceous and non-carbonaceous species in this region. Second, areas over the North Atlantic Ocean  where mineral dust is the dominant aerosol species (see Fig. A2).
These two facts lead to the overall finding that χ h exhibits the largest differences between the two methods. This applies especially to regions where mineral dust was the dominant aerosol species, which points to an important structural issue of the four-mode setup used in MAM4. While the ML model predicted a more external mixture in these regions (dust externally mixed from sea salt and other species), the MAM4 model could not represent this because the accumulation mode included 240 all six aerosol species in an internal mixture. Figure 4 illustrates the relationship of the mean absolute difference of χ h and the mass fraction of dust for all model gridpoints. It confirms that gridpoints with large dust mass fractions were associated with larger mean absolute differences in χ h . These results confirm the tradeoff discussed in Liu et al. (2012): MAM3 (and MAM4 in Liu et al. (2016)) intentionally combines dust and sea salt in the same mode to reduce the computational burden, however this simplification does not always realistically reflect the aerosol mixing state in the ambient atmosphere.

245
It is interesting to note that the areas where sea salt is present, but not dust, are not associated with large errors, even though sea salt-just like mineral dust-is a primary aerosol type. The reason for this lies in our surrogate species definitions (Table 1) for computing the mixing state index. Sea salt is always grouped with at least two other species (e.g., SOA and sulfate for χ h ). Therefore, none of the mixing state indices as defined here tell us how externally mixed sea salt is when it is considered as a single aerosol type. 250 Figure 5 further demonstrates the zonal mean annual aerosol mixing state indices, highlighting that differences between χ c and χ h tended to be zonally structured, where the MAM4 model overestimated at low latitudes, while it underestimated at high latitudes relative to the ML model. In contrast, the MAM4 model overestimated χ o at all latitudes north of 60 • S.  Figure 6a and d. The corresponding value for χ o is 80%. Figure 6b depicts particle population that was sampled from the MAM4 population in Figure 6a. All particles, except for the smallest ones (corresponding to Aitken mode particles), contain BC, which results in the relatively high mixing state index value for χ o . Note that in MAM4 BC is not included in the Aitken mode by definition. Considering the same particle population, but now evaluating the mixing state metric χ c , which 265 quantifies the degree of mixing of primary carbon and other species, yields the following observation. The entire primary carbon mode, by definition, consists of POM and BC, which results in an appreciable number of particles that contain only primary carbon (BC+POM), giving a mixing state index χ c of only 27%.

Interpretation of findings
We now compare the MAM4-sampled particle populations above to particle populations that were sampled from our PartMC scenario library. We searched for populations with similar mass fractions of BC and POM as in the MAM4 populations and that 270 were simulated at a similiar latitude as the grid point location of the CESM2/MAM4 model output. Figure 6c shows that the PartMC results have comparatively more BC-free particles, and Figure 6f shows that comparatively more particles are mixtures of primary carbon and other species. Overall, this means that in MAM4 BC appears too internally mixed (because irrespective of whether BC is placed in the primary carbon or accumulation mode, it is by design mixed with other species) and that at high latitudes the primary carbon mode is not transferring mass to the accumulation mode as quickly as is the case in PartMC 275 simulations.
The reason why MAM4 behaves in this way can be explained by the aging process treatment in MAM4. Aging in MAM4 is formulated using a threshold criteria. That is, BC and POA mass is transferred from the primary carbon mode to the accu-mulation mode when a certain threshold of sulfate and SOA has condensed. In MAM4 this threshold is set to a relatively large value. This is done to prevent BC from being removed too quickly by wet deposition-because the primary carbon mode has 280 a lower hygroscopicity than the accumulation mode and thus a lower wet scavenging efficiency-thereby counteracting a low bias in BC concentrations in the Arctic regions. From Wang et al. (2018) we already know that using such a high threshold may not be appropriate. However, the global model also has biases in other processes that contribute to the low BC bias in the Arctic, and setting the threshold to a high value compensates for these errors. Our results are a reflection of this fact.

Comparison to observational data
The question that arises from Sections 5.1 and 5.2 is of course: Which spatial distribution of aerosol mixing state reflects reality more closely? The validation of simulated mixing state indices with observational data is still challenging since perparticle mass fractions of species are required for calculating the mixing state indices (see Section 3.1). These are in principle obtainable from in situ deployments of single-particle mass spectrometers or by using electron microscopy techniques, but 295 their quantitative derivation comes with challenges and is not routinely done, so that only very few datasets exist that allow for a meaningful comparison (Riemer et al., 2019). Keeping these limitations in mind, Zheng et al. (2021) reported a qualitative comparison of available measurements of mixing state metrics in locations in developed countries (Paris, France; Pittsburgh, USA; various locations in Japan) (Healy et al., 2014;Ye et al., 2018;Ching et al., 2019) with seasonally-averaged results from the ML model based on particle-resolved simulations. This showed that the ML model was able to capture the range of values 300 that is consistent with the observations.
We further compared the ML model estimates using recent observations from China. Specifically, we compared χ ML o and χ MAM4 o with χ values from Taizhou (Zhao et al., 2021) and Beijing (Yu et al., 2020)   We can also relate our χ o index qualitatively to the Single Particle Soot Photometer measurements in the Finnish Arctic 315 during winter 2011-2012 (Raatikainen et al., 2015). Although this study did not provide quantitative mixing state index calculations, it is an important finding that BC-containing particles (with various amounts of coatings) co-existed with BC-free particles. As we saw in Section 5.2, this condition can easily be represented with a particle-resolved approach. However, the modal model with modes configured as in MAM4 puts black carbon in all accumulation-sized particles (Figure 6), which is not consistent with the observations. 320

Conclusions
In this paper we present a framework for evaluating the error in submicron aerosol mixing state induced by aerosol representation assumptions, which is one of the important contributors to structural uncertainty in aerosol models. We quantitatively compared mixing state indices for submicron aerosol predicted by the modal model MAM4 within the global model CESM to a machine-learned model based on high-detail particle-resolved simulations. We focused on the mixing of optically absorb-325 ing and non-absorbing species (χ o ), the mixing of primary carbonaceous with other aerosol species (χ c ), and the mixing of hygroscopic and non-hygroscopic species (χ h ).
For χ o , the MAM4 modal representation generally overestimated the degree of mixing of BC with other aerosol species.
This overestimation is due to the fact that MAM4's choice of modes does not allow for representing BC-free particles in the accumulation and primary carbon modes. This is in contrast to field observations by Brown et al. (2021), which showed that 330 BC and POM may be externally mixed near sources. The implication of this is that, if optical properties are calculated based on the aerosol composition, absorption will be overestimated.
For χ c and χ h , the error tended to be zonally structured, where the MAM4 model overestimated the mixing state indices at low latitudes, and underestimated them at high latitudes, compared to the ML model. This behavior could be explained by modeling choices in MAM4, in particular that (1) BC is always emitted with POM, (2) no BC-free particles exist in the 335 submicron modes, and (3) dust is always internally mixed with other aerosol species.
Mixing state is an important emergent property that affects the aerosol radiative forcing and aerosol-cloud interactions, but it is not easy to constrain this property globally. To the best of our knowledge, this is the first study that evaluated the spatial distribution of aerosol mixing state as predicted by a global model. Since errors in mixing state predictions propagate into errors in aerosol climate impacts, our findings provide a framework and reference for Earth system model developers and users 340 regarding simulation reliability. For example, this framework can be used to (1) quantify model bias in simulating mixing state in different regions, identifying model structural deficiencies, and (2) provide insights into potential improvements of model process representations for a more realistic simulation of aerosols.   To explain how to obtain Eqs. (7) and (8) from Eqs. (2) and (3), let us assume that each mode m contains N m particles, and the number of species in the population is A. The mixing entropy of particle i in mode m, H m,i , is given by The average particle mixing entropy of the entire population (summed over all modes), H α , is Given that each mode is assumed to be internally mixed, particles within the same mode have the same composition, and we With the mode-based H α , the other mixing state quantities can be computed as described in Section 3.2. 360 bc_a1 bc_a4 ncl_a1 ncl_a2 pom_a1 pom_a4 soa_a1 soa_a2 dst_a1 dst_a2 so4_a1 so4_a2 10 4 10 2 10 0 10 4 10 2 10 0 10 2 10 1 10 0 10 6 10 4 10 2 10 3 10 1 10 1 10 3 10 1 10 1 10 3 10 1 10 1 10 7 10 4 10 3 10 1 10 1 10 7 10 5 10 3 10 1 10 1 10 3 10 1 Figure A1. Aerosol species mixing ratio (µg/kg). a1: accumulation mode, a2: Aitken mode, a4: primary carbon mode, bc: black carbon, dst: dust, ncl: sea salt, pom: primary organic matter, soa: secondary organic aerosol, so4: sulfate.
Author contributions. ZZ, MW and NR conceptualized the analysis and wrote the manuscript with input from the co-authors. ZZ developed the code, carried out the simulations, and performed the analysis. LZ, PM, and XL provided scientific suggestions for the manuscript. All authors were involved in helpful discussions and contributed to the manuscript.
Acknowledgements. We would like to acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. The CESM project is supported primarily by the National Science Foundation. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) the State of Illinois, and as of December, 2019, the National Geospatial-Intelligence Agency. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. We also acknowledge funding from DOE grant DE-SC0019192 and NSF grant