Improved representation of the global dust cycle using observational constraints on dust properties and abundance

Even though desert dust is the most abundant aerosol by mass in Earth’s atmosphere, atmospheric models struggle to accurately represent its spatial and temporal distribution. These model errors are partially caused by fundamental difficulties in simulating dust emission in coarseresolution models and in accurately representing dust microphysical properties. Here we mitigate these problems by developing a new methodology that yields an improved representation of the global dust cycle. We present an analytical framework that uses inverse modeling to integrate an ensemble of global model simulations with observational constraints on the dust size distribution, extinction efficiency, and regional dust aerosol optical depth. We then compare the inverse model results against independent measurements of dust surface concentration and deposition flux and find that errors are reduced by approximately a factor of 2 relative to current model simulations of the Northern Hemisphere dust cycle. The inverse model results show smaller improvements in the less dusty Southern Hemisphere, most likely because both the model simulations and the observational constraints used in the inverse model are less accurate. On a global basis, we find that the emission flux of dust with a geometric diameter up to 20 μm (PM20) is approximately 5000 Tg yr−1, which is greater than most models account for. This larger PM20 dust flux is needed to match observational constraints showing a large atmospheric loading of coarse dust. We obtain gridded datasets of dust emission, vertically integrated loading, dust aerosol optical depth, (surface) concentration, and wet and dry deposition fluxes that are resolved by season and particle size. As our results indicate that this dataset is more accurate than current model simulations and the MERRA-2 dust reanalysis product, it can be used to improve quantifications of dust impacts on the Earth system. Published by Copernicus Publications on behalf of the European Geosciences Union. 8128 J. F. Kok et al.: Improved representation of the global dust cycle


Introduction
Desert dust produces a wide range of important impacts on the Earth system, including direct and indirect interactions with radiation, clouds, the cryosphere, biogeochemistry, atmospheric chemistry, and public health (Shao et al., 2011). Despite the importance role of dust in the Earth system, simulations of the global dust cycle suffer from several key deficiencies. For instance, models show large differences relative to observations for critical aspects of 45 the global dust cycle, including dust size distribution, surface concentration, dust aerosol optical depth (DAOD), and deposition flux (e.g., Huneeus et al., 2011;Albani et al., 2014;Ansmann et al., 2017;Wu et al., 2020). Moreover, models struggle to reproduce observed interannual and decadal changes in the global dust cycle over the observational record (Mahowald et al., 2014;Ridley et al., 2014;Smith et al., 2017;Evan, 2018;Pu and Ginoux, 2018), and it remains unclear whether atmospheric dust loading will increase or decrease in response to 50 future climate and land use changes (Stanelle et al., 2014;Kok et al., 2018).
One key reason that models struggle to accurately represent the global dust cycle and its sensitivity to climate and land use changes is that dust emission is a complex process for which the relevant physical parameters vary over short distances of about 1 m to several km (Okin, 2008;Bullard et al., 2011;Prigent et al., 2012;Shalom et al., 2020). As such, large-scale models with typical spatial resolutions on the order of 100 km are fundamentally ill-55 equipped to accurately simulate dust emission. Confounding the problem is the non-linear scaling of dust emissions with near-surface wind speed above a threshold value (Gillette, 1979;Shao et al., 1993;Kok et al., 2012;Martin and Kok, 2017). As such, dust emissions are especially sensitive to variations in both wind speed and the soil properties that set the threshold wind speed. Despite some recent progress, accounting for the effect of sub-grid-scale wind variability on dust emissions remains a substantial challenge that causes the simulated global dust cycle to be 60 sensitive to model resolution (Cakmur et al., 2004;Comola et al., 2019), especially at low resolution (Ridley et al., 2013). An even more substantial challenge for models is likely the small-scale variability of vegetation Okin, 2008), surface roughness (Menut et al., 2013), soil texture (Laurent et al., 2008;Martin and Kok, 2019), mineralogy (Perlwitz et al., 2015a), and soil moisture (McKenna Neuman and Nickling, 1989;Fécan et al., 1999). These and other soil properties control both the dust emission threshold, and the intensity of dust emissions 65 once wind exceeds the threshold (Gillette, 1979;Shao, 2001;Kok et al., 2014b). Models lack accurate highresolution data sets of pertinent soil properties, which also limits the use of dust emission parameterizations that incorporate the effect of these soil properties (e.g., Darmenova et al., 2009). As a result of these fundamental challenges in accurately representing dust emission, most models use both a source function map (Ginoux et al., 2001) and a global dust emission tuning constant to produce a global dust cycle that is in reasonable agreement with 70 measurements (Cakmur et al., 2006;Huneeus et al., 2011;Albani et al., 2014;Wu et al., 2020).
A second key problem limiting the accuracy of model simulations of the global dust cycle is the challenge to adequately describe dust properties such as dust size, shape, mineralogy, and optical properties. All these dust properties have recently been shown to be inaccurately represented in many models (Kok, 2011b;Perlwitz et al., 2015b;Pérez Garcia-Pando et al., 2016;Ansmann et al., 2017;Di Biagio et al., 2017;Di Biagio et al., 2019;Adebiyi 75 and Huang et al., 2020). These and other dust properties are difficult to describe accurately in models because parameterizations are not always kept consistent with up-to-date experimental and observational constraints. In addition, models need to use fixed values for such physical variables and thus can only represent the uncertainties inherent in such constraints through computationally expensive perturbed parameter ensembles (Bellouin et al., 2007;Lee et al., 2016). 80 The nature of these challenges in accurately representing the global dust cycle is such that they are difficult to overcome from advances in modeling alone (e.g., Stevens, 2015;Kok et al., 2017;. We therefore develop a new methodology to obtain an improved representation of the present-day global dust cycle. We present an analytical framework that uses inverse modeling to integrate observational constraints on dust abundance and dust properties with an ensemble of global model simulations. Our procedure finds the emissions from different 85 major source regions and particle size ranges that best match simultaneous observational constraints on the dust size distribution, extinction efficiency, and regional dust aerosol optical depth. This methodology propagates uncertainties in these observational constraints and due to the spread in simulations in the model ensemble. As such, our approach mitigates the consequences of the fundamental difficulty that models have in representing the magnitude and spatiotemporal variability of dust emission, and in representing the properties of dust and the 90 uncertainty in those properties. Moreover, whereas the assimilation of observations in reanalysis products creates inconsistencies between the different components of the dust cycle (i.e., emission, loading, and deposition are not internally consistent), our framework integrates observational constraints in a self-consistent manner.
We detail our approach in Sect. 2, after which we summarize independent measurements used to evaluate our representation of the global dust cycle in Sect. 3, and present results and discussion in Sections 4 and 5. We find that 95 our procedure results in a substantially improved representation of the Northern Hemisphere global dust cycle and modest improvements for the Southern Hemisphere. We provide a data set representing the global dust cycle in the present climate (2004)(2005)(2006)(2007)(2008)) that is resolved by particle size and season. Because comparisons against independent measurements indicate that this data set is more accurate than obtained by an aerosol reanalysis product and by a large number of climate and chemical transport model simulations, this data set can be used to obtain more accurate 100 quantifications of the wide range of dust impacts on the Earth system.

Methods
We seek to obtain an improved representation of the global dust cycle by integrating observational and modeling constraints on dust properties and abundance with an ensemble of simulations of the spatial distribution of dust emitted from different source regions. We achieved this with an analytical framework that uses optimal estimation to 105 determine how many units of dust loading from different size ranges and main source regions are required to maximize agreement against observational constraints on the dust size distribution and dust aerosol optical depth near source regions (see Fig. 1). We then compare the results against independent measurements of dust surface concentration and deposition flux (Sect. 3.1). Although our methodology can be considered inverse modeling in that it inverts observational constraints to inform a model, the methodology used here differs substantially from standard 110 inverse modeling studies used in atmospheric and oceanic sciences (e.g., Bennett, 2002;Dubovik et al., 2008;Escribano et al., 2016;Brasseur and Jacob, 2017;Chen et al., 2019). We summarize the methodology in the next few paragraphs and then describe each step in detail in the sections that follow. companion paper, to be submitted soon. The subscripts r, s, and k respectively refer to the originating source region, the season, and a model's particle size bin. Other variables are defined in the main text and the glossary. 120 We first divided the world into nine major source regions (Fig. 2a), and obtained an ensemble of global model simulations of how a unit of dust mass loading (1 Tg) of different particle sizes from each of these source regions is distributed across the atmosphere (Sect. 2.1). We then used constraints on the globally averaged dust size distribution  and the size-resolved dust extinction efficiency  to determine the column-integrated dust aerosol optical depth produced by a single unit of bulk dust loading (1 Tg) from each 125 source region (Sect. 2.2). Then, we used an inverse model to determine the optimum number of units of loading that must be generated by each source region to best match joint observational-modeling constraints on the DAOD for fifteen regions (Fig. 2b) near major dust sources (Sect. 2.3). The calculations in Sections 2.2 and 2.3 are performed iteratively, because the fractional contribution to global dust loading from each source region affects the agreement against the constraint on the globally averaged dust size distribution. Since we have more regional DAOD 130 constraints than we have source regions, the problem is over-constrained, allowing for lower uncertainties on our results.
We summed the optimal dust loadings of the nine source regions to obtain the main properties of the global dust cycle, resolved by particle size, season, and location. Specifically, we obtained the dust emission flux, loading, concentration, deposition flux, and DAOD (Sect. 2.4), which we added to the Dust Constraints from joint 135 Experimental-Modeling-Observational Analysis (DustCOMM) dataset . Throughout these calculations, we used a bootstrap procedure to propagate uncertainties in the observational constraints on dust properties and abundance, and uncertainties due to the spread in our ensemble of model simulations of the spatial distributions of a unit of dust loading, concentration, and deposition (Sect. 2.5).
Our methodology uses a large number of variables, which are all listed in the glossary for clarity. To further help 140 distinguish between different variables, we denote input variables obtained directly from global model simulations with the accent "~" (yellow boxes in Fig. 1). These fields are seasonally averaged and either two-dimensional (2D; , ) or three-dimensional (3D; , , ), where , , and P respectively denote longitude, latitude, and the vertical pressure level (see Table 1). Moreover, all model fields are "normalized", meaning that they represent values produced per unit (1 Tg) of global loading of dust in a given particle size bin k from a given source region r and for 145 a given season s (seasons are taken as December-January-February (DJF), March-April-May (MAM), June-July-August (JJA), and September-October-November (SON)), We further use the accent "-" to denote an observational constraint on dust properties or dust abundance (blue boxes in Fig. 1). These include constraints on the globally averaged dust size distribution ( � atm ( ) ), the size-resolved extinction efficiency ( � ext ( )), and the regional DAOD ( ̅ ). All these fields have a quantified uncertainty, which we propagated through our analysis using the bootstrap 150 procedure discussed in Sect. 2.5. Finally, the accent " _ " denotes a product that results from our analysis, such as the 3D dust concentration, resolved by particle size and season (white and green boxes in Fig. 1). Such variables are thus generated by combining normalized model simulations with observational constraints on the dust size distribution, size-resolved extinction efficiency, and the DAOD near source regions.

Dividing the world into nine main source regions 155
The first step in our methodology is to divide the world into its major source regions. Most dust is emitted from the so-called "dust belt" of Northern Africa, the Middle East, Central Asia, and the Chinese and Mongolian deserts (Prospero et al., 2002). In addition, dust is emitted in smaller quantities from Australia, South Africa, and North and South America. Correspondingly, we divided the world into nine source regions that together account for the overwhelming majority (>99%) of desert dust emissions simulated in models (Fig. 2a). Our analysis includes both 160 natural and anthropogenic (land use) emissions of dust in those source regions, because our analysis is based on observations that by nature integrate both (but see further discussion in Sect. 5.1). However, our analysis explicitly does not include high-latitude dust sources, which produce dust through different mechanisms and with different properties than desert dust, yet likely dominate the dust loading for some high latitude regions Bullard et al., 2016;Tobo et al., 2019;Bachelder et al., 2020). The nine source regions partially follow the definition 165 in Mahowald (2007), with the main difference that we divided the North African source region, which accounts for ~half of global dust emissions (Wu et al., 2020), into western North Africa, eastern North Africa, and the Sahel. Similar dust source regions were also used in more recent studies (Ginoux et al., 2012;Di Biagio et al., 2017).   Table 2. Symbols in (c) and (d) denote groupings of observations by different regions. Made with Natural Earth.
We use an ensemble of global chemical transport and climate models (see Table 1) to obtain simulations of emission, transport, and deposition of dust from each of the nine source regions. Specifically, we use simulations 180 from the Community Earth System Model (CESM; Hurrell et al., 2013;Scanza et al., 2018), IMPACT , ModelE2 (Miller et al., 2006;Kelley et al., 2020), GEOS/GOCART (Rienecker et al., 2008;Colarco et al., 2010), MONARCH (Perez et al., 2011;Badia et al., 2017), and INCA/IPSL-CM6 . These six models were forced with three different reanalysis meteorology data sets (Table 1), which helped sample the uncertainty due to the exact reanalysis meteorology used, which past work indicates is substantial (Largeron et al., 185 2015;Smith et al., 2017;Evan, 2018). Most of the six models were ran for the years 2004-2008, or a subset thereof, to coincide with the analysis period of regional DAOD in Ridley et al. (2016), which provided most of observational DAOD constraints used in this study (see Table 1). Each model either ran a separate simulation for each source region, or used "tagged" dust tracers from each source region. The exact set-up of each model is described in the supplement. 190 Our inverse model uses several results derived from model simulations (Fig. 1). First, for each model we obtained the "normalized" seasonally averaged column loading ̃, , ( , ), which is the spatial distribution of a unit (1 Tg) of loading originating from source region r, for season s and particle size bin k. As such, the units of this field are m -2 (Tg m -2 loading per Tg of loading from source r), and we show annual averages of the normalized bulk dust loading for each model and source region in Fig. S1. Additionally, we obtained the "normalized" 3D concentration 195 (m -3 ) ( � , , ( , , ), units of m -3 ) and the 2D dust emission ( � , , ( , ), units of m -2 yr -1 ) and (dry and wet) deposition fluxes ( � , , ( , ), units of m -2 yr -1 ) that are associated with a unit of global dust loading for each source region, season, and particle size bin. All model fields were regridded using a modified Akima cubic Hermite interpolation (Akima, 1970) to a common resolution of 1.9° latitude by 2.5° longitude and with 48 vertical levels (see  for further details). 200 We restricted our analysis to dust with diameter D ≤ Dmax = 20 µm because there are insufficient measurements to constrain the abundance of coarser dust particles in the atmosphere  the few measurements that have been made of dust with D > 20 µm suggest that it is abundant over and near source regions such as North Africa and accounts for a non-negligible fraction of shortwave and longwave extinction (Ryder et al., 2019). As such, more measurements of "giant" dust are needed, which would allow the analysis 205 presented here to be extended to larger particle sizes in the future. Since some of the models in our ensemble do not account for dust with D up to 20 µm, we use the procedure in  see their section 2.3.1) to extend these models to 20 µm. In particular, we use the normalized 12 -20 µm particle size bin simulated by the GEOS/GOCART model to estimate what the CESM and GISS ModelE2 models would have simulated for an additional particle size bin extending to 20 µm (see Table 1). We chose this bin specifically from the 210 GEOS/GOCART model because it shows the best agreement against the observational constraint on regional DAOD (see Fig. 3). *Denotes an additional bin added to the original model output in order to extend the particle diameter range to Dmax = 20 μm. This additional bin was derived from the GEOS/GOCART 12-20 μm particle size bin (see main text). 215 **All model fields were regridded to a common resolution of 2.5° longitude by 1.9° latitude.

Constraining the spatially resolved DAOD corresponding to a unit (1 Tg) of bulk dust loading
We next implemented an inverse model to determine the optimal bulk dust loading that must be generated by each source region to produce the best match against constraints on regional DAOD. This inverse model thus requires the spatial pattern of DAOD produced per unit bulk dust loading from each source region, which is the Jacobian matrix 220 of DAOD with respect to dust loading. We obtained this DAOD produced per unit (1 Tg) of bulk dust loading by combining the simulated distributions of a unit of size-resolved dust loading (̃, , ( , )) with constraints on the globally averaged dust size distribution and extinction efficiency . The calculations of the Jacobian matrix (this section) and the optimal bulk loading per source region (next section) are performed iteratively, because each source region's fractional contribution to global dust loading from affects the 225 agreement against the constraint on the globally averaged dust size distribution.
The DAOD produced per unit of bulk dust loading originating from source region r in season s is  , ( , ) =̆, where � , is the globally integrated bulk dust loading generated by source region r in season s, ̆, ( , ) is the spatial distribution of DAOD due to dust from source region r in season s, , is the Jacobian matrix (Tg -1 ) of ̆, with respect to � , , bins is the number of particle size bins in a global model simulation (or derived from the 230 simulated modes), ̅ is the size-dependent mass extinction efficiency (m 2 /g) of particle size bin k and is defined further below, ̃, , ( , ) (m -2 ) is the simulated seasonally averaged spatial distribution of a unit of dust loading from source region r and particle bin k, and ̆, , (unitless) is the fractional contribution of dust loading in size bin k to the seasonally averaged global dust loading generated by source region r (i.e., ∑̆, , = 1). As such, Eq. (1) obtains the DAOD produced per unit dust loading from a given source region and season by adding up the 235 https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. normalized spatial distributions of the loading from each particle size bin, in proportion to each bin's contribution to the globally integrated loading produced by the source region, and then multiplying the size-resolved loading by the mass extinction efficiency (MEE) to obtain the DAOD.
To obtain the Jacobian matrix in Eq. (1) we need to obtain ̆, , , each particle bin's fractional contribution to the globally integrated dust loading generated by source region r in season s. Because models as a group underestimate 240 the mass of particles with larger diameters (D > ~5 µm; Kok et al., 2017), we adjust the model size distribution to match a constraint on the globally averaged dust size distribution derived from a combination of observations and models . This procedure retains regional differences in the atmospheric dust size distribution that models simulated for the different source regions, while forcing the globally averaged dust size distribution that results from the summed contributions from all source regions to match the constraint on the 245 globally averaged dust size distribution. That is, where ̃, , is the modeled mass fraction per particle size bin for a given source region r and season s, and is the global correction factor for particle size bin k, which is different for each model. We obtained by setting the fraction of atmospheric dust in particle size bin k, summed over all source regions and seasons, equal to the constraint on the fractional contribution of particle size bin k to the global dust loading from Adebiyi and Kok 250 (2020). That is, where Nsreg = 9 is the number of source regions (Fig. 2a) and � atm ( ) is a realization of the size-normalized (that is, where Dmax = 20 µm) globally averaged volume size distribution from , which was obtained by combining dozens of in situ measurements of dust size distributions with an ensemble of climate model simulations. Further, Dk-and Dk+ are respectively the lower and upper diameter limits of particle size 255 bin k, and � , is the globally integrated and seasonally averaged bulk dust loading per source region (as obtained from the analysis below). As such, the denominator in Eq. (3) denotes the simulated globally averaged mass fraction, whereas the numerator denotes the globally averaged mass fraction in particle size bin k as constrained from in situ measurements and model simulations by .
The final ingredient needed to use Eq. (1) to obtain the DAOD produced by a unit (1 Tg) bulk dust loading from a 260 given source region and season is the MEE ( ̅ ). We do not use each model's assumed MEE because these tend to be substantially biased compared to measurements . This bias is largely due to a neglect or underestimation of the asphericity of dust (Huang et al., 2020), which increases the surface-to-volume ratio and thereby enhances the MEE by ~40 % . We thus follow Kok et al. (2017) in obtaining the MEE from constraints on the dust size distribution and the extinction efficiency of randomly-oriented (Ginoux, 2003;265 Bagheri and Bonadonna, 2016) aspherical dust. That is, where � ext ( ) is a realization of the globally averaged size-resolved extinction efficiency from the analysis of Kok et al. (2017), which is defined as the extinction cross-section divided by the projected area of a sphere with diameter D ( 2 /4). The term � atm ( ) inside the integrals approximates the sub-bin distribution in particle size bin k as the globally averaged dust volume size distribution. Further, ̅ = (2.5±0.2)×10 3 kg m -3 is the globally averaged density 270 of dust aerosols (Fratini et al., 2007;Reid et al., 2008;Kaaden et al., 2009;Sow et al., 2009). This observationally constrained density of dust is lower than the 2600 to 2650 kg m -3 used in many models (Tegen et al., 2002;Ginoux et al., 2004), most likely because dust aerosols are aggregates with void space that lowers their density below that of individual mineral particles.

Constraining the bulk dust loading generated by each source region 275
The above procedure combined model simulations of the 2D spatial variability of size-resolved dust loading with constraints on dust size distribution and MEE. This procedure yielded the spatial distribution of DAOD that is produced by a unit (1 Tg) of dust loading from a given source region and season. Next, we use an inverse modeling approach to determine the number of Tg of loading needed from each source region to produce optimal agreement against constraints on the seasonal DAOD over areas proximal to major dust source regions. 280 We use joint observational-modeling constraints on regional DAOD at 550 nm from Ridley et al. (2016). This study used three different satellite AOD retrievals -from the Multi-angle Imaging Radiometer (MISR) and the Moderate Resolution Imaging Spectroradiometer (MODIS) onboard the Terra and Aqua satellites -and bias-corrected those satellite data using more accurate ground-based aerosol optical depth measurements from AERONET. Ridley et al. (2016) then used an ensemble of global model simulations to obtain the fraction of AOD that is due to dust in 15 285 regions for which AOD is dominated by dust. Ridley et al. (2016) thus leveraged the strengths of these different tools by combining the accuracy of ground-based measurements with the global coverage of satellite retrievals and the ability of models to distinguish between different aerosol species. Furthermore, by averaging the resulting DAOD over large areas and long time periods (2004)(2005)(2006)(2007)(2008) for each season), this study minimized representation errors that can affect model comparisons to data (Schutgens et al., 2017). An additional strength of the Ridley et al. 290 (2016) analysis is that it propagates a range of uncertainties that are both observationally and modeling based, and which we propagate into our own analysis (see Sect. 2.5). Nonetheless, the Ridley et al. (2016) data are subject to some important limitations discussed further in Sect. 5.1.
Although we consider the Ridley et al. (2016) constraints on DAOD to be more accurate than constraints from individual satellite products, AERONET data, or aerosol reanalysis products, this study's results for the Southern 295 Hemisphere (SH) are susceptible to substantial biases. This is because dust makes up a substantially lower fraction of total AOD in the SH than for the main Northern Hemisphere (NH) source regions (e.g., Fig. S2 in Kok et al., 2014a). Therefore, we did not use the Ridley et al. (2016) results for the SH and instead used the seasonally averaged DAOD estimated by  over the three SH regions. These DAOD constraints are based on an ensemble of four aerosol reanalysis products, namely the Modern-Era Retrospective analysis for Research and 300 Applications, Version 2 (MERRA-2; Gelaro et al., 2017), the Navy Aerosol Analysis and Prediction System (NAAPS; Lynch et al., 2016), the Japanese Reanalysis for Aerosol (JRAero; Yumimoto et al., 2017), and the Copernicus Atmosphere Monitoring Service (CAMS) interim Reanalysis (CAMSiRA; Flemming et al., 2017). The resulting regional DAOD product also includes an error estimation based partially on the spread in DAOD in the four reanalysis products. In addition, we added a region over North America, for which Ridley et al. (2016) did not 305 obtain results, and for which we also use the reanalysis-based results of . In total, we thus have constraints with error estimates on the seasonal and area-averaged DAOD over 15 regions (see Fig. 2b and Table 2). We then used an inverse modeling approach to determine the optimal combination of dust loadings from the nine source regions (denoted with subscript r) that minimizes the disagreement against the DAOD constraint of these 15 observed regions (denoted with subscript p) for each season. We thus need to account for the contribution of each of 315 the nine source regions ( Fig. 2a) to the DAOD in each of these 15 observed regions. The seasonally averaged DAOD over the observed region p is where ̅ is the DAOD averaged over observed region p and season s, and , (Tg -1 ) is the Jacobian matrix of ̆, with respect to � , , where ̆, denotes the area-averaged and seasonally averaged DAOD over observed region p that is produced by dust from source region r. The Jacobian matrix , is the area-weighted DAOD over observed 320 region p that is produced per unit of bulk dust loading originating from source region r in season s. We obtain , by integrating Eq. (1) over Ap, the area of the observed region p (Table 2), The seasonally averaged globally integrated dust loading generated by each source region ( � , ) is thus determined from the number of units of dust loading from each source region r that results in best agreement against the constraint on DAOD ( ̅ ) over the 15 observed regions. Eq. (5) thus represents a system of equations for each 325 simulation in our global model ensemble, which we can write in explicit matrix form for clarity, We used Eq. (7) to obtain the seasonally averaged global dust loading generated by each source region. Specifically, for each season s we used the simplex search optimization method (Lagarias et al., 1998) to determine the nine values of � , that minimize the summed squared deviation ( 2 ) between the 15 DAOD constraints and the corresponding regional DAOD calculated from Eq. (7). That is (e.g., Cakmur et al., 2006), 330 where Nτ,reg = 15 and τ,reg = 9. Because the variables in Eqs.

Obtaining constraints on DAOD, emission, loading, deposition, and concentration
After constraining the seasonal dust loading � , generated by each source region, we now obtain the 2D DAOD and the size-resolved dust loading, emission and deposition fluxes, and 3D concentration. We do so by using the fact that 335 other dust cycle components (DAOD, concentration, deposition) scale linearly with dust loading because our model https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.
simulations are driven by reanalysis products (Table 1), such that dust does not impact the meteorology. Each dust field can therefore be obtained by multiplying the simulated normalized dust field (e.g., seasonal dust concentration per unit dust loading) by the number of units of dust loading per source region and season ( � , ).
The 2D DAOD is then: 340 The size-resolved and bulk dust loadings are respectively: , and (10) Similarly, the 3D size-resolved and bulk concentrations produced by each source region are: where P is the vertical pressure level. And the size-resolved and bulk emission fluxes are: Finally, the size-resolved and bulk deposition fluxes are: See the glossary for further descriptions of each variable. In our companion paper, to be submitted soon, we further 345 partition these fields into the originating source region.

Improved model and inverse model results with uncertainty
The results represented by Eqs. (9)-(17) require realizations of the various inputs ( Fig. 1), which include both model fields and constraints on dust properties and abundance. Because each of these inputs is uncertain, and as such is represented by a probability distribution, we obtained two products that sample different aspects of this uncertainty 350 of the inputs, namely "improved model" results and "inverse model" results.
First, we obtained "improved model" results by sampling over different realizations of observational constraints on dust properties and abundance but using the output of only a single model. That is, we solved Eqs.
(1)-(17) a large number of times (100; limited by computational resources), and for each iteration we drew a random realization of each of the observational constraints but used simulation results from a single model. This procedure thus includes a 355 random drawing of realizations of the globally averaged dust size distribution ( � atm ( ) ), the extinction efficiency ( � ext ( )), the particle density ( ̅ ), and the observed regional DAOD ( ̅ ). As such, the improved model results represent output from a single model (see Table 1) for which DAOD is calculated from loading using the observational constraint on extinction efficiency (Eq. (4)), and for which the contributions from different source regions and particle bins are added in such a way to simultaneously match observational constraints on the dust size 360 distribution (Eq. (2)) and DAOD (Eq. (8)).
Second, we obtained our main product, namely the "inverse model" product that represents the optimal representation of the global dust cycle. We obtained this product by similarly sampling over different realizations of the input fields, but now including a random drawing of one of the six global model simulations in each of the bootstrap iterations. This additional step propagates uncertainty in model predictions of the normalized size-resolved 365 https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. dust loading, concentration, and deposition fields into our results (Eqs. (9)- (17)). Because different models use different particle size bins (Table 1), we convert the size-resolved results from each bootstrap iteration to common particles size bins of 0.2-0.5, 0.5-1, 1-2.5, 2.5-5, 5-10, and 10-20 µm. We do so by assuming that sub-bin distributions follow the constraint on the globally averaged dust loading (Fig. 1). This assumption will introduce some further error in size-resolved results. For both the inverse model and improved model products, we retained 370 only those bootstrap iterations that produced a root-mean-squared error of less than 0.05 relative to the DAOD constraints; this quality control retained approximately three quarter of the iterations.
In drawing the realizations of seasonally averaged observed DAOD ( ̅ ), we need to account for correlations of errors between different seasons and regions. Specifically, some of the errors in the calculation of the DAOD in Ridley et al. (2016) and  are systematic, such as errors in satellite retrieval algorithms and 375 systematic model errors in simulations of (dust and non-dust) aerosols. These errors are thus at least partially correlated between seasons and regions, although we cannot establish the exact degree of correlation. We can thus roughly divide the errors into three different categories: errors that are completely random between seasons and regions, systematic errors that are correlated between different seasons for the same region, and systematic errors that are correlated across regions for a given season. The sum of the squared contributions of these three errors 380 equals the square of the total error � reported in Table 2. Since we cannot determine what the relative contribution of each of these three types of errors is, we assume that the contribution of each of these three errors is equal. Although the uncertainty in our results as quantified from the bootstrap procedure increases if a larger fraction of the DAOD error is assumed to be systematic, the median results presented in Sect. 4 are not sensitive to the partitioning of this error. The details of the mathematical treatment for calculating these errors are provided in the Supplement. 385 The bootstrap procedure used in the inverse model product propagates all the quantified random and systematic errors present in the inputs. Nonetheless, it cannot account for systematic biases in these inputs, such as the tendency of models to underestimate coarse dust lifetime van der Does et al., 2018;. As such, the obtained uncertainty ranges should be interpreted as a lower bound on the actual uncertainty.

Comparison of inverse model results against independent measurements and model simulations 390
We evaluate the results of the inverse model described in the previous section using independent measurements of dust surface concentration and deposition fluxes (Sect. 3.1). We also compare the inverse model results against the ensemble of AeroCom Phase 1 global dust cycle simulations (Huneeus et al., 2011) and the MERRA-2 dust product (Sect. 3.2).

Independent dust measurements used to evaluate the inverse model 395
We use two sets of independent measurements to evaluate the ability of the inverse model to reproduce the global dust cycle. The first data set is a compilation of dust surface concentration measurements. Of the 27 total stations in this compilation, 22 are measurements of the bulk dust surface concentration taken in the North Atlantic from the Atmosphere-Ocean Chemistry Experiment (AEROCE; Arimoto et al., 1995) and taken in the Pacific Ocean from the sea-air exchange program (SEAREX; Prospero et al., 1989) for observation periods noted in Table 2 of Wu et  400 al. (2020). These data were obtained by drawing large volumes of air through a filter. To reduce the effects of anthropogenic aerosols, measurements were only taken when the wind was onshore and in excess of 1 m/s (Prospero et al., 1989). The mineral dust fraction of the collected particulates was determined either by burning the sample and assuming the ash residue to represent the mineral dust fraction, or from their Al content (assumed to be 8% for mineral dust, corresponding to the Al abundance in Earth's crust) (Prospero, 1999). Note that since these 405 measurements were taken during the period 1981-2000, the dust surface concentration "climatology" obtained from these measurements is for a different time period than that of the model simulations used in the inverse model (Table  1).
Since most of the AEROCE and SEAREX stations are located far downwind of source regions, we also added a data set of dust surface concentration in the Sahel from the African Monsoon Multidisciplinary Analysis (AMMA; Lebel 410 et al., 2010;Marticorena et al., 2010). This data set contains measurements over 5-10 years of the surface concentration of aerosols with aerodynamic diameter ≤ 10 µm (PM10,aer) at four stations in the western Sahel (M'Bour, Bambey, Cinzana, and Banizoumbou; see http://www.lisa.u-pec.fr/SDT/). As with the AEROCE and https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.
SEAREX data sets, only measurements were used for which the wind direction was predominantly coming from dust-dominated regions. As such, these measurements have at least two systematic errors: (i) the AMMA data 415 reported the concentration of all particulate matter, so taking these measurements as being of dust concentration overestimates the true dust concentration, and (ii) measurements taken when wind was not coming from a dustdominated region were omitted, which could also cause an overestimation of the dust concentration. To mitigate the effect of this second error, we only use seasonally averaged dust concentrations for which >70% of data was retained. This resulted in the omission of the Winter and Spring seasons at the Bambey station. 420 Following Huneeus et al. (2011) and Wu et al. (2020), we additionally added surface concentration measurements of PM10,aer dust from a long-term (May 1995-December 1996 filter-based deployment in Jabiru, in northern Australia (Vanderzalm et al., 2003). However, unlike Huneeus et al. (2011) and Wu et al. (2020), we do not use data obtained using a similar methodology in Rokumechi (Zimbabwe) because most of the dust at this site originated locally from within and near the national park where the station was located (p. 2649 in Nyanganyura et al., 2007). 425 To use the measurements of PM10,aer dust in Jabiru and the Sahel, we obtained the PM10,aer dust concentration for those models with size-resolved surface concentrations, namely the inverse model and each model in our ensemble. We did so by first obtaining the geometric diameter that corresponds to an aerodynamic diameter of 10 µm, which is DPM10aer = aer • 10 μm = 6.8 μm. This uses the conversion factor aer = 0.68 from Huang and Kok (2018), who accounted for the effects of particle shape (Huang et al., 2020) and density to link the aerodynamic and geometric 430 diameters. For each model, we then summed the contributions from particle bins with diameters smaller than DPM10aer and used a correction factor cPM10,aer for particle size bins that straddle DPM10aer. This correction factor uses the result from  that the globally averaged dust size distribution ( ) is approximately constant in the range of 5 -20 µm, such that the fractional contribution to the PM10,aer concentration of a bin that straddles DPM10aer can be approximated as 435 where − and + are respectively the lower and upper limits of the particle size bin that straddles 10 µm aerodynamic diameter (D = 6.8 µm).
The second independent data set that we used to evaluate the inverse model results is a compilation (110 stations) of the deposition flux of dust with geometric diameter ≤ 10 µm (PM10) from Albani et al. (2014). This study merged data from previous data sets (Ginoux et al., 2001;Tegen et al., 2002;Lawrence and Neff, 2009;Mahowald et al., 440 2009) and adjusted these data to cover the 0.1 -10 µm geometric diameter range. We obtained the PM10 deposition flux for the inverse model, the MERRA-2 data, and for each model in our ensemble following the approach above for the PM10,aer concentration data. Note that we cannot correct the concentration and deposition flux of the AeroCom Phase I models (next section) to the PM10,aer and PM10 size ranges because of a lack of size-resolved simulation data. We thus used the bulk concentration and deposition fluxes as many of these models simulated the 445 PM10 size range (see Table 3 in Huneeus et al., 2011).

Comparison of inverse model results against AeroCom models and MERRA-2
In order to compare the inverse model's representation of the global dust cycle against climate and chemical transport model simulations, we used the results of an ensemble of simulations for which the prognostic dust cycles were analyzed in detail, namely the AeroCom Phase I simulations of the dust cycle in the year 2000 (Huneeus et al., 450 2011). As such, the AeroCom simulations were obtained for a year closer to the time period in which most concentration and deposition measurements were taken (see above). We do not use newer AeroCom Phase II and Phase III simulations because only the dust component of Phase I models has been analyzed in detail. We furthermore also do not use recently analyzed dust cycle results from CMIP5 models (Pu and Ginoux, 2018;Wu et al., 2020) because less than half of CMIP5 models with prognostic dust cycles reported total deposition fluxes, 455 which are needed for the analyses against measurements (see previous section). In addition, many CMIP5 models did not include a prognostic dust cycle and instead read in pre-calculated dust emissions (Lamarque et al., 2010). But note that CMIP5 model errors against measurements are similar to those for AeroCom models and those for our model ensemble (e.g., compare Figs. 8 and 9 in Wu et al. (2020) against Figs. S9, S10, S12, and S13).
We analyzed the AeroCom Phase 1 model results to obtain the seasonally and annually averaged DAOD at 550 nm, 460 the dust surface concentration, and the annually averaged total (wet and dry) deposition fluxes for comparisons against measurements and the inverse model results. We also obtained the globally integrated annually averaged dust emission flux, dust loading, and DAOD. We obtained these variables for each of the 13 AeroCom simulations available from the online AeroCom database (see https://aerocom.met.no/; this repository does not contain the 14 th model simulation analyzed in Huneeus et al., from the ECMWF model, which is thus omitted here). 465 We also analyzed the MERRA-2 dust product (Gelaro et al., 2017) in order to compare the inverse model's representation of the global dust cycle against a leading aerosol reanalysis product. We obtained the same variables from the MERRA-2 data as from the AeroCom data, except that we analyzed the MERRA-2 data for the years 2004-2008 to coincide with the regional DAOD constraints (Table 2).
We quantified the agreement of the various models against measurements using Taylor diagrams (Taylor, 2001) and 470 by the correlation coefficients, bias, and root-mean-squared errors (RMSE). Because the surface concentration and deposition flux measurements span several orders of magnitude, their RMSEs are calculated in log-space. We furthermore quantified overall model agreement against measurements by calculating the normalized error Φ against the available data for each hemisphere: where n and m index the different models, which include the inverse model, MERRA-2, the 6 model ensemble 475 members, and the 13 AeroCom models, such that Nmodel = 21. Further, S denotes the RMSE of a model simulation with the DAOD (subscript ), surface concentration (subscript conc), and deposition flux (subscript dep) data sets on the annual timescale. These data are split into data sets for the Northern Hemisphere (superscript NH) and Southern Hemisphere (superscript SH). For the SH, there are no accurate observational constraints on DAOD available (see Sect. 2.3) so we calculate the error relative to only the surface concentration and deposition flux data sets. Note that 480 Φ is defined such that Φ = 1 implies that a model is average among the 21 models in reproducing the global dust cycle. The lower Φ is, the more accurately it reproduces measurements and observations of the various aspects of the global dust cycle.

Results
We first evaluate our methodology by verifying that the inverse model obtains improved agreement against the 485 observed regional DAOD used in the inverse model (Sect. 4.1). We then obtain the predictions of the inverse model for the main properties of the global dust cycle, namely DAOD, dust emission, dust column loading, dust surface concentration, and dust deposition flux (Sect. 4.2). Subsequently, we evaluate whether the integration of observational constraints on dust properties and abundance indeed yields an improved representation of the global dust cycle by comparing our results against independent measurements and observations in the NH (Sect. 4.3.1) and 490 the SH (Sect. 4.3.2).

Evaluation of inverse model results against observed regional DAOD
To verify the viability of our methodology, we first compare the inverse model's DAOD against the observationally constrained seasonal DAOD of 15 regions (  Fig. 2a), and which shows a decrease in the RMSE of a factor of approximately three to five, depending on the season. Note that the DAOD in the mid-Atlantic region is nonetheless systematically underestimated by both the models in our ensemble and the inverse model. This is a common problem in models that is likely in part due to too fast removal in models (Ridley et al., 2012;Yu et al., 500 2019). The RMSE over the relatively minor dust source regions of North America, Australia, South America, and Southern Africa is similarly reduced by about a factor of five. For the East Asia and Middle East & Central Asia regions, the decrease in RMSE is about a factor of one-and-a-half to two. This relatively smaller decrease in the RMSE likely occurs because we used only one source region each for both these relatively extensive source regions. Consequently, our procedure is unable to eliminate some biases of the model ensemble in these regions, such as an 505 underestimation of DAOD in the Thar desert, which could be due to model underestimations of emissions in this region (Shindell et al., 2013). Future work could thus improve upon our results by using more source regions to better constrain the contributions of the Middle East and Asian source regions to the global dust cycle.
Overall, our procedure achieves a substantial reduction of the total DAOD error summed over the fifteen regions, reducing the RMSE by over a factor of two, from 0.092 to 0.041. This reduction in error is expected, as our 510 methodology minimized the error against these regional DAOD data. Moreover, we find that the reduced chi squared statistic, which is of order 1 for a model that captures observations within the uncertainties (Bevington and Robinson, 2003), is indeed less than 1 for all seasons except boreal Spring. This implies that our methodology results in good agreement with the observational DAOD constraints. Further, the ability of the inverse model to reproduce the spatial pattern of DAOD is also substantially improved relative to both the six models in the model 515 ensemble and the AeroCom Phase I models (Figs. 3e, f), and is similar to that of the MERRA-2 dust product, which assimilates satellite and ground-based AOD observations (Buchard et al., 2017;Gelaro et al., 2017;Randles et al., 2017).
https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. in the model ensemble (brown numbers, as for panels a-d), the six improved model results (green numbers with a prime), and the inverse model results (blue star). The horizontal axis shows the standard deviation of the data set or model prediction, the curved axis shows the correlation, and the grey half-circles denote the centered root-mean-squared difference between the observations and the model predictions. As such, the distance between a model and the observations is a measure of the model's ability to reproduce the spatiotemporal variability in the observations; Taylor diagrams do not capture biases between model predictions 540 and observations. (f) Same as panel (b), except showing a comparison against the annually averaged regional DAOD constraints.

Inverse modeling results of key aspects of the global dust cycle
We present inverse model results for the dust emission rate, DAOD, column-integrated dust loading, dust surface concentration, and dust deposition flux (Table 3, Fig. 4) and compare these inverse model results against independent measurements in Sect. 4.3. We also provide median estimates with uncertainty of the main size-545 resolved properties of the global dust cycle (Fig. 5).
https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. Our results indicate that the global emission rate and loading of dust with geometric diameter D ≤ 20 µm (PM20) are 555 larger than most models account for. AeroCom models reported an ensemble median global dust emission rate of 1.6 × 10 3 Tg/year (one standard error range: 1.0-3.2 × 10 3 Tg/year) and CMIP5 models reported a value of 2.7 (1.7-3.7) × 10 3 Tg/year; both these ensembles included a mix of models simulating dust up to diameters of 10 µm or more (see Fig. S7). Our results indicate that the global emission rate of PM20 dust is 4.6 (3.4-9.1) × 10 3 Tg/year. This larger global dust emission rate is primarily due to two reasons. First, our methodology accounts for dust up to a 560 geometric diameter of 20 µm, which is a larger size range than accounted for in many AeroCom and CMIP5 models (Huneeus et al., 2011;Wu et al., 2020;Fig. S7) and thus results in a larger bulk dust emission flux. Accounting for this larger size range is desirable because observations indicate that ~30% of PM20 dust loading consists of coarse dust with 10 ≤ D ≤ 20 µm (Ryder et al., 2019;Fig. 5b). Because this coarse dust has a shorter lifetime [1.0 (0.4-1.8) days; Fig. 5e] than finer dust, we find that dust with 10 ≤ D ≤ 20 µm accounts for 565 ~65% of the total PM20 dust emission flux, which corresponds to 2.9 (1.8-6.5) × 10 3 Tg/year (Fig. 5a). This ~65% relative contribution of the 10 ≤ D ≤ 20 µm size range is substantially larger than that inferred from size-resolved measurements of the emitted dust flux (Huang and Kok, 2018). In order to match in situ atmospheric dust size distributions, current models thus need to emit more coarse dust than determined from emitted dust flux measurements, which further supports the inference from multiple previous investigations that coarse dust deposits 570 too quickly in atmospheric models (Maring et al., 2003;Ansmann et al., 2017;Weinzierl et al., 2017;van der Does et al., 2018). Although the small MEE of coarse dust [0.13 (0.12-0.15) m 2 /g; Fig. 5f] causes it to account for only a https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.
small fraction [7.2 (5.7-9.3) %; Fig. 5d] of mid-visible (550 nm) DAOD, dust with 10 ≤ D ≤ 20 µm is nonetheless radiatively important because it accounts for much larger fractions of dust absorption of shortwave radiation and extinction of longwave radiation (Tegen and Lacis, 1996;Samset et al., 2018;Ryder et al., 2019). 575 The second reason that PM20 emission fluxes are larger than accounted for in most models is that observations have shown that many models have a bias towards fine dust (Kok, 2011b;Ansmann et al., 2017;. Indeed, models that do include dust up to 20 µm geometric diameter tend to underestimate the global PM20 dust emission rate relative to our results (Fig. S7). Because coarse dust has a shorter lifetime and a lower MEE (Figs. 5e, f), correcting this fine-dust bias requires a substantially larger total emission flux to match DAOD constraints. Many 580 of the models in our ensemble partially addressed the fine bias by using the brittle fragmentation theory parameterization for the emitted dust flux, which is substantially coarser than other emitted dust size distributions (Kok, 2011b). This causes our model ensemble to show a larger emission flux (3.5 (2.7-5.2) × 10 3 Tg/year) than AeroCom models (1.6 (1.0-3.2) × 10 3 Tg/year), although this increase is also due to these more recent models simulating dust out to larger particle diameters (Fig. S7). More recent work has used dozens of in situ measurements 585 to show that the fine-dust bias in models is even more substantial than previously reported, and specifically that the atmospheric loading of coarse dust with D > 5 µm is several times greater than accounted for in most models . Generating this even greater loading of coarse dust thus requires a correspondingly larger emission flux (Table 3; Fig. 5a). Emission fluxes would be even larger if the maximum size range was extended further to include dust with D > 20 µm, which measurements indicate is abundant close to source regions and might 590 be important for interactions with longwave radiation (Ryder et al., 2013;Ryder et al., 2019). As previously reported by , accounting for the substantial atmospheric loading of coarse dust with 5 ≤ D ≤ 20 µm also drives a larger total dust loading, increasing from 20 (12 -24) Tg obtained by AeroCom models and 17 (14-36) Tg obtained by CMIP5 models to 26 (22-30) Tg obtained here (Table 3). Since models indicate that the atmospheric loading of non-dust aerosols is around 10 Tg (Textor et al., 2006), dust is likely by far the most dominant aerosol 595 species by mass, accounting for approximately three quarters of the atmosphere's total particulate matter loading.
The constraints on the global dust cycle obtained here are strongest on the DAOD, because our inverse model minimizes error with respect to observed regional DAOD (Sect. 4.1). The inverse model then relies on observational constraints on the globally averaged dust size distribution and extinction efficiency to link the DAOD to loading per source region (Sections 2.2 and 2.3), which adds further uncertainty to our inverse model results. Constraints on dust 600 emission and deposition fluxes are still more uncertain, because these further depend on results from the ensemble of models, such as the spatial pattern of emission within individual source regions, transport, and the size-resolved dust lifetime. The lifetime of coarse dust shows especially large variability between models, which adds substantially to the uncertainty in PM20 emission and deposition fluxes because coarse dust dominates these fluxes (Figs. 5a, b). Consequently, the relative uncertainties in global emission and deposition fluxes are several times 605 larger than the relative uncertainty in DAOD (Table 3).

Performance of inverse model results against independent measurements
After obtaining inverse model results for key aspects of the global dust cycle, we next evaluate the accuracy of this representation of the global dust cycle using independent measurements of dust surface concentration and dust deposition fluxes (see Sect. 3.1). We divide these results into comparisons for the NH (Sect. 4.3.1) and the SH (Sect. 630 4.3.2). We do this because we have observationally informed constraints on DAOD for eleven NH regions, and therefore expect the inverse model results to show relatively good agreement against independent measurements in the NH. In contrast, we do not have observationally constrained DAOD for the SH; instead, the inverse model used an ensemble of reanalysis products, whose ensemble members might have similar biases as they assimilate similar remote sensing data sets. As such, we expect the inverse model results to show less agreement against independent 635 measurements in the SH.

Performance of the inverse model results against independent measurements in the Northern Hemisphere
The inverse model results accurately reproduce the seasonal variation in surface dust at individual sites in the NH, capturing all the measurements within the uncertainties (Fig. 6). The inverse model results show an average 640 correlation coefficient of r = 0.90 with the seasonally averaged measurements at the different sites, which exceeds the average correlation coefficient of models in our ensemble (r = 0.85), in the AeroCom ensemble (r = 0.61), and the MERRA-2 dust product (r = 0.86). The inverse model results also accurately reproduce the spatial variation in dust surface concentration among different locations, as shown by scatter plots comparing predicted and observed surface concentrations on seasonal (Fig. 7a) and annual (Fig. 7b) timescales. This strong agreement between the 645 inverse model results and dust surface concentration is a notable improvement over any of the six models in our model ensemble, any of the 13 AeroCom Phase 1 models, and the MERRA-2 dust product. The strong performance of the inverse model is due to its improved ability to capture spatial variability in seasonal and annual dust concentration, as quantified by Taylor diagrams in Figures 7d and 7e, and because the inverse model results show almost no bias against seasonally and annually averaged concentration measurements (Figs. 8a, b). This lack of bias 650 in capturing the mean dust aerosol state also represents a substantial improvement over models, which show biases of up to approximately ±0.3 in logarithmic space, corresponding to a bias of up to a factor of ~2 in linear space. The inverse model's reduction in bias and improved representation of spatiotemporal variability of dust surface concentration combine to produce RMSEs (in log space) of only ~0.22 (~65% relative error) against seasonally averaged and ~0.12 (~30% relative error) against annually averaged dust surface concentration measurements (Figs.  655 8c, d). Compared to individual models and MERRA-2, this represents a reduction by a factor of ~1.5-5 in error in log space and a reduction by a factor of ~2-10 in relative error.
https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.  Fig. 8c) and on average higher correlation coefficients than MERRA-2 (red line 665 and diamonds), models in the AeroCom ensemble (black dotted lines and letters), and (unmodified) models in our ensemble (brown dashed lines and numbers). Also shown are the mean correlation coefficients between measurements and the different AeroCom models (rAeroCom) and the different models in our ensemble (rmodels), and the correlation coefficients for MERRA-2 (rR) and the inverse model results (rIM). Uncertainty ranges on measurements and the inverse model results represent one standard error on the climatological seasonally averaged surface concentration. The legend for individual models is given in Fig. 3, and x-670 values are offset slightly for clarity.
We find that the inverse model results also show good agreement against the compilation of NH deposition flux measurements (Fig. 7c). The scatter between measurements and model predictions of deposition fluxes is about an order of magnitude larger than for the comparison against surface concentration measurements. This is likely partially driven by substantial model errors in deposition (Ginoux, 2003;Huneeus et al., 2011;Yu et al., 2019;675 Huang et al., 2020), although the large spread between measurements in similar locations (Figs. 4d, 7c) suggests that experimental (e.g., Edwards and Sedwick, 2001) and representation errors (Schutgens et al., 2017) are also substantial contributors to the larger disagreement between models and measurements. Nonetheless, the inverse model results are in relatively good agreement with deposition measurements (Fig. 7c) and reproduces the spatial pattern of deposition flux better than most models (Fig. 7f). Additionally, whereas models in our ensemble and the 680 AeroCom models show biases against deposition flux measurements of up to approximately ±0.5 in logarithmic space, which corresponds to a bias of up to a factor of ~3 in linear space, the inverse model results show a bias close to zero (Fig. 8a, b). Overall, the inverse model results show an RMSE of ~0.58, which matches that of the best performing models, and is less by ~5-25% relative to other models.  Fig. 6, and the typical relative uncertainty on the deposition fluxes is 50%. Also shown are Taylor diagrams summarizing the statistics of the ability of the 695 different models to reproduce the spatial variability in the measured fields of (d) seasonal and (e) annual surface concentration, and (f) dust deposition flux (Taylor diagrams do not capture biases between model predictions and observations). The different symbols represent the measurements (purple triangle), the 13 AeroCom models (black letters), MERRA-2 (red "R"), the six models in the model ensemble (brown numbers), the six improved models (green numbers with prime), and the inverse model results (large blue star). An exact legend for the different models is provided in Fig. 3.

700
We further explore the merit of our inverse modeling approach by analyzing the "improved model" results (Sect. 2.5), which represent output from each of the individual model ensemble members that was corrected using observational constraints on dust properties and abundance (Sect. 2). For each of the six ensemble members we find that the inverse modeling procedure reduces errors against both NH dust surface concentration and deposition flux measurements, with reductions ranging from a few percent to well over a factor of two (Figs. 8c,d). As with the 705 inverse model results, for most models this is due to both an improvement in the representation of the spatiotemporal variability of dust surface concentration and deposition flux (Figs. 7d-f) and a reduction in the bias against both sets of measurements (Figs. 8a, b).
The comparison against independent measurements thus indicates that the inverse model results represent the NH dust cycle more accurately than both MERRA-2 and a large number of climate and chemical transport models. This 710 is quantified in Fig. 8e, which shows the normalized model error for the various models and model ensembles. We find that the inverse model results show a normalized error of 0.49, which is well below that of the mean of models in our ensemble (1.08) and the AeroCom ensemble (1.22), and also below the MERRA-2 normalized error (0.62). Moreover, we find that the average normalized error of improved models is substantially less (0.72) than for the unmodified models in our ensemble. These results indicate that our approach of integrating observational constraints 715 on dust properties and abundance is effective in improving model accuracy.
https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. (a)-(d) denote results for the individual models in the AeroCom ensemble (black letters), MERRA-2 (red "R"), models in our ensemble (brown numbers), and the improved models (green italicized numbers). The exact legend for the different models is 725 given in Figure 3, and stars denote the mean bias and RMSE for models in our ensemble (brown star), the improved models (green star), and the inverse model results (blue star). Panel (e) shows normalized model errors relative to the DAOD (purple bars), surface concentration (green bars), and deposition flux (orange bars) data sets. Shown are results for the inverse model; the average of models in the AeroCom ensemble, our model ensemble, and our ensemble of improved models; MERRA-2; and for the individual models in our ensemble before and after applying observational constraints (see Sect. 2.5). Hatched bars denote 730 results of the inverse model and improved models obtained through our methodology. The reductions in bias, RMSE, and normalized error for the inverse model and improved models relative to the individual models and MERRA-2 imply that the inverse modeling procedure improves the representation of the NH dust cycle.

Performance of inverse model results against independent measurements in the Southern Hemisphere 735
After analyzing the performance of the inverse model results in the Northern Hemisphere, we next analyze the performance of the inverse model results in the Southern Hemisphere. We expect less agreement against independent measurements than in the NH because the SH DAOD constraints are of substantially lower quality since the Ridley et al. (2016) approach does not yield accurate constraints there (see Sect. 2.3).
The agreement of the inverse model results against independent data in the SH varies substantially between stations 740 and regions. The inverse model has difficulty reproducing the seasonality in surface concentration at many SH stations (Fig. 9), which could indicate that long-range transport is not well captured as most stations are remote from the main dust source regions (Fig. 2c)

755
Hemisphere stations. Shown are measurements (orange line and circles) and results from models in the AeroCom ensemble (black dotted lines and symbols) and our ensemble (brown dashed lines and symbols), and results from MERRA-2 (red line and diamonds) and the inverse model (blue line and squares). Also shown are the mean correlation coefficients between measurements and the different AeroCom models (rAeroCom) and the different models in our ensemble (rmodels), and the correlation coefficients for MERRA-2 (rR) and the inverse model results (rIM). Uncertainty ranges on the inverse model results and 760 measurements represent one standard error on the climatological seasonally averaged surface concentration. The legend for individual models is given in Fig. 3, and x-values are offset slightly for clarity.
This underestimation of dust surface concentration but overestimation of deposition fluxes in Antarctica is puzzling (Figs. 10a-c). Indeed, many individual models show similar results (Figs. S11-S13; also see Huneeus et al. (2011) and Wu et al. (2020)). One possible explanation is large model errors in the conversion of dust concentrations to 765 deposition fluxes, which is known to be one of the most uncertain aspects of global dust cycle simulations (Huneeus et al., 2011). This is particularly the case for regions dominated by wet deposition, which is a challenge for models to simulate accurately, in part because it depends on modeled precipitation, which itself can have large uncertainties (Huneeus et al., 2011;Mahowald et al., 2011a). Additionally, the inverse model and most individual models do not include high latitude dust emissions, which could cause additional errors for comparisons against measurements in 770 Antarctica (Bullard et al., 2016). Another possibility is that measurements do not accurately represent either the dust surface concentration or the deposition fluxes. In particular, all but one of the Antarctic dust fluxes are derived from measurements of total-dissolvable iron in snow and ice, for which the conversion to the deposited dust flux involves many uncertainties (Edwards and Sedwick, 2001;Mahowald et al., 2009), and it is possible that this methodology systematically underestimates dust deposition fluxes (Huneeus et al., 2011). Another factor that could cause 775 disagreement between the inverse model results and measurements might be a mismatch in timescales. The inverse model results characterize the dust cycle for the years 2004-2008, whereas the concentration data were taken for different dates in the period 1981-2000 (Prospero et al., 1989;Arimoto et al., 1995) and the deposition flux measurements were taken one to several decades earlier (Edwards et al., 2006;McConnell et al., 2007). This mismatch in time periods could cause modeled deposition fluxes to exceed measured fluxes as several studies have 780 reported increases in dust emissions from South America and in dust deposition at Antarctica over the past century or so (McConnell et al., 2007;Gasso and Torres, 2019;Laluraj et al., 2020). Furthermore, there is substantial interannual variability in dust concentration that could affect the mismatch in time between models and measurements, especially for less dusty regions such as in the SH (Smith et al., 2017). Comparisons against measurements in previous studies have suffered from similar mismatches in time periods (Huneeus et al., 2011;785 Albani et al., 2014;Colarco et al., 2014;Kok et al., 2014a).
The ability of the inverse model to reproduce the spatial distribution of surface concentration and deposition measurements is thus less good in the SH than in the NH. However, despite the decreased agreement against independent measurements, the inverse model performs better than most of the individual models in our ensemble and in the AeroCom ensemble (Figs. 9, 10d-f, 11). The inverse model, the individual models, and the MERRA-2 790 results all show biases against SH surface concentration and deposition flux measurements that are substantially larger than the biases against NH measurements (Fig. 11a, b). Interestingly, the different models show a positive correlation between bias against surface concentration data and bias against deposition flux measurements, with both biases being negative for twelve of the models. This indicates that systematic under-or overestimation of SH dust are key contributors to errors against measurements, with additional errors due to difficulties in reproducing the 795 spatial pattern of dust surface concentration and deposition fluxes (Fig. 10d-f). Consequently, almost all models show a substantially larger root mean-squared error relative to measurements for the SH than for the NH (Figs. 11c,  d).
These results indicate substantial model errors in the magnitude and spatial pattern of SH dust emissions, dust transport, and/or dust deposition, and underscore the difficulties models have in capturing the SH dust cycle. different symbols represent the measurements (purple triangle), the 13 AeroCom models (black letters), MERRA-2 (red "R"), the six models in the model ensemble (brown numbers), the six improved models (green numbers with a prime), and the inverse model results (large blue star). An exact legend for the different models is provided in Fig. 3.
Overall, the integration of observational constraints on dust properties and abundance seems to produce a modest improvement in the representation of the SH dust cycle. This is quantified in Fig. 11e, which shows that the 815 normalized model error of the inverse model results is 0.78, which is below that of the mean of models in our model ensemble (0.92) and the AeroCom ensemble (1.06), and below the normalized error of the MERRA-2 dust product (0.81). However, whereas the "improved model" results show clear reductions in bias, RMSE, and normalized error in the NH, they show no clear improvements in the SH (Fig. 11).
https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. Figure 11. Evaluation of whether integrating observational constraints on dust properties and abundance produces an improved representation of the Southern Hemisphere dust cycle. Shown are the biases (top panels) and root-mean-squared errors (RMSEs; middle panels) in logarithmic space with respect to measurements of (a and c) seasonal dust surface 825 concentration and dust deposition flux and of (b and d) annual dust surface concentration and deposition flux. Symbols in panels (a)-(d) denote results for the individual models in the AeroCom ensemble (black letters), MERRA-2 (red "R"), models in our https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. ensemble (brown numbers), and the improved models (green italicized numbers). The exact legend for the different models is given in Figure 3, and stars denote the mean bias and RMSE for models in our ensemble (brown star), the improved models (green star), and the inverse model results (blue star). Panel (e) shows normalized model errors relative to the surface 830 concentration (green bars) and deposition flux (orange bars) data sets. Shown are results for the inverse model; the average of models in the AeroCom ensemble, our model ensemble, and our ensemble of improved models; MERRA-2; and for the individual models in our ensemble before and after applying observational constraints (see Sect. 2.5). Hatched bars denote results of the inverse model and improved models obtained through our methodology.

Discussion 835
Our results show that our framework for integrating observational constraints on dust properties and abundance yields an improved representation of the global dust cycle. Relative to the model ensemble, the inverse model results show a reduction of errors against NH dust cycle measurements of over a factor of two (Fig. 8e) and modest improvements for the SH (Fig. 11e). Moreover, we have obtained a data set of the global dust cycle that is more accurate than a large number of model simulations and the MERRA-2 dust product and that is resolved by particle 840 size and season.
Below, we first discuss the main limitations of our methodology and results (Sect. 5.1). We then discuss how our results can be used to guide improvements in the representation of the global dust cycle in climate and chemical transport models (Sect. 5.2), after which we discuss the utility of the data set presented here in constraining dust impacts on the Earth system (Sect. 5.3). 845

Limitations of the methodology
Our results are subject to a few important limitations. First, although our methodology integrates observational constraints, it still relies on global model simulations to compute a number of key variables, including the spatial pattern and timing of dust emissions within each source region, the vertical distribution of dust, and the deposition flux of dust. All three of these processes are known to be subject to important model errors (e.g., Ginoux, 2003;850 Huneeus et al., 2011;Kim et al., 2014;Kok et al., 2014a;Evan, 2018). As discussed in Sect. 1, accurately simulating the magnitude and spatiotemporal variability of dust emissions represents a fundamental challenge for models. To mitigate this problem, many models prescribe dust sources using a variety of approaches that aim to identify prolific sources where geomorphologic processes concentrate soil particles as a result of fluvial erosion (Ginoux et al., 2001;Prospero et al., 2002;Tegen et al., 2002;Zender et al., 2003;Koven and Fung, 2008). However, these 855 representations are highly uncertain, as indicated by different approaches producing large differences in the spatial patterns of emissions (Cakmur et al., 2006;Kok et al., 2014a;Wu et al., 2020). In addition to these challenges with simulating dust emissions, many models also underestimate the height at which dust is transported (Yu et al., 2010;Johnson et al., 2012;Kim et al., 2014). Furthermore, excessive diffusion of coarse dust due to numerical sedimentation schemes causes additional problems in many models (Ginoux, 2003;Eastham and Jacob, 2017;860 Zhuang et al., 2018), and might be partially responsible for a general underestimation of long-range transport of coarse dust relative to measurements and satellite observations (Maring et al., 2003;Ridley et al., 2014;Ansmann et al., 2017;Gasteiger et al., 2017;van der Does et al., 2018;Yu et al., 2019). Because of these various uncertainties in model representations of dust processes, our constraints on dust AOD and loading are strongest, and constraints on dust emission, deposition, and 3D concentration have greater uncertainty (Table 3). Furthermore, although 865 uncertainties in the products obtained here include the error due to the spread in the results of the models in our ensemble, they do not account for systematic biases between the model ensemble and the real world, which might be substantial in light of the problems in model simulations highlighted above. In addition, some of the other inputs to our methodology, such as the globally averaged dust size distribution , would also be affected by possible biases in model results, such as in deposition. 870 A second limitation of our methodology is that the quality of the inverse model depends on the accuracy of the observational constraints on the globally averaged dust size distribution , extinction efficiency , and the regional DAOD constraints obtained in Ridley et al. (2016) and . As such, the results presented here are subject to the limitations of those studies. These limitations are described in detail in the corresponding papers and include possible biases due to errors in the dust extinction 875 efficiency due to the assumed tri-axial ellipsoid shape being an imperfect approximation to the highly heterogeneous shape and roughness of real dust particles (Lindqvist et al., 2014;Kok et al., 2017), errors in the remotely sensed https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License. optical depth retrieval algorithms for aspherical dust particles (Hsu et al., 2004;Kalashnikova et al., 2005;Dubovik et al., 2006), errors in the cloud-screening algorithms used in satellite and ground-based remote sensing products, errors due to a scarcity of AERONET "ground-truth" data in dust-dominated regions, and systematic differences 880 between clear-sky and all-sky AOD, although studies indicate such a systematic difference is small for dusty regions (Kim et al., 2014;Ridley et al., 2016;. The uncertainty due to many (not all) of these errors have been quantified in the relevant papers, and these errors have thus been propagated into the results in the present study. An additional key limitation is that the Ridley et al. (2016) DAOD constraint uses model simulations of the AOD due to other aerosol species to separate dust AOD from non-dust AOD in dusty regions. As such, consistent 885 biases in model simulations of non-dust AOD would have affected the inferred dust AOD. For instance, a systematic underestimation of biomass burning AOD across models van der Werf et al., 2017) would cause the underestimated biomass burning AOD to instead be assigned to dust, thereby causing an overestimate of dust AOD. This source of error might be particularly important in regions with substantial non-dust aerosol loadings, such as in much of Asia and in the Sahel during the biomass burning season (Yu et al., 2019). 890 Furthermore, the regional DAOD constraints from  for the lesser source regions of Australia, North America, South America, and South Africa are based on an ensemble of aerosol reanalysis products. These products use remotely sensed AOD and partly rely on prognostic aerosol models to partition this AOD to the different aerosol species (e.g., Randles et al., 2017). Considering the large uncertainties in dust models (Huneeus et al., 2011;Wu et al., 2020), these products could thus be substantially biased in regions for which dust does not 895 dominate AOD.
Another limitation of our results is that the representation of the modern-day global dust cycle is based mostly on model data and regional DAOD constraints for the period 2004-2008. As such, changes in the dust cycle before or after that period are not reflected in our results. For instance, satellite measurements have shown an increase in dust loading in the Middle East Kumar et al., 2019). Further, we assume that dust contributes to loading 900 and deposition in the same season that it is emitted, which is not always true and could generate small inconsistencies. We also use observational constraints on DAOD only at the mid-visible (550 nm), which is most sensitive to dust with a diameter between ~1-5 µm (Fig. 5d). Dust particles outside of this size range are thus partially constrained from correcting model simulations to match the globally averaged dust size distribution inferred in  and might thus have larger errors that dust with diameters around 1-5 µm.

905
Another important limitation is that many of the models in our ensemble do not explicitly account for anthropogenic (e.g., land use) sources of dust emission, which might account for ~10-25% of present-climate dust emissions (Tegen et al., 2004;Ginoux et al., 2012). However, the observationally constrained DAOD used here to scale dust emissions and loading does not distinguish between natural and anthropogenic dust and thus inherently includes both. Nonetheless, the omission of land use dust emissions from many of the models in our ensemble could produce 910 important errors close to anthropogenic dust sources, which might account for a substantial fraction of total emissions in Asia, Australia, Southern Africa, and the Americas (Ginoux et al., 2012).
Finally, the conclusion that our methodology yields an improved representation of the global dust cycle depends on the quality of the independent data used to evaluate the inverse model results. However, these data have a few limitations. First, some of the measurements might have large, unquantified errors. This appears to be the case 915 especially for deposition flux measurements, which show a much larger spread than surface concentration measurements, even for proximal locations. Second, the concentration and deposition data used to evaluate the inverse model results do not coincide in time with the simulations, which could affect the comparisons (see Sect. 3 and further discussions in, e.g., Huneeus et al., 2011). Finally, some aspects of our representation of the global dust cycle were not explicitly tested against measurements. Future work could further investigate the accuracy of the 920 inverse model results through comparisons against additional data, such as visibility data Shao et al., 2013), dust vertical profile data (Yu et al., 2010;Kim et al., 2014), and remote sensing retrievals of the Angstrom exponent (Huneeus et al., 2011).

Improving model representations of the global dust cycle
The results in Figs. 6-8 show that our methodology of integrating observational constraints on dust properties and 925 abundance reduces model errors in simulating the global dust cycle. This finding is particularly clear from the results of the six "improved models". Each of these models shows a substantial reduction of model error against measurements and observations of the NH dust cycle , with the average reduction of the errors in https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.
"improved models" equaling ~35% (Fig. 8e). These findings suggest several ways in which the representation of the global dust cycle can be improved in global and regional models. 930 First, our results indicate that it is critical for models to account for the substantial asphericity of dust aerosols (Okada et al., 2001;Huang et al., 2020). Dust asphericity enhances the MEE by ~40% because aspherical dust particles extinguish more radiation than volume-equivalent spherical particles (Potenza et al., 2016;Kok et al., 2017). As such, not accounting for dust asphericity causes an overestimation of the dust loading needed to match DAOD constraints by ~40%, and thus can produce a corresponding bias against concentration and deposition flux 935 measurements. This is illustrated by the MERRA-2 results, which are in good agreement with DAOD constraints (Figs. 7d, 7e, 8e), but overestimate NH dust deposition flux measurements by ~25% and surface concentration measurements by ~50% (Figs. 8a,b and Figs. S9 and S10). MERRA-2 uses dust optics from Colarco et al. (2014) based on spheroids, which underestimate dust asphericity (Huang et al., 2020) and yielded a ~25% enhancement of dust extinction. Accounting for the full extinction enhancement of ~40% due to dust asphericity would thus reduce 940 the biases of the MERRA-2 dust product against surface concentration and deposition flux measurements. Since most current models either do not account for dust asphericity or substantially underestimate its effect on extinction efficiency (Huang et al., 2020) we recommend that models account for the full enhancement of extinction by dust asphericity, for instance by implementing the constraints on the extinction efficiency of aspherical dust from Kok et al. (2017). 945 Second, models can be improved by correcting the current substantial underestimation of coarse dust loading. In this study, we integrated a joint observational-modeling constraint on the globally averaged dust size distribution in order to account for the ~17 Tg of coarse dust (D > 5 µm) that observations indicate is present in the atmosphere (Ryder et al., 2019;. The finding that our methodology almost eliminates bias against NH measurements  suggests that this constraint on the globally averaged dust size distribution is relatively 950 accurate. This further confirms the conclusion from several studies that models substantially underestimate coarse dust loading van der Does et al., 2018;Ryder et al., 2019;. Models can thus be improved by eliminating the current underestimation of coarse dust. This could be done either by similarly applying the constraints on the globally averaged size distribution  or, preferably, by improving the relevant model physics. Specifically, recent studies indicate that the underestimation of coarse dust 955 is due to both an underestimation of the emission of coarse dust (Huang and Kok, 2018) and an underestimation of the lifetime of the emitted coarse dust (Maring et al., 2003;Weinzierl et al., 2017). Measurements of the emitted dust size distribution show a much larger flux of dust with D ≥ 5 µm than current parameterizations, including brittle fragmentation theory (Kok, 2011b, a), account for (Huang and Kok, 2018). We find that, in order for models to generate the observationally constrained ~17 Tg of coarse dust loading, a fractional contribution of emitted dust with 960 5 ≤ D ≤ 20 µm is needed that is even larger than found by measurements of emitted dust size distributions ( Fig. 5a; Sow et al., 2009;Rosenberg et al., 2014;Huang and Kok, 2018;Huang et al., 2019). This finding further supports the inference from several lines of evidence that models underestimate the lifetime of coarse dust (Maring et al., 2003;Weinzierl et al., 2017;van der Does et al., 2018). As such, models require improved parameterizations of both the emitted dust size distribution and dry deposition processes to properly account for the abundance of coarse dust 965 in our atmosphere. Improved parameterizations of the emitted dust size distribution that better account for the large contribution of coarse dust are under development (Huang and Kok, 2018). To improve size-resolved dry deposition, we recommend that models account for the ~20% slowing of the gravitational settling speed due to dust asphericity (Huang et al., 2020). Further improvements in dust deposition parameterizations are likely needed, including accounting for possible effects of electrification and turbulence in dust layers on gravitational settling 970 (Ulanowski et al., 2007;Gasteiger et al., 2017;van der Does et al., 2018).
Finally, our results indicate that model accuracy can be substantially improved by correcting biases in the dust loading generated by each main source region (Figs. 3, 8e). These biases could be reduced in two ways. First, models could emulate the procedure developed here and scale emission of dust from each region to match the observed regional DAOD obtained in Ridley et al. (2016). A second approach would be to scale the simulated (size-975 resolved) emissions or loading per source region and season to that obtained in our companion paper (to be submitted soon). These improvements would be most effective for simulations of the present-day dust cycle by regional and global models, as well as short range, medium range and seasonal forecasts of dustiness by numerical weather models. Ultimately, parameterizations of dust emission should be improved to eliminate the need for adjustment of model simulations in this manner. This is critical because without identifying and correcting the 980 problematic model physics, we cannot know how these processes change with climate, for example under global warming or over glacial cycles. Together with uncertainties due to future land use changes, this problem limits the https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.
ability of models to predict future changes in the global dust cycle and its effect on climate and the Earth system (Evan et al., 2016;Kok et al., 2018).
Although we found that the integration of observational constraints on dust properties and abundance is effective in 985 reducing model errors in the representation of the NH dust cycle, we found only slight improvements for the SH dust cycle (Fig. 11e). There are two likely reasons for this finding. First, whereas the inverse model is informed by accurate observational constraints on regional DAOD in the NH, such constraints are less accurate for the less dusty SH . And second, the dust cycle simulations used in our ensemble are less accurate for the SH dust cycle than for the NH dust cycle, as indicated by substantially larger root mean-squared errors relative to 990 measurements for the SH (Figs. 11c, d) than for the NH (Figs. 8c, d). These larger model errors for the SH likely occur because a large fraction of SH dust emission originates from regions containing sparse vegetation (Ito and Kok, 2017), the effects of which on dust emission is difficult for models to represent accurately (King et al., 2005;Okin, 2008). Additionally, there are less data available in the SH from ground-based measurements such as dust surface concentration measurements. And whereas many measurements close to dust source regions are available for 995 the NH, most measurements for the SH are at sites remote from the main dust source regions (Figs. 2c,d), where they are less effective at constraining the main features of the SH dust cycle. There are also fewer satellite retrievals available to constrain simulations of the SH dust cycle. For instance, dust sources such as Patagonia are shrouded by clouds for a larger fraction of the year than for most NH sources (Ginoux et al., 2012), which limits constraints on dust emissions and DAOD from satellite retrievals (Gasso and Stein, 2007). Additionally, the errors in satellite 1000 retrievals tend to be larger for the SH than for the NH because the relative error decreases with AOD Remer et al., 2005). Considering the important role that the SH dust cycle plays in biogeochemistry, the carbon cycle, and the climate system (Lambert et al., 2008;Hamilton et al., 2020), our results underscore a critical need for more observations to constrain the SH dust cycle.

Utility of DustCOMM data set in understanding the role of dust in the Earth system 1005
In addition to identifying mechanisms to improve individual model simulations, this study obtained an improved representation of the global dust cycle that can be used to improve our understanding and quantification of the role of dust in the Earth system. This addition to the DustCOMM data set  contains dust loading, DAOD, (surface) concentration, and (wet and dry) deposition flux fields that are resolved by space, particle size, and season (data are available at https://dustcomm.atmos.ucla.edu/data/K21/). Our results in Sect. 4.3 indicate that this 1010 data set is more accurate than both a large number of climate and chemical transport model simulations and the MERRA-2 dust product. Moreover, whereas MERRA-2 is internally inconsistent because dust loading is adjusted after emission by assimilating AOD measurements (Randles et al., 2017;Wu et al., 2020), our method for integrating observational constraints yields a self-consistent representation of the global dust cycle. Our companion article (to be submitted soon) will supplement this data set by partitioning all these fields by the originating source 1015 region. This data set representing the seasonally resolved and size-resolved global dust cycle can be used to more accurately quantify dust impacts on the Earth system, such as on climate, weather, the hydrological cycle, biogeochemistry, and human health.
Our data set of an improved representation of the global dust cycle has an additional strength that amplifies its use: our data set quantifies and propagates a range of observational and modeling uncertainties (see Sect. 2.5). This 1020 allows for the propagation of uncertainty into dust impacts constrained using our data set, such as in the quantification of direct radiative effects and indirect cloud and biogeochemistry effects (Mahowald, 2011). With a few exceptions Regayre et al., 2018;Di Biagio et al., 2020), the quantification of the uncertainty of (dust) aerosol direct and indirect radiative effects is uncommon, yet is critical to robustly constraining (dust) aerosol impacts on the Earth system (Carslaw et al., 2010;Mahowald et al., 2011b). Moreover, the quantification of 1025 uncertainties on aerosol effects in both the present-day and pre-industrial climates is crucial to constraining climate sensitivity (Carslaw et al., 2013;Carslaw et al., 2018).
A second strength of our data set representing the global dust cycle is that it uses an analytical framework. As such, it is relatively straightforward to add additional observational constraints of the dust cycle, or to update the results as more accurate constraints on any of the inputs (see Fig. 1) become available. For example, more accurate model 1030 simulations can be easily integrated into the framework. Additionally, more detailed observational constraints could become available, for instance from an expansion of the DAOD analysis conducted in Ridley et al. (2016) or by adding additional types of observational constraints. This would reduce both uncertainties and biases, producing a more accurate and precise data set of the global dust cycle.

Conclusions 1035
We have obtained an improved representation of the global dust cycle by developing an analytical framework that uses an inverse model to integrate observational constraints on the dust size distribution, extinction efficiency, and regional DAOD with an ensemble of global dust cycle simulations (Fig. 1). This new approach mitigates two critical challenges that models face in representing the global dust cycle, namely (i) that capturing the magnitude and spatial distribution of dust emissions is a fundamental challenge for large-scale models because of the large mismatch 1040 between the resolved scales (~10-100 km) and the physically relevant scales (~1 m to several km) over which dust emissions vary, and (ii) that models have difficulty representing uncertainties in dust microphysical properties and often use values that are not consistent with up-to-date observational and experimental constraints.
Comparisons against independent measurements indicate that this new framework of integrating observational constraints with model simulations produces an improved representation of the present-day (2004)(2005)(2006)(2007)(2008) global dust 1045 cycle. Our inverse model reproduces NH measurements of dust surface concentration with a factor of 1.5-5 less error than both individual model simulations and the MERRA-2 dust product (Figs. 8c,d). This large improvement is due to reduced errors in capturing the seasonal cycle (Fig. 6) and the spatial variability of dust surface concentration (Figs. 7d,e), and because of the near elimination of biases against measurements in the NH (Figs. 8a,  b). Overall, the inverse model results show a reduction of errors against measurements and observations of the NH 1050 dust cycle measurements of approximately a factor of two (Fig. 8e). These improvement are noteworthy as previous studies have had difficulty simultaneously reproducing dust AOD, surface concentration, and deposition flux (Cakmur et al., 2006;Mahowald et al., 2006;Albani et al., 2014). The near elimination of bias against these different data sets thus suggests that models can be improved by accounting for the enhancement of the MEE by dust asphericity , as otherwise a ~40% greater dust loading would be needed to match DAOD 1055 constraints, resulting in a corresponding overestimation of NH dust surface concentration and deposition fluxes. Our results further indicate that models can be improved by correcting the current underestimation of coarse dust loading  and by adjusting source-resolved emissions to match regional DAOD constraints .
Although the integration of observational constraints thus improves the representation of the NH dust cycle, we 1060 found less improvement in the SH dust cycle. This is likely due to both the lower quality of constraints on regional DAOD in the SH and because of the difficulty models have in reproducing the dust cycle in the less dusty SH.
We also find that the emission flux of dust with geometric diameter up to 20 µm (PM20) is approximately 5,000 Tg/year (one standard error range of 3400 to 8900 Tg/year; Table 3), which is greater than most models account for. This greater global emission rate is partially driven by a larger emission flux of coarse dust with D ≥ 5 µm, which 1065 we find accounts for ~80% of the global PM20 emission flux (Fig. 5a). This large flux of coarse dust is needed to generate the ~17 Tg of atmospheric coarse dust loading that in situ measurements indicate resides in the atmosphere . Accounting for this substantial loading of coarse dust is important because these particles account for a substantial fraction of absorption of shortwave radiation and both absorption and scattering of longwave radiation (Tegen and Lacis, 1996;Ryder et al., 2018;Ryder et al., 2019), and also can account for a large 1070 fraction of nutrients delivered to ecosystems by dust.
The improved representation of the global dust cycle presented here is publicly available as part of the DustCOMM data set . These data include gridded dust emission, loading, (surface) concentration, wet and dry deposition, and DAOD fields that are resolved by season and particle size, including by particle bin and for PM2.5, PM10, and PM20 dust. Additional strengths of this data set are that it includes uncertainty 1075 estimates and that the data can be readily updated as improved constraints on dust properties and abundance become available. As such, our improved representation of the global dust cycle can facilitate more accurate constraints on the various critical impacts of dust on the Earth system.

Glossary
Dimensionless global scaling factor by which a unit dust loading in a global model simulation's particle size bin k is multiplied in order to bring the annually averaged global https://doi.org/10.5194/acp-2020-1131 Preprint. Discussion started: 23 November 2020 c Author(s) 2020. CC BY 4.0 License.
dust loading generated from all source regions in agreement with the constraint on the globally averaged dust size distribution ( � atm ( ) ). ̅ Mass extinction efficiency (m 2 /kg) of a global model simulation's particle size bin k, obtained by integrating the constraint on the globally averaged extinction efficiency � ext ( ) over the particle bin's size range.
Longitude. ̅ Density of dust aerosols, which is taken as (2.5±0.2)×10 3 kg m -3 . � Total uncertainty on the area-averaged observed DAOD of region p for season s. ̅ Area-averaged observed DAOD for region p and season s. ̆, ( , ) Inverse model seasonally averaged DAOD produced by dust emitted from source region r, averaged over season s. ̆, Inverse model seasonally averaged DAOD produced by dust emitted from source region r, averaged over season s and observed region p. Latitude.

2
Summed squared deviation between the observed DAOD in the fifteen regions and that obtained from our analysis. Area of the region p defined in Table 2 (m 2 ).

PM10
Global constant denoting the fractional contribution to the PM10 deposition flux of a model particle size bin that straddles 10 µm. ̃, s, ( , , ) Model-simulated 3D dust concentration (m -3 ) produced by a unit of dust loading (1 Tg) in particle size bin k emitted from source region r, averaged over season s. ̆( , , ) Inverse model 3D bulk dust concentration (kg m -3 ), averaged over season s. ̆, ( , , ) Inverse model 3D dust concentration (kg m -3 ) produced by dust in particle size bin k, averaged over season s. Geometric (or volume-equivalent) diameter (m).

−
Lower geometric diameter limit of a global model simulation's particle size bin k (m).

+
Upper geometric diameter limit of a global model simulation's particle size bin k (m).

Dmax
Maximum dust aerosol geometric diameter considered in this study, namely Dmax = 20 µm. � ,s, ( , ) Model-simulated spatial distribution of dust deposition flux (m -2 yr -1 ) produced by a unit of dust loading (1 Tg) in particle size bin k emitted from source region r, averaged over season s. � , ( , ) Inverse model spatial distribution of deposition flux (kg m -2 yr -1 ) of dust in particle bin k, averaged over season s. � ( , ) Inverse model spatial distribution of bulk dust deposition flux (kg m -2 yr -1 ), averaged over season s. ̃, s, Model-simulated seasonally averaged fraction of global dust loading emitted from source region r that is contained in particle size bin k. ̆, , Inverse model fraction of seasonally averaged global dust loading emitted from source region r that is contained in particle size bin k. � , , ( , ) Model-simulated spatial distribution of dust emission flux (m -2 yr -1 ) needed to generate a unit (1 Tg) of dust loading in particle size bin k emitted from source region r, and averaged over season s. � , ( , ) Inverse model spatial distribution of dust emission flux (kg m -2 yr -1 ) of dust in particle bin k, averaged over season s. � ( , ) Inverse model spatial distribution of bulk dust deposition flux (kg m -2 yr -1 ), averaged over season s. , ( , ) Spatial distribution of the Jacobian matrix (Tg -1 ) of ̆, with respect to � , , which equals the DAOD produced per unit of bulk dust loading from source region r, averaged over season s.

,
The Jacobian matrix of ̅ with respect to � , (Tg -1 ), which equals the seasonally averaged DAOD produced per unit of dust loading originating from source region r in season s and averaged over the observed region p. k Index that sums over the different particle size bins of a given global model.