Models transport Saharan dust too low in the atmosphere: a comparison of the MetUM and CAMS forecasts with observations


Abstract. We investigate the dust forecasts from two operational global atmospheric models in comparison with in situ and remote sensing measurements obtained during the AERosol properties – Dust (AER-D) field campaign. Airborne elastic backscatter lidar measurements were performed on board the Facility for Airborne Atmospheric Measurements during August 2015 over the eastern Atlantic, and they permitted us to characterise the dust vertical distribution in detail, offering insights on transport from the Sahara. They were complemented with airborne in situ measurements of dust size distribution and optical properties, as well as datasets from the Cloud-Aerosol Transport System (CATS) spaceborne lidar and the Moderate Resolution Imaging Spectroradiometer (MODIS). We compare the airborne and spaceborne datasets to operational predictions obtained from the Met Office Unified Model (MetUM) and the Copernicus Atmosphere Monitoring Service (CAMS). The dust aerosol optical depth predictions from the models are generally in agreement with the observations but display a low bias. However, the predicted vertical distribution places the dust lower in the atmosphere than highlighted in our observations. This is particularly noticeable for the MetUM, which does not transport coarse dust high enough in the atmosphere or far enough away from the source. We also found that both model forecasts underpredict coarse-mode dust and at times overpredict fine-mode dust, but as they are fine-tuned to represent the observed optical depth, the fine mode is set to compensate for the underestimation of the coarse mode.
As aerosol-cloud interactions are dependent on particle numbers rather than on the optical properties, this behaviour is likely to affect their correct representation. This leads us to propose an augmentation of the set of aerosol observations available on a global scale for constraining models, with a better focus on the vertical distribution and on the particle size distribution. Mineral dust is a major component of the climate system; therefore, it is important to work towards improving how models reproduce its properties and transport mechanisms.

Introduction
Mineral dust is an important component of the Earth system (Forster et al., 2007; Haywood and Boucher, 2010; Knippertz and Todd, 2012), and it affects the scattering and absorption of solar and infrared radiation, as well as cloud microphysics. The Sahara is the main source of mineral dust (Washington et al., 2003; Shao et al., 2011), and once lifted into the air the dust can be transported over thousands of kilometres (Knippertz and Todd, 2012; Tsamalis et al., 2013) where it is exposed to the effects of ageing and mixing. These effects change its optical, microphysical, and cloud condensation properties (Richardson et al., 2007; Lavaysse et al., 2011), affecting the size distribution, chemical composition, and radiative effects. The transported dust also affects tropical cyclone development through effects on the sea surface temperature (Evan et al., 2018), and the deposition of iron-rich material into the ocean has an impact on biogeochemical cycles (Jickells et al., 2005).
Published by Copernicus Publications on behalf of the European Geosciences Union.
Dust is forecast prognostically in numerical weather prediction (NWP) because of its impacts on atmospheric circulation (Solomos et al., 2011; Mulcahy et al., 2014), visibility, air quality, health, and aviation. Significant progress has been made in dust modelling over the last decade, with a suite of regional and global dust models now available. In recent years dust models have also started to assimilate aerosol optical depth (AOD) measurements from satellites (Niu et al., 2008; Benedetti et al., 2009; Liu et al., 2011; Di Tomaso et al., 2017). There have been a number of studies in recent years to provide further insight on the transport and properties of dust (e.g. Heintzenberg, 2009; Ansmann et al., 2011; Kanitz et al., 2014; Ryder et al., 2015; Groß et al., 2015, among many others) and the ability of models to predict dust events (e.g. Chouza et al., 2016; Ansmann et al., 2017). However, there have been few studies assessing how well the vertical distribution of dust is captured in models. For example, Chouza et al. (2016) found that the European Centre for Medium-Range Weather Forecasts (ECMWF) MACC model (precursor to the CAMS model considered here) simulated Saharan plumes that matched the vertical distribution but underestimated the marine boundary layer aerosol extinction, compensating for the missing AOD with an overestimate of the dust layer intensity. More recently, Ansmann et al. (2017) found that dust models, including the one run at the ECMWF, were able to forecast dust well for the first few days after emission but that the modelled loss processes were too strong, leading to an underestimation with increasing distance from the source. Other studies have shown that dust is not optimally represented in models, highlighting insufficient uplift and insufficient transport of the coarser particles. For example, Evan (2018) found that the representation of dust in climate models was affected by errors in the surface wind fields over northern Africa.
Given the diversity of findings and the range of available models and methodologies, there is a continued need to assess the model predictions of the dust vertical distribution, particularly with vertically resolved particle size information, which is not usually available from operational remote sensing observations. Aerosol Robotic Network (AERONET) sun photometer retrievals (Holben et al., 1998) play an important role in dust model evaluation (for example, see Scanza et al., 2015; Cuevas et al., 2015; Ridley et al., 2016) and offer nearly continuous measurements and, for some stations, long observation records. However, AERONET instruments do not provide information on vertical distribution. Dry convective mixing can raise mineral dust to altitudes of at least 5–6 km over the Sahara and disperse it into a deep mixed layer (Messager et al., 2010). The dominant easterly winds at these latitudes advect this air mass across the Atlantic Ocean, and as the hot, dry, and dust-laden air passes the West African coast, it is undercut by cooler moist air in the marine boundary layer (MBL) and forms an elevated layer called the Saharan Air Layer (SAL) (Karyampudi et al., 1999). As plumes move across the Atlantic, the altitude of the SAL may decline due to large-scale subsidence and loss processes, and the residence time of the lofted dust is closely related to the height and size distributions. High-latitude dust lifted in Iceland during winter storms has also been reported up to high altitude, with coarse particles up to 5 km (Dagsson-Waldhauserova et al., 2019). The impact of dust on radiation and clouds also depends on its vertical distribution (Johnson et al., 2008). The key loss processes, wet and dry deposition and turbulent downward mixing, are strongly influenced by the altitude of the dust and the fine- and coarse-mode fractions.
Note that in this paper we will denote particles with diameters < 1 µm as fine-mode dust, with coarse-mode particles having diameters > 1 µm.
Lidar observations provide valuable information about the location and vertical distribution of aerosols in the atmosphere and as such can be useful in the evaluation of dust models. Spaceborne lidar measurements provide this information on a global scale. For example, the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) on board the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) is an elastic backscatter lidar system (Winker et al., 2010) with limited capability to distinguish different types of aerosol (Omar et al., 2009). The Cloud-Aerosol Transport System (CATS) on board the International Space Station was a polarisation-sensitive backscatter lidar with good detection sensitivity and the ability to differentiate different aerosol types (Yorks et al., 2016). Both systems include depolarisation measurements, which permit mineral dust to be reliably distinguished from other aerosol types. Airborne lidar measurements of aerosols typically offer a finer resolution and can be combined with a number of other airborne instruments, but on a limited geographical scale (see e.g. Marenco et al., 2011, 2016; Marenco, 2013).
In this work we compare airborne measurements of mineral dust with model predictions. The measurements include remote sensing with elastic backscatter lidar and in situ dust observations of the particle size distribution. We also make use of data from the CATS spaceborne lidar to extend our analysis over the Sahara. The observations are used to assess the relative performance of the dust forecasts from two operational global models, the Met Office Unified Model (MetUM) and the European Centre for Medium-Range Weather Forecasts-Copernicus Atmosphere Monitoring Service (ECMWF-CAMS) model. The data are used to investigate whether convection, large-scale wind, boundary layer height, or dust size distribution has the greatest effect on how well the models capture the vertical structure of the dust layers. Both models and their respective dust schemes are briefly described in Sect. 2.1 and 2.2. Both models considered here assimilate MODIS AOD into the model analysis to improve the AOD forecast (e.g. Pope et al., 2016), and the models generally perform well for the prediction of dust AOD. For this study, short-range forecasts were used (forecast lead time < 12 h).

MetUM
The Met Office Unified Model (MetUM) uses a nonhydrostatic, fully compressible, deep-atmosphere dynamical core solved with a semi-implicit semi-Lagrangian time step on a regular latitude-longitude grid (Davis et al., 2005). The configuration used in this study is the Global NWP model that was operational in 2015 (Global Atmosphere 6.1), which had a resolution of 0.35° longitude by 0.23° latitude, corresponding to an approximate resolution of 25 km at mid-latitudes and ∼ 40 km at the Equator (Walters et al., 2017). There are 70 vertical levels, reaching an altitude of 80 km (Pope et al., 2016). The dust scheme uses nine size bins for the horizontal flux calculations, with diameters between 0.0632 and 2000 µm, and either a six- or two-bin scheme for the subsequent transport and advection (Woodward, 2001, 2011; Collins et al., 2011; Brooks et al., 2011). The operational Global model, used here, uses the two-bin dust scheme: division 1 (d1) covers the 0.2–4.0 µm diameter range, and division 2 (d2) covers the 4.0–20 µm diameter range.
AOD from MODIS collection 5.1 on board the Aqua satellite was assimilated into the model from Deep Blue over land and Dark Target over selected ocean regions in the dust belt (note that ocean assimilation was at that time limited to grid points with observed AOD > 0.1). There are four daily model runs, initialised at 00:00, 06:00, 12:00, and 18:00 UTC, and the model fields are available with a time step of 3 h (00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00, and 21:00 UTC). See Pope et al. (2016) and references therein for a description of how the model is initialised and the AOD data assimilation methodology.
The extinction efficiency for each of the MetUM dust bins is precalculated into a lookup table based on Mie scattering with an assumed underlying log-normal distribution and the refractive index from Balkanski et al. (2007). The extinction coefficient is then determined in the model by multiplying the predicted mass mixing ratio by the precomputed extinction efficiency.
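As a rough illustration of the lookup-table approach described above, the following sketch sums the contribution of each dust bin to the extinction coefficient. It is not the MetUM code: the table name, the per-bin values, and the example mixing ratios are hypothetical placeholders, and the explicit air density factor (used to convert a mixing ratio in kg kg⁻¹ into a concentration in kg m⁻³) is an assumption made here so that the units close.

```python
# Hypothetical lookup table of mass extinction efficiency (m^2 kg^-1) per bin.
# The values are placeholders for illustration, not the MetUM values.
MASS_EXTINCTION_M2_PER_KG = {"d1": 700.0, "d2": 140.0}


def dust_extinction(mmr_by_bin, air_density_kg_m3):
    """Return the dust extinction coefficient in m^-1.

    mmr_by_bin: mapping of bin name to mass mixing ratio (kg dust per kg air).
    air_density_kg_m3: air density (assumed input, kg m^-3).
    """
    total = 0.0
    for bin_name, mmr in mmr_by_bin.items():
        # concentration (kg m^-3) times mass extinction efficiency (m^2 kg^-1)
        total += mmr * air_density_kg_m3 * MASS_EXTINCTION_M2_PER_KG[bin_name]
    return total


# Example with made-up mixing ratios and an air density of 1 kg m^-3.
extinction_per_m = dust_extinction({"d1": 1.0e-7, "d2": 2.0e-7}, 1.0)
extinction_per_Mm = extinction_per_m * 1.0e6  # convert m^-1 to Mm^-1
```

Keeping the optical properties in a table means the Mie calculation is done once offline, and the model only performs a multiplication per bin and grid point.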

ECMWF-CAMS
The global atmospheric composition forecasts run at the ECMWF, as part of the Copernicus Atmosphere Monitoring Service (CAMS), are a continuation of the work of the Monitoring Atmospheric Composition and Climate (MACC) project. The CAMS system combines state-of-the-art modelling with Earth observation data assimilated from a variety of sources, including MODIS collection 5 AOD from Aqua and Terra (limited to Dark Target retrievals). The data used here are from the operational forecasts produced in near-real time during the period of the ICE-D campaign. At that time, the horizontal resolution was 80 km (corresponding to a T255 spectral truncation) and there were 60 vertical levels. The model provided a 120 h long forecast from 00:00 UTC, and the analysis used 12-hourly 4D-Var data assimilation with MODIS Terra and Aqua Dark Target AOD to constrain the total aerosol mixing ratio. Details of the model set-up and the analyses can be found in Morcrette et al. (2009) and Cuevas et al. (2015). The operational CAMS global assimilation and forecasting system uses fully integrated chemistry in the ECMWF Integrated Forecasting System (IFS), for this time period cycle 40r2. The IFS is a spectral model using vorticity-divergence formulation with semi-Lagrangian advection and physical parameterisations on a reduced Gaussian grid. The CAMS aerosol parameterisation is based on the LOA/LMD-Z (Laboratoire d'Optique Atmosphérique/Laboratoire de Météorologie Dynamique-Zoom) model (Reddy et al., 2005). Prognostic aerosol of natural origin, such as mineral dust and sea salt, is described using three size bins. In total CAMS has five different types of prognostic aerosol, unlike the MetUM, which only has dust in the operational model. For dust the bin size classes are one fine-mode bin (division 1 or d1, 0.06–1.1 µm diameter) and two coarse-mode bins (division 2 or d2, 1.1–1.8 µm diameter; division 3 or d3, 1.8–40 µm diameter). Morcrette et al. (2009) state that the size bins are chosen such that the mass concentration percentages are 10 % for the fine dust mode and 20 % and 70 % for the two coarse dust size bins during emission.
The extinction coefficient is computed in the model for each aerosol bin by multiplying the mixing ratio by the mass extinction coefficient derived from offline Mie scattering calculations based on the optical properties of Dubovik et al. (2002) as documented in Morcrette et al. (2009). For dust, hygroscopic growth is not considered.

ICE-D campaign
AERosol properties – Dust (AER-D) was a campaign led by the Met Office in collaboration with the universities of Reading and Hertfordshire. It was held at the same time as the Ice in Clouds Experiment – Dust (ICE-D), a larger collaborative campaign involving the Met Office, the National Centre for Atmospheric Science (NCAS), the universities of Manchester and Leeds (UK), the British Antarctic Survey, and the University of Mainz. In addition, the Sunphotometer Airborne Validation Experiment in Dust (SAVEX-D) was also carried out, thanks to EUFAR funding based on a proposal from the University of Valencia, Spain, the Met Office, and the University of Reading. SAVEX-D is treated here as a component of AER-D. The AER-D and ICE-D field campaigns were conducted on 6–25 August 2015 from Praia, Cape Verde (14°57′ N, 23°29′ W), 650 km off the west coast of Africa, an ideal region for observing dust outflow. The main aim of the ICE-D campaign was to characterise the properties of Saharan dust as ice nuclei (IN) and cloud condensation nuclei (CCN), their impact on cloud microphysical processes, and the formation of convective and stratiform clouds. The AER-D and SAVEX-D projects aimed at characterising dust properties above the eastern Atlantic. The main measurements were made using the Facility for Airborne Atmospheric Measurements (FAAM) Airborne Research Aircraft, a modified BAe-146-301; in total, 16 flights took place between the two campaigns, six of which contained high-altitude sections dedicated to surveying the vertical distribution of dust using lidar. The instruments deployed on the aircraft enabled a range of measurements of aerosol size distribution, chemical composition, optical properties, and radiative effects. Most flights took place in proximity to the Cape Verde islands, with the exception of flights B923, B924, and B932, which sampled between Cape Verde and the Canaries.
D. O'Sullivan et al.: Models transport Saharan dust too low in the atmosphere
Ground-based measurements were also made on the island of Santiago during the month. These experiments together provide a comprehensive dataset to investigate the properties of transported Saharan dust during the summer season. The key airborne instruments and satellite data used in this study are briefly discussed in the next sections.
During AER-D and ICE-D, Saharan air masses were transported by predominantly easterly winds over the Atlantic in a sequence of events between 6 and 25 August. Cape Verde was often on the edge of the transported dust, enabling flights to sample the main dust plume and a gradient across the flight track. The dust episodes often lasted for several days, which provided the opportunity to make measurements of dust of varying age. Among the key aims of the AER-D project are the improvement of dust remote sensing from space and from the ground and the validation of dust predictions in the MetUM and other models. The focus of the present paper is on the latter objective. Four dust events are considered here, derived from five research flights (one event having been sampled through a double flight). A summary of the flight sections considered is given in Tables 1 and 2, and the flight tracks are shown in Fig. 1. We use these data to investigate whether convection, large-scale wind, boundary layer height, or dust size distribution has the greatest effect on how well the models capture the vertical structure of the dust layers. There is no direct measure for convection in the archived model fields, and as such the impact of convection on the dust forecast can only be inferred through a process of elimination.
For convenience flight sections are divided into "runs" and "profiles": we have a run (also called straight and level run, denoted here with the letter R) when the aircraft flies for a certain time on a constant heading at a constant altitude and a profile (denoted here with the letter P) when the aircraft changes altitude with a constant rate of ascent or descent. Note that an aircraft profile is a slant trajectory through the atmosphere and thus differs from a lidar profile (vertical). Each aircraft run or profile is identified with a number; hence, for a given flight we have R1, R2, ... and P1, P2, .... The runs and profiles of interest in this paper are identified in Tables 1 and 2.

Airborne lidar
The Leosphere ALS450 elastic backscatter lidar (wavelength 355 nm) is deployed on the FAAM aircraft in a nadir-viewing geometry. Marenco et al. (2011) and Marenco (2013) describe the methodology for converting lidar beam returns at 355 nm wavelength into profiles of the aerosol extinction coefficient. The system specifications are summarised in Marenco et al. (2014, and references therein), and a further description of the data processing methodology can be found in Marenco et al. (2016). During processing, the lidar data were integrated to 1 min temporal resolution, which corresponds to a 9 ± 2 km footprint at typical aircraft speeds.
Smoothing to a 45 m vertical resolution was also applied to reduce the effect of shot noise. The vertical profiles were processed using a double iteration. First we determined the lidar ratio (extinction-to-backscatter ratio), and subsequently we processed the full dataset to determine the extinction coefficient and AOD (see Marenco et al., 2016, and references therein, where the same methodology is applied). The first iteration was conducted on a subset of the vertical profiles, on which the signature of Rayleigh scattering above the dust layer could clearly be identified to enable the lidar ratio to be determined. We obtained a campaign mean lidar ratio of 54 ± 8 sr, which is in reasonable agreement with other measurements of the lidar ratio for dust at 355 nm (Lopes et al., 2013). This value of the lidar ratio was subsequently used to process the full dataset in the second iteration. On average during this campaign, the uncertainty in the derived dust extinction coefficient was 8 %, but this figure varied significantly in both the vertical and the horizontal. The uncertainty is smaller than this near the top of the profile (closer to the aircraft) and larger nearer the ground.
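The last step of the processing, going from an extinction coefficient profile to an AOD, amounts to a vertical integral. The sketch below shows this with a simple trapezoidal rule. It is only an illustration under stated assumptions: the function name and the example layer are hypothetical, and the lidar inversion itself (including the lidar ratio determination) is not reproduced here.

```python
def aod_from_extinction(altitudes_m, extinction_per_m):
    """Integrate an extinction profile (m^-1) over altitude (m) to get AOD.

    Uses the trapezoidal rule; altitudes must be in increasing order.
    """
    if len(altitudes_m) != len(extinction_per_m):
        raise ValueError("altitude and extinction arrays must have equal length")
    aod = 0.0
    for i in range(1, len(altitudes_m)):
        layer_depth = altitudes_m[i] - altitudes_m[i - 1]
        mean_extinction = 0.5 * (extinction_per_m[i] + extinction_per_m[i - 1])
        aod += mean_extinction * layer_depth
    return aod


# Made-up example: a uniform layer of 100 Mm^-1 (1e-4 m^-1) between 2 and 5 km.
example_altitudes = [2000.0, 3000.0, 4000.0, 5000.0]
example_extinction = [1.0e-4, 1.0e-4, 1.0e-4, 1.0e-4]
example_aod = aod_from_extinction(example_altitudes, example_extinction)
```

In this idealised case the result is simply the extinction coefficient multiplied by the 3 km layer depth, which gives an AOD of about 0.3.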

In situ aerosol measurements
A number of wing-mounted instruments permitted us to measure the aerosol size distribution between 0.1 and 100 µm.  berg, 1981), and the two-dimensional stereo probe (2DS) measured large aerosol particles up to ∼ 100 µm. Calibration of the PCASP was done before and after the campaign, whereas the CDP was also calibrated before most flights. The PCASP and CDP measurements (d < 20 µm) and their calibration for the ICE-D campaign are discussed in more detail in Ryder et al. (2018), where the full size distribution measurements are described. The particle size spectra have been processed for an assumed refractive index for dust of 1.53-0.001i, thus correcting for the bin ranges calibrated using polystyrene latex spheres, and the first bin has been discarded due to its undefined lower edge. The 2DS is a shadowing probe with 10 µm resolution, and it does not rely on refractive index to infer particle size. Profiles of in situ measurements were acquired on slant trajectories through the atmosphere (aircraft profiles).

Satellite datasets
Two sources of satellite data are used here, the Cloud-Aerosol Transport System (CATS) and the Moderate Resolution Imaging Spectroradiometer (MODIS). CATS is a multiwavelength lidar instrument (wavelengths 532 and 1064 nm) developed to enhance Earth science remote sensing capabilities from the International Space Station (ISS) (McGill et al., 2015). CATS operated for 33 months (10 February 2015 to 29 October 2017), primarily in an operating mode that was limited to the 1064 nm wavelength due to issues with stabilising the frequency of laser 2 (Yorks et al., 2016). The CATS level 1 (L1) data product includes 1064 nm attenuated total backscatter (ATB) and linear volume depolarisation ratio measurements. Yorks et al. (2016) provide an overview of the CATS L1 data products and processing algorithms as well as a comparison with airborne data. Pauly et al. (2019) found that the CATS 1064 nm ATB has a low bias of up to 7 % in aerosol layers compared to airborne and ground-based lidars due primarily to CATS calibration uncertainties. The CATS extinction coefficient profiles have a 5 km horizontal resolution (along-track) and 60 m vertical resolution. Lee et al. (2019) showed that CATS extinction profiles compared favourably with CALIPSO, with differences due to the aforementioned ATB bias and differences in parameterised extinction-to-backscatter ratios. This paper utilises the vertical profiles of the 1064 nm aerosol extinction coefficient in the CATS level 2 (L2) version 3-01 5 km profile products derived from the L1 attenuated total backscatter data. For this study, the data were filtered by the "cloud" and "invalid" flags, thus showing only the aerosol data points. The aerosol subtype (plotted together with the extinction coefficient) indicates that most of the aerosol of interest here is in fact classified as dust and dust mixtures in the CATS L2 dataset.
MODIS collection 6.1 level 2 atmospheric aerosol products from Aqua (MYD04_L2) and Terra (MOD04_L2) were obtained from the Level-1 and Atmosphere Archive & Distribution System (LAADS, ftp://ladsftp.nascom.nasa.gov/allData/61/, last access: 21 September 2017). The merged Deep Blue and Dark Target aerosol optical depth at 550 nm from both Aqua and Terra was used to create daily AOD maps (Hsu et al., 2004, 2006; Levy et al., 2013; Sayer et al., 2013, 2014). The differences between the collection 5 (used in both models for operational assimilation in August 2015) and the subsequently released collection 6 are treated in detail in the above-referenced papers. Generally speaking, with the collection 6 update, the Deep Blue product was extended to vegetated surfaces, and improvements to the aerosol type classification and quality assurance were introduced for both the Dark Target and Deep Blue products. Comparisons performed by the authors suggest that the collection 6 AOD values are marginally higher in the dust source regions (e.g. western Africa and the Middle East). The differences between the MODIS collections represent a major improvement to the MODIS product, but we do not expect them to substantially affect the conclusions drawn in this paper.

Analysis of dust source regions and transport
Source regions for the sampled dust were investigated using two back-trajectory models. Back trajectories were calculated from the time, latitude, longitude, and altitude of various points along the flight track where high dust loadings had been encountered using the Numerical Atmospheric Modelling Environment (NAME) (Jones et al., 2007) and the Hybrid Single-Particle Lagrangian Integrated Trajectory model (HYSPLIT) (Draxler and Hess, 1998; Stein et al., 2015). In the NAME back trajectories, meteorological data from MetUM were used, and the HYSPLIT back trajectories were driven using meteorological data from the National Oceanic and Atmospheric Administration (NOAA) Global Data Assimilation System (GDAS). Despite the very different models and meteorological data used, the back trajectories from the two models highlighted consistent source regions.
Haboobs driven by convective outflows from mesoscale storms have been shown to represent the dominant uplift mechanism of Saharan dust during the summer months, with a share of 50 % of the uplifted dust (Marsham et al., 2013a). The meteorological reanalyses driving HYSPLIT back trajectories and NAME dosage maps are not able to identify the dust source location or the transport pathways over these or subsequent uplift events. On the other hand, haboobs and dust storms are clearly identified by an expert eye in the EUMETSAT "dust RGB" product from the MSG and SEVIRI infrared channels (http://oiswww.eumetsat.int/idds/html/product_description.html, last access: 5 February 2018), and it is thus possible to utilise this type of imagery to track dust as it is transported, thus helping to determine source location and uplift time (e.g. Schepanski et al., 2007). Dust events observed during four of the flights considered here were examined in this way by Ryder et al. (2018), and mesoscale convective storms drove the dust uplift and subsequent transport in all of them. Despite the inability of back-trajectory analysis to capture haboobs, the back trajectories and the satellite tracking of the plumes gave consistent results.
The identified source regions and dust transport paths are shown in Fig. 1. This uses a combination of work done by Ryder et al. (2018) and Liu et al. (2018), with additional information in this work from NAME and HYSPLIT to help identify the dust trajectory. A detailed discussion on the meteorology during the ICE-D campaign can be found in . A key point is that the MBL in the eastern Atlantic was typically 300–500 m deep during the study period, which is in agreement with the aircraft lidar observations and the in situ measurements during aircraft ascent and descent profiles. Liu et al. (2018) also show that on 15 August there was a change in the synoptic conditions. This means that for the first and third case study used here (B920 and B927) the maximum horizontal wind speed above the MBL in the Saharan Air Layer (SAL) was lower than 10 m s⁻¹, and wind direction varied between NE and SE, which resulted in lower dust loadings during these two flights. Case study 2 (B923 and B924) was also in this period of slower wind speeds, but high dust loadings were sampled due to the more northerly location of flights B923 and B924. In the final case study looked at here, case study 4 (B932), the wind speed above 2 km was significantly enhanced with a more easterly wind direction. This resulted in higher dust loadings being observed in case study 4 than for 1 or 3; note that the highest dust loadings of all were observed in case study 2 due to the location of these flights. See Liu et al. (2018) for the full meteorological and dust source analysis.

Comparison of datasets
The airborne lidar measurements of the aerosol extinction coefficient and AOD were made at a wavelength of 355 nm, whereas the MODIS and AERONET data used here were all collected at 550 nm, and CATS aerosol properties are at 1064 nm. The model extinction is available for a variety of wavelengths including 380, 550, and 1064 nm, and for CAMS 355 nm is also available. Here, the MetUM dust aerosol extinction coefficient was recalculated from the mass concentrations of division 1 and division 2 dust (see Sect. 2.1 for a description of the dust scheme), as well as Mie-derived optical properties of the two dust size bins.
Having measurements at different wavelengths across datasets has not been a major concern because very little wavelength dependence was noted during the campaign for aerosol extinction: the difference in AOD between 340 and 550 nm was less than 5 % in the AERONET data examined. Similarly, the MetUM extinction at 355 nm was only 22 ± 7 % larger than at 1064 nm. This is explained by the small Ångström exponent during the campaign (−0.4 to 0.4: see Liu et al., 2018), and this is generally expected for coarse mineral dust particles. For this reason, it was deemed unnecessary to scale extinction and AOD for wavelength in the present study.
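To make the link between the quoted extinction ratio and the Ångström exponent explicit, the sketch below applies the standard Ångström power law, in which extinction (or AOD) varies with wavelength as λ^(−α). The function names are hypothetical and the calculation is only an illustration, not part of the analysis in this paper. With the MetUM extinction at 355 nm taken as 22 % larger than at 1064 nm, the implied exponent is roughly 0.18, which lies within the range of −0.4 to 0.4 quoted above.

```python
import math


def angstrom_exponent(value_1, value_2, wavelength_1_nm, wavelength_2_nm):
    """Angstrom exponent from extinction (or AOD) at two wavelengths."""
    return -math.log(value_1 / value_2) / math.log(wavelength_1_nm / wavelength_2_nm)


def scale_to_wavelength(value, wavelength_from_nm, wavelength_to_nm, alpha):
    """Scale extinction (or AOD) between wavelengths using the Angstrom law."""
    return value * (wavelength_to_nm / wavelength_from_nm) ** (-alpha)


# Ratio quoted in the text: extinction at 355 nm is 1.22 times that at 1064 nm.
alpha_metum = angstrom_exponent(1.22, 1.0, 355.0, 1064.0)

# Consistency check: scaling 1.0 at 1064 nm back to 355 nm recovers 1.22.
scaled_back = scale_to_wavelength(1.0, 1064.0, 355.0, alpha_metum)
```

An exponent close to zero means the scaling factor stays close to 1 over this wavelength range, which is the reason given above for not applying a wavelength correction.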
The MetUM extinction coefficient only includes dust; this could potentially make the results lower compared to total aerosol extinction, which also includes other aerosol types. However, data from the CATS lidar, as well as the in situ measurements including filter samples discussed in , confirm that the aerosol sampled during AER-D and ICE-D was predominantly dust, with a contribution from marine aerosol in the MBL. For this reason, for this study we neglect the conceptual difference between the dust-only extinction of the MetUM and total aerosol properties in CAMS and the observations.
The comparison methodology used is summarised in Fig. 2.

Results and discussion
In Sect. 4.1 and 4.2, the measurements of the aerosol extinction coefficient, AOD, and dust concentration for the different size bins used by the MetUM and CAMS are used to assess the predicted dust and the representation of dust size distribution in both models. In Sect. 4.3 and 4.4, the model large-scale wind and boundary layer height are compared with observations to infer what, if any, influence these have on the dust forecast.

Individual case studies
Case study 1 (7 August 2015, B920; Figs. 3-8). This flight took place near Praia and was co-located with an overpass of the CATS spaceborne lidar. Two high-level sections of the flight were analysed, R1 and R6 (see Table 1 for run times and locations). Figure 3 displays the airborne, spaceborne, and model data for R6, which coincided with a CATS overpass. A deep dust layer was observed between ∼ 2 and 5 km, with marine aerosol mixed with dust in the boundary layer and a broken cloud field at the top of the boundary layer. Both the extent and amount of aerosol observed agree well between the airborne and the spaceborne lidars (Fig. 3a and d). The aerosol type classification from CATS (not shown here) also agrees well with the in situ measurements, which found a marine aerosol layer below the dust layer. The dust layer was well mixed, with moderate extinction coefficients (100-180 Mm −1) and AODs between 0.28 and 0.44 observed by the airborne lidar. Figure 3e-g show that the models and the observations display a low AOD around Cape Verde, with much larger values near the Canary Islands and off the West African coast. In Fig. 3g, the AOD observations from MODIS, AERONET (stars), and the aircraft lidar (dots) are in agreement within 5 %. This broad agreement is consistent with the fact that both models assimilate MODIS AOD. However, the MetUM and CAMS models underpredict the intensity of the AOD maximum by 0.9 and 0.6, respectively, and there are also variations in the predicted plume location.
From Fig. 3a-d we see that the predicted vertical distribution of the dust layer shows some differences from the observations: the dust layer extends from the surface to around 4 km in the MetUM and from 1 to ∼ 4 km in CAMS, whereas CATS and the airborne lidar both show the dust layer between 2 and 5 km. The magnitude of the extinction coefficient predicted by the models of 100-170 Mm −1 is, however, in good agreement with the observations from both lidars (100-200 Mm −1 ). The mean, standard deviation, and maximum extinction values for each considered flight section are summarised in Table 1. For this run, the MetUM mean extinction was 55 ± 38 Mm −1 , the ECMWF forecast was 58 ± 41 Mm −1 , and the aircraft lidar measured a mean extinction value of 56 ± 40 Mm −1 .
In Fig. 4a the mean extinction profile for R6 is shown for the airborne lidar, the MetUM, and the CAMS model, and Fig. 4b displays the mean dust concentration profile in each of the size bins for both models for the same time period. As already highlighted from Fig. 3, the MetUM has the dust layer extending right down to the ocean surface. It is dominated by the smaller size bin (d1, 0.2-4.0 µm diameter), in particular for the aerosol below 1 km. The concentration predicted by CAMS for this case is about half of that in the MetUM, yet the magnitude of the predicted extinction is similar at around 100-120 Mm −1. There are, however, differences in the dust layering; for the MetUM the maximum is near the surface with a smooth decline with altitude, whereas CAMS predicts an elevated dust layer between 1 and 4 km as discussed for Fig. 3. This discrepancy in concentrations is thought to be mainly due to the representation of the particle size distributions, whereas the agreement in terms of extinction can be understood if one considers that the models are tuned to the observations.
The dust concentration from the MetUM divisions d1 and d2 and the CAMS divisions d1 (0.06-1.1 µm diameter), d2 (1.1-1.8 µm diameter), and d3 (1.8-40 µm diameter) have also been compared with the in situ measurements for each of the five size ranges and the total dust concentration measured during aircraft profiles. Two profiles from this flight are shown in Figs. 5 and 6. The observed concentration of dust in the MetUM d1 size bin typically makes up about a third of the total dust concentration measured, and d2 is around two-thirds. In contrast, the measurements only show 0-10 µg m −3 dust in the CAMS d1 and d2 size bins, and the concentration in the d3 size bin is very close to the total measured. Comparing the model data (lines with markers on) to the measurements (lines of the same colour with no markers) in Figs. 5 and 6 we can see that both models struggle to accurately capture the dust concentration for each size bin. This adds to the difficulty in attributing dust to the right altitude. For example, in P2 (Fig. 6a) the MetUM has more d1 dust than d2, while the aircraft measurements show the opposite. For the same profile (Fig. 6b), CAMS has more d2 dust than d3; however, the measurements show that there is less than 10 µg m −3 d1 or d2 dust, and the predicted CAMS d3 shows a maximum of 60 µg m −3 to be compared with 350 µg m −3 (observed maximum).
Temperature and specific humidity profiles from the aircraft in situ instruments were also compared with data from the MetUM and ECMWF. Examples from this flight are shown for P2 (Fig. 7) and P7 (Fig. 8). The modelled temperature profiles are within 3.5° of the observations in the boundary layer and within 1.5° above 4.5 km, with no systematic bias for either model. Both models also reproduce the main features of the specific humidity profiles, although with more obvious differences than for temperature. Generally, the models predict a correct vertical structure of the atmosphere in terms of thermodynamic profiles; however, the predicted dust vertical distribution seems to depart excessively from the thermodynamic structure.
Case study 2 (12 August 2015, B923 and B924; Figs. 9-11). Flights B923 and B924 both took place on 12 August flying between Praia and Fuerteventura to sample the outflow from a dust uplift event that had happened on 10 August in northern Mali. These flights were able to reach the main dust plume, which means that the highest AODs and extinction coefficients of the campaign were measured on this day. The two flights sampled the same plume at different times during the day, and only B923 is shown here as results for flight B924 are similar. The AOD measured by the airborne lidar reached 2, with an aerosol extinction coefficient of 100-1300 Mm −1 near the western African coast. As in the previous case study, both models captured the spatial distribution of the dust AOD well (Fig. 9d-f); however, the MetUM underpredicted the intensity of the AOD maximum by 1.1, and the CAMS model underpredicted it by 0.8.
For this section of flight B923, both models showed a dust layer up to ∼ 5 km, with an enhanced extinction coefficient at 13-17° W between the surface and 1 km, where the extinction coefficient increases from an average in-layer value of 100-150 to 500-700 Mm −1 (Figs. 9a, b and 10). This spatial distribution along the flight track is similar to the observed one (Figs. 9c and 10); however, the maximum dust extinction is observed at ∼ 1 km of altitude, whereas the models predict it closer to the surface, and the dust maximum extinction coefficient along the flight track was underpredicted in the MetUM and CAMS by 45 % and 80 %, respectively (Table 1). Two sections of flight B924, on the same day, reinforce these results (not shown here as they are similar to the section just discussed). However, Fig. 9d-f show that there is a difference in the general representation by both models: CAMS predicts a maximum AOD of 1.6, with almost the same values and spatial distribution that were observed by lidar, whereas the MetUM underpredicts this dust event's maximum AOD by 0.6 compared to the lidar and 1.5 compared to MODIS. The differences between models and observations could possibly be associated with the dust having been uplifted by a strong haboob, which models, running with the resolution and convection parameterisation required for global coverage, are unlikely to represent in a way that gives the strength of the uplift (Marsham et al., 2013b; Birch et al., 2014; Roberts et al., 2018). In particular, we note that the convection parameterisation has no specific representation of surface gusts due to downdrafts (main contributors to dust uplift) and that it is not currently coupled to the dust scheme.
In P1 the measurements show very large amounts of dust, up to 3000 µg m −3 concentration (Fig. 11), with both models predicting significantly less (250 µg m −3 in MetUM and 120 µg m −3 in CAMS). Interestingly in this aircraft profile, which is closer to the area affected by the intense dust, both models have more dust in the largest size bins, in agreement with the in situ measurements.
In summary, compared to the very large differences between the measured and modelled dust concentration, the modelled extinction is much closer to the observations.
Case study 3 (15 August, B927; Figs. 12-13). In this case study the dust was confined to a shallow layer between 2.0 and 3.5 km, as can be seen in Fig. 12c. The extinction coefficient (∼ 100-300 Mm −1) measured by the lidar and the AOD (up to 0.36) were moderate. Much higher AOD values, up to 2.4, were observed by MODIS over Africa and nearer to the coast. As can be seen from Fig. 12a-c, the ECMWF-CAMS model places the dust layer around an altitude of 2.9 km with an extinction coefficient of 180-330 Mm −1, in good agreement with the observations but with a larger layer depth (between 1.5 and 4 km). This is particularly noticeable in the run mean plot (Fig. 13a). On the other hand, the MetUM predicts a dust layer centred around 2.7 km, close to the lidar observations, but the peak extinction coefficient is underpredicted by ∼ 200 Mm −1. A second dust layer is predicted near the surface below 1.1 km, and this results in an AOD range of 0.4-1.8, which is similar to the AOD range of 0.3-2.0 predicted by CAMS (Fig. 12d, e). The location of the maximum AOD predicted by the models is in reasonable agreement with the MODIS observations (Fig. 12f); however, MODIS observed higher AOD values in the dust plume, of up to 2.6, than the models predicted. Figure 13b shows the modelled dust mass concentrations in the different size bins. For the MetUM there is a greater amount of dust in the smaller size bin, with a peak in d1 dust of 120 ± 10 µg m −3 and a peak in d2 dust of 70 ± 20 µg m −3. For the CAMS model the opposite is true and the smallest size bin peaks at 20 ± 7 µg m −3 in the main dust layer, with most of the dust mass in the larger two size bins reaching a maximum of 100 ± 9 µg m −3 for d2 and 80 ± 7 µg m −3 for d3.
Case study 4 (20 August, B932; Figs. 14-16). In the fourth case study the dust was observed in an elevated layer between 2 and 4.5 km (Fig. 14c). For the dust observed on this day, the estimated transport time from the source region was 2.5 d, shorter than for the previous three case studies. The dust was uplifted by a mesoscale convective system on 17 August near the Algeria-Mali border and from the northernmost tip of Mali (Fig. 1). The aerosol extinction coefficient (between ∼ 100 and 400 Mm −1) and AOD observed by the airborne lidar (up to 0.72) were the highest observed during the campaign after B923 and B924. We note that this flight also travelled ∼ 800 km to the northeast of Cape Verde, hence getting closer to the main plume. As can be seen from Fig. 14d-f, the AOD in the dust plume is between 0.6 and 1.2 for both models, which compares well to the 0.7-1.4 observed by MODIS. Both models simulate the spatial distribution of the AOD well compared to observations and predict the observed north-south gradient along the flight track. From Fig. 14a-c we can also see that both models forecast the top of the dust layer reaching around 4 km, which is only slightly lower than the 4.5 km observed by the lidar. However, the observations show most of the dust in a relatively shallow layer between 2 and 3.5 km, whereas the models have the peak of the dust below 1 km. This can also be seen quite clearly in Fig. 15a.
Out of the eight high-level sections from the four case studies included in this work, R1 from B932 shown here is the only section for which both models predict a higher extinction coefficient than was observed by airborne lidar. As can be seen from Table 1, the lidar measured a mean aerosol extinction coefficient of 76 ± 81 Mm −1 (max 395 Mm −1), while the MetUM and ECMWF mean and maximum values were 140 ± 130 Mm −1 (max 620 Mm −1) and 140 ± 120 Mm −1 (max 500 Mm −1), respectively. In this case, moreover, both models have most of the dust concentration in the largest size bin, although the d2 (4.0-20 µm) dust mass for the MetUM is underestimated by 20 % and the CAMS d3 (1.8-40 µm) dust mass is underestimated by 85 % compared with observations. The peak d2 mass of 800 ± 200 µg m −3 predicted by the MetUM is 270 µg m −3 larger than the peak d1 mass of 520 ± 90 µg m −3 (Fig. 15b). In CAMS, the peak d2 and d3 masses in the dust layer are ∼ 200 ± 75 µg m −3 each, i.e. more than double the peak d1 mass of 90 ± 10 µg m −3. Still, the fine-mode dust appears to be overestimated by ∼ 30 %, ∼ 80 %, and ∼ 90 % for the MetUM d1 and the CAMS d1 and d2, respectively (peak model value compared to peak observed). The greater contribution of the smaller dust particles to the extinction coefficient, combined with an overestimation of the overall concentration, is consistent with the predicted extinction coefficient being ∼ 12 % and ∼ 40 % higher than the observed one in this case study for CAMS and the MetUM, respectively. Note that the CAMS d2 dust mass concentration of R1 (Fig. 15b) and P4 (Fig. 16b) is virtually identical to the d3 mass concentration, with the two lines overlapping.

General findings from the four case studies considered
For all the case studies the MetUM and ECMWF global dust forecasts capture the spatial distribution of dust AOD reasonably well in comparison with observations. The model predictions show some positioning errors compared to MODIS AOD, and this can affect the local comparisons made at the aircraft location. In the case studies considered, the models showed underprediction of the AOD by 0.8-1.5 and 0.6-0.9 for the MetUM and CAMS, respectively. However, in case study 4 both models underpredicted the AOD by ∼ 0.2. The model prediction of the vertical distribution of the dust extinction coefficient is not always consistent with observations. As a general rule, we have observed that both models have tended to predict the dust 0.5-2.5 km too low in the atmosphere compared with the observations, with ECMWF generally better capturing elevated dust layers. The ECMWF-CAMS model also captures the depth of the dust layer better than the MetUM, with the height of the dust layer being more accurate and with the MetUM often extending the dust layer down to the surface in cases when this is not seen in the observations. In the next section we will use data from the CATS spaceborne lidar, in comparison with predictions from the MetUM, to investigate what could be causing the observed discrepancies in the dust vertical distribution.
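One simple way to put a number on a vertical displacement of this kind is to compare extinction-weighted mean heights of the observed and modelled profiles. The sketch below is not the method used in this study; the function name, the evenly spaced altitude grid, and the synthetic Gaussian layers are all assumptions made purely for illustration.

```python
import numpy as np

def extinction_weighted_height(altitude_km, extinction):
    """Extinction-weighted mean altitude (km) of a profile.

    Assumes evenly spaced altitude levels, so a simple weighted
    average over levels is adequate.
    """
    altitude_km = np.asarray(altitude_km, dtype=float)
    extinction = np.asarray(extinction, dtype=float)
    return float(np.sum(altitude_km * extinction) / np.sum(extinction))

# Synthetic profiles (hypothetical, not campaign data): an "observed"
# elevated layer and a "modelled" layer that sits lower down.
altitude_km = np.linspace(0.0, 8.0, 81)
observed = 150.0 * np.exp(-0.5 * ((altitude_km - 3.5) / 0.6) ** 2)
modelled = 150.0 * np.exp(-0.5 * ((altitude_km - 2.0) / 0.6) ** 2)

offset_km = (extinction_weighted_height(altitude_km, modelled)
             - extinction_weighted_height(altitude_km, observed))
print(f"modelled minus observed layer height: {offset_km:.2f} km")
```

A negative offset indicates that the modelled layer is lower than the observed one. A single centroid cannot distinguish a displaced layer from a second layer near the surface, so it complements rather than replaces inspection of the full profiles.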
We noted large differences of 25 %-100 % (corresponding to 100-2800 µg m −3 ) between the measured and modelled dust concentration associated with a modelled extinction within ∼ 50 % of the observations, which may appear surprising because concentration is the modelled variable from which optical properties are computed. We need to bear in mind, however, that AOD is the most often used metric to compare aerosol model predictions and observations: AERONET AOD is often used in model verification, and both the MetUM and the CAMS model use MODIS AOD in data assimilation. It is not so surprising, therefore, that modelled optical properties are pulled towards the observations, even when the microphysical properties from which they are computed are out of scale (in this case, an underestimated dust concentration). Finer particles make a greater contribution to the aerosol extinction coefficient per unit mass than coarser ones, and the mismatch between the representation in concentration and in optical properties can be compensated for in the models through the size distribution. For most of the aircraft profiles studied here, the models have about a factor of 2 too much dust in the smaller size bins, meaning that an underpredicted dust concentration can yield an aerosol extinction coefficient of the right order of magnitude.
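The compensation described above can be made concrete with a minimal sketch in which the extinction coefficient is the sum over size bins of mass concentration multiplied by a mass extinction efficiency. The efficiencies and concentrations below are hypothetical round numbers chosen only to illustrate the mechanism; they are not the Mie-derived values used by either model, nor measured values.

```python
def extinction_from_mass(mass_ug_m3, mass_ext_eff_m2_g):
    """Extinction coefficient (Mm^-1) from per-bin mass concentrations.

    mass_ug_m3: dict of bin name -> mass concentration (ug m^-3)
    mass_ext_eff_m2_g: dict of bin name -> mass extinction efficiency (m^2 g^-1)
    Units: (ug m^-3) * (m^2 g^-1) = 1e-6 m^-1 = 1 Mm^-1.
    """
    return sum(mass_ug_m3[b] * mass_ext_eff_m2_g[b] for b in mass_ug_m3)

# Hypothetical efficiencies: fine dust extinguishes more light per unit mass.
efficiency = {"fine": 0.70, "coarse": 0.15}

# Hypothetical "observed" dust: mostly coarse by mass.
observed = {"fine": 50.0, "coarse": 300.0}
# Hypothetical "modelled" dust: twice the fine mass, far too little coarse mass.
modelled = {"fine": 100.0, "coarse": 60.0}

for name, mass in (("observed", observed), ("modelled", modelled)):
    total = sum(mass.values())
    ext = extinction_from_mass(mass, efficiency)
    print(f"{name}: total mass = {total:.0f} ug m^-3, extinction = {ext:.0f} Mm^-1")
```

With these illustrative numbers the modelled total mass is less than half of the observed one, yet the two extinction coefficients are nearly equal, because the excess fine dust makes up for the missing coarse dust in optical terms.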
For the flights which sampled dust nearer the source regions (case studies 2 and 4) the models had 65 %-90 % of the dust concentration in larger size bins (MetUM d2 and CAMS d3) compared to the other flights, for which this proportion was 35 %-60 %. This seems to indicate that the models may represent the dust size distribution better nearer the source. The observations from the AER-D and ICE-D campaigns suggest that, as the dust travels away, the observed size distribution changes little, with large particles transported in significant quantities as far as Cape Verde (Ryder et al., 2018). In contrast, the models appear to lose particles from the larger size bins rapidly with increasing dust mass age due to gravitational sedimentation processes.
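To give a sense of how strongly gravitational sedimentation depends on particle size, the following sketch evaluates the Stokes terminal velocity for spherical particles. This is a textbook approximation and not the sedimentation scheme of either model; the particle density and air viscosity are assumed typical values, and slip correction, air density, and particle non-sphericity are neglected.

```python
def stokes_settling_velocity(diameter_um,
                             particle_density_kg_m3=2600.0,   # assumed dust density
                             air_viscosity_pa_s=1.8e-5,       # assumed dynamic viscosity
                             gravity_m_s2=9.81):
    """Stokes terminal fall speed (m s^-1) of a sphere in still air.

    Neglects air density, slip correction, and non-spherical shape.
    """
    diameter_m = diameter_um * 1.0e-6
    return (particle_density_kg_m3 * gravity_m_s2 * diameter_m ** 2
            / (18.0 * air_viscosity_pa_s))

SECONDS_PER_DAY = 86400.0
for diameter_um in (2.0, 10.0, 20.0):
    v = stokes_settling_velocity(diameter_um)
    print(f"d = {diameter_um:4.1f} um: v = {v:.2e} m s^-1, "
          f"fall in one day = {v * SECONDS_PER_DAY:.0f} m")
```

Because the fall speed scales with the square of the diameter, a tenfold increase in size gives a hundredfold increase in settling velocity under these assumptions, which is why a scheme relying on gravitational settling alone removes the coarse bins much faster than the fine ones.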

Comparison with the CATS spaceborne lidar
We compared almost every CATS overpass covering North Africa and the eastern Atlantic during AER-D and ICE-D with the MetUM. CATS and model data were compared for overpasses between 6 and 25 August 2015 in the study region off the western African coast between 40° N and 10° S latitude and 40° W and 40° E longitude, for a total of 45 overpasses. The four most significant cases are discussed here. For each overpass, the CATS aerosol extinction coefficient was compared with the MetUM dust extinction coefficient, and the modelled contribution to the extinction of each of the two size bins was also analysed.
In Fig. 17, a CATS overpass at 00:00 UTC on 7 August over the African continent is shown, with significant amounts of dust between 1 and 7 km. The MetUM predicts the dust in more or less the right places across the CATS track but underpredicts the magnitude of the extinction coefficient by 60 %. As for the case studies in Sect. 4.1, most of the predicted dust is also lower in altitude (below 5 km) than observed and extends to the surface (although the model does predict some dust reaching as high as 7 km). The smaller size bin contributes 80 % of the modelled extinction coefficient.
In Fig. 18 a CATS overpass from 18:00 UTC, also on 7 August, is shown for which the dust is moving off the West African coast over the sea. At the eastern end of the transect the model has a similar dust extinction coefficient (60-180 Mm −1) to CATS (80-260 Mm −1), the key difference being that the model layer extends between the surface and 5 km, while in the CATS observations it extends between 1 and 7 km. However, over the ocean (longitude > 15° W) the model misses the layer evident in the CATS data.
Two further examples are shown in Fig. 19 for 00:00 UTC on 8 August and Fig. 20 for 16:00 UTC on 10 August with a similar pattern. In Fig. 19 the entire CATS overpass shown is over land: at the northwest end of the overpass both the MetUM and CATS show the dust plume extending from the surface to over 7 km. However, towards the southeast the model predicts it to be between the surface and ∼ 4-6 km, whereas CATS continues showing the layer between 1 and 7 km. The model predicts an approximately 65 % lower extinction coefficient than CATS.
In Fig. 20, similar to Fig. 18, the CATS overpass starts over the West African coast and then moves over the ocean. As in the previous example, the model predicts a deep dust layer extending up to 6 km. The model underpredicts the aerosol extinction by ∼ 65 % over the ocean and by ∼ 45 % over land. Over land, division 2 predicted dust makes up 7.5 % of the dust concentration, dropping away to nearly zero over the ocean, potentially due to sedimentation of the coarser particles.
Two things stand out from the above examples: (1) over the African continent, where the dust is uplifted, the model generally agrees better with the observations than over the ocean further away from the source region, and (2) the smaller dust particles (division d1) in the model reach the same altitude as the dust layer observed by CATS, but the coarser particles (division d2) appear to be distributed much lower in the atmosphere (e.g. Figs. 17, 19, and 20). As already mentioned, we looked at similar plots for 45 overpasses in total, and the comparison gave similar results.
In the MetUM there is a size dependence in the dust uplift scheme, whereby finer particles are lofted more easily. However, previous studies suggest that the MetUM division d2 dust would be expected to reach higher altitudes away from source regions than it does. The behaviour downstream from the source seems to indicate that as the dust-laden air mass moves away, the coarse particles are lost too quickly in the model prediction. This would fit with what previous studies have found, for example Ansmann et al. (2017).

Effect of large-scale wind and boundary layer height
In this section we investigate potential drivers for the observed discrepancies in the vertical distribution of dust in the MetUM and ECMWF-CAMS. This is a difficult task as there are many competing factors that influence how dust is lifted into the atmosphere and subsequently transported, and these vary considerably between models. In the MetUM the three processes which are most likely to have an impact on the vertical distribution of dust are the convection scheme, boundary layer (BL) height at the source, and the large-scale wind.
Looking at the large-scale wind field and BL height should show whether the modelled dust layer height is controlled by the large-scale wind or by boundary layer mixing processes at the source. If examination of these processes cannot explain why the dust is too low in altitude, then the most likely cause is to be sought in the convection scheme. There is, however, no direct measure of convection in the model output fields from the MetUM, and therefore any influence can only be inferred from the data that are available to us. Back trajectories from HYSPLIT and NAME as well as SEVIRI dust RGB imagery were used to determine the central trajectory of the dust sampled during each case study from the source (Fig. 1). The dust concentration for each size bin, the large-scale wind (w), and the BL height were extracted from the model output along the track and plotted as a cross section every 6 h from the time of uplift to the time of sampling by the aircraft. Figure 21 displays such cross sections for case study 3. The dust was uplifted from Mali on 13 August, with a secondary uplift along the track in Mauritania. At the time of uplift both models show a ∼ 0.3 m s −1 increase in the large-scale wind velocity. An increase in large-scale wind velocity at the time of uplift between 0.2 and 0.8 m s −1 was observed for all the cases looked at. At the time of dust uplift, the BL height was typically 4-5 km, and the dust mixed up to its top. The altitude which the dust reached over the source regions of Africa compared well with the CATS observations of the depth of the dust layer over Africa. This suggests that problems with the BL height in the MetUM may not be the cause for the dust layer being represented too low in the atmosphere away from the source region.
From the data presented here it is not possible to determine how well the models represent large-scale wind in the dust source regions. Previous studies which have looked at this issue more comprehensively do, however, suggest that there is an underprediction of wind fields in the models, which is also linked to coarse-resolution modelling (e.g. Chouza et al., 2016). Evan et al. (2016) showed that desert dust emission is to first order a function of wind speed, and it is against this quantity that models parameterise the dust source. This, combined with our observations of an increase in large-scale wind velocity at the time of dust uplift, suggests that further investigation into the role of wind speed in the models would be helpful as a key part of getting the amount of dust uplift right.

Conclusions
The vertical distribution, particle size distribution, and mass concentration are the key properties that are predicted in a dust transport model. On the other hand, the main observable quantity on a global scale is aerosol optical depth from AERONET (Holben et al., 1998), MODIS (Hsu et al., 2004, 2006; Levy et al., 2013; Sayer et al., 2013, 2014), and potentially other sources such as the Polar Multi-Sensor Aerosol product (PMAp; Lang et al., 2017), the Visible Infrared Imaging Radiometer Suite (VIIRS; Hsu et al., 2019), and several others. Aerosol optical depth is at the same time an optical property and a vertically integrated quantity, meaning that the same observable AOD can be retrieved e.g. with differing combinations of concentration and particle size distribution or with a differing vertical distribution. It is good practice to pull the model towards the observations, and this can be achieved by tuning and data assimilation: this means that we can expect a good model to yield a sensible prediction of the AOD. This is, however, insufficient to state that the underlying microphysical properties, from which AOD is derived, are correctly balanced. The vertical distribution and particle size distribution heavily affect how dust is transported and how quickly it is deposited. Wind speed and direction are altitude dependent, meaning that transport is heavily dependent on the altitude of a layer. Residence time and transport range are affected by both the particle size distribution (coarse particles tend to be deposited more quickly) and vertical distribution (turbulent mixing in the boundary layer speeds up deposition compared to the free troposphere). The representation of these properties in a model can affect the predicted AOD gradient across the Atlantic, for example.
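The point that AOD alone cannot constrain the vertical distribution can be illustrated with a minimal sketch in which AOD is computed as the vertical integral of the extinction coefficient. The two Gaussian layers below are synthetic and hypothetical; they have the same shape and differ only in altitude.

```python
import numpy as np

def aod_from_profile(altitude_km, extinction_per_Mm):
    """AOD as the trapezoidal vertical integral of the extinction coefficient.

    Extinction in Mm^-1 and altitude in km: Mm^-1 * km = 1e-6 m^-1 * 1e3 m = 1e-3.
    """
    altitude_km = np.asarray(altitude_km, dtype=float)
    extinction = np.asarray(extinction_per_Mm, dtype=float)
    layer_mean = 0.5 * (extinction[1:] + extinction[:-1])
    return float(np.sum(layer_mean * np.diff(altitude_km)) * 1.0e-3)

# Synthetic profiles (not campaign data): same layer shape, different altitude.
altitude_km = np.linspace(0.0, 8.0, 81)
elevated_layer = 150.0 * np.exp(-0.5 * ((altitude_km - 3.5) / 0.5) ** 2)
low_layer = 150.0 * np.exp(-0.5 * ((altitude_km - 2.0) / 0.5) ** 2)

print(f"AOD, elevated layer: {aod_from_profile(altitude_km, elevated_layer):.3f}")
print(f"AOD, low layer:      {aod_from_profile(altitude_km, low_layer):.3f}")
```

Both profiles integrate to essentially the same AOD even though the layers are 1.5 km apart, so a model constrained by AOD alone receives no information on which of the two is correct.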
All this means that in the case of a model constrained by AOD observations only, other processes may need to compensate for a potential imbalance in the microphysical representation, such as the intensity of sources and sinks. The microphysical properties and the three-dimensional spatial distribution of dust are thus deeply interconnected.
We have used a combination of remote sensing and in situ measurements to characterise the vertical distribution and transport of Saharan dust over the eastern Atlantic and West Africa during August 2015, as well as to evaluate the dust forecasts from two operational global atmospheric models (MetUM and ECMWF-CAMS). The dust AOD predictions at short forecast lead times from both models were in agreement with the aircraft, satellite, and AERONET observations but with a low bias (note that both models assimilate AOD). Previous studies resulted in similar findings; Roberts et al. (2018) found that the AOD over the Sahara is well represented compared to MODIS on a seasonal to monthly timescale. On the other hand, we found that the vertical distribution of the aerosol extinction coefficient and dust concentration could benefit from improvements. Our results show that the predicted vertical distribution places the dust low in the atmosphere when compared to observations. Agreement between measured and modelled profiles was better near the source, with differences increasing downstream, confirming the findings of previous studies (e.g. Kim et al., 2014;Ansmann et al., 2017). Similarly, Konsta et al. (2018) concluded that the BSC-DREAM8b regional dust model overestimated dust extinction in the Saharan source regions and underestimated transported dust over Europe and the Atlantic.
This issue was particularly noticeable in the MetUM, wherein the coarser dust was not transported high enough in the atmosphere or far enough away from the source compared with the observations. This suggests that the model could be settling the coarse-mode dust too quickly, and similar findings have also been reported in previous studies (e.g. Kim et al., 2014; Mona et al., 2014; Binietoglou et al., 2015). We also found that both models underpredict the coarse mode and overpredict the fine mode. The discrepancy between the magnitude of the measured and modelled extinction coefficient is much less than for the concentration profiles. This is likely to be due to the microphysical representation, since small particles are more optically efficient. Due to MODIS AOD data assimilation and model tuning against AERONET observations, the large underprediction of coarse-mode dust in the models is compensated for with a relatively small effect on the forecast average extinction coefficient and aerosol optical depth, even with the discrepancies in size distribution and dust concentration. Our findings support a recent study by Adebiyi and Kok (2020), who reported a large underprediction of coarse-mode dust in six climate models and that, for this reason, the global dust burden was underpredicted by a factor of 4. Huneeus et al. (2011) also found that models tend to simulate the climatology of vertically integrated parameters (AOD and AE) much better than total deposition and surface concentration. Hoshyaripour et al. (2019) also highlighted discrepancies between ICON-ART dust predictions and Multiangle Imaging Spectroradiometer (MISR) observations associated with uncertainties in particle size distribution and emission mechanisms.
The overestimation of dust concentration in the finer ECMWF-CAMS bins and the underestimate of coarser dust are issues that the ECMWF are aiming to address in the future. In order to do this an updated dust emission scheme based on Remy et al. (2019) using the Kok et al. (2012) estimates of size distribution at emission would be used. It is expected that this would increase the total dust concentration and shift it to the larger sizes, thus keeping total extinction similar to its present values but more accurately representing the dust size distribution. After these changes have been implemented, a further study like the present one can help quantify the improvement introduced.
We have also investigated the processes driving dust uplift in the models, and our analysis suggests that uncertainties in the large-scale wind and the emitted size distribution are likely causes of differences between observations of the Saharan Air Layer (SAL) and MetUM predictions. The crude representation of the dust size distribution in the MetUM two-bin dust scheme is another important factor. The MetUM operational dust forecast is intended to be used primarily for AOD forecasts and extinction for visibility purposes, and although improvements of the microphysical properties would be desirable, the current implementation is satisfactory to an extent and has the advantage of being computationally cheap. We also note that the dust scheme used in the Met Office climate model differs, using six size bins rather than two, with the six-bin version yet to be evaluated as in this article.
The scheme used to represent dust microphysical properties in models deserves attention as a key element to pursue accurate mineral dust predictions. Simple schemes (such as the two-bin dust size distribution in the operational version of the MetUM) have the obvious advantage of being viable in terms of computing resources required, but, on the other hand, there is the consequence of giving a less accurate representation of the microphysical properties. This could be addressed by increasing the number of variables used to represent the size distribution, for example by using a scheme with two or more modes, each defined by two variables, such as in the GLOMAP-mode aerosol scheme in UKCA (Mulcahy et al., 2020), although the ability of this scheme to represent the coarse and giant modes correctly still needs to be proven. Whatever approach is chosen, it needs to allow coarse and giant particles to be represented, a capability currently missing in many models (Huneeus et al., 2011). It is to be noted that there are plans in place to move to GLOMAP dust within the operational Global MetUM in the near future and also ongoing experimentation with this scheme in the ECMWF IFS within CAMS. Moreover, there are plans to modify the latter scheme by adding a third (super-coarse) mode: these are changes in the right direction.
As the size distribution affects gravitational settling, it indirectly affects the three-dimensional distribution. Additionally, some processes may deserve closer attention, as studies suggest that they could increase the lifetime of coarse and giant particles beyond what is predicted from gravitational settling alone: e.g. turbulence within the Saharan Air Layer, particle electrification, and the role of convective systems (Van Der Does et al., 2018). The optimum balance between these processes is still to be understood, as is the correct estimation of emission intensity. The observable properties of dust, in terms of the aerosol optical depth, particle sizes, spatial distribution, and vertical distribution, are determined by these processes. The combination of all these properties determines the impact of dust on the climate system, hence the importance of understanding these processes better (see e.g. Kok et al., 2017).
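To illustrate why the size distribution matters so strongly for settling, the following minimal sketch evaluates the Stokes terminal velocity of a sphere for a few diameters. It is not the deposition scheme of either model. The particle density, air viscosity, and the chosen diameters are assumptions made for illustration, air buoyancy and the slip correction are neglected, and Stokes' law itself becomes less accurate for the largest (giant) particles as the Reynolds number grows.

```python
# Stokes terminal velocity for a sphere: v = rho_p * g * d**2 / (18 * mu).
# All constants below are illustrative assumptions, not values from the article.
PARTICLE_DENSITY = 2650.0  # kg m^-3, a commonly assumed mineral dust density
AIR_VISCOSITY = 1.8e-5     # Pa s, dynamic viscosity of air (approximate)
GRAVITY = 9.81             # m s^-2


def stokes_settling_velocity(diameter_m,
                             particle_density=PARTICLE_DENSITY,
                             air_viscosity=AIR_VISCOSITY):
    """Return the Stokes terminal fall speed (m s^-1) of a spherical particle."""
    return particle_density * GRAVITY * diameter_m ** 2 / (18.0 * air_viscosity)


def days_to_fall(distance_m, diameter_m):
    """Return the time (days) to fall distance_m in still air at the Stokes speed."""
    return distance_m / stokes_settling_velocity(diameter_m) / 86400.0


if __name__ == "__main__":
    for d_um in (1.0, 10.0, 30.0):  # hypothetical fine, coarse, and giant sizes
        d_m = d_um * 1e-6
        v = stokes_settling_velocity(d_m)
        print(f"d = {d_um:5.1f} um: v = {v:.2e} m/s, "
              f"{days_to_fall(1000.0, d_m):.2f} days per km")
```

Because the fall speed scales with the square of the diameter, a model that misplaces mass between fine and coarse sizes will also misplace it in the vertical and along the transport path, even before the additional processes listed above are considered.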
Two further points that need attention are the particle shape and the effect of dust on the radiation field, atmospheric heating rates, thermodynamics, and the dust transport itself. If dust particles are assumed to be spherical in the dust transport models, many computations are easier; however, it is well known that dust particles are very irregular. The mass-to-extinction conversion and the drag coefficient calculations (which affect deposition and transport) are directly affected by particle shape. Moreover, dust microphysics and the consequent radiative properties, such as the single-scattering albedo and the asymmetry parameter, alter the computations of atmospheric radiation due to dust. In turn, this affects the heating rates of atmospheric layers, atmospheric thermodynamics, convective motions, and wind fields, which can modify the dust transport patterns. An improvement of the radiative transfer models within dust models is therefore suggested to integrate the latest understanding of dust microphysics.
As this study highlights the limitations of using AOD as the main observable quantity against which the model is verified, tuned, and pulled, it also supports the perspective of improving the set of aerosol observations that can be used on a global scale. In particular, observational datasets exist for the vertical dust distribution, which can be exploited to better constrain the predictions. The most obvious one is the CALIPSO dataset, which has been observing the global aerosol distribution since 2006 (Winker et al., 2010; Tsamalis et al., 2013), and in the future EarthCARE is expected to be another very good candidate (Illingworth et al., 2015). Note that this perspective is not limited to using active sensors, and studies exist on the observation of the vertically resolved distribution from passive hyperspectral instruments in the infrared (Callewaert et al., 2019). In the long term, providing observations not only of AOD, but also of the vertical distribution of aerosols, could become the driver for operational space missions.
In addition to vertically resolved information, we also highlight the need for better-constrained size-resolved properties of dust to reproduce the correct relationship between concentration and the extinction coefficient. Particle size distributions, both in the model representation and in the observations, should cover the whole size spectrum, including the giant mode (Ryder et al., 2019). Ideally, these observations should be co-ordinated, vertically resolved, and established across a number of locations downstream from sources, e.g. across the tropical Atlantic. Sporadic observations do exist, and we advocate for a more systematic approach. For instance, a number of balloon-borne sensors are being developed and could be used for this purpose (see e.g. Renard et al., 2016; Fujiwara et al., 2016; Smith et al., 2019; Dagsson-Waldhauserova et al., 2019).
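The dependence of the concentration-to-extinction relationship on particle size can be sketched with a deliberately simplified calculation for monodisperse spheres, for which the mass extinction efficiency is 3 Q_ext / (2 ρ d). This is an illustration only, not the optical scheme used by the MetUM or CAMS. The extinction efficiency Q_ext = 2 (the large-particle limit, which is poor for sub-micron sizes), the particle density, the diameters, and the mass concentration are all assumptions, and real dust is neither spherical nor monodisperse.

```python
# Mass extinction efficiency (MEE) of monodisperse spheres:
#   cross-section = Q_ext * pi * d**2 / 4,  mass = rho * pi * d**3 / 6
#   MEE = cross-section / mass = 3 * Q_ext / (2 * rho * d)
# All default values are illustrative assumptions, not values from the article.


def mass_extinction_efficiency(diameter_m, q_ext=2.0, particle_density=2650.0):
    """Return the MEE in m^2 kg^-1 for spheres of a single diameter."""
    return 3.0 * q_ext / (2.0 * particle_density * diameter_m)


def extinction_coefficient(mass_conc_kg_m3, diameter_m):
    """Return the extinction coefficient (m^-1) for a given mass concentration."""
    return mass_extinction_efficiency(diameter_m) * mass_conc_kg_m3


if __name__ == "__main__":
    mass_conc = 100e-9  # kg m^-3, i.e. a hypothetical 100 micrograms per m^3
    for d_um in (2.0, 20.0):  # hypothetical smaller and larger diameters
        ext = extinction_coefficient(mass_conc, d_um * 1e-6)
        print(f"d = {d_um:4.1f} um: extinction = {ext * 1e6:.1f} Mm^-1")
```

Under these assumptions the same mass produces ten times less extinction when carried by particles ten times larger, which is why a forecast tuned to match the optical depth can do so with an incorrect partition of mass between size classes.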
To conclude, we highlight how campaigns focusing on a combination of in situ and remote sensing observations can provide information to simultaneously validate existing model developments and help identify the areas requiring further development. In the last few years, considerable improvements have been made to operational dust forecasts, and with this paper we want to contribute to this effort by (1) indicating a few points that could be addressed in the models and (2) providing a few datasets and a selection of case studies for future model assessments.
Data availability. The FAAM aircraft datasets collected during the ICE-D and AER-D campaigns are available from the British Atmospheric Data Centre, Centre for Environmental Data Analysis, at the following URL: http://catalogue.ceda.ac.uk/uuid/d7e02c75191a4515a28a208c8a069e70 (last access: 20 January 2018) (Bennett, 2019).
Author contributions. DOS carried out the analysis of the lidar data and interpreted them together with the satellite and model datasets, selected the case studies, wrote the initial draft of the article, and drew the main conclusions. FM proposed and coordinated the AER-D campaign, supervised the analysis of the lidar data, and finalised the paper for submission. FM and CR worked in the AER-D mission science team implementing the airborne sampling strategy for aerosol science objectives. CR analysed the in situ measurements. YP helped with the interpretation of the MODIS data. ZK, BJ, AB, and MB provided the interpretation of the results in terms of model issues and highlighted the potential improvements. MMG, JY, and PS provided the CATS data and their interpretation. All authors read the paper and provided constructive comments.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Dust aerosol measurements, modeling and multidisciplinary effects (AMT/ACP inter-journal SI)". It is not associated with a conference.
Acknowledgements. Airborne data were obtained using the BAe-146-301 Atmospheric Research Aircraft operated by Directflight Ltd and managed by the Facility for Airborne Atmospheric Measurements (FAAM).
The staff of the Met Office, the universities of Leeds, Manchester, and Hertfordshire, FAAM, Direct Flight, Avalon Engineering, and BAE Systems are thanked for their dedication in making the ICE-D and AER-D campaigns a success. Claire Ryder acknowledges NERC support through Independent Research Fellowship NE/M018288/1. The authors thank the principal investigators and their staff for establishing and maintaining the AERONET sites used in this study. The MODIS data in this study were acquired as part of NASA's Earth Science Enterprise. The algorithms were developed by the MODIS Science Teams, and the data were processed by the MODIS Adaptive Processing System (MODAPS) and Goddard Distributed Active Archive Center (DAAC); they are archived and distributed by the Goddard DAAC.
Financial support. This research has been supported by the Met Office through the Public Weather Service programme. Claire Ryder acknowledges NERC support through Independent Research Fellowship NE/M018288/1.
Review statement. This paper was edited by Nikos Hatzianastassiou and reviewed by three anonymous referees.