Assimilating spaceborne lidar dust extinction improves dust forecasts

Atmospheric mineral dust has a rich tri-dimensional spatial and temporal structure that is poorly constrained in forecasts and analyses when only column-integrated aerosol optical depth (AOD) is assimilated. At present, this is the case of most operational global aerosol assimilation products. Aerosol vertical distributions obtained from space-borne lidars can be assimilated in aerosol models, but questions about the extent of their benefit upon analyses and forecasts along with their consistency with AOD assimilation remain unresolved. Our study thoroughly explores the added value of assimilating space5 borne vertical dust profiles, with and without the joint assimilation of dust optical depth (DOD). We also discuss the consistency in the assimilation of both sources of information and analyse the role of the smaller footprint of the space-borne lidar profiles upon the results. To that end, we have performed data assimilation experiments using dedicated dust observations for a period of two months over Northern Africa, the Middle East and Europe. We assimilate DOD derived from VIIRS/SUOMI-NPP Deep Blue, and for the first time CALIOP-based LIVAS pure-dust extinction coefficient profiles on an aerosol model. The 10 evaluation is performed against independent ground-based DOD derived from AERONET Sun photometers and ground-based lidar dust extinction profiles from field campaigns (CyCARE and Pre-TECT). Jointly assimilating LIVAS and Deep Blue data reduces the root mean square error (RMSE) in the DOD by 39% and in the dust extinction coefficient by 65% compared to a control simulation that excludes assimilation. We show that the assimilation of dust extinction coefficient profiles provides a strong added value to the analyses and forecasts. When only Deep Blue data are assimilated the RMSE in the DOD is reduced 15 further, by 42%. However, when only LIVAS data are assimilated the RMSE in the dust extinction coefficient decreases by 72%, the largest improvement across experiments. We also show that the assimilation of dust extinction profiles yields better skill scores than the assimilation of DOD under equivalent sensor footprint. Our results demonstrate the strong potential of 1 https://doi.org/10.5194/acp-2021-442 Preprint. Discussion started: 9 June 2021 c © Author(s) 2021. CC BY 4.0 License.

spatial resolution of the model grid. More specifically, only grid cells with coefficient of variation less than unity were used in the assimilation, while in Cheng et al. (2019) the corresponding threshold was set equal to 0.5.
In addition, in order to avoid spurious values in the assimilation process (e.g. unrealistic high values of extinction coefficient at 532 nm arising from possible misclassification of clouds as aerosols), we discarded LIVAS dust extinction coefficients 120 larger than 10 −3 m −1 . Errors in POLIPHON pure-dust extinction coefficient profiles are of the order of 15-25% (Ansmann et al., 2019). In consequence and similarly to Cheng et al. (2019), input error statistics for the data assimilation routine were prescribed as the 20% of the value of the dust extinction coefficient. The corresponding uncertainties in the CALIPSO-based pure-dust product are extensively and in-depth analyzed in Marinou et al. (2017). The number of ingested profiles in the assimilation is shown in the middle panel of Fig. 1. An additional filter was applied to ensure that the 60-meter vertical  DDBsubset dataset. N stands for the number of observations. Please note that the center and the right panels have the same colorbar range.

VIIRS Deep Blue dataset 130
The DOD at 550 nm was extracted from the Deep Blue (DB) Level 2 product of the VIIRS instrument onboard the SUOMI-NPP satellite (Sayer et al., 2018;Hsu et al., 2019). The DB product provides total AOD at 550 nm with a global coverage daily.
Along with AOD, the DB product includes a flag with the aerosol type classification of the retrieval (namely dust, smoke, high-altitude smoke, non-smoke fine-dominated, mixed, background and fine-mode dominated), and quality-assurance flags 5 https://doi.org/10.5194/acp-2021-442 Preprint. Discussion started: 9 June 2021 c Author(s) 2021. CC BY 4.0 License. over ocean and land from one (worst quality) to three (best quality). Hsu et al. (2019) highlight the improvements done in the DB retrieval for dust aerosols, as the optical model was updated with non-spherical dust optical properties.
The standard DB product is AOD. We used only pixels classified as "dust" aerosol type and with a quality assurance flag equal to 3 over ocean and greater than or equal to 2 over land. The resulting DOD dataset was then interpolated to the model grid and assigned an uncertainty of 0.2 × DOD + 0.05 following Sayer et al. (2019). Hereafter we use DDB to refer to this filtered dust DB retrieval. The number of DDB retrievals assimilated in this study is shown in the left panel of Fig. 1. We note 140 that DDB is not necessarily a pure-dust AOD and may include contributions of other aerosols types, although dust should be predominant particularly in Northern Africa and the Middle East.
The large swath (30-40 km) of the VIIRS instrument can be a big plus for data assimilation. In contrast, CALIOP has a horizontal footprint of 100 m and a horizontal resolution of 333 m. When comparing the assimilation from both instruments, it is key to understand the role of these differences in spatial coverage. To respond to this fundamental question, we prepared 145 a subset of DDB data, consisting of pixels with DDB retrievals but only where the LIVAS dataset has a valid retrieval in at least one vertical level for every UTC day. This dataset of DOD is called hereafter DDBsubset, and the number of DDBsubset retrievals used in the assimilation is shown in the right panel of Fig. 1. 2.2 Ground-based observations for evaluation 2.2.1 AERONET fraction is calculated based on the known depolarization ratio of pure dust (31%) and the non-dust component (5%). Having the dust-only backscatter coefficient, the dust-only extinction coefficient is determined by the use of the pure dust lidar ratio at 532 nm of 45 sr Mamouri and Ansmann (2016). The non-dust extinction coefficient is calculated similarly depending on the type of non-dust aerosol. Finally, a consistency check is performed by summing up the dust-only and non-dust extinction 170 profiles and comparing them to the total measured extinction coefficient. More details can be found in Urbanneck (2018).
In contrast to the estimated DOD from AERONET and DB AOD retrievals, which may be affected by other aerosols, the PollyNET measurements can provide pure-dust retrievals (Tesche et al., 2009;Mamouri and Ansmann, 2017). For this reason, we use here the lidar observations from CyCARE and Pre-TECT campaigns in the evaluation of our results.

175
To simulate the dust cycle, we used the Multiscale Online Nonhydrostatic AtmospheRe CHemistry (MONARCH) model (Pérez et al., 2011;Haustein et al., 2012;Jorba et al., 2012;Badia et al., 2017;Di Tomaso et al., 2017;Klose et al., 2021). MONARCH is a fully online integrated system for meso-to global-scale applications developed at the Barcelona Supercomputing Center (BSC). It uses the Nonhydrostatic Multiscale Model on the B-grid (NMMB, Janjic and Gall, 2012) as the meteorological driver and couples gas-phase and mass-based aerosol modules to describe the life cycle of atmospheric components. It uses the  Guerschman et al. (2015) (2001) as the scaling function. To account for roughness elements on the land-surface, such as vegetation, rocks, or pebbles, configurations I and III used the drag partition parameterization from Marticorena and Bergametti (1995) with corrections from King et al. (2005) in combination with a monthly climatology of MODIS-derived leaf-area index (Myeni et al., 2015) and aerodynamic roughness length data for arid regions from Prigent et al. (2012). In contrast, configurations II and IV utilized the drag partition parameterization from Raupach et al. (1993) together with a monthly climatology of photosynthetic and 205 non-photosynthetic vegetation cover data from Guerschman et al. (2015) (Klose et al., 2021). The drag partition corrections were applied to the threshold friction velocity for particle entrainment.

Data assimilation
MONARCH is coupled to a Local Ensemble Transform Kalman Filter (LETKF, Hunt et al., 2007). The LETKF implementation used in this study was built upon the implementation from Miyoshi and Yamane (2007), Schutgens et al. (2010), and Di Tomaso A key ingredient in the data assimilation algorithm is the representation of model uncertainty, which in an ensemble-based 225 scheme, like ours, had been derived from the model ensemble. We have generated the MONARCH ensemble by perturbing the observational errors were uncorrelated, i.e., the observational error covariance matrix was a diagonal matrix.

Experiment description and evaluation
We describe in Section 2.1 the three datasets used in the assimilation: DDB, DDBsubset and LIVAS. Using a fixed configuration of the model and the data assimilation scheme parameters (Sections 2.3 and 2.4), we designed and ran five experiments assimilating combinations of the three datasets.
We ran our data assimilation experiments over a regional domain centred at 20°E in longitude and 30°N in latitude, which 240 covers North Africa, the Middle East and Europe (e.g. Fig. 2). The model was set up with a rotated latitude-longitude grid with 0.66°resolution at the center of the grid, 40 vertical sigma layers, and hourly output of dust concentrations for the 8 size bins. The dust extinction coefficient and DOD were computed with software provided by Gasteiger and Wiegner (2018). We have assumed spheroidal dust particles with the axis ratio distribution shown in Table 2 of Koepke et al. (2015) and the OPAC refractive index for dust (e.g. 1.53 + 0.0055i for 550 nm) as in Koepke et al. (2015). 245 We performed the five data assimilation experiments between March and April 2017. A 14-month spinup was run without assimilation to properly initialise the soil moisture content. We also ran a control experiment over the period of study, consisting of an ensemble forecast without data assimilation. For each of the five data assimilation experiments (eLIVAS, eDDB, eDDBsubset, eLIVAS+DDBsubset and eLIVAS+DDB), we obtained two types of simulation outputs: analyses and forecasts.
We produced ensemble forecasts with a forecast length of 24 hours, initialised with the last timestep of the analyses of 250 the day before (at 0 UTC in our 24-hour assimilation window). Forecasts and observations along with their prescribed error were the input for the data assimilation scheme, which computed the 4D mass concentration dust field analyses within the assimilation window. Therefore, for a given day, forecasts can carry the observational information assimilated from the days before, but analyses can, in addition, carry the observational information of that given day. In contrast, the control experiment omits the assimilation of dust information.
where m and r are the average of the model and observations, respectively. We also included the mean over the number of where P i is the cumulative distribution function of the ensemble, which is approximated empirically by the M ensemble members, and P ri the cumulative distribution function of the observation r i , computed as P ri ( Heaviside step function.

Results and discussion
We first discuss the internal consistency of the data assimilation system in Section 3.1 by comparing analyses and forecasts with the assimilated data. We then present the evaluation against ground-based measurements from Sun photometers and lidars in Section 3.2.

Consistency and cross-comparison checks with satellite products
We cross-compared the model simulations (control, forecasts and analyses) with the two main assimilated observational datasets. Consistency can be checked when analyses are compared with an observational dataset used for the assimilation (e.g., when DDB DOD are compared to analyses from the eDDB or eLIVAS+DDB experiments). This verification step provides a sanity check for the data assimilation process. When the datasets are not assimilated (e.g., when DDB DOD are compared to analyses from the eLIVAS experiment), the comparison is then performed with independent satellite observations. Forecasts are initialised from analyses, thus forecast scores (i.e. error metrics calculated for the forecast fields) can also be considered, up to a certain degree, as an evaluation of the forecast quality, even though the reference observations and the forecast cannot be assumed to be completely independent in this case. In contrast to the control run, which overestimates DOD, the analyses and forecasts are in better agreement with DDB both in terms of overall DOD values and spatial distribution. In this experiment, DDB DOD were not assimilated, thus the qualitative improvement of the analysis compared to the control run indicates that, despite the relatively low spatial coverage    As introduced in Section 2, we used lidar retrievals of pure-dust profiles for the evaluation of the experiments. The evaluation 385 was conducted for the dust event above the eastern Mediterranean between 19 to 24 April 2017, whose extent and dynamics can be observed in the right column of Fig. 2.
We compared our five experiments with the dust extinction coefficient provided by these lidars. Figure 8 shows the comparison between the lidar measurement in the three sites, the control run, the forecasts and analyses from three experiments (eLIVAS, eDDB, eLIVAS+DDB), and the AOD from AERONET sites close to the lidar instruments, without filtering by   The overall quantitative evaluation is shown in Fig. 9. In general terms, both bias scores are smaller for eLIVAS, eLI-400 VAS+DBBsubset and eLIVAS+DDB than for eDDB and eDDBsubset. The correlation coefficient is weakly affected by the assimilation in all experiments and the MFE is slightly smaller for experiments where LIVAS were assimilated. RMSE and CRPS behave similarly, with improvement for all the experiments compared to the control run, particularly for those where LIVAS was were assimilated. We also note that in contrast to the evaluation against AERONET DOD, eLIVAS shows better MFE, RMSE and CRPS scores than eDDBsubset, indicating that the assimilation of pure-dust vertically-resolved observations 405 can provide a better vertical representation of the dust concentrations than those with only DOD. We have computed the evaluation scores also for each of the available profiles (Fig. 8), which are summarised in Fig. 10. The assimilation performance was split in two groups. The first group is characterised by low values of measured dust extinction coefficient (non-shaded columns in Fig. 10) where the dust plume cannot be easily identified in Fig. 8. profiles corresponds to high dust extinction coefficients (green-shaded columns in Fig. 10). For these profiles, it is possible to visually identify in Fig. 8 the altitude and shape of the dust plume.
In this Section we show two different approaches for computing the scores of a group of vertical profiles. The first approach is to compute the scores by concatenating all the pairs of observed and simulated extinction coefficients, without distinguishing among profiles on the computation. This has been done in Fig. 9. A second approach is to compute the scores for each observed profile, and then to compute estimators of the statistics of these scores. The last three columns of Fig. 10 follow this approach, 415 showing the average of the scores computed for each profile. In Fig. 10, the mean all column shows the average of the scores of all the profiles; the mean high column shows the average of the scores of the profiles with strong dust extinction coefficients (i.e., green-shaded columns of this figure); and the mean low column shows the average of the scores of the profiles with small values of measured dust extinction coefficients (non-shaded columns in Fig. 10). These last three columns in Fig. 10 are also shown in Table 2. We note that this methodological difference between both approaches to compute the scores can have 420 noticeable impacts. For example, the mean all values of the correlation coefficient and RMSE of Table 2 differ from those in Fig. 9.
For the group of profiles with small values of dust extinction coefficient (mean low in Fig. 10) the absolute scores (Mean Bias, RMSE and CRPS) are small because simulated and observed values are also small, but do not improve with assimilation.
Similarly, the normalized scores (MFE and MFB) and the correlation coefficient do not improve. The group of profiles with 425 high dust extinction coefficients (green-shaded columns in Fig. 10) generally show better normalised scores than the group with low dust values. With the exception of the correlation coefficient, all the scores in this group improved after assimilation.
In this mean high group the Mean Bias drastically decreased when LIVAS were assimilated (eLIVAS, eLIVAS+DDBsubset and eLIVAS+DDB). Similarly, MFB, MFE, RMSE and CRPS improved with LIVAS assimilation. The correlation coefficient does not improve with assimilation, but the degradation of this score in all experiments with respect to the control run remains 430 below 5% of the control run value. Overall, when dust extinction is large, all analyses improved in all scores with the exception of the correlation coefficient. While improvements are enhanced when LIVAS is assimilated, they are still non-negligible for eDDB and eDDBsubset.
All in all, despite the sparse spatial coverage of LIVAS compared to DDB, this evaluation shows that dust extinction profiles are best constrained in experiments where LIVAS is assimilated.

Consistency between column and profile assimilation and the role of a narrow satellite footprint
In previous sections we show that experiments assimilating the full DDB dataset (eDDB) yield better scores than those assimilating the DDB subset (eDDBsubset), underlining the importance of the horizontal coverage of the observations in our assimilation. We also show that eLIVAS+DDB produces better scores than eLIVAS when compared to AERONET DOD, and also that eLIVAS+DDB produces better scores than eDDB when compared to PollyNET. However, we obtained mixed results   that a direct comparison among experiment analyses can further help elucidating the impact of assimilating vertically-resolved dust observations, along with the impact of the narrow (or wide) field of view of the measurements in our analyses.
Although DDBsubset was designed to have a similar horizontal and temporal coverage than LIVAS, a direct comparison 445 between the eDDBsubset and eLIVAS experiments should also take into account that (i) LIVAS provides direct observational information in the vertical coordinate, while DDBsubset does not; (ii) the vertical influence of LIVAS information is only partial if the column is not complete, in contrast to the DDB DOD that is propagated to the whole column; (iii) DDB only provides data during the afternoon overpass (about 13:30 LT), while LIVAS provides data during afternoon and night overpasses. Nighttime profiles have better quality, and given the assimilation cycle design and the temporal localisation applied, they should influence 450 the 0 UTC analyses more than daily afternoon, along with the forecast and subsequent analyses.
It is possible to compare the experiments by inspecting the histograms of differences between the analyses and the control run. We have computed these differences for DOD in Fig. 11 and for dust extinction coefficient in Fig. 12. Figure 11 shows bi-dimensional histograms of the DOD differences for the five experiments. The 1:1 line indicates that respective analyses produce the same differences with the control run, i.e. they are equal. Points in quadrants I and III indicate that both experiments 455 increase and decrease, respectively, the DOD values at the same locations and times, which is as a sign of consistency. It can be seen that the (eDDB, eLIVAS+DDB) panel shows less deviation with respect to the 1:1 line than the (eLIVAS, eLIVAS+DDB) ground-based lidar pure-dust extinction coefficient measurements performed during the CyCARE and Pre-TECT experimental campaigns in the Mediterranean. The assimilation of LIVAS leads to a better representation of the dust extinction coefficient profiles than the assimilation of DDB alone. Jointly assimilating DDB and LIVAS yields the second-best scores for both the DOD and the dust extinction coefficient profile, which proves their suitability for dust forecast applications.
We have also focused on the limitations of the narrow footprint of LIVAS compared with the large swath of DDB, which 495 reduces the observational influence on the analyses. Yet, the impact of the vertically-resolved information provided by LIVAS is significant, and with a similar coverage it produces even a larger impact on the analyses than the assimilation of DOD.
Our findings strongly support the conclusions of Cheng et al. (2019) in that the assimilation of aerosol profiles can improve their vertical representation in models. We additionally show that the vertical profiles of dust extinction coefficient can be constrained by assimilating the LIVAS product. We are aware of the limitations of this study due to the limited availability of 500 ground-based PollyXT lidar measurements. We are looking forward to the publication of ground-based pure-dust lidar datasets from EARLINET and MPLNET (version 3), that would be very useful for a long term assimilation and evaluation of simulated dust extinction profiles from model forecasts and analyses. Our work shows the value of space-borne polarization lidars for improving desert dust forecasts and analyses. As such, future satellite missions with combined extinction and depolarization capability, such as EarthCARE, are expected not only to further contribute to desert dust research, but also to operational 505 forecasts if real-time products are made available.

Appendix B: Regions for LIVAS collocation
We present in Fig. B1 the definition of regions used in Fig. 5.