Automated time-height-resolved airmass source attribution for proﬁling remote sensing applications

. Height resolved airmass source attribution is crucial for the evaluation of proﬁling ground-based remote sensing observations. This work presents an approach how backward trajectories or particle positions from a dispersion model can be combined with geographical information (a land cover classiﬁcation and manually deﬁned areas) to obtain a continuous and vertically resolved estimate of airmass source above a certain location. Ideally, such an estimate depends on as few as possible a-priori information and auxiliary data. An automated framework for the computation of such an airmass source is presented 5 and two exemplary applications are described. Firstly, the airmass source information is used for the interpretation of airmass sources for three case studies with lidar observations from Limassol (Cyprus), Punta Arenas (Chile) and ship-borne off Cabo Verde. Secondly, airmass source statistics are calculated for two 8-week campaigns to assess potential observation biases of lidar-based aerosol statistics.

The transport pathway of an airmass arriving over the site can be computed either using mean wind trajectories or a particle 70 dispersion model. Both approaches can be used with the proposed method. Mean wind trajectories for the past 10 days are calculated using HYSPLIT (Stein et al., 2015). To account for variability, ensemble trajectories consisting of 27 members, spaced 0.3 • horizontally and 220 m vertically around the end point, are used ( Fig. 1 a). Meteorological input data for HYSPLIT are taken from the GDAS1 dataset (1 • horizontal resolution) provided by the Air Resources Laboratory (ARL) of the U.S. MODIS (Broxton et al., 2014) National Weather Service's National Centers for Environmental Prediction (ARL Archive). The location of the air parcel is 75 stored in steps of 1 hour. A more realistic representation of turbulence and mixing can be achieved using a LPDM, which simulates the pathway of hundreds to thousands of particles. Here the recent version of FLEXPART (Stohl et al., 2005;Pisso et al., 2019) is used. Meteorological data is obtained from the GFS analysis at a horizontal resolution of 1 • (NOAA, 2000). 5000 particles are used with the particle positions being stored every 3 hours. These simulations are run every 3 hours with height steps of 500 m for the whole period of interest.

80
In this work, surface is classified by two methods: (1) a simplified version of the MODIS land cover classification (Friedl et al., 2002;Broxton et al., 2014). The 17 categories of the original dataset are grouped to 7 categories according to Tab. 1 in order to allow for purpose-serving statistics in the output. Additionally, the horizontal resolution is reduced to 0.1 • . The categories do not resolve the annual cycles, for example due to growing seasons.
(2) customly defined areas as polygons, named according to their geographical context.  The residence times at each time and height step are summed for each land cover class or polygon, where the backtrajectory or particle was below the reception height. Within this study the widely applicable reception height threshold of 2 km is used (Val Martin et al., 2018). Different settings can be easily applied to study events which are entrained at greater heights, such as wildfire smoke emission or volcanic eruptions. The vertical airmass transport during such events is usually not accurately covered by atmospheric models. Setting the reception height to the maximum emission height of such events (as can be 90 estimated, e.g., from satellite observations) can bypass the uncertainties in the modeled vertical motion. The residence times for each category and each height can then be visualized as a profile ( Fig. 1 b). Where the residence time is 0, no air parcels were observed below the reception height during the duration of the backward simulation. In the example shown in Fig. 1 (b) above 5 km height, no airmasses resided at heights below 2 km above ground in the prior 10 days. The theoretical maximum residence time in hours depends on the number of trajectories or particles n, the duration of backward calculation d in days 95 5 https://doi.org/10.5194/acp-2020-955 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License. and the interval of output ∆o in hours: To illustrate the temporal evolution, successive airmass source profiles can be shown one after each other. This visualization condenses the 4D history of a multitude of trajectories (or thousands of particle positions) to a quickly understandable summary, which closely resembles the time-height cross section as usually obtained from vertically or nadir pointed active ground-based 100 remote sensing observations (e.g., Fig. 4).

Polly XT lidar observations
The airmass source estimate is used to interpret observations conducted with the Polly XT lidar (Engelmann et al., 2016).
Polly XT is equipped with backscatter-channels at 1064, 532 and 355 nm as well Raman-and depolarization-channels at the shorter two wavelengths. The optical properties are derived using the automated PollyNET retrieval (Baars et al., 2016(Baars et al., , 2017105 Yin and Baars, 2020) and manual analysis of single profiles. One product of this retrieval is the quasi backscatter coefficient, where the attenuated backscatter is corrected for molecular extinction. Details are covered in Baars et al. (2017).
Polly XT was deployed to various field campaigns and longer term measurements during the last 15 years (Baars et al., 2016).
A broad variety of meteorological conditions and aerosol regimes was covered. The multi-wavelength observations of Polly XT contain unique fingerprints of the observed aerosol types from different source regions (Illingworth et al., 2015).

110
In the following sections 4 and 5, the airmass source attribution will be applied to selected case studies and measurement campaigns, in order to demonstrate its applicability for determination of the airmass source regions and for the estimate of potential observation biases.  height. However, the airmass transport is correctly covered by both estimates. Interestingly, the airmass source estimation for this case provides some added value information with respect to the lidar observations. As both HYSPLIT and FLEXPART approaches indicate, North-American air masses were present in the upper troposphere during the time of the observation, which however had too low aerosol load for being detectable by the Polly XT lidar.

150
On the 14 September 2017 an upper-level short-wave trough moved eastward from the Aegean Sea towards Cyprus. Above 1 km height, the wind turned from South-West to South during the course of the day with velocities ranging from 5 − 15 m s −1 , whereas below, wind velocity was lower and direction more variable.
The time-height cross-section of quasi backscatter observed by Polly XT at Limassol shows two pronounced aerosol layers above the boundary layer (Fig. 6)). The first layer was observed between 1 and 2 km height from 0 to 9 UTC and a second, 155 thicker layer after 3 UTC. Until the night, this layer increases in thickness from bases at 3 and tops at 4.5 km height to bases at 1.2 and tops at 6.5 km height. The boundary layer itself is also laden with aerosols and shows significant backscatter below 1 km height.   The optical parameters were analyzed for one period in the morning between 02:59 and 04:02 UTC and one in the evening between 21:41 and 22:39 UTC (periods marked in Fig. 6 with horizontal orange bars). The profiles from the morning period 160 (Fig. 7) show for the lower layer at 1.8 km height particle depolarization ratios of 0.25, low Ångström values and lidar ratios 8 https://doi.org/10.5194/acp-2020-955 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License. During the evening (Fig. 8), the upper layer extended from 1.3 to 6 km height and shows homogeneous and mostly wavelengthindependent optical properties throughout it's depth. Particle depolarization ratios were between 0.10 and 0.15, with 532 nm values slightly higher than at 355 nm. Lidar ratios in that layer were 35 sr, typical for middle east dust (Mamouri et al., 2013;Nisantzi et al., 2015) and a mixture of mineral dust and anthropogenic pollution (e.g. Tesche et al., 2009).

170
The airmass source estimate (Fig. 9) identifies transport from barren-ground-influenced air from the 'Sahara' until 9 UTC.
Later, corresponding to the change in wind direction, the source for the air aloft is identified as 'Arabian Peninsula', but still the barren class. Below 1 km height, a mixture of surfaces was observed, originating mostly form 'Europe'. Comparing the source estimate based on HYSPLIT (Fig. 9 a, c) with the one from FLEXPART (Fig. 9 b, d), both models agree qualitatively well again. While the general transition was captured by the source estimate, the leading edge of the 'Arabian Peninsula' plume was 175 observed over Limassol earlier than indicated. The increase in thickness of this plume is represented in the source estimate as well.

Biomass burning aerosol at Punta Arenas, Chile
Punta Arenas is located in a region where the atmosphere is known to be clean and one of the least affected by anthropogenic influences (Hamilton et al., 2014). Nevertheless, events of aerosol long-range transport also occur occasionally (Foth et al.,   Within that flow long-range transport from across the Pacific Ocean occurred.

185
In the Polly XT observations from 20 May 2019 a layer of increased backscatter is present from 2 UTC to roughly 10 UTC.
The airmass soruce estimate is also able to capture this faint aerosol layer. Fig. 12 shows, that airmasses form 'Australia' were present between 3 and 9 UTC from 3 to 6 km height. In terms of land cover class these airmasses were characterized by savanna/shrubland and grass. Apart from the described period, the airmasses were solely influenced by the Southern Ocean (i.e. the water class). FLEXPART simulations (Fig. 12 b, d) agree with the HYSPLIT results, however the computed temporal 195 extend and the residence times are slightly longer for the latter. Hence, the airmass source scheme is also capable of capturing aerosol transport at hemispheric (i.e. more than 10000 km) scales.

Assessing potential observation biases
Vertically resolved aerosol statistics are prone to observations biases, as they usually depend on cloud-free conditions. When clouds or precipitation are present, no aerosol properties can be obtained from optical techniques. However, respective statistics,   from suitable (cloud-free) measurement periods are representative for the full observational period. Chances are given that cloudy conditions are related to certain air masses which would stay unidentified in the lidar-based statistics of aerosol optical properties. One way to assess this bias is to compare the airmass residence time statistics of the full observational period with 205 the one subsampled to the times when aerosol information is available.
Applied to lidar data, the automatically analyzed profiles of particle backscatter at 532 nm from Baars et al. (2016) are used.
In their work, the raw profiles are grouped into 30-minute chunks, are cloud screened, averaged and analyzed by either the Klett or the Raman method, if signal-to-noise ratio is high enough and a reference height could be set. All profiles that pass a