On the spatial variability of the regional aerosol distribution as determined from ceilometers

Measurements of the vertical distribution of aerosol particles are typically only available at selected sites leaving the question of their representativeness for urban and regional scales unanswered. As a contribution to solve this problem we have investigated ceilometer signals from two testbeds in Munich and Berlin, Germany. For each testbed measurements of 24 months from 6 ceilometers were available. This constitutes a unique data set, in particular as the same type of instruments are deployed and the same data evaluation schemes applied. Two parameters are discussed: the mixing layer height (MLH) 5 as an indicator for the vertical distribution and the integrated backscatter as a proxy for the amount of aerosols in the mixing layer. The MLH was determined by the COBOLT algorithm, the integrated backscatter from the Klett (backward and forward) inversion scheme. It was found that the mean difference of the MLH at two sites within a testbed typically only varies by less than 50 m, slightly increasing with the distance of the corresponding sites. Almost 60 % of all intercomparisons agree within ± 100 m. MLHs are typically correlated with R> 0.9 in particular for the Berlin-testbed. With respect to the integrated 10 backscatter the correlation is in the range of 0.7 <R< 0.9. This is expected from the diversity of local aerosol sources within a given testbed. We conclude from our data that the MLH determined from a single ceilometer is applicable for a whole metropolitan area. However, the integrated backscatter of particles within the mixing layer exhibits a variability of 15–25 % suggesting that one ceilometer is not representative, especially if atmospheric processes shall be investigated.

assumption is e.g. applied when AERONET data of the aerosol optical depth are used to validate satellite retrievals (Filonchyk et al., 2019) or numerical models (Balzarini et al., 2015;Curci et al., 2014;Palacios-Pena et al., 2018), when ground based lidar measurements from EARLINET are used to validate Calipso data (Mona et al., 2009), or when for air quality studies the vertical distribution of aerosols measured at one site inside a city is assumed to be valid for the whole municipality (Geiß et al., 2017). In this paper we want to contribute to the topic of the spatial variability of the vertical distribution of aerosol 5 particles. To describe the distribution by a single parameter often the mixing layer height (MLH or z mlh ) is selected -it allows to separate the free troposphere (almost aerosol free unless long range transported plumes are advected) from layers in contact with the surface with high concentrations of particles. MLH is widely used though it is neither precisely defined nor a prognostic variable of numerical models.
Measurements of the vertical distribution of aerosol particles are typically based on lidar remote sensing. Such measurements 10 provide the particle backscatter coefficient β p or the extinction coefficient as a function of height. In the case of so-called ALCs (automated low power lidars and ceilometers) quantitative results are confined to the backscatter coefficient (Wiegner and Geiß, 2012). Procedures for the determination of the MLH have been developed, extensively tested and intercompared (e.g., Eresmaa et al., 2006;Haeffelin et al., 2011;Geiß et al., 2017;de Bruine et al., 2017;Poltera et al., 2017;Kotthaus and Grimmond, 2018a). Most of them are based on backscatter coefficients, attenuated backscatter or range corrected signals. Statistics of 15 diurnal and annual cycles has been published for selected sites, e.g. Vancouver (Van der Kamp et al., 2010), Vienna (Lotteraner and Piringer, 2016), Beijing (Tang et al., 2016), London (Kotthaus and Grimmond, 2018b), Leipzig (Baars et al., 2008), Dehli (Murthy et al., 2020), or Warsaw (Wang et al., 2020). The observation periods range from 3 months (Dehli) to 10 years (Warsaw).
Our main motivation for this study was our previous work (Geiß et al., 2017) when in the framework of an air quality  Table 1. The distance between all sites is given in Table 2; © Google Earth Pro, 2020.  Table 1. The distance between all sites is given in Table 3; © Google Earth Pro, 2020.
4 https://doi.org/10.5194/acp-2020-332 Preprint. Discussion started: 5 May 2020 c Author(s) 2020. CC BY 4.0 License. Table 1. List of the ceilometer sites of testbed M (upper block) and testbed B (lower block). The geographical coordinates are given in degrees. Altitudes of the ground at the ceilometer's location and the altitude of the measurement platform above the ground (z0) are given in meters.  to Munich international airport) at a rural site. The following two sites are mainly included to investigate potential differences on a regional scale. Augsburg is the third largest city of Bavaria (295,000 inhabitants) and about 50 km west of Munich. The ceilometer M-AU is set up at the local airport approximately 7 km north of the city center. In a somewhat larger distance but 5 east of Munich the village of Mühldorf (20,000 inhabitants) is located with the ceilometer M-MD at a small airport 4 km north of the town.

ID
The corresponding overview of the sites in Berlin (3.5 million inhabitants, 890 km 2 , extension 38 km north-south and 45 km east-west) is given in the lower part of Table 1 and illustrated in Fig. 2. Three ceilometers are installed on platforms as indicated  (Scherer et al., 2019b). B-TU is located in the center of the city whereas B-FU is on the edge of the center slightly outside the Ringbahn (ring of suburban trains in 5 Berlin) on a small hill with an altitude of 20 m relative to its vicinity. B-GR is located in the Grunewald-forest and, similar to B-AH, far outside the Ringbahn. The distances between the sites are given in Table 3. B-PO is operated by the DWD and located south-west of Berlin on a small hill (Telegrafenberg, 94 m) close to the city of Potsdam (180,000 inhabitants). The ceilometer B-LI is located at the Meteorological Observatory of the DWD in Lindenberg (only 850 inhabitants), a rural site approximately 50 km southeast of Berlin. These two stations are included to have areas "outside" Berlin, with B-PO certainly 10 more influenced by the city than Lindenberg.
It is clear that the distribution of sites within each testbed is "as similar as possible" but not identical: Berlin is much more extended than Munich, there is not such a limited city center as in Munich, and the terrain is quite flat.
For both testbeds data from 1 September 2017 to 31 August 2019 are available (24 months). Measurements gaps are short (a few hours or less) and rare, the only significant gaps were for the M-AU ceilometer from 11 September 2017 to 20. November between sunset and the following 1.5 hours. By varying the individual parameters of the three functions, up to 16 combined functions are calculated and the best in terms of missing jumps of the retrieved z cob and the strength of the gradients is selected as the final solution. A lower threshold z min , depends on the overlap characteristics of the ceilometer. If the mixing layer is topped by clouds COBOLT is assuming z cob as the height of the cloud base. Such retrievals are flagged (cloud-flag). In case 5 of rain no z cob is retrieved according to the procedure described above but z cob is interpolated between the last MLH-retrieval before and the first retrieval after the rain event. These retrievals are also flagged (rain-flag).
As a by-product the growth rate of the (convective) mixing layer between 2 hours after sunrise and noon is determined. For this purpose the MLH is smoothed in time and the growth rate is defined as the change of the MLH between the two inflexion points. We use the growth rate to check the reliability of the COBOLT-retrieval: Cases with negative values are interpreted as 10 indication that a very inhomogeneous and rapidly changing aerosol stratification was present that prevents a reliable retrieval.
As a consequence, these days were excluded from further evaluation, such conditions are however very rare (less than 1%).
The overlap correction function can be an issue if it creates "artificial layers". They might erroneously be interpreted as z cob .
Thus, a careful check for artefacts is mandatory. To avoid such artificial layers we have determined a correction of the original overlap correction function, in case of B-LI even as time dependent function. This was done by searching for meteorological 15 situations when vertical mixing is expected to result in a homogeneous distribution of particles within a range reaching or exceeding the nominal range of complete overlap. Such conditions can be found around noon. The corresponding adaptations of the profiles are based on the slope-method as described by Hervo et al. (2016) and were accepted as correction functions provided that no artificial layers showed up when applied to different days. Using this procedure we found that the signals are trustworthy above 220 m and can be used by COBOLT. To facilitate the interpretation of z cob at different sites we choose the 20 z min = 220 m for all instruments; so also biases in the retrieved MLH are avoided.
As the ceilometer output is not synchronized in time and temporal gaps might occur a fixed grid with a temporal resolution of 2 minutes is used for all intercomparisons of the MLH retrievals.

Results
For both testbeds data from two years are available. To investigate the horizontal variability over different distances we first 25 compare co-incident MLHs from different sites (Section 3.2) and determine diurnal cycles of the MLH (Section 3.3). Finally, a quantitative optical property of the aerosols, the integral of the particle backscatter coefficient, is discussed (Section 3.4).
At first, however, we briefly outline the filtering of cloud-contaminated cases. It should be emphasized that we use the same instruments at all sites (CHM15k), the same methodology to retrieve the MLH (COBOLT), and make the same atmospheric assumptions (e.g. lidar ratio) to avoid biases in our intercomparison.

Cloud filtering
As we are focusing on aerosol distributions, cloudfree conditions are preferential. Under the prevailing climatic conditions in Germany this very strict requirement would however lead to a quite limited set of days that can contribute to our study. For 7 https://doi.org/10.5194/acp-2020-332 Preprint. Discussion started: 5 May 2020 c Author(s) 2020. CC BY 4.0 License.
the description of cloudiness we use "cloud occurrence" C, i.e. the fraction of time when a cloud was detected in the field of view of a ceilometer, and "cloud cover" synonymously. If not otherwise stated cloud cover is determined for 24 hours a day.
The cloud cover of all clouds regardless of their altitude is denoted as C all . In contrast, we use C low and C mid if only clouds 5 below 2 km and 4 km, respectively, are considered. That means that for example C low can be zero even if cirrus clouds exist.
As the criterion for the presence of a cloud we choose cbh(1) = −1 (lowermost cloud base height) with cbh provided by the proprietary software of the CHM15k-instruments. For testbed M it was found that the monthly mean C all at different sites agrees within a few percent with very few exceptions. The annual mean is between 41 % (M-AU) and 47 % (M-MD, B-PO).
The lowest cloud cover occurs in summer (predominantly between 30 % and 40 %), whereas between December and January 10 C mid > 60 % leads to unfavorable conditions for the MLH-retrieval.
Consequently our selection criteria were somewhat relaxed: optically thin high altitude clouds well separated from the mixing layer, e.g. cirrus clouds, do not prohibit MLH retrievals by means of COBOLT and thus can be included in the investigation.
The same is true in the case of short showers and broken cloud fields at the top of the mixing layer as long as they are not dominating the aerosols. As these are "soft" criteria we have investigated different options with respect to the duration of showers, 15 the thresholds C trsh of cloud covers C low and C mid , and the treatment of flagged retrievals in the following sections. Only days with fog or long-lasting rain were always excluded. Such days would be useless anyway because of the strong attenuation of the ceilometer signals.
The influence of different selection criteria with respect to cloudiness may briefly be demonstrated by comparing the number of suitable days per month for testbed M: when the quite strict condition "C low < 0.2 at all sites and during 24 hours" has to be 20 fulfilled for consideration, no days are found in December and January, and only 6 or 7 days per month on average for the rest of the year. This number of days is for example not sufficient to determine diurnal or annual cycles. With a relaxed criterion "C low < 0.2 only at the selected site and during the afternoon period" the number of suitable days within two years is in the order of 20, again with lower numbers, i.e. 2−7, for December and January. Note, that the application of the ideal condition "no clouds at all sites for the whole day" is unrealistic as it results in only four days out of two years. 25 The selection criteria must not necessarily be based on the exclusion of days as describe above, it can also be based on the MLH-retrievals available with the 2-minutes resolution provided by COBOLT. By means of the cloud-flag MLH-retrievals can be identified that are not perfect in the above mentioned sense, on the other hand we had the option to investigate how the consideration or non-consideration of cloud formation at the top of the mixing layer influence the statistics of the MLH. Rainflagged retrievals are however always excluded, because the time and duration of rain events are likely different at different 30 sites; moreover, an intercomparison is meaningless as the MLH is only inferred from interpolation (see above).
As "reference configuration" for the subsequent evaluation we define the following setup of selection criteria: we only consider days with C low < 0.2 at the corresponding site to ensure -at least locally -fair weather conditions. Rain-flagged retrievals are omitted, cloud-flagged are considered. Furthermore, days with negative growth rates of the convective MLH are excluded.

Comparison of individual MLH-retrievals
For the intercomparison of the MLH at the different sites we consider 24 hours of measurements per day. Note, that the height of the measurement platform z 0 (see Table 1) is added to the height derived by COBOLT. To facilitate the reading the following generic convention is introduced for two sites i and j: ∆ i,j is the difference "MLH at site i minus MLH at site j". For the sake 5 of brevity we use for i and j the ID as given in Table 1 without the name of the testbed.

Testbed M
Two parameters describing the differences of the mixing layer at all sites of testbed M are summarized in Table 4: above the diagonal the mean MLH-difference ∆ i,j (in km) is shown, below the diagonal the fraction of cases with |∆ i,j | < 100 m, henceforward referred to as F 100m , in percent. The reference configuration is considered. It is found that for all combinations  10 https://doi.org/10.5194/acp-2020-332 Preprint. Discussion started: 5 May 2020 c Author(s) 2020. CC BY 4.0 License.  It is obvious that outliers with |∆ i,j | > 1.01 km, data points beyond the dashed lines, are rare. The linear regression line (red solid line) is close to the 1:1-line (black) revealing, that the same/similar MLH at the two sites can be found for both narrow and deep mixing layers; MLHs larger than 2 km are however not very frequent for testbed M. Most MLHs are below 15 500 m (red color), because nighttime measurements are included. Moreover, during winter narrow mixing layers are dominating. Further parameters derived from the scatter plots are compiled in Table 5. The values a 0 and a 1 of each regression line z mlh,i = a 0 +a 1 z mlh,j with i and j being the column and row, respectively, can be found below the diagonal, whereas Pearson's correlation coefficient R is shown above the diagonal. In 11 out of 15 cases a 0 is less than ±50 m, with the largest, but still small value for the comparison of the two remote sites M-AU and M-MD. The slope is a 1 ≈ 1, again with the biggest deviation 20 for the two remote sites, suggesting that extended mixing layers tend to be wider in Augsburg. The correlation coefficient does not vary much within testbed M. It is R 0.92 for the inner sites of the testbed -only for intercomparisons with Mühldorf it is slightly lower -but still R = 0.88 or larger. Table 6. Settings of the selection criteria with respect to the rain-flag, filtering of unrealistic growth rates and cloud cover for 7 selected configurations # rain-flag growth rate cloud criterion comment To investigate the robustness of these findings and to estimate their uncertainty we have re-calculated the intercomparisons with different configurations, i.e. modified selection criteria. If we do not exclude MLH retrievals with the rain-flag, or if we relax the condition with respect to C low by changing C trsh from 0.2 to 0.3, or if we do not use the growth rate for filtering, the number of cases for the intercomparison increases. If we choose a lower threshold for C low , e.g. C trsh = 0.1 or C trsh = 0, or if 5 the cloud cover-criterion is changed from C low < C trsh to C mid < C trsh , i.e., not only cases with clouds below 2 km but also cases with clouds between 2 km and 4 km are excluded, the number of cases decreases. Explicitly we have investigated several alternative configurations as defined in Table 6: here, the consideration of the rain-flag and negative growth rates is indicated, and the cloudiness criterion.
In the case that rain-flagged MLH retrievals are not excluded (configuration #2) only about 300 additional retrievals con-10 tribute to the statistics; this was expected as rain showers are unlikely under conditions of low cloudiness (here: C trsh = 0.2).
The other configurations however imply a substantial change, e.g. with the "extreme" criterion applying C trsh = 0 (configura- Two examples are explicitly shown. The first example shown in Table 7 is based on the application of the "C mid < 0.2"-15 criterion (configuration #7) and is typical for the other configurations (except #5), so we can omit the corresponding tables here. The second example (Table 8) is based on the strict criterion "C low = 0" and constitutes the "extreme" configuration #5. From comparing the values above the diagonal of Table 4 and Table 7 it can be seen that ∆ i,j typically changes by less than 10 m. The largest effect is found for ∆ TH,MD that decreases from 89 m to 79 m. Changes of F 100m are below 2 percent, again the largest effect is found for the TH-MD intercomparison (from 42.5 % to 44.3 %, see values below the diagonal). For 20 configuration #5 without clouds below 2 km, see Table 8, the differences to the reference configuration #1 are slightly larger, especially ∆ AU,MD changes from 5 m to 26 m . The change of F 100m can be up to almost 5 %, for the intercomparison of M-TH and M-MD this limit is even exceeded, when F 100m increases from 42.5 % to 48.0 %. For all intercomparisons configuration #5 results in larger F 100m when compared to the reference case. The reason is that the stronger cloud filtering reduced the risk of critical MLH-retrievals. We conclude that there is no regular spatial pattern in the changes of ∆ i,j and F 100m with changing configurations. From this and the fact that the definition of a configuration is somewhat arbitrary anyway, we conclude that the reference configuration (Table 4) is well suited for the investigation of the spatial variability of the MLH, and that the uncertainty of ∆ i,j can assumed to be approximately 10 m, and the uncertainty of F 100m approximately 2-3 %.

5
It has been stated at the beginning of this section that we consider MLH-retrievals of the whole day. This includes very shallow boundary layers during night time when stable conditions prevail. As a consequence differences of the MLH at two sites are expected to be small and the correlation large. For this reason, and because convective boundary layers are of special interest, we briefly want to summarize a few key results if only the MLH from the daylight period is compared.  The upper right part of Table 9 shows the mean differences ∆ i,j , the lower left part indicates F 100m . When compared to 10 Table 4 it can be seen that the absolute value of ∆ i,j is (in 9 out of 15 cases) smaller than the 24 hours-evaluation with differences of up to 31 m in case of M-OS/M-AU. With respect to F 100m the effect of the observation period is larger: in all but one case n(∆ i,j ) is considerably broader with a reduction of F 100m between 5 % and 10 %. The reason is that when omitting night time measurements the variability of the developing MLH before noon has a larger influence on the statistics, e.g. if the onset of convection is not the same at the two sites. When comparing core and remote sites additionally the urban heat island 15 effect may become more relevant. Table 10 is the analog to Table 5: it shows the correlation coefficient R and the parameters a 0 and a 1 of the regression line, however, for daytime measurements only. When focusing on the core sites of testbed M, the correlation coefficient is virtually unchanged, whereas for intercomparisons of core and remote sites R is slightly reduced. Nevertheless, for all intercomparisons the MLHs remain highly correlated when the observation period is changed from 24 hours to daylight only. Changes of the parameters a 0 and a 1 are also very small.

5
The investigations described in the previous section for testbed M has also been made for testbed B. The total number of intercomparisons is 109400 < N < 120700, i.e. similar to testbed M. A summary of key parameters is given in Table 11. It can be seen that the absolute value of the mean difference of the MLH is 78 m or less , in many cases (9 out of 15) even below 30 m. So the spatial variability of the MLH is lower than for testbed M. The fact that the absolute values of ∆ i,j and the heights of the buildings are of the same order of magnitude underlines the importance considering the actual altitude of the 10 ceilometer. F 100m is in general somewhat larger than for testbed M. The values vary between 47.3 % and 79.5 %. If alternative configurations as described in Table 6 are applied the robustness of these values is confirmed.
To be consistent with the discussion for testbed M we also show all frequency distributions n(∆ i,j ) and the MLH-scatter plots for the reference configuration in Fig. 4. The general shape of the distributions is similar. The peak of n is in some cases more pronounced as can be seen in the panels above the diagonal, but not always for |∆ i,j | < 0.01 km. It amounts e.g. 20.7 % 15 for B-FU vs. B-TU with the maximum at -0.05 ≤ ∆ FU,TU < -0.03 km, i.e., the mixing layer is less extended at B-FU. This might be a consequence of the topography. Small ∆ i,j can be found for any width of the mixing layer (panels below the     Different symbols denote the different configurations (Table 6), crosses are for configuration #5 with C low < 0, for more details see text.

Dependence on distance
To investigate the dependence of the parameters R and F 100m on the distance of the two corresponding sites we have combined the results from both testbeds. Moreover we distinguish retrievals of two different time periods: either from 24 hours or from the daylight period only. Fig. 5 concerns the correlation coefficient R and shows the results from testbed M (red: 24 hours, blue: daylight period) and testbed B (black: 24 hours, green: daylight period). The different symbols refer to different configurations (Table 6) with the reference configuration marked by squares. The fact that different symbols except crosses actually cannot 5 be distinguished demonstrate the robustness of the correlation to changes of the selection criteria. The crosses correspond to configuration #5, i.e. a very strict requirement with a substantial reduction of cases. Three main conclusions can be drawn: first, the correlation decreases with increasing distance, second, the correlation is larger for testbed B, and third, the differences between the different configurations are very small for testbed B. In case of testbed M the correlation coefficient is more sensitive to the selection of the cases, however, the uncertainty of R is still small. In general the correlation coefficient is   (Lotteraner and Piringer, 2016). For distances in the order of 50 km for both German testbeds R is significantly larger compared to the Chinese sites where 0.6 < R < 0.8 was found, and the measurements in Pasadena (Scarino et al., 2014). However, only short time periods were exploited by Zhu et al. (2018) and Scarino et al. (2014), and the terrain height in California was quite variable.
The fraction of cases with MLH-differences below 100 m, F 100m , as a function of distance is presented in a similar figure   (Fig. 6). For this parameter the results for testbed M and testbed B are quite similar. The main difference is that F 100m is 5 slightly larger for testbed B (black and green symbols) for distances smaller than 35 km. The differences of the low values of F 100m when only daylight observations and large distances are considered (green and blue symbols) are not considered as significant. Again, the variability is larger for testbed M, here it is in the order of the difference between the two testbeds.

Diurnal cycle of the MLH
For the determination of the mean diurnal cycle of the MLH we consider days selected by the criteria defined as the reference 10 configuration, i.e. among others that C low < 0.2 when considering the whole day. Thus, due to the relatively high cloudiness in Germany, only a limited number of days is available. As a consequence, only bi-monthly averages are determined.
The mean diurnal cycle is shown for two examples: for the April/May period it is shown in Fig. 7 as a function of time since sunrise, whereas the August/September average is shown in Fig. 8. The different colors indicate the sites, the solid lines are and not for August/September. For B-LI the differences to sites in central Berlin are even less pronounced so that a distinct effect of the urban heat island cannot be demonstrated.
Whether during the summer months the mixing layer in Berlin is in general more extended than in Munich should be assessed on the basis of a longer series of measurements, and a detailed analysis of meteorological parameters, topography and surface type/land use (e.g. Fallmann et al., 2016). The same is true for the effect of the urban heat island on the MLH. Only a quick overview over relevant meteorological parameters may be provided here in Fig. 9: sunshine duration, temperature,  adequate to monitor the annual evolution. Finally, it should be emphasized, that an intercomparison with other cities would suffer from the fact that publications often do not disclose how the annual cycle has been determined, or time series do not cover several years.

Integrated backscatter
To establish a global overview of the variability of the MLH on an urban or regional scale a general accepted definition of the MLH is mandatory. This is not yet achieved, in particular, as on the one hand there are quite different approaches possible, 5 based on different meteorological principles, on the other hand even for retrievals based on ceilometers no standard procedure is available. As a consequence, intercomparisons of the vertical distribution of aerosols based of their optical or microphysical properties would be a important progress, in particular, as such information could facilitate the comparisons with results from numerical models. As a first step towards this direction we have intercompared the integrated backscatter β int at the sites of each testbed. We define the integrated backscatter as the integral of the particle backscatter coefficient β p from the surface to 10 the MLH z mlh assuming a constant β p in the region of incomplete overlap, see Eq. 1.
According to the previous investigation we set z min =220 m, z 0 is given in Table 1. When β int is multiplied with a mean lidar ratio one gets the aerosol optical depth (AOD) of the mixing layer at 1064 nm. In the absence of elevated aerosol layers (e.g. Saharan dust, occasionally occurring over Munich) this should be close to the AOD as determined by sun photometers.

15
AERONET provides AOD at 1020 nm so interpolation over the small wavelength interval is not an issue in view of the uncertainty of the lidar ratio and the signal calibration.
We aim to determine the particle backscatter coefficient β p for all times when a MLH-intercomparison is provided. In principle, two options for the calculation of β p are available: we can use a backward or a forward inversion algorithm, where the former is conceptually preferential due to a better accuracy. Both approaches with a lidar ratio of 45 sr are applied in the 20 following.

Application of the backward inversion
The temporal resolution of the MLH-retrieval, i.e. 2 minutes, cannot be kept for β p -retrievals when the backward Klettalgorithm (Klett, 1981;Fernald, 1984) is used because the Rayleigh-calibration requires much longer time averaging. Therefore, we reduce the temporal resolution to one hour by defining 47 overlapping time slices that are centered around each half 25 hour, i.e., the first covers the period from 00:00 to 1:00 UTC, the second the period from 00:30 to 01:30 UTC and so on. MLHretrievals from COBOLT and ceilometer signals were averaged accordingly with cloud contaminated signals being excluded.
If less than 25 % of the ceilometer profiles remain, no average is calculated. Furthermore, all cases with z cob < 0.4 km are excluded from the following investigation to avoid a strong contribution of the assumed β p -profile in the overlap region to 22 https://doi.org/10.5194/acp-2020-332 Preprint. Discussion started: 5 May 2020 c Author(s) 2020. CC BY 4.0 License. β int , see first term on the right hand side of Eq. 1. To have an additional criterion to prevent cloud contamination, cases with β p > 0.01 km −1 sr −1 at heights z < z mlh are excluded. As already mentioned, an refined overlap-correction is applied to the ceilometer signals.
As a consequence, the number of intercomparisons of β int is much smaller than the number of MLH-intercomparisons. This is in particular true when M-TH is involved because for this instrument the Rayleigh calibration was not possible in most cases. The reason was the aging of the avalanche photodiode occurring until June 2019 resulting in a distortion of the signal.

5
From dark measurements, i.e. covered telescope, in December 2018 and January 2019 it was found that this distortion was stable in time, however, earlier dark measurements are not available. Correction of the ceilometer signals by means of the dark measurements increase the number of successful Klett inversions, however, the total number remains too low to make an intercomparison reasonable. For intercomparisons not including M-TH the total number is still relatively low and ranges from 124 to 327 (testbed M), and from 106 to 667 (testbed B).

10
The intercomparisons for the two testbeds are shown in Fig. 10. The panels above and below the diagonal refer to testbed M and testbed B, respectively. Cases with β int > 8 · 10 −3 sr −1 at either site or |∆ i,j | > 1 km are considered as unrealistic and thus have been removed. As mentioned above, the panels of the first row, associated with M-TH, should be ignored; they are only included for the sake of completeness. For testbed M a correlation coefficient of 0.488 < R < 0.841 can be found; note that the low R for M-OS/M-MD is based on 124 intercomparisons only. For testbed B it is in general larger with 0.702 < R < 15 0.933. In spite of the moderate to high correlation the mean difference D i,j (in percent) of the integrated backscatter at the two correlated sites i and j (with k being the counter of the N cases) is quite large: for testbed M D i,j is between 27 and 53 % (median: 32 %). Note, that in the denominator the mean of both sites is used, because the integrated backscatter is not the same. Within the testbed the averages β int differ between 3.9 ·10 −4 sr −1 at M-WS and 5.4 ·10 −4 sr −1 at M-OS, revealing 20 that the amount of aerosols (expressed in terms of backscattering) is much more variable than the MLH. For testbed B we find 14 < D i,j < 44% (median: 33 %), and 4.1 ·10 −4 sr −1 < β int <5.9 ·10 −4 sr −1 . Again, the variability within the testbed is quite large. The main reason is the variability of the local aerosol sources with respect to the emission strength and the particle type. This has been shown by e.g. von Schneidemesser et al. (2018) who reported large differences of emissions within Berlin.
These differences certainly influence the aerosol content of the mixing layer, though due to the time required for then vertical 25 mixing an immediate effect of the ground based sources on the vertical column is not expected. If we strengthen the above mentioned criteria with respect to β int or |∆ i,j | the number of cases is reduced but the correlation coefficient increases with the second criterion being much more effective. As an example, the correlation increases to 0.781 < R < 0.943 for testbed B, if data with β int > 4 · 10 −3 sr −1 at either site or |∆ i,j | > 0.2 km are excluded.

Application of the forward inversion 30
The main drawback of the backward inversion is the limited number of cases. Thus, we also use an alternative approach to calculate β int by applying the "forward Klett inversion", i.e. instead of the determination of a boundary value of β p at a reference height (Rayleigh calibration) one relies on the lidar constant C L (aka calibration constant). However, it has been found that C L of most of the commercial ceilometers is not constant, consequently adding another error source to the retrieval.
In principle, the lidar constant as a function of time can be determined from fitting C L from the available Rayleigh-calibrations.

5
The advantage of the forward integration is, that β p and consequently β int can be calculated for any time when the MLH can be retrieved, even when mid-level clouds are present or the signal-to-noise-ratio of the ceilometer signal is too low to allow a backward inversion. The disadvantage is that the accuracy of the fitted C L is limited, furthermore, the forward inversion includes the assumption that the atmospheric transmission below z ovl is close to 1. For the wavelength of 1064 nm and low overlap ranges this assumption is justified (Wiegner and Gasteiger, 2015).

10
An example can be found in Fig. 11  β int is typically less correlated than the MLH (see Table 5 and Table 12 for comparison). This is caused by the uncertainty of the calibration, in particular when applying the forward Klett inversion, and the fact that the intercomparison of the integrated 15 backscatter is not only affected by differences of the MLH but also by differences of β p at the two sites. As already been demonstrated by means of the AOD over Berlin , differences of β p are likely because of the spatiotemporal variability of aerosol sources and dispersion pattern triggered by the changing wind field. Therefore, the results might be influenced by a number of extreme values. Their impact has been briefly estimated by introducing various criteria to eliminate unrealistic intercomparisons: one criterion concerns the maximum acceptable difference of the MLH at the two sites, another 20 the maximum difference of β int . If we gradually decrease the first criterion from |∆ i,j | < 1.0 km (see Fig. 12   To demonstrate the variability within a testbed we have also calculated averages β int ; here we consider all measurements used in any of the intercomparisons. The average for the sites of testbed B is 4.5 ·10 −4 sr −1 < β int < 7.4 ·10 −4 sr −1 (see Table 13, second line). These values are higher than the previous results derived for the 1-hour resolution (last line), in 5 particular, large differences occur if individual sites are compared. The main reason is that, due to the better temporal coverage and resolution, the samples (backward/forward inversion) of the intercomparisons are quite different. As can be seen in the right part of Table 13, the number of cases is more than one order of magnitude lower in the "backward"-case. On the other hand, the uncertainty of C L of 15 % results in an uncertainty of β int of approximately 18 %: As an example the results in the first line (underestimate of C L by 15 %) and the third line (overestimate by 15 %) may be compared to the second line where 10 C L from the fitting curve is used. This calibration error is larger than that for the backward inversion that can be as low as 4 % (Wiegner and Geiß, 2012) whereas the additional uncertainty due to the unknown lidar ratio affects both approaches in the same way. Incidentally, an increase of C L is associated with a decrease of β p , thus the number of cases slightly increases as potentially less cases are excluded due to the β int -threshold. The table also shows that the influence of different |∆ i,j |-criteria is in the order of 1-3 % only. When β int is averaged over the whole testbed, we find 5.9 ·10 −4 sr −1 for testbed B and 5.0 ·10 −4 sr −1 for testbed M, corresponding to a 19 % larger value in Berlin. This is expected taking into account the more extended mixing layer in summer.
With a lidar ratio in the range of 38 to 53 sr, calculated with the online tool MOPSMAP  for 20 continental and urban aerosol types this corresponds to a relatively low, but not implausible annual mean AOD of the mixing layer in the range of 0.019 < τ p < 0.031 at 1064 nm.
Finally, we include scatter plots (Fig. 13) and the correlation coefficients of the intercomparison of the mean particle backscatter coefficient β p,mlh averaged over the vertical extent of the mixing layer at the corresponding site. It can be seen that for testbed B the correlation coefficient is 0.725 < R < 0.945, i.e. an increase compared to β int (see Fig. 12). The low values are found for correlations with the remote site at Lindenberg. On the contrary the correlation of β p,mlh decreases for testbed M. This suggests that the height of the mixing layer has a stronger influence on the differences of β int in Berlin than in Munich.

5
In case of β p,mlh we have also determined the relative deviation E i,j for sites i and j, analogously to Eq. (3). For testbed M we find 13 % < E i,j < 30 % (average: 25 %, median: 26 %), again with the smallest deviation of 13.1 % for the closest sites M-TH/M-HW. This means that for β p,mlh the differences from the regression line are smaller than for β int , though the correlation is slightly lower. In case of testbed B we find 14 % < E i,j < 31 % (average: 22 %, median: 21 %). Here, the differences are also clearly smaller compared to β int . For a deeper analysis of these effects the role of e.g. the topography, the 10 dispersion of particles from local sources and the flow pattern should be considered.

Summary and conclusions
We have investigated the spatial distribution of aerosols in two testbeds -Munich and Berlin, Germany -both of which including 6 ceilometers located inside a large city and at rural sites approximately 50 km outside the center of the city. The aerosol distribution is described by means of two quantities that can be inferred from active remote sensing: the mixing layer height as 15 a qualitative measure and the (particle) integrated backscatter as a quantitative measure. The first parameter can be considered as a simple description of the vertical distribution, the second as a proxy for the aerosol optical depth. Data from 24 months are available. They constitute a unique data set: the first time continuous range resolved aerosol information at several points in a regional domain could be exploited. As all instruments are of the same type, and the same methodology to determine the MLH is applied unbiased results are provided. With respect to the integrated backscatter we use the backward and the forward 20 inversion scheme: though the former provides a better accuracy of the signal calibration we feel that the latter is preferential as it provides a much better temporal coverage and resolution. In either case a careful overlap correction is mandatory, and consideration of the height of the measurement platform is required. With respect to the exploitation of ceilometer data efforts should focusing on improvements of automated procedures for calibration and filtering of cloud-contaminated cases should be encouraged. 25 It was found that the MLH within each testbed is quite similar: typically the mean difference between two sites is below 50 m and in 50% -60% of the cases the difference of the MLH at two sites is within ±100 m. The correlation coefficient and the fraction of cases within ±100 m slightly decrease with distance between the corresponding sites. However, a significantly higher MLH over the city compared to sites about 50 km outside is not found. The integrated backscatter is found to be highly and moderately correlated within the Berlin and Munich testbed, respectively, however with mean differences of approximately absolute values may depend on the selection criteria. This is an inherent property of the problem that cannot be avoided: in any intercomparison study the selection of suitable "fair weather"-cases is necessarily somewhat arbitrary -we have tried to mitigate this problem by testing several configurations to find a robust set of criteria. Nevertheless there is certainly not one single set of criteria that can be considered as "absolute truth".
As a by-product we found that in the summer months the mixing layer in Berlin is significantly more extended than in Munich.

5
In view of our primary question, the representativeness of the spatial distribution of aerosols derived from data of a single ceilometer for a metropolitan area like Berlin or Munich, we conclude that the MLH can be considered as "homogeneous".
On the contrary, β int is "less homogeneous" due to the spatiotemporal heterogeneity of aerosol properties and sources. Consequently, the exploitation of denser networks of particle measurements is required, and the deployment of a single ceilometer is not sufficient to characterize the distribution of integrated optical properties as the integrated backscatter. A similar conclusion 10 was e.g. found with respect to clouds over metropolitan areas (Theeuwes et al., 2019). In future, urban scale resolving models (e.g., Maronga et al., 2020) will help to investigate the role of particle transport on the distribution of aerosols in the mixing layer and thus their heterogeneity. With respect to the exploitation of ceilometer data improvements of automated procedures for their calibration are desirable.

15
Ceilometer data are available on request from the owners.

Author contributions
MW conducted the study and wrote the paper. AG upgraded the COBOLT-algorithm to make a computational economic retrieval of the MLH feasible, and provided several plots. IM provided data of the DWD-ceilometers including quality control.
FM and TR maintained their ceilometers and provided the corresponding data. All authors contributed to the final version of 20 the paper.