Ozone trend profiles in the stratosphere: combining ground-based data over Central Europe to consider uncertainties

Observing stratospheric ozone is essential to assess if the Montreal Protocol has succeeded to save the ozone layer by banning ozone depleting substances. Recent studies have reported positive trends indicating that ozone is recovering in the upper stratosphere at mid-latitudes, but the trend magnitudes differ and uncertainties are still high. Trends and their uncertainties are influenced by factors such as instrumental drifts, sampling patterns, discontinuities, biases, or short-term anomalies that all might mask a potential ozone recovery. The present study investigates how anomalies, temporal measurement sampling rates 5 and trend period lengths influence resulting trends. We present an approach for handling suspicious anomalies in trend estimations to improve the derived trend profiles. The approach was applied to data from a Ground-based Millimetre-wave Ozone Spectrometer (GROMOS) located in Bern, Switzerland. We compare our improved GROMOS trend estimate with results from other ground stations (lidars, ozonesondes, and microwave radiometers) in Central Europe. The data indicate positive trends of 1 to 3 % per decade at an altitude of about 40 km (3 hPa), providing a confirmation of ozone recovery in the upper stratosphere 10 in agreement to satellite observations. At lower altitudes, the ground station data show inconsistent trend results, which emphasize the importance of ongoing research on lower stratospheric ozone trends. Our presented method of a combined analysis of ground station data provides a useful approach to recognize and to reduce uncertainties in stratospheric ozone trends by considering anomalies in the trend estimation. We conclude that stratospheric trend estimations still need improvement and that our approach provides a tool that can also be useful for other data sets. 15

(FFTS) with an integration time of 30 min has been used. An overlap measurement period of almost 2 years (Oct. 2009 to July 2011) was used to homogenize the FB data, by subtracting the mean bias profile of the overlap period (FB mean − FFTS mean ) from all FB profiles (Moreira et al., 2015). These homogenized ozone data are available on the NDACC web page (ftp.cpc.ncep. noaa.gov/ndacc/station/bern/hdf/mwave/). The FFTS retrieval used in the present study (version 2021) uses variable errors in the a priori covariance matrix of around 30 % in the stratosphere and 70 % in the mesosphere, and a constant measurement error 5 of 0.8 K (Moreira, 2017). The retrieved profiles have a vertical resolution of ∼ 15 to 25 km in the stratosphere. We concentrate in this study on the middle and upper stratosphere between 31 hPa (≈ 24 km) and 0.8 hPa (≈ 49 km), where the retrieved ozone is quasi independent of the a priori information. This is assured by limiting the altitude range to the altitudes where the area of the averaging kernels (measurement response) is larger than 0.8, which means that more than 80 % of the information comes from the observation instead the a priori data (Rodgers, 2000). More information about the homogenization as well 10 as the parameters used in the retrieval can be found in Moreira et al. (2015). Besides the described data harmonization to account for the instrument upgrade, we performed some additional data correction. Because the stratospheric signal is weak in an opaque and humid atmosphere, we discarded measurements when the atmospheric transmittance was smaller than 0.3.
Excluding measurements in such a way should not result in a sampling bias because tropospheric humidity is uncorrelated to stratospheric ozone. Also, the data have been corrected for outliers at each pressure level by removing values that exceed 15 four times the standard deviation within a 3-days moving window. Profiles were excluded completely when more than 50 % of their values were missing (e.g. due to outlier detection). Furthermore, we omitted profiles where the instrumental system temperature showed outliers, identified by an exceed of four times the standard deviation within a 30-days moving window.
The second microwave radiometer used in this study is the Stratospheric Ozone Monitoring Radiometer (SOMORA). It was built in 2000 as an update of the GROMOS radiometer and is located at the aerological station (MeteoSwiss) in Payerne 20 (46.8 • N, 7.0 • E) at 491 m a.s.l. since 2002. Some instrumental changes were performed in the beginning of 2005 (front-end change) and in October 2010, when the acousto-optical spectrometer of SOMORA has been upgraded to an FFTS (Maillard Barras et al., 2015). The data have been harmonized to account for the spectrometer change. The instrument covers an altitude range from 25 to 60 km with a temporal resolution of 30 min to 1 h. In this study, we consider SOMORA data at an altitude range between 18 hPa (≈ 27 km) and 0.8 hPa (≈ 49 km). For more information about SOMORA, refer to Calisesi (2003) 25 concerning the instrumental setup and Maillard Barras et al. (2009Barras et al. ( , 2015 concerning the operational version of the ozone retrievals used in the present study.

Lidars
We use data from two differential absorption lidar (DIAL, Schotland (1974)) instruments in Germany and France. The instruments emit laser pulses at two different wavelengths, where one is absorbed by ozone molecules and the other is not. 30 Comparing the backscattered signal at these two wavelengths provides information on the vertical ozone distribution in the atmosphere.
The lidar at the Meteorological Observatory Hohenpeissenberg (MOH,47.8 • N,11.0 • E, 980 m a.s.l.) in Germany has been operating since 1987, emitting laser pulses at 308 and 353 nm (Werner et al., 1983;Steinbrecht et al., 2009b). The lidar can only retrieve ozone profiles during clear-sky nights due to scattering on cloud particles and the influence of sunlight. On average, it retrieves eight night-profiles in a month. In this study we limit the data to the altitude range where the averaged measurement error is below 10 % (below 43 km or 2 hPa). The lower altitude limit was set to the chosen limit of GROMOS at 31 hPa (≈ 24 km).
The Observatory of Haute Provence (OHP, 43.9 • N, 5.7 • E, 650 m a.s.l.) in France hosts a lidar that measures in its current 5 setup since the end of 1993 (Godin-Beekmann et al., 2003). The lidar emits laser pulses at 308 nm and 355 nm, as first described by Godin et al. (1989). The instrument measures on average 11 profiles per month. We use OHP lidar profiles below 40 km (∼ 3 hPa), where the averaged measurement error remains below 10 %. As lower altitude limit we use 31 hPa (≈ 24 km) to be consistent with the GROMOS limits.
More detailed information about the lidars and ozonesonde used can be found for example in Godin et al. (1999) and Nair 10 et al. (2011, 2012).

Ozonesondes
The three mentioned observatories at Payerne, MOH and OHP also provide weekly ozonesonde measurements. The ozonesonde measurements in Payerne are usually performed three times a week at 11:00 UTC (Jeannet et al., 2007), resulting in 13 profiles per month on average. The meteorological balloon carried a Brewer Mast sonde (BM, Brewer and Milford (1960)) until 15 September 2002, which was then replaced by an electrochemical concentration cell (ECC, Komhyr (1969)). The profiles are normalised using total column ozone from the Dobson spectrometer at Arosa (46.77 • N,9.7 • E, 1850 m a.s.l., Favaro et al. (2002)). If those data are not available, forecast ozone column estimates based on GOME-2 (Global Ozone Monitoring Experiment-2) data are used (http://www.temis.nl/uvradiation/nrt/uvindex.php).
Ozone soundings at MOH are performed two to three times per week with a BM sonde (on average 10 profiles per month). 20 Three different radiosonde types have been used since 1995, all carrying a BM ozonesonde, without major changes in its performance since 1974 (Steinbrecht et al., 2016). The profiles are normalized by on-site Dobson or Brewer spectrophotometers and, if not available, by satellite data (Steinbrecht et al., 2016).
At OHP, ECC ozonesondes have been used since 1991 with several instrumental changes (Gaudel et al., 2015). The data are normalized with total column ozone measured by a Dobson spectrophotometer until 2007 and ultraviolet-visible SAOZ 25 (Système d'Analyse par Observation Zénithale) spectrometer afterwards (Guirlet et al., 2000;Nair et al., 2011). In our analysed period, four profiles are on average available per month.
Ozonesonde data are limited to altitudes up to ∼ 30 km, where the balloon usually bursts. Therefore, we used ozonesonde profiles only below 30 km, which is a threshold value for Brewer Mast ozonesondes with precision and accuracy below ±5 % (Smit and Kley, 1996). For normalization, the correction factor (CF), which is the ratio of total column ozone from the reference 30 instrument to the total ozone from the sonde, has been applied to all ozonesonde profiles. At all measurement stations, we discarded profiles when their CF was larger than 1.2 or smaller than 0.8 (Harris et al., 1998;Smit and ASOPOS Panel, 2013).
We further excluded negative values and profiles with extreme jumps or constant ozone values, as well as profiles with constant or decreasing altitude values.

Aura/MLS
The microwave limb sounder (MLS) on the Aura satellite, which was launched in mid 2004, measures microwave emission from the Earth in five broad spectral bands (Parkinson et al., 2006). It provides profiles of different molecules in the atmosphere, including stratospheric ozone, with a vertical resolution of ∼ 3 km. We used ozone data from Aura/MLS version 4.2 above Bern with a spatial coincidence of ±1 • latitude and ±8 • longitude, where the satellite passes twice a day (around 02:00 and 5 13:00 UTC). More information about the MLS instrument and the data product can be found, e.g., in Waters et al. (2006). We chose the Aura/MLS data for our study because it was identified to be stable with insignificant drifts between 20 and 40 km (Hubert et al., 2016).

Data comparison
To verify the GROMOS data and detect possible anomalies, we compared all described data sets with the GROMOS data 10 from Bern in the time period Jan. 1995 to Dec. 2017. The different instrument sites are all located in Central Europe. Payerne is located 40 km south-west of Bern, which ensures comparable stratospheric measurements. Hohenpeissenberg is located 290 km north-east of Bern, and the OHP lies 360 km south-west of Bern. Even if stratospheric trace gases generally show small horizontal variability, the distance between the different stations, especially between MOH and OHP, may lead to some differences in measured ozone.

Data processing for comparison
The GROMOS retrieval is accurate between 31 and 0.8 hPa. We therefore limit all instrument data in this study to this altitude range, and divide it in three parts. For convenience, they will be referred to as lower stratosphere between 31 and 13 hPa (≈ 24 to 29 km), middle stratosphere between 13 and 3 hPa (≈ 29 to 39 km) and upper stratosphere between 3 and 0.8 hPa (≈ 39 to 49 km). The limits for the upper stratosphere agree with the common definition (e.g. Ramaswamy et al., 2001), whereas the 20 here defined lower and middle stratosphere are usually referred to as middle stratosphere in other studies.
Most of the instruments provide volume mixing ratios (VMR) of ozone in parts per million (ppm). In case that the data were given in number density (molecules cm −3 ), as for example for the OHP ozonesonde, the VMR was calculated with the air pressure and temperature provided by the same instrument or collocated radiosonde data. Data that can not be physically explained such as negative ozone values have been excluded for all instruments. 25 The GROMOS, SOMORA and Aura/MLS profiles have a constant pressure grid, which is not the case for the lidar and ozonesonde data. We therefore interpolated all lidar and ozonesonde profiles to a pressure grid based on a regular spaced altitude grid of 100 m for the ozonesonde at OHP and Payerne, and 300 m for the lidars and the ozonesonde at MOH. The mean profile of the interpolated pressure data then built the new pressure grid for the ozone data. These interpolated lidar and ozonesonde data are used for the trend estimations. For the direct comparison with GROMOS, the data were adapted to the 30 GROMOS grid, which is described in the next section (Sect. 2.2.2). Our figures generally show both pressure and geometric 6 Atmos. Chem. Phys. Discuss., https://doi.org /10.5194/acp-2018-1213 Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 4 December 2018 c Author(s) 2018. CC BY 4.0 License. altitude. The geometric altitude is approximated by the mean altitude grid from GROMOS, which is for each retrieved profile determined from operational model data of the European Centre for Medium-Range Weather Forecasts (ECMWF).

Direct comparison with GROMOS
The vertical resolution of GROMOS is usually coarser than for the other instruments, except for SOMORA. When comparing their profiles directly with GROMOS profiles, the different vertical resolution of the instruments has to be considered. 5 Smoothing the profiles of the different instruments by GROMOS' averaging kernels makes it possible to compare the profiles with GROMOS without biases due to resolution or a priori information (Tsou et al., 1995). The profiles having higher vertical resolution than GROMOS were interpolated to the GROMOS pressure grid, and were then convolved by the averaging kernel matrix according to Connor et al. (1991), with 10 where x conv is the resulting convolved profile, x a is the a priori profile used in the GROMOS retrieval, x h is the profile of the higher resolved instrument, interpolated to the GROMOS grid, and AVK is the corresponding averaging kernel matrix from GROMOS. When x h covered a smaller altitude range than GROMOS, the profile x h was extended by the a priori profile of GROMOS to cover the same altitude range. After the convolution, these additional values were removed again to obtain the original altitude range of x h . The SOMORA profiles have a similar vertical resolution as profiles from GROMOS and were 15 thus not convolved, because this would require a more advanced comparison method as proposed by Rodgers and Connor (2003) or Calisesi et al. (2005). GROMOS has a higher temporal resolution than the other instruments (except SOMORA).
For SOMORA, only profiles coincident in time with GROMOS have been selected. For the other instruments, a mean of GROMOS data at the time of the corresponding measurement was used, with a time coincidence of ± 30min. Only for the lidars, GROMOS data were averaged over the whole lidar measurement time (usually one night). 20 For direct comparison with GROMOS we computed relative differences between the monthly mean values of the different data sets and the mean of the coincident GROMOS profiles. The relative differences have been calculated by subtracting each monthly ozone value of the data set X from the corresponding GROMOS monthly mean (GR), using the GROMOS monthly mean of the corresponding month as a reference ((GR−X)/GR·100). Based on these relative differences we identified periods where GROMOS showed biased values compared to the other instruments. To identify these anomalies we used a corrected 25 relative difference to GROMOS. For this the relative difference time series for each instrument was corrected by its mean relative difference to GROMOS. This made it possible to ignore a potential constant offset of the instruments and to concentrate on periods with temporally large differences to GROMOS. When this corrected relative difference was larger than 10 % for at least three instruments, the respective month was identified as anomaly. Above 2 hPa, where only SOMORA and Aura/MLS data are available, both data sets need to have a corrected relative difference larger than 10 % to be identified as anomaly.

Trend model
To estimate stratospheric ozone trends we applied the multilinear parametric trend model from von Clarmann et al. (2010) to the monthly means of all individual data sets. The model fits the following regression function: where t is the monthly time vector, y(t) represents the estimated ozone time series, and a to h are coefficients that are fitted in the trend model. QBO 30  (MEI), that combines six meteorological variables measured over the tropical Pacific (Wolter and Timlin, 1998). The F10.7 data and the MEI data are available via https://www.esrl.noaa.gov/psd/data/climateindices/list/. In addition to the described natural oscillations, we used four periodic oscillations with different wavelengths l n to account for annual (l 1 = 12 months) and semiannual (l 2 = 6 months) oscillation as well as two further overtones of the annual cycle (l 3 = 4 months and l 4 = 3 months).
The strength of the model used is that it can consider inhomogeneities in data sets, by considering a full error covariance 15 matrix (S y ) when reducing the cost function (χ 2 ). Inhomogeneous data can originate from changes in the measurement system (e.g. changes in calibration standards or merging of data sets with different instrumental modes), or from irregularities in spatial or temporal sampling. Such inhomogeneities lead to groups of temporal correlated data, that can, if not considered, change significance and slope of the estimated trend (von Clarmann et al., 2010). Von Clarmann et al. present an approach to consider such inhomogeneities in the trend analysis. They divide the data into multiple subsets which are assumed to be biased 20 with respect to each other and which are characterized by diagonal blocks in the data covariance matrix. Another reason for inhomogeneities might be unknown instrumental issues that lead to anomalies, which is treated in the present study.
To consider inhomogeneities, the total uncertainty of the data set is represented by a full error covariance matrix S y at each pressure level. The diagonal elements of S y are set to the monthly means of the measurement uncertainties for each instrument. For lidar data this is on average 4 % for the OHP lidar and 6 % at MOH between ∼ 20 and 40 km, with lowest 25 uncertainties in the middle stratosphere. For the ozonesonde, the uncertainty is assumed to be 5 % for all ECC sondes and 10 % for BM sondes (Harris et al., 1998;Smit and ASOPOS Panel, 2013). For SOMORA we use uncertainties composed of smoothing and observational error, ranging from 1 to 2 % in the middle stratosphere and 2 to 7 % in the upper stratosphere.
The used Aura/MLS uncertainties range between 2 and 5 % throughout the stratosphere. For the GROMOS trends, we use  Moreira et al. (2015), composed of the standard error (σ/ √ DOF with standard deviation σ and degrees of freedom DOF) of the monthly means, an instrumental uncertainty (measurement noise), and an estimated systematic instrument uncertainty obtained from cross-comparison. The resulting uncertainty values are approximately 5 % in the middle stratosphere, 6 to 8 % in the lower stratosphere and 6 to 7 % in the upper stratosphere. The off-diagonal elements of S y are first set to zero, assuming no error correlation. In a second step, autocorrelation coefficients are inferred from residuals 5 of the initial trend fit. These are used to build another covariance matrix which is added to the data error covariance matrix.
The mean variance of this additional covariance matrix is adjusted such that the χ 2 divided by the degrees of freedom of the trend fit with the enhanced covariance matrix is unity. This additional covariance matrix represents contributions to the fit residuals which are not caused by data errors but by phenomena that are not represented by the trend model. This second trend fit is only performed if the initial normalized χ 2 is larger than unity, which is not the case if the assumed data errors 10 are larger than the fit residuals (Stiller et al., 2012). The more sophisticated uncertainty estimates that we use for GROMOS are larger than the residuals in the first regression fit (χ 2 < 1), which means that the correlated residuals are not considered for the GROMOS trend. For all the other instruments, however, correlated residuals are considered, because we use simple instrumental uncertainties that are usually smaller than the fitted residuals.
To account for anomalies in the time series when estimating the trends we adapted S y in two steps. First, we increased 15 the uncertainties for months where anomalies were identified. For this purpose, we set the diagonal elements of S y for the respective months to a value obtained from the mean difference to all instruments and the mean GROMOS ozone value at each altitude. Assuming for example that GROMOS deviates from all instruments on average by 10 % in August 2010 at 10 hPa, and the overall mean August value at this altitude would be 7 ppm. In this case we would assume an uncertainty of 0.7 ppm for this biased month at this altitude level. In a second step, we used the ability of the trend programme to account for biases in a data 20 subset. For this, a fully correlated block composed of the squared estimated bias uncertainties is added to the corresponding part of the covariance matrix. Considering the bias in this way is mathematically equivalent to treating the bias as an additional fit variable that is fitted to the regression model with an optimal estimation method, as shown by von Clarmann et al. (2001).
The value chosen for the bias uncertainty determines how much the bias is estimated from the data itself. For low values, the bias will be close to the a priori value, which would be a bias of zero, for high values it will be estimated completely from 25 the given data. The bias can thus be estimated from the data itself, which makes the method more robust because it does not depend on an a priori choice of the bias (Stiller et al., 2012). In a sensitivity test we have found that assuming a value of 5 ppm for the correlated block permits a reliable fit, where the bias is fitted independently of the a priori zero bias.
Our ozone trend estimates always start in Jan. 1997, which is the most likely turning point for ozone recovery due to the its absolute value exceeds twice its uncertainty, as described by Tiao et al. (1990). This statistically inferred confidence interval is based on the assumed instrumental uncertainties. Unknown drifts of the data sets, however, are not considered in this claim.

Time series comparison
To verify the GROMOS data and identify potential anomalies, we first have a look at the GROMOS time series (Sect. 3.1) and compare then the data with instrument data in Central Europe (Sect. 3.2).

GROMOS time series
The monthly means of the GROMOS time series ( Fig. 1 (a)) clearly depict the maximum of ozone VMR between 10 hPa in November 1997 because of an instrumental upgrade, leading to a higher monthly mean value than usual. Besides this we did not detect any systematic instrumental issues in the GROMOS data that could explain the anomalies. Therefore, we compare the GROMOS data with the presented ground-based instruments from Central Europe, as well as with Aura/MLS data, to check 20 whether the observed anomalies are due to natural variability or due to unexplained instrumental issues.

Comparison of different data sets
We compared GROMOS with ground-based and Aura/MLS data and averaged it over three altitude ranges (Fig. 2). The different data sets have been smoothed with the averaging kernels of GROMOS to make a direct comparison possible, as described in Sect. 2.2.2. Due to the similar vertical resolution of GROMOS and SOMORA, the SOMORA profiles have not 25 been smoothed by GROMOS' averaging kernels, despite differences between their a priori data and averaging kernels. This might lead to larger differences between GROMOS and SOMORA than between GROMOS and the other instruments. To avoid that an instrument does not cover the full range of one of the three altitude classes, all ozonesonde data have been cut at 30 km (≈ 11.5 hPa), all lidar data at 3 hPa (≈ 40 km), and all SOMORA data below 13 hPa (≈ 29 km) for this analysis.
The different instrument time series shown in Fig. 2 contain only data that are coincident with GROMOS measurements as 30 10 The different data sets agree well, showing, however, periods where some instruments deviate more from GROMOS than others. In the upper stratosphere ( Fig. 2 (a)), GROMOS and SOMORA agree well most of the time, but GROMOS reports slightly lower ozone than SOMORA and also lower values than Aura/MLS. In the middle stratosphere ( Fig. 2 (b)), both lidars Differences between the data sets in the lower stratosphere can be better seen in Fig. 3. Shown are the monthly relative differences of time coincident pairs of GROMOS (GR) and the convolved data set X, with GROMOS data as reference value To summarize our comparison results, we observed some periods with anomalies compared to GROMOS' climatology 25 (Fig. 1). Some of these anomalies were also observed when comparing GROMOS to the different data sets (Fig. 2, Fig. 3, and series by comparing ground-based data sets to several satellite products (see also Petropavlovskikh et al. (2018)). Some of our detected biased periods were also found by Steinbrecht et al. (2009a) who compared different ground-based instruments, for 5 example the GROMOS anomaly in 2006. They attribute the observed biases to sampling differences, but also to irregularities in some data sets. In fact, our results confirm irregularities in the GROMOS time series, which are probably due to instrumental issues of GROMOS.

Ozone trend estimations
By comparing the GROMOS data to the other instruments, we have confirmed some biased periods in the GROMOS time 10 series. To improve the GROMOS trend estimates, we use now these detected anomalies by considering them in the regression fit (Sect. 4.2). In the following section, this method will first be applied to an artificial time series to test and illustrate the approach.

Artificial time series analysis
The trend programme from von Clarmann et al. (2010) can handle uncertainties in a flexible way, which makes it possible to 15 account for anomalies in a time series. To investigate how anomalies can be best considered in the trend programme, we test the programme with a simple artificial time series that consists of monthly ozone values with a seasonal cycle and a linear trend. This artificial time series y(t) is given by with the monthly time vector t, a constant ozone value a = 7 ppm and a trend of 0.1 ppm per decade, with b = 0.1/120 ppm 20 per month. We consider a simple seasonal oscillation with an amplitude A = −1 ppm such that A 2 = g 2 + h 2 (e.g. g = h = √ 0.5 ppm) and a wavelength l n = 12 months. The uncertainty was assumed to be 0.1 ppm for each monthly ozone value, which was considered in the diagonal elements of S y . This artificial time series (shown in Fig. 5 (a)) is later on referred to as case A. The estimated trend for this simple time series corresponds quasi perfectly to the assumed time series' trend (0.1 ppm per decade), which proves that the trend programme works well. The residuals (Fig. 5 (b)) are small (−6 order of magnitudes), 25 but increase towards the time series edges. This might be due to some error propagation, but can be neglected for this study because of its small magnitude.
To investigate how the trend programme reacts to anomalies in the time series, we increased the monthly ozone values by 5 %  (Fig. 6). 30 Assuming that a real time series contains such suspicious anomalies due to for example instrumental issues, they would distort the true trend. To account for these anomalies in the trend estimation, we applied different possible corrections by modifying the error covariance matrix S y in the trend model. The used parameters for the different corrections are summarized in Table 2, and Fig. 6 presents the resulting linear trends. In a first attempt we decreased the weight of the anomalies in the time series by increasing their uncertainties (diagonal elements of S y ) and set the uncertainties of the corresponding months to 5 % of the overall mean ozone value (case C). Afterwards, we added a correlated block to the covariance matrix for the months containing 5 anomalies, and applied the S y once with (case D) and once without (case E) the increased diagonal elements of S y . The bias uncertainty in the correlated block was set to 5 ppm.
The results show that all of the corrected trend estimations (cases C, D, and E) improve the trends and lead to almost the true trend of 0.1 ppm ( Fig. 6 (b)). They are thus able to correct for the artificially added anomalies. Considering a correlated bias block in the S y (case D and E) leads to even better results than only increasing the diagonal elements of S y (case C). 10 Consequently, the trends are improved by correcting the anomalies in the trend estimation, whereas the uncorrected estimate distorts the trend unintentionally. We further found that the trend estimate is closer to the true trend, the higher the uncertainties are chosen (diagonal elements of S y or correlated block in S y ). This can be explained by the fact that the trend analysis allows the bias to vary as in an optimal estimation scheme (von Clarmann et al., 2001). When estimating the bias, the higher the bias uncertainty is, the less confidence is accounted to the a priori knowledge about the bias (that would be a bias of zero), 15 and the bias is then determined directly from the data as if it would be an additionally fitted variable. Based on our test with the artificial time series we conclude that our method succeeds to handle suspicious anomalies in a time series, leading to an estimated trend close to the trend that would be expected without anomalies.

GROMOS trends
The GROMOS time series has been used for trend estimations in Moreira et al. (2015), who found a significant positive trend 20 in the upper stratosphere. In recent years, GROMOS showed some anomalies leading to higher trends than expected in the middle atmosphere, as for example shown by Steinbrecht et al. (2017) or Petropavlovskikh et al. (2018). These higher trends motivated us to improve GROMOS trend estimates by accounting for the observed anomalies. In the following, we present the trend profiles of the GROMOS time series by considering the detected anomalies in the trend programme with the different correction methods that were introduced in Sect. 4.1. In a first step, we estimated the trend without considering inhomogeneities 25 in the data (case I in Fig. 7). The uncertainties of the data that are used as diagonal elements in S y range between 5 and 8 % and are composed as proposed by Moreira et al. (2015) (see Sect 2.3). In a second step, we increased the uncertainties (diagonal elements of S y ) as described in Sect. 2.3 for the months and altitudes that were detected as anomalies, but still assume an uncorrelated error (case II). In a third step, we considered a full correlated block for the periods where anomalies were detected to fit a bias (case III). The bias uncertainty was set to 5 ppm at all altitudes, which ensures that the bias is fully estimated from 30 the data. The diagonal elements of S y stayed the same as in case II.
The trend profiles in Fig. 7 report an uncorrected GROMOS ozone trend (case I) of about 3 % per decade in the middle stratosphere from 30 to 5 hPa, which decreases above to around −4 % per decade at 0.8 hPa. Correcting the trend by increasing the uncertainty for months with anomalies (case II) decreases the trend slightly in the lower stratosphere, but the differences  2017), even if the GROMOS trend peak is observed at slightly lower altitudes. A possible reason for this difference in the trend peak altitude might be related to the averaging kernels of the current GROMOS retrieval version.
We observe a slight shift in the peak height of the averaging kernels, indicating that the information is retrieved from slightly higher altitudes (∼ 2 to 4 km). Furthermore, one has to keep in mind the coarse altitude resolution of GROMOS, which is at this altitude between 20 to 25 km. It is therefore possible that the trend peak altitude does not exactly correspond to the true 15 peak altitude. The instrumental upgrade in 2009 led to a change in the averaging kernels. This change, however, should not influence the trend estimates because the data have been harmonized (see Sect. 2.1.1) and thus corrected for such effects. The harmonization also corrects for possible effects due to changes in the temporal resolution (from 1 h to 30 min).
In the upper stratosphere (above 2 hPa), the GROMOS trend estimates are mostly insignificant but negative. This is probably influenced by the negative trend observed in the mesosphere. A negative ozone trend in the mesosphere is consistent with theory, 20 because increased methane emissions lead to an enhanced ozone loss cycle above 45 km (Pawson et al., 2014;Brasseur and Solomon, 2005). However, this is not further investigated in the present study because of the low mesospheric measurement response in the GROMOS filter bench data (before 2009).
In the middle and lower stratosphere (below 5 hPa) using different anomaly corrections results in largest trend differences, with trends ranging from 2 to 3 % per decade. This result suggests that the GROMOS anomalies mostly affect these altitudes 25 between 30 and 5 hPa. The corrected GROMOS trend (case III) is not significantly different from zero below 17 hPa, and cases I (uncorrected) and II (corrected by S y ) show significant trends. Compared to other studies, the GROMOS trends in the lower stratosphere are slightly higher than trends of most merged satellite data sets, but the corrected GROMOS trend lies within their uncertainty bars.
In summary, correcting the GROMOS trend with our anomaly approach leads to a trend profile of 2 to 2.5 % per decade in the 30 lower and middle stratosphere, which agrees with satellite-based trends from recent studies. The GROMOS trends are almost not affected by anomalies at 4 hPa (≈ 37 km), suggesting a robust ozone recovery of 2.5 % per decade. At lower altitudes, trends are more affected by the detected anomalies, and the corrected trend estimate shows a trend of 2 % per decade. This result implies that ozone might also recover at lower altitudes, but the uncertainties, the dependency on anomalies, and the insignificance of the corrected trend show that this result is less robust.

Influence of temporal sampling on trends
Stratospheric ozone at northern mid-latitudes has a strong seasonal cycle of ∼ 16 %  and a moderate diurnal cycle of 3 to 6 % (Schanz et al., 2014;Studer et al., 2014). The sampling rate of ozone data is thus important for trend estimates, because measurement dependencies on season or time of the day might influence the resulting trends. An important characteristic of microwave radiometers is their measurement continuity, being able to measure during day and night, as well 5 as during almost all weather conditions (except for an opaque atmosphere) and thus during all seasons. Other ground-based instruments such as lidars are temporally more restricted because they typically acquire data during clear-sky nights only.
Clear-sky situations are more frequent during high-pressure events which vary with season and location (Steinbrecht et al., 2006;Hatzaki et al., 2014). The lidar measurements thus do not only depend on the daily cycle, but also on location and season. The seasonal dependency for ozonesondes might be smaller, but they are only launched during daytime.
10 Figure 8 gives an example of how the measurement sampling rate can influence resulting trends. The GROMOS time series is used for these trend estimates, once using only measurements at the time of ozonesonde launches in Payerne, and once only at the time of lidar measurements at MOH. The differences to the trend that uses the complete GROMOS sampling are not significant, but still important, especially at ∼ 40 km and in the lower stratosphere. The lidar sampling leads to higher trends (∼ 3 % per decade) than the normal sampling (∼ 2 % per decade) below 5 hPa. Above 5 hPa, using ozonesonde sampling 15 results in a higher trend than using normal or lidar sampling. Even if ozonesondes are not measuring at those altitudes, the result shows that measuring with an ozonesonde sampling (e.g. only at noon) might influence the trend at these altitudes. Our findings suggest that the time of the measurement (day or night) or the number and the timing of measurements within a month can influence the resulting trend estimates. Results concerning the time dependences of trends based on SOMORA data can be found in Maillard . We conclude that sampling differences have to be kept in mind when comparing trend 20 estimates from instruments with different sampling rates or measurement times.

Influence of time period on trends
An important factor that influences trend estimates is the length and starting year of the trend period. Several studies have shown that the choice of starting or end point affects substantially the resulting trend (Harris et al., 2015;Weber et al., 2018).
Further, the number of years in the trend period are crucial for the trend estimate (Vyushin et al., 2007;Millán et al., 2016). We 25 investigate how the GROMOS trends change when the estimations start in different years. For this, we average the GROMOS trends over three altitude ranges and determine the trend for periods of different lengths, all ending in Dec. 2017 but starting in different years (Fig. 9). As expected, the uncertainties increase with decreasing period length, and trends starting after 2010 are thus not even shown. Consequently, trends become insignificant for short trend periods. In the lower and middle stratosphere, more than 11 years are needed to detect a significant positive trend (at 95 % confidence level) in the GROMOS data, whereas 30 the 23 years considered are not enough to detect a significant trend in the upper stratosphere (above 3 hPa). Weatherhead et al. (2000) and Vyushin et al. (2007) state that at least 20 to 30 years are needed to detect a significant trend at mid-latitudes, but their results apply to total column ozone, which can not directly be compared with our ozone profiles. In general we observe that the magnitude of the trend estimates highly depends on the starting year. Furthermore, the trends start to increase in 1997 (middle stratosphere) or 1998 (lower stratosphere), probably due to the turning point in ODS. The later the trend period starts after this turning point, the higher the trend estimate is, which has also been observed by Harris et al. (2015). Starting the trend for example in the year 2000, as it is done in other studies (e.g. Steinbrecht et al., 2017;Petropavlovskikh et al., 2018), increases the GROMOS trend by almost 2 % per decade compared to the trend that starts in 1997. The trend dependency on 5 the starting year is controversial to the definition of a linear trend, which does not change with time. This suggests that the true trend might not be linear, or that some interannual variations or anomalies are not captured by the trend model. Nevertheless, our findings demonstrate that it is important to consider the starting year and the trend period length when comparing trend estimates from different instruments or different studies.

Trend comparison
Trend profiles for all instruments at the three measurement stations are shown in Fig. 10. The trend profiles agree on a positive trend in the upper stratosphere, whereas they differ at lower altitudes. Due to the given uncertainties, most of the trend profiles are not significantly different from zero at 95 % confidence interval. Only GROMOS, the lidars and the ozonesondes at OHP show significant trends in some parts of the stratosphere (bold lines in Fig. 10). 15 We observe that all instruments that measure in the upper stratosphere show a trend maximum between ∼ 4 and ∼ 1.5 hPa (between ∼ 37 km and 45 km), which indicates that ozone is recovering at these altitudes. The trend maxima range from ∼ 1 to 3 % per decade, which is comparable with recent, mainly satellite-based ozone trends for northern mid-latitudes (e.g. Ball et al., 2017;Frith et al., 2017;Sofieva et al., 2017;Steinbrecht et al., 2017). Only the lidar trend at MOH is larger throughout the whole stratosphere, with ∼ 3 % per decade in the middle stratosphere and 4 to 5 % per decade between 5 and 2 hPa. Nair et al.

20
(2015) observed similar trend results for the MOH lidar, even if they consider five less years with a trend period ranging from 1997 to 2012. Steinbrecht et al. (2006) found that lidar data at MOH and OHP derive from reference satellite data above 35 km drifts might explain our high MOH trends and lower OHP trends. The distance of ∼ 600 km between the MOH and OHP stations might also explain some differences between the lidar trends. The GROMOS trend peaks at slightly lower altitudes than the trends of the other instruments. This difference might be related to the averaging kernels of GROMOS, which indicate that GROMOS retrieves information from higher altitudes than expected (∼ 2 km difference).
In the middle and lower stratosphere, at altitudes below 5 hPa (≈ 36 km), our trends show discrepant results. The microwave 30 radiometers and the MOH lidar report trends of 0 to 3 % per decade, and also the ozonesonde at OHP confirms this positive trend. However, the ozonesondes at MOH and Payerne as well as Aura/MLS and the OHP lidar report a trend of around 0 to −2 % per decade below 5 hPa. Some of these observed trend differences can be explained by instrumental changes or differences in processing algorithms and instrument setup. The discrepancy between ozonesonde and lidar trends at OHP, for example, are possibly due to the change of the pressure-temperature radiosonde manufacturer in 2007, which resulted in a step change in bias between ozonesonde and lidar observations. A thorough harmonization would be necessary to correct the trend for this change. The SOMORA trend shows a positive peak at 30 km, which is probably due to homogenization problems that are corrected in the new retrieval version of SOMORA (Maillard . Furthermore, we have shown that sampling rates and starting years have an important effect on the resulting trend. Our results show that the lidar 5 sampling at MOH leads to a higher trend in the lower stratosphere than using a continuous sampling. The high lidar trend at MOH might therefore partly be explained by the sampling rate of the lidar. The trend period lengths differ between SOMORA, Aura/MLS and the other data sets, which might also partly explain differences in trend estimates. To explain the remaining trend differences in the lower stratosphere, further corrections e.g. for anomalies, instrumental changes or sampling rates would be necessary. In brief, trends in the lower stratosphere are not yet clear. For some instruments, significant positive trends are 10 reported, but for many other instruments trends are negative and mostly not significantly different from zero. This result reflects the currently ongoing discussion about lower stratospheric ozone trends (e.g. Ball et al., 2018;Chipperfield et al., 2018;Stone et al., 2018).
In summary, our ground-based instrument data agree that ozone is recovering around 3 hPa (40 km) with trends ranging between 1 and 3 % per decade for most data sets. In the lower and middle stratosphere between 24 and 37 km (31 and 4 hPa), 15 however, the trends disagree, suggesting that further research is needed to explain the differences between ground-based trends in the lower stratosphere.

Conclusions
Our study emphasizes that natural or instrumental anomalies in a time series affect ground-based stratospheric ozone trends.
We found that the ozone time series from the GROMOS radiometer (Bern, Switzerland) shows some unexplained anomalies. 20 Accounting for these anomalies in the trend estimation can substantially improve the resulting trends. We further compared different ground-based ozone trend profiles and found an agreement on ozone recovery at around 40 km over Central Europe.
At the same time, we observed trend differences ranging between −2 and 3 % per decade at lower altitudes.
We compared the GROMOS time series with data from other ground-based instruments in Central Europe and found that they generally agree within ±10 %. Periods with larger discrepancies have been identified and confirmed to be anomalies in 25 the GROMOS time series. We did not find the origins of these anomalies and assume that they are due to instrumental issues of GROMOS. The identified anomalies have been considered in the GROMOS trend estimations because they can distort the trend. By testing this approach first on a theoretical time series and then with the real GROMOS data, we have shown that identifying anomalies in a time series and considering them in the trend programme makes the resulting trend estimates more accurate. With this method, we propose an approach of advanced trend analysis based on the work of von Clarmann et al. 30 (2010) that may also be applied to other ground-or satellite-based data sets to obtain more consistent trend results.
Comparing the GROMOS trend with other ground-based trends in Central Europe suggests that ozone is recovering in the upper stratosphere between around 4 and 1.5 hPa (≈ 37 km and 45 km). This result confirms recent, mainly satellite-based studies. At other altitudes, we have observed contrasting trend estimates. We have shown that the observed differences can partly be explained by different sampling rates and starting years. Other reasons might be nonconformity in measurement techniques, instrumental systems or processing approaches. Further, the spatial distance between some stations might explain some trend differences, because different air masses can be measured, especially in winter when polar air extends over parts of Europe. Accounting for anomalies in the different data sets as proposed in the present study might be a first step to improve 5 trend estimates. Combined with further corrections, e.g. for sampling rates or instrumental differences, this approach may help to decrease discrepancies between trend estimates from different instruments. In many other studies, the observed trend differences are less apparent because the ground-based data are averaged over latitudinal bands (e.g. WMO, 2014;Harris et al., 2015;Steinbrecht et al., 2017). Nevertheless, it is important to be aware that ground-based trend estimates differ considerably, especially in the lower stratosphere. Exploring the origin of the differences and improving the trend profiles in a similar way 10 as we did for GROMOS may be an important further step on the way to monitor the development of the ozone layer. To summarize, we have shown that anomalies in time series need to be considered when estimating trends. Our ground-based results confirm that ozone is recovering in the upper stratosphere above Central Europe, and emphasize the urgency to further investigate lower stratospheric ozone. The presented approach to improve trend estimates can help in this endeavour.     b Uncertainty considered in the diagonal elements of Sy for months with anomalies.
c Bias uncertainty considered in the off-diagonal elements of Sy for months with anomalies, set as a correlated block.