Ground-based ozone profiles over central Europe: incorporating anomalous observations into the analysis of stratospheric ozone trends

. Observing stratospheric ozone is essential to as-sess whether the Montreal Protocol has succeeded in sav-ing the ozone layer by banning ozone depleting substances. Recent studies have reported positive trends, indicating that ozone is recovering in the upper stratosphere at mid-latitudes, but the trend magnitudes differ, and uncertainties are still high. Trends and their uncertainties are inﬂuenced by factors such as instrumental drifts, sampling patterns, discon-tinuities, biases, or short-term anomalies that may all mask a potential ozone recovery. The present study investigates how anomalies, temporal measurement sampling rates, and trend period lengths inﬂuence resulting trends. We present an approach for handling suspicious anomalies in trend estimations. For this, we analysed multiple ground-based stratospheric ozone records in central Europe to identify anomalous periods in data from the GROund-based Millimetre-wave Ozone Spectrometer (GROMOS) located in Bern, Switzerland. The detected anomalies were then used to estimate ozone trends from the GROMOS time series by considering the anomalous observations in the regression. We compare our improved GROMOS trend estimate with results derived from the other ground-based ozone records (lidars, ozonesondes, and microwave radiometers), that are all part of the Network for the Detection of Atmospheric Composition Change (NDACC). The data indicate positive trends of 1 % decade − 1 to 3 % decade − 1 at an altitude of about 39 km (3 hPa), providing a conﬁrmation of ozone recovery in the upper stratosphere in agreement with satellite observations. At lower altitudes, the ground station data show inconsistent trend results, which emphasize the importance of ongoing research on ozone trends in the lower stratosphere. Our presented method of a combined analysis of ground station data provides a useful approach to recognize and to reduce uncertainties in stratospheric ozone trends by considering anomalies in the trend estimation. We conclude that stratospheric trend estimations still need improvement and that our approach provides a tool that can also be useful for other data sets.

Abstract. Observing stratospheric ozone is essential to assess whether the Montreal Protocol has succeeded in saving the ozone layer by banning ozone depleting substances. Recent studies have reported positive trends, indicating that ozone is recovering in the upper stratosphere at mid-latitudes, but the trend magnitudes differ, and uncertainties are still high. Trends and their uncertainties are influenced by factors such as instrumental drifts, sampling patterns, discontinuities, biases, or short-term anomalies that may all mask a potential ozone recovery. The present study investigates how anomalies, temporal measurement sampling rates, and trend period lengths influence resulting trends. We present an approach for handling suspicious anomalies in trend estimations. For this, we analysed multiple ground-based stratospheric ozone records in central Europe to identify anomalous periods in data from the GROund-based Millimetrewave Ozone Spectrometer (GROMOS) located in Bern, Switzerland. The detected anomalies were then used to estimate ozone trends from the GROMOS time series by considering the anomalous observations in the regression. We compare our improved GROMOS trend estimate with results derived from the other ground-based ozone records (lidars, ozonesondes, and microwave radiometers), that are all part of the Network for the Detection of Atmospheric Composition Change (NDACC). The data indicate positive trends of 1 % decade −1 to 3 % decade −1 at an altitude of about 39 km (3 hPa), providing a confirmation of ozone recovery in the upper stratosphere in agreement with satellite observations. At lower altitudes, the ground station data show inconsistent trend results, which emphasize the importance of ongoing research on ozone trends in the lower stratosphere. Our presented method of a combined analysis of ground station data provides a useful approach to recognize and to reduce uncertainties in stratospheric ozone trends by considering anomalies in the trend estimation. We conclude that stratospheric trend estimations still need improvement and that our approach provides a tool that can also be useful for other data sets.
2016; Kuttippurath and Nair, 2017;Pazmiño et al., 2018;Strahan and Douglass, 2018). Outside of the polar regions, however, differences in ozone recovery are observed depending on altitude and latitude. The question as to whether ozone is recovering in the lower stratosphere is still controversial (Ball et al., 2018;Chipperfield et al., 2018;Stone et al., 2018;Wargan et al., 2018), whereas broad consensus exists that stratospheric ozone has stopped declining in the upper stratosphere since the end of the 1990s (Newchurch et al., 2003;Reinsel et al., 2005;Steinbrecht et al., 2006;Stolarski and Frith, 2006;Zanis et al., 2006;Steinbrecht et al., 2009a;Shepherd et al., 2014;WMO, 2014WMO, , 2018SPARC/IO3C/GAW, 2019). Recently estimated trends for upper stratospheric ozone are positive, but they are still different in magnitude and significance because detecting a small trend is a difficult task. Many factors influence stratospheric ozone such as variations in atmospheric dynamics, solar irradiance, or volcanic aerosols and the increase of greenhouse gases (WMO, 2014). Further, ozone trends might be masked by natural variability.
Other important sources for trend uncertainties are instrumental drifts, abrupt changes, biases, or sampling issues, e.g. due to instrumental differences in sampling patterns or in vertical or temporal resolution. Satellite drifts have been included in trend uncertainties in several studies (e.g. Stolarski and Frith, 2006;Frith et al., 2017). Possible statistical methods to consider abrupt changes in a time series are, for example, presented by Bates et al. (2012). Biases in ozone data sets can lead to important differences in trend estimates, especially when they occur at the beginning or the end of the considered trend period (e.g. Bai et al., 2017). The influence of non-uniform sampling patterns on trends was illustrated by Millán et al. (2016). Also Damadeo et al. (2018) showed that accounting for temporal and spatial sampling biases and diurnal variability changes satellite-based trends.
To account for several of the mentioned factors that influence trend estimates, different approaches were published following the Scientific Assessment of Ozone Depletion of the World Meteorological Organization (WMO) in 2014 (WMO, 2014), with the aim to reduce uncertainties in trend estimates. Drifts in single satellite data sets were, for example, considered in the studies by Eckert et al. (2014) or Bourassa et al. (2018), whereas Sofieva et al. (2017) used only stable satellite products with no or small drifts. The study of Steinbrecht et al. (2017) summarizes recent trend estimates using only updated satellite data sets with small drifts. The drifts were mainly identified by Hubert et al. (2016) and were not considered in the trend estimates by Harris et al. (2015) or WMO (2014). Steinbrecht et al. also used data from a large range of ground stations, but possible biases or anomalies in these ground-based data were not considered. The resulting ground-based trends consequently show some important differences and were not used in their final merged trend profile. Ball et al. (2017) used an advanced trend estimation method that considers steps in satellite time series or biases due to measurement artefacts. Their Bayesian method uses a priori information about the different satellite data sets and results in an optimal merged ozone composite, but it has not yet been applied to ground-based data.
The studies presented above agree on positive ozone trends in the upper stratosphere with some differences in magnitude and show varying trends in the middle and lower stratosphere. This agreement is more difficult to observe in ground-based data sets, in which the data variability is larger due to strong regional variability (Steinbrecht et al., 2017;WMO, 2014). Because of this larger variability, considering instrumental biases or regional anomalies is of special importance for trend estimations derived from ground-based data. In addition to Steinbrecht et al. (2017) and WMO (2014), several other studies presented ground-based trends of stratospheric ozone profiles (e.g. Steinbrecht et al., 2009a;Nair et al., 2013Nair et al., , 2015Harris et al., 2015; SPARC/IO3C/GAW, 2019), but biases in the data sets that might influence the resulting trends have not been considered yet.
The present study proposes an approach to handle the problem of anomalous observations in time series by considering the anomalies when estimating trends. For this purpose, we present the updated data set of the ground-based microwave radiometer GROMOS (GROund-based Millimetrewave Ozone Spectrometer) located in Bern, Switzerland. We determine its trends with a multilinear parametric trend model (von Clarmann et al., 2010) by considering anomalies and uncertainties in the time series, resulting in an improved trend estimate. To identify such anomalies in the GROMOS data set, we compare the GROMOS data with other ground-based data sets (lidars, ozonesondes, and microwave radiometers) in central Europe (Sect. 3). We define anomalies as periods in which the data deviates from the other data sets. Before applying our trend approach to the GROMOS time series (Sect. 4.3), we tested it with an artificial time series (Sect. 4.2). Not only anomalies in a time series influence resulting trends, but also sampling patterns and the choice of the trend period. We therefore present a short analysis of temporal sampling rate and trend period length based on the GROMOS data set (Sect. 4.3.1 and 4.3.2). Finally, we compare the improved GROMOS trend with the trends from the other data sets used (Sect. 4.4).

Ozone data sets
The stratospheric ozone profile data used in the present study come from different ground-based instruments that measure in central Europe (Table 1). They are all part of the Network for the Detection of Atmospheric Composition Change (NDACC, 2019). In addition, we used data from the Microwave Limb Sounder (MLS) on board the Aura satellite (Aura/MLS). All data from the different stations are compared to data from the GROMOS radiometer located in Bern,Switzerland (46.95

Microwave radiometers
We use data from two microwave radiometers, both located in Switzerland. They measure the 142 GHz line where ozone molecules emit microwave radiation due to rotational transitions. The spectral line measured is pressure-broadened and thus contains information about the vertical distribution of ozone molecules. To obtain a vertical ozone profile, the received radiative intensity is compared to the spectrum simulated by the Atmospheric Radiative Transfer Simulator 2 (ARTS2; Eriksson et al., 2011). By using an optimal estimation method according to Rodgers (2000), the best estimate of the vertical profile of ozone volume mixing ratio is then retrieved from the measured spectrum. This is done using the software tool Qpack2, which together with ARTS2 provides an entire retrieval environment (Eriksson et al., 2005). The GROund-based Millimetre-wave Ozone Spectrometer (GROMOS) located in Bern is the main focus of this study (Kämpfer, 1995;Peter, 1997). GROMOS has been measuring ozone spectra continuously since November 1994. Before October 2009, the measurements were performed by means of a filter bench (FB) with an integration time of 1 h. Since October 2009, a fast Fourier transform spectrometer (FFTS) with an integration time of 30 min has been used. An overlap measurement period of almost 2 years (October 2009 to July 2011) was used to homogenize the FB data, by subtracting the mean bias profile averaged over the whole overlap period (FB mean − FFTS mean ) from all FB profiles (Moreira et al., 2015). These homogenized ozone data are available on the NDACC web page (ftp://ftp.cpc.ncep.noaa.gov/ ndacc/station/bern/hdf/mwave/, last access: 28 March 2019). The FFTS retrieval used in the present study (version 2021) uses variable errors in the a priori covariance matrix of around 30 % in the stratosphere and 70 % in the mesosphere and a constant measurement error of 0.8 K (Moreira, 2017). The retrieved profiles have a vertical resolution of ∼ 15 to 25 km in the stratosphere. We concentrate in this study on the middle and upper stratosphere between 31 hPa (≈ 24 km) and 0.8 hPa (≈ 49 km), where the retrieved ozone is quasiindependent of the a priori information. This is assured by limiting the altitude range to the altitudes where the area of the averaging kernels (measurement response) is larger than 0.8, which means that more than 80 % of the information comes from the observation rather than from the a priori data (Rodgers, 2000). More information about the homogenization as well as the parameters used in the retrieval can be found in Moreira et al. (2015). Besides the described data harmonization to account for the instrument upgrade, we performed some additional data corrections. Because the stratospheric signal is weak in an opaque and humid troposphere, we discarded measurements when the atmospheric transmittance was smaller than 0.3. Excluding measurements in such a way should not result in a sampling bias because tropospheric humidity is uncorrelated to stratospheric ozone. Also, the data have been corrected for outliers at each pressure level by removing values that exceed 4 times the standard deviation within a 3-day moving median window. Profiles were excluded completely when more than 50 % of their values were missing (e.g. due to outlier detection). Furthermore, we omitted profiles in which the instrumental system temperature showed outliers exceeding 4 times the standard deviation within a 30-day moving median window.
The second microwave radiometer used in this study is the Stratospheric Ozone MOnitoring RAdiometer (SOMORA). It was built in 2000 as an update of the GROMOS radiometer and has been located in Payerne since 2002. Some instrumental changes were performed at the beginning of 2005 (frontend change) and in October 2010, when the acousto-optical spectrometer of SOMORA was upgraded to an FFTS (Maillard Barras et al., 2015). The data have been harmonized to account for the spectrometer change. The instrument covers an altitude range from 25 to 60 km with a temporal resolution of 30 min to 1 h. In this study, we consider SOMORA data at an altitude range between 18 hPa (≈ 27 km) and 0.8 hPa (≈ 49 km). For more information about SOMORA, refer to Calisesi (2003) concerning the instrumental setup and Maillard Barras et al. (2009Barras et al. ( , 2015 concerning the operational version of the ozone retrievals used in the present study.

Lidars
We use data from two differential absorption lidar (DIAL; Schotland, 1974) instruments in Germany and France. The instruments emit laser pulses at two different wavelengths, one of which is absorbed by ozone molecules and the other which is not. Comparing the backscattered signal at these two wavelengths provides information on the vertical ozone distribution in the atmosphere. The lidars can only retrieve ozone profiles during clear-sky nights due to scattering on cloud particles and the interference with sunlight.
The lidar at the Meteorological Observatory Hohenpeissenberg (MOH) has been operating since 1987, emitting laser pulses at 308 and 353 nm (Werner et al., 1983;Steinbrecht et al., 2009b). On average, it retrieves eight night profiles a month. In this study we limit the data to the altitude range in which the measurement error averaged over the whole study period is below 10 % (below 42 km or 2 hPa). The lower alti- tude limit was set to the chosen limit of GROMOS at 31 hPa (≈ 24 km).
The Observatory of Haute Provence (OHP) operates a lidar that has been measuring in its current setup since the end of 1993 (Godin-Beekmann et al., 2003). The lidar emits laser pulses at 308 nm and 355 nm, as first described by Godin et al. (1989). The instrument measures on average 11 profiles per month. We use OHP lidar profiles below 40 km (≈ 2.7 hPa) for which the averaged measurement error remains below 10 %. As a lower altitude limit, we use 31 hPa (≈ 24 km) to be consistent with the GROMOS limits. More detailed information about the lidars and ozonesondes used can be found, for example, in Godin et al. (1999) and Nair et al. (2011Nair et al. ( , 2012.

Ozonesondes
The three mentioned observatories at Payerne, MOH, and OHP also provide weekly ozonesonde measurements. The ozonesonde measurements in Payerne are usually performed three times a week at 11:00 UTC (Jeannet et al., 2007), resulting in 13 profiles per month on average. The meteorological balloon carried a Brewer Mast sonde (BM; Brewer and Milford, 1960) until September 2002, which was then replaced by an electrochemical concentration cell (ECC; Komhyr, 1969). The profiles are normalized using concurrent total column ozone from the Dobson spectrometer at Arosa, Switzerland (46.77 • N, 9.7 • E;1850 m a.s.l.;Favaro et al., 2002). If the Dobson data are not available, forecast ozone column estimates based on GOME-2 (Global Ozone Monitoring Experiment-2) data are used (http://www.temis. nl/uvradiation/nrt/uvindex.php, last access: 28 March 2019).
Ozone soundings at MOH are performed two to three times per week with a BM sonde (on average 10 profiles per month). Three different radiosonde types have been used since 1995, all carrying a BM ozonesonde, without major changes in its performance since 1974 . The profiles are normalized by on-site Dobson or Brewer spectrophotometers and, if not available, by satellite data .
At OHP, ECC ozonesondes have been used since 1991 with several instrumental changes (Gaudel et al., 2015). The data were normalized with total column ozone measured by a Dobson spectrophotometer until 2007 and an ultraviolet-visible SAOZ (Système d'Analyse par Observation Zénithale) spectrometer afterwards (Guirlet et al., 2000;Nair et al., 2011). In our analysed period, four profiles are available on average per month.
Ozonesonde data are limited to altitudes up to ∼ 30 km, above which the balloon usually bursts. Therefore, we used ozonesonde profiles only below 30 km, which is a threshold value for Brewer Mast ozonesondes with precision and accuracy below ± 5 % (Smit and Kley, 1996). For normalization, the correction factor (CF), which is the ratio of total column ozone from the reference instrument to the total ozone from the sonde, has been applied to all ozonesonde profiles. At all measurement stations, we discarded profiles when their CF was larger than 1.2 or smaller than 0.8 (Harris et al., 1998;Smit and ASOPOS Panel, 2013). We further excluded profiles with extreme jumps or constant ozone values, as well as profiles with constant or decreasing altitude values.

Aura/MLS
The microwave limb sounder (MLS) on the Aura satellite, launched in mid-2004, measures microwave emission from the Earth in five broad spectral bands (Parkinson et al., 2006). It provides profiles of different trace gases in the atmosphere, with a vertical resolution of ∼ 3 km. Stratospheric ozone is retrieved by using the spectral band centred at 240 GHz. We Atmos. Chem. Phys., 19, 4289-4309, 2019 www.atmos-chem-phys.net/19/4289/2019/ used ozone data from Aura/MLS version 4.2 above Bern with a spatial coincidence of ±1 • latitude and ±8 • longitude, where the satellite passes twice a day (around 02:00 and 13:00 UTC). More information about the MLS instrument and the data product can be found, e.g. in Waters et al. (2006). We chose the Aura/MLS data for our study because there are no drifts between 20 and 40 km .

Time series comparison
To identify potential anomalies in the GROMOS data, we compared the data with the other described data sets in the time period from January 1995 to December 2017, except for some instruments that cover a shorter time period (Table 1).

Comparison methodology
To compare GROMOS with the other instruments, the different data sets have been processed to compare consistent quantities and have been smoothed to the GROMOS grid. Taking relative differences between the data sets made it possible to identify anomalous periods in the GROMOS time series.

Data processing
In this study we concentrate on the altitude range between 31 and 0.8 hPa, in which the a priori contribution to GROMOS profiles is low (see Sect. 2.1). We therefore limit all instrument data to this altitude range and divide it into three parts. For convenience, they will be referred to as the lower stratosphere between 31 and 13 hPa (≈ 24 to 29 km), the middle stratosphere between 13 and 3 hPa (≈ 29 to 39 km), and the upper stratosphere between 3 and 0.8 hPa (≈ 39 to 49 km). The limits for the upper stratosphere agree with the common definition (e.g. Ramaswamy et al., 2001), whereas the lower and middle stratosphere defined here are usually referred to as the middle stratosphere in other studies. Most of the instruments provide volume mixing ratios (VMRs) of ozone in parts per million (ppm). In cases that the data were given in number density (molecules cm −3 ), the VMR was calculated with the air pressure and temperature provided by the same instrument for ozonesondes or co-located ozone-or radiosonde data for lidars. For lidar measurements, these sonde temperature and pressure profiles are completed above the balloon burst by operational model data from the National Center for Environmental Prediction (NCEP) at OHP and by lidar temperature measurements and extrapolated radiosonde pressure data at MOH.
The GROMOS, SOMORA, and Aura/MLS profiles have a constant pressure grid, which is not the case for the lidar and ozonesonde data. The lidar and ozonesonde data were therefore linearly interpolated to a regular spaced altitude grid of 100 m for the ozonesonde at OHP and Payerne and 300 m for the lidars and the ozonesonde at MOH. The mean profile of the interpolated pressure data then built the new pressure grid for the ozone data. These interpolated lidar and ozonesonde data are used for the trend estimations. For the direct comparison with GROMOS, the data were adapted to the GROMOS grid, which is described in the next section (Sect. 3.1.2). Our figures generally show both pressure and geometric altitude. The geometric altitude is approximated by the mean altitude grid from GROMOS, which is determined for each retrieved profile from operational model data of the European Centre for Medium-Range Weather Forecasts (ECMWF).

GROMOS comparison and anomalies
The vertical resolution of GROMOS and SOMORA is usually coarser than for the other instruments. When comparing profiles directly with GROMOS profiles, the different vertical resolution of the instruments has to be considered. Smoothing the profiles of the different instruments by GRO-MOS' averaging kernels makes it possible to compare the profiles with GROMOS without biases due to resolution or a priori information (Tsou et al., 1995). The profiles with higher vertical resolution than GROMOS were convolved by the averaging kernel matrix according to Connor et al. (1991), with where x conv is the resulting convolved profile, x a is the a priori profile used in the GROMOS retrieval, x a,h is the same a priori profile but interpolated to the grid of the highly resolved measurement, x h is the profile of the highly resolved instrument, and AVK is the corresponding averaging kernel matrix from GROMOS. The rows of the AVK have been interpolated to the grid of the highly resolved instrument and scaled to conserve the vertical sensitivity (Keppens et al., 2015). The SOMORA profiles have a similar vertical resolution as profiles from GROMOS and were thus not convolved because this would require a more advanced comparison method as proposed by Rodgers and Connor (2003) or Calisesi et al. (2005). GROMOS and SOMORA have a higher temporal resolution than the other instruments. For SOMORA, only profiles coincident in time with GROMOS have been selected. For the other instruments, a mean of GROMOS data at the time of the corresponding measurement was used, with a time coincidence of ±30 min. Only for the lidars were GROMOS data averaged over the whole lidar measurement time (usually one night).
For comparison with GROMOS we computed relative differences between the monthly mean values of the different data sets and the monthly mean of the coincident GROMOS profiles. The relative difference (RD) for a specific month i has been calculated by subtracting the monthly ozone value of the data set (X i ) from the corresponding GROMOS monthly mean (GR i ), using the GROMOS monthly mean as a reference: (2) Based on the relative differences we identified periods in which GROMOS differs from the other instruments. To identify these anomalies we used a debiased relative difference (RD debiased ), given by where RD X is the mean relative difference of GROMOS to the data set X over the whole period. This made it possible to ignore a potential constant offset of the instruments and to concentrate on periods with temporally large differences to GROMOS. When this debiased relative difference was larger than 10 % for at least three instruments, the respective month was identified as an anomaly in the GROMOS data. Above 2 hPa, for which only SOMORA and Aura/MLS data are available, both data sets need to have a debiased relative difference to GROMOS larger than 10 % to be identified as an anomaly.

GROMOS time series
The monthly means of the GROMOS time series (Fig. 1a) clearly depict the maximum of ozone VMR between 10 hPa and 5 hPa and the seasonal ozone variation, with increased spring-summer ozone in the middle stratosphere and increased autumn-winter values in the upper stratosphere . Figure 1b shows GROMOS' relative deviations from the monthly climatology (monthly means over the whole period 1995 to 2017). This ozone deviation is calculated by the ratio of the deseasonalized monthly means (difference between each individual monthly mean and the corresponding climatology of this month) and the monthly climatology. We observe some periods in which GROMOS data deviate from their usual values, mostly distinguishable between the lower-middle stratosphere and the upper stratosphere. In the lower-middle stratosphere we observe negative anomalies (less ozone than usual) in 1995 to 1997  Besides this we did not detect any systematic instrumental issues in the GROMOS data that could explain the anomalies. Therefore, we compare the GROMOS data with the presented ground-based data sets, as well as with Aura/MLS data, to check whether the observed anomalies are due to natural variability or due to unexplained instrumental issues.

Comparison of different data sets
We compared GROMOS with ground-based and Aura/MLS data and averaged them over three altitude ranges (Fig. 2).
The different data sets have been smoothed with the averaging kernels of GROMOS to make a direct comparison possible, as described in Sect. 3.1.2. Due to the similar vertical resolution of GROMOS and SOMORA, the SOMORA profiles have not been smoothed by GROMOS' averaging kernels, despite differences between their a priori data and averaging kernels. This might lead to larger differences between GRO-MOS and SOMORA than between GROMOS and the other instruments. To avoid an instrument not covering the full range of one of the three altitude ranges, all ozonesonde data have been cut at 30 km (≈ 11.5 hPa), all lidar data at 3 hPa (≈ 39 km), and all SOMORA data below 13 hPa (≈ 29 km) for this analysis. The different instrument time series shown in Fig. 2 only contain data that are coincident with GRO-MOS measurements as described in Sect. 3.1.2, whereas the GROMOS data shown here represent the complete GRO-MOS time series with its high temporal sampling. This might lead to some sampling differences that are not considered in this figure.
The different data sets agree well, showing, however, periods in which some instruments deviate more from GROMOS than others. In the upper stratosphere (Fig. 2a), GROMOS and SOMORA agree well most of the time, but GROMOS reports slightly less ozone than SOMORA and also smaller values than Aura/MLS. A step change between SOMORA and GROMOS is visible in 2005, which might be related to the SOMORA front-end change in 2005. In the middle stratosphere (Fig. 2b) dars have also been observed by Steinbrecht et al. (2017), as can be seen in the latitudinal lidar averages of their study. Differences between the data sets in the lower stratosphere can be better seen in Fig. 3. The monthly relative differences of time coincident pairs of GROMOS (GR) and the convolved data set X are shown, with GROMOS data as reference values (Eq. 2). The mean relative difference of all instruments compared to GROMOS (black line in Fig. 3) generally lies within ∼ ±10 %. However, there are some periods with larger deviations, in which GROMOS measures less ozone than the other instruments (negative relative difference) in 1995 to 1997 and in 2006 in the lower stratosphere and in 2016 in the middle and upper stratosphere. We further observe that the relative difference between GROMOS and the OHP ozonesonde shows some important peaks in the last decade, indicating that the sonde often measures more ozone than GROMOS. The ozonesonde data seem to have some outlier profiles. When comparing the monthly means of coincident pairs, these outliers are even more visible because only a small number of OHP ozonesonde profiles are available per month (only four profiles on average).
For a broader picture, the same relative differences to GROMOS are shown in Fig. 4, but each panel represents an individual instrument, and all altitude levels are shown. The anomalies for which at least three data sets (or two above 2 hPa) deviate by more than 10 % from GROMOS (as de- The negative anomalies in 1995 to 1997 in the lower stratosphere that we observed in Fig. 3 were only partly detected as anomalies with our anomaly criteria. To summarize our comparison results, we observed some periods with anomalies compared to GROMOS' climatology ( Fig. 1). Some of these anomalies were also observed when comparing GROMOS to the different data sets (Figs. 2, 3,  and 4). This implies that the source of the anomalies is local variations in Bern or instrumental issues of GROMOS rather than broad atmospheric variability. We can thus conclude that the observed negative GROMOS anomalies in the lower stratosphere in 2006 and in the upper stratosphere in 2016 are biases in the GROMOS time series. The same is the case for the positive anomalies in 2000 and 2014 in the lower and middle stratosphere and also for some summer months in 2015, 2016, and 2017. In contrast to these confirmed anomalies, the GROMOS anomalies in the lower stratosphere in 1995 to 1997 (negative) and 1998 and 1999 (positive) as observed in Fig. 1 are small when comparing to the other instruments and are thus only confirmed for a few months by our anomaly detection. The biased periods in the upper were not confirmed by comparing GROMOS to SOMORA and might thus be real ozone variations. However, we have to keep in mind that fewer instruments (only SOMORA and Aura/MLS) provide data for comparison above 2 hPa, which makes the anomaly detection less robust at these altitudes, especially prior to the start of Aura/MLS measurements in 2004. Our results are consistent with those reported by Moreira et al. (2017). They compared GROMOS with Aura/MLS data and also observed positive deviations of GROMOS in the middle stratosphere in 2014 and 2015 as well as a negative deviation in the upper stratosphere in 2016. Hubert et al. (2019) found similar anomalies in the GROMOS time series by comparing ground-based data sets to several satellite products (see also SPARC/IO3C/GAW, 2019). Some of our detected biased periods were also found by Steinbrecht et al. (2009a) who compared different ground-based instruments, for example, the GROMOS anomaly in 2006. They attribute the observed biases to sampling differences but also to irregularities in some data sets. In fact, our results confirm irregularities in the GROMOS time series, which are probably due to instrumental issues of GROMOS and not only due to sampling differences.

Ozone trend estimations
Ozone trends are estimated in the present study by using a multilinear parametric trend model (von Clarmann et al., 2010). By comparing the GROMOS data to the other data sets as described above (Sect. 3.3), we have confirmed some anomalous periods in the GROMOS time series. To improve the GROMOS trend estimates, we now use these detected anomalies and consider them in the regression fit. In the following, the trend model (Sect. 4.1) will first be applied to an artificial time series to test and illustrate the approach of considering anomalies in the regression (Sect. 4.2). It will then be applied on GROMOS data (Sect. 4.3) before comparing the resulting GROMOS trends to trends from the other instruments (Sect. 4.4).

Trend model
To estimate stratospheric ozone trends, we applied the multilinear parametric trend model from von Clarmann et al. (2010) to the monthly means of all individual data sets. The model fits the following regression function: . In addition to the described natural oscillations, we used four periodic oscillations with different wavelengths l n to account for annual (l 1 = 12 months) and semi-annual (l 2 = 6 months) oscillation as well as two further overtones of the annual cycle (l 3 = 4 months and l 4 = 3 months).
The strength of the model from von Clarmann et al. (2010) used in our study is that it can consider inhomogeneities in data sets, by considering a full error covariance matrix (S y ) when reducing the cost function (χ 2 ). Inhomogeneous data can originate from changes in the measurement system (e.g. changes in calibration standards or merging of data sets with different instrumental modes), from irregularities in spatial or temporal sampling, or from unknown instrumental issues. Such inhomogeneities lead to groups of temporally correlated data errors, that can, if not considered, change significance and slope of the estimated trend (von Clarmann et al., 2010). The study of von Clarmann et al. (2010) presents an approach to consider known or suspected inhomogeneities in the trend analysis. They divide the data into multiple subsets which are assumed to be biased with respect to each other and which are characterized by diagonal blocks in the data covariance matrix. We use their method and code in our trend analyses to account for inhomogeneities. The inhomogeneities are in our case anomalies in some months that we identified as described in Sect. 3.1.2.
The total uncertainty of the data set is represented by a full error covariance matrix S y that describes covariances between the measurements in time for each pressure level. The covariance matrix is for each instrument given by where S instr gives the monthly uncertainty estimates for the instrument, S autocorr accounts for residuals autocorrelated in time which are caused by atmospheric variation not captured by the trend model, and S bias describes the bias uncertainties when a bias is considered. The diagonal elements of S instr are set to the monthly means of the measurement uncertainties for each instrument. For lidar data this is on average 4 % for the OHP lidar and 6 % at MOH between ∼ 20 and 40 km, with smallest uncertainties in the middle stratosphere. For the ozonesonde, the uncertainty is assumed to be 5 % for all ECC sondes and 10 % for BM sondes (Harris et al., 1998;Smit and ASOPOS Panel, 2013). For SOMORA we use uncertainties composed of smoothing and observational error, ranging from 1 % to 2 % in the middle stratosphere and 2 % to 7 % in the upper stratosphere. The Aura/MLS uncertainties used range between 2 % and 5 % throughout the stratosphere. For GROMOS we use uncertainty estimates as described by Moreira et al. (2015), composed of the standard error (σ/ √ DOF, with standard deviation σ and degrees of freedom DOF) of the monthly means, an instrumental uncertainty (measurement noise), and an estimated systematic instrument uncertainty obtained from cross-comparison. The resulting uncertainty values are approximately 5 % in the middle stratosphere, 6 % to 8 % in the lower stratosphere, and 6 % to 7 % in the upper stratosphere. The off-diagonal elements of S instr are set to zero, assuming no error correlation between the measurements in time. The additional covariance matrix S autocorr , which is added to S instr , is first also set to zero. In a second iteration, autocorrelation coefficients are inferred from residuals of the initial trend fit. The mean variance of S autocorr is scaled such that the χ 2 divided by the degrees of freedom of the trend fit with the new S y becomes unity. The degrees of freedom are the number of data points minus the number of fitted variables. The latter are the number of the coefficients of the trend model plus the number of relevant correlation terms inferred by the procedure described above. This additional covariance matrix S autocorr represents contributions to the fit residuals which are not caused by data errors but by phenomena that are not represented by the trend model. S autocorr is only considered if the initial normalized χ 2 is larger than unity, which is not the case if the assumed data errors are larger than the fit residuals (Stiller et al., 2012). The more sophisticated uncertainty estimates that we use for GROMOS are larger than the residuals in the first regression fit (χ 2 < 1), which means that the time correlated residuals (S autocorr ) are not considered for the GROMOS trend. For all the other instruments, however, correlated residuals are considered because we use simple instrumental uncertainties that are usually smaller than the fitted residuals.
To account for the anomalies in the time series when estimating the trends we adapted S y in two steps. First, we increased the uncertainties for months and altitudes for which anomalies were identified (using the method described in Sect. 3.1.2). For this purpose, we set the diagonal elements of S instr for the respective month i to a value obtained from the mean difference to all instruments (RD debiased,i ) and the mean GROMOS ozone value. Assuming, for example, that GROMOS deviates from all instruments on average by 10 %  in August 2014 at 10 hPa, the overall mean August value at this altitude would be 7 ppm. In this case we would assume an uncertainty of 0.7 ppm for this biased month at this altitude level. In a second step, we account for biases in the data subsets in which anomalies were detected. For this, a fully correlated block composed of the squared estimated bias uncertainties is added to the part of S bias that is concerned by anomalies. For example, to account for a bias in the summer months of 2014, a fixed bias uncertainty is added to all vari-ances and covariances of these months in S bias . Considering the bias in this way is mathematically equivalent to treating the bias as an additional fit variable that is fitted to the regression model with an optimal estimation method, as shown by von Clarmann et al. (2001). The value chosen for the bias uncertainty determines how much the bias is estimated from the data itself. For small values, the bias will be close to the a priori value, which would be a bias of zero; for large values it will be estimated completely from the given data. The bias Atmos. Chem. Phys., 19, 4289-4309, 2019 www.atmos-chem-phys.net/19/4289/2019/ can thus be estimated from the data itself, which makes the method more robust because it does not depend on an a priori choice of the bias (Stiller et al., 2012). In a sensitivity test we found that assuming an uncertainty value of 5 ppm for the correlated block permits a reliable fit, whereby the bias is fitted independently of the a priori zero bias. The described procedure for anomaly consideration is only applied to the GROMOS time series, whereas the other trend estimates were not corrected for anomalies. Our ozone trend estimates always start in January 1997, which is the most likely turning point for ozone recovery due to the decrease of ODSs (Jones et al., 2011), and all end in December 2017. Exceptions are the trends from SOMORA and Aura/MLS. The SOMORA trend starts in January 2000 when the instrument started to measure ozone. Aura/MLS covers the shortest trend period, starting only in January 2005. The trends are always given in % decade −1 , which is determined at each altitude level from the regression model output in ppm decade −1 by dividing it for each data set by its ozone mean value of the whole period. We declare a trend to be significantly different from zero at a 95 % confidence interval, as soon as its absolute value exceeds twice its uncertainty. This statistically inferred confidence interval is based on the assumed instrumental uncertainties. Unknown drifts of the data sets, however, are not considered in this claim.

Artificial time series analysis
The trend programme from von Clarmann et al. (2010) can handle uncertainties in a flexible way, which makes it possible to account for anomalies in a time series, as described above (Sect. 4.1). To investigate how anomalies can be best considered in the trend programme, we first test the programme with a simple artificial time series and then try to use the specific features of the trend programme to immunize it against anomalies. The artificial time series used for this purpose consists of monthly ozone values from January 1997 to December 2017 and is given by with the monthly time vector t, a constant ozone value a = 7 ppm, and a trend of 0.1 ppm decade −1 , i.e. b = 0.1/120 ppm per month. We consider a simple seasonal oscillation with an amplitude A = 1 ppm such that A 2 = g 2 +h 2 (e.g. g = h = − √ 0.5 ppm) and a wavelength l n = 12 months. The uncertainty was assumed to be 0.1 ppm for each monthly ozone value, which was considered in the diagonal elements of S y . No noise was superimposed to the data. This artificial time series (shown in Fig. 5a) is later on referred to as case A. The estimated trend for this simple time series corresponds quasi-perfectly to the assumed time series' trend (0.1 ppm decade −1 ), which proves that the trend programme works well. The residuals are of order 10 −6 and increase towards the start and end of the time series (Fig. 5b). To investigate how the trend programme reacts to anomalies in the time series, we performed the following sensitivity study. We increased the monthly ozone values in the summer months (June, July, August) of 2014, 2015, and 2017 by 5 % (≈ 0.4 ppm). Since we are interested in the net effect of the anomalies, again no noise was superimposed on the test data. The same error covariance matrix S y as in case A has been used. For this modified time series (case B), we observe a trend of ∼ 0.13 ppm decade −1 instead of the expected ∼ 0.1 ppm decade −1 (Fig. 6 and Table 2). Assuming that a real time series contains such suspicious anomalies due to, for example, instrumental issues, they would distort the true trend.
A simple way to handle such anomalies would be to omit anomalous data in the time series. This, however, would increase trend uncertainties and lead to a loss of important information. Therefore, we use the presented approach to handle anomalous observations in the time series when estimating the trend. To account for these anomalies in the trend estimation, we make use of the fact that the user of the trend programme has several options to manipulate the error covariance matrix S y . In a first attempt we decreased the weight of the anomalies in the time series by increasing their uncertainties (diagonal elements of S y ) and set the uncertainties of the affected summer months to 0.36 ppm (case C in Fig. 6 and Table 2). This uncertainty value corresponds to 5 % of the overall mean ozone value. The uncertainty of the months without anomalies remained 0.1 ppm. No additional error correlations between the anomaly-affected data points were considered. The impact of the anomaly is already reduced from about 33 % to about 3 %. The estimated error of the trend has slightly increased because, with this mod-  ified covariance matrix which represents larger data errors, the data set contains less information.
In a next step, we added a correlated block to the covariance matrix for the months containing anomalies and applied the S y once without (case D) and once with (case E) the increased diagonal elements of S y . Adding the correlated block to S y corresponds to an unknown bias of the data subset that is affected by anomalies and leads to a free fit of the bias magnitude. This bias is represented in S y as a fully correlated block of (5 ppm) 2 . It has an expectation value of zero and an uncertainty of 5 ppm. With this approach (cases D and E), the trend obtained from the anomaly-affected data is almost identical to the trend obtained from the original data (case A). This implies that the trend estimation has successfully been immunized against the anomaly.
In summary, we found that the trend estimates based on anomaly-affected data are largely improved by consideration of the anomalies in the covariance matrix, while without this, a largely erroneous trend is found. For this purpose it is not necessary to know the magnitude of the systematic anomaly. It is only necessary to know which of the data points are affected. We further found that the trend estimate is closer to its true value when higher uncertainties are chosen (diagonal elements of S y or correlated block in S y ; not shown). This can be explained by the fact that the additional uncertainties represented by S y allow the bias to vary as in an optimal estimation scheme in which the bias is a fit variable itself (von Clarmann et al., 2001). When estimating the bias, the larger the bias uncertainty is, the less confidence is accounted to the a priori knowledge about the bias (that would be a bias of zero), and the bias is then determined directly from the data as if it were an additionally fitted variable. Based on our test with the artificial time series, we conclude that our method succeeds in handling suspicious anomalies in a time series, leading to an estimated trend close to the trend that would be expected without anomalies.

GROMOS trends
The GROMOS time series has been used for trend estimations in Moreira et al. (2015), who found a significant positive trend in the upper stratosphere. In recent years, GRO-MOS showed some anomalies leading to larger trends than expected in the middle atmosphere, as, for example, shown by Steinbrecht et al. (2017) or SPARC/IO3C/GAW (2019). These larger trends motivated us to improve GROMOS trend estimates by accounting for the observed anomalies. In the following, we present the trend profiles of the GROMOS time series by considering the detected anomalies in the trend programme with the different correction methods that were introduced in Sect. 4.1. In a first step, we estimated the trend without considering anomalies in the data (case I in Fig. 7). The uncertainties of the data that are used as diagonal elements in S y range between 5 % and 8 % and are composed Atmos. Chem. Phys., 19, 4289-4309, 2019 www.atmos-chem-phys.net/19/4289/2019/ Figure 7. GROMOS ozone trends from January 1997 to December 2017, without considering anomalies (case I), considering anomalies in the diagonal elements of S y (case II), and considering a correlated bias block for anomalies (case III). The shaded areas show 2σ uncertainties, whereas the bold lines represent altitudes at which the trend profile is significantly different from zero at a 95 % confidence interval (|trend| > 2σ ).
as proposed by Moreira et al. (2015) (see Sect. 4.1). In a second step (case II), we increased the uncertainties (diagonal elements of S y ) for the months and altitudes that were detected as anomalies. The uncertainties for these anomalous data have been set to a value obtained from the difference to the other data sets (RD debiased ), as described in Sect. 4.1. Finally, we considered a fully correlated block for the periods in which anomalies were detected to fit a bias (case III). The bias uncertainty was set to 5 ppm at all altitudes, which ensures that the bias is fully estimated from the data. The diagonal elements of S y stayed the same as in case II. The trend profiles in Fig. 7 report an uncorrected GRO-MOS ozone trend (case I) of about 3 % decade −1 in the middle stratosphere from 31 to 5 hPa, which decreases above to around −4 % decade −1 at 0.8 hPa. Correcting the trend by increasing the uncertainty for months with anomalies (case II) decreases the trend slightly in the lower stratosphere, but the differences are small. Using a correlated block in the covariance matrix to estimate the bias of each anomaly in an optimized way (case III) decreases the trend by around 1 % decade −1 in the lower stratosphere. The trend profile of this optimized trend estimation has a trend of 2.2 % decade −1 in the lower and middle stratosphere and peaks at approximately 4 hPa with a trend of 2.7 % decade −1 . The corrected trend profile is consistent with recent satellite-based ozone trends (e.g. Steinbrecht et al., 2017) in the middle stratosphere. In the upper stratosphere it decreases again to −2.4 % decade −1 at 0.8 hPa.
All these different GROMOS trend estimates, considering anomalies in different ways, show a significant positive trend of ∼ 2.5 % decade −1 at around 4 hPa (≈ 37 km) and a trend decrease above. This agreement indicates that the trend at these altitudes is only marginally affected by the identified anomalies and is rather robust. We can thus conclude that the trend of around 2.5 % decade −1 is a sign for an ozone recovery in the altitude range of 35 to 40 km. This result is consistent with trends derived from merged satellite data sets as, for example, found in Ball et al. (2017, Sofieva et al. (2017), or Steinbrecht et al. (2017, even though the GROMOS trend peak is observed at slightly lower altitudes. A possible reason for this difference in the trend peak altitude might be related to the averaging kernels of the current GROMOS retrieval version. We observe that after the instrument upgrade in 2009, the averaging kernels peak at higher altitudes than expected, indicating that the information is retrieved from slightly higher altitudes (∼ 2 to 4 km). It is therefore possible that the trend peak altitude does not exactly correspond to the true peak altitude. The instrumental upgrade in 2009 led to a change in the averaging kernels. This change, however, should not influence the trend estimates because the data have been harmonized (see Sect. 2.1) and thus corrected for such effects. The harmonization also corrects for possible effects due to changes in the temporal resolution (from 1 h to 30 min).
In the upper stratosphere (above 2 hPa), the GROMOS trend estimates are mostly insignificant but negative. This is probably influenced by the negative trend observed in the mesosphere. A negative ozone trend in the mesosphere is consistent with theory because increased methane emissions lead to an enhanced ozone loss cycle above 45 km (WMO, 2014). However, this is not further investigated in the present study because of the small mesospheric measurement response in the GROMOS filter bench data (before 2009).
In the middle and lower stratosphere (below 5 hPa), using different anomaly corrections results in largest trend differences, with trends ranging from 2 % decade −1 to 3 % decade −1 . This result suggests that the GROMOS anomalies mostly affect these altitudes between 30 and 5 hPa. The corrected GROMOS trend (case III) is not significantly different from zero below 23 hPa, but cases I (uncorrected) and II (corrected by S y ) show significant trends. Compared to other studies, the GROMOS trends in the lower stratosphere are slightly larger than trends of most merged satellite data sets.
In summary, correcting the GROMOS trend with our anomaly approach leads to a trend profile of ∼ 2 % decade −1 to 2.5 % decade −1 in the lower and middle stratosphere. This is consistent with satellite-based trends from recent studies in the middle stratosphere but is still larger than most satellite trends in the lower stratosphere. The GROMOS trends are almost not affected by anomalies at 4 hPa (≈ 37 km), sug- gesting a robust ozone recovery of 2.5 % decade −1 at this altitude. At lower altitudes, trends are more affected by the detected anomalies, and the corrected trend estimate shows a trend of 2 % decade −1 . The larger uncertainties in the lower stratosphere, the dependency on anomalies, and the insignificance of the corrected trend show that this positive trend in the lower stratosphere is less robust than the trend at higher altitudes.

Influence of temporal sampling on trends
Stratospheric ozone at northern mid-latitudes has a strong seasonal cycle of ∼ 16 %  and a moderate diurnal cycle of 3 % to 6 % Studer et al., 2014). The sampling rate of ozone data is thus important for trend estimates because measurement dependencies on season or time of the day might influence the resulting trends. An important characteristic of microwave radiometers is their measurement continuity, being able to measure during day and night as well as during almost all weather conditions (except for an opaque atmosphere) and thus during all seasons. Other ground-based instruments such as lidars are temporally more restricted because they typically acquire data during clear-sky nights only. Clear-sky situations are more frequent during high-pressure events which vary with season and location (Steinbrecht et al., 2006;Hatzaki et al., 2014). The lidar measurements thus do not only depend on the daily cycle, but also on location and season. The seasonal dependency for ozonesondes might be smaller, but they are only launched during daytime. Figure 8 gives an example of how the measurement sampling rate can influence resulting trends. The GROMOS time series is used for these trend estimates, once using only measurements at the time of lidar measurements and once only at the time of ozonesonde launches. The differences to the trend that uses the complete GROMOS sampling are not significant but still important, especially between 35 and 40 km and in the lower stratosphere. Using the sampling of the MOH lidar leads to larger trends (∼ 3 % decade −1 ) than using the normal sampling (∼ 2 % decade −1 ) below 5 hPa. The OHP lidar sampling, however, leads to smaller trends than the normal sampling above 13 hPa. This suggests that selecting different night measurements within a month can lead to trend differences.
All three ozonesonde samplings result in a larger trend than normal or lidar sampling above 5 hPa. Even if ozonesondes are not measuring at those altitudes, the result shows that measuring with an ozonesonde sampling (e.g. only at noon) might influence the trend at these altitudes. Our findings suggest that the time of the measurement (day or night) or the number and the timing of measurements within a month can influence the resulting trend estimates. Results concerning the time dependences of trends based on SOMORA data can be found in Maillard Barras et al. (2019). We conclude that sampling differences have to be kept in mind when comparing trend estimates from instruments with different sampling rates or measurement times.

Influence of time period on trends
An important factor that influences trend estimates is the length and starting year of the trend period. Several studies have shown that the choice of start or end point affects the resulting trend substantially (Harris et al., 2015; Atmos. Chem. Phys., 19, 4289-4309, 2019 www.atmos-chem-phys.net/19/4289/2019/ Weber et al., 2018). Further, the number of years in the trend period is crucial for the trend estimate (Vyushin et al., 2007;Millán et al., 2016). We investigate how the GRO-MOS trends change when the regression starts in different years. For this, we average the GROMOS trends over three altitude ranges and determine the trend for periods of different lengths, all ending in December 2017 but starting in different years (Fig. 9). As expected, the uncertainties increase with decreasing period length, and trends starting after 2010 are thus not even shown. Consequently, trends become insignificant for short trend periods. In the lower and middle stratosphere, more than 11 years is needed to detect a significant positive trend (at a 95 % confidence level) in the GRO-MOS data, whereas the 23 years considered is not enough to detect a significant trend in the upper stratosphere (above 3 hPa). Weatherhead et al. (2000) and Vyushin et al. (2007) state that at least 20 to 30 years is needed to detect a significant trend at mid-latitudes, but their results apply to total column ozone, which can not directly be compared with our ozone profiles. In general we observe that the magnitude of the trend estimates highly depends on the starting year. Furthermore, the trends start to increase in 1997 (middle stratosphere) or 1998 (lower stratosphere), probably due to the turning point in ODSs. The later the trend period starts after this turning point, the larger the trend estimate is, which has also been observed by Harris et al. (2015). Starting the trend, for example, in the year 2000, as is done in other studies (e.g. Steinbrecht et al., 2017; SPARC/IO3C/GAW, 2019), increases the GROMOS trend by almost 2 % decade −1 compared to the trend that starts in 1997. The trend magnitudes depend on the starting year of the regression, which is controversial to the definition of a linear trend that does not change with time. This illustrates that the true trend might not be linear or that some interannual variations or anomalies are not captured by the trend model. Nevertheless, our findings demonstrate that it is important to consider the starting year and the trend period length when comparing trend estimates from different instruments or different studies.

Trend comparison
The GROMOS trend profile (corrected as described above) and trend profiles for all instruments at the other measurement stations (uncorrected) are shown in Fig. 10. The trend profiles agree on a positive trend in the upper stratosphere, whereas they differ at lower altitudes. Due to the given uncertainties, most of the trend profiles are not significantly different from zero at a 95 % confidence interval. Only GRO-MOS, the lidars, and the ozonesonde at OHP show significant trends in some parts of the stratosphere (bold lines in Fig. 10).
We observe that all instruments that measure in the upper stratosphere show a trend maximum between ∼ 4 and ∼ 1.8 hPa (between ∼ 37 and 43 km), which indicates that ozone is recovering at these altitudes. The trend max- ima range from ∼ 1 % decade −1 to 3 % decade −1 , which is comparable with recent, mainly satellite-based ozone trends for northern mid-latitudes (e.g. Ball et al., 2017;Frith et al., 2017;Sofieva et al., 2017;Steinbrecht et al., 2017; SPARC/IO3C/GAW, 2019). Only the lidar trend at MOH is larger throughout the whole stratosphere, with ∼ 3 % decade −1 in the middle stratosphere and 4 % decade −1 to 5 % decade −1 between 5 and 2 hPa. Nair et al. (2015) observed similar trend results for the MOH lidar, even if they consider 5 fewer years with a trend period ranging from 1997 to 2012. Steinbrecht et al. (2006) found that lidar data at MOH and OHP deviate from reference satellite data above 35 km before 2003, with less ozone at MOH and more ozone at OHP compared to SAGE satellite data (Stratospheric Aerosol and Gas Experiment;McCormick et al., 1989). Opposite drifts are reported by Eckert et al. (2014) after 2002 compared to MIPAS satellite data (Michelson Interferometer for Passive Atmospheric Sounding; Fischer et al., 2008). Combining those drifts might explain our large MOH trends and smaller OHP trends. The distance of ∼ 600 km between the MOH and OHP stations might also explain some differences between the lidar trends. Furthermore, our sampling results show that the lidar sampling at MOH leads to a larger trend in the lower stratosphere than using a continuous sampling, whereas OHP lidar sampling leads to a lower trend in the middle stratosphere. The large lidar trend at MOH and the comparable low OHP lidar trend might therefore also be partly explained by the sampling rate of the lidars. The GRO-MOS trend peaks at slightly lower altitudes than the trends of the other instruments. This difference might be related to the averaging kernels of GROMOS, which indicate that GROMOS retrieves information from higher altitudes than expected (∼ 2 km difference). In the middle and lower stratosphere, at altitudes below 5 hPa (≈ 36 km), the estimated trends differ from each other. The microwave radiometers and the MOH lidar report trends of 0 % decade −1 to 3 % decade −1 , and also the ozonesonde at OHP confirms this positive trend. However, the ozonesondes at MOH and Payerne as well as Aura/MLS and the OHP lidar report a trend of around 0 % decade −1 to −2 % decade −1 below 5 hPa. Some of these observed trend differences can be explained by instrumental changes or differences in processing algorithms and instrument setup. The discrepancy between ozonesonde and lidar trends at OHP, for example, are possibly due to the change of the pressure-temperature radiosonde manufacturer in 2007, which resulted in a step change in bias between ozonesonde and lidar observations. A thorough harmonization would be necessary to correct the trend for this change. The SOMORA trend shows a positive peak at 30 km, which is probably due to homogenization problems that are corrected in the new retrieval version of SOMORA, which is, however, not yet used in our analyses (Maillard Barras et al., 2019). Furthermore, we have shown that sampling rates and starting years have an important effect on the resulting trend. The trend period lengths differ between SOMORA, Aura/MLS, and the other data sets, which might also partly explain differences in trend estimates. To explain the remaining trend differences in the lower stratosphere, further corrections, e.g. for anomalies, instrumental changes, or sampling rates, would be necessary. In brief, trends in the lower stratosphere are not yet clear. For some instruments, significant positive trends are reported, but for many other instruments, trends are negative and mostly not significantly different from zero. This result reflects the currently ongoing discussion about lower stratospheric ozone trends (e.g. Ball et al., 2018;Chipperfield et al., 2018;Stone et al., 2018).
In summary, our ground-based instrument data agree that ozone is recovering around 3 hPa (≈ 39 km), with trends ranging between 1 % decade −1 and 3 % decade −1 for most data sets. In the lower and middle stratosphere between 24 and 37 km (≈ 31 and 4 hPa), however, the trends disagree, suggesting that further research is needed to explain the differences between ground-based trends in the lower stratosphere.

Conclusions
Our study emphasizes that natural or instrumental anomalies in a time series affect ground-based stratospheric ozone trends. We found that the ozone time series from the GRO-MOS radiometer (Bern, Switzerland) shows some unexplained anomalies. Accounting for these anomalies in the trend estimation can substantially improve the resulting trends. We further compared different ground-based ozone trend profiles and found an agreement on ozone recovery at around 40 km over central Europe. At the same time, we observed trend differences ranging between −2 % decade −1 and 3 % decade −1 at lower altitudes.
We compared the GROMOS time series with data from other ground-based instruments in central Europe and found that they generally agree within ±10 %. Periods with larger discrepancies have been identified and confirmed to be anomalies in the GROMOS time series. We did not find the origins of these anomalies and assume that they are due to instrumental issues of GROMOS. The identified anomalies have been considered in the GROMOS trend estimations be-Atmos. Chem. Phys., 19, 4289-4309, 2019 www.atmos-chem-phys.net/19/4289/2019/ cause they can distort the trend. By testing this approach first on a theoretical time series and then with the real GROMOS data, we have shown that identifying anomalies in a time series and considering them in the trend analysis makes the resulting trend estimates more accurate. With this method, we propose an approach of advanced trend analysis based on the work of von Clarmann et al. (2010) that may also be applied to other ground-or satellite-based data sets to obtain more consistent trend results.
Comparing the GROMOS trend with other ground-based trends in central Europe suggests that ozone is recovering in the upper stratosphere between around 4 and 1.8 hPa (≈ 37 and 43 km). This result confirms recent, mainly satellitebased studies. At other altitudes, we have observed contrasting trend estimates. We have shown that the observed differences can partly be explained by different sampling rates and starting years. Other reasons might be instrumental changes or nonconformity in measurement techniques, instrumental systems, or processing approaches. Further, the spatial distance between some stations might explain some trend differences because different air masses can be measured, especially in winter when polar air extends over parts of Europe. Accounting for anomalies in the different data sets as proposed in the present study might be a first step to improve trend estimates. Combined with further corrections, e.g. for sampling rates or instrumental differences, this approach may help to decrease discrepancies between trend estimates from different instruments. In many other studies, the observed trend differences are less apparent because the ground-based data are averaged over latitudinal bands (e.g. WMO, 2014;Harris et al., 2015;Steinbrecht et al., 2017). Nevertheless, it is important to be aware that ground-based trend estimates differ considerably, especially in the lower stratosphere. Exploring the origin of the differences and improving the trend profiles in a similar way as we did for GRO-MOS may be an important further step on the way to monitoring the development of the ozone layer. To summarize, we have shown that anomalies in time series need to be considered when estimating trends. Our ground-based results confirm that ozone is recovering in the upper stratosphere above central Europe and emphasize the urgency to further investigate lower stratospheric ozone changes. The presented approach to improve trend estimates can help in this endeavour.
Data availability. All ground-based data used in this study are available at ftp://ftp.cpc.ncep.noaa.gov/ndacc. The MLS data from the Aura satellite are available at https://disc.gsfc.nasa.gov/datasets/ ML2O3_V004/summary (Schwartz, 2015).
Author contributions. LB and KH designed the concept and the methodology. LB carried out the analysis and prepared the manuscript. TvC provided the trend model. All co-authors con-tributed to the manuscript preparation and the interpretation of the results.
Competing interests. The authors declare that they have no competing interests. Thomas von Clarmann is co-editor of Atmospheric Chemistry and Physics, but he is not involved in the evaluation of this paper.