Intercomparison of vertically resolved merged satellite ozone data sets: interannual variability and long-term trends

In the framework of the SI2N (SPARC (Stratosphere-troposphere Processes And their Role in Climate)/IO3C (International Ozone Commission)/IGACO-O3 (Integrated Global Atmospheric Chemistry Observations – Ozone)/NDACC (Network for the Detection of Atmospheric Composition Change)) initiative, several long-term vertically resolved merged ozone data sets produced from satellite measurements have been analysed and compared. This paper presents an overview of the methods, assumptions, and challenges involved in constructing such merged data sets, as well as the first thorough intercomparison of seven new long-term satellite data sets. The analysis focuses on the representation of the annual cycle, interannual variability, and long-term trends for the period 1984–2011, which is common to all data sets. Overall, the best agreement amongst data sets is seen in the mid-latitude lower and middle stratosphere, with larger differences in the equatorial lower stratosphere and the upper stratosphere globally. In most cases, differences in the choice of underlying instrument records that were merged produced larger differences between data sets than the use of different merging techniques. Long-term ozone trends were calculated for the period 1984–2011 using a piecewise linear regression with a change in trend prescribed at the end of 1997. For the 1984–1997 period, trends tend to be most similar between data sets (with largest negative trends ranging from −4 to −8 % per decade in the mid-latitude upper stratosphere), in large part because most data sets are predominantly (or only) based on the SAGE-II record. Trends in the middle and lower stratosphere are much smaller, and, particularly for the lower stratosphere, large uncertainties remain. For the later period (1998–2011), trends vary to a greater extent, ranging from approximately −1 to +5 % per decade in the upper stratosphere. Again, middle and lower stratospheric trends are smaller and for most data sets not significantly different from zero. Overall, however, there is a clear shift from mostly negative to mostly positive trends between the two periods over much of the profile.

Published by Copernicus Publications on behalf of the European Geosciences Union. (F. Tummon et al.)


Introduction
The phase-out of ozone-depleting substances (ODSs) through the Montreal Protocol and its subsequent amendments and adjustments (World Meteorological Organization (WMO), 2015, 2011) resulted in peak stratospheric ODS concentrations in the late 1990s or early 2000s; the exact timing of the peak depends on which part of the stratosphere is considered. Since then, ODS concentrations have been declining and stratospheric ozone is expected to return to 1980 levels throughout most of the stratosphere at various times during the 21st century (WMO, 2015, 2011; Eyring et al., 2010a; Austin and Butchart, 2003). Detecting such a response and attributing it to decreasing halogen levels requires long-term, temporally homogeneous ozone profile measurements. This is particularly important if the attribution is being done in the context of a changing climate, with increasing greenhouse gas concentrations and concomitant changes in stratospheric temperature and mean meridional transport affecting ozone in addition to the decrease in stratospheric halogen loading (e.g. Waugh et al., 2009; Eyring et al., 2007; Austin and Wilson, 2006).
While satellite instruments have provided continuous near-global measurements of the ozone profile since late 1978, no single instrument has provided continuous and stable global coverage for the entire period (e.g. Hassler et al., 2014). To date, the longest single-instrument space-based records are those of SAGE-II (Stratospheric Aerosol and Gas Experiment II) and HALOE (Halogen Occultation Experiment); these provided quasi-independent data from 1984 to 2005 (Damadeo et al., 2013; Grooß and Russell, 2005; McCormick et al., 1989). Over the past decade, a number of new satellite-based instruments have made ozone profile measurements (Hassler et al., 2014; Tegtmeier et al., 2013), but few continuous data sets extending the satellite record beyond 2005 using these new observations were available for consideration in WMO (2011). Therefore, the Stratosphere-troposphere Processes And their Role in Climate (SPARC), the International Ozone Commission (IO3C), the ozone focus area of the Integrated Global Atmospheric Chemistry Observations (IGACO-O3), and the Network for the Detection of Atmospheric Composition Change (NDACC) supported the SPARC/IO3C/IGACO-O3/NDACC (SI2N) initiative, with a major aim of developing and promoting long-term vertically resolved data sets for updated knowledge of long-term changes in the vertical distribution of ozone.
Included as part of the SI2N initiative are multiple data sets where SAGE-II has been merged with newer measurements (Adams et al., 2014; Davis et al., 2015; Froidevaux et al., 2015; Penckwitt et al., 2015; Kyrölä et al., 2013), as well as data sets based on the consistently processed SBUV and SBUV/2 measurements (e.g. Frith et al., 2014; Wild and Long, 2015; Kramarova et al., 2013; McPeters et al., 2013). Constructing these merged data sets requires careful consideration of several methodological choices so as to avoid introducing artefacts that may contaminate derived trends. Such factors include sampling biases, the use of different native vertical coordinate systems, a combination of different vertical and horizontal resolutions, measurements made on different segments of the diurnal cycle in ozone, and relative calibration, all of which must be considered when combining measurements from two or more satellite records.

Issues related to merging data from several records
Sampling by an individual satellite instrument can vary in space and/or time during its lifetime (e.g. McPeters et al., 2013; Toohey et al., 2013) and can thereby introduce biases in analyses of single-instrument records if not addressed properly (Damadeo et al., 2014). When merging measurements from more than one instrument, not only changing sampling patterns but also differences in sampling between the various instruments need to be taken into account. This is especially important in regions where there are strong ozone gradients such as across mixing barriers in the stratosphere or in the upper troposphere/lower stratosphere, where small differences in sampling may result in large differences in observed ozone (Lambert et al., 2015; Toohey et al., 2013). Ensuring that differences in temporal sampling are adequately taken into account is also important in the upper stratosphere, where the diurnal cycle in ozone is significant (Parrish et al., 2014; Schanz et al., 2014; Sakazaki et al., 2013; Studer et al., 2014) and has the potential to introduce biases when merging instruments that measure at different times of the day.
Ideally, biases between individual instruments would be traceable to fundamental differences in the instruments and/or retrieval algorithms, which could then be corrected for during the merging process. However, in practice, this remains challenging. Alternatively, individual data sets can be adjusted using a completely independent set of measurements, for example, with ground-based total column observations (e.g. Bodeker et al., 2005) or using a model as a transfer function (e.g. Hegglin et al., 2014). As yet, it has not been possible to a priori eliminate all systematic biases in any vertically resolved ozone data set, and therefore one data set is typically chosen as a reference, with others bias-corrected with respect to that reference. This requires a sufficiently long overlap between the instrument records to derive a statistically significant estimate of the bias, which can then be propagated to regions (in space and time) outside of the overlap. This is particularly important if the adjustments have a seasonal or spatial dependence. Such corrections are not always possible; linking the BUV record to the multi-instrument SBUV record or the SAGE-I record to the SAGE-II record is not possible as a result of lack of overlap.
Assuming that the temporal and spatial sampling bias has been corrected, the data sets to be merged may need to be transformed to a common coordinate system if they do not have the same intrinsic vertical coordinates and ozone units. For example, the native vertical coordinate for the solar occultation technique used by the SAGE-II instrument is geometric altitude with ozone being retrieved in units of number density, while the thermal emission measurements of the MLS (Microwave Limb Sounder) instruments provide ozone amounts on pressure levels and in units of mixing ratio. To combine two such data sets, one of them needs to be converted to the vertical coordinate of the other, requiring knowledge of the vertical temperature profile, and also to a common concentration unit, requiring local temperatures. For long-term ozone trends, where changes are on the order of a few percent per decade, the uncertainties in long-term stratospheric temperature records can confound such conversions between vertical coordinate systems and concentration units (Davis and Rosenlof, 2012; Thompson et al., 2012; Wang et al., 2011; McLinden and Fioletov, 2011; Xu and Powell, 2011). Any artificial trend in stratospheric temperature structure, which affects the altitudes of pressure levels, can alias into ozone trends (WMO, 2015; McLinden and Fioletov, 2011; Rosenfield et al., 2005). The different vertical and horizontal resolutions of each data set also need to be taken into account; adding data with a relatively low resolution to a high-resolution data set involves either a degradation of the high-resolution measurement or the need for additional information to justify interpolating the low-resolution product to a higher resolution.
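The dependence of the unit conversion on local temperature can be illustrated with a minimal sketch based on the ideal gas law. The function name and the example values (number density, pressure, temperature) are illustrative assumptions and are not taken from any of the data sets discussed here.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def number_density_to_vmr(n_o3_cm3, pressure_hpa, temperature_k):
    """Convert ozone number density (molecules cm^-3) to mixing ratio (ppmv).

    Hypothetical helper: uses the ideal gas law, n_air = p / (k_B * T).
    """
    n_o3_m3 = np.asarray(n_o3_cm3, dtype=float) * 1.0e6   # cm^-3 -> m^-3
    p_pa = np.asarray(pressure_hpa, dtype=float) * 100.0   # hPa -> Pa
    n_air_m3 = p_pa / (K_B * np.asarray(temperature_k, dtype=float))
    return n_o3_m3 / n_air_m3 * 1.0e6                      # mole fraction -> ppmv

# Illustrative values only: 2e12 molecules cm^-3 at 10 hPa and 230 K
vmr = number_density_to_vmr(2.0e12, 10.0, 230.0)
# The same number density with a temperature that is 1 % higher
vmr_warm = number_density_to_vmr(2.0e12, 10.0, 232.3)
```

Because the mixing ratio scales linearly with temperature at fixed pressure and number density, a 1 % temperature error maps onto a 1 % error in the converted mixing ratio, which is comparable to the size of the trends being sought.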
These methodological aspects are discussed further below in the context of an in-depth intercomparison of seven newly available merged satellite ozone profile data sets. A detailed assessment of these data sets is not only useful for analyses of long-term ozone changes; it is also useful because such data sets are commonly used to validate chemistry and transport in numerical models (e.g. Eyring et al., 2010a) and can be used to prescribe ozone boundary conditions for models that do not explicitly include interactive stratospheric chemistry. The merged data sets and methods used in this paper are described in Sect. 2, while the analyses of the annual cycles and interannual variability of each data set are covered in Sect. 3. A comparison of the long-term ozone changes estimated from each data set is then presented in Sect. 4 and, finally, conclusions are presented in Sect. 5.

Merged data sets used in this study
In the absence of a single perfectly calibrated satellite instrument with complete and continuous global coverage and decadal stability, measurements from multiple instruments need to be combined. A critical factor when combining multiple data sets is the relative calibration of the measurements from the different instruments. In principle, data sets can be merged by combining data from a series of instruments of the same type, or by merging data from instruments of different types. Seven merged data sets are considered here: two are based on the series of SBUV instruments, while the other five use measurements from a number of different recent instruments which all have the long SAGE-II record as their backbone. The seven data sets are briefly described below, while further details can be found in the corresponding references provided. Their characteristics are summarised in Tables 1-7, the temporal coverage of the underlying satellite instruments used in each data set is shown in Fig. 1, and the spatial coverage over time as well as the number of observations used in each data set is shown in Fig. 2.

BUV, SBUV, and SBUV/2
A series of BUV (Backscatter Ultraviolet Radiometer) and SBUV (Solar Backscatter Ultraviolet Radiometer) instruments have flown onboard NASA and NOAA satellites since 1970 and provided a continuous ozone record since 1978. Measurements are made with the same instrument type and generally there has been a good temporal overlap between individual instruments (Bhartia et al., 2013; McPeters et al., 2013, and references therein). A new version of the retrieval algorithm (v8.6) was developed in which a number of improvements were made, including new ozone absorption cross sections, a new a priori ozone climatology, and a cloud-height climatology derived from Aura OMI (Ozone Monitoring Instrument) measurements (Bhartia et al., 2013).
Inter-instrument calibration at the radiance level (as opposed to ozone measurement calibration) is accomplished during periods of overlap between the SBUV instruments and with the SSBUV (Shuttle SBUV) flown periodically on several different space shuttles (DeLand et al., 2012).
Validation of the reprocessed ozone shows that total column ozone is consistent with measurements from the Brewer-Dobson network to 1 % (Labow et al., 2013). Comparisons with Aura MLS, SAGE, ozonesondes, and lidar observations show that ozone at individual levels in the stratosphere is generally consistent to within 5 % (Kramarova et al., 2013). Inter-instrument differences are generally less than the differences compared to independent data sets. However, despite the common instrument design and algorithm, differences in data quality exist between instruments. Kramarova et al. (2013) report larger biases and drifts for NOAA-9, as well as portions of NOAA-11 and NOAA-14 data, from the mid- to late 1990s, primarily as a result of a slow orbit drift of these instruments. The drifting orbits cause the geometric positions of the instruments with respect to the Sun to vary in time, leading to changing error characteristics as a function of altitude and latitude. Agreement with independent measurements is largely within 10 % during this period.
It is important to note that the primary source of error in the SBUV retrieval is the smoothing error due to the instrument's limited vertical resolution, particularly in the troposphere and lower stratosphere (Kramarova et al., 2013; Bhartia et al., 2013). The SBUV instrument has a resolution of 6-7 km near 3 hPa, degrading to 15 km in the troposphere (Bhartia et al., 2013). Thus SBUV reliably measures the partial column of ozone from the ground to the lower stratosphere, but must use a priori information to resolve the signal within that range. The v8.6 a priori derives from an ozone climatology constructed from Aura MLS and ozonesonde data that varies seasonally (monthly) but has no trend or interannual variability component (McPeters and Labow, 2012; Bhartia et al., 2013). The SBUV resolution is also somewhat reduced in the upper stratosphere, degrading to ∼10 km above 1 hPa.
Two merged data sets, described below, have been constructed using the SBUV v8.6 reprocessed observations. We limit the vertical range of our analysis of the SBUV data sets to pressures < 20 hPa in the tropics (20° N-20° S) and to pressures < 30 hPa outside the tropics to exclude regions where the low vertical resolution may affect derived trends.
[Table fragment: HALOE v19 (1991-2005), UARS MLS v6 (1991-2005), Aura MLS v3.3 (2004 onward)]

V8.6 SBUV MOD
This data set, produced by researchers at NASA, is based on measurements from the BUV instrument on Nimbus 4 covering the period 1970-1976 (with reduced coverage after mid-1972), and the SBUV on Nimbus 7 and SBUV/2 instruments on NOAA-11, -14, -16, -17, -18, and -19 covering the period 1979-2013 (McPeters et al., 2013). Measurements from the SBUV/2 instrument onboard NOAA-9 are not included (see Fig. 2). Since the v8.6 algorithms include the aforementioned upgraded inter-instrument calibration, the approach taken is to average all data that meet quality standards set by calibration analysis and comparisons with external instruments rather than attempt to apply further external offsets (Frith et al., 2014). The largest measurement uncertainties occur in the 1990s, when instrument orbital drift led to less reliable measurements overall (Kramarova et al., 2013; DeLand et al., 2012). The Nimbus 4 BUV is included in the data set but cannot be intercalibrated as a result of the lack of temporal

Data screening by data set:
SAGE-GOMOS1: SAGE-II screened according to the SAGE-II v7.00 release notes. GOMOS measurements screened according to the GOMOS level 2 product quality readme file.
SAGE-GOMOS2: SAGE-II screened according to Hassler et al. (2008). GOMOS screened according to the GOMOS level 2 product quality readme file.
SAGE-OSIRIS: SAGE-II screened according to the SAGE-II v7.00 release notes. Only OSIRIS morning (descending node) observations at solar zenith angles < 89.5° are used.
GOZCARDS: SAGE-I screened for aerosol/cloud effects. SAGE-II screened for aerosol/cloud effects as per Wang et al. (2002), which also includes other details (e.g. for high beta angle issues). HALOE screened for aerosol/cloud effects following Bhatt et al. (1999) and Hervig and McHugh (1999). UARS MLS screened as per Livesey et al. (2003). Aura MLS screened as per Froidevaux et al. (2008). ACE-FTS generally screened as per ACE-FTS team guidelines; outliers screened as per Froidevaux et al. (2015). Full details can be found in Froidevaux et al. (2015).
SWOOSH: SAGE-II data screened according to Wang et al. (2002) and then with a 3σ filter. Only UARS MLS data between 100 and 0.22 hPa used and screened following Livesey et al. (2003). "Trip angle" and "constant lockdown angle" events are removed from the HALOE data, as are any points with uncertainties 100 %. Aura MLS measurements are filtered according to the v3.3 data quality document. See full screening description and further references in Davis et al. (2015).

SBUV Merged Cohesive
Though the improved calibrations of v8.6 have reduced the inter-satellite differences, they have not removed them. A second SBUV-based data set was developed by researchers at NOAA with the aim of removing the remaining differences. This approach varies from that of v8.6 SBUV MOD in that (i) adjustments are made to individual instrument records based on periods of overlap in order to account for any variations in the observed annual cycle as well as an overall bias; (ii) rather than an average of all available observations, a single satellite is chosen for each period based on the best latitudinal coverage, allowing the clean retention of satellite characteristics such as time of measurement, solar zenith angle, etc. to be identified with an ozone value; (iii) measurements from NOAA-9 are included in a short period to allow greater global coverage in the bridge from NOAA-11 to -14 (Fig. 2); and (iv) measurements from BUV are excluded since there is no overlap with the subsequent instruments (Wild and Long, 2015). The resulting differences between the two SBUV-based data sets are further described in Sects. 3 and 4.

SAGE-based data sets
The 21-year SAGE-II record is a natural candidate to form the basis of a merged data set using multiple instrument types, and it is used in five of the seven records described here (Tables 1-7 and Fig. 1). The conceptually simplest extension is to add a single data set to the SAGE-II record to cover the years following 2005, after which SAGE-II was turned off. The period from 2005 to the present has had many operational satellite instruments (Hassler et al., 2014; Tegtmeier et al., 2013). To date, three single-instrument extensions have been made to the SAGE-II record. These include one case extended with limb-scattered measurements from OSIRIS (Optical Spectrograph and Infrared Imager System) onboard the Odin satellite (2001-present) (Bourassa et al., 2014; Adams et al., 2014; Sioris et al., 2014)

SAGE-OSIRIS
The SAGE-OSIRIS merged data set combines the SAGE-II v7.0 and OSIRIS v5.07 measurements into a continuous data series covering the period 1984-2013 (Bourassa et al., 2014; Sioris et al., 2014). Data from each instrument are individually deseasonalised and thereafter the differences (varying in latitude and altitude) between the two sets of anomalies are calculated for the overlap period (January 2002-August 2005). The OSIRIS data, of which only the morning (sunrise) measurements are used since they show smaller bias compared to SAGE-II (Adams et al., 2013), are then shifted by this difference to produce a consistent time series of SAGE-II and OSIRIS data covering October 1984 to December 2013; typically values were shifted by less than 3 % (Bourassa et al., 2014).
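The deseasonalise-then-shift idea can be sketched for a single latitude/altitude bin as follows. This is a simplified illustration rather than the SAGE-OSIRIS processing itself: the function names are hypothetical, anomalies are taken relative to each calendar month's mean, and averaging the two records during the overlap is an assumption made here for simplicity.

```python
import numpy as np

def deseasonalise(series, months):
    """Relative anomalies (%) with respect to each calendar month's mean."""
    series = np.asarray(series, dtype=float)
    months = np.asarray(months)
    anomalies = np.full(series.shape, np.nan)
    for m in range(1, 13):
        sel = months == m
        if np.any(np.isfinite(series[sel])):
            clim = np.nanmean(series[sel])
            anomalies[sel] = 100.0 * (series[sel] - clim) / clim
    return anomalies

def merge_by_overlap_offset(anom_ref, anom_new):
    """Shift anom_new by its mean difference to anom_ref over the overlap."""
    anom_ref = np.asarray(anom_ref, dtype=float)
    anom_new = np.asarray(anom_new, dtype=float)
    overlap = np.isfinite(anom_ref) & np.isfinite(anom_new)
    offset = np.mean(anom_new[overlap] - anom_ref[overlap])
    shifted = anom_new - offset
    # Use the reference where it exists, the shifted record elsewhere
    merged = np.where(np.isfinite(anom_ref), anom_ref, shifted)
    # Assumption: average both records where they overlap
    merged[overlap] = 0.5 * (anom_ref[overlap] + shifted[overlap])
    return merged, offset

# Toy anomaly series on a common time axis (NaN marks missing months)
ref_anom = np.array([1.0, 2.0, 3.0, np.nan, np.nan])
new_anom = np.array([np.nan, 4.0, 5.0, 6.0, 7.0])
merged, offset = merge_by_overlap_offset(ref_anom, new_anom)
```

In the toy example the second record sits 2 units above the reference during the overlap, so it is shifted down by that amount before the two are joined.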

SAGE-GOMOS
Two different merged data sets were produced by combining the SAGE-II v7.0 (using both sunrise and sunset observations) and GOMOS IPF6.0 measurements. The data set described and analysed by Kyrölä et al. (2013) is constructed taking into account the difference between SAGE-II sunrise and sunset profiles and the GOMOS night-time stellar occultation measurements, i.e. the SAGE-II sunrise and sunset measurements are adjusted separately to GOMOS (used as reference) for each latitude and altitude bin. For the overlap period (April 2002-August 2005), the weighted mean of the medians from both instruments is used, with the weights being determined from the error estimates of each median (Kyrölä et al., 2013). This is hereafter referred to as the SAGE-GOMOS1 data set.
The second SAGE-GOMOS data set was constructed using SAGE-II as reference. The GOMOS data were adjusted to SAGE-II using latitude- and altitude-varying offsets, which were estimated statistically for the overlap period from April 2002 to August 2005. The offsets vary with season, but not with year. At pressures < 2 hPa, where the diurnal cycle in ozone may create differences between SAGE-II and GOMOS because of differences in the solar zenith angle of the measurements rather than because of instrument/retrieval-related biases between instruments, the SAGE-II and GOMOS data were normalised to a solar zenith angle of 90° using scaling factors derived from a high-resolution chemistry-climate model simulation. The data were accumulated into 5° latitude bands and then averaged to monthly means after having been corrected for zonal and monthly mean representativeness (Penckwitt et al., 2015). This data set will be referred to hereafter as the SAGE-GOMOS2 data set.
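A weighted mean of two medians with error-based weights might look like the sketch below. The exact weighting used for SAGE-GOMOS1 is not given in the text above; inverse-variance weighting is assumed here purely for illustration, and the function name and numbers are hypothetical.

```python
import numpy as np

def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its uncertainty (assumed weighting)."""
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    weights = 1.0 / errors**2
    mean = np.sum(weights * values) / np.sum(weights)
    err = np.sqrt(1.0 / np.sum(weights))
    return mean, err

# Two monthly medians (ppmv) with error estimates of 0.1 and 0.2 ppmv
mean, err = weighted_mean([5.0, 5.4], [0.1, 0.2])
```

With these inputs the more precise median receives four times the weight of the other, so the combined value lies much closer to 5.0 than to 5.4.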

GOZCARDS
The SAGE-II (v6.2) record is also the central plank in the GOZCARDS (Global OZone Chemistry And Related trace gas Data records for the Stratosphere) data set, which is combined with data from five other instruments (SAGE-I v5.9rev, HALOE v19, UARS MLS v5, Aura MLS v2.2, and ACE-FTS v2.2 update) (Froidevaux et al., 2015). GOZCARDS extends the record back to include the SAGE-I data (although the data from this period are not considered in this study). The record is extended beyond the end of SAGE-II using a combination of Aura MLS and ACE-FTS measurements (see Tables 1-7 for details). The SAGE-II version 6.2 ozone number density versus height profiles were converted to mixing ratios on pressure levels using interpolated NCEP (National Centers for Environmental Prediction) temperature profiles obtained and archived by the SAGE-II instrument team. The converted SAGE-II monthly means are used as the reference data set, and other data sets are essentially bias-corrected against this reference.

SWOOSH
Similar to the GOZCARDS data set, SWOOSH (Stratospheric Water and OzOne Satellite Homogenized) combines SAGE-II (v7.0) ozone measurements with data from several other satellite instruments (Davis et al., 2015). The choice of instruments and the data versions used are, however, slightly different from those of GOZCARDS (see Tables 11-13). Data sets originally in number density and altitude coordinates were converted to mixing ratios on pressure levels using the MERRA (Modern Era Retrospective analysis for Research and Applications) reanalysis (Rienecker et al., 2011). Aura MLS v3.3 is chosen as the reference data set, and the other instruments are adjusted by offsets that vary with latitude and height (but not time). Offsets are calculated from coincident observations during overlap periods between instruments. The final combined product is a mean of all available measurements in each latitude/height/time bin, but with greater weight given to instruments that sample more frequently (e.g. Aura MLS). Filled and unfilled versions of the data set exist on both geographical and equivalent latitude coordinates. Here we use the unfilled version on geographical coordinates averaged into 10° latitude bins.
Each of the five SAGE-II-based records uses a different data set to fill in the recent and pre-1984 data (if extending back that far) and/or a different merging approach; differences between them will therefore reflect the use of these different instrument records, some differences in data versions, and differences in merging techniques. The relative importance of an individual instrument is reduced as more measurement sets are added. We examine the impact of these differences in Sects. 3 and 4.

Vertical coordinates and a common grid
The natural vertical coordinate of the limb and solar/stellar occultation techniques is geometric altitude with ozone concentration calculated as number density, while the nadir-viewing backscatter technique used by the SBUV instruments and thermal emission measurements by the MLS instruments provide ozone amounts in mixing ratio on pressure levels. The three data sets produced on an altitude-number-density grid (SAGE-OSIRIS and the two SAGE-GOMOS data sets) were converted to pressure and mixing ratio coordinates using the MERRA reanalysis (Rienecker et al., 2011). Sensitivity tests using the JRA-55 reanalysis (Ebita et al., 2011) for conversion showed negligible differences between monthly mean profiles compared to conversion with MERRA (not shown). MERRA stratospheric temperatures have, furthermore, been shown to compare well with other recent reanalyses (e.g. Simmons et al., 2014; Bosilovich et al., 2011; Rienecker et al., 2011). Once converted to the same vertical coordinates and ozone concentration units, the seven data sets were interpolated in log pressure space to a common grid to facilitate comparison. No spatial interpolation in the horizontal was applied; instead, data were averaged from their native resolution (5 or 10°) into latitude bands of interest (see Sect. 3). Furthermore, no additional screening was applied to any of the seven data sets.
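Interpolation in log pressure space can be sketched as below. The function name, the choice of linear interpolation, and the handling of target levels outside the native range (set to NaN rather than extrapolated) are assumptions for illustration; the pressure levels and ozone values are made up.

```python
import numpy as np

def interp_log_pressure(p_native_hpa, profile, p_target_hpa):
    """Linearly interpolate a profile in ln(p) onto target pressure levels."""
    logp_native = np.log(np.asarray(p_native_hpa, dtype=float))
    logp_target = np.log(np.asarray(p_target_hpa, dtype=float))
    values = np.asarray(profile, dtype=float)
    order = np.argsort(logp_native)  # np.interp requires increasing x
    return np.interp(logp_target, logp_native[order], values[order],
                     left=np.nan, right=np.nan)

# Native levels at 100, 10 and 1 hPa with illustrative ozone values (ppmv)
p_native = [100.0, 10.0, 1.0]
o3 = [1.0, 8.0, 4.0]
# Targets: log-midpoint of 100 and 10 hPa, a native level, and one out of range
out = interp_log_pressure(p_native, o3, [31.6227766, 10.0, 0.5])
```

The first target lies halfway between 100 and 10 hPa in ln(p), so it receives the average of the two neighbouring values, whereas interpolation that is linear in pressure would weight the 10 hPa value far less.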

Multi-data-set mean
A multi-data-set mean (MDM) is constructed to provide a common point of reference to compare the data sets. The MDM is by no measure the best representation of ozone but provides a simple average of all available data, rather than favouring one particular merged data set. It is calculated by averaging all available data for each time step and latitude/pressure bin, with no weighting applied. A weakness of the MDM is that five of the seven data sets averaged are based on the SAGE-II record; therefore, despite some of the data sets either using another instrument as reference or including other observations during the SAGE-II period, the MDM is largely dominated by the SAGE-II signal from 1984 to 2005.
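An unweighted mean over whichever data sets have data in a given bin can be written compactly, as in this sketch. The array layout (data set along the first axis, NaN for missing values) and the function name are assumptions.

```python
import numpy as np

def multi_dataset_mean(stack):
    """Unweighted mean over axis 0 (data sets), ignoring missing values."""
    stack = np.asarray(stack, dtype=float)
    n_valid = np.sum(np.isfinite(stack), axis=0)
    total = np.nansum(stack, axis=0)
    # Bins with no data in any data set stay NaN
    return np.where(n_valid > 0, total / np.maximum(n_valid, 1), np.nan)

# Two toy data sets, three bins; the last bin has no data at all
mdm = multi_dataset_mean([[1.0, np.nan, np.nan],
                          [3.0, 4.0, np.nan]])
```

Because the number of contributing data sets can change from bin to bin, the composition of such a mean is not constant in time, which is worth keeping in mind when it is used as a reference.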

Multiple linear regression model
In this study we use the multiple linear regression model described by Hassler et al. (2013), which in turn is an update of the model used by Bodeker et al. (1998). To quantify ozone variability and trends we include basis functions representing: the QBO (quasi-biennial oscillation), specified as monthly mean 50 hPa Singapore zonal wind and a synthetic basis function orthogonal to this, to allow for a time lag at different latitudes and altitudes (Austin et al., 2008); ENSO (El Niño-Southern Oscillation), using the monthly mean Southern Oscillation index as proxy; the solar cycle, as represented by monthly mean F10.7 solar flux data from NOAA's National Geophysical Data Center; and a proxy for ozone perturbations forced by aerosols from the Mt Pinatubo volcanic eruption based on a synthetic time series representing the approximate temporal evolution of stratospheric aerosol concentrations following the eruption (see Bodeker et al., 1998, for further details). Fioletov (2008) provides further details and a general overview of how each of the processes described by these proxies affects ozone variability. Equation (1) presents the simplest form of how the model is applied:

$$\mathrm{O_3}(t) = A_{(N_A)} + B_{(N_B)}\,t + C_{(N_C)}\,\mathrm{QBO}(t) + \dots + R(t) \qquad (1)$$

where O3(t) is the ozone for a particular month t for a particular data set; A-E are the model coefficients corresponding to an offset term (to account for the annual average ozone amount), linear trend, and other basis functions used; and R(t) represents the residuals (difference between the measured and statistically modelled ozone values). Furthermore, each coefficient has a constant term which represents the mean value (seasonally unvarying) of each predictor. The subscript on each term A-E indicates how many Fourier pairs the term was expanded into to account for the seasonal dependence of the ozone anomalies on the basis functions (Bodeker et al., 1998); for example, NB = 2 indicates two Fourier pairs (two sine, two cosine). An autoregressive model is applied to the residuals R(t) following Eq. (2):

$$R(t) = \rho_1 R(t-1) + \rho_2 R(t-2) + e_t \qquad (2)$$

where ρ1 and ρ2 are the model coefficients and e_t represents the independent random errors with zero mean and variances that are allowed to change from month to month (see Reinsel et al., 1994). Piecewise linear regression is chosen for the analysis because a central point of interest is whether there is any evidence for a change in the ozone trend after the peak in EESC (equivalent effective stratospheric chlorine) (Jones et al., 2009; Steinbrecht et al., 2006; Newchurch et al., 2003). The break point was chosen at the end of 1997, as has been used in a number of other recent studies (e.g. Bourassa et al., 2014; Chehade et al., 2014; Kyrölä et al., 2013; Laine et al., 2013; Jones et al., 2009). The regression was applied to each level of data on the common pressure grid and trends were only estimated if more than 50 % of the data for a particular level were available (i.e. more than half of all months had data). We also calculate the uncertainty associated with each trend estimate based on the variance in the residual time series and present the 2σ uncertainties on the trends throughout this paper.
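The piecewise linear trend term can be illustrated with a stripped-down least-squares fit. This sketch is not the regression model used in the paper: it omits the QBO, ENSO, solar, and aerosol basis functions, the Fourier expansion of the coefficients, and the autoregressive treatment of the residuals, and it assumes that the two trend segments join continuously at the break point. The function name and the synthetic anomaly series are hypothetical.

```python
import numpy as np

def piecewise_trend_fit(t_years, y, t_break=1998.0):
    """Fit offset + trend with a change in slope at t_break (end of 1997).

    Returns the slopes before and after the break, in units of y per year.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    good = np.isfinite(y)
    design = np.column_stack([
        np.ones(t.size),                  # offset
        t - t_break,                      # trend over the whole record
        np.clip(t - t_break, 0.0, None),  # change in trend after the break
    ])
    coef, *_ = np.linalg.lstsq(design[good], y[good], rcond=None)
    return coef[1], coef[1] + coef[2]

# Monthly mid-points from October 1984 to December 2011 (327 months)
t = 1984.75 + (np.arange(327) + 0.5) / 12.0
# Synthetic anomalies (%): -0.6 %/yr before 1998, +0.2 %/yr afterwards
rng = np.random.default_rng(0)
y = np.where(t < 1998.0, -0.6, 0.2) * (t - 1998.0) + 0.2 * rng.standard_normal(t.size)

before, after = piecewise_trend_fit(t, y)
trend_before_per_decade = 10.0 * before
trend_after_per_decade = 10.0 * after
```

In a fuller treatment the trend uncertainty would also need to account for autocorrelation in the residuals, as the autoregressive model described above does.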

Data set intercomparison
The core analysis presented in this section is a comparison of the seven merged data sets for the period common to all data sets: October 1984-December 2011. We highlight similar features and major differences between the data sets before examining the derived trends in Sect. 4. Results are shown for three latitude zones: northern mid-latitudes (35-60° N), tropics (20° N-20° S), and southern mid-latitudes (35-60° S), similar to those used in WMO (2015, 2011). To ensure representativeness, data were area-weighted by the cosine of latitude and averaged for each region only if more than two-thirds of the data for each latitude band were available.
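Cosine-of-latitude weighting with an availability threshold can be sketched as follows for one time step and pressure level. Interpreting the two-thirds criterion as a fraction of the latitude bins within the region is an assumption, as are the function name and the 5° bin centres used in the example.

```python
import numpy as np

def band_mean(values, lat_centres, min_fraction=2.0 / 3.0):
    """Area-weighted mean over latitude bins; NaN if too few bins have data."""
    values = np.asarray(values, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lat_centres, dtype=float)))
    good = np.isfinite(values)
    # Require MORE than min_fraction of the bins to contain data
    if good.sum() <= min_fraction * values.size:
        return np.nan
    return np.sum(weights[good] * values[good]) / np.sum(weights[good])

lats = [37.5, 42.5, 47.5, 52.5, 57.5]  # assumed 5 degree bins in 35-60 N
full = band_mean([5.0, 5.0, 5.0, 5.0, 5.0], lats)
one_missing = band_mean([1.0, np.nan, 1.0, 1.0, 1.0], lats)   # 4 of 5 bins
two_missing = band_mean([5.0, np.nan, np.nan, 5.0, 5.0], lats)  # 3 of 5 bins
```

With five bins, four valid bins pass the threshold while three do not, so the last call returns NaN instead of an average based on sparse coverage.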

Annual cycle
Figure 3 presents the annual cycle averaged over 1984-2011 for three selected levels in the three latitude regions (i.e.climatological average, not the annual cycle derived from the multiple regression model).These levels were chosen because they represent a spread of the variability between regions and are particular levels of interest.Ozone at 2 hPa (upper stratosphere) in the mid-latitudes has been most strongly affected by chemical depletion by ODSs.In the equatorial regions at 50 hPa (lower stratosphere), measurements are most uncertain because of the strong vertical ozone gradient (Tegtmeier et al., 2013).The 10 hPa level was chosen to give an indication of ozone changes in the middle stratosphere.The vertical error bars indicate ±2 standard error of the mean for each individual data set, while the grey-shaded region indicates the 1 standard deviation range (mean value of all data sets ± the mean of the standard deviations, which are calculated for each data set individually before averaging).The standard error of the mean provides a useful approximation of the uncertainty of the mean, although it may not repre-sent the true uncertainty since individual samples of the population may exhibit autocorrelations (Toohey and von Clarmann, 2013).The standard deviation represents the ozone variability for each month, averaged across all systems.Averages, standard error, and standard deviations were only calculated for months that had data for more than 14 of the 28 years available for analysis.
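The monthly statistics described above can be sketched for one data set at one level and latitude band as follows. The (year, month) array layout, the function name, and the use of the sample standard deviation (ddof=1) are assumptions; autocorrelation between years is ignored, as noted in the text.

```python
import numpy as np

def monthly_climatology(data, min_years=14):
    """Mean, standard deviation and standard error for each calendar month.

    data has shape (n_years, 12) with NaN for missing monthly means; a month
    is only evaluated if more than min_years years contain data.
    """
    data = np.asarray(data, dtype=float)
    mean = np.full(12, np.nan)
    std = np.full(12, np.nan)
    sem = np.full(12, np.nan)
    for m in range(12):
        col = data[np.isfinite(data[:, m]), m]
        if col.size > min_years:
            mean[m] = col.mean()
            std[m] = col.std(ddof=1)
            sem[m] = std[m] / np.sqrt(col.size)
    return mean, std, sem

# Synthetic 28-year record; January is missing in exactly 14 of the years
rng = np.random.default_rng(1)
data = 5.0 + 0.3 * rng.standard_normal((28, 12))
data[:14, 0] = np.nan
mean, std, sem = monthly_climatology(data)
```

January has data in only 14 years, which does not exceed the threshold, so its statistics are left undefined, while the remaining months use all 28 years.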
Figure 3a shows the annual cycle of ozone in the northern mid-latitude upper stratosphere, with peak values during the winter months (from November through February) and minimum values in the summer (May through August). The annual cycle in this region is largely determined by catalytic ozone destruction, which peaks during the summer months, resulting in maximum values occurring in winter (Brasseur and Solomon, 2005; Perliski et al., 1989). All seven data sets show similar annual cycles, both in terms of phase and amplitude, and mostly lie within the ±1 standard deviation range shown by the grey shading (the standard deviation across all data sets gives an indication of the spread between data sets). However, in terms of absolute values, there is quite a spread between the data sets. The SBUV Merged Cohesive data set shows consistently lower values for all months of the year, while SAGE-OSIRIS shows the opposite tendency, with mostly consistently higher values, although for only 5 of the 12 months the ±2 standard error bars do not overlap with the standard deviation range of all data sets. SWOOSH, GOZCARDS, and SAGE-GOMOS2 show remarkably similar annual cycles, with mean values not significantly different from each other in all months.
In the tropical lower stratosphere, where the ozone seasonal cycle is essentially determined by vertical transport associated with tropical upwelling, the peak ozone values from July through October (Fig. 3b) correspond to the months when upward transport of ozone-poor air from the troposphere is at a minimum (Randel et al., 2007). As for the upper stratosphere, differences are large, but the variability is large in this region as well, as seen in the large ±1σ range (grey shading), which ranges up to 0.2 ppmv (∼15 %). SWOOSH and GOZCARDS are again most similar to each other for most of the year, and show consistently higher values than the other data sets. SAGE-GOMOS1 and SAGE-OSIRIS show lowest values, with the latter having mean values that fall outside of the ±1σ range in nearly all months. The three data sets that extend the SAGE-II record with just one data set (GOMOS or OSIRIS) have considerably more missing data in this region of the stratosphere, with less than half of the data available for certain months of the year (where no data are shown; see caption Fig. 3). This is at least in part because of data being filtered out after the eruption of Mt Pinatubo. As mentioned above, the SBUV data sets are not shown at this pressure level.
In the southern mid-latitude middle stratosphere nearly all data sets agree remarkably well with each other throughout the year (Fig. 3c). In this region of the atmosphere, the annual cycle is opposite in phase to that in the upper stratosphere (Fioletov, 2008; Perliski et al., 1989), with peak ozone values in the summer (October through February) resulting from photochemical ozone production during this season (Perliski et al., 1989). The only data set that shows significantly different ozone values for much of the year is SAGE-OSIRIS, which has values up to 1 ppmv (∼15 %) higher in the austral winter season and slightly lower values in February and March. This feature is also evident in the northern mid-latitudes during winter (not shown), although to a lesser extent, and is likely due to the reduced sampling of the OSIRIS instrument in the winter hemisphere (see Fig. 2), which seems to quite strongly affect the mid-latitude zonal mean values in the merged data set.
Overall, in the three regions considered, the seven merged data sets show similar annual cycles, particularly in terms of phase and amplitude. Furthermore, with the exception of the SAGE-OSIRIS data set, the biases between data sets are largely consistent throughout the year. Agreement is best in the mid-latitude middle stratosphere, while there are larger differences between data sets in the tropical lower stratosphere and in the upper stratosphere globally, results which are similar to those reported by Tegtmeier et al. (2013) on an instrument-by-instrument basis.

Interannual variability
Figure 4 shows the time series of monthly mean ozone values for the same three regions and pressure levels as shown in Fig. 3. As for the mean annual cycles, the consistency between data sets varies somewhat from location to location, but, in general, agreement is best in the mid-latitude middle stratosphere, while differences between data sets are somewhat larger in the lower tropical stratosphere and in the upper stratosphere. In the northern mid-latitude upper stratosphere (Fig. 4a), differences between data sets range up to approximately 1 ppmv (∼15 %) and are larger prior to about 1995. Differences are of similar magnitude (up to 0.25 ppmv or ∼15 %) in the equatorial lower stratosphere and tend to be more consistent over time (Fig. 4b). What is also particularly noticeable in this region is the large number of missing data in the SAGE-GOMOS and SAGE-OSIRIS data sets, which is in part due to the 1991 Pinatubo eruption, but also because the solar occultation technique used by SAGE-II does not allow measurements in all months of the year in the tropics (Damadeo et al., 2013). SWOOSH and GOZCARDS ameliorate the relatively sparse SAGE-II sampling in the tropics, particularly in the post-Pinatubo period, through the inclusion of HALOE and UARS-MLS observations after 1991. These two data sets show a considerably reduced peak in maximum ozone values in 1992 compared to other years (Fig. 4b), a direct result of the Pinatubo eruption. The sparser sampling of the GOMOS instrument is also evident in the SAGE-GOMOS data sets after the end of the SAGE-II record in 2005, as is the summer-only sampling of the OSIRIS instrument clearly visible in the mid-latitudes (see especially Fig. 4c).
Figure 5 shows the monthly mean percentage differences from the MDM (multi-data-set mean; see Sect. 2.3) for the same three regions and pressure levels. Relative differences are estimated by dividing the difference between a particular data set and the MDM by the MDM (i.e. (data set X − MDM)/MDM). A 13-month running mean is applied, with values only being shown if data from more than 7 of the 13 months are available. In the upper stratosphere (Fig. 5a), agreement between data sets is, for most of the period, within ±8 % of the MDM for most data sets and well within ±5 % for SWOOSH, GOZCARDS, SAGE-GOMOS2, and v8.6 SBUV MOD. The SBUV Merged Cohesive data set shows a trend towards smaller and smaller differences from the MDM, a feature which is also to some extent visible in the monthly mean values in Fig. 4a but that is made more evident in terms of percent difference from the MDM. SAGE-GOMOS1 shows a slight reduction in differences from about 2001 onwards, which coincides with the introduction of the GOMOS data. Interestingly, the small improvement in agreement with the MDM also continues past the end of the SAGE-II record, so it is not simply the better spatial sampling when combining two instruments (see Fig. 2) that causes this. SAGE-OSIRIS shows a somewhat opposite tendency, with initially better agreement with the MDM during the overlap period between SAGE-II and OSIRIS (2001-2005) but then an increased difference from the MDM after 2005, when the SAGE-II record comes to an end. This may, as mentioned above, be related to the sparser sampling of the OSIRIS instrument during the winter months. Again, as discussed previously, despite SAGE-GOMOS1 using GOMOS as reference and SAGE-OSIRIS using SAGE-II as reference, the two data sets are remarkably similar in the upper stratosphere for the entire SAGE-II record from 1984 to 2005 (Fig. 5a). The SAGE-GOMOS2 data set, despite being referenced to SAGE-II, similar to SAGE-OSIRIS, shows lower values for the whole 1984-2011 period, perhaps because of the different treatment of the diurnal cycle of ozone in the upper stratosphere.
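The relative differences from the MDM and the 13-month running mean described above can be sketched as follows. This is an illustration under stated assumptions: the function names and the two constant test series are hypothetical, the MDM is taken as the unweighted mean of whichever data sets are available in a given month, and months closer than six months to either end of the record are left undefined (the treatment of the end points is not specified in the text).

```python
import numpy as np

def percent_diff_from_mdm(datasets):
    """datasets: array (n_datasets, n_months), NaN = missing.

    Returns 100 * (X - MDM) / MDM for every data set X, where the MDM is
    the unweighted mean of the data sets available in each month.
    """
    datasets = np.asarray(datasets, dtype=float)
    valid = np.isfinite(datasets)
    count = valid.sum(axis=0)
    total = np.where(valid, datasets, 0.0).sum(axis=0)
    mdm = np.where(count > 0, total / np.maximum(count, 1), np.nan)
    return 100.0 * (datasets - mdm) / mdm

def running_mean(x, window=13, min_valid=7):
    """Centred running mean, kept only if more than `min_valid` of the
    `window` months contain data; end points are left as NaN (assumption)."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.full(x.shape, np.nan)
    for i in range(half, x.size - half):
        chunk = x[i - half:i + half + 1]
        ok = np.isfinite(chunk)
        if ok.sum() > min_valid:
            out[i] = chunk[ok].mean()
    return out

# Two hypothetical data sets with constant values of 5.0 and 5.5 ppmv.
series = np.array([np.full(24, 5.0), np.full(24, 5.5)])
diff = percent_diff_from_mdm(series)
smoothed = running_mean(diff[0])
```

With an MDM of 5.25 ppmv, the two series differ from it by about −4.8 % and +4.8 %, respectively.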
In the equatorial lower stratosphere (Fig. 5b), where the SBUV data sets are not used to construct the MDM, the data sets differ relatively consistently over the entire 1984-2011 period. SWOOSH and GOZCARDS show mostly positive differences from the MDM, which increase slightly over time. The main difference between these two data sets is in the post-Pinatubo period (1992-1995), when the MDM is based only on the mean of these two data sets because no other data are available, and GOZCARDS shows higher values than SWOOSH. SAGE-GOMOS2 and SAGE-OSIRIS show mainly negative differences from the MDM, indicating they have consistently lower values compared to the other data sets. The SAGE-GOMOS1 data set is closest to the mean, remaining within about ±2 % of the MDM until the introduction of the GOMOS data in 2002, when the difference from the MDM becomes larger and positive (up to +5 % higher than the MDM). As already mentioned, agreement between data sets is best in the mid-latitude middle stratosphere. This can clearly be seen in Fig. 5c, where the seven data sets agree to within ±5 % and even to within ±3 % from about 2000 onwards (excluding SAGE-OSIRIS).
For the most part, the seven data sets agree to within ±10 % (or better) of each other throughout much of the stratosphere. In certain regions, some data sets show changes in time compared to the MDM. These tendencies are also evident in the anomalies shown in Sect. 3.3 and will be further discussed there, in particular in terms of the implications these might have on derived trends.

Interannual anomalies
The time series of ozone anomalies (all proxies used in the multiple linear regression, except the linear trend, are removed over the entire period 1984-2011) are presented to identify how differences between data sets may lead to differences in the calculated trends. For comparison, a mean of all data sets, the MDM, is again created from averaging the anomalies from all available data sets for each month/region/pressure level. Plots of the southern mid-latitude upper stratosphere are also included because this region is of particular interest in terms of long-term trends since it has been the most strongly affected by ODS-related ozone depletion (WMO, 2011).
In the northern mid-latitude upper stratosphere, all data sets, with the exception of SBUV Merged Cohesive, show a clear downward trend in anomalies until about 2001 in both the monthly and annual mean values (Figs. 6a and 7a, respectively). Thereafter, the anomalies remain mostly negative but show a tendency towards less negative and even some positive values. This is perhaps more evident in the annual mean values and in particular for SAGE-OSIRIS and GOZCARDS, which both show a return to positive anomalies for most months of the year by the end of the record. As discussed in Sect. 3.2, in this region the SBUV Merged Cohesive data set shows very different behaviour to any of the other data sets, with anomalies starting off negative and then becoming more positive over time, i.e. no downward and upward trend as seen in the other data sets. Similar but more pronounced features are also shown in the southern mid-latitude upper stratosphere (Figs. 6d and 7d). In contrast, the six other data sets show similar tendencies to the northern mid-latitudes, with decreasing anomalies through until about 2000, and then a tendency towards more positive values thereafter.
In the equatorial lower stratosphere the anomalies of all five SAGE-II-based data sets become more negative with time. During the first 5 years of the record nearly all data sets show almost exclusively positive anomalies, while during the last 6 years of the record most data sets show negative anomalies for most months of the year (Fig. 6b). This trend is even clearer in the annual means (Fig. 7b). The two SAGE-GOMOS data sets follow very similar trajectories, except for the last 3 years of the record, where differences between the two data sets are somewhat larger, with the SAGE-GOMOS2 data set showing more negative anomalies than any of the other data sets. SWOOSH and GOZCARDS are also mostly similar to each other but show a smaller decrease than the two SAGE-GOMOS data sets. SAGE-OSIRIS is similar to SWOOSH and GOZCARDS, and is perhaps closest to the MDM in this region.
In the southern mid-latitude middle stratosphere, changes in anomalies tend to be small (Figs. 6c and 7c), with the SAGE-GOMOS and SAGE-OSIRIS data sets showing anomalies fluctuating around zero over the entire record. SWOOSH and GOZCARDS show some tendency towards more negative anomalies from 2004 to 2009, but thereafter anomalies again become close to zero. The largest change in terms of annual mean anomalies is seen in the SBUV Merged Cohesive data set, which shows the most positive anomalies at the beginning of the record and then the most negative anomalies at the end of the record (Fig. 7c). This is also true in the northern mid-latitude middle stratosphere (not shown). The v8.6 SBUV MOD data set, in contrast, shows a more similar tendency to the other data sets, with anomalies becoming more negative and then more positive, although at the end of the record anomalies are more positive than any other data set.
In terms of both monthly and annual mean anomalies, differences between data sets are largest prior to about 1995 in all regions and at all levels. Even the three data sets based solely on SAGE-II in the earlier period (SAGE-OSIRIS and the two SAGE-GOMOS data sets) show relatively large differences from each other (particularly SAGE-OSIRIS from the two SAGE-GOMOS data sets), indicating that the different merging processes have an impact on the estimated trends. After 1995 the data sets are more similar to each other until about 2005, after which differences become larger again. This is likely because from 2005 onwards, when SAGE-II was turned off, the data sets are based on different instrument records (see Fig. 1). Overall, however, data sets based on the same or very similar instrument records (e.g. SWOOSH and GOZCARDS) tend to be more similar to each other than to those based on different underlying instrument records. The exceptions to this are the two SBUV data sets, which tend to differ notably from one another despite relying on the same input data. This is because the SBUV record is made up of several individual instrument records rather than the single SAGE-II record anchoring the other data sets. This results in a much larger choice of possibilities in terms of which records are used and how they are intercalibrated. Therefore, despite the similar input to the SBUV-based data sets, the merging approaches result in substantial differences in the anomalies derived from these two products. The result this has on derived trends is further discussed in Sect. 4.

Trends
The multiple linear regression model used to estimate the trends discussed in this section is described in Sect. 2.4. As above, an MDM is calculated, this time as the mean trend and uncertainty for all data sets available (again unweighted), i.e. the multiple linear regression is not applied separately to the monthly MDM constructed and shown in Figs. 6 and 7.
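As a rough illustration of the piecewise linear trend term with a change in slope at the end of 1997, the sketch below fits two joined linear segments by ordinary least squares. It is a simplified stand-in rather than the regression model of Sect. 2.4: the other proxies and the uncertainty estimates are omitted, the function name `piecewise_linear_trends` is hypothetical, and the input series is synthetic.

```python
import numpy as np

def piecewise_linear_trends(t, y, t_break=1998.0):
    """Fit y = a + b1*(t - t_break) + b2*max(t - t_break, 0) by least squares.

    t: decimal years; y: ozone (e.g. in percent of the mean), NaN = missing.
    Returns (slope before the break, slope after the break) per year.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(y)
    tc = t[ok] - t_break
    design = np.column_stack([np.ones(tc.size), tc, np.maximum(tc, 0.0)])
    coef, *_ = np.linalg.lstsq(design, y[ok], rcond=None)
    return coef[1], coef[1] + coef[2]

# Synthetic series, October 1984 to December 2011 (327 months):
# -6 % per decade before 1998 and +2 % per decade afterwards.
t = 1984.75 + np.arange(327) / 12.0
tc = t - 1998.0
y = 100.0 - 0.6 * tc + 0.8 * np.maximum(tc, 0.0)
slope_early, slope_late = piecewise_linear_trends(t, y)
trend_early = 10.0 * slope_early  # percent per decade
trend_late = 10.0 * slope_late
```

In this sketch an unweighted mean of the per-data-set trends would then play the role of the MDM trend, rather than a fit to the averaged time series.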

1984-1997 trends
For the 1984-1997 period, the data sets show relatively similar trend profiles in all latitude bands, most with maximum negative trends in the upper stratosphere (top row Fig. 8 and Tables 8-10). In the southern mid-latitudes (Fig. 8a), SWOOSH, SAGE-OSIRIS, and both SAGE-GOMOS data sets show largest negative trends at 2 hPa (between −8 and −6 % decade⁻¹), slightly more negative than GOZCARDS and v8.6 SBUV MOD, which indicate maximum negative trends of −5 and −4 % decade⁻¹ at 2 hPa, respectively (see also Table 8). Trends in the northern mid-latitudes show a similar shape (Fig. 8c), with maximum negative values at 2 hPa, and are also of similar magnitude. All five SAGE-based data sets agree to within 2 %, with maximum negative trends ranging between −6 and −7 % decade⁻¹, while v8.6 SBUV MOD again shows less negative trends, peaking at −4 % decade⁻¹. In the equatorial region (Fig. 8b), trends are somewhat smaller, up to −3 to −5 % decade⁻¹ at 2 hPa, with GOZCARDS and the SBUV Merged Cohesive data set even showing zero trend at this level. The large difference between GOZCARDS and SWOOSH, which are based on similar underlying instrument records, likely results from the different temperature record used to convert the different SAGE-II versions. In the tropical upper stratosphere, the NCEP temperatures used to convert the SAGE-II v6.2 profiles in GOZCARDS have different temporal variations than MERRA, which leads to some significant differences between the two data sets (Froidevaux et al., 2015); in most other regions, the GOZCARDS and SWOOSH results shown here tend to be quite similar. The MERRA reanalysis used to convert the SAGE-II v7.0 data set used in SWOOSH produces results more similar to the SAGE-OSIRIS and SAGE-GOMOS data sets, also converted using MERRA. In all three latitude regions, SBUV Merged Cohesive shows a different trend profile structure from the other data sets, with maximum negative trends of nearly similar magnitude, but considerably lower down in the stratosphere (between 5 and 10 hPa rather than at 2 hPa). As seen in the annual mean anomalies in Fig. 7, this data set shows somewhat higher/lower mean values than the other data sets, particularly at the beginning of the record, which may influence the overall trend calculated for the 1984-1997 period. The difficulty lies in the lesser quality of the NOAA-9 and NOAA-14 data, which appear at the end of the 1984-1997 period. Due to this reduced quality, the SBUV Merged Cohesive data set does not tie the beginning of the time series to these data but instead uses the same adjustments for the ascending node of NOAA-11 as determined from the overlap of the descending node of NOAA-11 with NOAA-16 (Wild and Long, 2015).
The assumption that NOAA-11 ascending and descending have the same properties may not be warranted, and this may influence the trends from this data set for the 1984-1997 period. The different trends highlight the influence of using different merging techniques, since the underlying data (SBUV v8.6) and the regression model used are the same as for the v8.6 SBUV MOD data set. Trends in the middle stratosphere in all latitude regions tend to be small, mostly less than −2 % decade⁻¹, and in some cases even being slightly positive (see Table 9). The mean trend of all data sets (MDM) is, however, significant and negative in both mid-latitude regions but remains small. In the lower stratosphere the SAGE-based data sets show small negative trends in the northern mid-latitudes, ranging from −2 to −3 % decade⁻¹, and quite large negative trends in the tropics, almost of similar magnitude to the mid-latitude upper stratosphere (−5 to −7 % decade⁻¹), although uncertainties are significantly larger. Nonetheless, at 50 hPa in the northern mid-latitudes and tropics, all five data sets indicate significant negative trends (see Table 10). In contrast, in the southern mid-latitudes, trends are small and insignificant in all data sets.
The trends calculated for this period agree very well with a wide range of previous studies (e.g. WMO, 1999, and references therein; Harris et al., 1998). In particular, more recent studies using some of the same data sets as here show similar results. For example, using the same SAGE-GOMOS1 data set, Kyrölä et al. (2013) derive trends of the same magnitude in the upper stratosphere (ranging from −8 to −6 % decade⁻¹ in the mid-latitudes and up to −4 % decade⁻¹ in the tropics), and, using a more complex analysis technique, Laine et al. (2013) present very similar results as well. Using the SAGE-OSIRIS data set also used in this study, Bourassa et al. (2014) estimate significant trends of the same magnitude in the upper stratosphere (again between approximately −8 and −6 % decade⁻¹ in the mid-latitudes and up to −4 % decade⁻¹ in the tropics) but show no significant trends below 35 km. In comparison, in the tropics we estimate significant negative trends between 30 and 50 hPa for the SAGE-OSIRIS data set as well as SWOOSH, GOZCARDS, and the two SAGE-GOMOS data sets. These significant negative trends are in agreement with data from SAGE-II, GOZCARDS, and model results presented in WMO (2015), although they cover a slightly different period (1985-1995). Damadeo et al. (2014) show no significant or slightly positive trends in the same SAGE-II data (which underlies all data sets showing negative trends during this period). Obviously there is still some uncertainty regarding the trends in the tropics during this period, and further work is needed to fully resolve this issue.

1998-2011 trends
For the 1998-2011 period, calculated trends are less consistent among the data sets but overall show a general shift towards increasing ozone in the middle and upper stratosphere (bottom row Fig. 8 and Tables 11-13). The smaller change in trends is somewhat to be expected given that the lifetimes of most ODS species are long (several decades), and thus the removal of these species will occur over a considerably longer timescale than the relatively brief period during which their concentrations increased (WMO, 2015). Trends in the northern mid-latitude upper stratosphere are mostly positive (Fig. 8f), although relatively small in magnitude compared to the large negative trend seen in the 1984-1997 period. SAGE-OSIRIS shows the largest positive trends, peaking at +5 % decade⁻¹ at 2 hPa. This is unsurprising given the relatively large positive tendency in the annual mean values presented in Fig. 7a and agrees well with the findings of Bourassa et al. (2014), who show trends of similar magnitude in this location. Both SBUV data sets also show significant positive trends (up to 4 % decade⁻¹) but at different pressures than SAGE-OSIRIS, above 1 hPa (where the SAGE-OSIRIS data are not available for comparison) as well as lower down in the middle stratosphere between 10 and 7 hPa. Both SAGE-GOMOS data sets, GOZCARDS, and SWOOSH all show almost no significant trend in the upper stratosphere. At pressures < 1 hPa differences between calculated trends are large in all three latitude regions. For example, in the northern mid-latitudes, SAGE-GOMOS2 shows negative trends of up to −7 % decade⁻¹, whereas GOZCARDS and SAGE-GOMOS1 indicate no significant trend and the SBUV data sets indicate positive significant trends up to 4 % decade⁻¹. It is very likely that the diurnal cycle in ozone plays a role at these altitudes, affecting the estimated trends quite strongly depending on how, and if, this is treated in the merging process. Whether and how the diurnal cycle of ozone is treated in all data sets is shown in Table 6. The differences between data sets are more strongly evident in this later period (1998-2011), particularly since a wide range of different instruments which may be measuring at different times of the day is used to extend the SAGE-II record after 2005.
In the southern mid-latitudes (Fig. 8d), all data sets, excluding SWOOSH and SAGE-GOMOS1, indicate significant positive trends in the upper stratosphere, although they vary in where maximum trends are located in the vertical. SAGE-OSIRIS and SAGE-GOMOS2 show trends ranging from +3 to +5 % decade⁻¹ between 4 and 0.5 hPa, while GOZCARDS shows maximum positive trends slightly smaller in magnitude and somewhat higher up. SBUV Merged Cohesive shows trends of similar magnitude to SAGE-GOMOS2, but peaking very clearly at 3 hPa. The v8.6 SBUV MOD data set also shows significant positive trends but peaking between 5 and 10 hPa and only up to +3 % decade⁻¹. SWOOSH shows no significant trend over the entire profile, similar to the SAGE-GOMOS1 data set, which differs only in that it shows small but significant negative trends at pressures greater than 50 hPa. SAGE-GOMOS2 also indicates similar magnitude negative trends at these levels; however, no such significant trends are evident in either GOZCARDS or SWOOSH, and overall the MDM shows insignificant trends in the southern mid-latitudes at pressures > 5 hPa. As in the northern mid-latitudes, at pressures < 1 hPa the trends vary widely between data sets, again likely due to the diurnal cycle in ozone.
The mid-latitude trends shown here agree well with WMO (2015), which, although considering just post-2000 trends, shows a similar range of estimates using some of the same data sets (e.g. GOZCARDS and v8.6 SBUV MOD) as well as several other sources, including ground-based observations. As for the 1984-1997 period, the trend profiles for 1998-2011 also agree with the results presented by Bourassa et al. (2014) using the same SAGE-OSIRIS data set, as well as Kyrölä et al. (2013) and Laine et al. (2013) using the same SAGE-GOMOS data set. Looking at a somewhat shorter period (August 2002-April 2012), Gebhardt et al. (2014) show similar magnitude positive trends in SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY) of approximately 4 % decade⁻¹ in the southern mid-latitude upper stratosphere; however, they show almost no significant positive trend in the northern mid-latitude upper stratosphere. This tendency towards a stronger signal in the Southern Hemisphere is also seen in the MIPAS observations covering the same period (Eckert et al., 2014), although they show a difference in the altitude range, with a larger region of positive trends seen in the southern mid-latitude upper stratosphere.
In the equatorial region, trends are least consistent. In the upper stratosphere, between 5 and 2 hPa both SAGE-  2014), although, with potential positive drift in the OSIRIS data taken into account, the trends they present become less significant in this region (being insignificant between 10° N and 10° S). SWOOSH, GOZCARDS, and v8.6 SBUV MOD all show no significant trends in this region, except for at pressures < 1 hPa, where the v8.6 SBUV MOD data set shows significant positive trends. In the middle stratosphere, between approximately 10 and 5 hPa, GOZCARDS, SWOOSH, and the two SAGE-GOMOS data sets show small negative trends up to −3 % decade⁻¹, but only the trends from SWOOSH at 10 hPa are significant (see Tables 11-13). For the same period and levels, Kyrölä et al. (2013) also show significant negative trends even larger in magnitude, up to −5 % decade⁻¹, but only between 10° N and 10° S. The larger latitude band used here (20° N-20° S) likely accounts for the difference in magnitude and significance in trends. Gebhardt et al. (2014) also found large negative trends in this region of the middle stratosphere (as large as −10 % decade⁻¹) for the 2002-2012 period in SCIAMACHY observations. However, in a comparison for just the 2004-2012 period they show that the SCIAMACHY trends are considerably more negative than either Aura MLS or OSIRIS. Using MIPAS data for the 2002-2012 period, Eckert et al. (2014) show negative trends of similar magnitude to Kyrölä et al. (2013) of approximately −5 % decade⁻¹, although somewhat lower in the middle stratosphere. While there is a clear negative trend in many data sets, the range of trend estimates remains large. Finally, in the tropical lower stratosphere, trends are insignificant for nearly all the data sets shown here, largely as a result of the large uncertainties in measurements made in this part of the atmosphere.

Conclusions
This paper presents the first intercomparison of seven new merged satellite ozone profile data sets for the period 1984-2011, common to all data sets. We also present an overview of the methods, assumptions, and challenges involved in producing such merged data sets. The analysis focuses on the representation of the annual cycle, interannual variability, and, in particular, long-term trends. Overall, the data sets are most similar in the mid-latitude lower and middle stratosphere, remaining largely within ±5 % of the mean of all data sets (MDM). Larger differences are found in the tropical lower stratosphere, where the spread between data sets is ±10 % from the MDM, and in the upper stratosphere globally, where data sets for the most part are within ±8 % of the MDM. These results are in agreement with the inter-instrument comparison of Tegtmeier et al. (2013). For the data sets based on SAGE-II (SAGE-GOMOS1, SAGE-GOMOS2, SAGE-OSIRIS, SWOOSH, and GOZCARDS), the choice of instrument records to be merged was found to have a greater impact on differences than the choice of merging technique. For these data sets, those based on the same individual instrument records tend to be more similar to each other than those based on different instrument records. In general, it also appears that the inclusion of a greater number of instrument records in an individual merged data set and the potential this creates for bias cancellation tends to improve such long-term merged data sets. The SBUV v8.6 records, on the other hand, show some significant differences, despite being based on the same underlying data. This is in large part because the SBUV record is made up of several individual instrument records, allowing for more potential choices as to what records to use and, more importantly, how to intercalibrate those records.
Piecewise linear regression was used on the entire time period (1984-2011) to calculate trends for a "decrease" period (1984-1997) and a "recovery" period (1998-2011). Trends estimated for the first period are more similar between data sets, with most data sets showing significant negative trends in the mid-latitude upper stratosphere ranging between −4 and −8 % decade⁻¹. Significant negative trends are also evident in the tropical upper stratosphere in most data sets, although smaller in magnitude. In the middle and lower stratosphere, trends are small, and for the most part insignificant, but depend critically on the uncertainties. The good agreement between data sets is, however, somewhat unsurprising given that most data sets are predominantly (or only) based on the SAGE-II record for the 1984-1997 period. The trends presented for this period agree well with both previous and more recent studies (e.g. Harris et al., 1998; WMO, 1999; Kyrölä et al., 2013; Bourassa et al., 2014; Damadeo et al., 2014; Eckert et al., 2014; WMO, 2015).
For the second period (1998-2011), calculated trends vary more between data sets, ranging from −1 to +5 % decade⁻¹ in the upper stratosphere in all three latitude regions considered. There is, however, a clear shift from mostly negative to mostly positive trends, which is most evident in the southern mid-latitude and equatorial upper stratosphere. In the northern mid-latitude upper stratosphere, only two of the seven data sets show significant positive trends and the MDM trend is only just barely significantly positive. In the middle and lower stratosphere, trends are again mostly small and not significantly different from zero. The larger differences between data sets for the "recovery" period are perhaps to be expected since the five SAGE-based data sets use different instrument records to complete the SAGE-II record. Nevertheless, the trends calculated largely fall within the range of estimates presented in several recent analyses (Kyrölä et al., 2013; Bourassa et al., 2014; Damadeo et al., 2014; Eckert et al., 2014; Laine et al., 2013; Sioris et al., 2014; WMO, 2015).
Recent studies have also demonstrated that it remains difficult to identify stable, statistically significant positive trends in global ozone (Frith et al., 2014; Harris et al., 2015). The low-frequency variability in the ozone record (Frith et al., 2014) and the choice of start and end dates (Frith et al., 2014; Harris et al., 2015) are contributing factors, as well as uncertainty related to the proxies chosen for the regression model (de Laat et al., 2015). This study, using an "ensemble" of recently produced long-term, vertically resolved, merged satellite ozone data sets, shows a fairly wide range of estimates of long-term ozone trends; not all data sets unequivocally indicate significant positive trends in the upper stratosphere for the 1998-2011 period. However, the mean trend of all data sets (MDM) is significant and positive in the southern mid-latitudes between 5 and 0.5 hPa, in the tropics between 3 and 1 hPa, and in the northern mid-latitudes at just 2 hPa. The use of more complex regression techniques that take into account spatial and temporal bias, such as that used by Damadeo et al. (2014), or that allow a latitude- and altitude-varying change point (e.g. Laine et al., 2013), may help better constrain estimates of long-term ozone trends. As newer individual satellite data sets are better understood and merging techniques become more refined, differences between data sets and derived trends may be reduced. In this context, continued high-quality long-term ozone profile measurements are essential to unambiguously identify an ozone recovery in response to decreasing ODSs within a changing climate.

Figure 1. Time coverage of the individual instruments used in each of the seven data sets considered in this study for the 1984–2011 period.

Figure 2. Latitudinal availability of data over the 1984–2011 period. The colours represent the sum of measurements used per latitudinal grid box over the entire profile for each merged data set (note that two separate colour scales are used).

Figure 3. Annual cycle averaged over the 1984–2011 period for (a) the Northern Hemisphere mid-latitudes (35–60° N) at 2 hPa, (b) the tropics (20° N–20° S) at 50 hPa, and (c) the Southern Hemisphere mid-latitudes (35–60° S) at 10 hPa. Error bars indicate ±2 standard errors of the mean, while the grey range shows ±1 standard deviation (mean of all data sets). Values (monthly values, standard deviation, and standard error) are shown only if data are available for more than half (14) of the 28 years in the time series.

Figure 4. Monthly mean ozone (ppmv) in (a) the Northern Hemisphere mid-latitudes (35–60° N) at 2 hPa, (b) the tropics (20° N–20° S) at 50 hPa, and (c) the Southern Hemisphere mid-latitudes (35–60° S) at 10 hPa. Note that the data sets are separated in the two mid-latitude plots (a) and (c) for clarity, and thus the y-axis values are shifted.

Figure 7. Annual mean anomalies for (a) the Northern Hemisphere mid-latitudes (35–60° N) at 2 hPa, (b) the tropics (20° N–20° S) at 50 hPa, and (c) the Southern Hemisphere mid-latitudes (35–60° S) at 10 hPa. Annual averages are only shown if data for more than 7 of 12 months per year were available. Note the different y-axis ranges.

Table 1. Summary of the latitudinal coverage and resolution as well as the temporal coverage and resolution of each of the seven data sets.

Table 2. Summary of the vertical range, units, and number of levels, as well as the ozone units of each of the seven data sets.

Table 3. Summary of the uncertainties provided and individual satellite data sets merged for each of the seven data sets.

Table 4. Summary of the merging approaches used for each of the seven data sets.

Table 5. Summary of the data screening applied to each of the seven data sets.
html with profile error flag set to 0 or 100, maximum solar zenith angle of 84°, and maximum ResQC of 20. Monthly means are discarded if there are fewer than 20 measurements in a zone or if the mean latitude is more than 1° from the zone centre.

Table 6. Summary of the treatments of the diurnal cycle and references for each of the seven data sets.
but data sets are adjusted to the mean of SAGE II sunsets and sunrises and sampling is most often quite uniform in time for each instrument's data set (when used).

Table 7. URLs for each of the seven data sets.

Tummon et al.: Intercomparison of vertically resolved merged satellite ozone data sets 2013
). Both OSIRIS and GOMOS have similar vertical resolution to SAGE-II as well as a multi-year period of overlap.

Table 8. Trends in % decade⁻¹ for the 1984–1997 period at 2 hPa. Values in brackets show the 2σ uncertainty estimates, while those in bold are significantly different from zero at the 2σ level.

Table 9. As for Table 8 but for 10 hPa.

Table 10. As for Table 8 but for 50 hPa. Trends were not calculated for the SBUV data sets at pressures greater than 20 and 30 hPa in the tropics and mid-latitudes, respectively; therefore they are not shown (NA).

Table 11. Trends in % decade⁻¹ for the 1998–2011 period at 2 hPa. Values in brackets show the 2σ uncertainty estimates, while those in bold are significantly different from zero at the 2σ level.

Table 12. As for Table 11 but for 10 hPa.

Table 13. As for Table 11 but for 50 hPa. Trends were not calculated for the SBUV data sets at pressures greater than 20 and 30 hPa in the tropics and mid-latitudes, respectively; therefore they are not shown (NA).