Observations of stratospheric ozone from multiple instruments now span three
decades; combining these into composite datasets allows long-term ozone
trends to be estimated. Recently, several ozone composites have been
published, but trends disagree by latitude and altitude, even between
composites built upon the same instrument data. We confirm that the main
causes of differences in decadal trend estimates lie in (i) steps in the
composite time series when the instrument source data changes and (ii) artificial
sub-decadal trends in the underlying instrument data. These
artefacts introduce features that can alias with regressors in multiple
linear regression (MLR) analysis; both can lead to inaccurate trend estimates.
Here, we aim to remove these artefacts using Bayesian methods to
infer the underlying ozone time series from a set of composites. We build a
joint-likelihood function that uses a Gaussian-mixture
density to model outliers introduced by data artefacts, and combine it with a
data-driven prior on ozone variability that incorporates knowledge of problems
during instrument operation. We apply this Bayesian self-calibration approach
to stratospheric ozone in 10

The ozone layer in the stratosphere protects the Earth's biosphere from
harmful solar ultraviolet (UV) radiation. The use of ozone-depleting substances
(ODSs), including chlorofluorocarbons (CFCs), led to a decline in ozone
globally over the latter half of the 20th century

Ozone responds to forcings from below, e.g. injections of aerosols from
volcanoes

Observational records of atmospheric ozone began with ground-based
observations in 1921

Creating an accurate record of stratospheric ozone profiles is a non-trivial
task and much work has been done at every stage, from design, construction,
and operation during flight, to post-processing and combining datasets into composites

Despite these difficulties, it is possible to account for many of these
problems. There is common information within all the composites, e.g. the
annual variability is similar in most composites

Our goal here is to provide a technique whereby the most likely ozone
variability throughout the stratosphere can be identified by using the
information embedded within multiple datasets simultaneously. The natural
approach to such a problem is Bayesian inference

This paper has three main parts. In the first part (Sect.

The SI2N project promoted seven ozone composites of
satellite observations, summarized in

We consider zonal mean, monthly mean ozone over the 28-year period, January
1985–December 2012, covered by all datasets. While the correction method
we present later (Sect.

The ozone instrument data and composites are already extensively detailed and
discussed in several recent papers as listed above, e.g.

A guide to the regression indices used in the trend analysis (upper third) and instrument data used to construct SAGE-based (middle third: GOZCARDS, dark blue; SWOOSH, light blue) and SBUV-based (lower third: SBUV-MOD, red; SBUV-MER, yellow) composites. Shading at SBUV-MER instrument changes indicates periods used to determine differences in annual variability and to apply bias corrections between instruments. The full periods of instrument operation for datasets in these pairs are shown with multiple colours between the composites. Where SBUV data are not used for an interval, dashed lines replace solid. Between the SBUV composites, the local time of Equator crossing is shown. Where relevant, version numbers are given with instrument names; “O” and “L” indicate that the satellite instrument was occultation based or a limb viewer, respectively; SBUV instruments are all nadir viewing. Grey shading with black text highlights periods discussed in the article. Periods specifically flagged to increase the SBUV uncertainty estimates in the BASIC approach are labelled black with white text.

Determining why decadal trends from the various composites
differ requires an understanding of how they have been constructed
from satellite instrument data from multiple sources. We present a visual
reference guide for the four composites in Fig.

The two SBUV composites are built in two different ways:
SBUV-MER uses overlapping time series (shading in Fig.

The SBUV-based composites use only instruments with the same design and are
the longest single-instrument-type composites available. Both use the same
NOAA and Nimbus space-based platforms, though not always at the same time,
except that SBUV-MER uses NOAA-9 observations between 1994 and 1997 to
increase global coverage and bridge the gap in NOAA-11
(Fig.

In Fig.

Figure

While constructed by two separate teams, GOZCARDS

Because SAGE-II observes ozone number density, knowledge of local temperature
is needed to convert to volume mixing ratio. GOZCARDS uses SAGE-II v6.2, and
SWOOSH SAGE-II v7.0; the former uses NCEP reanalysis temperatures while the
latter uses the MERRA reanalysis (see

Finally, we show in Fig.

We want to combine the information from the various
composites and correctly account for uncertainties, artefacts, and drifts. To
this end, we adopt a Bayesian approach to infer constraints on the
(unknown) true time series,

Bayesian inference necessarily involves conditioning on our knowledge about
uncertainties and potential artefacts and drifts, and any prior assumptions
about the month-to-month variability, through our model which we denote as

In order to form the desired posterior distribution, we require a
probabilistic model for the data (Sect.

Our method requires uncertainties for each composite
that reflect the actual differences between the reported values and the true
state of ozone at the time of each measurement, as encoded in the likelihood
(Eq.

Instead, we seek to estimate the noise level from the data and in particular
from the discrepancies between the different composites. Estimating the
uncertainties is not the main focus of this paper, so a simple heuristic
method is used here, but this is clearly an aspect of this overall data
analysis problem which should be investigated further. Our approach is based
on a principal components analysis (PCA) of the composites to model the
differences between them, with the time-dependent noise level of each
composite then estimated from the variance of the higher-order components.
The starting point of this approach is to treat the full dataset

The PCA is implemented via singular value decomposition (SVD) in which the
mean-subtracted data matrix is factorized as

Our method of estimating the uncertainties in the composites is based on the
above reconstruction formula but is only heuristic in the sense that it does
not follow a rigorous calculation. We start by ignoring the leading,
i.e. the highest weighted, mode in

Visualization of the components of the
SVD algorithm within the BASIC approach used to estimate the uncertainty on
each ozone composite for two examples at

The steps of this method are illustrated in Fig.

From this, we form the uncertainty estimate for each of the composites in the
bottom panel,

In principle, the time series at each latitude–altitude location in the four
composites should be the same, and any deviations from the true value should
be a result of one or more of the potential reasons listed in
Sect.

The example at 10 hPa was ideal since modes were easy to associate with
artefacts within and between the composite pairs. Another example of the
usefulness of applying the SVD approach to estimate the uncertainty is shown
for 2.2 hPa and 0–10

Satisfyingly, the error estimates assign higher uncertainty to individual
composites during periods already known to have anomalous behaviour
(Sect

The expected monthly ozone changes (or
“transitions”) between month

As the SVD approach is not always able to assign a known artefact explicitly
to a specific composite, it is necessary for us to provide additional
information regarding the composite uncertainties, whereby in three cases we
increase the estimated uncertainty by a factor of 2. These are (i) when an
instrument changes in a composite, which is appropriate since there are many
examples of jumps in a composite on, or immediately after, these dates (e.g.
Fig.

With estimates of the uncertainties on each composite, we can construct the
joint-likelihood function for the set of composites as a product over the
individual likelihoods at each time step (indicated by

Ordinarily, the likelihood for a single
measurement would be taken to be a normal distribution with a mean given by
the true value,

When the multiple measurements of the different composites are combined in
the product over

We factorize the prior into a product of transition priors
for each month-to-month transition, i.e.

The transition prior provides a way to assess whether the ozone values reported by the composites in the month being evaluated are likely, given the previous month, and hence a way of identifying anomalous behaviour. The annual or semi-annual variability that makes up the seasonal cycle is the largest mode of ozone variability. It is also a relatively consistent mode, so together with information from the observations, it can help to differentiate between artefacts and real anomalous behaviour.

We form the transition prior from all four composites together. Two examples
are given in Fig.

With the likelihood (Sect.

When constructing the month-to-month transition prior as described above, we
use the data to estimate and fix the prior's hyperparameters, i.e. the means
and variances of each month-to-month transition (January–February, February–March, etc.). This
uses the data twice: once to construct the transition prior and once
in the main posterior inference. However, we note that estimating and fixing
the hyperparameters from the data is an approximation, similar to “empirical
Bayes” methods, to a full Bayesian hierarchical treatment where the
parameters of the prior would be kept as free unknown parameters and inferred
jointly with the true ozone time series. In cases where the hyperparameters
are tightly constrained by the data and do not strongly co-vary with the
parameters of interest (here the underlying ozone time series), estimating
and fixing the hyperparameters from the data before the main analysis is an
excellent approximation to the full hierarchical model.
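
The hyperparameter estimation described above can be illustrated with a minimal Python sketch. It assumes that a transition is the simple difference between consecutive monthly values and that differences from all composites are pooled by calendar month; the function name `transition_hyperparameters` and the array layout are hypothetical and are not taken from the BASIC implementation.

```python
import numpy as np

def transition_hyperparameters(composites):
    """Estimate the mean and variance of each month-to-month transition.

    composites: array of shape (n_composites, n_months) whose first column
    is a January; NaN marks missing data (assumed layout).
    Returns two length-12 arrays; index 0 is January-February and
    index 11 is December-January.
    """
    composites = np.asarray(composites, dtype=float)
    # Month-to-month change in each composite.
    diffs = np.diff(composites, axis=1)
    # Calendar month (0 = January) in which each transition starts.
    start_month = np.arange(diffs.shape[1]) % 12
    means = np.full(12, np.nan)
    variances = np.full(12, np.nan)
    for m in range(12):
        # Pool the transitions from all composites and all years.
        pooled = diffs[:, start_month == m].ravel()
        pooled = pooled[np.isfinite(pooled)]
        if pooled.size > 1:
            means[m] = pooled.mean()
            variances[m] = pooled.var(ddof=1)
    return means, variances

# Synthetic example: four noisy copies of an idealized seasonal cycle.
rng = np.random.default_rng(1)
months = np.arange(336)  # 28 years of monthly values
cycle = np.sin(2 * np.pi * months / 12)
synthetic = cycle + 0.05 * rng.standard_normal((4, months.size))
means, variances = transition_hyperparameters(synthetic)
```

Fixing `means` and `variances` at these values before the main inference corresponds to the empirical-Bayes-like approximation discussed above; a full hierarchical treatment would instead infer them jointly with the ozone time series.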

We leave a
more careful hierarchical analysis to future work, expecting this
approximation to have a small impact on the results, but outline the full
hierarchical model briefly below for completeness. In the generative
hierarchical model, the true ozone time series are generated from the
transition prior as

We designed synthetic tests to evaluate whether the BASIC approach was
effective in retrieving the “true” ozone time series given a set of four ozone
composites that had jumps, drifts, and noise, similar to those we encounter in
the existing datasets. Overall, we found the BASIC approach to be successful
at estimating ozone and, in particular, better than any individual composite
that contains artefacts. These synthetic tests are presented in
Sect.

The BASIC composite result for the 0–10

Another example, at the higher pressure of 10 hPa, is given in
Fig.

Finally, to show how the BASIC approach operates in a completely different
regime to that near the Equator, in Fig.

Ozone time series at three stratospheric
locations from 1985 to 2012, all bias shifted to the mean of SWOOSH after
August 2005.

Ozone time series at two stratospheric
locations from 1984 to 2014, all bias shifted to the mean of SWOOSH after June
2005.

In Sect.

In Fig.

The most significant problem in creating a unified calibration for all SBUV
instruments is the orbital drift

The apparent high scatter at 2.2 hPa in all differences involving SAGE composites
(i.e. Fig.

The drift between the SAGE composites prior to 1991 (Fig.

A small downward step in the SAGE composite difference in Fig.

A prominent feature in Fig.

Following the eruption of Mt. Pinatubo in June 1991, there is a large drop
in SBUV-MER at 10 and 16 hPa due to interference in viewing from volcanic aerosols
(not shown here, but see Fig.

For completeness, steps in the SBUV composites in Fig.

Now that we have established the validity of the BASIC approach and
constructed an ozone composite from GOZCARDS, SWOOSH, SBUV-MOD, and SBUV-MER,
we turn to analysing trends and modes of variability. This is often performed
using MLR

We perform MLR analysis on deseasonalized time series (i.e. by
subtracting monthly means) using five regressors: the F30 radio flux (solar),
which is superior to the F10.7 cm radio flux for representing solar UV
variability

We perform a DLM analysis following
very closely the model and formalism of

Our DLM analysis follows

The percentage change in ozone from DLM between 1985 and 1997

Here, we present estimates of changes in ozone between 1985 and 1997, and
between 1998 and 2012 (Fig.

Typically, ozone trends are reported as linear decadal percentage changes in
three latitude bands in the Southern Hemisphere
(60–35

It does not make sense to provide a linear trend estimate for the non-linear
DLM background trend. Instead, in Fig.

In the earlier period (1985–1997), the DLM and MLR profiles agree well
(within the DLM uncertainty). The DLM-BASIC typically displays better
agreement with the GOZCARDS profiles than the others in the northern and
southern midlatitudes, but the mean profile is generally closer to that of
SBUV-MOD over the Equator. Indeed, above 4 hPa, SWOOSH is typically at or
outside the BASIC composite 95 % credible interval in northern and equatorial
bands (this is also the case with MLR). Interestingly, the SBUV composites
are often outside the MLR-BASIC uncertainty range above 7 hPa at
midlatitudes in both hemispheres; DLM uncertainties are larger and the four
composites are in closer agreement when trends are analysed using DLM. This might
hint that MLR is being biased by residual variance and/or underestimating
error bars, in contrast to DLM, as was observed in the test cases (see
Sect.

The results for the latter period, 1998–2012, show a significant positive
trend in the upper stratosphere above 7 hPa, as expected following
the implementation of the Montreal Protocol. The result is significant in
every dataset analysed with DLM in both the northern and southern
midlatitudes for at least one pressure level; for the BASIC composite, the
result is clear at multiple altitudes. We note that the MLR results are only
statistically significant at northern midlatitudes for both SBUV composites
and for all composites in the southern midlatitudes at 3.2 and 4.6 hPa.
There are also statistically significant differences between the mean
MLR-BASIC and the DLM-BASIC profiles over the Equator and at northern
midlatitudes; in the southern region, DLM profiles for composites are less
consistent than when using MLR, but the DLM-BASIC results are in good
agreement. The DLM profile shapes in the Northern Hemisphere are consistent
with each other, with a negative trend in the lower stratosphere, though
usually insignificant at the 95 % level, and a positive response in the upper
stratosphere, confirming the result of

The percentage change in ozone (left
axis) relative to 1998 (vertical dashed line; horizontal zero line) for the
integrated latitude bands 60–35

In Fig.

The uncertainties presented in Fig.

Figure

It is interesting to note that the two 1998–2012 midlatitude BASIC
composite profiles in Fig.

We propose that the profiles determined by DLM-BASIC are likely to be a
better representation of the change in stratospheric ozone than previous
estimates. We base this conclusion upon the knowledge that (i) the BASIC approach was
successful in identifying and correcting most known artefacts in the ozone
composites, (ii) the DLM performed better than the MLR in the artificial
ozone time series test cases, and (iii) the DLM-BASIC outperformed both
MLR-BASIC and DLM of all the “artefact-damaged” artificial time series. The
consistency of independent northern and southern midlatitude DLM profiles
for both periods suggests that no additional explanation for why the
different hemispheres should evolve in different ways is required

We have presented a novel approach to identify and account for data artefacts
that remain in multiple ozone composites of satellite observations. These
artefacts are one of the largest remaining causes of disagreement between decadal
trend estimates made from the many composites available. Our approach
includes estimates of uncertainties using singular value decomposition, a
Gaussian-mixture outlier model for the likelihood, and prior information in
the form of expected monthly transitions and knowledge of problems in ozone
observations; these are combined via Bayesian inference. The main output of
this process we term the BAyeSian Integrated and Consolidated (BASIC)
composite, which has been designed to account for differences in ozone
composites that are constructed in different ways and with observations from
different sources. The need for better approaches to combine ozone composites
has been raised in recent years as an issue needing resolution (e.g.

The presence of data gaps, biases between instruments, and issues with
sampling, noise, and differences in resolution also increases uncertainties in
trend estimates, which might lead to artificial trends being extracted in
multiple linear regression (MLR) analysis. To avoid this, we employed, with
refinements, dynamical linear modelling (DLM)

The results presented here are a step forward, but we do not consider the
composite a definitive and final product; there are still issues to resolve,
which we extensively discuss (Sect.

From the DLM analysis, the estimated changes in ozone between 1985 and 1997,
and then between 1998 and 2012, show good agreement with the shape of the
ozone profiles presented by

We will make the BASIC composite available and
provide supporting documentation should the composite be updated. The
composite is available for public use at

The BASIC composite
is available at

In the construction of SBUV-MER, ozone was considered in
5

In the construction of SBUV-MOD,

Due to the low temporal sampling of SAGE-II (15 sunrise/sunset events per
day), as opposed to the

In SWOOSH, basic data prescreening is based on published recommendations
from satellite instrument teams. SAGE-II ozone screening follows the
recommendations of

We briefly note (and indicate in Fig.

In Fig.

It is clear that for either low values of

In terms of its effect on the BASIC composite time series, when combined with
a prior expectation, this can lead to the expected time series following one
pair (in the example given in Fig.

Example of Box–Tiao effect on idealized
data with a mean of

In Figs.

Recovered posteriors on

Recovered posteriors on

Recovered posteriors on

Recovered posteriors on

The BASIC composite results in the main article use SWOOSH
data version 2.6. We originally used version 2.5 (version 2.1 was used by

In Fig.

This example gives us further confidence that when multiple composites are available, the BASIC approach does a good job of accounting for artefacts that exist in only one dataset.
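
To illustrate why an artefact confined to one dataset has limited influence under a Gaussian-mixture likelihood, the following Python sketch compares a purely Gaussian joint likelihood with a two-component mixture for four measurements of a single monthly value, one of which is an outlier. The mixture weight (`outlier_frac`), the width of the broad component (`outlier_scale`) and the function names are illustrative assumptions, not the values or code used to construct the BASIC composite.

```python
import numpy as np

def log_normal(x, mean, sigma):
    """Log of the normal probability density."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def joint_log_likelihood(y_grid, data, sigma, outlier_frac=0.0, outlier_scale=10.0):
    """Joint log-likelihood of candidate true values y_grid.

    Each measurement is drawn from a narrow Gaussian with probability
    1 - outlier_frac, or from a Gaussian that is outlier_scale times
    wider with probability outlier_frac (assumed parametrization).
    """
    y = np.asarray(y_grid, dtype=float)[:, None]
    d = np.asarray(data, dtype=float)[None, :]
    s = np.asarray(sigma, dtype=float)[None, :]
    core = log_normal(d, y, s)
    if outlier_frac == 0.0:
        # Plain Gaussian likelihood: product over the measurements.
        return core.sum(axis=1)
    wide = log_normal(d, y, outlier_scale * s)
    # Log of the mixture density, evaluated stably in log space.
    per_point = np.logaddexp(np.log1p(-outlier_frac) + core,
                             np.log(outlier_frac) + wide)
    return per_point.sum(axis=1)

# Four composites report one monthly value; the last contains an artefact.
y_grid = np.linspace(0.0, 10.0, 2001)
data = np.array([5.0, 5.1, 4.9, 8.0])
sigma = np.full(4, 0.2)

gauss_peak = y_grid[np.argmax(joint_log_likelihood(y_grid, data, sigma))]
mix_peak = y_grid[np.argmax(
    joint_log_likelihood(y_grid, data, sigma, outlier_frac=0.1))]
```

In this idealized case, the Gaussian likelihood peaks at the mean of all four values and is pulled towards the outlier, whereas the mixture likelihood peaks close to the three values that agree.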

Ozone time series from 1985 to 2012, all
bias shifted to the mean of SWOOSH v2.6 after August 2005.

Given that we do not have any certain measurements against which to
test our approach, we need to demonstrate how the BASIC approach operates in
ideal, known conditions by using artificial test cases where all the
variance is understood. With that in mind, we designed three sets of tests;
we present one here and consider DLM and MLR analysis on the other two in
Sect.

To create test cases, we took a real ozone time series and from that
estimated the regression coefficients of solar, ENSO, volcanic aerosols, and
two QBO terms using MLR (as in Sect.

We specifically built the artefact time series to provide difficulties for the
BASIC approach. For example, in Fig.

A test case to evaluate the performance
of the BASIC approach. Damaged time series are plotted in panel

Similar to Fig.

So far, we have discussed several drawbacks with the current version of the BASIC approach. Here, we collate and list these, and briefly discuss potential solutions for the future, where available.

The decadal trend in ozone from multiple
linear regression (MLR) between 1985 and 1997

Vertical resolution: This is a problem related to the different averaging
kernels of the various instruments used to construct the composites – the SAGE composites
use instruments that all have higher resolution than those in the SBUV composites. This
difference in vertical resolution becomes more important at lower altitudes, and it is
clear in the case of the QBO signal being different

Double counting: The use of only two pairs of composites, each built using the
same underlying instrument data, resolves one of the concerns of

Restricted altitude range: We currently only consider the pressure range
47–1 hPa (

Restricted latitude range: While the composites extend to higher latitudes than
60

Mt. Pinatubo: The example given at 10 hPa, and checks at other locations,
clearly indicate that the BASIC approach is able to avoid the artificial decrease in the
SBUV-MER data between June 1991 and 1992.

Some of these caveats may be resolved with additional information from the
ozone community and by using the BASIC approach to construct a composite from
the original, individual instrument time series. Nevertheless, for the work
involving composites here, we conclude that, despite these issues, the
BASIC approach performs well overall in estimating ozone variability. This conclusion
is based upon the artificial test case target time series being well
estimated, the results of the example real ozone time series presented in
Fig.

To test the ability of MLR and DLM to estimate the
background trend, we use the artificial test cases presented in
Sect.

One major advantage of DLM over MLR for estimating long-term trends is that
MLR requires the trend to be prescribed in advance as a linear or piecewise-linear
trend (e.g.

In Fig.

Note that in these test cases for our DLM inference we
assume a half-Gaussian prior on

In summary, our tests suggest that when estimating the long-term trend, using the BASIC approach to correct the data, together with the DLM, is more accurate than using MLR or DLM on uncorrected time series. We therefore recommend combining the BASIC approach with the DLM for the analysis of long-term trends in ozone, as outlined in this study.
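
For readers unfamiliar with the MLR baseline used in these comparisons, the following Python sketch shows the generic procedure: the time series is deseasonalized by subtracting monthly means and then fitted by ordinary least squares with a constant, a linear trend and a set of regressors. The synthetic solar proxy, the noise level and the function names are assumptions made for illustration; this is not the regression code or the regressor set used in this study.

```python
import numpy as np

def deseasonalize(series):
    """Subtract the mean of each calendar month (series starts in January)."""
    series = np.asarray(series, dtype=float)
    month = np.arange(series.size) % 12
    monthly_means = np.array([series[month == m].mean() for m in range(12)])
    return series - monthly_means[month]

def mlr_fit(y, regressors):
    """Ordinary least-squares fit of y to a constant, a linear trend and regressors.

    Returns the coefficients ordered as [constant, trend per decade,
    one coefficient per regressor].
    """
    n = y.size
    # Time in decades, centred on the middle of the record.
    t = (np.arange(n) - (n - 1) / 2) / 120.0
    design = np.column_stack(
        [np.ones(n), t] + [np.asarray(r, dtype=float) for r in regressors])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Synthetic monthly series, January 1985 to December 2012 (336 months).
rng = np.random.default_rng(0)
months = np.arange(336)
solar_proxy = np.sin(2 * np.pi * months / 132)  # idealized 11-year cycle
seasonal = 0.5 * np.cos(2 * np.pi * months / 12)
ozone = (5.0 + seasonal - 0.3 * months / 120 + 0.2 * solar_proxy
         + 0.02 * rng.standard_normal(months.size))

anomalies = deseasonalize(ozone)
coeffs = mlr_fit(anomalies, [solar_proxy])
```

Because the trend is prescribed as linear in the design matrix, any non-linear background change has to be absorbed by the other regressors or the residuals, which is the limitation of MLR that the DLM is intended to avoid.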

The authors declare that they have no conflict of interest.

We thank the referees, Daan Hubert and Marko Laine, for their thorough input,
which led to significant and important changes that strengthened the paper.
We thank Stacey Frith,
Jeannette Wild, and Lucien Froidevaux for detailed
comments on the composite datasets and general comments on the manuscript. We
also thank the GOZCARDS, SWOOSH, SBUV-MOD, and SBUV Merged Cohesive composite
teams for use of their data. William T. Ball and Eugene V. Rozanov
were funded by the
Swiss National Science Foundation (SNSF) grant 200020_163206 (SIMA).
Fiona Tummon was funded by SNSF grant 20F121_138017. GOZCARDS ozone data can
be found at