Long-term changes in UT/LS ozone between the late 1970s and the 1990s deduced from the GASP and MOZAIC aircraft programs and from ozonesondes

We present ozone measurements of the Global Atmospheric Sampling Program (GASP) performed from four commercial and one research aircraft in the late 1970s to compare them with respective measurements of the ongoing MOZAIC project. Multi-annual averages of UT/LS ozone were built using the aircraft data sets (1975–1979 and 1994–2001), and long-term changes between the 1970s and 1990s were derived by comparison. The data were binned relative to the dynamical tropopause to separate between UT and LS air masses. LS data were analysed using equivalent latitudes. In the UT, pronounced increases of 20–40% are found over the Middle East and South Asia in the spring and summer seasons. Increases are also found over Japan, Europe, and the eastern parts of the United States depending on season. LS ozone over northern mid- and high latitudes was found to


Introduction
The upper troposphere and lower stratosphere (UT/LS) constitute regions of major concern for both climate impact and the surface environment. Because of the radiative properties and temperature structure of the atmosphere, changes in ozone have their largest impact on climate when they occur in the UT/LS (Forster and Shine, 1997).
Published by Copernicus Publications on behalf of the European Geosciences Union. C. Schnadt Poberaj et al.: UT/LS ozone changes from GASP/MOZAIC and ozonesondes

UT/LS ozone is determined by both transport and chemistry depending on region and season of the year. The relative contributions of different processes are expected to vary strongly across the tropopause. At these altitudes, ozone is produced from precursor substances including nitrogen oxides (NOx), volatile organic compounds (VOCs), and carbon monoxide (CO) under the influence of sunlight and destroyed mainly in reactions with HOx radicals (e.g., Rohrer, 1995). A relevant natural source of NOx in the UT/LS is lightning (Schumann and Huntrieser, 2007), while emissions from aircraft constitute an important anthropogenic source. NOx, VOCs, CO, as well as ozone itself also reach the UT/LS by convective mixing from the boundary layer (e.g., Lelieveld and Crutzen, 1994; Bertram et al., 2007) and upward large-scale transport, e.g. in the warm conveyor belts of mid-latitude cyclones (e.g., Stohl et al., 2003). Downward transport from the stratosphere is another important term in the ozone budget of the UT/LS. Any significant long-term change in one of the ozone sources, e.g. in anthropogenic surface NOx emissions, or changes in the downward flux of ozone from the stratosphere (Collins et al., 2003; Ordóñez et al., 2007), must thus be considered a possible cause contributing to UT/LS ozone changes. Several modelling studies emphasize the close relationship between industrial development and changes in tropospheric ozone in large regions of the world (e.g., Levy II et al., 1997; Berntsen et al., 2000; Lelieveld and Dentener, 2000; Fusco and Logan, 2003; Grewe, 2007), indicating that anthropogenic activities significantly contribute to long-term changes of tropospheric ozone.
Knowing how ozone has changed in the troposphere and in the lower stratosphere is therefore of crucial importance for understanding changes in the UT/LS.
Until now, most information on long-term trends in UT/LS ozone has been based on regular ozonesonde records from a limited number of stations across Europe, North America, and Japan providing measurements back to the late 1960s, more than a decade earlier than the beginning of ozone satellite measurements (e.g., Logan, 1985, 1994; Logan et al., 1999). However, the trends derived from these individual sites may not be representative on a larger, i.e. hemispheric or global scale. In addition, the available data sets may not be statistically robust enough to precisely define trends, especially in the presence of large interannual variability, which tends to obscure trends. As a consequence, apparent differences in trend estimates between regions, or between different analyses in the same region, may simply reflect the statistical imprecision of the results.
The current state of knowledge of tropospheric ozone trends based on ozonesonde measurements can be summarised as follows: over Europe and Japan, significant increases were reported in the 1970s and 1980s (Logan, 1994), whereas overall trends have levelled off, or have possibly become slightly negative since the beginning of the 1990s (Logan et al., 1999; Claude et al., 2004; Jeannet et al., 2007; Oltmans et al., 2006). In Canada, ozone trends were negative for the period 1980-2001 throughout the troposphere and lower stratosphere (all levels below 20 hPa) (Tarasick et al., 2005). Long-term ozonesonde observations in the United States are available from the Wallops Island station in Virginia with measurements beginning in 1970. In contrast to the changes at the European and Japanese sites, the reported long-term overall tropospheric change is small (<5%) for the period 1970-2003 (Oltmans et al., 2006). In the LS, large negative trends were found below 18 km over northern midlatitudes for the period up to the mid-1990s (WMO, 1992, 1999; SPARC, 1998; Logan et al., 1999). Recent estimates of trends in the LS for the period up to 2004 show no decrease at 15 km (WMO, 2007) and ozone mixing ratios at 13-16 km are similar to those in the early 1980s (WMO, 2007; their Figs. 3-10). The differing trends for the earlier and later periods are due to a decline in ozone during the 1980s and early 1990s followed by a rapid increase thereafter. The eruption of Mt. Pinatubo in 1991, which enhanced the stratospheric aerosol loading and hence increased stratospheric ozone loss in 1992 and 1993, contributed to this evolution (WMO, 1999).
Besides balloon profiles, information on UT/LS ozone can be gained from regular aircraft measurements. This work presents a comparison of ozone measurements from two long-term aircraft programs in the 1970s and 1990s, both providing regular and large-scale ozone measurements of the Northern Hemisphere UT/LS region. The first project, the NASA Global Atmospheric Sampling Program (GASP), was carried out from 1975 to 1979 on commercial airliners and one research aircraft to regularly measure ozone and other trace species (e.g., Falconer and Holdeman, 1976; Nastrom, 1977, 1979). Using GASP ozone data, multi-annual ozone averages representative of the second half of the 1970s were derived (Schnadt Poberaj et al., 2007, hereafter referred to as SP2007). The second program, the Measurement of Ozone and Water Vapor by Airbus in Service Aircraft Program (MOZAIC), has been in operation since 1994 (Marenco et al., 1998; Thouret et al., 1998a, b, 2006). Within MOZAIC, commercial aircraft have been equipped with fully-automated instruments to measure ozone and water vapour during in-service flights. The MOZAIC data have recently been evaluated to document substantial increases in tropospheric ozone since 1994 extending from North America over the North Atlantic and Europe to Japan (Zbinden et al., 2006; Thouret et al., 2006).
In this paper, we present UT/LS ozone changes derived from the comparison between GASP and MOZAIC data. For an optimal representation of specific changes in the UT and LS, the analysis distinguishes between tropospheric and stratospheric air masses using local dynamical tropopause information. In addition, we compare long-term ozone changes derived from the aircraft measurements with those obtained from selected ozonesonde stations.
In the next section, the data sets and methodology used are presented. Section 3 presents the results of UT and LS ozone changes derived from the comparison of GASP and MOZAIC aircraft data, and discusses these changes in comparison with respective results from ozonesondes. The last section contains summary and conclusions.

The GASP and MOZAIC aircraft programs
The GASP program, its ozone measurement system, as well as its quality assurance and control procedures have recently been extensively reviewed by SP2007. Detailed descriptions of the MOZAIC program can be found in a special issue of the Journal of Geophysical Research from 1998 (Marenco et al., 1998; Thouret et al., 1998a, b) and at the MOZAIC website http://mozaic.aero.obs-mip.fr. Only the main characteristics of both programs will thus be summarised here.
Within GASP, four in-service B-747 of United Airlines (1), Pan Am (2) and Qantas (1), as well as the NASA CV-990 research aircraft were equipped with automated instrument platforms to measure ozone, aerosols, condensation nuclei, water vapour, and carbon monoxide. Data are available from March 1975 to June 1979 when the funding for the program was cut. Altogether, the GASP period contains 6149 measurement flights. Measurements were carried out in the middle and upper troposphere and the LS at altitudes between 6 and 13.7 km. The program mostly covered the North Atlantic and Pacific Oceans, as well as the North American continent, but also to a lesser extent Europe (SP2007; their Fig. 1a). On average, eight data records were taken per hour. Assuming the speed of a Boeing 747 to be about 900 km/h at cruise altitudes, this results in one in-situ observation approximately every 110 km.
MOZAIC was launched in January 1993, and has been ongoing since then. Five Airbus aircraft of Air France, former Sabena, Lufthansa (2), and Austrian Airlines have been equipped with fully automated instruments to measure ozone and water vapour during in-service flights. Observational data are available since August 1994. MOZAIC observations cover large parts of North America, Europe, Asia, and also Africa and South America (SP2007; their Fig. 1b). In this study, the pre-processed one-minute average data are used for the period of August 1994 through December 2001. Horizontally, the one-minute averaging results in a record representative of approximately 15 km of flight path. Between August 1994 and December 2001, 14 558 flights are available consisting of 113 008 flight hours. The chosen period ending in December 2001 results from the availability of ECMWF 40-year reanalyses used in this study and the specific processing of the aircraft and ozonesonde data for the separation between troposphere and stratosphere (cf. Sect. 2.3).
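The horizontal sampling distances quoted for the two programs follow from simple arithmetic. The sketch below reproduces them; the function name is hypothetical, and applying the 900 km/h cruise speed (quoted in the text for the Boeing 747) to the MOZAIC Airbus aircraft as well is an assumption made here for illustration.

```python
def spacing_km(speed_kmh: float, records_per_hour: float) -> float:
    """Distance flown between consecutive records at constant speed."""
    return speed_kmh / records_per_hour


# Cruise speed quoted in the text for a Boeing 747; assumed here for MOZAIC too.
CRUISE_SPEED_KMH = 900.0

gasp_spacing = spacing_km(CRUISE_SPEED_KMH, 8)     # GASP: eight records per hour
mozaic_spacing = spacing_km(CRUISE_SPEED_KMH, 60)  # MOZAIC: one-minute averages

print(f"GASP:   about {gasp_spacing:.1f} km between records")
print(f"MOZAIC: about {mozaic_spacing:.1f} km per one-minute average")
```

Under these assumptions the GASP spacing is roughly seven to eight times coarser than the MOZAIC spacing, which is one reason the later analysis works with daily means rather than individual records.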
The principle of GASP and MOZAIC ozone measurements is a standard ozone monitoring technique based on the absorption of UV light by ozone at 253.7 nm (e.g., Thouret et al., 1998b; Dias-Lalcaca et al., 1998; Klausen et al., 2003). The ratio of the absorption signal, when alternately determining the transmittance of light in a sample of ozone-containing and ozone-free air, yields the ozone concentration making use of the Lambert-Beer law. Note that both GASP and MOZAIC use the same value for the absorption coefficient according to Hearn (1961) (308.5 cm⁻¹ atm⁻¹). Whereas the GASP instrument was a commercially available ultraviolet (UV) photometer manufactured by Dasibi Environmental Corporation (Tiefermann, 1979), the MOZAIC analyzer is a Thermo-Electron dual-beam UV absorption instrument (model 49-103) (Thouret et al., 1998b).
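To illustrate the measurement principle, the following minimal sketch inverts the Lambert-Beer law for a two-signal UV photometer. It is not the processing code of either instrument. The function name, the 38 cm cell length, and the treatment of the coefficient as a base-e value referenced to 273.15 K and 1 atm are assumptions made for this example; only the value 308.5 cm⁻¹ atm⁻¹ is taken from the text.

```python
import math

ALPHA = 308.5  # absorption coefficient at 253.7 nm from the text, cm^-1 atm^-1


def ozone_ppbv(i_sample, i_zero, path_cm, pressure_atm, temperature_k):
    """Ozone mixing ratio (ppbv) from one pair of photometer signals.

    i_sample: detector signal with ozone-containing air in the cell
    i_zero:   detector signal with ozone-free (scrubbed) air in the cell
    Assumes ALPHA is a base-e coefficient referenced to 273.15 K and 1 atm.
    """
    transmittance = i_sample / i_zero
    # Lambert-Beer: transmittance = exp(-ALPHA * path * p_O3),
    # with p_O3 the ozone partial pressure at reference conditions (atm).
    p_o3_ref = -math.log(transmittance) / (ALPHA * path_cm)
    # Convert to a mixing ratio at the actual cell pressure and temperature.
    mixing_ratio = p_o3_ref * (temperature_k / 273.15) / pressure_atm
    return mixing_ratio * 1e9


# Illustrative round trip: 100 ppbv in a hypothetical 38 cm cell at 1 atm, 273.15 K.
example_transmittance = math.exp(-ALPHA * 38.0 * 100e-9)
print(ozone_ppbv(example_transmittance, 1.0, 38.0, 1.0, 273.15))
```

Because both programs use the same absorption coefficient, any error in that coefficient would scale both data sets equally and cancel in relative differences between the periods.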
The GASP data were thoroughly quality checked before use (SP2007; Detwiler et al., 2000): the ozone monitors were exchanged two to four times per year for calibration and functional tests in the laboratory, which showed that the instruments' sensitivity changed by less than 1% over the course of one year. Ozone scrubbers were exchanged every three months to prevent degradation with time, and hourly in-flight tests were carried out to check the instruments' zero, with the monitoring over time providing an indication of the instruments' stability. Probably the most critical problem of the GASP system was ozone destruction occurring upstream of the ozone monitor. This was regularly monitored by an ozone-destruction test package, and ozone values were corrected accordingly. As described in SP2007, three major issues had to be treated before using the GASP ozone data: (1) a general 9% high bias (Tiefermann, 1979) was eliminated, (2) periods of high frequency sampling (up to 16 samples per minute) were reduced to one-minute averages, from which only a single one-minute average within any five-minute interval was stored for consistency with the normal operating procedure, and (3) erroneous readings that were not flagged in the routine data archival procedure were removed. More details on the pre-processing of the GASP data are given in SP2007.
Concerning MOZAIC quality assurance and control (QA/QC) procedures, the instrument efficiency is checked for drifts during every flight, and ozone analysers are carefully calibrated against a reference analyser on an annual basis (Thouret et al., 1998b). Over the 1994-2001 period, accuracy and precision of the instruments did not exhibit any significant variation (SP2007).

Ozonesonde data
Ozone profile data from light balloon ascents at selected sites during 1975-1979 and 1994-2001 have been used for comparison with the aircraft data. The data were obtained from the World Ozone and Ultraviolet Radiation Data Centre (WOUDC) (http://www.woudc.org). Stations considered in this study are the European stations Payerne (Switzerland) (Jeannet et al., 2007), the Meteorological Observatory of Hohenpeissenberg (MOHp) (Germany) (Köhler and Claude, 1998), and Uccle (Belgium) (De Backer, 1999), as well as the US station of Wallops Island (Oltmans et al., 1998, 2006). All stations have consistent time series and provide a sufficient number of ascents during the 1970s. An overview of these stations, the number of ascents, and sensor type is given in Table 1. Cooper et al. (2005) reported that springtime Wallops Island sonde data of the years 2000-2003 were biased toward dry conditions in the middle troposphere and that this bias was associated with a high bias in tropospheric ozone. If such a bias also existed in 1994-2001 and 1975-1979, long-term UT changes would be affected, and the comparison with aircraft changes would become difficult to interpret. We used daily NCEP reanalysis 1 (Kalnay et 1975-1979 and 1994-2001 multi-annual seasonal means a) using all available daily data and b) only those dates where launching occurred. Comparing the a) and b) averages for the 1970s and 1990s, slightly drier conditions by ≈10% were identified in the sonde launching means in most seasons, but not as pronounced as in April/May 2000-2003 (see Cooper et al., 2005). However, comparing the 1990s ratios b/a (91%, 91%, 100%, and 92% in DJF, MAM, JJA, and SON, respectively) with differences between MOZAIC and Wallops Island UT ozone profiles in Fig. 9, no clear dependency of UT ozone differences on the seasonally dependent water vapour ratio could be identified. Thus, the effect on long-term averages may be assumed to be minor.
At the European sonde stations, data are obtained regularly on fixed days of the week, and therefore no fair weather bias is to be expected. The Japanese stations are excluded as the number of ascents during 1975-1979 is too small to build reliable multi-annual averages. The Canadian stations are not included since the change of sensor type from Brewer-Mast (BM, Brewer and Milford, 1960) to Electrochemical Concentration Cell (ECC, Komhyr, 1969, 1971) at the end of the 1970s leads to significant breaks in the time series (Tarasick et al., 1995). For more details on processing and corrections of the used ozonesonde data see Appendix A.

Method of data analysis
The GASP and MOZAIC aircraft and ozonesonde data analysed in this study have basically been processed in the same way as in SP2007. Therefore, only the main features of the methodology will be summarised here: To account for large vertical ozone gradients in the UT/LS, all measurements were scaled against tropopause altitude. To discriminate between tropospheric and stratospheric air, the potential temperature of the dynamical tropopause at 2 PVU was used: Data were classified tropospheric or stratospheric by calculating the difference between the potential temperatures at cruise altitude, derived from the aircraft data, and at the tropopause, interpolated from fields of the ECMWF reanalysis data set (ERA40, Uppala et al., 2005) (see SP2007). In the LS, all data have additionally been arranged into the equivalent latitude (EL)/potential temperature framework similarly as in Hoor et al. (2004) and Hegglin et al. (2006). A detailed description of the method is given in Hegglin et al. (2006). In addition, beyond the criteria used in SP2007, the data selection criteria were refined for the purpose of deriving long-term differences:
- UT ozone was not allowed to exceed a seasonally variable upper limit in mixing ratio to avoid aged stratospheric air, mixed into the UT, to significantly bias individual averages in regions with limited sample sizes. This problem especially applies to some parts of the GASP and sonde data. In these cases, individual flights measuring anomalously high UT ozone significantly distorted the typical UT frequency distribution, shifting the mean to higher values. Therefore, to remove anomalously high ozone from the GASP data set, probability density functions of MOZAIC UT ozone were used to define seasonally dependent upper limits, assuming that MOZAIC sample sizes are sufficiently large to derive representative frequency distributions.
Using cutoff values of 80 ppbv, 120 ppbv, 120 ppbv, and 90 ppbv in DJF, MAM, JJA, and SON, respectively, 97% (95%), 98% (97%), and 99% (99%) of the MOZAIC (GASP) samples at middle, subtropical, and tropical latitudes, respectively, were retained as truly tropospheric. We acknowledge that by removing aged stratospheric air from the tropospheric samples, we may miss a contribution to long-term UT ozone changes from stratosphere-troposphere exchange (STE) processes, arising from a) an increased frequency of STE and/or b) changed ozone concentrations entering the troposphere. However, not applying the cutoff values for MOZAIC does not significantly alter UT ozone mixing ratios (not shown). Thus, it can be assumed that the effect of the cutoff values is minor in the case of a well-defined frequency distribution, and that it may help to restrict GASP UT ozone to more typical mixing ratios.
- Due to the presence of a double tropopause (e.g. a fold), ECMWF potential vorticity interpolated onto the sonde and aircraft positions was occasionally above 2 PVU even when samples were identified as being located below the 2 PVU tropopause. These samples were eliminated from the UT data.
- Only aircraft and sounding data in the pressure range between 330 hPa and 195 hPa (≈8.5 km-11.9 km) were included in the analysis, as this represents the predominant aircraft cruising range (SP2007, their Table 1). For UT analysis, an even narrower range of 330 hPa-235 hPa (≈8.5 km-10.8 km) was selected. The higher levels were excluded as they mostly represent the tropopause region with only little data from the UT.
- Careful pre-processing of the measurement data was also applied concerning the climatological averaging process: In order to increase the weight of measurements taken on different days as compared to the same number of measurements but obtained on a single day, all multi-annual averages were computed from daily means rather than from the individual measurements.
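The selection criteria listed above can be summarised as a small classification routine. This is a minimal sketch rather than the authors' processing code. The record layout and function name are hypothetical, and the sign convention (a non-negative potential temperature difference means stratospheric air) is an assumption. The cutoff values, the 2 PVU threshold, and the pressure ranges are taken from the text.

```python
from dataclasses import dataclass

# Seasonal upper limits for UT ozone (ppbv), as quoted in the text.
UT_CUTOFF_PPBV = {"DJF": 80.0, "MAM": 120.0, "JJA": 120.0, "SON": 90.0}


@dataclass
class Sample:                # hypothetical record layout
    ozone_ppbv: float
    pressure_hpa: float
    theta_k: float           # potential temperature at the measurement
    theta_tp_k: float        # potential temperature of the 2 PVU tropopause
    pv_pvu: float            # potential vorticity interpolated to the sample
    season: str              # "DJF", "MAM", "JJA" or "SON"


def classify(s: Sample) -> str:
    """Return 'UT', 'LS' or 'rejected' for a single sample."""
    # Only the predominant cruising range 330-195 hPa is analysed.
    if not (195.0 <= s.pressure_hpa <= 330.0):
        return "rejected"
    # Assumed convention: at or above the tropopause counts as stratospheric.
    if s.theta_k - s.theta_tp_k >= 0.0:
        return "LS"
    # UT analysis uses the narrower range 330-235 hPa.
    if s.pressure_hpa < 235.0:
        return "rejected"
    # Double tropopause or fold: PV above 2 PVU although below the tropopause.
    if s.pv_pvu > 2.0:
        return "rejected"
    # Aged stratospheric air: ozone above the seasonal cutoff.
    if s.ozone_ppbv > UT_CUTOFF_PPBV[s.season]:
        return "rejected"
    return "UT"


print(classify(Sample(60.0, 250.0, 320.0, 330.0, 1.0, "DJF")))
```

The averaging from daily means described in the last list item is a separate step applied to the samples that pass this classification.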
To evaluate differences in UT ozone between 1975-1979 and 1994-2001, differences between multi-annual seasonal averages of both periods were calculated as horizontal distribution on a 10° × 10° map. In addition, to evaluate changes more quantitatively, multi-annual regional averages were calculated for all extratropical regions and the tropical South China region (S CHINA) defined in SP2007. The averages were formed by first building daily means over a given region and then averaging, as a function of season, over all daily means in all years. For the purpose of comparing the aircraft with ozonesonde data, the averaging region over Europe (EUR) was reduced in size (EUR SONDE, 42° N-57° N and 5° W-20° E) to exclude regions too remote from the sonde stations to influence the average. For S CHINA, ERA40 pressure at the thermal tropopause was used to identify tropospheric air masses as the 2 PVU tropopause is not suitable in the tropics. An additional region, Middle East (ME), has been included in the analysis due to its importance concerning long-term changes; it extends from 25° N-45° N and 30° E-60° E. For comparison of GASP/MOZAIC with sonde data at Wallops Island (37.9° N, 75.5° W, cf. Table 1), the aircraft data were averaged over 30-50° N and 60-90° W (East USA). This region extends further south than the NE USA region used previously in SP2007 to document the seasonal cycle of UT ozone in the 1970s. In some cases, long-term differences presented in the horizontal distributions and in the regional averages may appear inconsistent, as e.g. over Northern India (N IND) in autumn (cf. Sect. 3.1). However, it needs to be realised that stricter rules apply for calculating long-term differences over the 10° × 10° boxes than over the larger averaging regions, as within every box, at least 10 daily means need to be available to calculate long-term averages.
For this reason, computing multi-annual means over a larger region may be allowed, while in subregions it may not be, due to a too limited number of daily means.
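The two-step averaging (daily means first, then the mean over all daily means) and the minimum of 10 daily means can be sketched as follows. The function name and the input layout are hypothetical, and the numbers in the example are invented purely to show how averaging via daily means differs from averaging all measurements directly.

```python
from collections import defaultdict
from statistics import mean

MIN_DAILY_MEANS = 10  # minimum number of daily means quoted in the text


def multiannual_mean(samples, min_daily_means=MIN_DAILY_MEANS):
    """Average ozone for one region (or grid box) and season via daily means.

    samples: iterable of (day_label, ozone_ppbv) pairs from all years.
    Returns (mean_of_daily_means, n_daily_means); the mean is None when
    fewer than min_daily_means days are available.
    """
    by_day = defaultdict(list)
    for day, ozone in samples:
        by_day[day].append(ozone)
    daily_means = [mean(values) for values in by_day.values()]
    n_days = len(daily_means)
    if n_days < min_daily_means:
        return None, n_days
    return mean(daily_means), n_days


# Invented example: three measurements on one day, one on another day.
demo = [("1978-07-01", 40.0), ("1978-07-01", 40.0),
        ("1978-07-01", 40.0), ("1978-07-02", 80.0)]
print(multiannual_mean(demo, min_daily_means=2))  # each day weighted equally
print(mean(ozone for _, ozone in demo))           # each measurement weighted equally
print(multiannual_mean(demo))                     # too few days for the default rule
```

With the default threshold the same rule explains why a large region can yield an average while one of its 10° × 10° boxes does not: the box simply collects fewer daily means.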
In the LS, the statistics of GASP observations are too limited to allow for a regionally resolved analysis of ozone changes. Instead, quasi-zonal mean differences between MOZAIC and GASP multi-annual means were calculated as function of EL and potential temperature distance from the tropopause for 5° × 10 K grid cells. Longitudinal variability is assumed to be small in this co-ordinate system due to the approximate conservation of PV and potential temperature during transport in the stratosphere and the long lifetime of ozone. To ensure representativeness of GASP (MOZAIC) multi-annual means, averages were only calculated if at least 30 (50) daily means in at least three (six) years were available in every grid box. The methodology leads to sufficiently large GASP samples in most grid boxes. However, distributing the relatively few sonde data of the 1970s over EL results in rather small sample sizes in some grid boxes (<10 daily means). This problem is particularly significant in summer and autumn when the tropopause is high. Still this approach was preferred over further latitudinal averaging due to the presence of pronounced latitudinal gradients in LS ozone.
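The representativeness rule for the LS grid cells can be written as a short check. This sketch assumes one particular reading of the rule, namely that the total number of daily means and the number of distinct years are tested separately; the function and constant names are hypothetical.

```python
def is_representative(years_of_daily_means, min_days, min_years):
    """Check one EL/potential-temperature grid cell.

    years_of_daily_means: one calendar year per available daily mean.
    Assumed reading of the rule: enough daily means in total, and those
    daily means spread over enough distinct years.
    """
    enough_days = len(years_of_daily_means) >= min_days
    enough_years = len(set(years_of_daily_means)) >= min_years
    return enough_days and enough_years


GASP_RULE = {"min_days": 30, "min_years": 3}    # thresholds from the text
MOZAIC_RULE = {"min_days": 50, "min_years": 6}  # thresholds from the text

cell = [1976] * 10 + [1977] * 10 + [1978] * 10
print(is_representative(cell, **GASP_RULE))
print(is_representative([1978] * 40, **GASP_RULE))
```

A cell filled by a single well-sampled year fails the check even with many daily means, which is the situation the year-count condition is meant to exclude.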
The confidence intervals displayed in Figs. 2, 7, 8, 9, and 10, which illustrate the reliability range of the differences between the periods, were calculated from the sample sizes of the GASP and MOZAIC data, n_G and n_m, the corresponding sample variances, σ_G² and σ_m², and the cutoff value t of a Student's t-distribution, which depends on the selected probability P = 0.05 and the degrees of freedom (Df = n_G + n_m − 2). Serial independence of measurements, which is required for computing the confidence intervals, is provided, as daily means were used for calculating regional multi-annual averages (see above). No evidence for persistence in the regional daily mean ozone time series could be identified (not shown).
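A confidence interval of this kind can be sketched with the quantities named above. The exact expression used by the authors is not reproduced in this text, so the pooled-variance two-sample form below is an assumption; it was chosen because it is the textbook form with n_G + n_m − 2 degrees of freedom. The function name is hypothetical, and the t cutoff is passed in by the caller (from a table or a statistics library) rather than computed.

```python
import math


def difference_ci(mean_g, var_g, n_g, mean_m, var_m, n_m, t_crit):
    """Confidence interval for the difference mean_m - mean_g.

    Assumed form: pooled-variance two-sample interval.
    t_crit: two-sided Student's t cutoff for n_g + n_m - 2 degrees of freedom.
    """
    dof = n_g + n_m - 2
    pooled_var = ((n_g - 1) * var_g + (n_m - 1) * var_m) / dof
    half_width = t_crit * math.sqrt(pooled_var * (1.0 / n_g + 1.0 / n_m))
    diff = mean_m - mean_g
    return diff - half_width, diff + half_width


# Invented numbers: 10 daily means per period, variance 4 ppbv^2 each;
# 2.101 is the two-sided 5% cutoff for 18 degrees of freedom.
low, high = difference_ci(50.0, 4.0, 10, 52.0, 4.0, 10, t_crit=2.101)
print(f"difference: 2.0 ppbv, interval: [{low:.2f}, {high:.2f}] ppbv")
```

A difference is then called statistically significant at the 95% level when the interval excludes zero.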
Over some regions and in some seasons, GASP 1975-1979 climatological means are heavily biased toward the year 1978. This is particularly true for GASP UT ozone over the western and northeastern United States, the Atlantic, Europe, and Northern Japan in summer and autumn (Sect. 3.1). Similar biases also exist for LS ozone in summer, and only at high latitudes >50° N EL, in autumn (Sect. 3.2). There are indications that the year 1978 was exceptional in terms of tropospheric ozone: annual means at Wallops Island showed maximum values in 1977 and 1978 in the sampling record between 1970 and 1995 (Oltmans et al., 1998; their Fig. 2). Less pronounced, but still relatively high annual mean values are also visible in 1978 and 1979 at 500-300 hPa at MOHp (Oltmans et al., 1998; their Fig. 2). Additionally, the GASP UT time series over Europe show that 1978 summer values were anomalously high in the period 1975-1979 (not shown). Unfortunately, there are not enough GASP summer flights over the eastern USA and the Atlantic during other years than 1978 to confirm the pattern for these regions. Long-term changes in regions with a sampling bias toward the year 1978 therefore need to be interpreted with care.

Fig. 1. [...] multi-annual averages (in ppbv) and relative difference between GASP and MOZAIC UT ozone ((MOZAIC-GASP)/GASP) (in %) (C) as function of latitude and longitude. Data have been averaged over a 10° × 10° grid. Data were identified tropospheric using the 2 PVU tropopause in the extratropics and the thermal tropopause in the tropics (latitudes <35° N). A: Black boxes indicate regions for calculating regional means of UT ozone. For more details see text. A and C: Grey triangles denote where GASP data are biased toward one year (≥50% from one year), and pink triangles where GASP data are available from three years only. C: Hatched boxes indicate where differences are statistically significant at the 95% level. Differences have only been displayed where data from at least three years are available for the GASP period and number of daily means available for averaging is ten or more.

Fig. 2. [...] Table 3. ME region is specified in the text). DJF: blue bars, MAM: green bars, JJA: red bars, and SON: orange bars. Relative differences have only been displayed if 10 or more daily regional averages are available for both GASP and MOZAIC periods. Vertical bars indicate 95% confidence intervals of differences. Bottom rows of numbers represent numbers of daily means available for averaging. Upper row: GASP, lower row: MOZAIC. Numbers are coloured in orange (red) if GASP or MOZAIC data are biased toward one year: 50-75% of data from one year (>75% from one year). Grey triangles mark regional averages for which data from three years only are available for averaging.

Results and discussion
In this section, changes of UT/LS ozone between the late 1970s and the second half of the 1990s deduced from the GASP and MOZAIC data sets are described separately for the UT (Sect. 3.1) and the LS (Sect. 3.2). Section 3.3 adds a comprehensive comparison with long-term changes deduced for BM and ECC ozonesondes for both LS and UT, and Sect. 3.4 contains the discussion of results.

UT ozone changes between 1975-1979 and 1994-2001
Long-term relative differences between GASP and MOZAIC UT ozone are shown as horizontal distributions in Fig. 1 and as regional averages in Fig. 2. In the following, we also compare the aircraft data with tropospheric ozone changes derived from other measurements, except those from regular ozonesonde stations, which will be discussed in Sect. 3.3. Over the western parts of the United States, estimated long-term differences are very small (≤±5%) in all seasons (Fig. 2, W USA). However, the changes must be considered with care due to very limited availability of MOZAIC data for this region. Moreover, the summer and autumn changes derived from the GASP and MOZAIC samples are strongly biased toward one year of data. Differences may, therefore, not be indicative of long-term changes. Unfortunately, over western North America, there are no other long-term measurements in the UT/LS for comparison. At the surface, ozone maxima at polluted sites in California show a continuous decrease since the 1970s (Grosjean, 2003) attributable to the strong decrease in regional ozone precursor emissions (Martien and Harley, 2006). In contrast, analyses of marine boundary layer ozone based on measurements along the west coast (Parrish et al., 2009) and of rural and remote ozone across western USA (Jaffe and Ray, 2007) show increases since the (late) 1980s. However, at the ECC ozonesonde site at Boulder, Colorado (40.0° N, 105.3° W), no long-term trend could be detected in the free troposphere for the period 1985-2004 (Jaffe and Ray, 2007).
In contrast, over the northeastern parts of the USA (NE USA), differences between the 1970s and 1990s are mostly positive. Largest changes are seen in winter where ozone increased by around 20% on average. Smaller, but still mostly positive differences of about 10% are found in MAM. In summer and autumn, no indications for long-term changes are given. However, the GASP data both over the NE USA and ATL regions (cf. discussion below) are strongly biased toward the year 1978 in JJA and SON. Thus, since ozone in these regions was probably rather high (cf. Sect. 2.3), true summer and autumn long-term changes over NE USA and ATL might be somewhat larger than derived from our analysis.

Fig. 3. Probability density function of UT ozone mixing ratios for the East USA region and DJF. The figure shows the GASP (red) and MOZAIC (black) distributions. Additionally, mean, median, and the ensemble size (i.e., number of daily regional averages) for building multi-annual means over the region are indicated.
The surface station at Whiteface Mountain (WM) (44.4° N, 73.9° W, 1480 m) in the eastern United States documents increases between the 1970s and 1990s that are of similar magnitude as found in our study in the UT (Oltmans et al., 1998; their Fig. 1). The seasonal dependence of surface trends qualitatively agrees with our results showing largest increases in DJF and smallest in JJA. Obviously, wintertime increases at WM are due to a shift away from lower values in the frequency distribution (Oltmans et al., 2006; their Fig. 12). Such behaviour has also been observed at many stations in Europe and was attributed to the decrease in NOx emissions resulting in reduced titration of O3 by NO in wintertime (e.g., Jonson et al., 2006). Evaluating probability density functions of GASP/MOZAIC ozone, a similar pattern is found for Eastern USA (Fig. 3), suggesting a possible link between surface and UT changes, probably through fast upward transport. A significant contribution to the wintertime increase in UT ozone may thus be caused by the reduction of NOx emissions between the 1970s and 1990s over the United States (see RETRO estimates of anthropogenic NOx emissions change between 1975-1979 and 1994-2000, Fig. 4b) (www.retro.enes.org; Pulles et al., 2007).
Over the Atlantic (ATL), a relatively similar seasonal change pattern as over NE USA is seen with largest positive changes in winter, somewhat smaller, but still mostly positive changes in MAM. No evidence for change is found in JJA and SON. The only observational records over the Atlantic region that document long-term changes in surface ozone are ship-borne measurements during 1977-2002, which show no significant trend (Lelieveld et al., 2004). Conversely, ozone measured at Mace Head, Ireland, in air masses originating from the North Atlantic, shows a significant upward trend during 1987-2003(Simmonds et al., 2004. Similar to our results for the UT, the observed increases were largest in the winter season and smallest in summer. Worldwide air traffic has grown strongly during the last few decades (e.g., Schumann, 2002, see also Fig. 4d) significantly enhancing the abundance of reactive nitrogen (NO x = NO + NO 2 ) in the main flight corridors at tropopause altitudes (e.g., Brasseur et al., 1996;Köhler et al., 1997;Lee et al., 1997;Schumann et al., 2000). Over the North Atlantic, because of high air traffic density remote from other sources, the effect of aircraft emissions is most likely expected to be measurable. Model studies suggest that NO x emissions from aircraft estimated for the early 1990s resulted in an ozone increase of 3-8 ppbv in the same region (Penner et al., 1999;Kraabøl et al., 2002;Gauss et al., 2006). The maximum aircraft effect was predicted for the summer season and the smallest in winter in the aforementioned studies, which differs from the results of this study. However, due to limited data availability in JJA and SON, our analysis yields no definite answer of the true seasonal dependency of long-term changes over NE USA and ATL. According to Kraabøl et al. 
(2002), the maximum effect of aircraft emissions is smaller in winter than in summer, but the summer maximum is found at polar latitudes (7-8 ppbv), while increases of similar magnitude are found in the winter and summer seasons at midlatitudes (winter: 3-4 ppbv, summer: 2-5 ppbv). The latter numbers are of the same order of magnitude as the long-term ozone increases found by GASP/MOZAIC for the winter and spring seasons over ATL (2-6 ppbv) (not shown), which might indicate a connection between air traffic increases and long-term UT/LS ozone changes in these regions and seasons.
Averaged over Europe (EUR), statistically significant increases are only seen in spring (≈10%) (Fig. 2). In all other seasons, there is no indication for a long-term change in UT ozone. However, in summer, in contrast to the regional average, Fig. 1 indicates that there are increases over large parts of continental Europe on the order of 10-20%. In the regional average, these increases are overcompensated by decreases in the most western parts of Europe, and possibly by measurements in subregions that are not displayed in Fig. 1 due to GASP data limitations.
Fig. 4. [...] periods 1975-1979 and 1994-2000 (1994-2000 minus 1975-1979) (%). Sources included are power generation, industrial, residential, and commercial combustion, transport, and ships (Schultz, 2007). The positive differences in NO x emissions change over the oceans are due to a uniform global scaling factor applied to derive historical ship emissions. For the comparison of 1975-1979 and 1994-2000, the factor equals 1.1. (c) RETRO aircraft NO x emissions at 10.5 km altitude for the period 1975-1979 (10 −9 kg gridbox −1 s −1 ), and (d) ratio of aircraft emissions 1994-2000/1975-1979. The emissions inventories are based on the DLR-1992 and DLR-2015 data sets interpolated according to Penner et al. (1999) using an exponential function between 1992 and 2015 (V. Grewe, personal communication, 2003). See also Schmitt and Brunner (1997).

Other information on long-term free tropospheric ozone trends over Europe is available from a number of stations, such as the high Alpine mountain site Zugspitze, Germany (and the ozonesonde stations at MOHp (Germany), Payerne (Switzerland), and Uccle (Belgium), see Sect. 3.3.1).
Zugspitze shows large increases in ozone since 1978 (Oltmans et al., 2006; their Fig. 4). In all months of the year, monthly climatological ozone mixing ratios of 1978-1984 vs. 1995-2004 indicate rather uniform increases of ≈10 ppbv. The differences deduced from GASP/MOZAIC are considerably smaller: they amount to 0-6 ppbv over the EUR average.
The Middle East region is largely unexplored with respect to ozone observations and information on regional long-term changes. Both GASP and MOZAIC (and data from the 1995-1996 NOXAR project, see Brunner et al., 2001) aircraft observations point to a UT seasonal cycle that is characterised by a winter minimum and a spring to summer maximum (Fig. 5) (winter: 40-50 ppbv in both periods, summer: 45-75 ppbv/65-85 ppbv in the 1970s/1990s). A summer maximum is also found in retrievals of satellite measurements (Kar et al., 2002) and in ECC ozonesonde observations from Isfahan, Iran (32.5° N, 51.7° E), for which long-term averages (period 1995, 1996, and 1999-2005) at 400 hPa amount to approximately 55, 70, 85, and 60 ppbv in DJF, MAM, JJA, and SON, respectively (Isfahan data at www.woudc.org). Except for higher summer ozone, sonde measurements compare well with MOZAIC and NOXAR. There is a clear indication that spring and summer UT ozone values have strongly increased over the last two decades: the regional averages show increases of 30% and 35% for MAM and JJA, respectively (Fig. 2). (Note that the summer changes derived from vertically averaged UT ozone may be overestimated by a few percent, as GASP aircraft collected most data at −10 to −20 K below the tropopause, where lower mixing ratios of ∼45 ppbv prevailed, while MOZAIC also gathered data closer to the tropopause containing a larger fraction of data with higher mixing ratios.) In autumn and winter, only moderate changes of <±10% are seen over the ME region.
The spring and summer increases in UT ozone are probably related to a combination of causes: First, anthropogenic surface NO x emissions have vastly increased by 80-300% over the Middle East and India (Fig. 4b). Air traffic NO x emissions, although representing a minor fraction compared to surface emissions, are estimated to have increased by a factor of 4 from the 1970s to the 1990s over the Middle East and South Asia (Fig. 4d). Thus, it seems plausible that enhanced NO x abundance has led to increased photochemical ozone production. However, due to large-scale subsidence in the subtropical high pressure belt, surface air pollution may not easily reach the UT. Therefore, long-range transport from the Indian subcontinent in easterly flow of the Asian monsoon anticyclone in the south of the Middle East may have contributed significantly to the spring and summer increases (Li et al., 2001, their Fig. 3). Both lightning-produced NO x and NO x from anthropogenic sources over India reaching the UT through monsoonal convection have been identified as sources of unusually high local UT ozone, which is then transported to the Middle East (Li et al., 2001).

Fig. 5. [...] (Brunner et al., 2001). Symbols denote arithmetic means, horizontal bars medians. Vertical bars show standard deviation, and grey vertical ranges central 90% of data. Numbers at the bottom indicate number of daily regional averages.
Over India, large increases are seen in the horizontal distribution in spring and summer (Figs. 1b and c). Averaged over N IND, spring and summertime ozone increased by about 25% and 40%, respectively (Fig. 2). Long-term changes in SON and DJF tend to be positive and indicate more moderate increases than in JJA (5-10%).
The large UT ozone increase over N IND in summer leads to a change in the annual cycle: while in the 1970s, summer values are at a minimum in the seasonal cycle, this minimum has disappeared in the 1990s (SP2007, their Fig. 8). A plausible explanation for the apparent change in seasonality could be the strong increases in ozone precursor emissions over the Indian subcontinent over the last decades (Fig. 4b and d) and an associated increase of tropospheric ozone that has been shown from surface ozone measurements (Naja and Lal, 1996) and from ozonesonde observations at three Indian sites (Saraf and Beig, 2004). In the middle to upper troposphere during 1971-2001, annual increases of 1.4%/yr to 1.8%/yr are found at the balloon stations Trivandrum (8° N, 76° E) and Pune (18° N, 73° E), together with large positive tropospheric trends at Delhi (28° N, 77° E) (4-14%/yr) (Saraf and Beig, 2004, their Table 1). The first two numbers agree favourably with summer increases deduced from GASP and MOZAIC for the UT (Fig. 2) that amount to 2.1%/yr over N IND and 1.8%/yr over ME. In fact, a model study of the period 1991-2001 showed that the largest effect of surface ozone increases over India on UT ozone occurs in the monsoon season when polluted air can be exported to the free and upper troposphere in deep convection (Beig and Brasseur, 2006).
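As a rough consistency check on the per-year figures quoted for N IND and ME, a percentage change between two multi-annual averages can be converted into a simple linear annual rate by dividing by the time separating the two periods. The sketch below is illustrative only and is not taken from the paper: the function name and the 19-year separation (1975 to 1994, the first years of the GASP and MOZAIC averaging periods) are assumptions. With that assumed separation, the summer increases of about 40% and 35% round to the quoted 2.1%/yr and 1.8%/yr.

```python
def annual_rate(percent_change, years_between):
    """Convert a total percentage change into a simple (non-compounded) rate in %/yr."""
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return percent_change / years_between


# Assumed separation between the two averaging periods (hypothetical choice,
# not stated in the text): 1975 to 1994.
YEARS_BETWEEN = 19

rate_n_ind = annual_rate(40.0, YEARS_BETWEEN)  # N IND, summer increase of about 40%
rate_me = annual_rate(35.0, YEARS_BETWEEN)     # ME, summer increase of about 35%

print(f"N IND: {rate_n_ind:.1f} %/yr, ME: {rate_me:.1f} %/yr")
```

A different choice of reference years (for example, the mid-points of the two periods) would give slightly smaller rates, so the result should be read as an order-of-magnitude check only.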
Over Southeast Asia, a region for which other long-term observational records of tropospheric ozone are missing, significant increases are seen in the UT in spring and summer (Fig. 1b and c). Over the S CHINA region, mean increases amount to ≈10% and 20%, respectively (Fig. 2). Besides increases in anthropogenic NO x emissions, ozone increases may be related to biomass burning, which is known to be an important driver of changes in tropical ozone as large amounts of ozone precursors are released, including CO, nonmethane hydrocarbons, and NO x . In fact, there are hints that Southeast Asian and Indonesian CO emissions from wildland fires (including both uncontrolled wildfires and fires ignited by humans either on purpose or inadvertently) may have increased by more than a factor of 3 between the 1970s and the 1990s (Schultz et al., 2008; their Fig. 4). The same increase is also found for continental Southeast Asia alone (M. Schultz, personal communication, 2009), as deforestation rates and the El Niño Southern Oscillation index are used to estimate the burned area in both regions when constructing the regional emissions time series. However, the estimates must be regarded with caution due to the sparseness of the data used to construct the emissions inventory.
Over Japan, very few aircraft data are available, especially from the GASP program. This especially applies to Northern Japan (N JP) (Fig. 2), where no long-term changes could be derived for the winter season and only highly uncertain estimates can be given for MAM. While large positive changes of 25-35% are seen in JJA and SON, it must be kept in mind that only very few data are available for the GASP period and that these are strongly biased toward 1978. Thus, JJA and SON changes may not be indicative of a longer-term change. It is important to recall that N JP and S JP include data from a relatively large longitudinal range from 115° E to 170° E. Assuming that similar changes can be expected over Japan itself and the adjacent ocean due to eastward advection in midlatitude westerly flow, additional information can be gained for the winter season in S JP. Small increases of ≈10% are found in this way. In all other seasons, increases are found that amount to 15%, 15%, and 10% in MAM, JJA, and SON, respectively.
The differences between GASP and MOZAIC data are in qualitative agreement with Japanese ozonesonde records at [...] Fig. 9). Since ozone precursor emissions have not substantially increased over the last two decades over Japan, but are rapidly increasing over China (Fig. 4b; Ohara et al., 2007), Naja and Akimoto (2004) suggested, using trajectory analysis, that Chinese NO x emissions could largely be responsible for the increased ozone levels during the 1990s over Southern Japan (Tsukuba, Kagoshima). Ozone levels over northern Japan, in contrast, were found to be dominated by air masses from Eurasia. During late spring and summer, these show an increase in LT ozone during the period 1970 to the 1980s and a slight decrease thereafter (Naja and Akimoto, 2004; their Fig. 12b).

Figure 6 shows long-term changes of quasi-zonal mean LS ozone deduced from the GASP and MOZAIC data sets as a function of equivalent latitude and potential temperature distance from the tropopause for the seasons of the year. At midlatitudes in winter and spring, statistically significant decreases on the order of −15% < ΔO 3 < 0 are found more than 10-20 K above the tropopause. Below, mixing of tropospheric air into the lowermost stratosphere occurs (Hoor et al., 2002), resulting in larger variability of ozone values which most likely destroys statistical significance. Whereas in winter, the amount of relative ozone reduction increases toward higher latitudes and altitudes, the largest decreases are seen at lower latitudes <40° N EL in spring. Similar to DJF and MAM, decreases are also found in summer at 50 to 80° N EL, but decreases are smaller (−10% < ΔO 3 < 0 in more than 90% of grid cells), and statistically significant changes are almost exclusively seen between 15 and 25 K and 50 to 70° N EL. In summer, the differences between the GASP and MOZAIC multi-annual averages may be hampered by the fact that most GASP measurements were collected in 1978 (cf. Sect. 2.3). 
Thus, the statistical significance of decreases might only mimic long-term changes in this particular season. In autumn (Fig. 6d), statistically significant decreases, amounting to −2 to −10%, are mostly confined to high altitudes above the tropopause between 55 and 75° N EL. The winter and spring decreases are in qualitative agreement with the downward trend between the 1970s and 1990s due to ozone depletion by halocarbons (e.g., Solomon, 1999; Staehelin et al., 2001) and the fact that downward transport of stratospheric ozone to the lowermost stratosphere is strongest in these seasons.

Upper troposphere
For comparison of GASP and MOZAIC with ozonesonde data in the UT, the aircraft data were averaged over the EUR SONDE and East USA regions as a function of potential temperature distance from the 2 PVU tropopause and compared to respective balloon data of the European (Uccle, MOHp, Payerne) and Wallops Island sounding stations, respectively, averaged over the GASP and MOZAIC observation periods. The vertical layers extend from 0 to −5 K, −5 to −10 K, −10 to −15 K, and −15 to −20 K potential temperature distance from the tropopause, corresponding to approximate mean metric distances of −0.9 km, −1.8 km, −2.4 km, and −3 km, respectively. An altitude shift was introduced to the ozonesonde data to take into account the response time of the sensors (see Appendix A).

Figure 7 displays the comparison over Europe. Most aircraft and sonde data are available in the range of 0 to −10 K below the tropopause in the chosen pressure range of 330 hPa to 235 hPa. There, GASP aircraft and sonde mostly agree within the range of uncertainty, average differences largely lying in the range of ≤±10% (Fig. 7, 1975-1979). Further contemporary information on UV photometer vs. sonde behaviour can only be gained from the Balloon Ozone Intercomparison Campaign (BOIC, June 1983-March 1984), where BM instruments were compared with a UV photometer (and ECC sensors) (Hilsenrath et al., 1986). While no direct comparison of BM instruments with the UV photometer used is available, Hilsenrath et al. (1986) illustrate that the MOHp BM sensor measured 10-25% less ozone than the Canadian and Wallops Island ECC instruments at 360-180 hPa. This result can be verified indirectly by comparing the GASP multi-annual means over EUR SONDE and East USA with data from the European BM stations and the ECC station at Wallops Island (see below).

Fig. 7. GASP, MOZAIC, and balloon UT ozone (1975-1979) over Europe at potential temperature distance from the 2 PVU tropopause. GASP and MOZAIC data have been averaged over the EUR SONDE region (Sect. 2.3). Left column: GASP and sonde vertical profiles, GASP data range within ±1σ (grey shading), number of daily means, number of years where data are available (second row of numbers), and grey triangles indicating a bias of the average toward one year, i.e. if the sample contains data from three or more (fewer than three) years and more than 50% (70%) of the data are from one year. Lines and numbers: GASP (black), Uccle (blue), MOHp (orange), Payerne (red). Second column: Relative differences between GASP and balloon data (sondes-GASP, %). Horizontal bars: 95% confidence intervals of differences. Third column: as first column, but for MOZAIC period. Fourth column: as second column, but for MOZAIC period. Differences between aircraft and sonde data in second and fourth column have only been displayed if the number of daily averages is ≥10. First row DJF, second row MAM, third row JJA, and fourth row SON.
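The layer definition used for the UT comparison (four 5 K layers from 0 to −20 K potential temperature distance from the tropopause) can be illustrated with a short binning sketch. This is only a minimal illustration, not the processing used in the study: the function names, the edge convention (edge nearer the tropopause inclusive, far edge exclusive), and the sample values are assumptions, and the study averages daily means rather than individual samples.

```python
import math

N_LAYERS = 4        # 0 to -5 K, -5 to -10 K, -10 to -15 K, -15 to -20 K
LAYER_DEPTH_K = 5.0


def layer_index(delta_theta):
    """Return the layer (0 = nearest the tropopause) for a potential temperature
    distance in K, or None if the value lies outside 0 to -20 K.

    Assumed convention: the edge nearer the tropopause is inclusive,
    the far edge is exclusive.
    """
    if delta_theta > 0.0 or delta_theta <= -N_LAYERS * LAYER_DEPTH_K:
        return None
    return int(math.floor(-delta_theta / LAYER_DEPTH_K))


def layer_means(samples):
    """samples: iterable of (delta_theta_K, ozone_ppbv) pairs.
    Returns the mean ozone per layer, with None for empty layers."""
    sums = [0.0] * N_LAYERS
    counts = [0] * N_LAYERS
    for delta_theta, ozone in samples:
        idx = layer_index(delta_theta)
        if idx is not None:
            sums[idx] += ozone
            counts[idx] += 1
    return [s / n if n else None for s, n in zip(sums, counts)]


# Made-up sample values for illustration; the last one lies above the tropopause
# and is therefore excluded from the UT layers.
demo = [(-2.0, 60.0), (-4.0, 70.0), (-7.5, 55.0), (-18.0, 45.0), (3.0, 120.0)]
means = layer_means(demo)
print(means)
```

Returning None for empty layers keeps missing data distinct from a zero mixing ratio, which matters when differences are only shown above a minimum sample count.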
For the 1994-2001 period, a much different behaviour over seasons can be identified (Fig. 7, 1994-2001 and Sondes-MOZAIC): There, all BM sondes measure significantly more ozone than MOZAIC throughout the year. In the upper three layers (0 to −15 K), where most data were gathered, most differences are in the range of 5-12 ppbv at Uccle and MOHp, corresponding to 5-25% relative differences depending on station and season. Somewhat larger deviations of about 10-15 ppbv are found for Payerne throughout the year (25-30% in DJF, 20-25% in the other seasons). An evaluation of the tropospheric ozone time series at Payerne (Jeannet et al., 2007) may possibly explain the larger offset for this station: when accounting for and subtracting ozone associated with background current, the time series at 700 hPa showed much improved agreement with the surface ozone series at Zugspitze (Jeannet et al., 2007; their Sect. 2.3.6, Fig. 2). However, long-term changes would not be altered when accounting for the background current, as the associated reduction of ozone was estimated to be approximately the same in the 1970s and in the later period (Jeannet et al., 2007; their Sect. 2.3.6).
The differences between MOZAIC and BM sonde measurements are of comparable magnitude to those identified at 400 hPa between MOZAIC at Frankfurt, Germany, and MOHp (Thouret et al., 1998b; their Fig. 10 and Plate 1) (5-20 ppbv and 5-45%, respectively). Note that we have compared our results to differences at a lower altitude in Thouret et al. (1998b) since in their analysis no distinction was made between tropospheric and stratospheric air masses.

Fig. 9. As Fig. 7, but aircraft data averaged over East USA (90° W-60° W, 30° N-50° N) and Wallops Island sonde data. Lines and numbers: GASP/MOZAIC (black) and Wallops Island (red). Differences between aircraft and sonde data have only been displayed if ten or more daily averages were available.
The large relative shift in sonde behaviour reported in Fig. 7 is reflected in Fig. 8, where relative differences between averaged UT ozone profiles of the 1970s and 1990s (1990s-1970s) derived from GASP/MOZAIC are compared with respective differences from European BM sondes. Between the two aircraft averages, there is no evidence for a long-term change. (Note that in SON, the differences derived from GASP/MOZAIC should be treated with caution due to limited GASP data availability in combination with a strong bias toward the year 1978, cf. Sect. 2.3.) In contrast, at all sounding stations positive changes are indicated at all levels and in all seasons. At those levels where most data are available (0 to −10 K), sonde differences amount to 10-25% and 10-30% at Uccle and MOHp, respectively, and are somewhat larger at Payerne, amounting to 15-35%.
At Wallops Island, relatively few data are available for building multi-annual averages of the 1970s (numbers in Fig. 9) as the early ascent data are provided in relatively low vertical resolution (annual average of 870 m in the pressure range of 330 hPa to 235 hPa). Still, despite the relatively large uncertainty related to this problem, clear positive deviations of sonde from aircraft data are found, the sonde measuring more ozone than GASP by 5-20 ppbv depending on season (15-30%, Fig. 9, 1975-1979). Note that the indicated differences have been estimated from those levels only where at least 10 daily means of aircraft and sonde data were available for multi-annual averaging. In the MOZAIC period, on average, sondes show a slightly smaller positive bias of about 5-10 ppbv or 5-20% (Fig. 9, 1994-2001). Specifically, while in DJF, differences are comparable in both periods, they are somewhat larger in the 1970s in MAM and JJA. Our results concerning the behaviour of the Wallops Island ECC sensor of the 1970s are similar to those gained at BOIC. There, the Wallops Island sonde also measured up to 20% more ozone in the troposphere than the average of all participating instruments (Hilsenrath et al., 1986). Additionally, when contrasting the GASP with BM and ECC ozonesonde data over Europe and East USA, respectively, our results agree with earlier studies stating that early ECC sensors measured more ozone than BM instruments. The order of magnitude of differences of 10-25% indicated in the literature (Dütsch, 1970, 1981; Hilsenrath et al., 1986; Beekmann et al., 1994) is confirmed by this study.
For the 1990s, our results are qualitatively and quantitatively supported by two other studies: Thouret et al. (1998b) compared Wallops Island data of 1980-1993 with MOZAIC profile data at New York for 1994-1995 in the free troposphere; differences mostly lay in the range between 5 and 25%. Analogous comparisons of other ECC sounding stations with MOZAIC profiles yielded similar results (Thouret et al., 1998b; their Plate 1). In addition, in a comparative evaluation of the ECC balloon time series at Trinidad Head (California, USA) and US west coast MOZAIC airport profiles of the period 1997-2006, the sonde mean was about 6% higher than the MOZAIC average over four airports, confirming the other studies (D. Parrish, personal communication, 2008).

Fig. 10. As Fig. 8, but aircraft differences calculated for the East USA region (90° W-60° W, 30° N-50° N) (black diamonds) and sonde differences for Wallops Island station data (red dots). Differences are only displayed if ten or more daily means are available for both aircraft and sonde data at each level.
Table 2. Mean relative LS ozone differences between sondes and aircraft for the GASP and MOZAIC periods (sonde-aircraft, %), and averaged long-term changes by sondes (1990s-1970s, %/decade). Averages over [...]. "sigma" denotes standard deviations over boxes. The fraction of boxes where differences are positive is also given.

Long-term changes between the 1970s and 1990s derived by GASP/MOZAIC for the East USA region and the Wallops Island station are displayed in Fig. 10. The GASP/MOZAIC differences clearly indicate increases in UT ozone throughout the year of up to 20% depending on season. Changes derived from the Wallops Island data are smaller by 10-20% in MAM and JJA and are comparable in DJF and SON. Note that some averages of the 1970s are strongly biased toward one year, e.g. the Wallops Island DJF means, or the GASP summer and autumn averages (Fig. 9). Thus, those averages may not be representative for the whole 1975-1979 period and the "long-term changes" at these levels may rather reflect year-to-year variability in UT ozone. The possibly slightly smaller differences between aircraft and ECC multi-annual averages in the 1990s than in the 1970s may be related to the small number of data available in the 1970s. Probably more important, however, is the change in the strength of the sensing solution, which amounted to 1.5% KI-b (potassium iodide buffer) in 1975-1979 and 1994 and was changed to 1% KI-b in 1995: Surface ozone testing of an ECC sensor in Boulder, Colorado, in 1999 and 2000 against a UV photometric surface ozone analyser showed that the 1% KI-b and 1.5% KI-b solution sondes measured about 7 and 14% higher ozone than the UV photometer, respectively (Johnson et al., 2002). These differences agree well with the different biases that we find between balloon and aircraft data in the 1970s and 1990s and may thus explain the differences seen in long-term changes. 
Hence, the long-term changes and trends derived from Wallops Island station data, that include both periods when the 1.5% KI-b and the 1% KI-b solution were used, may be underestimated by a few percent.
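The effect of a change in instrument bias on a derived long-term change can be made explicit with a small worked example. The sketch below assumes purely multiplicative biases and applies the surface test values quoted from Johnson et al. (2002), about 14% for the 1.5% KI-b solution and about 7% for the 1% KI-b solution, unchanged to both periods; transferring these surface values to the UT, and the function and constant names, are assumptions made for illustration only.

```python
def apparent_change(true_change, bias_early, bias_late):
    """Relative change reported by an instrument whose fractional (multiplicative)
    bias differs between the early and the late period.

    All arguments and the return value are fractions (0.10 means 10%).
    """
    return (1.0 + true_change) * (1.0 + bias_late) / (1.0 + bias_early) - 1.0


# Biases relative to a UV photometer as quoted from the surface tests; using them
# for UT data is an assumption of this sketch.
BIAS_KI_1P5 = 0.14  # 1.5% KI-b solution (1975-1979 and 1994)
BIAS_KI_1P0 = 0.07  # 1% KI-b solution (from 1995)

# Case 1: no real change in ozone.
seen_for_no_change = apparent_change(0.0, BIAS_KI_1P5, BIAS_KI_1P0)

# Case 2: a real increase of 20%.
seen_for_20_percent = apparent_change(0.20, BIAS_KI_1P5, BIAS_KI_1P0)

print(f"no real change -> apparent {100 * seen_for_no_change:+.1f}%")
print(f"real +20%      -> apparent {100 * seen_for_20_percent:+.1f}%")
```

Under these assumptions the sonde record would show a change several percentage points lower than the real one, which is in line with the statement that the Wallops Island changes may be underestimated by a few percent.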

Lower stratosphere
Figures 11-13 and Table 2 show that the aircraft/sonde comparisons of the two periods are clearly different: While in 1975-1979, all BM sonde data agree with GASP in the range of uncertainty or measure slightly less ozone, sondes measure significantly more ozone than MOZAIC in the 1990s (≈5-10% at Uccle, 5-15% at MOHp, and 10-20% at Payerne, cf. Table 2). Note that Uccle exchanged their BM sensor for an ECC device at the end of 1997, and that the Uccle time series are homogenised (see Appendix A). Thus, while the comparison indicates that Uccle tends to measure more ozone than MOZAIC, it cannot be determined whether this is due to the BM or ECC device or both. Overall, the LS comparison confirms the results from the UT (Sect. 3.3.1). Considering that a) longitudinal variability is largely reduced in the EL framework, that b) GASP LS data coverage is better than in the UT for the EUR region, and that c) the LS differences of the GASP period are consistently negative over large areas in MAM, JJA, and SON (Figs. 11-13 and Table 2), this may be a hint that the BM sondes of the 1970s even measured somewhat less ozone than GASP. From the UT comparison, the conclusion was agreement in the range of uncertainty (cf. Sect. 3.3.1).

Fig. 11. Differences of LS ozone as a function of potential temperature distance from the 2 PVU tropopause and equivalent latitude. Left column: Uccle-GASP (1975-1979) (%), middle column: Uccle-MOZAIC (1994-2001) (%), and right column: long-term changes of Uccle soundings deduced from 1970s and 1990s multi-annual averages (1990s-1970s, %). Multi-annual averages have been calculated from daily means. Hatched boxes indicate where differences are statistically significant at the 95% level, grey triangles where Uccle 1970s data are biased toward one year (≥50% from one year), and pink triangles where early Uccle data are available from three years only. Differences have only been displayed where at least 5 daily means and data from at least three (six) years are available for the GASP (MOZAIC) period.
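The display criterion stated in the Fig. 11 caption (at least 5 daily means, and data from at least three years for the GASP period or six years for the MOZAIC period) amounts to a simple per-box filter. The following sketch shows one way to express it; the function name and argument layout are hypothetical and are not taken from the study's code.

```python
# Minimum number of years with data per period, as stated in the caption.
MIN_YEARS = {"GASP": 3, "MOZAIC": 6}
MIN_DAILY_MEANS = 5


def box_is_displayed(n_daily_means, n_years, period):
    """Return True if a grid box meets the display criterion for the given period.

    period must be "GASP" or "MOZAIC"; any other value raises KeyError.
    """
    return n_daily_means >= MIN_DAILY_MEANS and n_years >= MIN_YEARS[period]


print(box_is_displayed(12, 3, "GASP"))    # enough days and years for GASP
print(box_is_displayed(12, 5, "MOZAIC"))  # too few years for MOZAIC
```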
The differences in the aircraft/sonde comparisons of the 1970s and 1990s directly impact the LS long-term differences deduced from GASP/MOZAIC and sondes (Fig. 6, Figs. 11-13, and Table 2): While from GASP/MOZAIC, slightly negative long-term changes are inferred in all seasons, sonde differences, on average, are much larger and point to positive changes that vary with season (5-25% at Uccle, 10-40% at MOHp, and 15-25% at Payerne). Note that the large upper number at MOHp (Table 2) has been derived from a very small number of grid boxes just above the tropopause (Fig. 12), which may not be completely representative of the whole LS averaging region. The largest sonde long-term changes are all found in SON indicating that decadal changes are largest in autumn, which is consistent with GASP/MOZAIC (Fig. 6, Table 2).

Discussion of comparison of changes by aircraft and BM ozonesondes
Changes in the bias between BM balloon and aircraft ozone between the 1970s and 1990s result in differing long-term changes derived from the two types of instruments, the changes from aircraft being generally smaller than from ozonesondes. Since both sonde and MOZAIC data are widely being used to investigate short- and long-term ozone trends in the troposphere and UT/LS (e.g., Thouret et al., 2006; Bortz et al., 2006; Oltmans et al., 2006), it is important to document the differences. Several factors might cause potential biases, which are discussed in detail in Appendix B. Summarising, we suggest that the cause(s) for the different long-term changes by GASP/MOZAIC and European BM sensors might rather be connected to data quality problems of the ozonesondes for the following reasons: Wallops Island shows clear and similar positive deviations from the aircraft data sets both in the 1970s and 1990s (Sect. 3.3.1). The sonde behaviour is consistent in both periods considering the change in the strength of the sensing solution. As GASP performance can be expected to be the same over East USA and Europe, there is clear indication that the European BM sondes of the 1970s performed differently than the ECC instrument at Wallops Island, the European BM sondes largely agreeing with laboratory considerations (cf. Appendix B, B.). Finally, the BM sondes of the 1990s appear to perform differently than in the 1970s (cf. Appendix B, D.). Assuming the 1970s and 1990s aircraft measurements to be correct, it may be suspected that changes in pre-flight preparation and operating procedures, as well as modifications in the manufacture of ozonesondes might have influenced long-term changes and trends (SPARC, 1998). In fact, Hogrefe et al. (1998) presented evidence for discontinuities in ozonesonde time series by statistical analysis, among them long-term data from Payerne and MOHp. 
These breaks may significantly affect long-term trend estimates. Furthermore, gaining better understanding of the discrepancies between MOZAIC and sonde performance in the 1990s is also important with respect to improved knowledge of the reliability of the individual instruments. We therefore encourage further laboratory and field intercomparisons of BM ozonesondes vs. UV photometer, as well as coordinated activities to obtain measurements from simultaneous flights by MOZAIC and other aircraft to clarify differences between the different measurement techniques.

Fig. 13. As Fig. 11, but comparison of Payerne and aircraft data.

Summary and conclusions
Differences between multi-annual averages of UT/LS ozone of the periods 1975-1979 and 1994-2001 were calculated from the data sets of the GASP and MOZAIC aircraft programs to derive long-term changes. The analysis was carried out separately for the UT and LS using ERA40 dynamical tropopause information, interpolated spatially and temporally to the aircraft and balloon coordinates.
Long-term differences are strongly dependent on region and can be summarised as follows:
-Largest increases are found over the Middle East and India in spring and summer. The summer increase over India results in a changed seasonal cycle with the summer minimum of the 1970s having disappeared in the 1990s.
-Similarly, over Southeast Asia, spring and summer ozone increased significantly over the last two decades.
-Over Southern Japan, significant increases are found in MAM, JJA, and SON. Over Northern Japan, due to restricted sample size of the GASP data set, estimated long-term changes must be considered with care.
-Over the Northeastern United States and over the Atlantic, long-term differences are mostly positive. Largest increases are found in winter and still considerable increases are seen in spring. In summer and autumn, no statistical evidence for long-term changes is found.
-Over Europe, long-term differences are rather small. Significant positive changes are only found in spring.
-Over the Western United States, no evidence for a long-term change in UT ozone is found.
In the LS, differences between the GASP and MOZAIC multi-annual averages were calculated as function of EL and potential temperature distance from the tropopause. Statistically significant decreases are mostly found at midlatitudes in winter and spring in agreement with the downward trend between the 1970s and 1990s, which is plausible due to ozone depletion by halocarbons and the fact that downward transport of stratospheric ozone to the lowermost stratosphere is strongest in these seasons.
Long-term differences deduced from GASP/MOZAIC were compared with respective changes derived from ozonesondes of the three European stations Uccle, MOHp, and Payerne, as well as of the US station Wallops Island. In the UT, regionally averaged profiles of aircraft data were compared to respective sounding data. In the LS, the comparison was carried out in the EL/potential temperature framework. The results of the comparison can be summarised as follows: The early 1970s European BM sonde data agree with GASP within the range of uncertainty (UT) or measure slightly less ozone (LS). In contrast, the more recent sensors show consistently higher ozone values than MOZAIC both in the UT and LS. The unequal behaviour in the 1970s and 1990s results in differing long-term changes over Europe derived from aircraft and sondes, with changes deduced from sondes being considerably larger than from GASP/MOZAIC.
The comparison of UT ozone over the eastern United States derived from GASP/MOZAIC and from the ECC station Wallops Island shows that the sonde measured more ozone than the aircraft data both in the 1970s and 1990s with indications that differences may be slightly smaller in the 1990s. A plausible cause for the reduced differences may be the reduction in the strength of the sensing solution of the sonde, which is known to lead to reduced ozone mixing ratios. Consequently, long-term changes from aircraft are within the range of uncertainty or slightly larger than deduced from the sonde. The change in sensing solution may explain why long-term changes are more positive from the aircraft than for the Wallops Island data by a few percent.
Acknowledging uncertainties due to sample size restrictions of aircraft and ozonesonde data especially in the 1970s, restricted precision of BM sondes, and methodology of our analysis, we found evidence that the sensitivity of BM sensors to ozone in the UT/LS was different in the 1970s and 1990s, resulting in long-term changes that may be considerably overestimated by the sondes. This applies to all BM series, and particularly to the Payerne station. Therefore, we suggest acknowledging potential uncertainties of the European BM sondes when interpreting their long-term tropospheric ozone trends, e.g. in comparison with trends derived from numerical simulations. Finally, considering the common use of both MOZAIC and BM sonde data for deriving long(er)-term trends, improved understanding of the discrepancies found in the 1990s would be desirable.

Ozonesonde data
In this appendix, relevant information on processing and corrections of the ozonesonde data is given.
BM instruments were used at MOHp and Payerne throughout both periods. At Uccle, the sensor type was changed to ECC in 1997. To account for this and other changes in the procedures of handling the sensors during more than 30 years of observations, the Uccle time series, available at WOUDC, were homogenised. The homogenisation procedure is described in ftp://ftp.kmi-irm.be/dist/meteo/hugo/publ/1999/ (o3prof.pdf) (De Backer, 1999). At the US Wallops Island station, ECC ozone sensors were flown.
The BM sonde data at MOHp and Payerne have been corrected by linear scaling with column ozone measurements, as recommended by the WMO standard procedure for BM sondes. The correction factor (CF), representing the correction applied when comparing the ozone column from integration of the recorded profile and a simultaneous column measurement using a Dobson or Brewer spectrophotometer, was used as a quality check. The range of allowed CF has been chosen according to Logan (1994).
At Uccle, as a result of the homogenisation, the correction applied is different from the standard procedure. The main steps of the homogenisation will be summarised in the following as being relevant in the comparison with the other BM ozonesonde data of the 1970s (Sect. 3.3). In the homogenisation process, preliminary ozone profiles are calculated first including several corrections, e.g. for altitude error and background current. However, no pump correction profile is applied. The pump correction values at every pressure level, determined after ground calibration, are adjusted such that the profile integral matches the Dobson (before 1989) or Brewer (since 1989) column (De Backer, 1999). This procedure implies that the resulting profiles do not need to be corrected by the standard CF anymore. The homogenisation covers one problem of the tropospheric and UT/LS part of the 1970s data: to obtain the preliminary profile, a correction interpreted as negative background current resulting from impurities in the sensor was necessary; after a change in preconditioning of the sondes in 1981, it was found that there were discrepancies of 15-25% at tropospheric and UT/LS levels in the ascent and descent ratios before and after the change, i.e. the older sondes measured less ozone than the newer sondes. The correction applied in the homogenisation procedure approximately equals the mean correction by the standard CF, had it been applied (CF ≈1.2 in 1975-1979). However, the pump efficiency problems, represented by the altitude dependent correction factors indicated in the WOUDC files, only lead to a correction of 2 to 3% in the UT/LS. The ECC Wallops Island ascents archived in WOUDC are Dobson normalised for the period 1970-1982, and provided non-normalised afterward (S. Oltmans, personal communication, 2006). In many of the latter cases a CF, while existing, is not indicated in the data files.
In addition, during both 1975-1979 and 1994-2001, for some ascents normalisation was not possible as Dobson total ozone measurements were not available. In this study, the (1970s) 1990s data were (re-)normalised using alternative CFs that are consistent for the whole 1970-2004 Wallops Island record (S. Oltmans, personal communication, 2006). This was achieved by using SBUV climatological profiles (McPeters et al., 1997) for the ozone residual at altitudes above 7 hPa or the top of the sounding, if it reached 30 hPa. For the 1970s, the existing normalisation was removed before applying the alternative CFs. Ascents for which no CFs were available were treated as having an ideal CF of 1. This is an acceptable simplification as the average CF, calculated from the SBUV data set, is very near to unity both in 1975-1979 and 1994-2001 (0.99 and 0.983, respectively). Since a CF is available for a large fraction of ascents (94% and 79% in the 1970s and 1990s, respectively), it can be assumed that on average the CFs of the non-normalised ascents do not significantly deviate from 1 either.
When comparing aircraft UV photometer and balloon data in the UT/LS region, it is of crucial importance to take into account the response time of the sonde sensor, which results in an altitude shift of the balloon vs. the aircraft data. Ignoring the altitude correction may lead to significant errors in the estimated differences between aircraft and balloon ozone in the LS, where vertical ozone gradients are most pronounced. Since the usual practice is not to correct for this lag in response (e.g., SPARC, 1998), we introduce a correction using the following assumptions: The ozonesonde response time, defined as the time required for the sonde signal to decay to a fraction 1/e of its initial value when setting ozone temporarily to zero, has been estimated at approximately 30 s. Assuming an average balloon ascent velocity of 5 m s−1 leads to an altitude correction of 150 m. Using the US standard atmosphere (1976), all sonde pressure levels p were converted to altitudes z, which were then shifted downward by the above-indicated distance (z_corr = z(p) − 150 m) and reconverted in an analogous way to corrected pressure levels p_corr. Vertical positioning of the ozone data in terms of potential temperature (Sect. 2.3) was obtained by linearly interpolating all potential temperature values θ(p) to the corrected pressure levels, giving θ_corr(p_corr). In principle, average ozone mixing ratios are increased in a given layer by the altitude shift, the impact being largest at altitudes with maximum vertical gradients. Note that by vertically shifting the profiles, they become inconsistent with sonde integrated ozone and thus with the CF. However, given the applied shift, the effect on sonde integrated ozone and hence the CF is marginal (≈2%).
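The altitude shift described above can be sketched as follows. This is a minimal illustration, not the processing code used for the analysis: the function names are hypothetical, only the two lowest layers of the US Standard Atmosphere (1976) are implemented (geopotential altitudes up to 20 km, which covers the UT/LS levels discussed here), and the subsequent interpolation of potential temperature to the corrected levels is not shown.

```python
import math

# US Standard Atmosphere (1976) constants for the two lowest layers.
G0 = 9.80665        # gravitational acceleration, m s^-2
M_AIR = 0.0289644   # molar mass of dry air, kg mol^-1
R_GAS = 8.31432     # universal gas constant as used in the 1976 standard, J mol^-1 K^-1
P0 = 101325.0       # sea-level pressure, Pa
T0 = 288.15         # sea-level temperature, K
LAPSE = 0.0065      # tropospheric lapse rate, K m^-1
H11 = 11000.0       # base of the isothermal layer, m
T11 = 216.65        # temperature of the isothermal layer, K
EXPONENT = G0 * M_AIR / (R_GAS * LAPSE)
P11 = P0 * (T11 / T0) ** EXPONENT        # pressure at 11 km, Pa
SCALE_H = R_GAS * T11 / (G0 * M_AIR)     # scale height of the isothermal layer, m

# Assumptions taken from the text: 30 s response time, 5 m/s ascent rate.
RESPONSE_TIME_S = 30.0
ASCENT_RATE_M_S = 5.0
SHIFT_M = -RESPONSE_TIME_S * ASCENT_RATE_M_S   # -150 m (downward)


def pressure_to_altitude(p_hpa):
    """Convert pressure (hPa) to standard-atmosphere altitude (m), valid below 20 km."""
    p = p_hpa * 100.0
    if p >= P11:
        return (T0 / LAPSE) * (1.0 - (p / P0) ** (1.0 / EXPONENT))
    return H11 - SCALE_H * math.log(p / P11)


def altitude_to_pressure(z_m):
    """Convert standard-atmosphere altitude (m) to pressure (hPa), valid below 20 km."""
    if z_m <= H11:
        p = P0 * (1.0 - LAPSE * z_m / T0) ** EXPONENT
    else:
        p = P11 * math.exp(-(z_m - H11) / SCALE_H)
    return p / 100.0


def shift_pressure_level(p_hpa, shift_m=SHIFT_M):
    """Return the corrected pressure level after shifting the altitude by shift_m."""
    return altitude_to_pressure(pressure_to_altitude(p_hpa) + shift_m)
```

For example, `shift_pressure_level(250.0)` returns a pressure somewhat larger than 250 hPa, because the ozone value recorded at 250 hPa is assigned to a level 150 m lower.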
De Muer and De Backer (1992) showed that their 1970s meteorological sonde (VIZ) recorded smaller geometric altitudes than a radar instrument that was used to track the balloon ascents. For instance, at 10 km altitude, the difference amounted to approximately −100 m (De Muer and De Backer, 1992, their Fig. 3). This effect was accounted for in the homogenisation procedure of the Uccle data by shifting the sonde profiles upward by the difference profile between balloon and radar. However, for consistency with the Payerne and MOHp data, for which a pressure correction was not applied, we have removed this correction by shifting the Uccle sonde data back down by 100 m. This leads to an overall shift of the Uccle 1970s data of −250 m (ozone sensor response plus removal of the pressure sensor correction: −150 m − 100 m). It is generally assumed that the more recent meteorological sondes have no significant pressure response time (W. Steinbrecht, personal communication, 2008; De Backer, 1999). For this reason, the Uccle sonde data of the 1990s have only been corrected for the response of the ozone instrument.

Potential biases of aircraft and ozonesonde data
Both aircraft and ozonesonde data may potentially be biased, most likely because of principal features of the measurement devices and/or, in the case of ozonesondes, post-processing of the raw data. In the following, we discuss several factors which might cause differences between aircraft and sonde measurements:
A. UV photometry. The technique of UV photometry measurement of ozone is very accurate and precise and generally judged superior to chemical detection of ozone in iodine/iodide solution as done in ozonesondes. Both GASP and MOZAIC programs carried out extensive quality control checks (Sect. 2.1). Additional information on the performance of the UV analysers can be gained from quasi-simultaneous and spatially coinciding measurements by other devices. However, the only other measurements to compare the GASP data with are from the 1970s ozonesondes at the European and Wallops Island stations, for which the limited number of spatially and temporally overlapping measurements might limit clear statements. For the 1990s and MOZAIC, very good agreement was obtained by two near simultaneous flights by a MOZAIC A340 and a SwissAir B747 aircraft equipped with a UV-ozone instrument within the frame of the NOXAR (Nitrogen Oxides and Ozone along Air Routes) project on 20 December 1995, over the North Atlantic (Dias-Lalcaca et al., 1998, their Fig. 4).
B. Laboratory simulations of BM ozonesondes of the 1970s. Insights into the behaviour of the 1970s BM ozonesondes are presented in a paper by Tarasick et al. (2002): in laboratory experiments, they tested the response of their Canadian BM sondes in comparison with an ozone calibrator. They found that sonde behaviour strongly depends on pre-flight preparation procedures: when the sondes were prepared according to the original procedures recommended by the Brewer-Mast company (Mueller, 1976), the instruments indicated 10-30% lower ozone than the calibrator in the troposphere. In contrast, preparations according to Claude et al. (1987) showed a much reduced or eliminated low bias. Hence, in the altitude range of our analysis (330-195 hPa), after Dobson scaling, BM sondes prepared according to the original procedures should underestimate the ozone concentration by a few percent to slightly more than 10% (Tarasick et al., 2002; their Fig. 7). Tarasick et al. (2002) also showed that this behaviour can explain the average CF (1.255) encountered for the 1970s Canadian BM record and the discrepancy in tropospheric measurements.
The problems studied by Tarasick et al. (2002) might be applicable to both the Uccle and Payerne stations, while MOHp already prepared their instruments according to Claude et al. (1987) (W. Steinbrecht, personal communication, 2007). In fact, comparing ascent and descent ratios before and after changing the preconditioning of the Uccle sondes in 1981, the instruments before 1981 were identified to measure too low ozone concentrations in the troposphere (De Backer, 1999). The problem was accounted for in the homogenisation procedure (cf. Appendix A). The behaviour of the Payerne instrument of the 1970s should also be similar to the Canadian device. The average CF of the ascents used in this study is 1.21, a value very similar to that obtained for the Canadian sondes. In contrast, the altitude dependent response of the MOHp sensor should, according to laboratory experiments in the 1990s, be near one throughout the troposphere and stratosphere (see discussion under C.). However, systematic differences in sonde behaviour (see Sect. 3.3.1), expected to be on the order of up to 10%, of the Payerne vs. the MOHp and Uccle devices (Fig. 7) cannot be inferred from our analysis.
C. Laboratory simulations of the MOHp BM sonde of the 1990s. To test the behaviour of currently used sonde types, ozonesondes were tested in a simulation chamber against a UV photometer (Jülich ozonesonde intercomparison experiment, JOSIE 1996; Smit and Kley, 1998). During JOSIE 1996, intercomparisons between a laboratory UV photometer and different types of ozonesondes were carried out. Among others, the original BM instrument from MOHp prepared according to Claude et al. (1987) was tested. The MOHp sonde differed from the UV photometer by only −3% ±10 (8) [5]% at 0-5 km (5-10 km) [10-15 km] for a typical midlatitude profile. However, this response found in the laboratory does not agree with our analyses of Sect. 3.3.1 and 3.3.2, which showed a systematic positive offset from MOZAIC data.
D. MOZAIC vs. BM ozonesondes of the 1990s. While in our analysis (Sect. 3.3.1), we compared multi-annually and regionally averaged MOZAIC with respective ozonesonde profiles, Thouret et al. (1998b) compared six individual MOZAIC profiles at Frankfurt (50° N, 9° E) with sonde profiles of the MOHp station (48° N, 11° E) (approx. 300 km distance) that had been sampled at the same time (Thouret et al., 1998b; their Fig. 18). While two cases showed remarkable agreement throughout the troposphere, in three of the other four cases the sonde measured higher ozone than MOZAIC in the free troposphere. This comparison, further ones of MOZAIC and MOHp multi-annual averages described in the same publication, and our own analysis of Sect. 3.3.1 and 3.3.2 all indicate that the BM device at MOHp measures, on average, more ozone than MOZAIC in the troposphere and UT/LS, contradicting the JOSIE results. Additionally, our analysis indicates that the behaviour of the Uccle and Payerne stations is very similar to that of MOHp, also showing systematic positive deviations from MOZAIC.
The differences between average ozone measured by sondes and MOZAIC could also result from principal limitations in the methodology applied in the comparison as discussed by Thouret et al. (1998b). In this study, regional averages of UT aircraft measurements are compared to average balloon profiles of stations located somewhere in this region. Potentially, spatial variations in UT ozone could either bias aircraft or sonde means. However, from the climatological distribution of UT MOZAIC ozone over Europe no significant spatial variations can be inferred in the different seasons (Fig. 1b). It is thus very unlikely that the spatial separation between the measurements can explain the observed differences. A strong argument for the robustness of the biases of BM sonde vs. MOZAIC ozone and against significant influences by the methodology is the fact that the differences remain when analysing the data in the LS in the EL system, where regional variability at a given EL should be largely suppressed.
E. On the use of the correction factor CF. One of the main corrections applied to BM measurements consists of scaling the ozonesonde profiles to an independent measurement of the ozone column carried out by a Dobson or Brewer spectrophotometer (cf. Appendix A). The scaling practice was introduced in the 1960s, because it was found that the total ozone amount directly inferred from BM sonde measurements was about 10% lower than the comparable optically observed total ozone value (Dütsch, 1966). To adjust the balloon data, the whole profile is multiplied by the CF.
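The scaling step can be illustrated with a short sketch. It is a simplified example under stated assumptions, not the WMO standard procedure itself: the function names are hypothetical, the profile is given as volume mixing ratios in ppbv on pressure levels ordered from the surface upward, the column is integrated with the trapezoidal rule, and the ozone residual above the top of the sounding is passed in as a separate, externally estimated value.

```python
G0 = 9.80665          # gravitational acceleration, m s^-2
M_AIR = 0.0289644     # molar mass of dry air, kg mol^-1
N_A = 6.02214076e23   # Avogadro constant, mol^-1
DU = 2.6867e20        # molecules m^-2 per Dobson unit


def sonde_column_du(p_hpa, vmr_ppbv):
    """Integrate an ozone profile (ppbv on pressure levels, surface first) to a column in DU."""
    total = 0.0
    for i in range(len(p_hpa) - 1):
        dp_pa = (p_hpa[i] - p_hpa[i + 1]) * 100.0
        mean_vmr = 0.5 * (vmr_ppbv[i] + vmr_ppbv[i + 1]) * 1e-9
        total += mean_vmr * dp_pa
    # Molecules of air per m^2 per Pa is N_A / (g * M_air).
    return total * N_A / (G0 * M_AIR) / DU


def correction_factor(reference_du, integrated_du, residual_du=0.0):
    """Ratio of the spectrophotometer column to the sonde column plus residual."""
    return reference_du / (integrated_du + residual_du)


def apply_cf(vmr_ppbv, cf):
    """Scale the whole profile by one constant factor."""
    return [cf * v for v in vmr_ppbv]
```

Because a single factor is applied at all levels, any error that mainly affects the stratospheric part of the profile is propagated unchanged to the tropospheric values.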
The application of the CF is controversial, since the use of one constant correction value may not be appropriate for the whole profile, and because errors in the total ozone measurement may be passed on to the sonde profile. In particular, errors occurring in the tropospheric measurement are only corrected properly if there is a significant effect on the total ozone measurement and, thus, on the CF. Tropospheric ozone concentrations are low and therefore rather affected by background current. It has been recommended not to use the CF for tropospheric ozone (Beekmann et al., 1995;WMO, 1999;De Backer et al., 1998;Thouret et al., 1998b).
Assuming that the general underestimation of sonde total ozone by BM sondes can mainly be attributed to problems in the stratospheric measurement, it would in fact be more appropriate not to apply the CF for the tropospheric part of the ascents. Concerning today's BM sondes, Dobson scaling may introduce a small positive bias to tropospheric measurements increasing the ozone concentration by a few percent (Thouret et al., 1998b). Not using the CF, differences between balloon and MOZAIC UT measurements would be decreased and thereby improved by an average of 5% and 7% at MOHp and Payerne, respectively. However, the systematic offset between sonde and MOZAIC data would not be removed completely.
Not applying the CF to the 1970s sondes leads to a reduction of UT ozone by 8% and 21% at MOHp and Payerne, respectively. For MOHp, this results in reduced ozone values that are still in the range of uncertainty (≥−10%). Somewhat larger, partly statistically significant negative deviations result for Payerne with differences from GASP ranging from −18% to −3% (not shown). The low bias of the uncorrected Payerne UT data supports the hypothesis that early tropospheric ozone measurements were in fact too low (cf. C.), and that using the CF may (partly) compensate for this problem.
The effect of not considering the CF on long-term ozone changes depends on how the CF has changed over time. At MOHp, no significant change in CF occurred between the 1970s and 1990s, hence not applying the correction does not significantly alter the long-term differences (not shown). At Payerne, however, the CF decreased from 1.21 in 1975-1979 to 1.09 in 1994-2001. Thus, the reduction in differences between MOZAIC and Payerne data in the 1990s is overcompensated by increased differences in the 1970s leading to long-term differences in UT ozone that are increased by 5-15%. This changes the comparison between aircraft and sonde differences for the worse and may represent an overestimation of the long-term increase (see discussion above and C.).
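As a rough plausibility check of the Payerne numbers, the following sketch removes a uniform multiplicative CF from both periods. It assumes, for illustration only, that the period-mean CFs quoted above apply equally to every UT value; the differences reported in the text are derived from the individual ascents, so the result is indicative rather than exact. The function name is hypothetical.

```python
def ratio_without_cf(ratio_with_cf, cf_early, cf_late):
    """Late/early ozone ratio after dividing each period by its mean CF."""
    return ratio_with_cf * cf_early / cf_late


# Payerne mean CFs from the text: 1.21 (1975-1979) and 1.09 (1994-2001).
payerne_factor = ratio_without_cf(1.0, 1.21, 1.09)
```

Under this assumption the late-to-early ratio grows by a factor of 1.21/1.09, i.e. by roughly 11%, which lies within the 5-15% range stated above.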