Statistical analysis of water vapour and ozone in the UT / LS observed during SPURT and MOZAIC

A statistical analysis of the comparability of water (H2O) and ozone (O3) data sets sampled during the SPURT aircraft campaigns and the MOZAIC passenger aircraft flights is presented. The Kolmogoroff-Smirnoff test reveals that the distribution functions of the SPURT and MOZAIC trace gases differ from each other at a confidence level of 95%. A variance analysis shows different variability characteristics in the trace gas data sets. While the SPURT H2O data only contain atmospheric processes that vary on a diurnal or synoptic timescale, the MOZAIC H2O data also reveal processes that vary on inter-seasonal and seasonal timescales. The SPURT H2O data set does not represent the full MOZAIC H2O variance in the UT/LS for climatological investigations, whereas the variance of O3 is much better represented. SPURT H2O data are better suited to the stratosphere, where the MOZAIC RH sensor loses its sensitivity.


Introduction
The composition of the tropopause region is strongly determined by large- and small-scale transport of trace gases. One governing process is the exchange of air masses between the stratosphere and the troposphere. Diabatic ascent or descent, such as convection or stratospheric intrusions from the overworld (above the 380 K isentrope), leads to vertical exchange, and rapid quasi-isentropic transport from and to the upper troposphere across the extratropical tropopause leads to horizontal exchange (Stohl et al., 2003; Holton et al., 1995). Mixing of stratospheric and tropospheric air leads to a so-called mixing layer around the tropopause (Hoor et al., 2002). These processes result in a highly variable trace gas distribution in the upper troposphere and lower stratosphere (UT/LS). The strong variability of these processes in time and space thus implies a highly variable composition of the tropopause region in different seasons and different geographical regions. Several airborne projects, e.g. SPURT and MOZAIC, were therefore performed to measure the large-scale distribution of trace gases in the UT/LS.

Correspondence to: A. Kunz (a.kunz@fz-juelich.de)
Within the MOZAIC (Measurement of Ozone and Water Vapour by Airbus In-Service Aircraft) programme, civil aircraft in regular service make routine measurements of chemical species in the atmosphere with almost global coverage. The project was initiated in 1993 with automatic in-situ H2O and O3 measurements onboard up to five long-range A340 aircraft (Marenco et al., 1998). To date at least four flights are performed each day.
The SPURT (Trace gas transport in the tropopause region) campaigns between November 2001 and July 2003 delivered the distribution of a wide range of trace gases in the UT/LS region above Europe. As the campaigns cover all seasons equally, an accurate data set with climatological character should have been obtained to study atmospheric transport and to investigate the seasonal variability of trace gases in the UT/LS (Engel, 2006).
A crucial question of this paper concerns the representativeness of the limited SPURT data. Are they really suited for a climatological investigation on a seasonal and annual timescale, and do they represent the full atmospheric variability of trace gases in the UT/LS? To answer this question we investigate the comparability of trace gas mixing ratios observed during the limited number of flights in SPURT with those of the climatological data set obtained during the frequent MOZAIC flights. A statistical analysis of H2O and O3 follows to show in an objective manner the strengths and weaknesses of the two data sets. The analysis tools developed are not restricted to these particular data sets and are applicable to the comparison of different data sets, including model results, in a general sense.

Published by Copernicus Publications on behalf of the European Geosciences Union.

Geographical and vertical distribution
The SPURT project was performed to investigate the upper troposphere (UT) and lower stratosphere (LS). From November 2001 to July 2003, eight measurement campaigns were carried out using a Learjet 35 A with a ceiling altitude of 13 km as measurement platform. A typical campaign consisted of 2-3 consecutive mission days. The data set is based on 36 flight missions and 147 flight hours. Each season during the SPURT period is captured by two measurement campaigns in subsequent years in order to investigate the seasonality of the trace gas concentrations (e.g., Krebsbach et al., 2006; Hoor et al., 2004; Hegglin et al., 2006). A description of the SPURT campaigns, the project strategy and performance is given in Engel (2006).
Figure 1 (left) shows the geographical distribution of the SPURT flights in 1 s data points. The aircraft was based at the Hohn military base in northern Germany. Southbound flights usually used Faro in southern Portugal for refuelling and northbound flights Tromsø in Norway. Around the three stations the data density is very high because of slow ascents and descents.
The geographical distribution of MOZAIC measurements between 1994 and 2005 is displayed as one-minute averages of 5 s measurements in Fig. 1 (right). MOZAIC flights cover almost all continents. The Northern Hemisphere is better covered than the Southern Hemisphere, with more than 40% of MOZAIC flights in the North Atlantic flight corridor, more than 30% in Asia and around 10% above Africa. Most of the measurements (90%) correspond to cruise altitudes of 9-12 km (Marenco et al., 1998), lying in the troposphere in the tropics and subtropics and in the UT/LS at mid-latitudes. The European region of the SPURT campaigns is highlighted as a black box, and the measurement frequency between 2001 and 2003 in this region is shown at the bottom right.
Figure 2 displays the vertical data coverage of SPURT and MOZAIC in Europe (see black box in Fig. 1) in 5 K potential temperature bins relative to the tropopause (2 PVU surface). The distance of the trace gas data from the tropopause (DTP) is derived from potential vorticity and potential temperature, calculated from ECMWF output fields. The measurement frequency of MOZAIC in Europe (red line) peaks at a potential temperature of 330 K, which corresponds to the vicinity of the tropopause. The maximum measurement frequencies of SPURT (black line) range between 335 K and 350 K, i.e. from around 5 K below to 25 K above the tropopause. The average ceiling altitudes of the MOZAIC flights are lower and hence the maximum percentage of measurements appears at lower altitudes. More than 50% of MOZAIC flights and more than 75% of SPURT flights are performed in the lower stratosphere, so the data should allow an investigation of trace gases in the tropopause region (e.g., Thouret et al., 2006; Law et al., 1998) and of exchange processes between the troposphere and the stratosphere.

The MOZAIC O3 measuring system and its performance are reported in detail by Thouret et al. (1998). The response time is better than 4 s with a detection limit of about ±2 ppbv. The overall uncertainty is estimated to be about ±(2 ppbv + 2% of the observed reading). This corresponds to about ±2 ppbv for an O3 mixing ratio of 10 ppbv, ±4 ppbv at 100 ppbv and ±6 ppbv at 200 ppbv (Thouret et al., 1998).
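The quoted O3 uncertainty of ±(2 ppbv + 2% of the observed reading) can be checked with a short calculation. The following is only an illustrative sketch; the function name `o3_uncertainty_ppbv` is a hypothetical helper and not part of any MOZAIC software.

```python
def o3_uncertainty_ppbv(o3_ppbv):
    """Overall O3 uncertainty: 2 ppbv plus 2% of the observed reading.

    Hypothetical helper that only restates the formula quoted in the text.
    """
    return 2.0 + 0.02 * o3_ppbv


# Evaluate the formula at the three mixing ratios mentioned in the text.
for reading in (10.0, 100.0, 200.0):
    print(reading, "ppbv ->", o3_uncertainty_ppbv(reading), "ppbv")
```

At 10 ppbv the formula gives 2.2 ppbv, which the text rounds to ±2 ppbv; at 100 and 200 ppbv it gives 4 and 6 ppbv.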
O3 during SPURT was measured by UV absorption using the JOE (Jülich Ozone Experiment) instrument. The instrument is based on a Thermo Environmental Instrument ozonometer similar to that used for the MOZAIC programme. The instrument was operated with a time resolution of 10 s and has an accuracy of 5% (Mottaghy, 2001). The MOZAIC and SPURT O3 instruments are regularly calibrated in the Jülich laboratories against the same reference instrument.

H2O measuring instruments
During the SPURT campaigns the H2O mixing ratio was measured in-situ using the FISH (Fast In Situ Stratospheric Hygrometer) instrument (Zöger et al., 1999), which is based on the Lyman-α photofragment fluorescence technique. The FISH instrument has a forward-facing inlet and measures total water, i.e. the sum of the gaseous phase and the condensed phase. The response time is 1 s, which also allows the detection of small-scale variations of H2O mixing ratios in the vicinity of the tropopause, in clouds and in contrails. The instrument's accuracy is approximately 6% and the detection limit is better than 0.2 ppmv.
On board the five MOZAIC Airbus aircraft, relative humidity with respect to liquid water (RH) is measured with compact airborne humidity sensing devices (Helten et al., 1998). The sensing element consists of a capacitive sensor (Humicap-H, Vaisala, Finland) with a hydroactive polymer film as dielectric material, whose capacitance depends on the relative humidity, and a platinum resistance sensor (PT100) for direct measurement of the temperature at the humidity sensor. The sensor, mounted in an appropriate Rosemount housing, is designed for the measurement of gas-phase water, which is calculated from the relative humidity measurement. Adiabatic compression leads to a temperature increase of the sampled air and thus to a reduced dynamic range of the sensor, but also to a sufficient time response at low static air temperatures. In the middle troposphere the overall uncertainty is within ±4% RH, and it is around ±7% RH between 9 and 13 km. This implies a limited use of the MOZAIC H2O sensor in the stratosphere, which is dominated by low RH and thus by an increasingly large uncertainty. The response time is around 10 s in the lower and middle troposphere and increases to 1-3 min in the upper troposphere at 10-12 km altitude (Helten et al., 1998). After 500 operation hours the MOZAIC sensor is calibrated in the laboratory in Jülich.

Statistical analysis
Both data sets are statistically analysed in order to assess the comparability of H2O and O3 data in SPURT and MOZAIC. A crucial question is whether, or under which constraints, the data sets with different coverage in space (region and altitude) and time and with different instrument characteristics represent the same population in the atmospheric system. This includes the investigation of whether the SPURT campaigns, with around eight flight missions in each season, are as representative as the MOZAIC daily flights for specific regions and whether the mixing ratios observed within the European sector during SPURT represent the seasonal trace gas variability.
The following statistical analysis is performed for MOZAIC data observed in the same geographical region where the SPURT campaigns were carried out and for the same period from November 2001 until July 2003 (black box in Fig. 1, right). The MOZAIC and SPURT data sets are split according to the distance to the local tropopause (2 PVU surface): upper troposphere UT (DTP<-5 K) and lower stratosphere LS (DTP>5 K). In this way, different sampling strategies and different trace gas characteristics are accounted for. Influences of the large trace gas gradient in the vicinity of the tropopause (-5 K<DTP<5 K) are excluded.
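The split into UT and LS by distance to the tropopause can be sketched as follows. This is a minimal illustration with NumPy; the function name `classify_region` and the label strings are assumptions made here, not names from the SPURT or MOZAIC processing.

```python
import numpy as np


def classify_region(dtp_k, margin_k=5.0):
    """Label each sample as 'UT', 'LS' or 'excluded' from its DTP in K.

    UT: DTP < -margin_k, LS: DTP > +margin_k; samples within the
    tropopause band in between are excluded, as described in the text.
    """
    dtp = np.asarray(dtp_k, dtype=float)
    region = np.full(dtp.shape, "excluded", dtype=object)
    region[dtp < -margin_k] = "UT"
    region[dtp > margin_k] = "LS"
    return region


print(classify_region([-12.0, -3.0, 0.0, 4.0, 20.0]))
```

Samples exactly at ±5 K are treated as excluded here; the text does not state how these boundary values are handled.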

Probability distribution and selection of data
Figure 3 shows the probability distribution functions (PDF) of H2O data and Fig. 4 those of O3 data as a function of the distance to the tropopause for MOZAIC and SPURT (panels A and D, respectively). The trace gas frequencies are calculated in 5 K bins relative to the tropopause.

Fig. 3. MOZAIC (top) and SPURT (bottom) H2O mixing ratio related to the distance to the local tropopause in K, taken as the 2 PVU surface. H2O is binned in logarithmic space between 0 and 9.6 with a bin size of 0.8, the distance to the local tropopause in 5 K bins. Left panels: PDF of the original H2O data. The mean vertical profile (grey-black solid line) and the uncertainty of 5% RH (white dashed lines) are shown for the MOZAIC PDF. The SPURT accuracy of H2O data is 6% of the concentration (not shown). Middle panels: the distribution of the original data (panels A and D) is shadowed and that of the selected H2O data set (RH>10%, RH ice ≤100%, H2O<500 ppmv, p<250 hPa, see text) is colour coded. The mean PDFs are also shown as a black-grey line (original data) and a blue-white line (selected data). Right panels: number of original data points per bin (blue shaded) and of selected data (pink unfilled contours at 0, 100, 500 and 5000 data points per DTP bin). The fraction of selected data relative to the original number in each DTP bin, in percent, is shown as yellow diamonds for all DTP bins with more than 1% of the data.

However, these probability distributions of H2O reveal some differences between SPURT and MOZAIC. A very high probability of SPURT H2O data lower than 10 ppmv occurs in the stratosphere more than 20 K above the tropopause (panel D). Most strikingly, there is only a very low probability of H2O data in the respective mixing ratio bins in MOZAIC (panel A). The MOZAIC H2O probability becomes largest at higher mixing ratios in the stratosphere. Furthermore, there are no SPURT H2O values larger than 2000 ppmv in the troposphere more than 45 K below the tropopause, where the MOZAIC H2O data still contain up to 10 data points per bin (see density plots, panels C and F of Fig. 3). This is due to the difference in sampling, with MOZAIC data sampled from the ground upwards and SPURT data only above the 400 hPa level. Hence there is a higher mean PDF (grey-black solid line), corresponding to a higher mean vertical H2O profile, both in the troposphere and in the stratosphere in MOZAIC than in SPURT. The MOZAIC mean H2O profile remains nearly constant around 40 ppmv in the stratosphere more than 5 K above the tropopause, whereas the SPURT mean H2O profile decreases from 40 ppmv at the tropopause to mixing ratios lower than 10 ppmv around 60 K above the tropopause. Here, the 5% uncertainty of the MOZAIC sensor in the UT/LS must be accounted for. The uncertainty of ±5% relative humidity with respect to liquid water is shown as white dashed lines. The uncertainty range on the volume mixing ratio scale widens throughout the stratosphere, even reaching negative values 40 K above the tropopause. The 5% RH uncertainty thus leads to a decreasing precision of the H2O volume mixing ratio deeper in the stratosphere. The SPURT H2O data, with a high relative accuracy of 6% of the H2O concentration, do not show this problem, and the mean vertical mixing ratio also decreases in the stratosphere. A corresponding dashed white line is not shown in the SPURT PDF because the uncertainty range around the mean vertical profile is small.

Fig. 4. As Fig. 3, but now for O3 mixing ratios related to the 2 PVU tropopause. The bin size for O3 is 0.4 in logarithmic space between 0 and 7.6, that for the DTP is again 5 K. Given the high accuracy of 5%, no accuracy limits are shown in the original trace gas distributions. The right panels show the fraction of selected data relative to the original number in each DTP bin with more than 10% of the data as yellow diamonds.
The MOZAIC O3 data set is more strongly focused on low mixing ratios than the SPURT data set (see panels A and D of Fig. 4). There is a very high probability of MOZAIC O3 data in the troposphere below −35 K, where the SPURT data do not contain any O3 mixing ratios. In the UT/LS above −35 K the mean vertical O3 profiles (grey-black lines) of SPURT and MOZAIC are very similar and the mixing ratio at the tropopause is around 150 ppbv in both cases.
The discrepancies between both data sets basically result from different instrumental characteristics or measurement strategies. Because of the different H2O measurement techniques (see Sect. 2.2.2), the H2O data have to be modified before a statistical comparison using the following selection criteria:
- The MOZAIC Humicap sensor has a precision of 4-7% RH, i.e. low H2O mixing ratios are not detected and cannot be contained in the PDFs of Fig. 3. Thus the dry measurements with RH<10%, in particular in the stratosphere, on which SPURT was focused, cannot be included in the comparison due to the sensitivity limitations of the MOZAIC sensor at low RH.
- The FISH instrument has a forward-facing inlet and measures total water, i.e. both the gas-phase and the condensed-phase H2O mixing ratios. The MOZAIC Humicap sensor measures relative humidity with respect to liquid water and the mixing ratios represent only the gas phase. Therefore, only data with a relative humidity with respect to ice of RH ice ≤100% can be compared, eliminating measurements in clouds and under supersaturated conditions.
- H2O mixing ratios larger than 500 ppmv are sorted out, because the FISH instrument is calibrated for mixing ratios below this limit. At larger mixing ratios the measurement cell of FISH becomes optically dense, so the FISH fluorescence method is of limited use for in-situ measurements above a mixing ratio of 500 ppmv. To select only data representative of the UT/LS we further choose the 250 hPa pressure level as the lower limit.
- In the UT/LS the MOZAIC sensor has a response time of τ≈60 s and the FISH instrument of τ≈1 s. A running mean with a time interval of 60 seconds is therefore applied to the SPURT data for this study.
These selection criteria are applied to the MOZAIC and SPURT H2O data. The third criterion, with a data selection above the 250 hPa pressure level, is also applied to the O3 data in order to compensate for the tropospheric bias of the complete MOZAIC data set.
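The selection criteria listed above can be sketched as a boolean mask plus a running mean. This is a minimal illustration with NumPy, not the processing code used for the study; the function and argument names are hypothetical, and the running mean assumes gap-free 1 s SPURT data so that 60 samples correspond to 60 s.

```python
import numpy as np


def select_h2o(h2o_ppmv, rh_liquid_pct, rh_ice_pct, p_hpa):
    """Return a boolean mask of H2O samples that pass all four thresholds."""
    h2o = np.asarray(h2o_ppmv, dtype=float)
    rh_liq = np.asarray(rh_liquid_pct, dtype=float)
    rh_ice = np.asarray(rh_ice_pct, dtype=float)
    p = np.asarray(p_hpa, dtype=float)
    return (
        (rh_liq >= 10.0)      # drop dry data below the MOZAIC sensitivity limit
        & (rh_ice <= 100.0)   # drop clouds and supersaturated air
        & (h2o < 500.0)       # stay within the FISH calibration range
        & (p < 250.0)         # keep only data above the 250 hPa level
    )


def running_mean(x, window=60):
    """Running mean over `window` consecutive samples (no edge padding)."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


mask = select_h2o([30, 5, 600, 40], [20, 5, 50, 30], [60, 10, 120, 90], [200, 200, 200, 300])
print(mask)
print(running_mean([0, 1, 2, 3, 4], window=3))
```

For the O3 data only the pressure threshold would apply. The `mode="valid"` choice shortens the series by `window - 1` samples instead of padding the edges, which is one of several reasonable conventions.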
Panels B and E in Figs. 3 and 4 show the new H2O and O3 PDFs of the data modified according to the selection criteria (colour coded) and the original PDFs, also shown in panels A and D, as shadowed areas. The mean vertical profile of the selected H2O data set (blue line) is shifted towards larger values in the stratosphere and towards lower values in the troposphere. As a consequence of the criterion to select data with RH>10%, H2O mixing ratios below 10 ppmv are excluded. The most probable H2O data in the stratosphere are now between 10 and 30 ppmv both in SPURT and MOZAIC. In the troposphere, data are removed because of the 500 ppmv, the 250 hPa and the RH ice ≤100% criteria. Owing to the 250 hPa criterion there is a loss of O3 data in the troposphere, mostly affecting the MOZAIC data set.
The normalized frequency distributions of the H2O (left) and O3 mixing ratios (right) of MOZAIC (red) and SPURT (black) in Fig. 5 demonstrate that the distributions of both trace gases become more similar when the data selection is applied (solid lines = selected data; dashed lines = original data). Some differences remain, however, e.g. a high normalized H2O frequency in SPURT at lower mixing ratios in the troposphere. A difference in sample means and medians also remains. The mean O3 mixing ratios are larger in SPURT than in MOZAIC and vice versa for H2O (triangles), thus still reflecting the different vertical sampling range of both projects. The width of the SPURT and MOZAIC H2O distributions after the selection is very similar, especially in the stratosphere. The number of data points (legend of Fig. 5) demonstrates a data loss of around 65% of H2O due to data selection both for SPURT and MOZAIC; around 12% of O3 data is lost for SPURT and 45% for MOZAIC.
For the following statistical analysis, the reduced data sets of H2O and O3, in which differences due to the different H2O measurement techniques and sampling strategies are eliminated as far as possible, will be used.

Kolmogoroff-Smirnoff test
The Kolmogoroff-Smirnoff goodness-of-fit test compares two independent random samples of measured data and examines whether they stem from the same population (Brandt, 1999; Sachs and Hedderich, 2006). In contrast to other goodness-of-fit tests, e.g. the χ²-test, the Kolmogoroff-Smirnoff test can be applied to non-normally distributed data. The test is well suited to investigate whether both random samples belong to the same population. Differences in central tendency, variance, skewness and kurtosis, i.e. differences in the type of distribution and thus in the distribution functions in Fig. 5, are captured.

Mathematical description
The test statistic is the maximum observed difference in ordinate between the two non-overlapping cumulative frequency curves. Both statistical samples, i.e. the MOZAIC and SPURT data, are binned into an equal number of classes. The empirical cumulative distribution functions Fspurt and Fmozaic and their differences Fspurt − Fmozaic are calculated. The test statistic D is the maximum of the absolute value of this difference, i.e.

$$D = \max \left| F_{\mathrm{spurt}} - F_{\mathrm{mozaic}} \right| \qquad (1)$$
For large sample sizes (nspurt + nmozaic > 35) the cutoff value Dα can be approximated by

$$D_{\alpha} = K_{\alpha} \sqrt{\frac{n_{\mathrm{spurt}} + n_{\mathrm{mozaic}}}{n_{\mathrm{spurt}} \, n_{\mathrm{mozaic}}}} \qquad (2)$$

with nspurt and nmozaic the number of elements of the two statistical samples and Kα the Kolmogoroff-Smirnoff constant, which depends on the error probability α. Table 1 contains the corresponding values of Kα.
If the test statistic D, calculated from both samples, is greater than or equal to the cutoff value Dα, both distribution functions are significantly different at the selected error probability.
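The test can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for the paper: the empirical distribution functions are evaluated at every observed value instead of in binned classes, the cutoff uses the standard large-sample form Dα = Kα·sqrt((n1 + n2)/(n1·n2)), and Kα is taken from the common asymptotic approximation Kα = sqrt(−0.5·ln(α/2)) (about 1.36 for α = 0.05) instead of the paper's Table 1. The function name `ks_two_sample` is hypothetical.

```python
import numpy as np


def ks_two_sample(sample_a, sample_b, alpha=0.05):
    """Two-sample Kolmogoroff-Smirnoff test.

    Returns (D, D_alpha, reject), where reject is True if D >= D_alpha,
    i.e. the two distribution functions differ at error probability alpha.
    """
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical cumulative distribution functions on the pooled values.
    f_a = np.searchsorted(a, grid, side="right") / a.size
    f_b = np.searchsorted(b, grid, side="right") / b.size
    d = float(np.max(np.abs(f_a - f_b)))
    # Asymptotic two-sided constant (an assumption, see lead-in).
    k_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))
    d_alpha = float(k_alpha * np.sqrt((a.size + b.size) / (a.size * b.size)))
    return d, d_alpha, bool(d >= d_alpha)


rng = np.random.default_rng(0)
d, d_alpha, reject = ks_two_sample(rng.normal(0, 1, 500), rng.normal(3, 1, 500))
print(d, d_alpha, reject)
```

With large samples the cutoff becomes very small, so even modest differences between two distribution functions lead to a rejection of the null hypothesis.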

Test performance
The null hypothesis H0 "Both distribution functions of trace gases H2O and O3 in MOZAIC and SPURT are the same" is tested against the alternative hypothesis HA "Both distribution functions are different from each other" at a confidence level of 95% (error probability α=0.05). The larger the test statistic D in Eq. 1, the stronger the evidence against the null hypothesis.
Table 2 shows the values of the test statistic D and the corresponding cutoff values Dα calculated both for data within the troposphere (DTP<-5 K) and the stratosphere (DTP>5 K).
The test statistic D in Table 2 is much larger than the cutoff value Dα in all cases. The null hypothesis of equal distribution functions for both the H2O and the O3 mixing ratios can therefore be rejected at a confidence level of 95%.
The tests are also performed for different confidence levels between 95% and 99.9% (see also Table 2) with the same test results. Therefore, with high confidence the H2O and O3 mixing ratios of the MOZAIC and SPURT data do not stem from the same population.

A graphical display of the Kolmogoroff-Smirnoff test results is given by a so-called probability network (see Fig. 6). The H2O and O3 cumulative frequency functions Fspurt and Fmozaic are plotted logarithmically in this probability network for the troposphere (left panels) and the stratosphere (right panels). The corresponding cutoff value Dα is plotted as a confidence region for each distribution function (dotted lines). If the null hypothesis H0 of equal distribution functions is not rejected, the frequency function Fmozaic lies within the confidence limits of the other distribution function Fspurt and vice versa. Note the distorted ordinate according to the χ²-distribution function, which causes the different range of the confidence limits although the Kolmogoroff-Smirnoff constant is equal over the whole mixing ratio range of the abscissa. The corresponding cutoff values and test statistics can be found in Table 2.
The test results in Table 2 are reflected well in these figures. In each case we find a region where both cumulative frequency functions differ significantly from each other, i.e. where the difference between both functions is largest. The maximum difference in ordinate, corresponding to the test statistic D, is always located in the middle range of mixing ratios (see dashed cyan line in Fig. 6). The two tested cumulative distribution functions do not generally lie within the confidence limits of each other; thus both statistical data samples are different from each other and do not belong to the same population. Although the O3 cumulative distribution functions are very close to each other for each atmospheric region, we still find a small area where the test statistic becomes larger than the cutoff value, and thus there is a statistical difference between both distribution functions.
We find a difference between the cumulative distribution functions both for O3 data based on the same measurement technique and for H2O data using different measurement techniques. This indicates that there are other causes, most likely related to sampling or region, for the differences between the trace gas data in SPURT and MOZAIC.

Variance analysis
Here, the selected data samples are examined for their variability characteristics. Each SPURT campaign typically consisted of four flights, with a flight time of around four hours each. Each season is covered by eight single flights with H2O data. Thus these few days represent a whole season. MOZAIC, however, provides at least two flights with H2O data for each day. Hence the SPURT and MOZAIC data are expected to be subject to variability on different timescales. The term timescale is used somewhat loosely in this context. Since the aircraft move fast compared to the wind speed, the onboard sensors encounter the spatial gradients at short timescales and the temporal gradients at long timescales. Since both aircraft move at approximately the same speed, the interaction of spatial and temporal gradients is comparable. The concept of a temporal statistical variance analysis is an appropriate tool to investigate trace gas variability and provides information about atmospheric and even chemical influences (Rohrer and Berresheim, 2006).

Test description
For a variance analysis the H2O and O3 data sets in MOZAIC and SPURT are binned into series of time intervals of different lengths, i.e. timescales, between several minutes and years. A mean variance is calculated for each timescale. When a data set covering one year is divided into two half-year data sets, a variance is calculated from the data within each half-year bin. Both resulting sample variances are averaged, which gives the mean variance of the data set for a half-year timescale. Then the one-year data set is divided into three four-month data sets, the procedure is repeated and the mean variance for a four-month bin is calculated.
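The procedure described above can be sketched as follows. This is a minimal illustration with NumPy and not the code used for the study; the function name `mean_variance_by_timescale` is hypothetical, and it is an assumption of this sketch that bins with fewer than two samples are skipped and that all remaining bins are weighted equally.

```python
import numpy as np


def mean_variance_by_timescale(time, values, timescales):
    """Mean within-bin variance of `values` for each bin length in `timescales`.

    `time` and `timescales` must share the same unit (e.g. days).
    """
    t = np.asarray(time, dtype=float)
    x = np.asarray(values, dtype=float)
    result = []
    for length in timescales:
        # Assign every sample to a consecutive time bin of this length.
        bin_index = np.floor((t - t.min()) / length).astype(int)
        variances = []
        for k in np.unique(bin_index):
            in_bin = x[bin_index == k]
            if in_bin.size >= 2:  # empty or single-sample bins carry no variance
                variances.append(in_bin.var())
        result.append(np.mean(variances) if variances else np.nan)
    return np.array(result)


t = np.arange(8.0)
x = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
print(mean_variance_by_timescale(t, x, timescales=[4.0, 8.0]))
```

In this toy series the step between the two halves only contributes once the bin is long enough to span it, so the mean variance rises from 0 at a bin length of 4 to 0.25 at a bin length of 8. An increase of variance at a given timescale is read in the same way, as a sign of processes acting on that timescale.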

Analysis applied to a CIRRUS III flight
Before performing the variance analysis on the complete SPURT and MOZAIC data as in Sect. 3.2, we introduce the analysis using the water vapour data observed during one single flight of the CIRRUS III campaign (top of Fig. 7). The motivation of the three CIRRUS campaigns between 2002 and 2006 was to investigate the formation mechanism of cirrus clouds and their radiative effects and to study the chemical or microphysical properties of the cloud particles. Both the FISH instrument and the MOZAIC sensor, already described in Sect. 2, were onboard the Learjet 35 A during the last campaign, CIRRUS III, in November 2006. The CIRRUS III mid-latitude cirrus field experiment was also based at the Hohn military base. Six flights, mainly inside and outside frontal cirrus clouds, were performed in the altitude range of 7-12 km between 45 and 70° N. We can thus perform an in-flight comparison of both instruments and show the results of a variance analysis for data sampled under the same spatial and temporal conditions. This is a good opportunity to study the importance of the interaction between temporal and spatial variances on small timescales. The selection criteria are also applied to this data set and for the resulting data (grey shaded area in Fig. 7 [...] (Fig. 7, bottom). A similar increase in variance on a timescale of 3.5 h demonstrates that both instruments detected the same atmospheric processes and that no discrepancy due to the different measurement instruments is left.

Analysis on MOZAIC and SPURT flights

[...] ten-day timescale and on an interseasonal timescale between 10 and 90 days is observed. On both timescales the variance enhancement is not as sharp as on the diurnal timescale. Finally, there is an extreme increase in the variance of H2O data on the 90 to 300 day timescale, representing a seasonal variability of the H2O mixing ratio in MOZAIC.

The tropospheric H2O variance in SPURT (black line) coincides with that of MOZAIC on a timescale of 0.15 days, i.e. around four hours. This variance represents not only the temporal but also the spatial variance. The typical duration of a SPURT flight and of a MOZAIC flight within Europe was around four hours. The aircraft velocity in both projects is nearly the same and both measurement systems are comparable on a short timescale of some hours, as shown in Fig. 7. There is still good agreement on a timescale of 1 day, but on longer timescales the variances increasingly diverge, resulting in a much lower variance of H2O in SPURT than in MOZAIC. An increasing variance of SPURT H2O can be observed on a three-day timescale, the typical timescale of the mission days during each aircraft campaign.
Over longer periods of up to 90 days the variance remains approximately constant, fluctuating around a statistical mean on a three- to ten-day timescale. This fluctuation decreases on longer timescales. On a seasonal timescale we again find an increasing variance of the SPURT H2O data.
When the SPURT data set is divided into different time series of non-regular timescales, most of the bins do not contain measurement data. On an inter-seasonal timescale, SPURT data are available only on two or three consecutive days. As a consequence, when calculating the variance on a 100-day timescale, the variance remains constant until the bin that contains the measurement data is reached. This bin covers a timescale of one or two days. As a consequence we do not find any H2O variability on an inter-seasonal timescale in SPURT. The variance on a seasonal timescale is based on single flights on two or three consecutive days in each season during the two years.
The H2O variances decrease in the stratosphere (bottom panels in Fig. 8), representing the smaller H2O variability in the upper atmosphere. The difference in variance between MOZAIC and SPURT is smaller in the stratosphere, but a discrepancy remains.
For SPURT, the stratospheric tracer O3 reveals an increasing variance on a ten-day timescale, as for MOZAIC (see Fig. 8, right panels). There is no enhancement of the SPURT O3 variance on an interseasonal timescale up to 90 days, but the MOZAIC O3 variance also increases only marginally there. On a seasonal timescale up to 300 days there is an increasing variance both for SPURT and MOZAIC O3. Compared to the troposphere, the O3 variance increases in the stratosphere. The slope of the O3 variance of SPURT is similar to that of MOZAIC, and there is no considerable difference between the SPURT and MOZAIC O3 variance such as that observed for the tropospheric tracer H2O.

Discussion
The data selection in Sect. 3.1 is essential to achieve sufficient agreement of the frequency distribution functions for both trace gases and projects (see Fig. 5), although some differences remain, and thus to allow a statistical comparison of both data sets.
The Kolmogoroff-Smirnoff test reveals a statistical difference between the respective H2O and O3 mixing ratios observed during both projects. The H2O cumulative distribution function for MOZAIC is larger than that for SPURT both in the UT and LS, and vice versa for O3. There are still different sample means, with lower SPURT H2O means and higher SPURT O3 means than in MOZAIC. Especially in the stratosphere this must be due to the different campaign performance, with the Learjet in SPURT flying deeper into the stratosphere and thus sampling higher O3 and lower H2O mixing ratios on average (Fig. 2).
The causes of the statistical difference in the H2O and O3 data sets become apparent from a variance analysis (Fig. 8). The H2O data observed during the SPURT campaigns contain atmospheric processes that take place on a diurnal timescale. There is a fluctuating variance between several minutes and two to three days.
The SPURT data set does not contain information about processes on the longer inter-seasonal timescales, but it does on the seasonal timescale between 90 and 300 days. On the one hand, SPURT thus contains processes that play a role on the typical campaign timescale (one to three days). On the other hand, the seasonal variability is based on the campaigns performed at equal time intervals, with each season covered by two campaigns. The trace gas variability within a season (10 to 90 days), which accounts for about 50% of the total variance of H2O, is therefore not included.
The MOZAIC H 2 O measurements are influenced by synoptic scale processes on a ten-day timescale and by processes on an inter-seasonal timescale.The variance enhancing on a ten-day timescale represents a variability which is typical for synoptic weather systems influencing the air mass composition in a specific region as low or high pressure systems.There is further a variance between 10 and 90 days, representing processes varying on an inter-seasonal timescale up to three months and a variability on a seasonal timescale.Contrary to SPURT the MOZAIC data set thus gives information about processes on each timescale.
Especially MOZAIC contains information about processes which are representative for the different seasons.The SPURT H 2 O variance is not representative for the seasonal timescale and rather gives an instantaneous picture of the atmosphere on the single flight days.SPURT is rather dominated by short scale fluctuating processes.These different processes in both data sets are the reason for differences in the frequency distribution functions (Fig. 5).
On long timescales the H 2 O variances of MOZAIC and SPURT differ more and more due to the different measurement frequency.The difference is largest in the troposphere, the full atmospheric H 2 O variance in the UT is not captured by the SPURT campaigns.Large scale atmospheric processes and turbulent systems playing a role in the UT on a longer than diurnal timescale and influencing the variability of the tropospheric tracer H 2 O are not contained in the SPURT data and account for this difference.The H 2 O difference in variance lessens in the stratosphere, but still remains.
The stratospheric tracer O 3 does not reveal the differences in variance in SPURT and MOZAIC as observed for H 2 O. SPURT O 3 data represent the atmospheric processes influencing the O 3 distribution in the UT/LS on each timescale despite the inter-seasonal timescale between 10 and 90 days as expected.But the full atmospheric O 3 variance as MOZAIC shows is achieved in the UT/LS on every timescale thus demonstrating that contrary to the tropospheric tracer H 2 O the amount of SPURT data is sufficient to represent the full O 3 variability even on seasonal timescales.This demonstrates the different variability behaviour of the stratospheric tracer O 3 independent of short scale fluctuating processes and acting on longer timescales.
The variance analysis is further performed for different subsamples of MOZAIC data (Fig. 9). The variance of the full MOZAIC H2O data between November 2001 and July 2003 (red line) is compared with that of the MOZAIC data on the single SPURT mission days (cyan line), which has a shape very similar to that of SPURT H2O (black line). With one campaign each month (dashed-dotted line), the difference to the variance of the full MOZAIC data (red) would reduce only marginally on timescales between 40 and 300 days in the troposphere and between 40 and 150 days in the stratosphere. To capture the full atmospheric H2O variance shown by MOZAIC, measurement flights are required on every fourth day in the troposphere (dotted line), while in the stratosphere measurements on every sixth day are sufficient, especially on inter-seasonal and seasonal timescales (dashed line). This means that the Learjet would have to fly on around eight days per month in the troposphere and around four days per month in the stratosphere to capture the full climatological variability of MOZAIC H2O.
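How the sampling frequency limits the timescales on which variance can be resolved may be illustrated with a synthetic time series. The following sketch is not the variance analysis used in this study: the windowed variance estimator, the toy tracer (a seasonal cycle, a ten-day synoptic cycle and noise) and the sampling patterns are all assumptions made purely for illustration.

```python
import math
import random


def windowed_variance(times, values, tau):
    """Mean variance inside consecutive windows of length tau (days).

    Illustrative estimator only; windows with fewer than two samples are skipped.
    """
    windows = {}
    for t, v in zip(times, values):
        windows.setdefault(int(t // tau), []).append(v)
    variances = []
    for vals in windows.values():
        if len(vals) < 2:
            continue
        mean = sum(vals) / len(vals)
        variances.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return sum(variances) / len(variances) if variances else 0.0


def synthetic_tracer(n_days=600, seed=1):
    """Hourly toy series: seasonal cycle + 10-day cycle + noise (hypothetical)."""
    rng = random.Random(seed)
    times, values = [], []
    for hour in range(n_days * 24):
        t = hour / 24.0
        times.append(t)
        values.append(math.sin(2 * math.pi * t / 365.0)
                      + 0.5 * math.sin(2 * math.pi * t / 10.0)
                      + rng.gauss(0.0, 0.1))
    return times, values


def subsample_every(times, values, step_days):
    """Keep only samples from every step_days-th day (a mock flight schedule)."""
    kept = [(t, v) for t, v in zip(times, values) if int(t) % step_days == 0]
    return [t for t, _ in kept], [v for _, v in kept]


times, values = synthetic_tracer()
campaign_t, campaign_v = subsample_every(times, values, 90)  # one flight day per season
dense_t, dense_v = subsample_every(times, values, 4)         # a flight every fourth day

for tau in (1, 10, 30, 90, 300):
    print(tau,
          round(windowed_variance(times, values, tau), 3),
          round(windowed_variance(dense_t, dense_v, tau), 3),
          round(windowed_variance(campaign_t, campaign_v, tau), 3))
```

In this toy setting the campaign-like schedule yields a variance that stays constant between the one-day and the 90-day windows and rises only for the 300-day window, because no window shorter than the spacing of the flight days contains more than one flight day.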

Conclusions
The statistical analysis shows that the SPURT data set, despite its much larger temporal and spatial coverage compared to other research aircraft campaigns, does not represent the full variability of atmospheric H2O in the tropopause region and can only be used for limited climatological investigations. The single flights of SPURT cannot replace the large number of MOZAIC flights when analysing the H2O distribution in a climatological manner. The SPURT observations instead give an instantaneous picture of the one-day variability, especially of the upper tropospheric H2O mixing ratio, observed during the limited number of flight hours of the single flights. Information about large scale processes varying on a seasonal timescale is less representative, as the variance analysis reveals. For O3 the number of SPURT flights is almost sufficient. SPURT delivers the atmospheric variability of O3 on each timescale except the inter-seasonal one, which is weak in any case, as the MOZAIC data show. SPURT O3 can therefore be used even for climatological investigations. The MOZAIC trace gas data are not limited in their variance characteristics. These data represent atmospheric processes varying on longer timescales, such as synoptic weather systems. They are ideal for seasonal and annual investigations of H2O and O3 mixing ratios.
However, the statistical comparison reveals the known limitation of the MOZAIC RH sensor in the LS. Small scale fluctuations in the UT/LS cannot be observed by this capacitive sensor, while the FISH instrument in SPURT is well suited for studies focusing on fast processes in the UT/LS, such as mixing and transport processes.
We have introduced a convenient statistical procedure to compare trace gas data sets of different projects even if they do not coincide in space and time. It would be interesting to apply these tests to other observational data sets. The tests are further suited for an evaluation of and comparison with results from atmospheric models such as the Chemical Lagrangian Model of the Stratosphere (CLaMS) (McKenna et al., 2002) and MOZART, the Model of Ozone and Related Chemical Tracers (ECHAM5-MOZ).

Fig. 1. Geographical distribution of flights during SPURT (left) and MOZAIC (right). The frequency of 1 Hz (SPURT) or 1-min-averaged (MOZAIC) data points in each geographical 1° lat × 1° lon bin is colour-coded. The extent of the SPURT flights is marked as a black box in the MOZAIC plot, and the frequency of MOZAIC flights in this European sector during 2001 and 2003 is additionally shown at the lower right.

Fig. 2. Vertical distribution of the percentage of data points as a function of the distance to the tropopause (DTP) in potential temperature during SPURT (black line) and MOZAIC in the European region (red line). Averages in 5 K bins are shown relative to the tropopause (PV=2 PVU, DTP=0 K). The legend contains the percentage of data points in the stratosphere (S) and troposphere (T).

Fig. 3. Probability distribution functions (PDFs) of MOZAIC (top) and SPURT (bottom) H2O mixing ratio related to the distance to the local tropopause in K, taken as the 2 PVU surface. H2O is binned in logarithmic space between 0 and 9.6 with a bin size of 0.8, the distance to the local tropopause in 5 K bins. Left panels: PDF of the original H2O data. The mean vertical profile (grey-black solid line) and the uncertainty of 5% RH (white dashed lines) are shown for the MOZAIC PDF. The SPURT accuracy of the H2O data is 6% of concentration (not shown). Middle panels: The distribution of the original data (panels A and D) is shaded and that of the selected H2O data set (RH<10%, RH ice ≤100%, H2O<500 ppmv, p<250 hPa, see text) is colour-coded. The mean PDFs are also shown as a black-grey line (original data) and a blue-white line (selected data). Right panels: Number of original data points per bin (blue shaded) and of selected data (pink unfilled contours at 0, 100, 500, 5000 data points per DTP bin). The fraction of selected data relative to the original number in each DTP bin, in percent, is shown as yellow diamonds for all DTP bins with more than 1% of the data.

Fig. 4. Probability distribution functions of MOZAIC (top) and SPURT (bottom) similar to Fig. 3, but now for O3 mixing ratios related to the 2 PVU tropopause. The bin size for O3 is 0.4 in logarithmic space between 0 and 7.6, that for the DTP is again 5 K. Because the O3 accuracy of 5% is very high, no accuracy limits are shown on the original trace gas distributions. The right panels show the fraction of selected data relative to the original number in each DTP bin with more than 10% of the data as yellow diamonds.

Fig. 5. Frequency distributions of the H2O (left) and O3 (right) mixing ratio in the troposphere (DTP<-5 K) and stratosphere (DTP>5 K), normalized by dividing the single bin frequencies in percent by the total number of data points (see legend). The frequency distributions of the data selected by the instrument criteria (see text) are represented by solid lines, those of the unselected original data by dashed lines. H2O is binned in 5 ppmv and O3 in 10 ppbv intervals. The means of the selected MOZAIC and SPURT data are marked by triangles, the medians by circles. For the unselected data they lie beyond the range of the axis.

Fig. 7. Top: In-flight comparison of the H2O mixing ratio from FISH (black) and the MOZAIC sensor (red) during one flight mission of the CIRRUS III campaign. The 60 s running mean of the FISH H2O mixing ratio is highlighted in cyan and the saturation H2O mixing ratio in pink. The part of the flight performed above the 250 hPa pressure level is bounded by the green line. After all selection criteria are applied, data above the grey shaded area are used for the variance analysis. Bottom: Variance analysis of the FISH (black) and MOZAIC sensor (red) H2O mixing ratio during the CIRRUS III flight.

Fig. 8.
Figure 8 shows the variance analysis of MOZAIC and SPURT H2O (left) and O3 data (right) for the troposphere and stratosphere. The variance of H2O in MOZAIC increases from short to long timescales within the troposphere (red line, top of Fig. 8). There are four consecutive timescale regions, representing a different strength and change of atmospheric H2O variability. An increasing variance from one hour to one or two days, representing the H2O variability on a diurnal timescale. Further an enhancement on a typical synoptical

Fig. 9. Variance analysis of different H2O subsamples from the MOZAIC data set in the troposphere (top) and stratosphere (bottom). The H2O variances of the SPURT (black) and full MOZAIC data (red) from Fig. 8 are shown together with the variance of MOZAIC data sampled on the single SPURT flight days (cyan). Further variances are calculated for one campaign per month (dashed-dotted), i.e. flights on two consecutive days each month between November 2001 and July 2003; for four flight days per month (dashed), i.e. flights on every sixth day; and for eight flight days per month (dotted), i.e. flights on every fourth day.

Table 1. Selected constants for the Kolmogoroff-Smirnoff test.

Table 2. Kolmogoroff-Smirnoff test statistics D and cutoff values Dα (rounded to four decimal places) for two different confidences α=95%
top) the variance analysis reveals a really good agreement between the H2O variances as observed by FISH (black) and MOZAIC H2O (red) sensor