A new global real-time Lagrangian diagnostic system for stratosphere-troposphere exchange: evaluation during a balloon sonde campaign in eastern Canada

A new global real-time Lagrangian diagnostic system for stratosphere-troposphere exchange (STE) developed for Environment Canada (EC) has been delivering daily archived data since July 2010. The STE calculations are performed following the Lagrangian approach proposed in Bourqui (2006) using medium-range, high-resolution operational global weather forecasts. Following every weather forecast, trajectories are started from a dense three-dimensional grid covering the globe, and are calculated forward in time for six days of the forecast. All trajectories crossing either the dynamical tropopause (±2 PVU) or the 380 K isentrope and having a residence time greater than 12 h are archived, and also used to calculate several diagnostics. This system provides daily global STE forecasts that can be used to guide field campaigns, among other applications. The archived data set offers unique high-resolution information on transport across the tropopause for both extra-tropical hemispheres and the tropics. This will be useful for improving our understanding of STE globally, and as a reference for the evaluation of lower-resolution models. This new data set is evaluated here against measurements taken during a balloon sonde campaign with daily launches from three stations in eastern Canada (Montreal, Egbert, and Walsingham) for the period 12 July to 4 August 2010. The campaign found an unexpectedly high number of observed stratospheric intrusions: 79 % (38 %) of the profiles appear to show the presence of stratospheric air below 500 hPa (700 hPa). An objective identification algorithm developed for this study is used to identify layers in the balloon-sonde profiles affected by stratospheric air and to evaluate the Lagrangian STE forecasts. We find that the predictive skill for the overall intrusion depth is very good for intrusions penetrating down to 300 and 500 hPa, while it becomes negligible for intrusions penetrating below 700 hPa.
Nevertheless, the statistical representation of these deep intrusions is reasonable, with an average bias of 24 %. Evaluation of the skill at representing the detailed structures of the stratospheric intrusions shows good predictive skill down to 500 hPa, reduced predictive skill between 500 and 700 hPa, and none below. A significant low statistical bias of about 30 % is found in the layer between 500 and 700 hPa. However, analysis of missed events at one site, Montreal, shows that 70 % of them coincide with candidate clusters of trajectories that pass through Montreal, but that are too dispersed to be detected in the close neighbourhood of the station. Within the limits of this study, this allows us to expect a negligible bias throughout the troposphere in the spatially averaged STE frequency derived from this data set, for example in climatological maps of STE mass fluxes. This first evaluation is limited to eastern Canada in one summer month with a high frequency of stratospheric intrusions, and further work is needed to evaluate this STE data set in other months and locations.
Published by Copernicus Publications on behalf of the European Geosciences Union. M. S. Bourqui et al.: Evaluation of new STE diagnostic system


Introduction
Stratosphere-Troposphere Exchange (STE) is known to control the chemical composition of the lower stratosphere and the upper troposphere to a great extent. Many previous studies have investigated the mechanisms leading to extratropical STE in the Northern Hemisphere and have shown that STE typically occurs during the breaking of baroclinic waves at synoptic to meso-scales and involves a variety of processes such as those associated with clouds, radiative relaxation, gravity wave breaking and wind shear turbulence (Shapiro, 1978; Langford et al., 1996; Zierl and Wirth, 1997; Stohl et al., 2003; Gray, 2003; Mullendore et al., 2005; Bourqui, 2006). This complexity is unimportant when considering only the global net flux of mass across the tropopause (Holton et al., 1995; Appenzeller et al., 1996). However, because the gradients in chemical constituents across the tropopause are large, it is the separate stratosphere-to-troposphere (S→T) and troposphere-to-stratosphere (T→S) mass fluxes that control the transport of chemical species across the tropopause, not the net mass flux. An accurate estimate of separate S→T and T→S mass fluxes requires resolving the complex processes leading to STE together with their underlying scales. Bourqui (2006) suggests that hourly meteorological fields with a horizontal resolution of 0.5° × 0.5° are necessary to resolve the most important contributions of these processes to STE. Furthermore, the nature of the separate S→T and T→S mass fluxes is such that they become infinite at infinitely small scales (Hall and Holzer, 2003). In a finite-grid representation, these fluxes are essentially determined by the smallest resolved scales and therefore do not provide relevant information on STE. This ill-defined character is probably responsible for a good part of the large range of estimates of separated STE fluxes found in the literature (e.g., Hoerling et al., 1993; Gettelman and Sobel, 2000; Sprenger and Wernli, 2003). To render the fluxes well-defined, it
is necessary to include an explicit scale constraint such as the residence time (Wernli and Bourqui, 2002; Hall and Holzer, 2003; Bourqui, 2006). In such a case, the parcel is required to reside on either side of the tropopause for a time interval larger than a given threshold. Wernli and Bourqui (2002) and Sprenger and Wernli (2003), using European Center for Medium-Range Weather Forecasts (ECMWF) analyses and ERA-15 reanalyses, respectively, provided such S→T and T→S mass flux estimates for the extra-tropical Northern Hemisphere using a Lagrangian approach with a residence time criterion. However, the absence of independent hemispheric estimates of separated STE fluxes in the literature makes it difficult to evaluate the uncertainties in their results. Wernli and Bourqui (2002) and Sprenger and Wernli (2003) suggested that S→T mass fluxes dominantly occur in regions of storm tracks in the winter and above continents in the summer. They also suggested that rapid transport can bring stratospheric air to the lower troposphere within a time scale of several days, and that in the northern extra-tropics, the lower tropospheric destinations of such rapid transport are concentrated on both sides of the North American continent. Since then, a few studies have been successful at modelling individual events (Cooper et al., 2005; Gerasopoulos et al., 2006). More recently Bourqui and Trépanier (2010) found an intense activity of such rapid transport, confirmed by ozone sonde measurements, during a summer month over North America. The mass flux associated with this deep S→T transport activity was found to be about one order of magnitude larger than the 15-yr climatological estimate from Sprenger and Wernli (2003). A similarly high contribution of stratospheric ozone in the lower troposphere was found by Lefohn et al. (2011) on average during the years 2004-2006. Estimates of cross-tropopause fluxes of ozone from chemistry transport models (CTMs) were summarised in Stevenson et al.
(2006) with a focus on the tropospheric ozone budget. The CTMs had varying representations of stratospheric ozone, such as interactive stratospheric chemistry, specified ozone climatologies, and specified ozone vertical fluxes. Their horizontal resolutions were of the order of 2° × 2° or coarser. The stratospheric source of ozone was found on average to contribute at the same level as the net photochemistry, but the model-to-model variability was very large. S→T fluxes were predicted to increase by an average of 8 % by 2030 under a climate change scenario (their S5-S2 scenario). Hegglin and Shepherd (2009) used a low-resolution Climate-Chemistry Model (CCM) and estimated that S→T fluxes of ozone may increase by 23 % between 1965 and 2095 as a result of climate change. Terao et al. (2008) used a CTM with similar resolution and interactive stratospheric chemistry, and estimated, using correlations in the stratospheric and tropospheric ozone variability, that the stratosphere contributes 40 % and 30 % of the 500 hPa ozone budget in high and mid-latitudes, respectively. Using ozonesonde data, Tarasick et al. (2005) showed that both trends and interannual variability of tropospheric ozone values over Canada are highly correlated with lower stratospheric ozone amounts. This suggests that STE may be an important factor in determining tropospheric ozone trends. Similar results were found at European sites (Ordóñez et al., 2007). However, observations of ozone from balloon sondes reflect complex, inhomogeneous trends over the past three decades (Oltmans et al., 2006). Knowledge of STE is still too incomplete to disentangle the relative contributions to these trends from in-situ photochemistry and from the stratosphere.
A global, high-resolution climatology of separate S→T and T→S mass fluxes is clearly needed to help evaluate the accuracy of STE fluxes predicted in CTMs and CCMs and to improve our knowledge of a central contributor to the chemical composition of the troposphere and lower stratosphere and its variability on all time scales.
Global weather forecasts offer an interesting, largely unexplored avenue for developing global Lagrangian estimates of STE: (i) they are free of the non-physical corrections introduced by data assimilation in (re-)analyses; (ii) they offer higher frequency data than the typical 6-hourly (re-)analysis data; (iii) their resolution is now high enough for an accurate representation of the processes responsible for STE; (iv) they allow the real-time forecasting of STE; and (v) they allow evaluation of STE forecasts against observations, something that is not generally possible when STE is estimated using re-analysis data sets. Their disadvantage is that they introduce an error related to the divergence of the global forecasts from the true atmospheric evolution, which increases with the length of the forecast. This error is however not devoid of interest, as it provides a Lagrangian perspective on forecast errors. Trickl et al. (2010) used routine trajectory calculations based on global weather forecasts performed at ETH Zürich and covering the Atlantic Ocean/Western European sector and showed a satisfactory consistency with observations from ozone lidar over the period 2001-2005. Here, we introduce the first global real-time Lagrangian STE data set based on global weather forecasts. These data have been calculated daily since July 2010 at Environment Canada (EC) following the methodology introduced in Bourqui (2006). They consist of global, five-day STE forecasts calculated daily using the 10-day global weather forecast initiated at 00:00 UTC. The five-day STE forecasts are based on six-day forward trajectories started at 00:00 UTC + I, with I = 0, 24, 48, 72, 96 h, respectively, and selected as follows: they must cross either the ±2 PVU dynamic tropopause or the 380 K isentrope with a residence time of 12 h within the time window [12 + I, 36 + I[ hours (by convention, the time is given relative to the start of the weather forecast at 00:00 UTC). Including trajectories crossing
either the ±2 PVU tropopause or the 380 K isentrope allows consideration of extra-tropical and tropical STE, respectively. In addition, this data set offers a diagnostic of the lower part of the stratospheric residual circulation in the extra-tropics. The calculation of trajectories and necessary dynamical fields is made using the highest-available temporal resolution, namely hourly, native forecast data, and the spatial density of initial trajectory points is taken to be of the same order as the spatial resolution of the meteorological data. The complete set of selected trajectories is archived daily and several diagnostics are produced.
The goal of this paper is to introduce and provide a first evaluation of this new Lagrangian STE data set, against observations from a one-month balloon sonde campaign in summer 2010 involving three launch sites located in eastern Canada: Montreal (Quebec), Egbert (Ontario), and Walsingham (Ontario). Meteorological conditions prevailing during the campaign are similar to those described in Bourqui and Trépanier (2010), with the presence of baroclinic waves over Canada moving eastwards. We focus on the transport of stratospheric air into the troposphere, or in other words, on stratospheric intrusions. We use the first day forecast (i.e. I = 0 h), since it is expected to be associated with the smallest possible weather forecast errors. This evaluation covers only one summer season and is spatially restricted. Since the skill of this STE data set may vary in space and time, the characterisation of errors made here cannot automatically be generalised to other seasons and regions. Nevertheless, this is a first step towards understanding the capabilities and limitations of this new data set.
The next section introduces in detail the calculation of the Lagrangian STE data set and its diagnostics. Section 3 describes the balloon sonde campaign and provides an analysis of the observations and the detection of stratospheric intrusions. The evaluation of the Lagrangian STE data set is then discussed in Sect. 4, followed by an analysis of errors and missed intrusions in Sect. 5 and conclusions in Sect. 6.

Description of the new Lagrangian stratosphere-troposphere exchange data set
The approach proposed by Bourqui (2006) is applied separately to each EC 10-day 00:00 UTC operational global weather forecast. These global weather forecasts are calculated at EC's Canadian Meteorological Centre using the Global Environmental Multi-scale (GEM) numerical weather prediction model (Côté et al., 1998a,b) and provide hourly meteorological fields with horizontal grid spacing of 0.3° × 0.3°, and 80 hybrid vertical levels up to 0.1 hPa.
Trajectories are then computed using an adapted version of the LAGRANTO Lagrangian trajectory model (Wernli and Davies, 1997), which is based upon an iterative Euler scheme. The resulting data set is archived in the NetCDF (Network Common Data Form, http://www.unidata.ucar.edu/software/netcdf) format and represents about 4 Gb per day of selected trajectory data. A first set of three-dimensional forward trajectories is started at 00:00 UTC for up to six days from a 15 × 10⁶-point three-dimensional grid covering the entire globe with even horizontal spacing of 55 km (distance equivalent to 0.5° latitude), and 5 hPa vertically between 600 and 10 hPa. A selection is performed during the trajectory calculation in order to keep calculating only those trajectories that cross either the ±2 PVU or the 380 K iso-surfaces during the period [12, 36[ h, with a residence time of 12 h. The combination of the [12, 36[ h periods from the successive forecasts forms a temporal grid with a 24 h resolution. The residence time criterion requires the trajectories to stay for at least 12 h in the stratosphere and in the troposphere, just before/after the exchange time. This criterion efficiently avoids the selection of trajectories that oscillate around the tropopause due to numerical noise (Bourqui, 2006), and at the same time it eliminates the singularity problem of one-way STE fluxes (Hall and Holzer, 2003). This selection of trajectories reduces the number of trajectories that need to be considered by a factor of 100, from around 15 million to 150 000. The three-dimensional positions of the selected trajectories are archived every hour together with potential vorticity and potential temperature values along each trajectory. Note that these trajectories include both S→T and T→S events. This procedure is then repeated for the same weather forecast for new sets of trajectories started at 00:00 UTC + 24, 48, 72, and 96 h, representing STE in the periods
[36, 60[, [60, 84[, [84, 108[, and [108, 132[ h, respectively. These additional trajectories are also included in the data set, as they offer an STE forecasting capability over the next 6 days, and they allow estimation of the degradation of STE estimates along the first six days of the meteorological forecast. In addition, trajectories crossing either ±2 PVU or 380 K in the last time window are extended backward in time to 00:00 UTC. This allows the analysis of (rapid) upward transport using six-day trajectories as well.
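The time windows and the residence time criterion described above can be illustrated with a minimal sketch. It is not the LAGRANTO-based implementation: the function names, the hourly boolean flags `in_strat`, and the reading of "at least 12 h" as twelve consecutive hourly positions on each side are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

RESIDENCE_H = 12  # residence-time threshold in hours, as stated in the text


def exchange_window(start_offset_h: int) -> Tuple[int, int]:
    """Half-open window [12 + I, 36 + I[ in hours since the forecast start."""
    return (12 + start_offset_h, 36 + start_offset_h)


def find_exchange(in_strat: List[bool], start_offset_h: int = 0,
                  residence_h: int = RESIDENCE_H) -> Optional[Tuple[int, str]]:
    """Return (exchange hour, direction) of the first accepted crossing, or None.

    in_strat[k] is True when the parcel is on the stratospheric side of the
    chosen surface at hour start_offset_h + k (hypothetical hourly flags).
    """
    lo, hi = exchange_window(start_offset_h)
    for k in range(1, len(in_strat)):
        t = start_offset_h + k
        if not (lo <= t < hi) or in_strat[k] == in_strat[k - 1]:
            continue  # outside the window, or no crossing at this step
        before = in_strat[max(0, k - residence_h):k]
        after = in_strat[k:k + residence_h]
        if len(before) < residence_h or len(after) < residence_h:
            continue  # trajectory too short to verify the residence time
        stays_before = all(s == in_strat[k - 1] for s in before)
        stays_after = all(s == in_strat[k] for s in after)
        if stays_before and stays_after:
            return (t, "S->T" if in_strat[k - 1] else "T->S")
    return None
```

A parcel that oscillates around the surface within a few hours fails the test on one side or the other and is rejected, which is the noise-filtering behaviour attributed to the criterion in the text.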
Two principal diagnostics are then calculated. Firstly, geographical maps of the density of either stratosphere-to-troposphere (S→T) or troposphere-to-stratosphere (T→S) exchanges are produced globally for the ±2 PVU and 380 K iso-surfaces (not shown). Secondly, the S→T trajectories that reach the lower troposphere (700 hPa) are organised as clusters following Bourqui and Trépanier (2010). Two trajectories are considered to belong to the same cluster if their three-dimensional Euclidean separation distance, averaged over the time period when both are located below the tropopause and above the 700 hPa level, is smaller than 222 km (equivalent to 2° latitude). Note that the calculation of the Euclidean distance is modified with a factor of 100 applied to the vertical separation as a simple way to account for the aspect ratio of the atmosphere. For each cluster, the three-dimensional position of the center of mass and the mean radius are calculated along the time axis.
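The clustering rule can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Bourqui and Trépanier (2010): positions are taken to be already expressed in a hypothetical local Cartesian frame in km, each trajectory is a dictionary holding only the hours when the parcel lies between the tropopause and 700 hPa, and pairs are chained transitively into clusters with a union-find structure, which the text does not specify.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x_km, y_km, z_km), hypothetical frame
MAX_SEP_KM = 222.0       # separation threshold from the text
VERTICAL_FACTOR = 100.0  # factor applied to the vertical separation


def scaled_distance(a: Point, b: Point) -> float:
    """Euclidean distance with the vertical separation stretched by 100."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    dz = (a[2] - b[2]) * VERTICAL_FACTOR
    return math.sqrt(dx * dx + dy * dy + dz * dz)


def mean_separation(ta: Dict[int, Point], tb: Dict[int, Point]) -> Optional[float]:
    """Average scaled distance over the hours present in both trajectories."""
    common = sorted(set(ta) & set(tb))
    if not common:
        return None  # the two parcels never coexist in the layer of interest
    return sum(scaled_distance(ta[t], tb[t]) for t in common) / len(common)


def cluster(trajs: List[Dict[int, Point]]) -> List[List[int]]:
    """Group trajectory indices; linked pairs are merged transitively."""
    parent = list(range(len(trajs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(trajs)):
        for j in range(i + 1, len(trajs)):
            d = mean_separation(trajs[i], trajs[j])
            if d is not None and d < MAX_SEP_KM:
                parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(trajs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With the factor of 100, a vertical separation of 1 km counts as much as a horizontal separation of 100 km, so two parcels stacked 3 km apart are not clustered even if they share the same horizontal position.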
In this first evaluation paper, we use exclusively the trajectories crossing 2 PVU in the S→T direction since only those are relevant for the comparison with the stratospheric intrusions found in the balloon sonde observations. Moreover, we restrict the evaluation to the most accurate ones, namely the trajectories started at 00:00 UTC and representing exchange events within the time window [12 + I, 36 + I[ hours.

Balloon sonde observations and analysis
Balloon sondes were launched daily at 13:00 local daylight saving time (17:00 UTC) from 12 July to 4 August 2010 from three sites in eastern Canada: Montreal (Quebec: 45.3° N, 73.4° W), Egbert (Ontario: 44.2° N, 79.8° W), and Walsingham (Ontario: 42.6° N, 80.6° W). Egbert is situated about 500 km southwest of Montreal (and about 100 km north of Toronto) and Walsingham is situated about 600 km southwest of Montreal (and about 100 km southwest of Toronto). The balloons were equipped with a radiosonde and an ECC ozonesonde (Komhyr, 1969) to measure vertical profiles of ozone partial pressure, relative humidity (RH), temperature, pressure, and wind speed and direction. The accuracy of ozone partial pressure and relative humidity measurements is expected to be better than 5 %. The vertical distance between successive measurements in the profiles is 0.3 hPa on average and at most 1.8 hPa. The number of successful balloon-sonde launches over the 24 days of the campaign was 22 from Montreal, 17 from Egbert and 17 from Walsingham (see Table 1).

Temporal evolution of observations of ozone and relative humidity
Figure 1 shows the daily profiles of RH and ozone mixing ratio as a function of pressure, averaged in 10 hPa vertical bins, at Montreal for the entire campaign. The black solid line denotes the thermal tropopause estimated from the measured temperature profiles using the WMO definition (WMO, 1986). The RH profiles show the presence of dry air in the troposphere over Montreal persisting throughout the entire campaign, with RH values typically lower than 30 %. The bottom of this dry air region undulates between 600 and 800 hPa. The ozone molar mixing ratio profiles show a reverse pattern, with relatively elevated ozone concentrations in the regions of dry air. The anticorrelation between ozone and relative humidity is even more striking for individual profiles (not shown). Such anomalously dry air located below the tropopause with a marked anticorrelation between RH and ozone is the typical signature of the presence of air of stratospheric origin in the troposphere. It is important to note, however, that layers of anomalously dry and ozone-rich air identified within single balloon-sonde profiles may not always have their origin above the 2 PVU tropopause. They can have their origin above any level in the upper troposphere/lower stratosphere region with sufficiently low humidity and high ozone mixing ratios. Since the 2 PVU tropopause does not coincide with a unique set of humidity and ozone thresholds, it is impossible to remove this uncertainty here. In addition, the summer season of this campaign is associated with enhanced in-situ formation of ozone in the planetary boundary layer, which could be transported upward, although in this case one would expect the ozone-rich air to show higher humidity. With these caveats, Fig.
1 suggests that stratospheric intrusions reaching the lower troposphere were present above Montreal for the full length of the campaign, except on 21 July and 27-28 July. The entire period can be separated into three large deep intrusion events, each lasting from four to seven days: (1) from 14 to 20 July; (2) from 22 to 26 July; and (3) from 29 July to 4 August. Based on the RH profile alone (the ozone profile is missing), the second multiday event should be extended to 27 July, since dry air was found on that day down to 800 hPa too. RH and ozone profiles at Egbert and Walsingham (not shown) display similar structures, suggesting a nearly continuous presence of stratospheric intrusions reaching the lower troposphere over eastern Canada during this period. The relative proximity of the three stations would suggest that some degree of correspondence should exist between them. However, the larger number of missing profiles at Egbert and Walsingham makes it difficult to distinguish multi-day intrusion events or a clear correspondence with the three large intrusion events seen at Montreal.
[Fig. 1 panels: "Montreal Relative Humidity Measurement [%]" and "Montreal Ozone Measurement [ppbv]"; vertical axis: Pressure [mb]; horizontal axis: launch dates from Jul12 to Aug4.]

Objective identification of stratospheric intrusions
In order to objectively compare observations of ozone and humidity to the Lagrangian STE data set, it is necessary to set up a numerical algorithm to identify stratospheric intrusions in the measured profiles. The algorithm built for the purpose of this evaluation uses the same criteria as a subjective determination of intrusion levels (such as that above), and aims for similar results. The results from this objective identification algorithm depend upon the values of the parameters used, and these may not be appropriate for other seasons and locations. These results are also naturally subject to the uncertainties discussed above that are intrinsic to identifying stratospheric intrusions from balloon sonde profiles. These are difficult to quantify. Nevertheless, use of this algorithm ensures that identical criteria are used for all profiles, thus providing consistency and reproducibility of results. The algorithm screens an observed profile from the ground upwards by increments of 10 hPa, and identifies alternating bottoms and tops of intrusions using the following criteria applied to a centered smoothed window of 50 hPa width. Fields are smoothed within this window using a seventh-order polynomial fit. All three criteria must be met for an intrusion bottom or top to be detected, and multiple intrusion layers may be identified in a single profile. Vertical gradients are expressed in pressure coordinates.
Detection of the bottom of the intrusion:
Detection of the top of the intrusion:
The threshold RH value of 60 % within a stratospheric intrusion allows inclusion of regions of intrusions that have undergone significant mixing with the surrounding air, consistent with the Lagrangian STE data set, which ignores subgrid-scale mixing. After this first-pass identification of any intrusion layers, the algorithm next removes any thin layers by either ignoring intrusions that are thinner than 50 hPa or by merging intrusions that are closer to each other than 50 hPa. Finally, and except in Fig. 1 where top and bottom levels of intrusions are shown on a 10 hPa vertical grid, the top and bottom levels of intrusions are moved to the closest grid point of a 50 hPa-resolution vertical grid starting at 1000 hPa. This vertical grid is also used with the Lagrangian STE data set (see Sect. 4.1). The resulting stratospheric intrusions are only sensitive to the details of the choice of parameters in regions where the stratospheric signature is ambiguous. In such cases, the parameters are chosen such that most ambiguous cases are included. The presence of ambiguous cases constitutes a source of uncertainty in the comparison with the Lagrangian STE data set (see Sect. 5). However, these ambiguous layers are usually located at either the bottom or the top of intrusions, and therefore it will be useful to distinguish between the inner part of intrusions and the layers surrounding their bottom and top levels.
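The post-processing of first-pass intrusion layers (removal of thin layers, merging of close layers, and snapping to the 50 hPa grid) can be sketched as below. The detection criteria themselves are not reproduced. The layer representation, the function names, and the order of operations (merge first, then discard thin layers) are assumptions, since the text does not state the order.

```python
from typing import List, Tuple

Layer = Tuple[float, float]  # (bottom_hPa, top_hPa), with bottom > top
MIN_HPA = 50.0          # minimum thickness and minimum gap, from the text
GRID_HPA = 50.0         # vertical grid resolution
GRID_START_HPA = 1000.0 # the grid starts at 1000 hPa


def clean_layers(layers: List[Layer]) -> List[Layer]:
    """Merge layers separated by less than 50 hPa, then drop thin layers."""
    merged: List[Layer] = []
    for bottom, top in sorted(layers, reverse=True):  # from the ground upwards
        if merged and merged[-1][1] - bottom < MIN_HPA:
            # Gap to the previous layer is too small: extend that layer.
            merged[-1] = (merged[-1][0], min(merged[-1][1], top))
        else:
            merged.append((bottom, top))
    return [(b, t) for b, t in merged if b - t >= MIN_HPA]


def snap(pressure_hpa: float) -> float:
    """Move a level to the closest point of the 50 hPa grid."""
    steps = round((GRID_START_HPA - pressure_hpa) / GRID_HPA)
    return GRID_START_HPA - steps * GRID_HPA
```

For example, two first-pass layers at 780-640 hPa and 620-500 hPa are separated by only 20 hPa and would be merged into a single 780-500 hPa intrusion, while an isolated 20 hPa-thick layer would be discarded.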
Intrusions identified with this algorithm for Montreal balloon-sonde launches are shown as a black hatching overlaid on each profile in Fig. 1. The hatching cleanly separates the troposphere into two regions with markedly different relative humidity and ozone characteristics. The air below identified intrusions is markedly moister and typically, but not always, has less ozone. The hatched region coincides well with the zone identified above of dry, ozone-rich air whose bottom undulates over time. Intrusions penetrate to the middle troposphere around 600 hPa every day of the campaign except for 21 and 28 July, when moist air is clearly present at higher altitudes, probably in relationship with clouds in the region. A few isolated layers that were not identified as stratospheric intrusions can also be noted surrounded by hatched areas (e.g., 16, 18, 20 and 24 July; 2 August). Most of these show clearly higher RH than in the surrounding intrusions. Note that some layers with relatively moist air were identified as stratospheric intrusions (e.g., 13 July around 350 hPa, 31 July around 300 hPa, 4 August around 700 hPa). These cases are ambiguous, and as mentioned above, the algorithm parameters are chosen such that ambiguous cases are typically included in intrusions.
Table 1 summarizes the number of profiles where a stratospheric intrusion was detected somewhere below 300, 500 and 700 hPa for the three stations. Intrusions are detected below 300 hPa in 89 % of all profiles, below 500 hPa in 79 %, and below 700 hPa in 38 %. The Montreal station has an even higher number of intrusions at all levels, with an intrusion reaching below 700 hPa every other day. Walsingham has the lowest number of intrusions, though still with one day in four showing an intrusion reaching below 700 hPa. This frequency of deep intrusions is remarkably high, although it should be borne in mind that the period is relatively short. Note however from Fig. 1 that no intrusion has been detected below 800 hPa, where boundary-layer processes intervene. This also applies to the other two stations. Yet, it can be estimated from the total black shaded area below 700 hPa in Fig. 1 that around ten percent of the air below this level originates in the stratosphere, a number which is about two orders of magnitude larger than the 15-yr climatological estimate from Sprenger and Wernli (2003).

Distributions of humidity and ozone in observed stratospheric intrusions
In order to characterise the composition of the stratospheric intrusions that were detected and to verify the appropriateness of the objective identification algorithm, it is useful to analyse the distributions of humidity and ozone with respect to the identified intrusions. As mentioned above, the layers near the bottom and top levels of intrusions are in some cases ambiguous. We distinguish between the different regions of a tropospheric column relative to the intrusion by tagging each 50 hPa layer of a vertical profile with one of the following five categories with decreasing priority:
1. Intermediate (Bottom): the two 50 hPa layers on either side of the bottom of the intrusion.
2. Intermediate (Top): the two 50 hPa layers on either side of the top of the intrusion, if they have not already been assigned to the "Intermediate (Bottom)" category.
3. Inside Intrusion: the 50 hPa layers inside an intrusion that have not been assigned to one of the "Intermediate" categories.
4. Above Intrusion: the 50 hPa layers that lie above an intrusion and that have not been assigned to one of the "Intermediate" categories.
5. Below Intrusion: the 50 hPa layers that lie below an intrusion and that have not been assigned either to one of the "Intermediate" categories or to the "Above Intrusion" category.
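The priority ordering of the five categories can be made explicit with a short sketch. It assumes that each layer is identified by the pressure of its lower edge on the 50 hPa grid, that intrusion bottoms and tops already lie on that grid, and that "the two 50 hPa layers on either side" means one layer on each side of the level; the function name and the `p_top` limit are hypothetical.

```python
from typing import Dict, List, Tuple

Layer = Tuple[int, int]  # intrusion (bottom_hPa, top_hPa) on the 50 hPa grid


def categorise(intrusions: List[Layer], p_surface: int = 1000,
               p_top: int = 100, step: int = 50) -> Dict[int, str]:
    """Tag each layer [p, p - step] (keyed by p) following the priority list."""
    tags: Dict[int, str] = {}
    for p in range(p_surface, p_top, -step):
        if any(p in (b, b + step) for b, _ in intrusions):
            tags[p] = "Intermediate (Bottom)"  # layers adjacent to a bottom
        elif any(p in (t, t + step) for _, t in intrusions):
            tags[p] = "Intermediate (Top)"     # layers adjacent to a top
        elif any(t + step <= p <= b for b, t in intrusions):
            tags[p] = "Inside Intrusion"
        elif any(p <= t for _, t in intrusions):
            tags[p] = "Above Intrusion"
        elif any(p - step >= b for b, _ in intrusions):
            tags[p] = "Below Intrusion"
        # Profiles without any intrusion leave all layers untagged.
    return tags
```

Because the tests are evaluated in order, a layer that lies above one intrusion and below another is tagged "Above Intrusion", in line with the fifth item of the list.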
An example of this categorisation is given in Fig. 2 for 24 July in Montreal. It shows two detected stratospheric intrusions and the accompanying categorisation of each 50 hPa layer. The bottom level of the lower intrusion marks a transition between a tropospheric region below and a dry layer with slightly enhanced ozone mixing ratios above. Ozone and RH show a visible anticorrelation in the vertical from 800 hPa up to 500 hPa, a typical fingerprint of air of stratospheric origin. The upper part of this intrusion is more ambiguous and part of this ambiguity is absorbed in the Intermediate (Top) category. The second intrusion's bottom is detected around 180 hPa, but its top is not detected by the algorithm because it is thin and close to the thermal tropopause. This example illustrates the difficulty faced when reducing an observed profile into binary information on the presence/absence of stratospheric intrusion. Here, the definition of an intermediate region between the inside and the outside of the intrusion provides a useful palliative for the lower, deep intrusion but not for the upper, shallow one. Distributions of observations of RH (%), specific humidity (g kg⁻¹), and ozone (ppbv) are shown in Fig.
3 with respect to these intrusion categories for all three stations combined. Note that the corresponding distributions for individual stations are very similar (not shown). The height of the bars gives the percentage of the 50 hPa layers belonging to a given category. Consistent with the high frequency of intrusions mentioned before, the category "Inside Intrusion" dominates down to 700 hPa but becomes almost absent below this level. This is due to the fact that the bottom of intrusions never penetrated below 800 hPa during the entire campaign. Air inside stratospheric intrusions (top row) is very dry at all pressure levels: more than 60 % of the air has an RH lower than 20 %, and less than 20 % has an RH larger than 40 %. The RH distribution is clearly shifted towards higher values in the "Intermediate" categories, and even more so in the "Below Intrusion" and "Above Intrusion" categories. Interestingly, the RH distributions, in particular for the "Inside Intrusion" category, stay relatively constant throughout the four pressure ranges. This may be due to a combination of low specific humidity, indicating memory of a stratospheric origin, and temperature. In order to disentangle these two contributions, the middle row of Fig.
3 shows the specific humidity distributions. The categories keep their relative differences, but the specific humidity clearly increases with increasing pressure in all categories, including the "Inside Intrusion" category. This shows the important role of the vertical profile of temperature in keeping the low RH in stratospheric intrusions throughout the troposphere. Assuming that air inside intrusions comes from the stratosphere, a likely cause for the increase in specific humidity inside intrusions with increasing pressure is that significant mixing takes place with the surrounding air during the descent. Another possible cause may be that the air identified here as of stratospheric origin has its actual origin in the upper troposphere for lower altitude intrusion layers. The overall pattern in ozone distributions is consistently similar to the specific humidity (but inverted), with larger ozone molar mixing ratios at lower pressures and in the "Inside Intrusion" category.
[Fig. 3 panel titles: Above 300 mb Level; Between 300 & 500 mb Level; Between 500 and 700 mb Level; Below 700 mb Level. Category axis: Above, Intermediate, Inside, Intermediate, Below. Specific humidity legend: q < 0.01 g/kg; 0.01 ≤ q < 0.1 g/kg; 0.1 ≤ q < 0.5 g/kg; 0.5 ≤ q < 1.0 g/kg; 1.0 ≤ q < 5.0 g/kg; 5.0 ≤ q < 10 g/kg; q ≥ 10 g/kg.]
Fig. 3.
Distributions of the average relative humidity (top, in %), specific humidity (middle, in g kg −1 ), and ozone molar mixing ratio (bottom, in ppbv) within 50 hPa bins of 56 measured profiles, depending on the category to which they belong, and separately for four different pressure ranges: above 300 hPa, between 300 and 500 hPa, between 500 and 700 hPa and below 700 hPa (from left to right).Note the non-linear colour shading for the specific humidity.These distributions include all measurements from the three sites.
This similarity suggests that ozone behaves approximately as a passive tracer over the time scale of an intrusion.

Methodology
The finite spatial (55 km, 5 hPa) and temporal (24 h) intervals separating the trajectories at initial time allow for a mapping of STE mass fluxes with a spatio-temporal grid of typically coarser resolution (e.g. 2° × 2° × 24 h, Bourqui, 2001).
In the case of this evaluation, we need to compare the Lagrangian STE data with instantaneous, point-observed profiles. This requires us to artificially define an area and time interval around the observation location and time. The presence of a trajectory within this area and time interval is then assumed to represent the presence of air of stratospheric origin above the given station, and is compared to the observed profile at the station. Here, we have chosen this area to be within a radius of 55 km around the station, and the time interval to be ±6 h around measurement time. The vertical grid was set with 50 hPa spacing from 1000 hPa. The radius was chosen to allow for some stretching of the STE parcel. For the time interval, we have verified that trajectories cross the 2 PVU tropopause uniformly within the [12 h, 36 h[ window (not shown), such that taking a shorter time interval does not lead to a statistical bias. We have chosen a ±6 h interval as a compromise between including events that were not observed, but that would have been observed at a different time, and allowing for the sparseness of the initial intervals between trajectories. Since more than one trajectory can be located in each 50 hPa bin above a station (i.e. within a 55 km radius of the site and within a ±6 h interval), we consider that a 50 hPa bin with one or more trajectories has a stratospheric origin, while the absence of trajectories represents a tropospheric origin.
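The matching rule described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names, the `(lat, lon, pressure, time)` layout of the trajectory samples, the spherical-Earth haversine distance and the station coordinates in the usage lines are all assumptions made for illustration. Only the 55 km radius, the ±6 h window and the 50 hPa bins starting at 1000 hPa come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def stratospheric_bins(points, station_lat, station_lon, obs_time_h,
                       radius_km=55.0, half_window_h=6.0,
                       bin_hpa=50.0, bottom_hpa=1000.0):
    """Return the pressure bins flagged as being of stratospheric origin.

    points: iterable of (lat_deg, lon_deg, pressure_hpa, time_h) samples of
    S->T trajectories (hypothetical data layout).
    Each flagged bin is returned as (p_min_hpa, p_max_hpa); a bin is flagged
    as soon as one sample falls inside the radius and the time window.
    """
    flagged = set()
    for lat, lon, p_hpa, t_h in points:
        if abs(t_h - obs_time_h) > half_window_h:
            continue  # outside the +/-6 h window
        if great_circle_km(lat, lon, station_lat, station_lon) > radius_km:
            continue  # outside the 55 km disk around the station
        if not 0.0 < p_hpa <= bottom_hpa:
            continue  # outside the vertical grid
        index = int((bottom_hpa - p_hpa) // bin_hpa)
        p_max = bottom_hpa - index * bin_hpa
        flagged.add((p_max - bin_hpa, p_max))
    return flagged


# Illustrative usage with made-up samples and approximate station coordinates.
samples = [
    (45.6, -73.6, 620.0, 2.0),   # about 11 km away, inside the window
    (47.0, -73.6, 400.0, 0.0),   # too far from the station
    (45.5, -73.6, 300.0, 10.0),  # outside the time window
]
bins = stratospheric_bins(samples, 45.5, -73.6, obs_time_h=0.0)
```

A set is used because the rule only distinguishes between "one or more trajectories" and "no trajectory" in a bin; the number of trajectories per bin is deliberately discarded.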
It is important to note that this finite radius and time interval unavoidably lead to mismatches between STE data and observations, and that this error disappears in geographical density maps of STE (not shown). We have tested the sensitivity of the results to these parameters by varying the radius between 27.5 and 555 km and the time interval between ±1 and ±12 h, but the results could not be improved. When increasing the size of the spatial and/or temporal window, more trajectories were detected in each vertical 50 hPa bin, but the missed events (i.e., no trajectories found in a layer where a stratospheric intrusion was observed) were not reduced significantly. Similarly, when reducing the size of the spatial and/or temporal window, fewer trajectories were detected in each vertical bin, but the overforecasts or false alarms (i.e., trajectories present in a layer where no stratospheric intrusion was observed) were not significantly reduced. This relative insensitivity most likely comes from the fact that intrusions translate to clusters of trajectories, and an increase in the analysis window size essentially leads to the inclusion of more trajectories from the same cluster.
The skill of the Lagrangian STE forecast at capturing the stratospheric intrusions identified in observations is quantified using 2 × 2 contingency tables and standard categorical scores (Jolliffe and Stephenson, 2003). A 2 × 2 contingency table simply provides the number of profiles with/without stratospheric intrusions observed and/or forecast (see illustration in Table 2). The diagonal of the table represents the correct forecasts (with/without intrusions), while the off-diagonal elements reflect the overforecasts or false alarms (top right) and the missed events (bottom left). The four categorical scores used in this study are described below. Variables a, b, c, d and n are defined in Table 2.
PC: Proportion Correct = (a + d)/n (1 = perfect). This score represents the proportion of correct forecasts, whether with or without intrusions. A sequence of perfect guesses would result in a PC = 1.0. An infinite sequence of random guesses would result in a PC = 0.5. A finite sequence of random guesses would result in a PC belonging to a binomial probability distribution centered on 0.5, whose width depends on the size of the sequence.
Hereafter, we consider that the PC is statistically signif-  Murphy, 1996).
FB: Frequency Bias = (a + b)/(a + c) (1 = perfect). This score measures the bias in the forecast. It is equal to 1 for an unbiased forecast, larger than 1 for an overforecast, and smaller than 1 for an underforecast.
HR: Hit Rate = a/(a + c) (1 = perfect). This score measures the ratio of the number of correct forecast events to the total number of observed events. Its complement, 1 − HR, is thus the proportion of missed events. It is equal to 1 when there are no missed events, and 0 when all events have been missed. Since it only counts the correct forecast events, it may be artificially improved when there is a tendency for overforecasting.
FAR: False Alarm Ratio = b/(a + b) (0 = perfect). This score measures the ratio of the number of overforecast or false alarm events to the total number of forecast events. It thus quantifies how trustworthy the forecast events are.
It is equal to 0 when no false alarms are forecast and 1 when all forecasts are false alarms.
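The four scores can be computed directly from the entries of a 2 × 2 contingency table, as in the Python sketch below. The function names are hypothetical. The significance test is only one plausible reading of the binomial argument given for PC (a two-sided test against random guessing with a success probability of 0.5) and is not necessarily the exact procedure used in this study. The example counts (a = 50, b = 5, c = 0, d = 1) are not copied from Table 3; they are inferred from the description of intrusions below 300 hPa given in the text (56 profiles, all but one forecast with an intrusion, 5 non-observed events, no missed event).

```python
from math import comb, nan


def categorical_scores(a, b, c, d):
    """Categorical scores from a 2 x 2 contingency table.

    a: hits, b: false alarms, c: missed events, d: correct non-events.
    Scores with a zero denominator are returned as NaN.
    """
    n = a + b + c + d
    return {
        "PC": (a + d) / n if n else nan,              # Proportion Correct
        "FB": (a + b) / (a + c) if a + c else nan,    # Frequency Bias
        "HR": a / (a + c) if a + c else nan,          # Hit Rate
        "FAR": b / (a + b) if a + b else nan,         # False Alarm Ratio
    }


def pc_p_value(n_correct, n):
    """Two-sided p-value of n_correct correct forecasts out of n,
    assuming random guessing, i.e. a Binomial(n, 0.5) distribution."""
    deviation = abs(n_correct - n / 2)
    tail = sum(comb(n, k) for k in range(n + 1)
               if abs(k - n / 2) >= deviation)
    return tail / 2 ** n


# Counts inferred from the text for intrusions penetrating below 300 hPa.
scores = categorical_scores(a=50, b=5, c=0, d=1)
significant = pc_p_value(n_correct=50 + 1, n=56) < 0.10  # 10 % two-sided level
```

With these inferred counts, the sketch returns PC ≈ 0.91, FB = 1.10, HR = 1.00 and FAR ≈ 0.09, in line with the values quoted for the 300 hPa threshold.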

Comparison of stratospheric intrusions in the STE data set and in observations at the three stations
The number of trajectories of stratospheric origin (i.e. crossing the 2 PVU tropopause in the S→T direction) found in the data set that approached one of the three balloon-sonde sites within a radius of 55 km, and within ±6 h of the measurement time, is shown in Fig. 4 for each day for the three stations, with the stratospheric intrusions identified from observed profiles overlaid as black hatching. At Montreal (Fig. 4, top row), the first two multi-day intrusion events seen in observations, namely 14-20 July and 22-26 July, coincide well with distinct multi-day intrusion events in the forecasts. During these two periods, differences occur mainly below 600 hPa in both directions (i.e., both missed or overforecast events), but with a tendency for the forecasts to miss events. At Egbert and Walsingham, the general structures of the intrusions also seem well captured, but again with a few events missed, such as 14 and 27 July at Egbert and 27 July at Walsingham. Other mismatches mostly concern pressures larger than 600 hPa, as at Montreal. The third multi-day intrusion event identified in observations at the three stations (29 July-4 August) is also found in the STE forecasts. Interestingly, the forecasts of this event are very similar for the three stations, with deep intrusions found at the beginning and end of the period, and shallower intrusions confined above 450 hPa found in between. Observations suggest that this event was deeper in its core above Montreal than above the other stations. The forecast therefore accurately captures the beginning and end of the event at all three stations, but misses the deep part of its core above Montreal. This may be due either to a spatial shift of the whole event in the forecasts, or to a misrepresentation of a deeper part of its core which was only observed at Montreal. Note that the thermal tropopause (black solid lines) estimated from observations and the dynamical tropopause as shown by the top of the blue shaded columns in Fig. 4 agree well at Montreal, but show a few days of mismatch at Egbert and Walsingham.
Table 3. Statistical evaluation of the skill of the Lagrangian STE data at representing intrusions penetrating below 300, 500 and 700 hPa.
An event is here defined as the presence of stratospheric air anywhere below the given level. Contingency tables are for the three stations together. Categorical skill scores are for the three stations together, with individual stations given in brackets (in the order: Montreal, Egbert, Walsingham). PC = Proportion Correct (1.0: all correct; 0.5: randomly correct); FB = Frequency Bias (>1: high bias; <1: low bias); HR = Hit Rate (1.0: all hit; 0.0: none hit); FAR = False Alarm Ratio (0.0: no false alarm; 1.0: all false alarms).
a Statistically significant at the 10 % bilateral level.
b Not statistically significant at the 10 % bilateral level.

Skill at representing the overall depth of intrusions
Table 3 provides the contingency tables and categorical scores for the skill of the STE forecast at capturing events, defined as the presence of a stratospheric intrusion penetrating below 300, 500, and 700 hPa. These figures can be linked to the number of intrusions identified from observations given in Table 1. Note that in Table 3, forecast and observed intrusions are considered as matching as long as they both include an intrusion layer located somewhere below the pressure threshold. This is an allowance for uncertainties associated with the relative sparseness of the trajectories' initial positions. In the upper troposphere, the STE data set tends to overforecast intrusions penetrating below 300 hPa, such that all profiles but one are forecast with an intrusion, including 5 non-observed events. This translates into a small positive bias varying between 5 and 14 % at individual stations and estimated at 10 % over the full campaign (Table 3, left column). These overforecasts translate into a proportion of correct forecasts estimated at 91 %. Consistently, the hit rate is equal to 100 %, while the false alarm ratio is equal to 9 %. In the mid-troposphere, the STE data set both misses and overforecasts a few intrusions penetrating below 500 hPa, resulting in a slight underestimate of the frequency of intrusions of 11 %. The proportion of correct forecasts is reduced to 77 %, with 20 % missed events, partially compensated by 10 % false alarms. For deep intrusions penetrating below 700 hPa, the proportion of correct forecasts is not statistically significant, i.e. the forecasts have no predictive skill at all. The STE data miss 66 % of events, which are partially compensated by 56 % false alarms, resulting in a small low bias of 24 % (FB = 0.76). This low bias, however, shows a large variability among the three stations (FB = 0.54, 0.75 and 1.14), owing to the overall small number of events. The STE data are therefore only useful in a statistical sense for intrusions penetrating lower than 700 hPa.

Skill at representing the detailed structure of intrusions
Table 4 provides the contingency tables and categorical scores for the skill of the STE data at capturing the detailed structure of the intrusions. Here an event is defined as the presence of air of stratospheric origin within a 50 hPa vertical bin, and the analysis is performed separately for four ranges of pressures. Above 300 hPa, the predictive skill is excellent, with a slight tendency for false alarms and related positive bias, consistent with the discussion on Table 3. In the pressure range 300 to 500 hPa, the predictive skill degrades slightly, with increased missed events and false alarms. However, these almost compensate each other, and result in a very small positive bias. Hence, down to 500 hPa, consistent with the analysis of Table 3, the STE data have both very good predictive skill and a small statistical bias. In the pressure range from 500 to 700 hPa, the proportion correct drops to 0.58, reflecting limited predictive skill that is however still statistically significant. The forecasts miss about half of the events, and are composed of 28 % false alarms. This results in a low bias estimated at 31 % (FB = 0.69). It is interesting to note that this degradation is dampened when considering skill at representing overall intrusion depth of 500 hPa (Table 3), due to a partial correction by the layers below 700 hPa. This suggests a greater difficulty in capturing the details of these intrusions in the 500-900 hPa region than their overall presence. In the pressure range 700 to 900 hPa, the probability of observing an event drops to 15 %, resulting in only 30 events throughout the entire campaign, all located above 800 hPa. The probability of forecasting an event is very realistic at 13 %, and includes events down to 900 hPa.
The STE forecasts miss 77 % of events, which are largely compensated by 73 % false alarms, yielding a slight low bias of 13 % (FB = 0.87). But despite this large number of missed events and false alarms, the proportion correct remains high (PC = 0.81). This score is artificial, and is due to the large number of non-observed, non-forecast events, reflecting the statistical predominance of no-events below 700 hPa rather than the forecast's predictive skill. Hence, the STE data show a very good statistical representation of events below 700 hPa, but do not provide useful predictive skill.
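Why a high PC can coexist with no predictive skill when events are rare can be seen from a trivial reference forecast that never predicts an intrusion. The Python sketch below is purely illustrative: the function names are hypothetical and the sample size of 200 bins is an assumption, chosen only so that the 15 % observation probability quoted above corresponds to 30 events.

```python
def proportion_correct(a, b, c, d):
    """PC = (a + d) / n for a 2 x 2 contingency table."""
    return (a + d) / (a + b + c + d)


def never_forecast_table(n_bins, base_rate):
    """Contingency table (a, b, c, d) of a forecast that never predicts an
    event: no hits, no false alarms, every observed event is missed."""
    events = round(n_bins * base_rate)
    return 0, 0, events, n_bins - events


# Assumed sample size: 200 bins with a 15 % event probability (30 events).
table = never_forecast_table(n_bins=200, base_rate=0.15)
pc_never = proportion_correct(*table)
```

Under these assumptions the "never forecast" reference reaches PC = 0.85, which is above the PC = 0.81 obtained by the STE forecasts in this pressure range, while its hit rate is zero. This is why PC alone is not informative below 700 hPa.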
In an attempt to see where missed events and false alarms are located with respect to the observed events, Fig. 5 shows the percentages of 50 hPa layers with or without trajectories of stratospheric origin in the same five categories as discussed for Fig. 3. Note that the "Intermediate (Top)" and "Intermediate (Bottom)" categories are ambiguous in the observations and will be ignored in the evaluation of the STE data made here. In the region above 300 hPa, the STE forecasts miss about 15 % of the 50 hPa layers belonging to the "Inside Intrusion" category. This is due to the mismatch between the thermal tropopause and the 2 PVU tropopause. The categories "Above Intrusion" and "Below Intrusion" are quite systematically filled with overforecasts, consistent with the discussion on Table 4. This error is also found in the next range of pressures from 300 to 500 hPa, with systematic overforecasts in the category "Above Intrusion". This means that the STE data set systematically misses intermediate layers of tropospheric air embedded between stratospheric intrusion layers. In this pressure range, the STE data set represents the "Inside Intrusion" category well, with about 15 % missed events, consistent with the discussion on Table 4. It still fills the "Below Intrusion" category with about 50 % overforecasts. In the region between 500 and 700 hPa, the forecast again fills the "Above Intrusion" category with about 50 % overforecasts. It represents the "Inside Intrusion" category less well, where it misses about half of the intrusion layers, again consistent with Table 4. It represents the "Below Intrusion" category well, with about 20 % overforecasts. Finally, in the region below 700 hPa, the only significant category is the "Below Intrusion" category, which is captured with about 5 % overforecasts, a number strongly constrained by the large number of non-intrusion days in both observations and STE data. Overall, this shows that most overforecasts above 500 hPa take place within tropospheric layers that lie between stratospheric intrusions, and that the hit rate scores discussed for Table 4 almost exactly apply to the inner part of intrusions. This suggests that ambiguous layers, likely isolated in the "Intermediate (Top)" and "Intermediate (Bottom)" categories, do not account for the missed events in the forecasts.

Analysis of STE forecast errors
Fig. 5. Percentage of occurrence of the presence/absence of trajectories in 50 hPa bins within 55 km of the station, depending on the categories to which they belong, separately for four different pressure ranges: above 300 hPa; between 300 and 500 hPa; between 500 and 700 hPa; and below 700 hPa (from left to right). These distributions include all measurements from the three sites.

The previous section showed a smooth degradation, from the upper to the lower troposphere, of the predictive skill of the STE data for the 50 hPa intrusion layers. The predictive skill was found excellent in the upper troposphere, and good in the middle troposphere, but was deemed negligible below 700 hPa. The statistical bias was found to be slightly positive in the upper troposphere, and negative below 500 hPa. The region 500-700 hPa had the strongest bias, with a frequency bias (FB) equal to 0.69 for the full campaign. This low bias was also found in the analysis of the STE data skill at representing overall intrusion depths. The predictive skill is not only sensitive to errors in the STE data, but also to uncertainties in the method of comparison with the observations. It is likely that the latter will have a lesser effect on the statistical bias of the STE data, due to the averaging effect. In this section, we analyse the five possible causes for this underestimate of stratospheric intrusions in the middle to lower troposphere in the Montreal area. This station has statistics that are well representative of the overall statistics for the full campaign, and has the largest number of observed balloon-sonde profiles.
1. As noted in Sects. 3.1 and 3.2, the identification of intrusions in the observed profiles is subject to uncertainties related to (i) the upper troposphere acting as a source of dry, ozone-rich air, and not only the region above 2 PVU; (ii) the presence of ambiguous layers in observed profiles; and (iii) the additional errors due to in-situ ozone formation. These uncertainties are difficult to estimate in this study, and are expected to contaminate the evaluation of the STE data made here. Using the objective algorithm developed for the purpose of this study, a better understanding of these uncertainties could be gained in a future balloon campaign by calculating additional backward trajectories started from the nodes of the measured profiles. Additional chemical species data (e.g., from aircraft measurements) would also be very helpful. However, this source of error does not affect the real skill of the STE forecasts.
2. The relative spatio-temporal sparseness of the trajectory starting grid affects the accuracy of the evaluation made here, as discussed in Sect. 4.1. It obviously limits the evaluated predictive skill of the STE data. It may also, however, introduce an artificial bias when a set of trajectories representing a stratospheric intrusion becomes too dispersed to be captured within the 55 km radius of a station. These missed events can be detected by screening the clusters of trajectories that descend into the middle or lower troposphere. Note that these clusters are part of the routine calculations of the Lagrangian STE data set. The next subsection offers a detailed screening of candidate clusters for the missed events and shows that too-dispersed trajectories do significantly contribute to the missed events. Note that this error source does not apply to climatological density maps, for which incidences are summed over spatio-temporal intervals.
3. The temporal limit of 6 days for the trajectory length may lead to missed events, and a statistical bias, since all older stratospheric air will be missing in the STE data. Figure 6 shows the distribution of the age of trajectories as they reach one of the three stations within the "Inside Intrusion" category. The age is determined as the difference between the time of crossing the 2 PVU tropopause and the average time while passing through the 55 km radius disk around the station. Since trajectories cross the 2 PVU tropopause approximately uniformly within the time window [12 h, 36 h[ (not shown), and have a total length of 144 h, their maximum possible age is uniformly distributed between 108 h and 132 h (grey shaded zone in Fig. 6). The shapes of the distributions, although likely under-sampled, suggest a maximum frequency of the age above 300 hPa at around 48 h, between 300 and 500 hPa at around 48-60 h, and between 500 and 700 hPa at around 84 h. Below 700 hPa, only three events were found and no inference can be made.
It is interesting to see that all distributions (except the one for below 700 hPa) show a decrease in frequency towards the age of 108 h, i.e. before any limitation due to the trajectory length arises. This suggests that the trajectory length of 6 days allows for the highest frequency ages. Nevertheless, the layers 500-700 hPa and below 700 hPa are likely affected by the age cut-off. In an attempt to evaluate further this cause of error, the next subsection discusses candidate clusters that, if temporally extended, may account for missed events.
4. Another type of error is related to the divergence of the weather forecasts with respect to the true atmospheric state, which likely increases with forecast length. Such error can be expressed as geographical shifts in stratospheric intrusions, as misrepresentations in their vertical location and extent, or simply as an absence of stratospheric intrusions in the meteorological forecasts. They may translate into climatological biases. Furthermore, such a Lagrangian measure of meteorological forecast errors, which focuses on non-conservative flows that link the lower stratosphere to the lower troposphere, offers an interesting new perspective on forecast skill. This perspective is useful for understanding the accuracy of regional and global air-quality models and also the dynamical interaction between the upper and lower troposphere. Some insights into this source of error will be drawn in the next subsection by identifying candidate clusters that could have accounted for missed events if spatially shifted.
5. The last possible cause of error is related to the absence of subgrid-scale parameterisations such as convection and turbulent diffusion in the kinematic trajectories. Bourqui and Trépanier (2010) suggested that descent from the stratosphere is typically cloud-free and quasi-adiabatic until a parcel reaches the lower troposphere, where clouds and boundary-layer processes intervene. It is therefore expected that such errors most likely contaminate the STE forecasts in the lower troposphere, typically below 700 hPa. Moreover, since subgrid-scale processes affect grid-scale winds in the weather model, it is expected to see some indirect effect of subgrid-scale convection or turbulence on the trajectories, such as sudden jumps or disorganised motions.
It is difficult to estimate implications on the predictive skill resulting from this source of error.
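The age bounds quoted under cause 3 follow from simple arithmetic, sketched below in Python. The function names and the sample values in the usage lines are hypothetical; the 144 h trajectory length and the [12 h, 36 h[ crossing window are taken from the text.

```python
def max_possible_age_h(crossing_time_h, trajectory_length_h=144.0):
    """Largest age a trajectory can reach before it ends, given the time
    (hours after its start) at which it crosses the 2 PVU tropopause."""
    return trajectory_length_h - crossing_time_h


def age_at_station_h(crossing_time_h, times_inside_disk_h):
    """Age as defined in the text: average time spent inside the 55 km disk
    around the station minus the tropopause crossing time."""
    mean_time = sum(times_inside_disk_h) / len(times_inside_disk_h)
    return mean_time - crossing_time_h


# Bounds of the crossing window give the bounds of the maximum possible age.
oldest = max_possible_age_h(12.0)    # earliest crossing
youngest = max_possible_age_h(36.0)  # latest crossing (excluded bound)

# Made-up example: crossing at 24 h, inside the disk at 70, 72 and 74 h.
example_age = age_at_station_h(24.0, [70.0, 72.0, 74.0])
```

The two bounds evaluate to 132 h and 108 h, which is the grey shaded zone of Fig. 6. Ages below 108 h are therefore not truncated by the trajectory length, whatever the crossing time.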

Cluster analysis of missed events
Here, we first identify clusters of trajectories that descend from the tropopause down to 700 hPa. This clustering is discussed in Section 2 and only rejects about 10 % of all trajectories reaching the 700 hPa level. The clusters whose centers of mass approach the station within a radius of 1111 km (distance equivalent to 10° latitude) within ±6 h of the measurement time are then selected, and the trajectories contained in each cluster are plotted on individual maps (see Fig. 7 for a sample). Additionally, the clusters that enter within the same radius around the station within their last 12 h are also selected and plotted on individual maps (see Fig. 7 for a sample). The first set is analysed to identify possible clusters that were potentially shifted with respect to the station, or clusters that could not be seen within the 55 km radius around the station because their trajectories were too dispersed. The second set is analysed to identify clusters that would have reached the station, were they calculated for a slightly longer period of time. Table 5 lists the clusters that would be candidates for the missed events if they were slightly shifted geographically (sh), if the density of trajectories were higher (de) or if they had been slightly extended in time (ex). Note that the same cluster may be a candidate for different events. Table 5 shows that all missed events, except the 22 July one, have at least one candidate cluster, and that in most cases, several distinct candidate clusters exist. In about 70 % of all missed events, there is at least one candidate cluster that passes over the station but is missed due to its low trajectory density. This explanation reduces the number of missed events by a factor of three at least, with the remaining missed events (except for one case) being explainable by a shift in the cluster due either to the spatio-temporal sparseness of the trajectory starting grid leading to a mismatch between trajectory clusters and measurements, or to a geographical shift in the meteorological fields.
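The first selection step (cluster center of mass within 1111 km of the station) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the screening code used for Table 5: the function names are hypothetical, all trajectories are given equal weight in the center of mass, the positions passed in are assumed to have already been restricted to the ±6 h window, and the station coordinates in the usage lines are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed


def center_of_mass(positions):
    """Equally weighted spherical mean of (lat_deg, lon_deg) positions,
    computed by averaging the corresponding 3-D unit vectors."""
    x = y = z = 0.0
    for lat, lon in positions:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    lat_c = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon_c = math.degrees(math.atan2(y, x))
    return lat_c, lon_c


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_candidate(positions, station_lat, station_lon, radius_km=1111.0):
    """True if the cluster center of mass lies within radius_km of the
    station (1111 km corresponds to about 10 degrees of latitude)."""
    lat_c, lon_c = center_of_mass(positions)
    return great_circle_km(lat_c, lon_c, station_lat, station_lon) <= radius_km


# Made-up cluster positions; station coordinates are approximate.
near_cluster = [(50.0, -70.0), (52.0, -72.0)]
far_cluster = [(30.0, -40.0), (31.0, -41.0)]
near_ok = is_candidate(near_cluster, 45.5, -73.6)
far_ok = is_candidate(far_cluster, 45.5, -73.6)
```

Averaging unit vectors instead of latitudes and longitudes avoids artefacts near the date line and the poles, which matters for a global data set. A cluster selected this way is only a candidate; deciding between the "sh", "de" and "ex" explanations still requires inspecting the individual trajectories, as done in the text with maps.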
Figure 7 provides examples of typical candidate clusters that could account for three missed events: first row for 30 July, second and third rows for 15 July, fourth and fifth rows for 16 July. The candidate cluster for the 30 July event (first row) clearly crosses Montreal at the beginning of the ±6 h window around the measurement time, but because of significant dispersion, the trajectory density was too low and no trajectory was detected within the 55 km radius around the station. The cluster spans pressures between 700 and 800 hPa when it traverses the Montreal column, which coincides well with the bottom of the missed intrusion layer (500-750 hPa, see Table 5). In the second example (Fig. 7, second row), a large cluster passes a few hundred kilometers east of the station around the 15 July measurement time. The trajectories are coloured with six different colours. The pressure range of this cluster (500-750 hPa) coincides well with the observed intrusion (600-750 hPa). Another cluster (third row) that started three days earlier approaches Montreal while the trajectories disperse and end just above it. Although the cluster spans a large range of pressures at the time it arrives above the station (300-900 hPa), the particular trajectories that enter within the 55 km radius around Montreal are located above 600 hPa. These two candidate clusters illustrate well the complex time-dependent Lagrangian structures of certain stratospheric intrusions. The fourth and fifth rows in Fig. 7 are candidates for the missed event on 16 July, the fourth cluster with a southeastward shift and the fifth cluster with an extension of the trajectories. Both clusters together span the pressure range of the missed intrusion layer (600-750 hPa). As shown in Table 5, several clusters that fall within the time and pressure ranges of the event but that are either shifted geographically or are too dispersed to be detected within the 55 km radius around the station can account for a missed event. In view of this, it seems tempting to increase the size of this radius. However, as discussed above, doing so would add more overforecasts and would not reduce significantly the number of missed events.

Table 5. List of missed stratospheric intrusions in Montreal and clusters found in the Lagrangian STE data set that could account for the missed events if they were slightly shifted in space (sh), extended in time (ex), or if the density of trajectories were higher (de). An observed stratospheric intrusion is considered to be missed if no trajectory is found within 100 hPa and a radius of 55 km around the station. The intrusions are determined using the objective algorithm described in Sect. 3. Each cluster is identified with its number of member trajectories, and one cluster can be a candidate for more than one missed event. The clusters listed in boldface are illustrated in Fig. 7.

Conclusions
We have developed a new global real-time Lagrangian diagnostic system for Stratosphere-Troposphere Exchange. The Lagrangian STE data set produced by this new system offers daily, high-resolution global data on both the downward (S→T) and upward (T→S) transport of mass across the dynamical ±2 PVU tropopause and the 380 K isentrope. The data set is calculated following the methodology proposed in Bourqui (2006), using global meteorological forecasts from Environment Canada's operational Global Environmental Multiscale (GEM) numerical weather prediction model. Routine calculations with the new STE diagnostic system began in July 2010 and have now delivered more than one year of data. Archived data include trajectories, daily maps of STE mass flux densities, diagnosis of deep intrusions, and details on clusters of trajectories. Once its errors are evaluated and characterized, this data set will be useful for improving our basic understanding of STE in the Northern and Southern Hemispheres as well as in the tropics. It will also offer a reference data set for evaluating the cross-tropopause transport predicted by various chemistry-transport models and climate-chemistry models.
The present study is the first evaluation of this new data set, using balloon-sonde measurements from a field campaign in eastern Canada in summer 2010. During the campaign, balloon sondes were launched daily from three sites (Montreal, Egbert, and Walsingham). Measurements of ozone, relative humidity, temperature, pressure, and wind speed and direction were made through the troposphere and lower stratosphere. The Lagrangian STE data set is evaluated with respect to its capacity to capture stratospheric intrusions identified in the observations. Here, the evaluation is restricted to the first-day forecasts, since they are expected to be associated with the smallest possible weather forecast errors. Analysis of the observational data suggested an unexpectedly high frequency of (deep) stratospheric intrusions during the campaign period. Over the three stations, 89 %, 79 % and 38 % of the days showed the signature of stratospheric air below 300 hPa, 500 hPa, and 700 hPa, respectively, though with significant variability among the three stations. In Montreal, half of the measured profiles showed clear signatures of stratospheric air below 700 hPa. It should be noted, however, that the identification of stratospheric intrusions based solely on individual profiles of ozone and RH has inherent flaws. In particular, it is not able to distinguish descents of dry upper tropospheric air from descents of air from above the 2 PVU tropopause. Ambiguous layers exist in the observed profiles. Summertime low-level in-situ ozone production and upward vertical transport may add further errors.
An objective algorithm was developed to identify stratospheric intrusions in the measured profiles in order to help the comparison with the STE data set. This objective algorithm also allowed characterization of the observations with respect to the intrusions. Distributions of RH within these different categories showed clear differences between the "Inside Intrusion" and the "Above" or "Below Intrusion" categories, but the distributions remained similar through the troposphere from the ground to above 300 hPa. This similarity with height is shown to be due to the effect of the increase of temperature with increasing pressure, which counterbalances the increase in specific humidity. Specific humidity clearly increases at lower altitudes but is much lower inside intrusions than outside. The ozone molar mixing ratio is found to behave very similarly to the specific humidity, but in an inverted fashion. This increase in specific humidity and decrease in ozone mixing ratios with increasing pressure inside intrusions suggests that there is significant mixing taking place during the descent of stratospheric air.
Evaluation of the ability of the STE data set to represent the overall depth of stratospheric intrusions identified in the observed profiles shows very good predictive skill for intrusions penetrating below 300 hPa and 500 hPa, respectively. The bias is estimated to be smaller than or equal to 10 % in both cases. The predictive skill vanishes for intrusions penetrating below 700 hPa, while the statistical skill remains reasonable with an estimated bias of 24 %. Evaluation of the STE data set at capturing the detailed structure of stratospheric intrusions shows good predictive skill above 500 hPa, with an estimated bias smaller than 15 %. The predictive skill degrades in the layer 500-700 hPa, with only 58 % correct forecasts, and an associated 31 % estimated low bias. The bias improves in the layer below 700 hPa while the predictive skill vanishes.
Five possible causes of mismatches between the Lagrangian STE data set and the stratospheric intrusions identified from observed profiles were discussed:
1. Uncertainties in the identification of stratospheric intrusions in observed profiles.
3. The temporal limit of 6 days on trajectories.
4. Errors in the weather forecasts.
5. Lack of subgrid-scale processes in the trajectory calculations.
It is impossible to quantify accurately the separate contributions of these errors in this study, since the identification of stratospheric intrusions from observations is itself uncertain. Nevertheless, a qualitative understanding can be gained by screening individual missed intrusions for candidate clusters of trajectories that would account for the missed event, were the cluster slightly shifted in space or extended in time, or were the density of trajectories increased. For all missed events except one, we found candidate clusters of trajectories approaching Montreal in the correct range of pressures. About 70 % of the missed events could be explained by a cluster of trajectories that dispersed such that no trajectory was detected within the 55 km radius around Montreal but several trajectories were found on either side of this station. This reduces the bias found in the layer 500-700 hPa by a factor of at least three.
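The factor of three can be checked with a short bound. This is a sketch under assumptions that are ours rather than stated in the text: the frequency bias takes its usual form FB = (a + b)/(a + c), with a hits, b false alarms and c misses (symbols introduced here for illustration), and the 70 % recoverable fraction found for Montreal is taken to apply to all misses in the 500-700 hPa layer. Converting 70 % of the misses into hits leaves the number of observed events a + c unchanged and gives a new bias FB':

```latex
\begin{align*}
1 - \mathrm{FB}  &= \frac{c - b}{a + c} \approx 0.31, \\
\mathrm{FB}'     &= \frac{a + b + 0.7\,c}{a + c}, \\
1 - \mathrm{FB}' &= \frac{0.3\,c - b}{a + c}
                  \;\le\; 0.3\,\frac{c - b}{a + c} \approx 0.09
                  \qquad (\text{since } b \ge 0).
\end{align*}
```

Under these assumptions the low bias would fall from about 31 % to at most about 9 %, a reduction by a factor of at least 1/0.3 ≈ 3.3.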
Examples were given of candidate clusters for three missed intrusions.
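The proximity criterion used throughout this evaluation (a trajectory counts as detected if it passes within 55 km of the station and within ±6 h of the observation time) can be sketched as follows. The function names, the spherical-Earth great-circle distance and the approximate Montreal coordinates are assumptions made for illustration and do not describe the actual implementation of the diagnostic system.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def km_to_degrees_latitude(distance_km):
    """Express a distance in equivalent degrees of latitude."""
    return math.degrees(distance_km / EARTH_RADIUS_KM)


def is_detected(point, station, obs_time, radius_km=55.0, window=timedelta(hours=6)):
    """True if a trajectory point (lat, lon, time) is close to the station in space and time."""
    lat, lon, time = point
    near_in_space = great_circle_km(lat, lon, station[0], station[1]) <= radius_km
    near_in_time = abs(time - obs_time) <= window
    return near_in_space and near_in_time


# Hypothetical example: approximate Montreal coordinates and an arbitrary launch time
montreal = (45.5, -73.6)
obs_time = datetime(2010, 7, 24, 12, 0)
near_point = (45.8, -73.6, obs_time + timedelta(hours=3))   # about 33 km away, 3 h later
far_point = (46.5, -73.6, obs_time)                         # about 111 km away
late_point = (45.8, -73.6, obs_time + timedelta(hours=7))   # close, but outside the time window
print(is_detected(near_point, montreal, obs_time),
      is_detected(far_point, montreal, obs_time),
      is_detected(late_point, montreal, obs_time))
```

The same conversion shows that the 555 km radius used when screening for candidate clusters corresponds to about 5° of latitude, ten times the 55 km detection radius.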
Of the five causes of errors listed above, only two (3 and 4) are anticipated to affect a climatology based on this STE data set. Trajectory age distributions furthermore suggest that error source 3 only affects the layers below 500 hPa. With a maximum bias of about 30 % in the layer between 500 and 700 hPa, composed of missed events up to 70 % of which can be attributed to the low density of trajectories in dispersed intrusions, the STE data set can be anticipated to have a reasonably small climatological bias.
This study represents the first evaluation of this new Lagrangian STE data set. It is, however, limited to eastern Canada in one summer season with a high frequency of stratospheric intrusions, and therefore the skill characterised here may not automatically generalise to other seasons and regions. Further evaluations in different seasons and locations will be useful to characterise its errors in different parts of the world.

Fig. 1. Time series of daily measured profiles above Montreal. Colour shading represents relative humidity (top row, units of %) and ozone molar mixing ratio (bottom row, units of ppbv). The black thick line shows the thermal tropopause determined from the observed temperature and the black hatching shows the regions identified as stratospheric intrusions (the region between the bottom and the top intrusion levels) in the observed profiles (see Sect. 3). Blank columns correspond to missing data.

Fig. 2. Illustration of a measured profile on 24 July 2010 above Montreal, and the corresponding categories based on the objective identification of top and bottom levels of stratospheric intrusions (see Sect. 3 for details). Left: measurements of ozone molar mixing ratio (red line, in ppbv), relative humidity (black line, in %), and temperature (blue line, in °C), with horizontal lines marking bottom and top intrusion levels (red and blue dashed lines, respectively), and thermal tropopause (red solid line). Right: the corresponding categories assigned to each 50 hPa layer in the troposphere.

Fig. 4. Time series of the number of trajectories of stratospheric origin in the Lagrangian STE data set detected within 55 km of Montreal (top), Egbert (middle), and Walsingham (bottom) and within ±6 h from observation time (blue shading). The dynamical tropopause coincides with the top of the blue shaded columns. The black thick line shows the thermal tropopause determined from the observed temperature and the black hatching shows the regions identified as stratospheric intrusions (the region between the bottom and the top intrusion levels shifted to the closest gridpoint on the 50 hPa grid) in the observed profiles (see Sect. 3). The white hatching denotes missing observations.
Fig. 5. Percentage of occurrence of the presence/absence of trajectories in 50 hPa bins within 55 km of the station, depending on the category to which the bins belong, separately for four different pressure ranges: above 300 hPa; between 300 and 500 hPa; between 500 and 700 hPa; and below 700 hPa (from left to right). These distributions include all measurements from the three sites.

Fig. 7. Examples of typical clusters of trajectories that could be candidates for the missed events listed in boldface in Table 5. First row: candidate 41(de) for 30 July event, starting time is 25 July, 00:00 UTC. Second row: candidate 195(sh) for 15 July event, starting time is 13 July, 00:00 UTC. Third row: candidate 86(de) for 15 July event, starting time is 9 July, 00:00 UTC. Fourth row: candidate 26(sh) for 16 July event, starting time is 11 July, 00:00 UTC. Fifth row: candidate 11(ex) for 16 July event, starting time is 11 July, 00:00 UTC. Left column: geographical location of the cluster's trajectories with respect to a radius of 555 km around Montreal station (equivalent to 5° latitude). Middle column: horizontal distance of each cluster's trajectory with respect to Montreal station (units of distance equivalent degrees latitude) as a function of time elapsed since trajectory starting time. Right column: pressure of cluster's trajectories (hPa) as a function of time elapsed after starting time. The black vertical lines in the middle and right columns show the window ±6 h around measurement time. The trajectories are coloured with six different colours.

Table 2. Definition of the contingency table. See text for details.

Table 4. Statistical evaluation of the skill of the Lagrangian STE data at representing the detailed structure of intrusions. An event is here defined as the presence of stratospheric air in a 50 hPa vertical bin. The evaluation is made separately for four pressure ranges. Contingency tables are for the three stations together. Categorical skill scores are for the three stations together, with individual stations given in brackets (in the order: Montreal, Egbert, Walsingham). PC = Proportion Correct (1.0: all correct; 0.5: randomly correct); FB = Frequency Bias (>1: high bias; <1: low bias); HR = Hit Rate (1.0: all hit; 0.0: none hit); FAR = False Alarm Ratio (0.0: no false alarms; 1.0: all false alarms).
* Statistically significant at the 10 % bilateral level.
Table 5 gives the list of missed intrusions during the campaign in Montreal, with the pressure range of the missed part of the intrusion. The right column gives the clusters, identified by the number of member trajectories,

Distribution of the age of trajectories when reaching one of the three stations in the "Inside Intrusion" category (see Sect. 3.2). The age is calculated as the difference between the time of crossing the 2 PVU tropopause and the average time while passing through the 55 km radius disk around the station. The distributions are separated with respect to the pressure range reached by trajectories within the 55 km radius disk around the station. The vertical axis provides the number of trajectories in each bin, without normalisation. A grey shading is applied in the region of ages between 108 and 132 h to indicate incompleteness of the distributions caused by the limitation of the trajectory length to 6 days.
