Atmospheric Chemistry and Physics
A Multi-sensor Upper Tropospheric Ozone Product (MUTOP) Based on TES Ozone and GOES Water Vapor: Validation with Ozonesondes

Accurate representation of ozone in the extratropical upper troposphere (UT) remains a challenge. However, the implementation of hyper-spectral remote sensing using satellite instruments such as the Tropospheric Emission Spectrometer (TES) provides an avenue for mapping ozone in this region, from 500 to 300 hPa. Because TES flies on a polar-orbiting satellite, its observations are limited, but in this paper they are combined with geostationary satellite observations of water vapor. This paper describes a validation of the Multi-sensor UT Ozone Product (MUTOP). MUTOP, based on a statistical retrieval method, is an image product derived from the multiple regression of remotely sensed TES ozone against geostationary (GOES) specific humidity (remotely sensed) and potential vorticity (a modeled dynamical tracer in the UT). These TES-derived UT ozone mixing ratios are compared to coincident ozonesonde measurements of layer-average UT ozone mixing ratios made during the NASA INTEX-B field campaign in the spring of 2006; the region for this study is effectively the GOES-West domain covering the eastern North Pacific Ocean and the western United States. This intercomparison evaluates MUTOP skill at representing ozone magnitude and variability in this region of complex dynamics. In total, 11 ozonesonde launch sites were available for this study, providing 127 individual sondes for comparison; the overall mean ozone of the 500–300 hPa layer for these sondes was 78.0 ppbv. MUTOP reproduces in situ measurements reasonably well, producing a UT mean of 82.3 ppbv, with a mean absolute error of 12.2 ppbv and a root mean square error of 16.4 ppbv relative to ozonesondes across all sites. An overall UT mean bias of 4.3 ppbv relative to sondes was determined for MUTOP. Considered in the context of past TES validation studies, these results illustrate that MUTOP is able to maintain accuracy similar to TES while expanding coverage to the entire GOES-West satellite domain. In addition, MUTOP provides six-hour temporal resolution throughout the INTEX-B study period, making the visualization of UT ozone dynamics possible. This paper presents the overall statistical validation as well as a selection of ozonesonde case studies. The case studies illustrate that error may not always represent a lack of TES-derived product skill, but often results from discrepancies driven by observations made in the presence of strong meteorological gradients.


Introduction
Extensive scientific effort has been directed toward accurately characterizing ozone variability in the extratropical upper troposphere (UT); nevertheless, ozone prediction in this region from 500 to 300 hPa is difficult due to the presence of fine-scale filamentary features that shift with weather patterns, and the layer's position as a mixing region between stratospheric and tropospheric reservoirs of air (Gettleman et al., 2011; Bowman et al., 2007; Fairlie et al., 2007; Wernli and Sprenger, 2007). The identification of these advective filaments and spiral features in potential vorticity and water vapor fields has long been understood as evidence of conditions favorable for stratosphere-to-troposphere (STT) exchange and mesoscale chemical mixing (Appenzeller et al., 1996; Methven and Hoskins, 1998; Stohl and Trickl, 1999; Wimmers et al., 2003). The complexity of these mixing processes and jet stream dynamics in the UT region leads to heightened ozone error in the UT in most chemical transport models. For example, Tarasick et al. (2007) found that two Canadian air quality forecast models (AURAMS and CHRONOS) perform poorly in the UT relative to ozonesonde measurements. Results from their study showed that both models tend to significantly underestimate UT ozone, with the AURAMS model exhibiting a difference of as much as 80-90 % and the CHRONOS model exhibiting maximum differences near 50 %. Among reasons for this poor model performance, the authors suggest cross-boundary transport, including stratospheric influence, as well as NOx emissions and resulting in situ ozone production from lightning strikes (Cooper et al., 2006, 2007), and sub-grid-scale convective lifting of planetary boundary layer ozone and ozone precursors. Even chemical models using assimilation of satellite column ozone data, such as the RAQMS model, tend to exhibit highest errors in the upper troposphere and lower stratosphere (Pierce et al., 2007). While these chemical models are constantly being improved, interest in accurately capturing the presence and variability of ozone in the UT suggests that new methods, based on satellite observations specific to the upper troposphere, may provide a realistic companion approach.
Published by Copernicus Publications on behalf of the European Geosciences Union.
In order to test the validity of UT ozone measurements made by the Tropospheric Emission Spectrometer (TES) and in an attempt to address the problems with current UT ozone modeling noted above, Felker et al. (2011) developed a Multi-sensor UT Ozone Product (MUTOP). MUTOP is a derived field, an empirical product based on the statistical correlations between TES-observed UT ozone mixing ratios and two quasi-conservative synoptic-dynamic tracers for ozone in the UT: specific humidity (based on the Geostationary Operational Environmental Satellite (GOES) water vapor channel) and potential vorticity (PV) from the Global Forecast System (GFS) model. Blending the advantages of two remote sensing platforms by using GOES along with TES, the MUTOP product provides temporal and spatial coverage similar to a geostationary view, while gaining TES's ability to characterize UT ozone. As such, MUTOP-derived product imagery fills an important niche in presenting the broader meteorological context for ozone transport and variability at fixed locations like ozone sounding sites.
This approach, combining observations from more than one satellite platform or sensor, has been successfully employed by other investigators to derive estimates of tropospheric column ozone (Fishman et al., 2003; Ziemke et al., 2006; Schoeberl, 2007; Osterman et al., 2008). Previously, observations from two instruments on Aura, the ozone monitoring instrument (OMI) and the microwave limb sounder (MLS), have been used both (a) to evaluate individual events and (b) to provide a global climatology of tropospheric and stratospheric columns of ozone (Doughty et al., 2011; Ziemke et al., 2011, respectively). The work of Doughty et al. (2011) also focused on the INTEX-B time period; however, they employed a much more complex global model data assimilation technique to derive tropospheric ozone profiles than the simple regression approach employed here. Most recently, Tang and Prather (2011) compared instantaneous ozone observations from four Aura instruments, TES, OMI, MLS, and HIRDLS (the High Resolution Dynamics Limb Sounder), plus coincident ozonesondes, with modeled ozone to address the question of how the stratospheric source affects tropospheric abundance of ozone. They conclude that high-resolution (1 degree by 1 degree) simulation of ozone confirms that stratosphere-to-troposphere exchange occurs on a spatial scale of a few hundred kilometers and on a time scale as short as hours at a given location. Our results are consistent with this previous work, but it seems useful to establish that the goal of this paper is different. The broader objective of MUTOP is to illustrate that observations from the polar-orbiting instrument TES, when put into context with geostationary observations, can in fact reasonably map the variability in a dynamic quantity like UT ozone at the time scale of a few hours, and that the product could be employed as a forecasting tool in near-real-time.
An example of the TES-derived MUTOP product is shown in Fig. 1. It illustrates the layer-average volume mixing ratio (VMR) of ozone in the upper troposphere for two specific times, (a) 24 April, 18:00 UTC, and (b) 13 May, 00:00 UTC, with values typically ranging from 40 to 250 ppb. The image product retains the horizontal resolution of the GOES specific humidity fields, with a temporal resolution of 6 h (determined by the assimilation of the GFS temperature fields used to derive GOES specific humidity). The advantage of MUTOP imagery is that it readily depicts the meteorological context of upper-tropospheric ozone enhancement; features like ridges, troughs, cutoff lows, mesoscale streamers and vortex roll-up are all readily identified, and MUTOP animations clearly illustrate the dynamic fluctuation of ozone in the upper troposphere (see supplementary image animation, Felker et al., 2011).
This paper presents MUTOP product validation: it compares the derived multi-sensor ozone product at specific sonde-launch locations against layer-averaged ozonesondes. Results from this work are presented relative to previous, independent validation studies that were based on the TES ozone retrievals themselves.

Previous TES validation efforts and results
Several sources provide specific background information on the TES instrument and its ozone retrieval methodology (Beer, 1992; Bowman et al., 2002; Beer, 2006; Clough et al., 2006). Previous validation of TES ozone retrieval performance was carried out in the form of three major studies (Worden et al., 2007; Nassar et al., 2008; Richards et al., 2008). The first validation study by Worden et al. (2007) examined the performance of TES version 1 (V001) total column ozone retrievals based on comparison to a limited set of coincident ozonesonde launches. Since there were not a large number of TES overpasses with corresponding sonde launches at the time, the authors were forced to use loose coincidence criteria (observations made within 48 h and 600 km) to allow for a large enough paired dataset. Statistical results were divided into categories by region (northern mid-latitudes, sub-tropics, etc.) and by height (lower troposphere and upper troposphere) in the atmosphere.
Once a larger set of data was available, a similar TES validation study was carried out by Nassar et al. (2008) to examine the performance of revised TES version 2 (V002) column ozone retrievals. With a larger data set available to them, the authors were able to tighten the coincidence criteria to 9 h and 300 km between sonde launches and TES overpass column retrievals. In a test of the effects of non-coincidence, the authors also validated based on even tighter coincidence criteria of 3 h and 100 km. Separation into categories by region and atmospheric height was similar to the Worden et al. (2007) [...] for their study was 3 h. In all three of these TES validation studies the TES averaging kernel was applied to the measured validation data to account for differences in vertical resolution (see Worden et al., 2007 for details).

Validation techniques and overall goals
In contrast to previous TES validation studies of total column ozone, the work in this paper focuses on the upper troposphere. The derived field of upper tropospheric ozone, MUTOP, represents the variation in layer-average ozone from the 300-500 hPa region. To assess the realism of these empirical UT estimates of ozone, we compare the TES-derived product, MUTOP, to ozonesonde measurements. Statistical results quantify the general ability of MUTOP to accurately represent ozone fluctuations across a large spatial domain. Beyond statistical analysis of the product's overall performance, a secondary goal is to present individual MUTOP and ozonesonde comparisons in case study format in order to illustrate the meteorological context for both good and poor agreement between product and sonde. This allows for differentiating between potential sampling error and actual product skill error. Sampling error, in this context, refers to non-coincidence of measurements in space, time, or both. We find these errors are associated with strong meteorological gradients in the vicinity of sounding sites at the time of in situ observations.

Multi-sensor UT ozone product (MUTOP)
The multi-sensor upper tropospheric ozone product (MUTOP) is derived from the multiple regression of TES-observed UT ozone mixing ratios (based on TES V002) against GOES Layer Average Specific Humidity (GLASH) brightness values and Global Forecast System (GFS) modeled potential vorticity (PV). Felker et al. (2011) showed that despite an inverse correlation between specific humidity and PV, collinearity did not destabilize the regression, and the two variables provided complementary power, with GOES specific humidity explaining more of the TES ozone variance in lower PV air, while PV explained more of the variance in TES ozone in extremely dry air. The strength of the overall relationship supports the assumption that UT ozone mixing ratios should be enhanced in regions of atmospheric aridity (low specific humidity) and high PV as a result of dynamical processes associated with STT exchange. The regression results (Table 1, Felker et al., 2011) were used as a statistical retrieval of MUTOP for the entire GOES-West domain. MUTOP product fields are available between 16 April 2006 and 16 May 2006 at 6 h intervals (00:00, 06:00, 12:00, and 18:00 UTC); to view a multi-day animation of the MUTOP imagery, see the supplementary material in Felker et al. (2011).
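The idea of a two-predictor statistical retrieval can be illustrated with a minimal sketch. It fits ozone to a humidity-like and a PV-like predictor by ordinary least squares. The data are synthetic, the plain linear form and all function names are assumptions made for illustration, and nothing here reproduces the coefficients or predictor transformations reported in Felker et al. (2011).

```python
# Illustrative only: ordinary least squares with two predictors,
# ozone ~ b0 + b1 * humidity + b2 * pv. All data below are synthetic.

def solve3(a, b):
    """Solve a 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_two_predictor_ols(humidity, pv, ozone):
    """Return (b0, b1, b2) minimising the sum of squared residuals."""
    rows = [(1.0, h, p) for h, p in zip(humidity, pv)]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ozone)) for i in range(3)]
    return tuple(solve3(xtx, xty))

def predict(coeffs, humidity, pv):
    """Evaluate the fitted linear model at one (humidity, pv) pair."""
    b0, b1, b2 = coeffs
    return b0 + b1 * humidity + b2 * pv

# Synthetic training set built so that ozone rises in dry, high-PV air.
humidity = [0.9, 0.7, 0.5, 0.3, 0.2, 0.1]  # arbitrary units
pv = [0.3, 0.5, 0.4, 1.0, 1.5, 2.5]        # arbitrary units
ozone = [100.0 - 50.0 * h + 40.0 * p for h, p in zip(humidity, pv)]

coeffs = fit_two_predictor_ols(humidity, pv, ozone)
```

Because the synthetic ozone values are an exact linear function of the two predictors, the fit recovers the generating coefficients; real TES, GOES, and GFS data would leave residual scatter.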
Table 1. Vertical weighting function applied to ozonesondes for comparison to MUTOP, based on the GOES specific humidity contribution weighting function (layer average is simplistically defined as the sum of the ozone volume mixing ratio at each pressure level multiplied by the contribution weight; there is no ozonesonde contribution from below 500 or above 300 hPa).
[...] with ozone to generate a current proportional to the amount of ozone passing through the instrument chamber (Komhyr, 1986; Komhyr et al., 1995). Past investigations have revealed that ozonesondes of this type have a precision of about 5 % and accuracy of about 10 % in the troposphere (Smit et al., 2007; Deshler et al., 2008; Tarasick and Slater, 2008). A typical sonde ascent rate is about 4-5 m s⁻¹, with measurements made approximately every 10 s during ascent.

Matching ozonesondes to MUTOP estimates
In order to validate MUTOP against in situ ozonesonde measurements, it was first necessary to layer-average the ozonesonde profiles in the same manner as the TES-observed ozone profiles used to derive MUTOP. This results in a layer-average ozone value for the region from 500 to 300 hPa. It is based on a vertical weighting function that matches the GOES water vapor channel contribution weighting function, with a maximum weight coming from near the center of the layer, around 400 hPa. Table 1 illustrates the weights used for each layer. Overall there were 127 ozonesonde profiles from 11 launch sites over the GOES-West domain during the INTEX-B campaign that were used in this validation study. With the exception of Hilo, Hawaii, all of the sonde launch sites used were located within the continental United States or Canada. To match each individual layer-average ozone volume mixing ratio (VMR) measurement from a sonde to the most coincident MUTOP estimate, the closest product pixel to the latitude and longitude of the ozonesonde launch site was used. However, it is important to note that ozonesondes are not spatially fixed column measurements; measurements are made along the trajectory of the ascending balloon based on actual wind patterns. The impacts of this will be discussed in a set of case studies. With respect to temporal coincidence, all sonde launches were kept in the data set in order to provide a large enough sample for statistical analysis. Since MUTOP was created at 6 h intervals, the maximum possible time separation between sonde launch and the most coincident MUTOP image was 3 h.
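The layer-averaging and time-matching steps described above can be sketched as follows. The weights are hypothetical placeholders that peak near 400 hPa and sum to one; they are not the Table 1 values, and the pressure levels, function names, and example profile are likewise assumptions made for illustration.

```python
from datetime import datetime, timedelta

# HYPOTHETICAL weights: placeholders peaking near 400 hPa that sum to 1.
# They are NOT the values from Table 1.
EXAMPLE_WEIGHTS = {500: 0.10, 450: 0.25, 400: 0.30, 350: 0.25, 300: 0.10}

def layer_average(ozone_by_level, weights=EXAMPLE_WEIGHTS):
    """Weighted sum of ozone VMR (ppbv) over the 500-300 hPa levels.

    Dividing by the total weight is a safeguard added here; it has no
    effect when the weights already sum to 1.
    """
    total_weight = sum(weights.values())
    weighted = sum(weights[p] * ozone_by_level[p] for p in weights)
    return weighted / total_weight

def nearest_product_time(launch):
    """Snap a launch time to the closest 00/06/12/18 UTC product time."""
    midnight = launch.replace(hour=0, minute=0, second=0, microsecond=0)
    hours = (launch - midnight).total_seconds() / 3600.0
    step = int(hours / 6.0 + 0.5)  # round half up to the nearest 6 h slot
    return midnight + timedelta(hours=6 * step)

# Made-up sonde profile (ppbv) on the example pressure levels (hPa).
profile = {500: 60.0, 450: 65.0, 400: 70.0, 350: 80.0, 300: 95.0}
ut_average = layer_average(profile)

# Launch time taken from the Kelowna case study: 23:16 UTC on 21 April 2006.
launch = datetime(2006, 4, 21, 23, 16)
matched = nearest_product_time(launch)
gap_minutes = abs((matched - launch).total_seconds()) / 60.0
```

For the Kelowna launch time, the nearest product time is 00:00 UTC on 22 April, a separation of 44 minutes, which agrees with the case study text. By construction the separation can never exceed 3 h.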

Evaluations of MUTOP performance
The performance of the TES-derived UT ozone product was determined based on its error and bias relative to coincident ozonesonde measurements. Time series plots are presented in order to demonstrate overall product performance with respect to capturing the timing of UT ozone variations at each individual ozonesonde launch site.
First, based on numerical comparison of TES-derived MUTOP to coincident sonde-derived UT ozone measurements, the mean absolute error (MAE), root mean square error (RMSE), and overall UT ozone product bias were determined and are reported on a site-by-site basis for each sounding location. These same statistical error and bias values are also reported for the entire dataset along with the overall correlation. Second, a series of case studies was carried out to evaluate potential causes of individual errors and to examine the product's strengths and weaknesses. We have identified specific cases where MUTOP significantly over-predicts or under-predicts the sonde-derived estimate, and in both instances we illustrate the meteorological conditions that appear to explain the mismatch. These cases are identified as sampling error; they occur with synoptic situations that produce strong gradients, conditions under which the multi-sensor product may represent a different air mass from that which the ozonesonde sampled.
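The three summary statistics named above can be sketched in a few lines. The function name is an assumption; the five MUTOP and sonde pairs are the values quoted in the case studies later in this paper, so they are dominated by deliberately chosen outliers and are not representative of the full 127-sonde dataset.

```python
import math

def error_stats(product, sonde):
    """Return (bias, mae, rmse) of product minus sonde, in the input units."""
    errors = [p - s for p, s in zip(product, sonde)]
    n = len(errors)
    bias = sum(errors) / n                            # mean signed error
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean square error
    return bias, mae, rmse

# (MUTOP, sonde) pairs in ppbv quoted in the case studies:
# Kelowna (two launches), Richland (two launches), Valparaiso (one launch).
mutop = [65.0, 110.0, 122.0, 140.0, 135.0]
sonde = [69.0, 79.0, 79.0, 138.0, 202.0]
bias, mae, rmse = error_stats(mutop, sonde)
```

For these five pairs the bias is 1.0 ppbv, the MAE is 29.4 ppbv, and the RMSE is about 38.3 ppbv. The large MAE and RMSE relative to the small bias show how over- and under-predictions can cancel in the bias while still inflating the error measures.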

Time series evaluations
Plots of time series comparisons between ozonesonde-measured layer-average UT ozone and multi-sensor estimates of layer-average UT ozone are provided in Fig. 3 for the sounding sites with more than ten launches during the INTEX-B study period. These generally illustrate good product skill at reproducing the site-specific timing and magnitude of variations in UT ozone. Overall these figures suggest that MUTOP captures ozone variability in the UT fairly accurately. For example, the MUTOP results for Kelowna, British Columbia track the synoptic/dynamic response of the ozonesondes going from ∼150 ppb down to ∼60 ppb, and then later capture two more spikes over 150 ppb as the UT responded to the passage of upper-level troughs. At Bratt's Lake, the product tracks the gradual increase in ozone, while at Trinidad Head, California the product, like the ozonesondes, did not observe as much dynamic range in ozone. In the case study section, it is shown that for some instances of significant under-prediction or over-prediction by MUTOP, the lack of agreement between the TES-derived product and the ozonesondes reflects the influence of highly variable meteorological conditions on ozone.

Statistical validation
Results are compiled in Table 2. Overall, MUTOP displayed a mean absolute error (MAE) of 12.2 ppbv and a root mean square error (RMSE) of 16.4 ppbv relative to ozonesonde measurements. Generally, TES-derived MUTOP was biased high relative to sondes, 4.3 ppbv ± 15.9 ppbv. Specific average values, errors, and biases are also listed for each sounding site. This tabulation shows that average ozone mixing ratios and variability in UT ozone are site dependent. Nevertheless, biases and errors are fairly consistent between sites, with almost all sites (except Valparaiso, Indiana, discussed below) showing MUTOP to be biased high. Figure 4a shows the overall correlation between ozonesonde-measured layer-average UT ozone and the TES-derived MUTOP estimates; the correlation coefficient is 0.824, indicating that MUTOP accounts for ∼68 % of the observed variability in UT ozone over this domain. Figure 4b shows the frequency distribution of error. There are 23 soundings with errors that are more than one standard deviation away from the mean. An analysis of the MUTOP imagery for each of these days illustrates that this level of mismatch always occurs in the vicinity of strong meteorological gradients in MUTOP. Several of these represent instances when soundings were launched just ahead of or just behind a transient feature like a streamer or a cutoff low. Examples of these conditions are presented as case studies.
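The reported summary numbers can be cross-checked against each other with a short calculation. This sketch assumes that the value after the "±" is a sample standard deviation of the error; the paper does not state this explicitly, so the second check is only a plausibility test. Variable names are illustrative.

```python
import math

r = 0.824      # correlation coefficient reported in the text
bias = 4.3     # ppbv, mean of (MUTOP - sonde)
spread = 15.9  # ppbv, value after the "±"; ASSUMED to be a sample std. dev.
n = 127        # number of sondes in the comparison

# Fraction of variance explained by a linear relationship.
explained = r ** 2

# Under the assumption above (n - 1 denominator), the mean squared error
# splits into squared bias plus error variance:
#   RMSE^2 = bias^2 + spread^2 * (n - 1) / n
rmse_implied = math.sqrt(bias ** 2 + spread ** 2 * (n - 1) / n)
```

The squared correlation comes out near 0.68, matching the ∼68 % quoted above, and the implied RMSE is close to the reported 16.4 ppbv, so the bias, spread, and RMSE are mutually consistent under this reading.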

Error and bias comparisons
A comparison of these statistical results to the outcomes from previous TES validation studies (Table 3) shows that MUTOP performs comparably to TES itself, with similar errors and biases. Multi-sensor UT ozone product RMSE was 16.4 ppbv. These results are very similar to those found in Nassar et al. (2008) for TES performance in the Northern Hemisphere (NH) mid-latitude UT. In the Nassar et al. study, in which the authors were validating TES retrievals (V002) directly against coincident (to within 9 h and 300 km) ozonesonde profiles, they found an overall NH mid-latitude UT TES RMSE of 17.8 ppbv and a NH mid-latitude spring UT TES RMSE of 19.2 ppbv. This suggests that the derived multi-sensor UT ozone product being validated in this study has equivalent or slightly better skill at predicting layer-average UT ozone mixing ratios as compared to TES itself, while providing spatial-synoptic coverage far beyond what is available from individual TES overpasses.
Bias estimates were also very similar, with NH mid-latitude bias of 5.9 ppbv ± 17.8 ppbv and NH mid-latitude spring season bias of 8.3 ppbv ± 19.2 ppbv in Nassar et al. as compared to 4.3 ppbv ± 15.9 ppbv within the GOES-West domain in this study. While these results from Nassar et al. are exclusively for the NH mid-latitudes, this study includes one site in the NH sub-tropics (Hilo, HI); the remainder of the profiles used in this study were from mid-latitude locations.
The comparability of these results is rather encouraging given the different approach they represent. This study validates the TES-derived multi-sensor UT ozone product (MUTOP), while in Nassar et al. (2008), the authors were validating TES itself by applying the TES averaging kernel to the ozonesonde profiles. However, the similarity of these results does suggest that the MUTOP product is robust and furthers the idea that TES may have an overall positive UT ozone bias. MUTOP provides a relatively good representation of layer-averaged TES retrievals in the UT at a spatial scale and resolution which has not been previously available.
In the TES validation study by Richards et al. (2008), aircraft DIAL and in situ FASTOZ ozone measurements were used for comparison to TES V002 retrievals. Their study covered the same INTEX-B period as is used in this study. For flights out of Hawaii, Richards et al. (2008) found a TES UT (500 to 300 hPa) ozone bias and error of 3.11 ppbv ± 13.65 ppbv. For flights out of Anchorage, Alaska, they found a TES UT (500 to 300 hPa) ozone bias and error of 9.05 ppbv ± 25.33 ppbv. Again these results suggest that TES is generally overestimating UT ozone and illustrate that the derived multi-sensor UT ozone product provides comparable accuracy while allowing for much greater spatial and temporal coverage.
In a rather different ozone assessment, Ziemke et al. (2006, 2011) used a 2-D interpolation of stratospheric ozone from the microwave limb sounder (MLS) to derive fields of stratospheric column, and they used ozone monitoring instrument (OMI) observations to derive total column ozone. The difference between these quantities was derived as the [...] .33 DU in the troposphere, relative to sondes), and found that much of the error could be associated with meteorological transport and the specific dynamics of individual events. Given the comparable nature of results generated in this paper to the previous work, the advantage of MUTOP is the relative ease and accuracy of our simple regression approach based on TES and upper-tropospheric water vapor, which provides the ability to observe fine-scaled features and to display the temporal evolution of UT ozone.

Case study validation
In this section, several specific ozonesonde measurements and corresponding MUTOP estimates have been examined with respect to the synoptic-dynamical situation at the approximate time of sonde launch. The goal here was to examine under what conditions the MUTOP statistical retrieval has high or low predictive skill. We also identify potential reasons for error in situations of poor predictive skill, defined as events with errors (MUTOP minus ozonesonde) greater than 1.5 standard deviations from the mean. To generalize, 8 events had errors more than 1.5 standard deviations above the mean error; of these, 7 were associated with soundings launched on the edge of a strong gradient in MUTOP where the sonde does not appear to have sampled the meteorological feature, e.g., a dry-air, high-PV streamer or a cutoff low. There were 9 events with errors more than 1.5 standard deviations below the mean, and 5 of these were associated with soundings launched into a cutoff low indicative of a low tropopause, so the sounding observed air with a strong stratospheric signature. Examples of meteorological conditions associated with these types of extreme errors are discussed below; the discussion often includes reference to events that exhibit very little error immediately preceding or following these outlier cases.
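The outlier criterion described above can be sketched as follows. The error values are synthetic and the function name is an assumption; the use of the sample standard deviation is also an assumption, since the text does not say which form was used.

```python
import statistics

def flag_outliers(errors, k=1.5):
    """Return (high, low) index lists for errors more than k standard
    deviations above or below the mean error."""
    mean = statistics.mean(errors)
    sd = statistics.stdev(errors)  # sample standard deviation (assumed)
    high = [i for i, e in enumerate(errors) if e - mean > k * sd]
    low = [i for i, e in enumerate(errors) if e - mean < -k * sd]
    return high, low

# Synthetic errors (MUTOP minus sonde, ppbv), for illustration only.
errors = [2.0, -3.0, 5.0, 1.0, -1.0, 4.0, 0.0, 40.0, -45.0, 3.0]
high, low = flag_outliers(errors)
```

In this synthetic series only the +40 and −45 ppbv errors exceed the threshold. Separating the high and low lists mirrors the split used in the text between over-prediction and under-prediction cases.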

Kelowna, British Columbia - 21-22 April 2006
On 21 April at 00:00 UTC, there was very good agreement between the TES-derived MUTOP value of 65 ppbv and the Kelowna ozonesonde observation of 69 ppbv taken at 23:16 UTC on 20 April (Julian Day (JD) 110), within 45 min of the time shown in the MUTOP image. This value is representative of a broad region of UT ozone in the range of 60-70 ppb in the continental Pacific Northwest region. However, within 18 h (by 18:00 UTC on 21 April, JD 111) a dry air streamer positioned off the NW coast had advanced to a point just west of Kelowna (Fig. 5a), and by 00:00 UTC on 22 April, the leading edge of the dry air streamer, and its associated ozone enhancement, were positioned over the sounding location (Fig. 5b). The difference between these two MUTOP images shows the rapid eastward propagation of the streamer feature and the very strong UT ozone gradient along its leading edge. On 21 April at 18:00 UTC the multi-sensor ozone product shows the layer-average UT ozone above Kelowna to be ∼70-80 ppbv, while 6 h later at 00:00 UTC on the 22nd, the product estimates an ozone volume mixing ratio of 110 ppbv. However, the actual sounding was made in between the times of these two MUTOP images; it was launched at 23:16 UTC on 21 April, 44 minutes before the MUTOP image shown in Fig. 5b. It is apparent from the actual sounding information that the ozonesonde passed through the very eastern edge of a dry air streamer feature in the UT. Significant changes in UT moisture and ozone can be seen in the ozonesonde profiles in Fig. 5c, with a shift from a moist UT the day before with an ozone mixing ratio of ∼69 ppbv (black lines) to a much drier UT with stronger winds (60 to 70 knots), a lower tropopause, and layer-average UT ozone of ∼80 ppbv (blue lines). However, the most significant ozonesonde differences are observed in the region from 300 to 200 hPa, above the UT region of interest captured by MUTOP. At first glance, the contrast of MUTOP 110 ppbv versus sonde 79 ppbv could be considered pure product error, but differences in the air masses being sampled must also be considered. Since the sonde launch was in an area of strong moisture and ozone gradients in the UT, the time difference between the sounding launch and the MUTOP product alone will cause the two methods to sample different air masses in the UT. In fact, looking at the wind speeds and directions throughout the ozonesonde flight up to 300 hPa, it is evident that the sonde balloon was pushed north-northeastward with 50 to 80 knot winds and remained out ahead of the UT ozone enhancement (Fig. 5b). Hodograph analysis (not shown) puts the balloon approximately 60 km NNE of the sonde launch site. This drift, combined with the rapid movement of the streamer, suggests that these two meteorological factors together could have contributed considerably to the 30 ppbv difference in the estimate of ozone volume mixing ratio between MUTOP and the sounding. This suggests that the lack of correspondence between the TES-derived product and the sonde may be driven by time mismatch in the presence of strong meteorological gradients, and therefore may not represent product error in MUTOP. Furthermore, this type of event suggests that sondes launched into an environment ahead of an upper-level feature like this will necessarily sample a column of air ahead of the advancing gradient. A more thorough analysis of sondes versus satellites (or sondes versus model analyses) would account for the integrated motion of the balloon, a non-trivial vertical integration given the fact that MUTOP represents a one-dimensional layer average of the upper troposphere. Here, we do not solve this problem, but seek to identify this as a source of error.
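A rough order-of-magnitude check on the balloon drift can be made from the wind speeds and the typical ascent rate quoted in this paper. The sketch below is not the hodograph analysis referred to above: it assumes a climb of about 9 km from the surface to 300 hPa (a standard-atmosphere height) and a single uniform wind speed for the whole ascent, both of which are simplifications introduced here, and the function name is illustrative.

```python
KNOT_TO_MS = 0.514444  # metres per second per knot

def drift_km(wind_knots, ascent_rate_ms, layer_depth_m):
    """Horizontal drift (km) while climbing through a layer of uniform wind."""
    climb_time_s = layer_depth_m / ascent_rate_ms
    return wind_knots * KNOT_TO_MS * climb_time_s / 1000.0

# ASSUMPTIONS: ~9 km climb to 300 hPa, 5 m/s ascent rate (upper end of the
# 4-5 m/s range quoted for typical sondes), one wind speed for the ascent.
low = drift_km(50.0, 5.0, 9000.0)
high = drift_km(80.0, 5.0, 9000.0)
```

With these assumptions the drift falls between roughly 46 and 74 km, which brackets the approximately 60 km obtained from the hodograph analysis. The implied climb time of 30 minutes also means the balloon reaches the UT well after launch, adding to the time offset from the nearest product image.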

Richland, Washington - 23 April 2006
Considerable MUTOP error observed relative to the ozonesonde launch from Richland, Washington at 22:50 UTC on 23 April (JD 113) arose under conditions similar to the former case in British Columbia. In the Richland case, the ozonesonde was launched on the western edge of this same evolving dry air streamer in the UT. For this time and location, the balloon was launched along the trailing edge of a corresponding region of stronger gradients in UT moisture and ozone. For 24 April at 00:00 UTC, the multi-sensor ozone product shows that the sonde launch site was on the southwestern edge of the dry air feature and that there was less ozone-rich UT air to the south and west of Richland (Fig. 6a). The multi-sensor product predicted a layer-average UT ozone mixing ratio of 122 ppbv while the ozonesonde launched from Richland observed 79 ppbv. In terms of ozone volume mixing ratio magnitude and error, this is very similar to the previous case from Kelowna, BC.
In an attempt to understand the sources of error in this case, ozonesonde profile data and radiosonde profile data from two nearby stations at 00:00 UTC on the 24th were used along with the multi-sensor UT ozone product image (Fig. 6a). The ozonesonde profile from 22:50 UTC shows gradually increasing tropospheric ozone values from the surface to 300 hPa, with a larger ozone enhancement right above 300 hPa, just below the tropopause near 260 hPa (Fig. 6b). However, since this enhancement is above the 300 hPa level, it is not included in the sonde average by the layer-averaging scheme used in this study. Winds were not available for the Richland ozonesonde. However, Fig. 6c shows two separate radiosonde profiles from 00:00 UTC on 24 April. The Spokane, Washington profile (in black) shows the thermodynamic structure and wind profile of the atmosphere on the western edge of the dry air streamer, while the Great Falls, Montana profile (in blue) shows the air mass differences on the eastern edge of the streamer feature. (The locations of the Spokane and Great Falls sounding sites are shown in Fig. 6a as black and blue crosses, respectively.) One can see the marked wind shift in the UT region from a northeasterly jet on the west side to a southwesterly jet on the east side of this upper-level feature. These NE winds on the western flank of the high-ozone, low-specific-humidity streamer feature would push the sonde balloon about 50 km SW, toward lower UT ozone values of ∼80-90 ppbv. As with the last case, the additional radiosonde data provide evidence that sampling error may be an issue in high winds and strong ozone gradient regions in the UT, particularly when observed winds act to keep the balloon ahead of the advancing MUTOP-predicted UT ozone enhancements.
By the next day, 24 April, JD 114, there was excellent agreement (as seen in Fig. 3, the time series for Richland) between the ozonesonde (138 ppbv) and MUTOP (140 ppbv). The broadened streamer was located over the sounding site, the feature having continued to elongate and advect retrograde (from the NE to SW), moving over the Richland sounding location. In this location the balloon was far more likely to sample the MUTOP feature. Figure 1a, which was used to illustrate a full MUTOP image, is from 24 April, 18:00 UTC, the sonde launch time on JD 114, and it clearly shows the streamer forming an upper-level cutoff low over the vicinity of Richland. This is confirmed in upper-air maps for the time (not shown).
Fig. 6. [...] Note the dewpoint sensor was not reporting for this sonde, and there were no winds available, but the UT layer-average ozone was 79 ppbv. (c) Two additional Skew-T plots show radiosonde profiles of temperature, dewpoint temperature and column wind for Spokane, WA (solid black, black dash, and black wind barbs), and Great Falls, MT (solid blue, blue dash and blue wind barbs) for 24 April at 00:00 UTC, on either side of the UT dry-air streamer feature. Note the dramatic change in UT wind direction between these two stations.

Valparaiso, Indiana - 22 April 2006
The previous two case studies examined situations when MUTOP over-predicted UT ozone relative to corresponding ozonesondes because the sondes appear to have sampled different air masses due to a combination of temporal separation and sonde drift in high wind regions along the edge of strong UT ozone gradients. These conditions kept the ozonesonde from observing the highest ozone seen in the TES-derived images. In this next case, we illustrate that strong winds may also enhance the ozone the sonde observes in the upper troposphere. As shown in Fig. 3, sondes launched from Valparaiso, Indiana, on 21 and 23 April found very good agreement between the MUTOP product and the sonde measurements. However, on 22 April, JD 112, the one mismatch, a serious MUTOP under-prediction (135 ppbv, Fig. 7a) of the sonde-observed ozone value (202 ppbv, Fig. 7b), drives the overall negative bias (−7.2 ppbv, Table 2) observed for Valparaiso. This sonde launch occurred on the southern edge of a deepening cutoff low feature, under an upper level jet with strong vertical wind shear. Winds in the layer of high ozone were 50 to 70 knots, from the WSW, and had the potential to advect the sonde ∼80 km ENE, further into the cutoff low. Perhaps even more relevant is the fact that the MUTOP image is from 18:00 UTC, and the balloon, launched at 19:00 UTC, would not be expected to ascend to the UT level until about 19:30 UTC. However, the next MUTOP image, from 00:00 UTC, indicates there was indeed an increasing amount of ozone in the base of the cutoff low, with the highest ozone increasing from 160 ppb to 200 ppb in three hours. The combination of the ozone increasing with time in the cut-off low and the winds advecting the balloon deeper into the upper-level feature suggests that the closest MUTOP image underestimated the amount of ozone the sonde would encounter during its ascent. This again demonstrates that the error is influenced by timing and the dynamic meteorological conditions.
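The drift estimate quoted for this case can be checked with a back-of-envelope calculation. The sketch below assumes the balloon moves with a steady horizontal wind; the 45 min exposure time is a hypothetical value chosen for illustration, not a number reported in this study.

```python
KNOT_TO_MS = 0.514444  # metres per second in one knot


def drift_km(wind_speed_knots: float, minutes_in_flow: float) -> float:
    """Horizontal distance (km) covered by a balloon carried by a steady wind."""
    speed_ms = wind_speed_knots * KNOT_TO_MS
    return speed_ms * minutes_in_flow * 60.0 / 1000.0


# Hypothetical exposure time of 45 min (an assumption, not from the study)
for knots in (50, 70):
    print(f"{knots} kt for 45 min -> {drift_km(knots, 45):.0f} km")
```

Under these assumptions, the 50 to 70 knot range brackets a displacement of the same order as the ∼80 km quoted above.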

Edmonton, Alberta - 10 May 2006
The case study from Edmonton, Alberta, Canada demonstrates a synoptic situation in which the multi-sensor UT ozone product appears to have accurately characterized layer-average UT ozone (Fig. 8a). The ozonesonde launch took place at 11:19 UTC on 10 May and the MUTOP image is from 12:00 UTC on 10 May, JD 130. As can be seen from the MUTOP image, the sonde launch took place at a time when a filamentary streamer feature in the UT led to an enhancement of ozone in the layer. In this case, the product estimated a UT layer-average ozone volume mixing ratio of 102 ppbv, while the ozonesonde measured 99 ppbv.
This ozonesonde launch took place at a time when there was a strong UT ozone gradient around Edmonton, conditions which contributed to high product error in other cases we have shown; but MUTOP performed very well here. The reasons for this become more apparent after examining the time coincidence and the ozonesonde profile and column winds (Fig. 8). In this case, the time match is excellent, especially after considering sonde ascent time to the 500-300 hPa layer. Also, the sonde observed relatively light and variable wind velocities up to 300 hPa, suggesting that sonde drift was negligible (∼20 km ESE, within the same upper level feature) and that the sonde and product should have observed the same UT air mass despite the strong gradient in ozone. Figure 8b shows an ozone enhancement around 300 hPa consistent with tropopause folding, which seems to have been well characterized in the GFS PV field as well; upper-tropospheric potential vorticity (PV) in this region was 1.7 PV units (not shown).

Kelowna, British Columbia - 12 May 2006
The final case study examines product performance under a situation of an ozone extreme. MUTOP shows a region of high ozone (Fig. 9a); a sequence of weather maps (not shown) illustrates that this feature is a short wave travelling along the periphery of a larger trough over the Pacific. This shortwave feature was captured by the Kelowna ozonesonde, seen as a spike in Fig. 3 (Kelowna time series, Julian Day 132), but was gone by the time of the next sounding, having rotated northward. The ozonesonde launched from Kelowna, BC on 12 May, JD 132, measured the highest layer-average UT ozone of all the sondes used in this MUTOP validation. The ozonesonde was launched at 23:16 UTC into the upper level short-wave trough over the Pacific Northwest coast, emanating from the eastern side of a much larger upper-level low which extended over the North Pacific Ocean (the image for this time was presented in Fig. 1b). The ozonesonde observed a layer-average UT ozone volume mixing ratio of 232 ppbv, while MUTOP from 13 May at 00:00 UTC estimated 203 ppbv. The ozonesonde profile (Fig. 9b), as well as the very high layer-average PV value (7.0 PV units, not shown), indicates that the layer-averaging in this case mostly represents the lower stratosphere (the stratosphere begins at ∼450 hPa according to the thermodynamic profile of the ozonesonde). Since the time separation between ozonesonde launch and MUTOP is fairly small (45 min) and the UT ozone enhancement is fairly broad spatially (at the time of the MUTOP image, the balloon would be ∼50 km east of the sounding site), it is likely that the error here is more related to the MUTOP ozone retrieval skill in the lower stratosphere. Based on the linear fit of observations in the work reported here (Fig. 4a) and by Nassar et al. (2008, Fig. 3, Northern Midlatitudes), in spite of the overall positive bias in the data, MUTOP (and TES) will under-predict extreme ozone values (greater than 150 ppbv), suggesting that these methods will under-predict ozone in the lower stratosphere. In fact, previous TES versus sonde validations by Nassar et al. (2008) truncated ozonesondes at the thermal tropopause, while Richards et al. (2008) noted a negative bias for profiles influenced by a low troposphere and greater stratospheric contributions. This could obviously have influenced the negative bias observed for Valparaiso, as the previous case study discussed an event with significant stratospheric enhancement of the region we have defined as the UT (300 to 500 hPa). Based on a truncated UT layer analysis like that of Nassar et al. (2008), the ozonesonde layer-average VMR for this Kelowna event would be ∼180 ppbv (including only the layer from 500 to 450 hPa), rather than 232 ppbv, which includes the layer from 500 to 300 hPa. Given that we have retained the full UT layer in averaging the sonde, we would expect that MUTOP will under-predict the observed ozone in events with a low tropopause.
The MUTOP error in this specific case relative to the ozonesonde measurement is approximately 12.5 %, which is not unreasonable for such an extreme case, and is within the 5-15 % error range reported by Richards et al. (2008).
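The effect of truncating the layer at a low tropopause, and the percentage error quoted for this case, can be illustrated with a short sketch. The pressure-weighted trapezoidal average below is an assumption (the study's exact layer-averaging scheme is not restated in this section), and the profile values are hypothetical; only the 203 and 232 ppbv figures come from the text.

```python
import numpy as np


def layer_mean(pressure_hpa, ozone_ppbv, p_bottom=500.0, p_top=300.0, n=201):
    """Pressure-weighted mean ozone (ppbv) between p_bottom and p_top (hPa)."""
    p = np.asarray(pressure_hpa, dtype=float)
    o3 = np.asarray(ozone_ppbv, dtype=float)
    order = np.argsort(p)                     # np.interp needs increasing x
    grid = np.linspace(p_top, p_bottom, n)    # regular pressure grid in the layer
    o3_grid = np.interp(grid, p[order], o3[order])
    # Trapezoidal integral over pressure, divided by the layer thickness
    integral = np.sum(0.5 * (o3_grid[1:] + o3_grid[:-1]) * np.diff(grid))
    return integral / (p_bottom - p_top)


def percent_error(estimate, observed):
    """Absolute error of the estimate as a percentage of the observation."""
    return abs(estimate - observed) / observed * 100.0


# Hypothetical profile with a low tropopause (illustrative values only)
pressure = [600, 500, 450, 400, 350, 300, 250]
ozone = [90, 120, 200, 260, 300, 330, 400]

full = layer_mean(pressure, ozone)                    # 500-300 hPa
truncated = layer_mean(pressure, ozone, p_top=450.0)  # 500-450 hPa only
print(f"full layer: {full:.1f} ppbv, truncated layer: {truncated:.1f} ppbv")
print(f"Kelowna case from the text: {percent_error(203, 232):.1f} % error")
```

With a profile of this shape, retaining the full 500-300 hPa layer yields a much higher sonde average than truncating at the tropopause, which is the direction of the difference described above.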

Possible reasons for observed error
As demonstrated by the overall correlation, the time series plots by station, and the selection of case studies, while MUTOP performs fairly well overall in estimating actual UT ozone, there are instances in which large errors were observed. However, this does not mean that MUTOP is wrong, but it does suggest our features are displaced from the features captured by the balloon sounding. This mismatch over short time and space scales has been largely ignored in previous validation work, although Doughty et al. (2011) reported two case studies where dynamic conditions seem to drive error between satellite-based predictions and ozonesonde observations. In the case studies, we consider reasons for error on the basis of the meteorology. It is very possible that interpolating MUTOP to the exact time and place of the ozonesonde (when it reaches the UT 500-300 hPa layer) using the feature-preserving morphing tools developed by Wimmers and Velden (2011) for advecting microwave imagery, and accounting for sonde displacement, would reduce the number of extreme errors (MUTOP over-prediction errors greater than +1.5 standard deviations), since 7 out of 8 (or ∼88 %) of these extreme errors were associated with strong meteorological gradients. These efforts are beyond the scope of this study; however, they might prove worthy of further research.
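As a rough illustration of how such extreme over-prediction errors might be identified, the sketch below computes the error as MUTOP minus ozonesonde and flags values above the mean error plus 1.5 sample standard deviations. That reading of the "+1.5 standard deviation" threshold is an assumption, and the paired values are made up for the example.

```python
import numpy as np


def error_summary(mutop_ppbv, sonde_ppbv, n_sigma=1.5):
    """Bias, MAE, RMSE and a mask of extreme over-predictions."""
    pred = np.asarray(mutop_ppbv, dtype=float)
    obs = np.asarray(sonde_ppbv, dtype=float)
    err = pred - obs                      # predicted minus observed
    bias = err.mean()
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    sigma = err.std(ddof=1)               # sample standard deviation
    # Assumed threshold: mean error plus n_sigma standard deviations
    extreme = err > bias + n_sigma * sigma
    return {"bias": bias, "mae": mae, "rmse": rmse, "extreme": extreme}


# Hypothetical paired layer averages in ppbv (not data from the study)
mutop = [82, 95, 140, 76, 160, 88]
sonde = [80, 99, 138, 70, 105, 90]
summary = error_summary(mutop, sonde)
print({k: v for k, v in summary.items() if k != "extreme"})
print("extreme over-predictions at indices:", np.flatnonzero(summary["extreme"]))
```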
It is important to acknowledge that error can be introduced into MUTOP at several different steps. First, there can be error in the actual products that the multi-sensor estimate is based on. Since MUTOP depends on correlations of TES-observed UT ozone, GOES-derived specific humidity (in the form of GLASH brightness values), and GFS-modeled potential vorticity, error can be contributed from any of these three product components.
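To make the dependence on these three components concrete, the following is a minimal sketch of a multiple linear regression of ozone on specific humidity and PV. The linear form, the coefficient values, and the synthetic data are all assumptions for illustration; the actual MUTOP regression and its predictors' functional form are not specified in this section.

```python
import numpy as np


def fit_linear(predictors, target):
    """Least-squares fit of target = c0 + c1*x1 + c2*x2 + ..."""
    target = np.asarray(target, dtype=float)
    design = np.column_stack([np.ones(len(target)), *predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict(coef, predictors):
    """Apply fitted coefficients to new predictor values."""
    design = np.column_stack([np.ones(len(predictors[0])), *predictors])
    return design @ coef


# Synthetic training set (hypothetical; stands in for TES/GOES/GFS matchups)
rng = np.random.default_rng(0)
humidity = rng.uniform(0.05, 1.0, 200)   # specific humidity proxy
pv = rng.uniform(0.2, 4.0, 200)          # potential vorticity, PV units
ozone = 60.0 - 20.0 * humidity + 25.0 * pv

coef = fit_linear((humidity, pv), ozone)
print("intercept, humidity slope, PV slope:", np.round(coef, 3))
```

In a fit of this kind, an error in either predictor field propagates directly into the ozone estimate through the corresponding coefficient.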
TES V002 UT ozone error has already been evaluated in two papers (Nassar et al., 2008; Richards et al., 2008). TES error can be related to problems in a priori estimates, retrieval methodologies, and lack of sensitivity (Worden et al., 2004; Bowman et al., 2006).
GOES layer average specific humidity (GLASH) retrieval error can be generated in two main forms: poor modeling of UT temperature (which is used to derive specific humidity from the GOES 6.7 micron water vapor channel) or moisture saturation (Wimmers and Moody, 2001). In the second problem, the satellite retrieval is saturated by moisture in the upper layers of the UT and is unable to observe potentially dry intrusions beneath, thus overestimating UT layer-average specific humidity between 500 and 300 hPa. It should also be noted that the GLASH product performance diminishes at very high GOES satellite zenith angles due to increased path length through the atmosphere (Wimmers and Moody, 2001), which will also result in an overestimate of layer average specific humidity. This issue could impact the sites farthest away from the GOES west nadir point (135° W longitude, 0° N latitude), which in this validation study were Walsingham, Valparaiso, and Egbert. Table 2 shows these sites do have the largest RMSE.
Finally, GFS-modeled PV error can be generated if either UT temperature or UT wind fields are poorly modeled. While estimates of atmospheric temperature are fairly good in modern numerical models, wind fields can still exhibit error in magnitude, direction, and spatiotemporal placement, which has the potential to mis-locate MUTOP features. This is another factor that has the potential to produce errors in the vicinity of strong gradients.

Ideas for further validation
While the results of this validation study are promising for operational use of MUTOP, further validation should characterize the product's performance in other regions of the geostationary domain (GOES East, Meteosat, etc.) and during different seasons and years. The current validation effort has only examined MUTOP skill during the spring season, corresponding to the time frame and domain of the INTEX-B field campaign. The statistical retrieval relies on empirical relationships derived from spring data; therefore it is possible that MUTOP in its current form may not perform as well in other seasons. Since mid-latitude stratosphere-troposphere exchange is generally considered to be heightened in the spring (Appenzeller et al., 1996), the TES-derived relationships of the multi-sensor product may tend to over-predict layer-average UT ozone in other seasons.
Another limitation of the current validation is that the comparisons are based on only one year (2006) of data, and therefore cannot account for any inter-annual variability in the UT ozone relationships with PV and upper level (UL) aridity. To test this, validation against ozonesondes or other UT ozone measurements from other years could also be carried out in the future. Ozonesonde launches provide validation sources for a spatially limited portion of the GOES-West domain, and only included three ozonesondes from Hilo, Hawaii. Since a large database of airborne Differential Absorption LiDAR (DIAL) ozone measurements is available from flights during the INTEX-B campaign, it could be useful to compare profile measurements from these flights to coincident MUTOP estimates, analogous to Richards et al. (2008). This would extend validation over more of the North Pacific. Finally, detailed temporal interpolation would improve product/sonde timing issues, and hodograph analysis of individual sondes could improve the spatial coincidence issues.
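The simplest form of the temporal interpolation mentioned above is a linear blend, at a fixed location, between the two MUTOP images that bracket the time the sonde reaches the UT layer. The sketch below shows this as a stand-in only; it is not the feature-preserving morphing approach, and the 00:00 UTC value is hypothetical (the 135 ppbv value at 18:00 UTC is the Valparaiso estimate quoted in the case study).

```python
from datetime import datetime


def interp_in_time(t, t0, v0, t1, v1):
    """Linearly interpolate a value between two image times t0 <= t <= t1."""
    if not t0 <= t <= t1:
        raise ValueError("target time must lie between the two image times")
    frac = (t - t0) / (t1 - t0)   # timedelta / timedelta gives a float
    return v0 + frac * (v1 - v0)


# Valparaiso case: images at 18:00 and 00:00 UTC, sonde in the UT near 19:30 UTC
t_first = datetime(2006, 4, 22, 18, 0)
t_next = datetime(2006, 4, 23, 0, 0)
t_sonde = datetime(2006, 4, 22, 19, 30)

# 135 ppbv is from the text; 175 ppbv at 00:00 UTC is a hypothetical value
estimate = interp_in_time(t_sonde, t_first, 135.0, t_next, 175.0)
print(f"time-interpolated estimate: {estimate:.1f} ppbv")
```

A linear blend of this kind cannot move a feature across the image, which is why the morphing and sonde-displacement corrections discussed above would still be needed near strong gradients.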
It is worth noting that for this one season analyzed, we do observe a relatively consistent enhancement of ozone over the subtropical Pacific between 20 and 35° N (see the region of green indicating 80 to 120 ppbv in Fig. 1, or review the image animation of Felker et al., 2011). Although this product represents the layer average ozone from 300 to 500 hPa, these enhancements may reflect the influence of stratosphere-to-troposphere (STT) transfer of ozone along the subtropical jet, as was recently analyzed and reported by Trickl et al. (2011) and Manney et al. (2011). These are considerably weaker than the enhancements associated with the troughs and meridional streamers associated with the dynamics of the polar jet stream, which are traceable with UT ozone mixing ratios of 120 to over 200 ppbv. The intention of this research was to create and validate a near real-time product of remotely sensed UT ozone based on extrapolating TES observations to the GOES domain. Results presented here suggest that a MUTOP-like product used in near real-time, enhanced with feature-preserving temporal interpolation (morphing), has the potential to provide a continuous high resolution characterization of ozone presence and variability in the UT, and could be used for ongoing validation. This work demonstrates that a statistical retrieval of upper troposphere ozone based on GOES and GFS PV in near real-time could provide a unique assessment of dynamically forced UT ozone at high spatial and temporal resolution, providing a dynamical context for ongoing ozonesonde networks, and might prove useful as a remotely sensed index of upper level frontal activity like the PVI index of Cai (2003).

Fig. 1. These sample MUTOP images map the TES-derived multi-sensor upper-tropospheric ozone product, which characterizes the UT layer-average (300-500 hPa) volume mixing ratio of ozone in ppbv over the GOES west domain for two dates, (a) at 18:00 UTC on 24 April 2006 and (b) at 00:00 UTC on 13 May 2006.
study. While the former two studies focused on comparison of TES column retrievals of ozone to ozonesonde measurements, the third validation study by Richards et al. (2008) used airborne Differential Absorption LiDAR (DIAL) measurements from the Intercontinental Chemical Transport Experiment Phase B (INTEX-B) field campaign for validation of TES column ozone. The comparisons used in their study were all from flight legs over the North Pacific, out of Anchorage, Alaska and Hawaii during INTEX-B, the period from mid-April to mid-May 2006. The coincidence criterion

Fig. 2. Ozonesonde launch site locations that lie within the GOES-West domain and that are used for validation purposes in this study, overlaid upon a MUTOP image product for 22 April 2006 at 00:00 UTC. The grayscale provides layer-average UT ozone mixing ratios in ppbv, with dark shades representative of low ozone presence in the UT and light shades representative of elevated ozone presence in the UT.

Fig. 3. Time series of MUTOP layer-average UT ozone VMR estimates (circles with grey dashed lines and RMS error bars) and corresponding ozonesonde layer-average UT ozone VMR measurements (squares and black solid lines) for the six ozonesonde stations with more than ten launches during the INTEX-B study period.
Fig. 4. (a) Correlation between MUTOP-estimated layer-average UT ozone and ozonesonde-measured layer-average UT ozone, one-to-one line (dashed), best fit line (solid): r = 0.824; MUTOP explains 68 % of observed variability. (b) Frequency distribution of error, defined as the predicted (MUTOP) minus the observed (ozonesonde), with the average denoted by the solid line and the standard deviation by the dashed lines; shows the overall tendency for the TES-derived image product to over-predict the observed ozone by about 4 ppbv.

Fig. 5. MUTOP versus the Kelowna, BC ozonesonde launch on 21 April 2006. The two zoomed MUTOP images from (a) 21 April at 18:00 UTC and (b) 22 April at 00:00 UTC show the rapid increase of the UT ozone over six hours associated with the advancement of the dry-air streamer. The color bars represent layer-average UT ozone VMR in ppbv and the black stars mark the location of Kelowna, BC, where the MUTOP estimated value is shown in the box for each time. (c) The Skew-T Log-P diagram presents vertical profiles of ozone, temperature, dewpoint temperature, and column winds for the previous day, 20 April (gray crosses, black line, black dash line and black wind barbs, respectively) and the 21 April (light blue crosses, blue line, blue dash line, and blue wind barbs) ozonesonde launch; the UT ozonesonde average mixing ratio is shown in the box for each time, respectively.
Fig. 6. (a) The zoomed MUTOP image from 24 April at 00:00 UTC shows the location of the Richland, Washington sonde launch site (black star) relative to the dry-air streamer ozone enhancement in the UT. The color bar represents layer-average UT ozone VMR in ppbv. The black and blue crosses show the location of the Spokane, Washington and Great Falls, Montana radiosonde sites. (b) Skew-T plot of the Richland ozonesonde from 22:50 UTC, 23 April 2006, launched 70 min before the preceding MUTOP image. Note the dewpoint sensor was not reporting for this sonde, and there were no winds available, but the UT layer average ozone was 79 ppbv. (c) Two additional Skew-T plots show radiosonde profiles of temperature, dewpoint temperature and column wind for Spokane, WA (solid black, black dash, and black wind barbs), and Great Falls, MT (solid blue, blue dash and blue wind barbs) for 24 April at 00:00 UTC, on either side of the UT dry-air streamer feature. Note the dramatic change in UT wind direction between these two stations.
Fig. 7. (a) The zoomed MUTOP image from 18:00 UTC, 22 April shows the location of the Valparaiso ozonesonde site (black star) on the edge of the advancing upper level cutoff low over the Great Lakes region, at the base of the cut-off low and in the vicinity of strong vertical wind shear. The color bar represents estimated layer-average UT ozone VMR in ppbv, and indicates MUTOP at 18:00 UTC was 135 ppbv. (b) The ozonesonde Skew-T plot is from 19:00 UTC, 22 April 2006 and displays column temperature (black solid), dew point (black dashed) and ozone VMR (blue dashed). Note the extremely enhanced ozone captured by the sonde between 500 and 300 hPa; the layer-average ozone was 202 ppbv.

Fig. 8. Images from the Edmonton ozonesonde launch on 10 May 2006. (a) The zoomed MUTOP image shows the location of the Edmonton ozonesonde site (black star) within a dry-air streamer UT ozone enhancement. The color bar represents estimated layer-average UT ozone VMR in ppbv, indicating a value of 102 ppbv for Edmonton. (b) Skew-T plot from the 10 May ozonesonde launch displays column temperature (black solid), dewpoint temperature (black dashed), and ozone VMR (blue dashed), indicating a layer average of 99 ppbv. Note the ozone enhancement between 400 and 300 hPa from tropopause folding is well captured by coincident MUTOP, and column winds are weak and variable up to 300 hPa, suggesting limited sonde drift.

Table 2. Statistical results of MUTOP ozonesonde validation by sonde launch site and for all sites combined (N = 127); bias is calculated as MUTOP (TES) minus ozonesonde.

Table 3. Comparison of study results to past TES validation study results. It was noted that the method works well as long as horizontal gradients in the stratospheric column are relatively small.