A Tropospheric Emission Spectrometer HDO/H2O retrieval simulator for climate models

Retrievals of the isotopic composition of water vapor from the Aura Tropospheric Emission Spectrometer (TES) have unique value in constraining moist processes in climate models. Accurate comparison between simulated and retrieved values requires that model profiles that would be poorly retrieved are excluded, and that an instrument operator be applied to the remaining profiles. Typically, this is done by sampling model output at satellite measurement points and using the quality flags and averaging kernels from individual retrievals at specific


Introduction
In order to usefully compare model predictions against satellite measurements, various features of the retrieval must be taken into account. For retrievals of trace-gas profiles based on optimal estimation, these are: the effects of the satellite's orbital path, varying retrieval sensitivity under different atmospheric conditions, limited vertical resolution, and contributions from prior constraint profiles. This involves excluding profiles that would be poorly retrieved, and, for the profiles remaining, applying an instrument operator to the raw model profiles. This transforms the raw model fields of interest into what would be seen by the instrument. By comparing the modified profiles against the satellite retrievals, genuine model errors can be more readily identified.
The vertical sensitivity of each retrieval to the true vertical profile is represented by an averaging kernel, which depends on factors such as cloud cover and surface temperature. In applying the instrument operator to the model field, the choice of quality filtering, prior and averaging kernels should be as specific as possible to the model conditions at each time and location. In the presence of thick clouds, for instance, infrared retrievals are typically of poor quality and are excluded from any analysis of the satellite data; the same filter needs to be applied to the model data in these conditions. The same applies to averaging kernel structure. For a high quality retrieval over low clouds, the peak retrieval sensitivity will be at a greater height than for clear sky conditions, all other factors being equal.
Suitable quality filtering and averaging kernel selection are commonly assumed to be achieved by sampling the model fields along the orbital path of the satellite and using information from individual retrievals. The assumption underlying this approach is that the modeled meteorological conditions influencing retrieval sensitivity and averaging kernel structure are in good agreement with those viewed by the instrument. However, persistent differences between the observed and modeled clouds, for example, would lead to unsuitable quality filtering and averaging kernel selection, and possibly to inaccurate diagnostics. When the quality filtering and averaging kernel selection are poor, differences between the satellite and the model for the quantity of interest can no longer be attributed solely to model error, as intended, but also reflect the poor selection, which defeats the purpose of applying the instrument operator. Selection error will increase with fewer constraints on the modeled meteorology. It is presumably smaller for chemical transport models (CTMs) with fully-prescribed, assimilated meteorology, and larger for coupled chemistry-climate models in which only some meteorological components, such as horizontal winds, are nudged. For free-running simulations, there is no expectation that the modeled and instrument-measured meteorological fields agree at short time scales. To the best of our knowledge, however, the effect of errors in the meteorology (e.g. clouds) on retrieval quality filtering and averaging kernel selection has not been assessed in any of these cases.
Our interest is in retrievals of the deuterium composition of water vapor (HDO) from the Tropospheric Emission Spectrometer (TES). These data have unique potential value in understanding moist processes in the atmosphere (Sherwood et al., 2010), and for our purposes, in constraining cloud physics parameterizations. For this purpose, perturbed physics tests of convective parameters with nudged winds can provide a useful evaluation of the subgrid physics with realistic boundary conditions, while free-running simulations are important when parameterization changes can feed back strongly onto the large-scale circulation. But in the latter case, because we have no expectation of time-evolving agreement between the free-running model and observed weather, the standard approach to retrieval quality filtering and averaging kernel selection cannot be used reliably. This is particularly important in the case of deuterium because cloud processes strongly influence both the isotopic composition of vapor and its measurability.
In this study, we examine the assumptions underlying the standard, retrieval-based approach to applying the TES HDO operator and describe an alternative "categorical" approach for use specifically with free-running climate model simulations. The categorical approach relies as little as possible on short time-scale agreement between the model and instrument of quantities that influence retrieval quality and averaging kernel structure. It instead uses their dependence on atmospheric conditions, similar to those identified by , in trying to predict the retrieval quality and averaging kernel structure for a given set of model conditions. Our approach was also motivated by the progress made in cloud simulators (e.g. Bodas-Salcedo et al., 2011) in that we apply the TES operator as an instrument simulator within the NASA GISS ModelE general circulation model (GCM). Our focus is on the tropics, in order to evaluate the performance of the TES operators under a limited set of conditions, and where our future process-based studies will be initially conducted.
The paper is structured as follows. Section 2 describes the TES HDO retrievals and the factors which influence retrieval quality and averaging kernel structure. The GISS ModelE is described in Sect. 3. The standard, retrieval-based TES operator and its suitability are described in Sect. 4. The new, categorical TES operator and its suitability are described in Sect. 5. In Sect. 6, the effects of applying the two types of TES operators on the modeled δD fields are examined, several sensitivity tests are described, and the retrieved and modeled δD fields are briefly compared. A brief discussion follows in Sect. 7. Future studies will examine the reasons for model-satellite δD discrepancies in detail.

TES HDO retrieval and instrument operator
The TES instrument onboard the Aura satellite is an infrared Fourier transform spectrometer measuring in the 650 cm−1 to 3050 cm−1 spectral range, following a sun-synchronous orbit with a repeat cycle of 16 days (Beer et al., 2001). We use version 4 level 2 H2O and HDO nadir retrievals, which have a horizontal footprint of 5.3 km by 8.5 km. H2O and HDO amounts are jointly retrieved by optimal estimation, using spectral windows in the region between 1100 cm−1 and 1350 cm−1. The retrieved profiles represent an adjustment from the prior H2O and HDO constraint profiles. The adjustment is estimated iteratively to minimize the difference between the measured spectra and those predicted by a forward radiative transfer model using the estimated profiles as input. Retrieved profiles are provided on 67 pressure levels.
For HDO, a single, constant HDO/H2O profile from the global mean of the NCAR CAM model is used for the prior constraint. For H2O, the prior varies by retrieval, and is obtained from collocated grid points from the GEOS-5 global transport model operated by the NASA Global Modeling and Assimilation Office (GMAO) (Rienecker et al., 2007). A single, fixed H2O constraint would yield poor-quality retrievals because H2O amount can vary so widely in the troposphere. The retrieval is based on the logarithm of H2O and HDO profiles because of their potentially large variation in the vertical, and to ensure positive retrieved amounts. The estimated error of the retrieved HDO is 10 % in the tropics. All analysis is for daytime retrievals only, for compatibility with the simulated ISCCP cloud properties (described in Sect. 3).
The TES HDO instrument operator applied to model profiles can be described as follows. Using the notation of Worden et al. (2011), the model HDO/H2O ratio $\hat{x}_R$ suitable for comparison with satellite measurements is expressed as

$$\hat{x}_R = x_a^R + (A_{DD} - A_{HD})(x_D - x_a^D) - (A_{HH} - A_{DH})(x_H - x_a^H) \qquad (1)$$

In Eq. (1), the subscripts and superscripts indicate the following: "R" relates to the isotopic ratio HDO/H2O, "a" relates to a prior constraint, "D" relates to HDO and "H" relates to H2O. In Eq. (1), $x_a^R$ is the prior isotopic ratio HDO/H2O before standardization with respect to Vienna Standard Mean Ocean Water (VSMOW), $x_a^D$ is the prior HDO amount and $x_a^H$ is the prior H2O amount. $x_D$ and $x_H$ are the raw, modeled HDO and H2O amounts, respectively. All x terms are the logarithm of the isotopic ratio or species amount, i.e. x = ln(q), where q is the species amount in units of volume mixing ratio (vmr). The x terms are column vectors of size 67 × 1, with modeled amounts interpolated linearly from the 40 model levels. $A_{DD}$ is the HDO averaging kernel, $A_{HH}$ is the H2O averaging kernel, and $A_{HD}$ and $A_{DH}$ are the cross-kernels between them. The cross-kernels represent the sensitivity of one retrieved species to the actual profile of the other. All averaging kernels are square but asymmetric matrices of size 67 × 67. Following Risi et al. (2012), the full 67 TES pressure levels were truncated to the vertical range relevant to HDO analysis. The $\hat{x}_R$ and $x_a^R$ vectors were truncated to the 10 TES pressure levels spanning the 909 hPa to 383 hPa range, where the HDO retrievals are somewhat sensitive. The $x_a^D$, $x_a^H$, $x_D$ and $x_H$ vectors were truncated to the 26 TES levels spanning the 1000 hPa to 100 hPa range, over which the HDO and H2O composition can influence the retrievals between 909 hPa and 383 hPa. Accordingly, each of the averaging kernel matrices is truncated to size 10 × 26. This truncation reduces computation time and storage requirements for the TES data considerably, with little effect on the results.
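As an illustration of how the operator acts on a pair of model profiles, the following is a minimal sketch in plain Python. It assumes the operator has the form x̂_R = x_a^R + (A_DD − A_HD)(x_D − x_a^D) − (A_HH − A_DH)(x_H − x_a^H), with every x term a natural logarithm and every kernel stored as a list of rows. The function names and the toy numbers are hypothetical and are not taken from the TES products or from ModelE.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def kernel_difference(first, second):
    """Element-wise difference of two equally sized kernels."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]


def apply_tes_operator(x_r_a, a_dd, a_hd, a_hh, a_dh, x_d, x_d_a, x_h, x_h_a):
    """Assumed form of Eq. (1): prior ratio plus kernel-weighted departures.

    x_r_a has one entry per output level; the other x vectors have one
    entry per input level; each kernel is (output levels) x (input levels).
    """
    hdo_departure = [m - p for m, p in zip(x_d, x_d_a)]
    h2o_departure = [m - p for m, p in zip(x_h, x_h_a)]
    hdo_term = matvec(kernel_difference(a_dd, a_hd), hdo_departure)
    h2o_term = matvec(kernel_difference(a_hh, a_dh), h2o_departure)
    return [xa + d - h for xa, d, h in zip(x_r_a, hdo_term, h2o_term)]


# Toy case (made-up numbers): 2 output levels, 3 input levels.
x_h_a = [math.log(q) for q in (1.0e-2, 5.0e-3, 1.0e-3)]   # prior H2O (vmr)
x_d_a = [math.log(q) for q in (2.8e-6, 1.3e-6, 2.4e-7)]   # prior HDO (vmr)
x_h = [math.log(q) for q in (1.2e-2, 4.0e-3, 8.0e-4)]     # model H2O (vmr)
x_d = [math.log(q) for q in (3.3e-6, 1.0e-6, 1.8e-7)]     # model HDO (vmr)
x_r_a = [x_d_a[1] - x_h_a[1], x_d_a[2] - x_h_a[2]]        # prior log ratio
a_dd = [[0.10, 0.30, 0.10], [0.05, 0.20, 0.25]]
a_hh = [[0.15, 0.40, 0.10], [0.05, 0.25, 0.35]]
a_hd = [[0.01, 0.02, 0.01], [0.00, 0.01, 0.02]]
a_dh = [[0.01, 0.03, 0.01], [0.00, 0.02, 0.02]]

x_r_hat = apply_tes_operator(x_r_a, a_dd, a_hd, a_hh, a_dh,
                             x_d, x_d_a, x_h, x_h_a)
print(x_r_hat)
```

Two limiting cases are useful sanity checks on a sketch like this: with all kernels set to zero the result collapses to the prior ratio, and with identity kernels, no cross-kernels and a self-consistent prior it returns the raw model log ratio x_D − x_H.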
Most analysis presented in this study is further restricted to the 825 hPa to 510 hPa range where the HDO retrieval is most sensitive, following Yoshimura et al. (2011), and which spans the ∼600 hPa level examined by Berkelhammer et al. (2012) and Risi et al. (2012). TES measurements were mapped to the 2° × 2.5° ModelE grid. The overall sensitivity of the retrieval is measured by the trace of the HDO averaging kernel A DD. HDO retrieval sensitivity is influenced by cloud thickness and height, surface temperature and moisture content. Only retrievals classified as high quality are included, defined as those having a sensitivity greater than 0.5 (Berkelhammer et al., 2012; Risi et al., 2012) and the overall HDO retrieval quality flag set to 1. The minimum sensitivity requirement ensures that the retrieval is sufficiently sensitive over some vertical range to the measured spectra, and not dominated by contributions from the prior constraint. Figure 1 shows an example TES nadir orbital path during daytime over the tropics for one day. Of 133 measurements, only the 85 high-quality retrievals are shown. Example averaging kernels for one high quality retrieval over the Indian Ocean are shown in Fig. 2. After the quality filtering, we adopt the pressure level of peak sensitivity for a given level of retrieved HDO, defined as p D, as the key characteristic of the operator. In Fig. 2a, p D for both the 619 hPa (purple) and 681 hPa (light blue) averaging kernel rows is approximately 700 hPa. The mean p D between 825 hPa and 510 hPa will be the primary metric used for distinguishing averaging kernel shapes. Figure 3 shows the spatial variation of retrieval quality and p D across the tropics during 2006-2009. There were 202 713 daytime retrievals, 69 % of which were high quality over the ocean and 57 % over land, but with considerable spatial variation (Fig. 3a).
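The quality filter and the p D diagnostic described above can be sketched as follows. This is a minimal illustration that assumes a square HDO averaging kernel stored as a list of rows, with row i belonging to the retrieval level at `pressures[i]`. The function names and the 5 × 5 kernel values are invented for the example; only the 0.5 sensitivity threshold, the quality flag value of 1 and the 825 hPa to 510 hPa range come from the text.

```python
def retrieval_sensitivity(a_dd):
    """Overall sensitivity: trace of a square HDO averaging kernel."""
    return sum(a_dd[i][i] for i in range(len(a_dd)))


def is_high_quality(a_dd, quality_flag, min_sensitivity=0.5):
    """High quality means flag == 1 and sensitivity above the threshold."""
    return quality_flag == 1 and retrieval_sensitivity(a_dd) > min_sensitivity


def peak_sensitivity_pressure(kernel_row, pressures):
    """Pressure (hPa) at which one averaging kernel row is largest."""
    peak_index = max(range(len(kernel_row)), key=lambda i: kernel_row[i])
    return pressures[peak_index]


def mean_p_d(a_dd, pressures, p_bottom=825.0, p_top=510.0):
    """Mean p_D over rows whose retrieval level lies in [p_top, p_bottom]."""
    peaks = [peak_sensitivity_pressure(row, pressures)
             for row, level in zip(a_dd, pressures)
             if p_top <= level <= p_bottom]
    return sum(peaks) / len(peaks)


# Made-up kernel on five levels (hPa); real kernels are much larger.
pressures = [825.0, 681.0, 619.0, 510.0, 383.0]
a_dd = [
    [0.30, 0.10, 0.05, 0.00, 0.00],
    [0.10, 0.25, 0.10, 0.02, 0.00],
    [0.05, 0.20, 0.15, 0.05, 0.00],
    [0.00, 0.05, 0.12, 0.10, 0.02],
    [0.00, 0.00, 0.02, 0.05, 0.03],
]

print(is_high_quality(a_dd, quality_flag=1))
print(mean_p_d(a_dd, pressures))
```

In this toy kernel the 619 hPa row peaks at 681 hPa rather than on its own level, which is the kind of displacement that p D is meant to summarize.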
Over the oceans, there were fewer high-quality retrievals over the ITCZ and SPCZ bands, eastern Indian Ocean, the Maritime Continent, and the West Pacific Warm Pool due to the frequent presence of precipitating clouds. Retrieval quality is also lower off the west coasts of South America and Africa, possibly due to low moisture content and lower sea-surface temperatures. Over land, the lowest quality is over the Sahara, presumably due to low moisture content. Given that the retrieval quality can decrease under either very wet or very dry conditions, there is no apparent simple rule that would separate low and high quality retrievals.
Over 825 hPa to 510 hPa, there is also considerable variation in p D for high quality retrievals (Fig. 3b). Over the oceans, p D is lower (at a higher altitude) in moist regions where there is abundant mid tropospheric moisture, but also in the dry regions off of the coasts of South America and Africa presumably due to low-level marine stratocumulus, as described by Lee et al. (2011). p D is higher over the dry subtropical anticyclones due to a moist boundary layer and dry free troposphere.

Observed controls on TES HDO retrieval quality and p D
The first task in developing the new approach is to understand controls on retrieval quality and p D in the TES measurements. Possible controls were identified using the pattern correlations between Fig. 3a and b and different underlying meteorological quantities. The following variables were considered from mean fields calculated from 2006-2009: cloud optical depth (τ), cloud fraction (CF), defined as the percentage of retrievals in a grid cell with cloud optical depths greater than 0.3, cloud top pressure (CTP), surface temperature (T S), and moisture content. Moisture content was expressed as total precipitable water (PW T) and further separated into precipitable water in the boundary layer (PW B) (within 150 hPa of the surface) and precipitable water in the free atmosphere (PW F) (above 150 hPa from the surface). All moisture quantities were computed from the prior H2O profiles, which are sampled from GMAO reanalysis. The analysis of controls on p D is for high quality retrievals only, for both the averaging kernels and underlying meteorological quantities. Correlation and regression quantities were computed using ordinary least-squares regression, which does not take into account errors in the control variables. Table 1 lists the pattern correlations. Over the ocean, retrieval quality was most strongly associated with CF, with a correlation of −0.70, indicating that, as would be expected, retrieval quality decreases with increasing cloud cover. Compared to CF, τ was a weak predictor of retrieval quality, likely because of its highly non-normal distribution. Over land, retrieval quality was most strongly associated with T S, with a correlation of −0.72, and to a slightly lesser degree with PW B (which itself has a correlation of −0.59 with T S). Over the ocean, p D is most strongly associated with PW F.
As PW F decreases, p D moves toward the boundary layer where moisture is abundant, and will therefore exert a stronger influence on the retrieved HDO at higher altitudes. PW B itself had a low correlation with p D because it varies substantially less than PW F over the ocean. Over land, p D was most strongly associated with T S , but with a lower correlation of −0.51 compared to over ocean, and equally high correlation with PW F . The linear fits between retrieval quality and p D for the primary control variables are shown in Fig. 4. It can also be seen that the observed control on retrieval quality over land is due to a set of high-temperature, low quality points, which were associated with extremely hot and dry conditions over the Sahara (Fig. 3a). The unexplained variation in these relationships is due to the influence of the more weakly correlated variables and other unknown factors. We considered adopting multivariate regression models to capture this variability, but found that the collinearity between meteorological quantities led to unstable regression estimates, and that remedial measures such as principal component regression precluded straightforward interpretation.
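The pattern correlations and linear fits discussed above rest on ordinary least-squares regression between two gridded mean fields, flattened to paired lists of grid-cell values. A minimal sketch is given below; the function name and the five data points are hypothetical and do not reproduce the values in Table 1 or Fig. 4.

```python
import math


def ols_fit(x, y):
    """Return OLS slope, intercept and Pearson correlation of y against x.

    Errors in x are ignored, as in ordinary least-squares regression.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


# Made-up grid-cell means: cloud fraction (%) and retrieval quality (%).
cloud_fraction = [10.0, 20.0, 30.0, 40.0, 50.0]
retrieval_quality = [90.0, 80.0, 75.0, 60.0, 55.0]

slope, intercept, correlation = ols_fit(cloud_fraction, retrieval_quality)
print(slope, intercept, correlation)
```

The same function applies unchanged to the other pairs considered in the text, for example PW F against p D over the ocean or T S against retrieval quality over land, provided the land and ocean grid cells are separated beforehand.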
Comparisons such as those in Fig. 4 will serve as the primary means of evaluating the suitability of different TES operators. It is these relationships that we seek to evaluate for different TES HDO operators in the model, namely that:
- Retrieval quality should decrease where there is increasing CF over ocean and increasing T S over land.
- p D should move closer to the boundary layer as PW F decreases over the ocean, and move closer to the free troposphere as T S decreases over land.
- The scatter in the linear fits is similar to that observed in the TES measurements. That is, the dispersion of the residuals around the fitted regression lines should be similar to that in Fig. 4.

NASA GISS ModelE
We use the atmosphere-only version of the NASA GISS ModelE general circulation model at 2° × 2.5° horizontal resolution and 40 vertical levels. The core model is an updated version of that described in Schmidt et al. (2006), with a recent summary of the cloud physics provided by Kim et al. (2011). The simulation period was 2006-2009, covering the continuous period of TES retrievals, with an additional year for spin-up. A spin-up time of five years did not affect the results. Interannually varying monthly sea-surface temperatures and sea-ice cover are prescribed (Rayner et al., 2003). The horizontal winds in the model were nudged toward NCEP-NCAR Reanalysis (Kalnay et al., 1996) at each model time step. All other dynamical quantities are calculated prognostically. Our eventual interest is evaluation of free-running simulations against the TES observations, but nudging allowed for consistent comparison between the retrieval-based and categorical TES operators for a configuration typical of how the retrieval-based operator has been commonly applied in the past.
ModelE is equipped with stable water isotope tracers (Schmidt et al., 2005), advected using the quadratic upstream scheme of Prather (1986), which yields an effective transport resolution approximately twice that of the horizontal model resolution. Isotopic fractionation between H 2 O and the rare isotopologues H 18 2 O and HDO is parameterized for all moist processes, from evaporation and evapotranspiration over the ocean and land surfaces, to condensation and deposition, and post-condensation exchange between rainfall and vapor. The stable water isotope tracer parameterization is much simpler than the underlying cloud parameterization, and is more tightly constrained by laboratory measurements. This is what makes the TES HDO retrievals potentially valuable, as isotopic measurements can be used in evaluating the underlying cloud physics with a fair amount of confidence that the isotopic physics are correct. Or, put another way, errors in the modeled isotopic fields are likely to be dominated by errors in the cloud physics rather than errors in the isotopic physics.
ModelE also includes an internal simulator for the International Satellite Cloud Climatology Project (ISCCP) that produces cloud diagnostics for comparison with the ISCCP datasets (Klein and Jakob, 1999). For our purposes, the key feature of the ISCCP simulator is the random, subgrid joint distribution of τ and CTP, conditioned upon the grid-scale vertical distributions of humidity, convective cloud cover and large-scale cloud cover.

Review of retrieval-based operators in previous studies
In applying the TES operator in Eq. (1) to model profiles x D and x H, choices must be made about whether to include the profile, and about which prior profile x H a and which averaging kernels A DD, A HH, A HD and A DH to use, all of which are different for each retrieval. These choices should reflect the conditions at each model point. Using the standard, retrieval-based approach, the model fields are sampled along the orbital path, excluding model points collocated with poor-quality retrievals. For the remaining model points, the averaging kernels and priors from individual measurements are used in applying Eq. (1). The underlying assumption of this approach is that the modeled and retrieved factors influencing retrieval quality and averaging kernel structure are in agreement.
This approach is based on the earlier, pre-Aura launch description of Jones et al. (2003) of the potential accuracy of the TES CO retrievals. Variants of the technique have been used in validating TES retrievals against collocated measurements of CO from aircraft , O 3 from aircraft (Richards et al., 2008) and sondes , and H 2 O measurements from sondes . It has also been used for comparisons between TES CO retrievals and those from the Atmospheric Chemistry Experiment (ACE)  and Measurements of Pollution in the Troposphere (MOPITT) (Luo et al., 2007b).
The approach has subsequently been applied in CTM studies focusing on TES O3 data assimilation (Parrington et al., 2008), the sources, sinks and transport of pollution in the troposphere (Nassar et al., 2009; Choi et al., 2010; Liu et al., 2009), and inverse modeling of CO and CO2 (Nassar et al., 2011). These studies all involved CTMs with fully prescribed meteorological fields. In studies using the GEOS-Chem CTM, meteorology is prescribed from GMAO reanalysis. Through its assimilation of radiosonde profiles of temperature, humidity and winds, and independent satellite estimates of atmospheric moisture and winds, the GMAO reanalysis provides reasonable estimates of the factors which are known to influence averaging kernel structure and retrieval sensitivity (e.g. Norris and Da Silva, 2007). Voulgarakis et al. (2011) applied the TES operator using the retrieval-based approach in their analysis of O3-CO correlations for two coupled chemistry-climate models with prescribed SSTs and horizontal winds nudged toward reanalyses. All other meteorological fields were calculated prognostically, unlike the CTM studies described above.  considered three chemistry-climate models with prescribed SSTs and nudged toward reanalysis, and using collocation-based averaging kernel selection and quality filtering. A fourth free-running (non-nudged) simulation was also considered. Their focus was on estimating the error associated with using monthly mean maps of spatially-varying averaging kernels rather than individual retrievals. A small error would allow the TES operator to be applied to monthly mean model output, simplifying multi-model comparisons against satellite measurements. We note that by embedding the TES operator within the model, we have avoided this issue altogether. Risi et al. (2012) used the monthly-mean approach in comparing TES HDO fields to those from nudged simulations with the LMDz isotopically-equipped GCM for several different parameter values within the cloud scheme. Yoshimura et al. (2011) used the standard approach using individual retrieval-based sampling in their comparison of the TES and IsoGSM HDO fields with varying isotopic physics, noting that this approach necessitates model nudging. Both studies stressed the importance of applying the TES operator to model outputs for quantitative comparisons with the data. Lee et al. (2009) and Field et al. (2010) compared TES HDO to free-running simulations with different convective and isotopic configurations, but without applying a TES operator, making their interpretation necessarily qualitative.

Retrieval-based controls on TES HDO retrieval quality and p D
The standard, retrieval-based TES operator was implemented within ModelE for H 2 O and HDO. TES retrievals are ingested into the model's TES simulator along Aura's orbital path (as in Fig. 1) at each half-hour model time step, during daytime and over the tropics only. The retrieval quality filtering and averaging kernel selection is done regardless of the agreement in meteorology between the model and TES. In cases where a model cell contains more than one high quality TES measurement, the averaging kernels and H 2 O priors for all are applied to the model profile and the mean of the resulting profiles is taken. We evaluated the suitability of this approach by comparing the relationships in Table 1 for the TES observations to those from the retrieval-based operator. If the modeled meteorology agreed exactly with that retrieved by the instrument, then the relationships between retrieval quality and p D would be the same as in Table 1 when the control variables from TES are replaced with those from the model. The degree to which this is not the case quantifies the difference in meteorology observed by TES and simulated by the model in the context of their influence on retrieval quality and p D . Figure 5 shows the same observed TES retrieval quality and p D as Fig. 4, but as a function of modeled CF and PW F over the ocean and T S over land. Over the ocean, there is too weak a decrease in observed retrieval quality with increasing model CF, indicated by the slope of −0.18 and weaker correlation of −0.26 (Fig. 5a). This reflects, despite nudging, the low correlation of 0.35 between the TES and ModelE CF. The regions where TES is excluding more retrievals do not always correspond to where the thick clouds are in the model, for example. There is less disagreement in control on retrieval quality over land (Fig. 5c), because of the higher correlation of 0.74 between modeled and retrieved T S . 
Compared to retrieval quality, the observed controls on p D over the ocean are better captured by the retrieval-based operator (Fig. 5b). This is also due to the strong correlation between the TES and ModelE ocean PW F fields (0.86), which leads to a similar relationship with p D . Over land, the relationship between modeled T S and p D is in fair agreement with, but slightly weaker than for the observed T S .
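The handling of a single model cell under the retrieval-based operator, as described in this section, can be sketched as below: poor-quality retrievals are discarded, the operator is applied once per remaining retrieval, and the resulting profiles are averaged. The dictionary keys and the `apply_operator` callable are hypothetical placeholders, and the toy operator in the example merely scales the profile so that the averaging step is easy to follow.

```python
def mean_profile(profiles):
    """Level-by-level mean of equally long profiles."""
    count = len(profiles)
    return [sum(level_values) / count for level_values in zip(*profiles)]


def retrieval_based_profile(model_profile, retrievals, apply_operator):
    """Apply the operator once per high quality retrieval in a model cell.

    Each retrieval is a dict with a 'high_quality' flag plus whatever
    prior and averaging kernels apply_operator needs (hypothetical layout).
    Returns None when the cell has no high quality retrieval, meaning the
    model point is excluded.
    """
    good = [r for r in retrievals if r["high_quality"]]
    if not good:
        return None
    return mean_profile([apply_operator(model_profile, r) for r in good])


def toy_operator(profile, retrieval):
    """Stand-in for Eq. (1): scales the profile by a per-retrieval gain."""
    return [retrieval["gain"] * value for value in profile]


cell_retrievals = [
    {"high_quality": True, "gain": 0.5},
    {"high_quality": True, "gain": 1.5},
    {"high_quality": False, "gain": 9.0},   # ignored by the quality filter
]
print(retrieval_based_profile([1.0, 2.0], cell_retrievals, toy_operator))
```

Note that nothing in this procedure looks at the modeled clouds or moisture: the decision to keep the cell and the choice of kernels come entirely from the retrievals, which is the assumption examined in this section.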

Description of categorical operator
The observed TES retrieval quality and, to a lesser extent, p D are not entirely consistent with the underlying model conditions, despite nudging. This problem will be worse for free-running simulations. We have therefore developed a technique to apply the TES operator in a way that presumes no agreement between the observed and modeled meteorology at short time-scales, but such that the retrieval quality and averaging kernel selection are suited to the modeled conditions at that point. This approach is referred to as the "categorical operator" and was implemented alongside the retrieval-based operator in ModelE.
For different categories defined according to the variables in Table 1, we computed the mean retrieval quality, mean averaging kernels, and mean H2O prior from the TES retrievals (described in detail in the next section). The mean retrieval quality is the proportion of HDO retrievals in a category that were classified as high quality. The mean of the averaging kernels is the matrix resulting from taking the element-by-element means of all averaging kernels (for high quality retrievals only) falling into a given category. Applying the categorical TES operator in the model then consists of two steps:
1. At each time step and grid point, the values of the categorical variables in the model are used to look up the associated categorical TES retrieval quality. The model profile is included with a probability equal to the categorical retrieval quality. If a particular set of model conditions was associated with 30 % high quality retrievals, for example, then there is a 30 % chance that that model profile would be included.
2. For the profiles passing the retrieval quality filter, the categorical variable values in the model are used to look up the associated prior H 2 O profile and averaging kernels, which are used in applying Eq. (1).
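The two steps above can be sketched as a lookup table built from the retrievals, followed by a random draw at each model point. This is only an illustration under stated assumptions: the category labels, dictionary keys and function names are hypothetical, a single kernel stands in for the four averaging kernels, and averaging the H2O prior over high quality retrievals only is an assumption made here for simplicity.

```python
import random


def elementwise_mean(matrices):
    """Element-by-element mean of equally sized matrices (lists of rows)."""
    count = len(matrices)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) / count for j in range(n_cols)]
            for i in range(n_rows)]


def build_lookup(retrievals):
    """Per category: fraction of high quality retrievals, mean kernel, mean prior."""
    by_category = {}
    for r in retrievals:
        by_category.setdefault(r["category"], []).append(r)
    lookup = {}
    for category, members in by_category.items():
        good = [r for r in members if r["high_quality"]]
        lookup[category] = {
            "quality": len(good) / len(members),
            "kernel": elementwise_mean([r["kernel"] for r in good]) if good else None,
            "h2o_prior": (elementwise_mean([[r["h2o_prior"]] for r in good])[0]
                          if good else None),
        }
    return lookup


def sample_model_point(category, lookup, rng):
    """Step 1: keep the point with probability equal to the category quality.
    Step 2: return the categorical H2O prior and kernel for use in Eq. (1)."""
    entry = lookup[category]
    if rng.random() >= entry["quality"]:
        return None                      # treated as a poor quality retrieval
    return entry["h2o_prior"], entry["kernel"]


# Made-up retrievals in two hypothetical categories.
retrievals = [
    {"category": "clear", "high_quality": True,
     "kernel": [[0.2, 0.0], [0.0, 0.4]], "h2o_prior": [-5.0, -7.0]},
    {"category": "clear", "high_quality": True,
     "kernel": [[0.4, 0.2], [0.2, 0.0]], "h2o_prior": [-6.0, -8.0]},
    {"category": "clear", "high_quality": False,
     "kernel": [[0.0, 0.0], [0.0, 0.0]], "h2o_prior": [-5.5, -7.5]},
    {"category": "thick_high_cloud", "high_quality": False,
     "kernel": [[0.0, 0.0], [0.0, 0.0]], "h2o_prior": [-5.0, -7.0]},
]
lookup = build_lookup(retrievals)
print(sample_model_point("clear", lookup, random.Random(0)))
```

Because inclusion is a random draw, the fraction of model points kept under a given set of conditions converges to the categorical retrieval quality only over many time steps and grid points.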

R. D. Field: TES HDO/H 2 O Retrieval Simulator
Thus, rather than use information from individual retrievals, we use conditions in the model to empirically predict the retrieval quality and averaging kernel structure for a sampled model point.

TES categorizations
Thirteen categorizations of increasing complexity were considered, which ranged from having one category across all retrievals to 1620 categories when the retrievals were separated according to discrete ranges of all control variables. Table 2 shows the values used for each variable in different categorizations. CF is not retrieved for individual measurements, but is included implicitly for the categories involving clouds by including a clear sky category with τ less than 0.3. An important element of the categorical operator is our use of the ISCCP simulator in ModelE. Rather than use grid-mean values of τ and CTP, we randomly select an ISCCP subgrid column with equal probability and use its τ and CTP. The subgrid τ will not be normally distributed; a single, large τ can skew an otherwise clear-sky grid box toward an unrepresentatively high τ in the grid-scale mean. Using the individual ISCCP subgrid columns guards against an inevitable bias toward high τ values with low retrieval sensitivity that would result if the grid-scale mean were used. Inclusion of low sensitivity retrievals would result in comparison of retrieved and, after applying the TES operator, model profiles that have both relaxed toward the prior, creating artificially high agreement between the satellite and model (Nassar et al., 2008).
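The reason for drawing a single ISCCP subgrid column rather than using the grid-scale mean can be illustrated with a toy grid box. The sketch below assumes a box made of ten equally likely subgrid columns, each a (τ, CTP) pair. The column values and function name are invented, and only the clear-sky threshold of τ = 0.3 is taken from the text.

```python
import random

CLEAR_SKY_TAU = 0.3   # optical depth below which a column counts as clear sky


def pick_subgrid_column(columns, rng):
    """Select one (tau, ctp) subgrid column with equal probability."""
    return rng.choice(columns)


# Made-up grid box: nine nearly clear columns and one thick, high cloud.
columns = [(0.1, 900.0)] * 9 + [(30.0, 300.0)]

# The grid-scale mean is dominated by the single thick column.
grid_mean_tau = sum(tau for tau, _ in columns) / len(columns)
print("grid-mean tau:", grid_mean_tau)

# A random draw reports clear sky in about nine cases out of ten.
rng = random.Random(0)
draws = [pick_subgrid_column(columns, rng) for _ in range(5000)]
clear_fraction = sum(tau < CLEAR_SKY_TAU for tau, _ in draws) / len(draws)
print("fraction of clear-sky draws:", clear_fraction)
```

In this example the grid-mean τ would place the whole box in a thick-cloud category with low retrieval sensitivity, whereas the random draw preserves the fact that most of the box is clear.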
Categorizations are named according to the variables they include. We tried to strike a balance between capturing distinctions in retrieval quality and averaging kernel structure and using as few categories as possible. The cloud-only C categorization extends the decomposition of Lee et al. (2011) to the coarse, qualitative ISCCP categories. The C fine categorization corresponds to the full ISCCP categories. The PW and PW fine categorizations use precipitable water only, and contribute 9 and 49 categories, respectively, when precipitable water is separated into boundary layer and free-atmosphere components. The LOτ TPW F categorization with 180 categories included only the variables identified in Table 1 as the most important (land/ocean separation, τ, T S and PW F). This was a possible optimal categorization that captures variation in retrieval quality and p D using far fewer categories than the full LOCTPW categorization, which includes all variables.
To show how retrieval quality and averaging kernel structure vary, we look first at the C categorization based on τ and CTP. Retrievals with τ less than 0.3 account for 64 % of observations, with the rest consisting mostly of mid- and high-level clouds (Table 3). Retrieval quality is generally high for τ less than 1.3, and for low-level clouds with τ between 1.3 and 3.6 (Table 4), but otherwise poor. The relatively poor quality of 68.2 % for the low τ and high CTP category suggests an additional factor influencing retrieval quality, such as T S over land.
To illustrate the associated changes in averaging kernel structure, Fig. 6 shows the averaging kernel rows at 619 hPa for CTP less than 440 hPa and three different ranges of τ. Averaging kernel rows for τ less than 0.3 (Fig. 6a) have a higher p D than for τ between 0.3 and 1.3 (Fig. 6b), but neither peak is particularly sharp. Neither is significantly different from the grand mean because these categories constitute such a large proportion of all retrievals. Sensitivity for thicker clouds is generally low (Fig. 6c), even with only high quality retrievals included, and the averaging kernel has a much flatter peak. The average retrieval quality for this category is 11 %. Model points corresponding to these conditions would in general be excluded from the analysis.
The CPW categorization extends the C categorization by further separating the retrievals according to PW B and PW F , which may vary independently of cloud cover. Figure 7 shows the averaging kernels underlying the mean in Fig. 6a, but for a moist boundary layer (PW B greater than 20 mm) and for three categories of PW F . The main distinction is that p D increases from 600 hPa in Fig. 7a to 800 hPa in Fig. 7c as PW F decreases. The error bars are also narrower than in Fig. 6a, and particularly for the low PW F case, the peaks are sharper than in separating based on τ only in Fig. 6a and Fig. 6b. Although the focus of the averaging kernel separation is the A DD row at 619 hPa, the corresponding changes in the H 2 O prior x H a (not shown) were as expected, with the x H a decreasing strongly above the boundary layer for PW F less than 10 mm. Before applying the TES operator, we can gauge how more complicated categorizations might yield a better mapping from model conditions to retrieval quality and the most suitable averaging kernels. Of interest is the degree to which different categorizations separate high from poor quality retrievals, and for the high quality retrievals, the degree to which p D is separated. This is analogous to the correlations in Table 1, but for a set of discretized predictor variables.
For each categorization, the separation between high and poor quality retrievals was measured by the mean absolute difference between each category's quality and the overall mean quality. In computing the mean difference, each categorical quality is weighted by the number of observations, so that low-quality categories with few observations are not overrepresented. For the C categorization, this value is 18.4 %: the mean of the absolute differences between the entries in Table 4 and the overall mean of 68 %, with each category's absolute difference weighted by its frequency of occurrence in Table 3. Figure 8 shows this value for each of the twelve categorizations. Most of the separation in retrieval quality can be obtained using only the simple "C" categorization, with smaller contributions from other variables. This is consistent with the strong pattern correlation between retrieval quality and cloud fraction in Table 1. The strongest additional gains are made by including T S in the categorization (CT), consistent with its association with retrieval quality over land. Despite the importance of cloud properties in separating good retrievals from bad, little was gained by using the "C fine" categorization, which is likely due to the larger error in the cloud properties compared to other categorical variables. Averaging kernel separation was measured by the total root-mean-square error (RMSE) of p D at 619 hPa across all categories in a categorization. Only high quality retrievals were considered in calculating the p D RMSE, for consistency with any analysis of the retrieved HDO fields. The p D RMSE can be thought of as the total within-category standard deviation of p D across all categories, weighted by frequency of occurrence. We are interested in the degree to which the total within-category variance of p D decreases for increasingly complicated categorizations, or how the error bar widths tend to decrease across all categories within a categorization.
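The two separation measures described above can be sketched as follows. This is a minimal illustration rather than the code used for Figs. 8 and 9: the function names are hypothetical, and the inputs (per-category quality and counts, and per-retrieval p D with a category label) are assumed to have been extracted from the TES retrievals beforehand.

```python
import numpy as np

def quality_separation(cat_quality, cat_count):
    """Frequency-weighted mean absolute difference between each
    category's retrieval quality and the overall mean quality."""
    q = np.asarray(cat_quality, dtype=float)
    n = np.asarray(cat_count, dtype=float)
    w = n / n.sum()                     # frequency of occurrence
    overall = np.sum(w * q)             # overall mean quality
    return np.sum(w * np.abs(q - overall))

def within_category_rmse(p_d, category):
    """Pooled RMSE of p_D about each category's own mean, so that
    well-populated categories contribute in proportion to their size."""
    p_d = np.asarray(p_d, dtype=float)
    category = np.asarray(category)
    sq_err = 0.0
    for c in np.unique(category):
        vals = p_d[category == c]
        sq_err += np.sum((vals - vals.mean()) ** 2)
    return np.sqrt(sq_err / p_d.size)
```

A categorization that separates p D well gives a smaller pooled RMSE than placing all retrievals in a single category, which is the comparison made against the "Single" categorization.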
A decrease in the p D RMSE would result in a better mapping between model conditions and averaging kernel shape. Figure 9 shows the total p D RMSE for the thirteen different categorizations. Precipitable water plays a more important role in separating p D than in separating retrieval quality. The PW categorization, for example, contributes to greater p D separation than the C categorization, despite having fewer categories. There is a further decrease for the CPW categorization, and also for the CTPW categorization. The LOτ TPW F categorization appears to strike a balance between minimizing the RMSE and using relatively few categories, with further, slight decreases for the CTPW and full LOCTPW categorizations.
From Figs. 8 and 9, all of clouds, precipitable water and surface temperature are important, which we would expect from Table 1. The cloud categories are important on their own in separating high from poor quality TES retrievals, and precipitable water provides most separation of p D . There are diminishing returns, however, as the size of the categorization increases. It is not immediately clear whether more complicated categorizations yield relationships closer to those in Table 1 or different δD fields after applying the TES operator.

Categorical controls on TES HDO retrieval quality and p D
The categorical operator was tested in ModelE with four representative categorizations: C, PW, LOτ TPW F and LOCTPW. In each case, the underlying model configuration was the same as in the case of applying the retrieval-based TES operator, but the quality filtering and the averaging kernel and H 2 O prior selection from individual TES measurements were replaced with categorical selection. Figure 10 shows the approximated retrieval quality for the four categorizations. For the C categorization (Fig. 10a), the approximated retrieval quality bears some resemblance to the observed retrieval quality (Fig. 3a), but is 10 % lower over the ocean and without the sharp decrease in retrieval quality over the southern Sahara. Over the Pacific and Atlantic sectors, the regions of high retrieval quality are to the east of those in the observations. The PW categorization (Fig. 10b) results in a mean ocean retrieval quality of 68.9 %, nearly identical to the TES observations, but lacks the distinction between wet and dry regions seen in the observations and for the C categorization. The approximated retrieval qualities of the LOτ TPW F and LOCTPW categorizations (Fig. 10c, d) are similar over the ocean, with the LOCTPW categorization having a sharper decrease over the southern Sahara.
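Categorical selection amounts to a table lookup keyed on model conditions. The sketch below illustrates the idea for a two-variable categorization only. The bin edges, table layout and function name are assumptions for illustration (the τ edges of 0.3 and 1.3 and the PW F edge of 10 mm echo values quoted in the text, but the full category definitions are given elsewhere), and how the returned category quality is turned into a keep-or-discard decision is left to the caller.

```python
import numpy as np

# Illustrative bin edges only; not the full set of category boundaries.
TAU_EDGES = np.array([0.3, 1.3])   # cloud optical depth classes
PWF_EDGES = np.array([10.0])       # free-troposphere precipitable water, mm

def categorical_lookup(tau, pw_f, quality_table, kernel_table):
    """Return the mean retrieval quality and the mean averaging kernel row
    of the category matching the model's tau and PW_F.

    quality_table has shape (n_tau, n_pwf); kernel_table has shape
    (n_tau, n_pwf, n_levels). Both are assumed precomputed from TES data.
    """
    i = int(np.digitize(tau, TAU_EDGES))
    j = int(np.digitize(pw_f, PWF_EDGES))
    return quality_table[i, j], kernel_table[i, j]
```

Because the lookup depends only on model fields, it can be applied at every model grid point and time step, not only at collocated TES measurement points.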
While instructive to see the sensitivity of the retrieval quality to the different categorizations, their performance should, strictly speaking, be evaluated according to how well they approximate the observed relationships in Fig. 4, rather than by their agreement with the observations in Fig. 3a. These relationships are shown for the four categorizations in Fig. 11. The C categorization (Fig. 11a) results in a slightly stronger relationship (r = −0.78) between the cloud fraction and the approximated retrieval quality than in the observations. This would be expected given that clouds are the only categorical variable used to select quality; in the absence of other, real, complicating factors, the approximated relationship is slightly too strong compared to the observed relationship in Fig. 4a. Furthermore, over the ocean, the lower approximated retrieval quality of 58.9 % is the result of the higher modeled CF (47.8 %) compared to the TES observations (35.3 %).
Conversely, the PW categorization results in a weaker relationship between CF and retrieval quality (Fig. 11b). In this case, cloud fraction acts as a lurking variable in the categorization. CF is somewhat correlated with PW B (0.48) and PW F (0.67), but not strongly enough to accurately predict retrieval quality when excluded from the categorization. This case reinforces the need to evaluate the categorical operator based on agreement in the relationships, rather than in the retrieval quality fields. Over the ocean, it is tempting to infer that the PW categorization is more accurate because of its agreement in the mean (Fig. 10b) with retrieval quality. This agreement is misleading, however; by not including clouds explicitly in the categorization, the approximated retrieval quality does not decrease under the higher modeled cloud fraction, which it should. The relationships are in better agreement, neither too strong nor too weak, for the LOτ TPW F categorization (Fig. 11c), and to some extent the LOCTPW categorization (Fig. 11d).
Atmos. Chem. Phys., 12, 10485-10504
Over land, the C and PW categorizations (Fig. 11e, f) performed poorly in capturing the variation in retrieval quality. When T S is not included in the categorization, there is too little covariation between T S and any of CF, PW B or PW F to capture the decrease in retrieval quality with T S . More realistic approximations were obtained for the LOτ TPW F and LOCTPW categorizations (Fig. 11g, h), which include T S and land/ocean separation, although there is still less agreement than over the ocean.
The approximated p D for the five categorizations is shown in Fig. 12. The approximated p D for the C categorization (Fig. 12a) shows little of the variation seen in the TES observations (Fig. 3b), with little increase in p D over the Pacific and Atlantic subtropical anticyclones. The PW categorization (Fig. 12b) does capture this increase, but not the lower p D over the tropical rain belts, and with a smoother structure owing to the smoothness of the quality filtering. The approximated p D for the LOτ TPW F and LOCTPW categorizations (Fig. 12c, d) were comparably close to the TES p D fields over the ocean and land.
Figure 13 shows the approximated controls on p D . As in the observed relationships in Fig. 4b and Fig. 4d, PW F and T S include only model points classified as having high retrieval quality. The weak slope of the C categorization over the ocean (Fig. 13a) reflects the absence of variation in p D in Fig. 12a. The slope for the PW categorization (Fig. 13b) is closer to the observed slope, but with an overly strong correlation, too little scatter, and with unrealistically high p D overall. Similar to retrieval quality, the control on p D is more realistic when both clouds and precipitable water are included (Fig. 13c, d). The inclusion of clouds in the categorization helps to separate high PW F for clear and cloudy sky, allowing the clear sky values with higher quality to be included. The full LOCTPW categorization has a more realistic amount of scatter, but both that and the LOτ TPW F categorization have a steeper slope and higher correlation than in the observations. The retrieval-based operator in Fig. 5b, by contrast, had a too-flat slope and weak correlation. Over land, the approximated T S control on p D was of the opposite sign for the C and PW categorizations (Fig. 13e, f), and best approximated by the full LOCTPW categorization (Fig. 13h).
Fig. 9. RMSE of p D (height of peak HDO sensitivity) at 619 hPa, in hPa, for the twelve categorizations, and the "Single" categorization. Numbers in parentheses indicate the total number of categories within each categorization.
Overall, the LOτ TPW F and LOCTPW categorizations performed best in approximating controls on retrieval quality and p D . Both were equally deficient in not having a strong enough decrease in retrieval quality with T S over land, and an overly strong increase in p D with PW F over the ocean. These are likely the greatest source of selection error in applying the categorical TES operator to raw model δD fields.

Comparison of retrieval-based and categorical TES operators
Ultimately, we are interested in the effects of applying the different TES operators to raw ModelE δD fields. Figure 14 shows this effect for the retrieval-based TES operator over the whole analysis period. Again, the retrieval-based operator has been applied regardless of agreement between the retrieved and modeled values of CF, PW F and T S . The effect of sampling along the orbital path can be seen by the less smooth field of Fig. 14b compared to Fig. 14a. Application of Eq. (1) to the raw model fields after quality filtering results in an average δD increase of 8.8 ‰ over ocean and 6.4 ‰ over land (Fig. 14c), but this reflects larger regional changes. In general, the largest absolute changes occur where there is the largest difference between the raw model field and the prior δD over 825 hPa to 510 hPa, which is roughly −150 ‰ when vertically weighted by specific humidity. Over northern Africa, the high model δD decreases toward the prior by up to 40 ‰, whereas over South America and the Maritime Continent the low δD increases toward the prior by up to 35 ‰.
Figure 15 shows the result of applying the different categorical TES operators. The changes in δD are similar to the retrieval-based operator in that regions of low raw ModelE δD tend to increase toward the TES prior, but there are significant regional differences for the C and PW categorizations. Using the C categorization (Fig. 15a), there is a strong decrease in δD over the anticyclones in the Pacific and Atlantic, despite the raw ModelE δD not being particularly high. This is due to the effect of not including PW in the categorization and consequently not capturing the variation in p D . Using only clouds in the categorization, these regions are simply classified as having low CF, and will be associated with averaging kernel shapes similar to those in Fig. 6a. This averaging kernel is inappropriate, however, as it does not capture the higher p D associated with PW F less than 10 mm (Fig. 7c), which occurs in those regions. As a result, the mid-tropospheric δD composition, which is low, has an overly strong influence in applying Eq. (1), resulting in an overly strong δD decrease. Using the PW categorization (Fig. 15b), this problem is absent, but there is a weaker increase in δD over the western Pacific warm pool. The more complex categorizations result in similar changes to the δD field (Fig. 15c, d), not varying by more than 1 ‰ in their overall mean and with only small regional differences. With a sufficient CF control on retrieval quality and PW F control on p D , the deficiencies over the ocean for the C and PW categorizations are absent for each.
The ModelE δD changes for the categorical operators result from approximating the controls on retrieval quality and p D using conditions in the model, rather than from collocated TES retrievals. They are accurate to the extent that the approximated controls in Fig. 11 and Fig. 13 agree with the observational controls in Fig. 4. Focusing on the full LOCTPW categorization, the most significant deficiency was the PW F control on p D over the ocean (Fig. 13d), where the approximated slope was −1.6 hPa mm −1 too strong compared to observations. We can see, however, that while the slope for the LOτ TPW F categorization was only −1.2 hPa mm −1 too strong, this translated into less than a 1 ‰ difference in the mean change in δD over the ocean from the LOCTPW categorization (Fig. 15c, d). This suggests that if a categorization existed that more closely approximated the observed PW F control on p D , it would not likely result in a change of more than several ‰ to the transformed δD field, ignoring the contributions of other secondary controls. This provides a sense of the maximum error in the transformed δD field associated with errors in quality filtering and averaging kernel selection. We note also that in this case, the changes in δD for the retrieval-based and LOCTPW categorical operators were very similar, owing to the agreement in the underlying PW F fields, and because of the shared HDO prior and raw model δD fields.

Sensitivity tests
To further understand how the change in δD might vary with different configurations, we examined the sensitivity of the LOCTPW-based operator to the effects of orbital sampling, a fixed H 2 O prior x H a , and also the performance outside of the tropics.
Sampling the model at all points, rather than only along the TES orbital path, primarily produced a smoother transformed field (Fig. 16a) than with orbital sampling (Fig. 15e), owing to the much greater sampling frequency. Aghedo et al. (2011) found that the effects of orbital path sampling were also minimal on modeled CO, O 3 , temperature and H 2 O at a monthly scale. Voulgarakis et al. (2011) reached a similar conclusion regarding the correlation between daily O 3 and CO. The TES sampling frequency is therefore sufficient to capture variability in the model over several years, although it remains to be seen whether this is the case at shorter time scales.
Unique to the joint TES HDO/H 2 O retrievals is the use of a changing H 2 O prior x H a . It must also be chosen in applying the TES operator, representing another potential source of categorical selection error. We assume that the quality of averaging kernel selection for the A HH , A DH and A HD operators for different categorizations follows that of A DD . As a test of the importance of x H a selection on the TES operator in Eq. (1), we fixed x H a to the constant profile of the "Single" categorization, but with the averaging kernels still chosen from the LOCTPW categorization. This had little effect (Fig. 16b), which likely means that the A HH and A DH terms are typically very similar (as was the case for the example profile in Fig. 2), and that the strength of the TES operator is largely controlled by the second term on the RHS of Eq. (1).
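The role of the individual kernel terms can be illustrated with a short sketch. Eq. (1) is not reproduced in this section, so the form below is an assumption: it follows the expression commonly used for joint HDO/H 2 O retrievals, in which the retrieved log ratio is the prior log ratio plus an HDO term weighted by (A DD − A HD) minus an H 2 O term weighted by (A HH − A DH). The function and argument names are hypothetical, and all profiles are taken to be natural logarithms of volume mixing ratio on a common set of levels.

```python
import numpy as np

def tes_operator(x_d, x_h, x_d_prior, x_h_prior, a_dd, a_hd, a_hh, a_dh):
    """Assumed form of the joint operator: log(HDO/H2O) as the instrument
    would see it, given model log-profiles x_d (HDO) and x_h (H2O)."""
    prior_ratio = x_d_prior - x_h_prior              # first term: prior
    hdo_term = (a_dd - a_hd) @ (x_d - x_d_prior)     # second term
    h2o_term = (a_hh - a_dh) @ (x_h - x_h_prior)     # third term
    return prior_ratio + hdo_term - h2o_term
```

Under this assumed form, when A HH and A DH are equal the third term vanishes and the departure from the prior is set entirely by the second term.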
The focus of future comparisons between the modeled and observed δD fields will be over the tropics, following a series of recent studies (Kurita et al., 2011; Berkelhammer et al., 2012; Kim et al., 2012). For broader potential application, however, we tested the performance of the TES HDO simulator outside of the tropical domain. The LOCTPW categorization was re-calculated from TES measurements over 60° S to 60° N. The range of the surface temperature categories was increased from 260 K to 330 K to capture a wider observed temperature range. Model simulations were run with the TES operators applied over 60° S to 60° N. To assess performance outside of the tropics, we examine the degree to which observed variation in relationship strength by latitude is captured by the categorical TES operator. Figure 17 shows the correlation between retrieval quality and p D and the primary control variables at different latitudes. Observed retrieval quality over the oceans (Fig. 17a) remains negatively correlated with CF, weakening slightly at high northern latitudes. The retrieval-based operator performs poorly in capturing this association, but the categorical operator performs well. Over land (Fig. 17c), the observed negative correlation between retrieval quality and T S becomes positive at high latitudes, presumably due to the covariation moving poleward between T S and atmospheric moisture content. This change is captured by both operators, but too sharply in the case of the categorical operator.
The associations between p D and the primary control variables are not generally well captured over the wider latitude range. Over the ocean (Fig. 17b), the overly strong negative correlation between p D and PW F over the tropics compared to observations (in Table 1) increases moving poleward. The observed decrease in correlation outside of the tropics is captured to some degree by the categorical operator, but with a lag, and there is no modeled rebound in correlation at high latitudes. Over land, there is an observed positive relationship between T S and p D across all latitudes (Fig. 2d). This is poorly captured by the categorical operator, for which there is no correlation between 40° S and 0°. In fact, over land, when extratropical TES measurements are included in calculating the categorization, the performance of the operator is degraded in the tropics. When the categorization is calculated only from TES measurements between 15° S and 15° N, the correlation between p D and T S of 0.64 is in good agreement with the observed correlation of 0.51. When the operator is based on measurements between 60° S and 60° N, however, the correlation over 15° S to 15° N is 0. So not only is prediction of p D in the extratropics poor, but it contaminates the fairly good performance over the tropical land shown in Fig. 13j. Application of the categorical TES operator outside of the tropics will likely require that latitude-specific categorizations be computed from the TES retrievals, and possibly that other control variables be considered.
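The latitude-dependent comparison above rests on computing a correlation separately within latitude bands. A minimal sketch of that calculation is given below; the function name, the band edges and the minimum of three points per band are illustrative assumptions, not the settings used for Fig. 17.

```python
import numpy as np

def correlation_by_band(lat, x, y, band_edges):
    """Pearson correlation between x and y within each latitude band
    [edge_k, edge_k+1). Bands with too few points or no variance give NaN."""
    lat, x, y = (np.asarray(a, dtype=float) for a in (lat, x, y))
    out = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        m = (lat >= lo) & (lat < hi)
        if m.sum() < 3 or np.std(x[m]) == 0 or np.std(y[m]) == 0:
            out.append(np.nan)
        else:
            out.append(np.corrcoef(x[m], y[m])[0, 1])
    return np.array(out)
```

Applying the same function to the observed fields and to each operator's approximated fields, with identical band edges, allows the latitudinal structure of the relationships to be compared directly.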

Comparison with TES δD
Comparisons between the TES and ModelE δD are shown in Fig. 18. The raw ModelE δD is on average 17 ‰ lower than TES over the ocean and 41 ‰ lower than TES over land, but with negative biases of up to 63 ‰ and 96 ‰ over each, respectively (Fig. 18b). The negative bias over the ocean occurs over the tropical rain bands and in the dry regions off of the west South American and central African coasts. In the latter cases, the bias likely results from outflow of strongly depleted vapor due to continental convection.
The negative bias over the ocean is reduced to ∼ 7 ‰ after applying either the retrieval-based (Fig. 18c) or categorical (Fig. 18d) TES operators, and more weakly reduced to ∼ 35 ‰ over land. The changes in bias over the ocean are interpreted as follows. Where there is heavy, precipitating cloud, observed retrieval quality is lower (Fig. 3a). Because precipitation tends to lower vapor δD (e.g. Lee and Fung, 2008), this introduces an observational bias toward higher δD through the exclusion of retrievals under cloudy and lower δD conditions, and relaxation toward a prior constraint with higher δD. By applying the TES operator, these effects are captured (Figs. 14c, 15e) leading to the more accurate comparisons in Fig. 18c, d. It also becomes more apparent that the model bias toward lower δD is specific to a model process over land. It was beyond the scope of this paper to understand these biases, but immediate candidates that will be investigated in the future are too-strong continental convection and too-weak transpiration.

Discussion
Changes to the raw model δD over the tropics from applying the TES operators were large. Over the ocean, the mean increase in modeled δD from applying the TES operator was 9 ‰, and was up to 30 ‰ over regions with low, raw δD such as the west Pacific warm pool. Over land, there was a mean increase of 6 ‰, but with increases of up to 30 ‰, and decreases of up to 40 ‰ over northeastern Africa where raw δD is very high.
To put these changes in context, they are of the same order as the δD model biases in previous comparisons against the TES δD retrievals. Yoshimura et al. (2011) saw a systematic bias of −20 ‰ in the IsoGSM model over the same vertical layer. Risi et al. (2012) saw a bias of 30 ‰ in their comparison of LMDz at 619 hPa. That the regional differences to the raw ModelE δD fields resulting from the TES operator are of the same magnitude confirms its importance in any quantitative comparison between the model and satellite measurements. Similarly, Aghedo et al. (2011) determined that the error associated with not applying the TES retrieval operator to retrieved CO, O 3 , temperature, and particularly H 2 O, was much larger than the error associated with monthly averaging or the absence of orbital sampling.
The changes in δD for the cloud-only categorization were unrealistic owing to poor p D approximation. For this nudged simulation, the new δD fields for the retrieval-based and full LOCTPW categorical operators were in good agreement because of the similarity of their PW F and T S fields and because of accurate mapping of these quantities to a suitable averaging kernel. The LOτ TPW F categorization generally performed well through its inclusion of the most important controls on retrieval quality and p D , and has the advantage of having far fewer categories, but the influence of T S on p D over land was too strong. The accuracy of the modeled PW F field is likely the result of the nudged, large-scale control on the humidity field and averaging over four years. It is doubtful that this agreement will hold for free-running simulations with strongly perturbed physics or over shorter time scales, in which case the categorical operator would be more appropriate.
Particularly for retrieval quality, the categorical operator performed poorly over land compared to the ocean. One factor is simply that estimates of categorical retrieval quality and averaging kernel structure over land will be less robust because there are fewer TES measurements. More importantly, there are likely additional factors influencing retrieval quality and averaging kernel structure over land that we have not considered. For p D in particular, the observational controls over land were weaker (Table 1), making their approximation in the simulator more difficult. As the categorical operator evolves, we will start by testing topography, land cover type, and, related to both, thermal contrast between ground and air, which will be greater over land than ocean. In the latter case, the apparently worse performance over land could be because we considered daytime retrievals only.
Further refinements will be required to use the categorical operator outside of the tropics. Over the oceans, more PW B and PW F categories will be required at the low ends of their scales, assuming that vertical moisture gradients continue to be the dominant control on p D outside of the tropics. Any improvements that are obtained over land in the tropics should improve performance in the extratropics, particularly in the northern hemisphere. We hope to avoid computing the categorizations separately for different latitude bands, but this might be inevitable.
Isotopic constraints provide a new way of assessing GCM simulations of processes which are highly sensitive to perturbed cloud physics, such as those driving the Madden-Julian Oscillation (MJO). Berkelhammer et al. (2012) separated the contributions of evaporative and convergent moisture phases during different phases of the MJO. Kim et al. (2012) showed how the absence of an MJO in the default AR5 version of ModelE could be rectified by increasing the entrainment and reevaporation strength in the convective parameterization, but at the expense of the mean state of precipitation. It would be instructive to compare the isotopic response of these changes to TES HDO retrievals, given the sensitivity of isotopic composition to these types of processes (Lee et al., 2009; Field et al., 2010). The categorical TES operator provides a means of doing this for arbitrary convective configurations.
In comparisons between retrieved and simulated HDO for other models, regardless of which operator approach is taken, we suggest looking at the agreement between retrieved and modeled CF, PW F and T S . This will give a sense of how appropriate the retrieval quality filtering and averaging kernel selection are for the modeled meteorology, particularly as observational constraints are weakened with free-running perturbed physics experiments. It remains to be seen how the categorical approach performs for free-running model simulations or for other isotopically-equipped AGCMs. The modeled retrieval quality and p D fields (i.e. in Figs. 10, 12) will change to the extent that the underlying control fields change, or rather, to the extent that the covariation between the control variables changes. One potential weakness is that a new model configuration may have an increase in the frequency of conditions corresponding to categories that were not well populated by TES measurements and for which the retrieval quality and mean averaging kernels are less robust (although the opposite could also be true). This type of evaluation could also be extended to other species, such as O 3 and CO, after identifying the strongest controls on their retrieval quality and averaging kernel structure, as could the categorical TES operator for use in non-nudged composition-climate model evaluation. We note that cloud cover and surface temperature will likely play an important role for most species, but the importance of atmospheric moisture content is likely specific to HDO.
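One simple diagnostic for the weakness noted above is the fraction of model points that fall into categories sparsely populated by TES measurements. The sketch below assumes each model point has already been assigned an integer category index and that the TES measurement count per category is available; the function name and the `min_count` threshold are hypothetical and would need to be chosen for the application.

```python
import numpy as np

def sparse_category_fraction(model_cats, tes_counts, min_count):
    """Fraction of model points whose category has fewer than min_count
    TES measurements behind its mean quality and averaging kernel."""
    model_cats = np.asarray(model_cats, dtype=int)
    tes_counts = np.asarray(tes_counts)
    is_sparse = tes_counts[model_cats] < min_count
    return float(is_sparse.mean())
```

A marked rise in this fraction from one model configuration to another would indicate that the categorical operator is relying more heavily on its least robust categories.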