Influence of ENSO on entry stratospheric water vapor in coupled chemistry-ocean CCMI and CMIP6 models


There are six CMIP6 integrations, each 164 years long. The CCMI integrations are between 138 and 140 years each (depending on the precise end date), and for NIWA and the NCAR models there are multiple ensemble members. In total, over 2700 simulated years of model output are available.
We have added more details to Table 1 to clarify this.
• P4L10 "model output is compared to water vapor in the ERA5 reanalysis": I agree that ERA5 has the best quality of stratospheric water vapor compared with other reanalyses...however, it is not observed water vapor... Furthermore, water vapor observed in the stratosphere (e.g. MLS) is not used in the assimilation procedure of ERA5...Typically, all other reanalyses (ERA-Interim, JRA-55) have stratospheric water vapor that is not good enough for any scientific interpretation. Davis et al., ACP, 2019 write: "...because of the known deficiencies in the representation of stratospheric transport in reanalyses, the stratospheric water vapor products from the current generation of reanalyses should generally not be used in scientific studies." The improvement of ERA5 (as documented in Wang et al., 2020) is probably a consequence of a better transport scheme...You should mention all these points. ERA5 H2O in the stratosphere is not the result of assimilated H2O observations but of transported H2O (+ contribution from methane oxidation). In particular, there is not enough validation for the period 1979-1995, which was strongly influenced by volcanic eruptions (El Chichon and Pinatubo).
As discussed earlier, we now use measurements of water vapor from the SWOOSH data set.
• P4L17 A few more details on the multiple linear regression of the QBO signal would be desirable. Typically two orthogonal components (zonal winds at 30 and 50 hPa) are used...
We agree that if water vapor higher up is considered, a second QBO index is needed, or a different lag needs to be chosen. Nevertheless, for water vapor at the cold point, adding a second QBO index has little effect, as the QBO at 50hPa with a 1-2 month lag maximizes the effect. This effect is shown explicitly in Tao et al. 2019, now cited. This point has now been clarified.
• P5L21-25 It is not clear what the advantage is of showing the results at 80 and 90 hPa...I would expect to see the effect of slow upward propagation of the signal (like a tape recorder), so the signal at 80hPa should be slightly later than at 90 hPa (this can be seen in satellite observations)...but this slow propagation is typically not well reproduced by the models and reanalyses (which also use transport models to describe stratospheric H2O)...

The abbreviation is now defined earlier.
• P7L24-29 This is my strongest criticism: you consider ERA5 stratospheric H2O as "observation" (see L28). This is certainly not the case (see above). I would agree that temperatures are more like "assimilated observations", but this is certainly not the case for stratospheric H2O in ERA5. I think you should reformulate all these sentences and include a paragraph about stratospheric H2O in ERA5....Another point: are you using ERA5.0, which was shortly replaced by ERA5.1...it was recognized that temperatures (i.e. cold point tropopause) have a systematic bias for the period 2000-2009...this point should also be clarified.
See above with regard to water vapor. We use ERA5.1 for cold point temperatures; this is now clarified.
• P9L5 "ice lofting": you correctly mention that this process is not included in (all) models. This process is also not included in ERA5 stratospheric H2O, which you consider as the "observation data"...in the end you compare different models with ERA5 H2O that is also derived from the "internal" chemistry-transport model in ERA5 (although the cold point temperatures may be of good quality).

Reviewer #2
This study compares 12 models from the CCMI and CMIP6 projects with reanalysis observations to evaluate these models' skill in simulating ENSO's impacts on stratospheric water vapor variability at 90hPa in the tropics. One key metric (asymmetry of the ENSO-water vapor relationship) is used in this work to assess each model's performance. It appears much effort has been put into this work and I don't see any serious problems in the analyses. In my view, the authors use the right tools (resampling, composites) and their analyses are basically sound. However, considering that the authors seek to publish this study in a scientific journal, I expect to read more discussion to understand why some models are better than others and what the possible causes of the failures and successes displayed in this paper could be. The authors only briefly discuss this issue in section 5 (the final paragraph). They attribute the limitations of some "bad" models to weak interannual variability of water vapor around the troposphere and a lack of some key processes determining variability of cold point temperatures. I feel this very limited discussion is not sufficient and more in-depth thought is needed to improve the presentation and reasoning in the paper. So I consider that some minor revisions are required before accepting this article for publication.
We thank reviewer #2 for the positive and constructive comments.
To me, the main finding of the paper is that all models capture LN's impacts on water vapor variability in winter well. I feel that the paper could benefit from more discussion of why all models perform better on this aspect, rather than just listing the models with better performance.
To improve the discussion and the interpretation of the results, we have added a figure to the revised manuscript comparing the cold point tropopause temperature in all models to ERA5.1. While the cold point temperature can account for the bias in some models, it does not for others. Specifically, MRI simulates more water vapor variability in CMIP6 than in CCMI even though its cold point has cooled. The new figure is copied below.
In Fig. 4, there is a 2-month lag between T and water vapor, and two models show very different patterns from the others. I am wondering whether these differences are sensitive to the selection of the time lag. With different time lags, could we observe an improved performance in these two models?
We have recreated this figure using a 1 and 3 month lag. Results are very similar (which is not surprising at least to us considering the high autocorrelation of sea surface temperature anomalies). There truly is a wide discrepancy among the models.
The climatological mean state of the vertical temperature profile in the tropics in models may play a key role in determining a model's performance in replicating the ENSO-water vapor linkage. Here the authors mainly examine anomalies away from the mean state. I suggest that the authors pay some attention to the mean state of cold point temperatures to examine whether biases in the mean state could translate into models' failures to reflect the ENSO-water vapor connection.
We thank the referee for the suggestion. We have added a figure showing the climatological cold point in these models (see above). Most models have a warm bias (evident in CMIP5 too), and for some models the bias in variability can be explained by the bias in the cold point climatology. This figure will be included in the revised manuscript.
In my view, the selection of 15S to 15N in Fig. 3 needs to be justified. In addition, it would be better to show latitude-vertical transects of zonal mean temperature anomalies (or an average across some longitudes in the Pacific) to provide a 3-D picture of LN- and EN-related tropical temperature responses in the 6 models.
As discussed as far back as Gettelman et al. 2001 and also in detail in Garfinkel et al. 2013 (both cited in the paper), ENSO leads to opposite-signed temperature anomalies in the Central and West Pacific, and the cold point moves east for El Niño and west for La Niña. Hence a zonal mean picture, or averaging over a fixed longitude range, isn't enlightening. Table 2 in the present study makes this point as well.
We have created Figure 3 for 10N-10S and also 20N-20S, and results are essentially identical.
In the revised submission we will include lat vs. lon maps of temperature anomalies at 90hPa (for CCMI) and 100hPa (for CMIP6) for these models in the supplemental material.

Reviewer #3, Mohamadou Diallo
Major points: 1. Among the climate models used here, some have an interactive QBO (WACCM, HadGEM, . . .) and others have a nudged QBO. This will lead to different modulations of the water vapor entry (tape recorder). Therefore, it is important to remove the QBO signal adequately in order to properly attribute the remaining variability to ENSO. However, this does not seem to be the case here, or at least the description is not clear. Therefore, I have these questions: a. Is the QBO proxy used in the MLR for all analyses calculated using each model's winds for the REF-C1 simulations? b. Is the QBO proxy used in the MLR for all analyses taken from observations (Berlin QBO or NASA)? c. Is the QBO proxy used in the MLR for all analyses a combination of both, a) for models that autogenerate the QBO and b) for nudged models? If you have used method a) or b), the results will be seriously questionable, because the timing of the QBO modulation in models that autogenerate the QBO differs from the observed QBO signal, leading to biased results.
We use the QBO at 50hPa from each data source individually. This has been clarified in the revised version of the manuscript.
Fig. 1, where the non-linear response of water vapor induced by ENSO is claimed: it would be very interesting to see the QBO-nudged models (like EMAC) regarding this non-linearity. Does ERA5 also show this non-linearity in Fig. 1? The QBO contributes the most to the entry of water vapor anomalies via its modulation of the cold point temperature (Diallo et al., 2018; Tao et al., 2019); therefore, it is important to handle it properly, which seems not to be the case here.
Figure 4 of Garfinkel et al. 2018 shows the nonlinearity in SWOOSH water vapor, and a similar effect is evident in ERA5, though the limited observational record means that the effect is not robust and could be due to sampling variability. Following the comments of reviewer #1, we now use SWOOSH instead of ERA5 in the present manuscript, and include SWOOSH in Figure 1. Despite the observational nonlinearity of the water vapor response to the ENSO signal, most models don't show this nonlinearity, as discussed in the text, though the NCAR models do.

Regarding the
3. It would also be very interesting to show the tape recorder of each climate model simulation compared to ERA5. According to Hardiman et al. 2017, Figure 8, the HadGEM REF-C1 simulation compares very poorly with the SWOOSH observations; therefore, it would be interesting to see how these models perform in terms of the tape recorder.
While this would indeed be interesting, it is beyond the scope of this paper to consider water vapor variability in these models above the cold point. We have added to the discussion section that future work is needed to understand the impact of these changes in the lowermost stratosphere on changes in water vapor higher up.

added
Correspondence to: Chaim I. Garfinkel (chaim.garfinkel@mail.huji.ac.il)
Abstract. The connection between the dominant mode of interannual variability in the tropical troposphere, El Niño Southern Oscillation (ENSO), and entry of stratospheric water vapor is analyzed in a set of model simulations archived for the Chemistry-Climate Model Initiative (CCMI) project and for phase 6 of the Coupled Model Intercomparison Project (CMIP6). While the models agree on the temperature response to ENSO in the tropical troposphere and lower stratosphere, and also agree on the zonal structure of the response in the tropical tropopause layer, the only aspect of the entry water vapor response with consensus is that La Niña leads to moistening in winter relative to neutral ENSO. For El Niño and for other seasons there are significant differences among the models. For example, some models find that the enhanced water vapor for La Niña in the winter of the event reverses in spring and summer, other models find that this moistening persists, and still others show a nonlinear response, with both El Niño and La Niña leading to enhanced water vapor in winter, spring, and summer. A moistening in the spring following El Niño events, perhaps the strongest signal in observations, is simulated by only half of the models. Focusing on Central Pacific ENSO versus East Pacific ENSO, or on temperatures in the mid-troposphere as compared to temperatures near the surface, does not narrow the inter-model discrepancies. Despite this diversity in response, the temperature response near the cold point can explain the response of water vapor when each model is considered separately.
While the observational record is too short to fully constrain the response to ENSO, it is clear that most models suffer from biases in the magnitude of interannual variability of entry water vapor. This bias could be due to biased cold point temperatures in some models, but others appear to be missing forcing processes that contribute to observed variability near the cold point.

Introduction
Water vapor (WV) is the most important greenhouse gas in the atmosphere, and the feedback associated with stratospheric water vapor in response to increasing anthropogenic greenhouse gas emissions is around half of that for global mean surface albedo or cloud feedbacks (Forster and Shine, 1999; Solomon et al., 2010; Banerjee et al., 2019; Li and Newman, 2020). The amount of water vapor entering the stratosphere also regulates the severity of ozone depletion (Solomon et al., 1986) and is important for other aspects of stratospheric chemistry (Dvortsov and Solomon, 2001). Hence, it is important to understand how well the comprehensive models used for, e.g., future ozone and climate projections capture the processes regulating entry of stratospheric water vapor.
Lower stratospheric water vapor concentrations are mainly determined by the tropical temperatures near the cold point, where dehydration takes place as air parcels transit into the stratosphere (Mote et al., 1996; Zhou et al., 2001, 2004; Fueglistaler and Haynes, 2005b; Fueglistaler et al., 2009; Randel and Park, 2019). Several different processes have been shown to influence these cold point temperatures, and the goal of this work is to revisit the influence of one of these processes, El Niño Southern Oscillation (ENSO), on entry water vapor in the lower stratosphere.
El Niño (EN), the ENSO phase with anomalously warm sea surface temperatures in the tropical East Pacific, leads to a warmer tropical troposphere and cooler tropical lower stratosphere (Free and Seidel, 2009; Calvo et al., 2010; Simpson et al., 2011), with the zero-crossing in the vicinity of the cold point (Hardiman et al., 2007). In addition, EN leads to a zonal dipole in temperature anomalies near the tropopause, and in particular to a Rossby wave response with anomalously warm temperatures over the Indo-Pacific warm pool and anomalously cold temperatures over the Central Pacific (Yulaeva and Wallace, 1994; Randel et al., 2000; Zhou et al., 2001; Scherllin-Pirscher et al., 2012; Domeisen et al., 2019). In the tropical tropopause layer (TTL), water vapor increases in the region with warm anomalies and decreases in the region with cold anomalies by ∼25% (Gettelman et al., 2001; Hatsushika and Yamazaki, 2003; Konopka et al., 2016).
The net effect of these zonally asymmetric and symmetric changes on water vapor above the tropical cold point is complex.
The two largest EN events in the satellite era (in 1997/1998 and in 2015/2016) were followed by moistening of the tropical lower stratosphere (Fueglistaler and Haynes, 2005a; Avery et al., 2017; Diallo et al., 2018), and the ERA-5 reanalysis, which tracks satellite water vapor well over the last few decades, also shows a clear moistening after the 1982/1983 event (Figure 3 of Wang et al., 2020). Strong La Niña (LN) events in 1998/1999 and 1999/2000 also clearly preceded elevated water vapor concentrations in the tropical lower stratosphere. The net effect of more moderate events (either LN or EN) is unclear (Gettelman et al., 2001), and there may be a nonlinear effect. Specifically, Garfinkel et al. (2018) found that both strong EN and LN events lead to elevated water vapor concentrations as compared to neutral ENSO in a chemistry-climate model, and indeed such an effect is weakly evident (though not significant) in observations (Figure 4 of Garfinkel et al., 2018). In addition, there is a strong seasonal dependence of the effect of EN on stratospheric water vapor, with the increase in water vapor for EN and decrease for LN occurring mainly in boreal spring (Calvo et al., 2010; Garfinkel et al., 2013; Konopka et al., 2016; Tao et al., 2019).
The limited length of the observational data record, and the importance of other atmospheric processes (e.g. the Quasi-Biennial Oscillation) which may interact nonlinearly with ENSO (Yuan et al., 2014), limit the confidence with which observed variability during and following ENSO events can be unambiguously attributed to ENSO. Several studies have used simulations from single models to try to understand the role of ENSO for entry stratospheric water vapor (Scaife et al., 2003; Garfinkel et al., 2013, 2018; Ding and Fu, 2018), though it is not clear whether the results generalize to other models. The goal of this study is to consider a wider range of models, with a combined model output of over 2700 years, in order to better understand the response of stratospheric water vapor to ENSO. We focus here on chemistry-climate models, as these models must simulate entry water vapor reasonably well; otherwise their stratospheric chemistry will suffer from biases.
After introducing the data and methodology in Section 2, we contrast the impact of ENSO on stratospheric water vapor in 12 different chemistry-climate models. Even though all models simulate a similar response to ENSO in the troposphere and also in the lower stratosphere (warming and cooling, respectively), there is no consensus as to the impact of ENSO on stratospheric water vapor. Some models simulate enhanced water vapor for EN in both the winter of the event and the following spring, other models find an opposite response, and still others simulate a nonlinear response with both EN and LN leading to enhanced water vapor in spring (as is evident in GEOSCCM, Garfinkel et al., 2018). In all cases the temperature response near the cold point can explain the divergent responses of water vapor to ENSO.

Data
We examine six models participating in the Chemistry-Climate Model Initiative (CCMI, Morgenstern et al., 2017) and six models participating in phase 6 of the Coupled Model Intercomparison Project (CMIP6, Eyring et al., 2016). However, the focus in most of this paper is on the CCMI models, for which data are archived at higher vertical resolution, as this allows for a more careful diagnosis of the physical processes. Coupled chemistry-climate models are expected to have more robust interannual variability of temperatures in the lower stratosphere than models with fixed ozone (Yook et al., 2020), and hence we only include CMIP6 models with interactive stratospheric chemistry. CCMI was jointly launched by Stratosphere-troposphere Processes And their Role in Climate (SPARC) and International Global Atmospheric Chemistry (IGAC) to better understand chemistry-climate interactions in the recent past and future climate (Eyring et al., 2013; Morgenstern et al., 2017). This modeling effort is an extension of CCMVal2 (SPARC-CCMVal, 2010), but utilizes up-to-date chemistry-climate models that also include tropospheric chemistry. We consider the Ref-C2 simulations, which span the period 1960-2100, impose ozone depleting substances as reported by the World Meteorological Organization (2011), and impose greenhouse gases other than ozone depleting substances as in Representative Concentration Pathway (RCP) 6.0 (Meinshausen et al., 2011). The full details of these simulations are described by Eyring et al. (2013). Note that the GEOSCCM simulations provided to CCMI did not have a coupled ocean, but Garfinkel et al. (2018) have already examined the ENSO-water vapor connection in this model in a coupled-ocean configuration. As we are interested in connections between ENSO and the stratosphere, we only consider CCMI models with a coupled ocean in which ENSO develops spontaneously.
We consider all available ensemble members. The CCMI models used in this study are listed in Table 1. Harari et al. (2019) showed that each of these models simulates surface temperature variability in the Nino3.4 region similar to that observed. All six of these models represent the Quasi-Biennial Oscillation (QBO) (Rao et al., 2020; Richter et al., 2020). In total, more than 2700 years of model output are available.
Model output is compared to model-level temperatures in the ERA-5.1 reanalysis (Hersbach et al., 2020) and to water vapor from 1993 through 2019 in version 2.6 of the SWOOSH dataset (specifically the combinedeqfillanomfill product, Davis et al., 2016). ERA-5 assimilates available satellite and GPS data in the tropical tropopause layer and has higher vertical resolution (approximately 300m in the tropical tropopause layer) than any previous reanalysis (Hersbach et al., 2020).

This study focuses on the impact of ENSO on the stratosphere on interannual timescales. In order to remove any impacts on longer timescales due to climate change, and also to remove any linear impacts from the Quasi-Biennial Oscillation, which is known to affect water vapor (Reid and Gage, 1985; Zhou et al., 2001, 2004; Fujiwara et al., 2010; Liang et al., 2011; Kawatani et al., 2014), we first use multiple linear regression (MLR) to remove the linear variability associated with greenhouse gases and the QBO from all time series (i.e., the same regression is applied to temperature and water vapor). We use historical CO2 concentrations for historical simulations and the equivalent CO2 from the RCP6.0 scenario to track future greenhouse gas concentrations (Meinshausen et al., 2011), and zonally averaged zonal winds from 5°S to 5°N at 50hPa with a 2-month lag to track the QBO. We compute the QBO index separately for each data source. Tao et al. (2019) found a maximum correlation for a 1-month lag, while we find the correlation is higher for a longer lag (not shown), though our conclusions are unchanged if we use 1 month. For consistency, this same MLR procedure is applied to CCMI, CMIP6, and ERA-5/SWOOSH data.
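The MLR step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes equal-length monthly 1-D arrays, regresses a series onto an intercept, a CO2 (or equivalent-CO2) record, and the 5°S-5°N zonal-mean zonal wind at 50hPa leading by `lag` months, and returns the residual. The function name `remove_co2_qbo` is hypothetical.

```python
import numpy as np


def remove_co2_qbo(series, co2, u50, lag=2):
    """Hypothetical helper: remove the linear CO2 and lagged-QBO signal from a series.

    series, co2, u50: equal-length monthly 1-D arrays (u50 is the 5S-5N
    zonal-mean zonal wind at 50 hPa from the same data source as `series`).
    lag: number of months by which the QBO index leads the series.
    Returns (residuals, coefficients [intercept, CO2, QBO]); the first `lag`
    residuals are NaN because no lagged QBO value exists for them.
    """
    series = np.asarray(series, dtype=float)
    co2 = np.asarray(co2, dtype=float)
    u50 = np.asarray(u50, dtype=float)
    n = series.size

    # Pair u50[t - lag] with series[t].
    qbo_lagged = np.full(n, np.nan)
    qbo_lagged[lag:] = u50[: n - lag]
    valid = ~np.isnan(qbo_lagged)

    # Design matrix: intercept, CO2, lagged QBO index.
    X = np.column_stack([np.ones(valid.sum()), co2[valid], qbo_lagged[valid]])
    coef, *_ = np.linalg.lstsq(X, series[valid], rcond=None)

    resid = np.full(n, np.nan)
    resid[valid] = series[valid] - X @ coef
    return resid, coef
```

Because the same function would be applied to both temperature and water vapor, each with the QBO index of its own data source, the residual fields stay mutually consistent.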
Each CCMI model makes data available at different pressure or sigma levels, which limits the precision with which we can compare models. However, differences in the pressure levels at which data are available are generally less than 2hPa, and we consider anomalies of each model from its own climatology. When considering entry water vapor for CCMI we examine the level closest to 80hPa, and when considering the cold point temperature we examine the level closest to 90hPa, archived by each model.
Statistical significance of the composite mean response to a given ENSO phase is determined using a Student's t-test. The adjusted R² (eq. 3.30 of Chatterjee and Hadi, 2012) is used to quantify the added value in using a polynomial best fit (e.g.
The adjusted R² takes into account the likelihood that a polynomial predictor will reduce the residuals by unphysically over-fitting the data. The polynomial fit could be preferred if its adjusted R² is larger by any amount than the linear R², though we only show the polynomial fit if its adjusted R² exceeds the R² for a linear fit by 33%. Note that the 33% criterion is subjectively chosen, though results are similar for a slightly modified criterion.
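A minimal sketch of the linear-versus-polynomial comparison, assuming the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1) with p the number of predictors excluding the intercept, and reading "exceeds by 33%" as a relative margin on the linear R². The function names are hypothetical; in this setting x would be the ENSO index and y the entry water vapor anomaly.

```python
import numpy as np


def r2_and_adjusted(x, y, degree):
    """R^2 and adjusted R^2 of a least-squares polynomial fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    yhat = np.polyval(np.polyfit(x, y, degree), x)
    sse = np.sum((y - yhat) ** 2)        # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - sse / sst
    n, p = y.size, degree                # p predictors, excluding the intercept
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


def prefer_polynomial(x, y, degree=2, margin=0.33):
    """Keep the polynomial fit only if its adjusted R^2 beats the linear R^2 by `margin`."""
    r2_lin, _ = r2_and_adjusted(x, y, 1)
    _, adj_poly = r2_and_adjusted(x, y, degree)
    return adj_poly > (1.0 + margin) * r2_lin
```

A V-shaped response, in which both EN and LN moisten, has almost no linear R², so the quadratic fit is kept; a purely linear response is not improved by the extra term, so it is rejected.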

We begin with the water vapor response to ENSO in the WACCM simulation included in CCMI in Figure 1. At 90hPa and also at higher pressure levels (i.e., lower in the TTL), EN leads to enhanced water vapor and LN to reduced water vapor in both winter and spring. Convection can rapidly mix moist boundary layer air into the TTL (e.g. Levine et al., 2007). Above the cold point, however, the water vapor response is not significant in November and December, but then shows a distinct nonlinearity in subsequent months, with both EN and LN leading to enhanced water vapor. This nonlinear effect is similar to that seen in the GEOSCCM model by Garfinkel et al. (2018), and is also similar to the effect in SWOOSH observational data (Figure 1).
These results are summarized in Figure 2a, which shows the water vapor response for EN (the events in the right shaded box in Figure 1), LN (the events in the left shaded box in Figure 1), and neutral ENSO (all other events). In January through June, both EN and LN lead to significantly more entry water vapor than neutral ENSO. The pronounced moistening during EN peaks in the spring, after the event has already begun to decay. These effects are all consistent with those seen in GEOSCCM by Garfinkel et al. (2018). A generally similar effect is evident in CAM4Chem, which shares code with WACCM.
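The compositing and Student's t-test can be sketched as below. This is an illustration under stated assumptions, not the authors' procedure: the EN/LN cutoff of ±1 standard deviation of the Niño3.4 index is a hypothetical choice (the event selection here is defined by the shaded boxes in Figure 1), each array holds one value per winter for a single bimonthly period, and `enso_composites` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def enso_composites(nino34, wv, threshold=1.0):
    """Composite water vapor anomalies by ENSO phase and test each against neutral ENSO.

    nino34, wv: one value per year for a given bimonthly period.
    threshold: hypothetical cutoff, in standard deviations of nino34.
    Returns {"EN": (difference from neutral, p-value), "LN": (...)}.
    """
    nino34 = np.asarray(nino34, dtype=float)
    wv = np.asarray(wv, dtype=float)
    z = (nino34 - nino34.mean()) / nino34.std()
    en, ln = z >= threshold, z <= -threshold
    neutral = ~(en | ln)
    result = {}
    for name, mask in (("EN", en), ("LN", ln)):
        # Student's t-test with pooled variance (equal_var=True is the default).
        _, p = stats.ttest_ind(wv[mask], wv[neutral])
        result[name] = (wv[mask].mean() - wv[neutral].mean(), p)
    return result
```

With a V-shaped synthetic response (water vapor proportional to the square of the index), both phases show a significant positive difference from neutral ENSO, which is the kind of nonlinearity described for WACCM and GEOSCCM.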
The four models shown in Figure 2cdef have a qualitatively different response to ENSO than the NCAR models and GEOSCCM. Specifically, HadGEM3-ES, NIWA, MRI-ESM1r1, and EMAC-L47MA all simulate somewhat more water vapor for LN than neutral ENSO (though this effect is generally not statistically significant), and significantly more water vapor for neutral ENSO than EN, in January through April. In NIWA and EMAC-L47MA this effect extends through all calendar months.
This large diversity in the entry water vapor response to ENSO occurs despite the fact that all models simulate a qualitatively similar response in tropospheric and lower stratospheric temperatures. Figure 3 shows the distribution of 15°S-15°N temperature as a function of longitude and height for these six models in March and April, the months with the strongest disparity among the models in the response of entry water to ENSO; maps of the temperature anomalies at 100hPa and 70hPa are included in the supplemental material. All models are characterized by a more pronounced warming between 200°E and 250°E, immediately above the region with warming sea surface temperatures, as compared to other longitudes, and in all models there is a zonal mean increase in temperature throughout the troposphere. The tropospheric warming peaks in the upper troposphere and extends up to the TTL near 120°E in all models. Furthermore, all models simulate a lower stratospheric cooling (above 70hPa) in response to EN and a warming in response to LN. While the magnitude of these features differs among the models, the patterns are robust.
Near the tropopause, however, there is less agreement among the models in the large-scale temperature response, and this difference can account for the large diversity in the water vapor responses to ENSO. The middle column of Figure 2 shows the zonally averaged temperature response to ENSO in the tropics near 90hPa. The zonally averaged temperature response to ENSO in WACCM bears little resemblance to the water vapor response. Rather, the water vapor response can be better understood by focusing on the coldest region of the tropics. Due to the relative slowness of vertical transport as compared to horizontal transport in the tropical tropopause layer, entry water vapor is sensitive to the coldest regions in the tropics and not just zonal mean temperatures (i.e. the cold point, Mote et al., 1996; Hatsushika and Yamazaki, 2003; Bonazzola and Haynes, effect as follows: We first sort the temperature at all grid points from 15°S to 15°N in each bimonthly period. We then calculate the threshold temperature associated with the first quintile, second quintile, etc., of tropical temperatures. We compute these quintiles separately for EN, LN, and neutral ENSO, and then compute the difference for each ENSO phase from the model climatology. The results of this analysis for the second quintile are shown in the right column of Figure 2a. The coldest 20% of the tropics is ∼0.25 K warmer during EN than in the model climatology from November through June, while for LN and neutral ENSO the coldest 20% of the tropics is colder than the model climatology. Overall, the correlation between the coldest-20% cold point temperature anomalies and the water vapor anomalies is 0.73 (Table 2). Results are generally similar for CAM4Chem through June: the correlation of entry water with the coldest 20% is positive, while the correlation with zonal mean temperatures is not.
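The quintile-threshold diagnostic can be sketched as follows, assuming temperatures for one bimonthly period are stored as a (year, lat, lon) array restricted to 15°S-15°N and each year carries an ENSO label. Grid points are weighted equally (no area weighting), which is an assumption, and `coldest_fraction_anomalies` is a hypothetical name; `q=20` corresponds to the coldest 20% discussed above.

```python
import numpy as np


def coldest_fraction_anomalies(temp, phase, q=20.0):
    """Threshold temperature of the coldest q% of tropical grid points for each
    ENSO phase, minus the same threshold computed from all years (climatology).

    temp: array (year, lat, lon) for one bimonthly period, 15S-15N only.
    phase: sequence of labels such as "EN", "LN", "neutral", one per year.
    Assumption: all grid points count equally (no cos-latitude weighting).
    """
    temp = np.asarray(temp, dtype=float)
    phase = np.asarray(phase)
    clim = np.percentile(temp, q)  # pools every year and every grid point
    return {
        str(p): np.percentile(temp[phase == p], q) - clim
        for p in np.unique(phase)
    }
```

Pooling all grid points of all years in a phase before taking the percentile mirrors the description of computing the quintiles "separately for EN, LN, and neutral ENSO" rather than averaging per-year thresholds.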

HadGEM3-ES, NIWA, MRI-ESM1r1, and EMAC-L47MA all simulate similar temperature responses whether we focus on the zonal mean or on the coldest 20% of the tropics, though correlations with entry water vapor are higher for the coldest 20% of the tropics than for zonal mean temperature (Table 2). For these models, temperatures are warmer for LN than neutral ENSO and colder for EN than neutral ENSO (Table 2). Overall, the temperature response to ENSO in the coldest 20% of the tropics near 90hPa can help account for the substantial inter-model diversity in the response of water entering the stratosphere. Garfinkel et al. (2013) and Ding and Fu (2018) considered the possibility that sea surface temperatures (SSTs) in the central Pacific may have a different effect on entry water than SSTs in the East Pacific, and the two studies, using different individual models, found that warmer SSTs in the central Pacific lead to dehydration. We evaluate this effect for the CCMI models in . Note that all models simulate a long-term moistening trend of the lower stratosphere if the trend is computed before applying the MLR described in section 2 (trend indicated above Figure 4ghijkl), and of the six models considered, the two with the strongest long-term moistening trend simulate a negative correlation between temperatures at 500hPa and entry water vapor when focusing on interannual variability. Hence there is no evidence that temperatures at 500hPa are a more discriminatory predictor of entry water vapor on interannual timescales than ENSO. That being said, it is conceivable that on longer timescales the magnitude of mid-tropospheric warming would be related to, e.g., an upward expansion of the TTL (a robust response to climate change), and such an expansion of the TTL might be expected to lead to more entry water vapor. A thorough investigation of this possibility is beyond the scope of this paper.

Comparison to observations and CMIP6
What is the observed response of entry water vapor to ENSO? Figure 5a is as in Figure 2a but for SWOOSH entry water vapor, and while both LN and EN are associated with more water vapor, the differences between EN and neutral ENSO and between LN and neutral ENSO are not statistically significant. (Note that if ERA5.1 water vapor is used and the years 1979 to 2019 are considered, the moistening for EN is significant in July and August.) Similarly, the regression coefficient of a linear best fit of entry water vapor against ENSO (Figure 1) is also not statistically significant (for ERA5.1 water vapor, the increase is significant in July and August; details not shown). Despite the lack of a significant effect in observations, the models that appear to be closest to the observed response are the NCAR models and also the GEOSCCM simulations evaluated by Garfinkel et al. (2018).
A complication when comparing the models to SWOOSH entry water vapor is that at least ∼140 years of model data are available for each model, while only 27 years of observations are available. Hence it is ambiguous whether a difference between models and observations reflects an actual model bias or instead reflects sampling uncertainty given the short observational record [...] (Figure 5bcd). The other models, however, show large discrepancies between the observed and modeled responses to ENSO even when similar sample sizes are compared.
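The idea of subsampling the long model record to the length of the observational record can be sketched as follows. This is a hypothetical illustration: the use of contiguous 27-year windows, the 2.5 to 97.5 percentile range, and the regression slope as the test statistic are all assumptions (the same approach works for EN or LN composite differences), and the function names are invented for this example.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]


def subsampled_range(nino34, wv, n_obs, n_draws=1000, seed=0, q=(2.5, 97.5)):
    """Percentile range of the water vapor vs. ENSO slope over random model windows.

    Each draw is a contiguous block of n_obs consecutive samples taken from the
    long model record (an assumption; non-contiguous resampling is an alternative).
    """
    rng = np.random.default_rng(seed)
    n = len(nino34)
    starts = rng.integers(0, n - n_obs + 1, size=n_draws)
    slopes = np.array([slope(nino34[s:s + n_obs], wv[s:s + n_obs]) for s in starts])
    return np.percentile(slopes, q)


def consistent_with_obs(obs_value, model_range):
    """True if the observed statistic lies inside the subsampled model range."""
    lo, hi = model_range
    return lo <= obs_value <= hi
```

A model would then be judged consistent with SWOOSH when `consistent_with_obs(obs_slope, subsampled_range(model_nino, model_wv, n_obs=27))` holds, which is the sense in which the subsampled model response "encompasses" the observations.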
An additional metric for evaluating observed vs. modeled ENSO teleconnections is whether the model simulates an amount of variance similar to that observed, as otherwise the model does not satisfactorily capture internal atmospheric variability (Deser et al., 2017; Garfinkel et al., 2019; Weinberger et al., 2019). We therefore compare the standard deviation of entry water vapor for each model in Figure 6a. The 95% confidence interval of the standard deviation, as given by a chi-squared test, is indicated with a vertical line. In boreal winter, only HadGEM3-ES and MRI-ESM1r1 simulate realistic variability, with NIWA simulating too much and the other models simulating too little. In boreal summer, all models simulate unrealistic variability.
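The chi-squared confidence interval for a standard deviation can be sketched as below. This assumes the interval is built around the observed standard deviation from n = 27 years and that the interannual anomalies are independent and Gaussian, so that (n − 1)s²/σ² follows a chi-squared distribution with n − 1 degrees of freedom; whether Figure 6 centers the interval on the observations or on each model is not stated in this section, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def std_confidence_interval(sample_std, n, alpha=0.05):
    """Chi-squared confidence interval for a standard deviation estimated from n samples.

    Assumes independent, normally distributed samples, so that
    (n - 1) * s**2 / sigma**2 ~ chi2(n - 1).
    """
    dof = n - 1
    lower = sample_std * np.sqrt(dof / stats.chi2.ppf(1 - alpha / 2, dof))
    upper = sample_std * np.sqrt(dof / stats.chi2.ppf(alpha / 2, dof))
    return lower, upper


def variability_is_realistic(model_std, obs_std, n_obs, alpha=0.05):
    """True if the model standard deviation falls inside the observational interval."""
    lower, upper = std_confidence_interval(obs_std, n_obs, alpha)
    return lower <= model_std <= upper
```

With only 27 years the interval is wide (roughly 0.79 to 1.37 times the observed standard deviation at the 95% level), so a model classified as unrealistic here differs substantially from the observed variability.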
Recently, at least six coupled ocean-chemistry climate models have participated in CMIP6, and we now assess the ENSO-water vapor connection in these models: CESM2-WACCM, GFDL-ESM4, GISS-E2-1-G, MRI-ESM2-0, UKESM1-0-LL, and CNRM-ESM2-1. Of these six models, three are newer versions or successors of models that participated in CCMI (CESM2-WACCM, MRI-ESM2-0, and UKESM1-0-LL). Figure 7 is as in Figure 5 but for 70 hPa water vapor, as water vapor near 80 hPa is not a standard CMIP6 output variable. The observed water vapor response at 70 hPa resembles that at 82 hPa (Figure 7a vs. Figure 5a). While the models generally agree that LN leads to moistening in winter, they simulate a wide diversity of responses in the spring and summer following LN and EN. For only one model (UKESM1-0-LL) is the modeled response consistent with observations, in the sense that the range of subsampled model responses encompasses the observed response. For all other models, the observed and modeled water vapor responses are inconsistent in at least one season and one ENSO phase; the inconsistency is relatively small for GISS-E2-1-G and MRI-ESM2-0, and to a lesser degree CESM2-WACCM, but pronounced for CNRM-ESM2-1 and GFDL-ESM4.
The standard deviation of 70 hPa tropical water vapor for each CMIP6 model is shown in Figure 6b. While nearly all CCMI models struggled to capture realistic variability, half of the CMIP6 models simulate a realistic amount of variability. Specifically, the CCMI models HadGEM3-ES and MRI-ESM1r1 failed to simulate realistic variability in spring, but the corresponding CMIP6 models UKESM1-0-LL and MRI-ESM2-0 are realistic. GISS-E2-1-G also simulates a realistic amount of variability.
However, the other three CMIP6 models simulate too little variability, though the winter bias in WACCM is smaller in the CMIP6 CESM2-WACCM than in the CCMI version of WACCM.

Biases in the standard deviation of entry water vapor have been shown to be associated with biases in cold point temperature (e.g. Hardiman et al., 2015), and such an explanation can account for the biased variability in some of the models. Figure 8 shows the climatological zonal mean temperature from 10°S to 10°N in each model in January and February, compared to ERA5.1. The NIWA model suffers from a too-warm cold point and, consistent with this, too-strong variability in entry water vapor. EMAC-L47MA and CNRM-ESM2-1 suffer from the opposite problem: too cold a cold point and too little variability in entry water vapor. The Met Office model used in CMIP5 is known to have a warm cold point bias (Hardiman et al., 2015), and this bias is somewhat reduced in CMIP6 (see blue line and circle); this reduced bias is consistent with the improved variability in entry water vapor. WACCM had a bias similar to that of the Met Office model in CCMI but was substantially improved for CMIP6 (see red line and circle), and water vapor variability is improved at least in midwinter. Not all models show a clear correspondence between cold point and water vapor biases, however: the warm cold-point bias in the MRI model evident in CCMI was reduced in CMIP6, yet water vapor variability increased, indicating that other confounding causes may be present.
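One way to see why a too-warm cold point can go together with too-strong variability is that the saturation vapor pressure over ice rises roughly exponentially with temperature, so the same cold-point temperature anomaly produces a larger absolute change in entry water vapor when the cold point is warmer. The sketch below illustrates this reasoning, assuming the Murphy and Koop (2005) fit for saturation vapor pressure over ice and a fixed cold-point pressure of 90 hPa; it is an illustrative aid rather than a diagnostic used in this paper, and the function names are hypothetical.

```python
import numpy as np


def ice_saturation_vmr_ppmv(temp_k, pressure_pa):
    """Saturation volume mixing ratio over ice, in ppmv.

    Uses the Murphy and Koop (2005) fit for saturation vapor pressure over
    ice (in Pa), valid for temperatures above about 110 K.
    """
    ln_e = (9.550426 - 5723.265 / temp_k + 3.53068 * np.log(temp_k)
            - 0.00728332 * temp_k)
    return np.exp(ln_e) / pressure_pa * 1e6


def cold_point(temp_profile, pressure):
    """Temperature and pressure of the coldest level in a single profile."""
    i = int(np.argmin(temp_profile))
    return temp_profile[i], pressure[i]


# Apply the same +1 K anomaly to a colder and a warmer cold point at 90 hPa
for t_cp in (190.0, 195.0):
    dq = (ice_saturation_vmr_ppmv(t_cp + 1.0, 9000.0)
          - ice_saturation_vmr_ppmv(t_cp, 9000.0))
    print(f"T_cp = {t_cp:.0f} K: a +1 K anomaly adds {dq:.2f} ppmv")
```

Because the change per kelvin grows with the mean cold-point temperature, a model with a warm cold-point bias would be expected to show larger entry water vapor anomalies for comparable temperature variability, all else being equal.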
More generally, there is still an overall tendency for models to have too warm a cold point, similar to the bias in CMIP5 models (Hardiman et al., 2015), even though entry water vapor variability is generally too weak. This may be because the models do not yet adequately simulate all of the processes leading to observed variability in water vapor (e.g. ice lofting), or because they do not include all of the relevant forcing processes (e.g. aerosols in the Asian monsoon) that contribute to observed variability.
Future work to improve models in this region of crucial importance for climate is clearly needed.

Summary
The amount of water vapor entering the stratosphere helps to determine the overall greenhouse effect and also regulates the severity of ozone depletion. The goal of this study is to understand how the comprehensive models that are used for e.g. [...] of the tropics to ENSO able to explain the simulated response of water vapor. The observational record is too short to confidently classify models as "good" or "bad", though most models simulate a response inconsistent with that observed even if we subsample their output to mimic the length of the observational record. Furthermore, nearly all CCMI models and half of the CMIP6 models suffer from biases in the amount of interannual variability in entry water vapor, with most models simulating too little variability. In some models this bias is due to biases in cold point temperature, though note that overall the cold point is too warm in most models (Figure 8 in this paper, and Hardiman et al., 2015, for CMIP5). More generally, the too-weak variability could be due to biases in how the models simulate key processes regulating water vapor, or due to missing forcings that lead to water vapor variability. Either way, the close correspondence between temperatures in the coldest 20% of the tropics and the simulated water vapor response to ENSO (Table 2) suggests that the models resolve the most important factor governing entry water vapor variability (Mote et al., 1996; Hatsushika and Yamazaki, 2003; Fueglistaler et al., 2004; Fueglistaler and Haynes, 2005a; Oman et al., 2008; Randel and Park, 2019). The good news is that all three modeling groups that contributed to both CCMI and CMIP6 show an improvement in this bias. Future work is needed to fully consider what led to this improvement, and also to consider the impacts of these changes in the lowermost stratosphere on water vapor higher up.

Otherwise we show a linear least-squares best fit in each panel.
Note that the pressure level for each model differs due to data availability, and the levels used for this chart are indicated on Figure 2.

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geoscientific Model Development, 9, 1937–1958, 2016.

Forster, P. M. and Shine, K. P.: Stratospheric water vapor changes as a possible contributor to observed stratospheric cooling, Geophys. Res. Lett., 26, 3309–3312, doi:10.1029/1999GL010487, 1999.

Free, M. and Seidel, D. J.: Observed El Niño-Southern Oscillation temperature signal in the stratosphere, J. Geophys. Res., 114, D23108,