The effect of atmospheric nudging on the stratospheric residual circulation in chemistry-climate models

25 We perform the first multi-model intercomparison of the impact of nudged meteorology on the stratospheric residual circulation using hindcast simulations from the Chemistry-Climate Model Initiative (CCMI). We examine simulations over the period 1980-2009 from 7 models in which the meteorological fields are nudged towards a reanalysis dataset and compare these with their equivalent free-running simulations and the reanalyses themselves. We show that for the current implementations, 30 nudging meteorology does not constrain the mean strength of the stratospheric residual circulation and the inter-model spread is similar, or even larger, than in the free-running simulations. The nudged models generally show slightly stronger upwelling in the tropical lower stratosphere compared to the free-running versions and exhibit marked differences compared to the directly estimated residual circulation from the reanalysis dataset they are nudged towards. Downward control calculations applied 35 to the nudged simulations reveal substantial differences between the climatological lower stratospheric tropical upward mass flux (TUMF) computed from the modelled wave forcing and that calculated directly from the residual circulation. This explicitly shows that nudging decouples the wave forcing and the residual circulation, so that the divergence of the angular momentum flux due to the mean motion is not balanced by eddy motions, as would typically be expected in the time mean. Overall, 40 nudging meteorological fields leads to increased inter-model spread for most of the measures of the mean climatological stratospheric residual circulation assessed in this study. In contrast, the nudged simulations show a high degree of consistency in the interannual variability of the TUMF in the lower 2 stratosphere, which is primarily related to the contribution to variability from the resolved wave forcing. The more consistent interannual variability in TUMF in the nudged models also compares more closely 45 with the variability found in the reanalyses, particularly in boreal winter. We apply a multiple linear regression (MLR) model to separate the drivers of interannual and long-term variations in the simulated TUMF; this explains up to ~75% of the variance in TUMF in the nudged simulations. The MLR model reveals a statistically significant positive trend in TUMF for most models over the period 1980-2009. The TUMF trend magnitude is generally larger in the nudged models compared to their free-running 50 counterparts, but the intermodel range of trends doubles from around a factor of 2 to a factor of 4 due to nudging. Furthermore, the nudged models generally do not match the TUMF trends in the reanalysis they are nudged toward for trends over different periods in the interval 1980-2009. Hence, we conclude that nudging does not strongly constrain chemistry-climate model (CCM) simulated long-term trends in the residual circulation. Our findings show that while nudged simulations may, by construction, produce 55 accurate temperatures and realistic representations of fast horizontal transport, this is not typically the case for the slower zonal mean vertical transport in the stratosphere. Consequently, caution is required when using nudged simulations to interpret the behaviour of stratospheric tracers that are affected by the residual circulation.

More recent theoretical developments offer a means of calculating the diabatic circulation using 115 stratospheric tracers (Linz et al., 2017), which is a promising avenue as this is more closely related to the residual circulation than AoA. Linz et al. (2017) showed consistent estimates of the diabatic circulation in the lower stratosphere based on two independent satellite tracer datasets but identified large uncertainties of up to a factor of two in the mean circulation strength in the upper stratosphere.
Hence, the available tracer datasets are not yet suitable for characterizing trends in the diabatic 120 circulation using these methods. Targeted measurement strategies to better characterize long-term changes in the stratospheric meridional circulation have been proposed (Moore et al., 2014;Ray et al., 2016).
In an attempt to obtain a closer comparison with observed stratospheric trace species, some studies have used model simulations with meteorological fields nudged or relaxed towards analysis or 125 reanalysis datasets (Jeuken et al., 1996). These include studies of stratospheric ozone variability and trends (e.g. van Aalst et al., 2004;Solomon et al., 2016;Hardiman et al., 2017b;Ball et al., 2018), comparisons between models and satellite-based multi-species observational records (Froidevaux et al., 2019) and in particular, focusing on specific meteorological events such as the Sudden Stratospheric Warming in the 2009/2010 winter (Akiyoshi et al., 2016), as well as the chemical and climatic effects of 130 volcanic eruptions (Löffler et al., 2016;Solomon et al., 2016;Schmidt et al., 2018). Nudged simulations have also been used to study mechanisms for dynamical coupling between the stratosphere and troposphere (Hitchcock and Simpson, 2014) and to examine the effects of different regions on atmospheric predictability (e.g. Douville, 2009;Jung et al., 2010). Nudging involves adding additional tendencies to the model equations to constrain the modeled variables. Nudged variables can include 135 horizontal winds (or divergence and vorticity), temperature, surface pressure, latent and sensible heat fluxes. However, vertical winds, which are a small residual from horizontal divergence, are not nudged and the underlying model physics can yield quite different results from the datasets they are nudged towards (Telford et al., 2008;Hardiman et al., 2017).
The approach of nudging a CCM towards reanalysis data follows a similar philosophy to traditional 140 off-line chemical transport models (CTM), though there are fundamental differences between these types of model in terms of their tracer advection. CTMs need to match the mass transport with the evolution of the pressure field. This can be done exactly in isobaric coordinates (often used in the stratosphere) but requires a correction in regions where grid box mass changes (e.g. as surface pressure changes). CCMs are less affected by this mass-wind inconsistency than CTMs (Jöckel et al., 2001), but 145 nudging will add forcings that are inconsistent with the model state. CTMs use the full 3-D circulation from the (re-)analyses directly and have been widely developed and used over the past few decades (e.g. Rood et al., 1988;Chipperfield et al., 1994;Lefèvre et al., 1994). They have proven to be very successful at simulating stratospheric tracers on a range of timescales (Chipperfield, 1999), including decadal changes (Mahieu et al., 2014). However, this success has been built on extensive testing of the 150 optimum way to use the reanalysis data to force the CTMs. For example, Chipperfield (2006) showed how different approaches to calculating the vertical velocity in the TOMCAT/SLIMCAT model could lead to very different distributions of stratospheric age of air, while Monge-Sanz et al. (2013a) compared the performance of different European Centre for Medium-Range Weather Forecasts (ECMWF) analyses within the same CTM framework. Krol et al. (2018) recently provided a summary 155 of how current CTMs intercompare for tracer calculations. Monge-Sanz et al. (2013b) compared the approaches of using ECMWF analyses directly in a CTM with the ECMWF CCM nudged using the same analyses. They found that the CTM and nudged CCM were consistent in showing a degraded performance when using older ERA-40 reanalysis compared to the later ERA-Interim. However, they also showed some differences between CTM and nudged-CCM tracers using the same analyses, with 160 the nudged CCM showing stronger upward motion in the tropical stratosphere. Therefore, with regards to the slow residual circulation, one cannot assume that a nudged CCM will behave in a similar way to a CTM even when using the same meteorological analyses. Recently, Ball et al. (2018) showed 2 nudged CCMs which failed to capture the observed variations in the lower stratospheric ozone as measured by satellite observations, while Chipperfield et al. (2018) using the TOMCAT CTM simulated a better 165 agreement of modeled ozone variations with the observations. Overall, the success of some CTM simulations in simulating long-lived stratospheric tracers has been built on many years of model development and testing. In contrast, nudged CCMs are much newer tools and have not yet been evaluated to the same extent. A recent study by Orbe et al. (2018) analyzed tropospheric tracers in nudged CCM simulations and found large differences in the distributions of the tracers, which could be 170 partly traced to differences in the model convection schemes. They urged users to adopt a cautious approach when interpreting tracers in nudged simulations given their dependence on not only largescale flow but also sub-grid parameterizations. However, a critical evaluation of the stratospheric residual circulation in nudged CCM simulations has been lacking to date. and section 4 summarizes the results and discusses the implications for using nudged simulations to 185 study aspects of the observational record.

Models and experiments
CCMI is the successor activity to CCMVal-2 and the Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP; Lamarque et al., 2013). We use the hindcast free-running 190 simulations, REF-C1, and the nudged specified dynamics simulations, REF-C1SD, which cover the periods 1960-2009and 1980-2009. Here we analyze the common 30-year period 1980-2009 that was run by all models for both experiments with prescribed observed SSTs and sea ice concentrations. The CCMI data were downloaded from the British Atmospheric Data Centre (Hegglin and Lamarque, 2015). For an extensive overview of the CCMI models see Morgenstern et al. (2017). 195 We analyze those CCMI models that output the necessary TEM diagnostics. At a minimum, this requires the residual vertical velocity (w ̅ * ) and the residual meridional velocity (v ̅ * ) (Andrews et al., 1987); where available we also use the resolved and parametrized wave forcing fields from the models.
This gives results from a total of 10 models which differ from one another in various aspects such as their horizontal resolution, ranging from 1.9° to 5.6°, their vertical resolution as well as their sub-grid 200 parameterizations (see Table 1). The main text concentrates on the 7 out of 10 models that performed both the REF-C1 and REF-C1SD experiments (Table 2) simulations nudge temperature and other meteorological fields such as horizontal winds, vorticity and divergence and some surface fields (Table 2), while the chemical fields are left to evolve freely. The 210 nudging timescales range from 6 -50 hours and the height range over which nudging is applied varies ( Table 2). The TEM and related diagnostics that were available from each model are shown in Table 3.
The models use different reanalysis fields for nudging taken from ERA-Interim (Dee et al., 2011), JRA-55 (Ebita et al., 2011;Kobayashi et al., 2015) or MERRA (Rienecker et al., 2011). The differences in the residual circulation diagnosed from reanalyses have been identified and documented in previous 2.2. Model diagnostics

TEM residual circulation
The TEM velocities (v ̅ * ,w ̅ * ) are defined as (Andrews et al., 1987): where Ψ ̅ * (φ,z) is the residual meridional mass streamfunction, ρ 0 is log-pressure density, α is Earth's radius and ϕ is latitude. As most of the models analyzed here use a hybrid-pressure vertical coordinate, the prognostic variable is the pressure vertical velocity, ω ̅ * calculated in Pa s -1 , which must be converted to m s -1 in order to get the residual vertical velocity, w ̅ * . The conversion of ω to w in isobaric 225 coordinates is given by the following equation: where p is pressure, R = 287 J K -1 kg -1 is the gas constant for dry air and H is a fixed scale height. Both TEM velocity components were submitted as monthly mean fields to the CCMI data archive. Upon close examination of the CCMI model output, some discrepancies were found in the way that the 230 residual vertical velocity was calculated among the models. Although a fixed scale height of H = 6950 m was recommended in the CCMI data request (Eyring et al., 2013;Hegglin and Lamarque, 2015), the TEM output from some models (EMAC and SOCOL3) was calculated incorrectly using a temperature-dependent density, ρ 0 = p RT ⁄ , instead of the log-pressure definition of the density ρ 0 = ρ s • e − ⁄ , such that z has a unique 1:1 correspondence with p. This methodological error leads to artificial 235 spread in the model w ̅ * fields. We note that previous multi-model comparisons of the residual circulation that use w ̅ * taken directly from models may have been subject to the same issue (e.g. Butchart et al., 2010;SPARC, 2010), though we cannot confirm this. To avoid this methodological inconsistency, Dietmüller et al. (2018) recalculated w ̅ * from v ̅ * using the continuity equation, which requires a vertical integration and a derivative along the meridional direction. The recalculation of w ̅ * 240 from v ̅ * was also explored for this study, but it was found to introduce additional errors affecting the latitudinal structure of w ̅ * (not shown), specifically because of the reduced number of CCMI requested pressure levels compared to the native model levels. We were able to overcome the discrepancy in the submitted w ̅ * fields for the EMAC simulations by reconverting high frequency ω ̅ * output to w ̅ * using the log-pressure density as in equation 2. However, for SOCOL3 the required output for this was not 245 available and hence we use the submitted w ̅ * for which the absolute values should be treated with caution. For the other models, the results presented in this study are based on the original diagnostics submitted to the CCMI data archive, which we have verified were calculated in the correct way.
We compute the mass flux across a given pressure surface as (Rosenlof, 1995): 2π ∫ ρ 0 a 2 cosϕ pole ϕ w ̅ * dϕ = 2πaΨ * (ϕ), (3) 250 using the boundary condition that Ψ = 0 at the poles. By finding at each pressure level the latitude at which Ψ max and Ψ min occur, which correspond to the height-dependent turnaround (TA) latitudes, we can calculate the net downward mass flux in each hemisphere. The net tropical upward mass flux, equal to the sum of the downward mass fluxes in each hemisphere, can then be expressed as (Rosenlof, 1995): The TUMF has been used widely as a measure of the strength of the BDC (e.g. Rosenlof, 1995;Butchart and Scaife, 2001;Butchart et al., 2006Butchart et al., , 2010Butchart et al., , 2011Butchart, 2014 and references therein;260 Seviour et al., 2012), so its use here enables a direct comparison with earlier studies. Arguably, the strength of the TUMF is a first-order metric for evaluating changes in the stratospheric mass circulation as a consequence of nudging. As mentioned above, by calculating the annual means of TUMF accounting for the seasonal cycle of the TA latitudes we capture the correct evolution of the intraseasonal (not shown) and interannual variability in the TUMF. 265

Downward control principle calculations
Under steady-state conditions, Ψ ̅ * (φ,z) at a specified latitude φ and log-pressure height z is given by the vertically integrated eddy-induced total zonal force, F ̅ , above that level (Haynes et al., 1991): where in the quasi-geostrophic limit m ϕ ≈ -2Ωa 2 sinϕcosϕ. The above integration applies along lines of constant mean absolute angular momentum per unit mass, m=acosϕ(u ̅+aΩcosϕ) where u is the zonal mean zonal wind and Ω is Earth's rotation rate, with boundary conditions of Ψ→0 and ρ 0 w ̅ * → 0 as z → ∞. These lines of constant angular momentum are approximately vertical except near the equator 275 (up to ~ ±20°) such that we can calculate the solution of the above integral using constant φ for the limits of the integral (Haynes et al., 1991). In climate models, ̅ has contributions from resolved waves due to the Eliassen Palm flux divergence (EPFD) and from parameterized gravity wave drag due to subgrid scale waves that originate from orography, convection and frontal instabilities. This enables us to estimate the contribution to the tropical upward mass flux of both resolved planetary wave driving 280 (EPFD) and the orographic (OGWD) and non-orographic (NOGWD) parameterized gravity wave drag from the CCMI model output (Table 2) and compare with the direct estimates derived from w ̅ * .
Applying the downward control principle (Haynes et al., 1991) can provide useful insights into the driving mechanisms of the stratospheric residual circulation and therefore explain part of the intermodel spread found in both  simulations. While the downward control principle 285 enables the contributions of EPFD and OGWD/NOGWD to TUMF to be calculated under various assumptions (Haynes et al., 1991), one has to keep in mind that the different wave forcings can interact and thus are not independent of each other (Cohen et al., 2013).
It is important to note some possible limitations of the diagnostic approaches chosen for this study.
Both the direct and downward control principle methods rely on the applicability of quasi-geostrophic 290 theory to interpret the results. In addition to the two approaches used here, the residual circulation can also be estimated using the thermodynamic equation. Studies have shown that the estimates from the different methods for evaluating the residual circulation can differ (Seviour et al., 2012;Abalos et al., 2015;Linz et al., 2019), particularly in reanalyses where standard global conservation laws (e.g. conservation of mass) are generally not required to be met. Similar issues are likely to beset the nudged 295 model simulations owing to the additional tendencies included in the model equations. The differences between the calculation methods for the residual circulation can be as large as, or larger than, the differences between reanalysis datasets for the same diagnostic (Abalos et al., 2015;Linz et al., 2019), and may further depend on choices around averaging between fixed latitudes or the TA latitudes (Linz et al., 2019), so it is important to bear this in mind in interpretation of the results presented here. 300 Unfortunately, heating rates were not available from all CCMI model simulations to perform the thermodynamic equation calculation. Nevertheless, we compute the direct and downward control principle diagnostics for the residual circulation in a self-consistent manner in the models and reanalyses to enable comparison with earlier multi-model studies (Butchart et al., 2006(Butchart et al., , 2010SPARC, 2010). 305

Multiple linear regression model
To investigate the drivers of interannual variability in the residual circulation we apply a multiple linear regression (MLR) model (equation 6) to the annual mean timeseries of TUMF. The model includes terms for known drivers of variations in tropical lower stratospheric upwelling: major volcanic 310 eruptions (Pitari and Rizi, 1993), El Nino Southern Oscillation (ENSO) (García-Herrera et al., 2006;Marsh and Garcia, 2007;Randel et al., 2009), the Quasi-Biennial Oscillation (QBO) (Baldwin et al., 2001) and a linear trend (Calvo et al., 2010).
where β 0 is a constant, β i is the regression coefficient for basis function x i and ε(t) is the residual.
Following Maycock et al. (2018), the volcanic basis function is defined as the tropical lower stratospheric average volcanic surface area density (SAD), the ENSO basis function is the timeseries of east-central equatorial Pacific Ocean SST anomalies (Niño 3.4 index; 5°S to 5°N; 170°W to 120°W), 320 the two QBO terms are the first two principal component timeseries from an empirical orthogonal function (EOF) analysis of the zonal mean zonal winds between 10°S-10°N and 70 to 5 hPa, and a linear trend. The first three regressors, volcanic, ENSO and the linear trend are identical for both REF-

Reanalysis Data
In order to compare the REF-C1 and REF-C1SD simulations against the reanalysis datasets used for the nudging, we use the SPARC Reanalysis Intercomparison Project (S-RIP) dataset (Martineau, 2017;Martineau et al., 2018). This provides a common gridded version of the reanalysis TEM fields on a 2.5 × 2.5 grid up to 1 hPa. The pressure vertical velocity, ω ̅ * , is converted to the residual vertical velocity, the magnitude of the circulation in REF-C1SD (whether upwelling or downwelling) is larger than in REF-C1. As expected, the climatologies show upwelling in the tropics between around 30°S to 30°N and downwelling at higher latitudes. In the lowermost stratosphere (100-80 hPa), within the region of 350 tropical upwelling, the REF-C1SD MMM generally shows larger w ̅ * values in the subtropics and smaller values at the equator compared to REF-C1, indicating a tendency for a more double peaked w ̅ * structure in the tropics in the lowermost stratosphere (Ming et al., 2016a) simulates a tri-modal w ̅ * structure while MRI-ESM-r1 shows a relatively constant w ̅ * across the tropics.
For the REF-C1 experiment, both EMAC simulations, CMAM and SOCOL3 show a narrower double peaked structure, with EMAC-L47 exhibiting a rather pronounced NH subtropical maximum. 390 Conversely, CESM1-WACCM simulates the broadest region of tropical upwelling in the lower stratosphere, with the SH subtropical maximum occurring at higher latitudes compared with the rest of the models. The other REF-C1 simulations also exhibit a double peaked w ̅ * structure, which is generally more hemispherically symmetric, but with varying amplitudes.
A double-peaked w ̅ * structure in the lower stratosphere has previously been shown in reanalysis 395 datasets (Abalos et al., 2015;Ming et al., 2016) and some CCMs (Butchart et al., 2006(Butchart et al., , 2010. This can also be seen in Figure   However, as implemented in these simulations (Table 2), nudging does not strongly constrain the mean amplitude and structure of the residual circulation nor does it produce circulations that closely resemble the direct estimates from the reanalyses. We now compare the TUMF in each REF-C1SD experiment with the reanalysis it was nudged towards (Figure 4d). Taking at first a broad view of the entire profiles, there is a resemblance between the profiles of TUMF differences in EMAC-L47 and SOCOL3 as compared to ERA-I, which may be 465 related to the similarities in the implementation of nudging in these models; for example, vorticity and divergence were nudged with the same relaxation parameters (see Table 2 To understand the dynamical factors that contribute to the modelled climatological residual circulation and its spread, Figure 5 shows the annual mean TUMF at 70 hPa along with the downward control calculations (section 2.2.2) to quantify the contribution of resolved and parameterized wave forcing to the TUMF. The black bars on the left show the TUMF diagnosed from w ̅ * and the grey bars on the right show the estimated contribution to TUMF from the Eliassen Palm flux divergence (EPFD, dark grey), not provide wave forcing fields (Table 3), so we cannot perform the downward control calculations for that model.
In the free-running REF-C1 simulations (Fig. 5a), the estimated TUMF from the total wave forcing for the majority of the models (apart from CESM1-WACCM and EMAC-L90), slightly exceeds the TUMF 490 calculated directly from w ̅ * . Since these simulations are internally consistent, the imperfect match indicates that the downward control principle as applied here relies on the close but inexact applicability of certain assumptions such as the system being in a steady-state in response to a steady mechanical forcing (Haynes et al., 1991). The REF-C1 inter-model range in TUMF at 70 hPa is 5.74×10 9 to 6.62×10 9 kg s -1 (inter-model standard deviation = 0.29×10 9 kg s -1 ). Comparing the CCMI results in 495 Figure 5a with the results from CCMVal-2 models (see Figure 4.10; SPARC, 2010), the MMM TUMF at 70 hPa for the seven REF-C1 model simulations analysed here (6.05×10 9 kg s -1 ) is within the intermodel range of the fourteen CCMVal-2 models, which show a MMM TUMF around 4% weaker (5.8×10 9 kg s -1 ) (SPARC, 2010). In terms of the contribution of the resolved wave forcing to the TUMF in the free running simulations, there appears to be a decreased inter-model range (3.26×10 9 to 5.33×10 9 500 kg s -1 ) in the present study compared with the CCMVal-2 models, albeit that study included more models (1.5×10 9 to 5.5×10 9 kg s -1 ) (SPARC, 2010). Some CCMI models have increased their horizontal resolution by up to a factor of two (CMAM, MRI-ESM1r1, SOCOL3) and also their vertical resolution up to 80 vertical levels (MRI-ESM1r1) compared with CCMVal-2 models (Dietmüller et al., 2018), which could improve their ability to simulate resolved wave forcing. There is a notable feature of 505 CMAM which shows that the NOGWD contributes negatively to TUMF (indicated with two red horizontal lines on Figure 5 and Supplementary Figure S11 (0.44×10 9 kg s -1 and 0.72×10 9 kg s -1 , respectively). Nonetheless, the residuals (i.e. the difference 515 between the directly calculated TUMF and the total downward control estimated contribution from the wave forcing) are substantially larger and positive (except for EMAC-L90) in the REF-C1SD experiment than in REF-C1. This shows that nudging adds an additional non-physical tendency in the model equations which acts to decouple the wave forcing from the residual circulation; this means the physical constraint that the divergence of the angular momentum flux due to the mean motion is 520 balanced over some sufficient time average by that of all eddy motions does not apply in the nudged models (Haynes et al., 1991). The details of how this decoupling is manifested is likely to vary from one model to another depending on multiple factors such as nudging timescales, nudging parameters, nudging height range, and model resolution. Comparison of the TUMF at 10 hPa for the REF-C1SD experiment (see Supplementary Figure S11b) also reveals substantial differences in some models 525 between the direct and downward control TUMF estimates in the middle stratosphere. Variations in the residuals as a function of height may indicate differences in the effect of nudging on the connection between the climatological wave forcing and the shallow and deep branches of the circulation (Birner and Bönisch, 2011). However, the inter-model ranges in the directly calculated TUMF at 10 hPa are more comparable in the two experiments than was found at 70 hPa (1.45×10 9 to 1.70×10 9 kg s -1 and 530 1.51×10 9 to 1.72×10 9 kg s -1 for REF-C1 and REF-C1SD, respectively) (Supplementary Figure S11b).
Interestingly, for the single simulations that were nudged towards MERRA and JRA-55 (CESM1-WACCM and MRI-ESM1r1, respectively) the TUMF at 70 hPa in the REF-C1SD runs appear to be close to the estimates from the reanalyses they are nudged towards (compare black bars in Figure 5b). This may simply be a coincidence given that there are substantial differences in the structure of w ̅ * 535 between the REF-C1SD simulations for those models and the reanalyses (Figures 3b and 3d), and this is not found for all 5 models that were nudged towards ERA-I. Indeed, given there is substantial spread in TUMF amongst the 5 REFC1-SD models nudged to ERA-I, it is likely that the differences between the REFC1-SD and reanalysis datasets are related to how nudging was implemented in each model; a wide variety of relaxation timescales and vertical nudging ranges were used by the models (Table 2) constraints on the global circulation, such as conservation of momentum and energy. This is found to alter the residual circulation but in a manner that cannot be understood from a closure of the circulation through the integrated wave forcing as would ordinarily apply in the downward control principle (Haynes et al., 1991).

Annual cycle
We now evaluate the representation of the annual cycle in the residual circulation. Figure 6 Figure   7b) reveals that on average the nudged models show a slightly larger peak-to-peak annual cycle amplitude (0.16 mm s -1 vs. 0.13 mm s -1 ). In general, the amplitude of the annual cycle in tropical mean w ̅ * is slightly more constrained across the REF-C1SD simulations with the spread in peak-to-peak amplitude, as measured by the inter-model standard deviation, being around 25% smaller than in REF-575 C1 ( = 0.015 mm s -1 vs. 0.020 mm s -1 , respectively). In terms of seasonal mean behaviour, the nudging appears to constrain the tropical mean w ̅ * in boreal summer (JJA), which exhibits ~20% less spread than in the REF-C1 experiments, but it does not constrain the tropical mean w ̅ * in boreal winter (DJF), which shows a factor of two larger spread than the free running models. Furthermore, the differences in tropical mean w ̅ * between the REF-C1SD runs and the respective reanalysis they are nudged towards 580 are generally larger in boreal winter than in boreal summer for most models. In terms of spatially- There are also substantial differences between the TA latitudes in the REF-C1SD experiment and the 595 reanalyses in all months, which shows that nudging does not produce consistent structures of regions of upwelling and downwelling to those in the reanalysis. To summarize the results of Figure 7, there is substantial inter-model spread in the TA latitudes and in the amplitude of the annual cycle in w ̅ * highlighting significant interhemispheric differences in the upwelling region between both sets of simulations as well between the nudged experiment and the reanalyses. 600 suggest there is some compensation occurring between the different parameterised wave forcing components (e.g. Cohen et al., 2013). It should be noted that the reanalyses have been shown to exhibit strong similarities in their resolved EP fluxes as shown by the linear correlation in the timeseries of 625 tropical upwelling at the 70 hPa level when considering the momentum balance estimates of w ̅ * (Abalos et al., 2015). This result indicates that although nudging does not constrain the mean residual circulation, it does constrain the interannual variability and produces similar contributions to variability across models from both resolved and parameterized wave forcing. In contrast, the REF-C1 simulations show a highly variable pattern of the estimated TUMF anomalies from EPFD and parameterized wave 630 forcing (Figures 9c, 9i), despite the fact they use the same observed SSTs and some nudge the phase of the QBO (CCSRNIES-MIRCO3.2, CESM1-WACCM, EMACL47/L90 and SOCOL3). In summary, the remarkably coherent interannual variability in the annual TUMF timeseries in the  simulations is due to both the resolved and parametrized wave forcing being constrained by nudging; this is in strong contrast to the climatological strength of the TUMF where there were large differences 635 between the directly calculated TUMF and that due to wave forces (Figure 5b). The reasons for the difference in the effect of nudging on the behaviour of the residual circulation between the long-term mean and interannual variability is unclear.  Diallo et al., 2017). However, there is still a considerable range of amplitudes and 650 only the CESM1-WACCM and MRI-ESM1r1 regression coefficients are highly significant (Supplementary Figure S14). The issue of establishing a robust response of the TUMF to volcanic forcing over a short period is demonstrated by the range in amplitudes of the volcanic regressors for different REF-C1 ensemble members from the same model (see Supplementary Figures S6-S9). This highlights that in a free-running climate simulation, internal variability can overwhelm the response to 655 forcing over short timescales. The "true" volcanic signal in TUMF will also depend on the representation of stratospheric heating due to aerosol in the various models. We note that the EMACL47/L90 models contained a unit conversion error where the extinction of stratospheric aerosols was set too low by a factor of ~500 (see Appendix B4 of Morgenstern et al., 2017), hence the stratospheric dynamical effects of the eruptions were not properly represented in the EMAC simulations The REF-C1 models all show a positive best estimate regression coefficient for the TUMF response to ENSO (Figure 10), which is quite consistent in amplitude, but it is only strongly statistically significant in CCSRNIES-MIROC3.2 and SOCOL3 ( Figure S5). This is in contrast to the REF the other reanalyses and the REFC1-SD models. The residuals are generally less correlated between the reanalyses on interannual timescales than was found in the REFC1-SD simulations, but are broadly similar on inter-decadal timescales. However, the regression residuals in the reanalyses show a different temporal behavior from those in the REF-C1SD simulations ( Figure 11) (note that the y-axis scale for 700 the residuals in Supplementary Figure S15 is double that for the CCMI models in Figures 10 and 11). In summary, although nudging constrains the interannual variability in the TUMF at 70 hPa, the attribution to some specific drivers differs across the models and in comparison, to the reanalyses they were nudged towards.

Trend sensitivity analysis 705
Following the results of the MLR analysis described in section 3.5, which showed a statistically  (Polvani et al., 2018).

Conclusions
This study has performed the first multi-model intercomparison of the impact of nudged meteorology on the representation of the stratospheric residual circulation. We use hindcast simulations over 1980- experiments differs markedly from that estimated for the reanalysis they are nudged towards. 755 3. In most of the nudged models there are large differences of up to 25% between the directly calculated tropical upward mass flux in the lower stratosphere and that calculated from the diagnosed total wave forcing using the downward control principle (Haynes et al., 1991). coherence, but this is not the case for the free-running simulations.
6. The results of the MLR analysis applied to the TUMF in the reanalyses show differences in the total variance explained and the attribution of variance to the different physical proxies. There are also marked differences between the individual regression coefficients derived for the REF- C1SD models compared to the reanalysis dataset used for nudging. 775 7. Most nudged simulations show a statistically significant positive trend in TUMF in the lower stratosphere over 1980-2009, which is on average larger than the trends simulated in the freerunning models. This is despite the fact that five out of the seven models analyzed were nudged towards ERA-Interim, which shows a negative long-term trend in TUMF (see also Abalos et al., 2015), while JRA-55 and MERRA show a positive trend. However, the 780 magnitude of the TUMF trend varies by up to a factor of 4 across the nudged models, which is larger than the spread in the free-running simulations. This is an important limitation for using nudged CCM simulations to interpret long-term changes in stratospheric tracers. Our findings highlight that nudging strongly affects the representation of the stratospheric residual 790 circulation in chemistry-climate model simulations, but it does not necessarily lead to improvements in the circulation. Similar disagreement in the characteristics of tropospheric transport in the CCMI nudged simulations has also been reported (Orbe et al., 2018). The differences found in the nudged runs compared with the free-running simulations suggest that although nudging horizontal fields can remove model biases in, for example, temperature and horizontal wind fields (Hardiman et al., 2017b), the 795 simulated vertical wind field will not necessarily be similar to the reanalysis. A particularly interesting finding of our study is that while nudging does not constrain the mean strength of the residual circulation, it does constrain the interannual variability. The reason for the distinct effects of nudging on the residual circulation across these different timescales is currently unknown.
Multiple factors are likely to determine the effect of nudging on the residual circulation in a given 800 model including model biases, nudging timescales, nudging parameters, nudging height range, and model resolution. The differences in the stratospheric residual circulation between the REF-C1SD and the REF-C1 runs may not arise solely from the dynamics, but can also be partly influenced by the indirect effects of nudging the temperatures which in turn affects the diabatic heating (Ming et al., 2016a(Ming et al., , 2016b. In addition to nudging the horizontal winds (mechanical nudging), nudging the 805 temperature (thermal nudging) might be systematically creating a spurious heat source in the model, which leads to a stronger BDC in the lower stratosphere as suggested by Miyazaki et al., (2005) with MRI GCM. Our results highlight that the method by which the large-scale flow is specified and more specifically the choice of the reanalysis fields, the relaxation timescale and the vertical grid (pressure level versus model level) in which the nudging is applied needs to be better understood and evaluated 810 for their influence on the stratospheric circulation. Discrepancies between the vertical grid of the models and the reanalysis pressure levels they are interpolated onto or unbalanced dynamics are possible explanations for the differences found between the directly inferred circulation and that diagnosed from the wave forcing in the nudged simulations. Nudging would either violate continuity, or if continuity is maintained, it will come at the expense of the vertical fluxes, which are not nudged. The interesting 815 aspect here seems to be that this results in substantial change to the net fluxes across a range of timescales, i.e. it does not only increase numerical noise in the w ̅ * component.In order to reduce discrepancies between nudged and free-running simulations, various nudging techniques have been investigated. The role of gravity waves in the error growth that the nudging introduces over time has been highlighted for a single model (Smith et al., 2017). Constraining just the horizontal winds without 820 the temperature was found to be a good strategy when investigating the aerosol indirect effects without affecting significantly the mean state (Zhang et al., 2014). The relaxation timescale when applying the nudging has been found to play an important role in single model studies (Merryfield et al., 2013), but there is no general consensus for the value of the relaxation constant, which is model-specific for the simulations considered here (Morgenstern et al., 2017). Given the varying implementations of nudging 825 in the models analysed here, our study is ill-suited to investigate in detail the mechanisms for how nudging affects the residual circulation. A dedicated study of the sensitivities within one model to relaxation timescales, nudging parameters, nudging height range and vertical resolution would help to offer a detailed explanation for these differences.
The large spread in climatological residual circulation in nudged CCM simulations is an important 830 limitation for those wishing to use them to examine tracer transport, for example stratospheric ozone trends (Solomon et al., 2016), volcanic aerosols (Schmidt et al., 2018), and diagnostics for age-of-air (Dietmüller et al., 2018). Despite the limitations for transport within the stratosphere described here, some success has been reported in studies that used nudged simulations to investigate specific meteorological events such as Sudden Stratospheric Warmings, and in particular for exploring processes 835 beyond the top of the nudging region in the Mesosphere-Lower Thermosphere (e.g., Tweedy et al., 2013;Chandran and Collins, 2014;Pedatella et al., 2014). In conclusion, owing to the limitations of the current techniques for nudging models highlighted here, we urge caution in drawing quantitative comparisons of stratospheric tracers affected by the residual circulation in nudged simulations against stratospheric observational data. 840

Data availability
The majority of CCMI-1 used in this study can be obtained through the British Atmospheric Data

Competing Interests
The authors declare that they have no conflict of interest