Comment on acp-2021-206

The authors document the climatology of Brewer-Dobson Circulation in CMIP6 models, and its response to forcing in the historical and 1pctCO2 doubling integrations. They contrast the behavior of models with available observations/reanalyses, and provide a process oriented exploration of the residual circulation, breaking down the role of resolved waves vs. parameterized gravity waves. This is the first time CMIP class models have provided the necessary output for this analysis, and I expect this paper to become an important reference point for our understanding and discussion of the BDC. I therefore strongly recommend publication of this thorough and well written manuscript pending consideration of the minor suggestions below.

I hope that the authors see my suggestions below as a genuine attempt to help improve the paper. This is a very strong manuscript, and I very much support its publication.

Ed Gerber
General minor suggestions
1) I feel there was tension, starting from the abstract, in the narrative on the comparison of observations and models, particularly in the upper stratosphere. For instance, line 4 of the abstract suggests that the models are inconsistent, but then immediately following, at line 6, it is suggested that there is great uncertainty in the model trends. I am not an expert in the observed trends, and my main suggestion is chiefly to be more consistent with the message. Do the authors mean something like "while there is great uncertainty in trends in the upper branch of the BDC in models, model trends appear to be statistically distinguishable from observed trends"? If this is the case, I would first highlight the uncertainty, and then state that despite this great uncertainty, models cannot be reconciled with observations. This said, I continue to worry that the uncertainty in observed trends may be underestimated. Am I correct that the key mismatch is with Engel et al. 2017 (Air Core measurements at two sites) and MIPAS retrievals from Stiller et al. (2012, 2020), though the MIPAS estimate has become closer with the revision of our treatment of SF6 (Fritsch et al. 2020)?
There are also uncertainties associated with the fact that observational estimates (as with air core samples) are based on sparse measurements, relative to the model-based estimates using global averages. To be constructive, I am curious if an apples-to-apples comparison with Engel et al. would be possible. As highlighted by Garcia et al. (2011; https://journals.ametsoc.org/view/journals/atsc/68/1/2010jas3527.1.xml), the uncertainty on age of air may increase if you only sample it at a few locations and times, as opposed to globally. I suspect that model-based estimates of uncertainty will increase markedly with limited sampling.
And finally, given the uncertainty associated with SF6 decay rates, I still worry that maybe our problem is being able to model SF6, as opposed to being able to model age.
All this said, this was meant to be a minor suggestion. If the message is that models are still inconsistent, I would just highlight that there is a lot of uncertainty first, and then say that despite this, we cannot yet reconcile model trends with available observations.
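To make the sampling concern in suggestion 1 concrete, here is a minimal toy sketch, not a reproduction of any observational or model analysis. It fits a linear trend to a synthetic age-like series averaged over many sites (a stand-in for a global mean) and over only two sites (a stand-in for sparse sampling), and compares the spread of the fitted trends. All numbers (`true_trend`, `noise_std`, the site counts, 30 years) are hypothetical, and the noise is assumed independent between sites and years, which real age-of-air variability is not, so the contrast is likely exaggerated.

```python
import random
import statistics


def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def trend_spread(n_sites, n_years=30, true_trend=-0.01, noise_std=0.5,
                 n_trials=300, seed=0):
    """Standard deviation of fitted trends when averaging over n_sites.

    Every parameter value is a hypothetical placeholder; the noise is
    independent in space and time (an assumption made for simplicity).
    """
    rng = random.Random(seed)
    years = list(range(n_years))
    slopes = []
    for _ in range(n_trials):
        series = []
        for yr in years:
            # One synthetic sample per site: imposed trend plus noise.
            samples = [true_trend * yr + rng.gauss(0.0, noise_std)
                       for _ in range(n_sites)]
            series.append(sum(samples) / n_sites)
        slopes.append(ols_slope(years, series))
    return statistics.stdev(slopes)


global_spread = trend_spread(n_sites=50)  # stand-in for a global average
sparse_spread = trend_spread(n_sites=2)   # stand-in for two measurement sites
print(f"trend spread, 50 sites: {global_spread:.4f}")
print(f"trend spread,  2 sites: {sparse_spread:.4f}")
```

An apples-to-apples version would instead subsample the model age field at the actual measurement locations and times before computing the trend and its uncertainty.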
2) As noted by the authors, the term Brewer-Dobson Circulation has been used in many ways in the literature. As I feel this paper will become a very important reference point for the BDC, I would urge the authors to set a tone of best practices, and always refer to w* as the residual circulation (or the diabatic circulation / mean overturning circulation).
An example where this would be helpful would be lines 323-4, where I think the authors mean to refer to changes in the residual circulation. Even though w* weakens in the southern hemisphere polar vortex (e.g. Figure 7), the age of air consistently decreases here. In the sense of tracer transport, then, the models are still suggesting an increase, even though w* has the opposite trend.
Note that I regret that I myself have used the terms loosely in the past! This is meant as a minor suggestion.
Minor suggestions by line number
2 consider "...in order to simulate surface climate variability and change."
12 I would have thought the BDC describes the transport of *mass*, heat, and trace gases. The difference between the net transport of mass vs. trace gases is a nice way to highlight the role of isentropic mixing.
20 Consider deleting "which accounts for zonal asymmetries and" so that this reads " and two-way mixing, the irreversible tracer transport..." My concern is that residual mean circulation depends fundamentally on eddies (in many regions, the "zonal mean" transport is in the opposite direction), and I wouldn't want a reader to think that eddies only matter for the mixing.
27 Consider "transport diagnostic that quantifies the elapsed time"
30 Linz et al. 2017 use AoA measurements to quantify the residual circulation. It might be fair to include a discussion of this paper here, or perhaps later on, in the discussion of observations. Linz et al. found that MIPAS SF6 age would imply huge problems with the reanalyses and models, or could reflect uncertainty in the lifetime of SF6.
uncertainty. There are only 8 models, so naively, you are saying 6/8 models must agree. But for the residual circulation, there are only 7, and AoA, only 5 at best. Perhaps you could say, we ask that at least 2/3rds of the models agree, which in practice meant 5 of 7 for diagnostics of the residual circulation, and 4 of 5 for the age of air.
141 consider "which quantifies the influence of"
165 It might be appropriate to also reference Linz et al. 2016, which makes this very explicit.
184 I think AoA converges faster not just because it has memory (integrating in time), but also because it integrates in space. The age at any point in the atmosphere depends not just on the local circulation, but on the integrated solution below. You could simply state "being an integrated quantity in both time and space."
212 Weaker trends in the tropics relative to the high latitudes are consistent with the acceleration of the residual circulation. As suggested by Linz et al. 2016, increasing the residual circulation should reduce the gradient in age; hence a stronger reduction of midlatitude age. This result was first established by Neu et al. 1999 with the leaky pipe!
219 I worry that variability at 30 years here is Gibbs ringing. The 30 year box car average used to compute the trends will amplify any variability at this frequency relative to others.
Figure 10. I am curious if the kink in resolved wave forcing c. 7 hPa is due to issues with one model, or an artifact of the vertical resolution of the data set (such that it shows up in all the models).
254-6 I had to reread this a few times. I gather that the contribution of NOGW is uniformly small at this level, but the role of OGW is more uncertain; it plays a significant role in 4 models, and hardly any at all in 3.
265-8 This is an interesting result, which is consistent with the suggestion of Oberlander-Hayn et al. (2016) that much of the trend at this level can be understood as a lifting of the climatological overturning circulation. I appreciate that this paper is making a similar argument to all the studies listed at lines 244-5, but I think the difference is the emphasis on why there is a trend. Downward control always makes one look above, while a lifting of the circulation points to the rise in the tropopause and the expansion of the troposphere in response to surface warming.
304 I'm not sure if I follow the argument here. The fact that the global vs. tropical sensitivity is different on interannual time scales (13a) suggests that it's naive to just consider the tropical SSTs.
312-3 consider "Consistent with previous multi-model studies, there remains a clear disagreement ..." (just to avoid agreement/disagreement in the same sentence!) More importantly, please consider discussing this mismatch in more detail, as I do not think it is yet an apples-to-apples comparison. It might also be good to provide references to the previous modeling studies, and the observational studies as well. (I know they are provided elsewhere, but it's helpful for people who focus on the conclusions to read the paper quickly.)
324 As noted in my general comment, might be good to say the residual circulation here, as opposed to BDC.
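Regarding the comment on line 219, one way to check the concern is to look at the frequency response of the trend calculation itself. The sketch below assumes, as my reading and not a statement of the authors' exact procedure, that the trends are least-squares slopes over a sliding 30-point window of annual values. It computes the amplitude of the resulting trend series per unit amplitude of a sinusoid of a given period. `WINDOW`, the period range, and the function names are illustrative choices.

```python
import cmath
import math


def trend_weights(window):
    """Least-squares slope weights for `window` equally spaced samples."""
    centre = (window - 1) / 2.0
    norm = sum((k - centre) ** 2 for k in range(window))
    return [(k - centre) / norm for k in range(window)]


def trend_response(window, period):
    """Amplitude of the sliding-trend series for a unit-amplitude sinusoid.

    The sliding slope is a linear filter, so its effect on a sinusoid is
    given by the magnitude of the filter's transfer function.
    """
    omega = 2.0 * math.pi / period
    weights = trend_weights(window)
    return abs(sum(w * cmath.exp(1j * omega * k)
                   for k, w in enumerate(weights)))


WINDOW = 30  # years; assumed sliding-window length
periods = list(range(5, 201))  # periods in years, illustrative range
response = {p: trend_response(WINDOW, p) for p in periods}
peak_period = max(response, key=response.get)

print(f"period with the largest trend response: {peak_period} years")
print(f"response at 30 years relative to 5 years: "
      f"{response[30] / response[5]:.1f}")
```

Under these assumptions the sliding trend acts as a band-pass filter: the response tends to zero for very long periods, is small for periods of a few years, and is largest for periods somewhat longer than the window. Variability near the window length is therefore passed far more strongly than interannual variability, which would be consistent with the apparent 30 year signal being at least partly an artifact of the trend calculation.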