Comment on acp-2021-570

This manuscript, “The role of anthropogenic aerosols in the anomalous cooling from 1960 to 1990 in the CMIP6 Earth System Models,” investigates the causes of the excessive mid-century surface air temperature (TAS) cooling in the CMIP6 Earth system model (ESM) ensemble relative to observations, which the authors have dubbed the “pot-hole” cooling (PHC). Internal variability does not explain the anomalous cooling. This study links the PHC bias to anthropogenic SO2 emissions (used as a proxy for all aerosols, because sulphates are the dominant aerosol in this period), which are much larger in the ESMs during the PHC period than in observations. The PHC is also most pronounced over the Northern Hemisphere midlatitude sulphate source regions during this period, further supporting the connection between the anomalous cooling in the models and exaggerated aerosol emissions over North America, East Asia, and Europe within the ESMs. The PHC is further attributed to differences among the ESMs in their sensitivity to changes in aerosol loading, measured through the impact of aerosol changes on outgoing shortwave radiation at the top of the atmosphere (OSR); the authors call this the aerosol-forcing-sensitivity. Here, “change” refers to the difference between each ESM's historical simulation and its hist-piAer simulation, which is identical to the historical simulation except that aerosol emissions are held fixed at preindustrial levels. The impact of aerosols on cloud amount in particular was found to be the major driver of inter-model spread in aerosol-forcing-sensitivity, and thus in the PHC.

This manuscript would benefit from a sharper focus on the main results and somewhat less attention to detail, except where details are needed to describe and support the main conclusions. It is easy to get lost in the descriptions of results and lose sight of the main takeaways, and some sections, such as the paragraph beginning at Line 282, would benefit from more concise wording. The section describing Figures 2-4 could also be shortened. Fig. 4 does not seem to add new information that is critical to the conclusions and could be dropped from the manuscript. Fig. 2 likewise may not be necessary, or could be replaced by an additional subplot in Fig. 1 showing the time series of SO2 emissions or sulphate loading (at least for the ESMs). Fig. 3 seems sufficient to link the PHC spatially to the centers of aerosol emissions, and contours of anthropogenic SO2 emissions could be added to it as they were to Fig. 2. It should also be made clearer that the lower-complexity models in these plots support the ESM results concerning exaggerated sulphate loading relative to the observations.

The larger issue in this paper is the formulation of the aerosol-forcing-sensitivity and its decomposition into aerosol-radiation interactions (ARI) and aerosol-cloud interactions (ACI). Lines 160-162 and 594-596, for example, state or imply that the impact of differences in aerosol amount within the ESMs (overestimated aerosol loading) and the impact of the ESM response to changes in aerosol amount (aerosol-forcing-sensitivity) have been separated from each other and their effects on the temperature response quantified. However, the manuscript does not clearly do so, nor does it clearly justify the decomposition. The aerosol-forcing-sensitivity, defined as ΔOSR/ΔloadSO4, is clearly useful, as shown for example by its high correlation with the change in temperature per unit change in sulphate loading in Fig. 6c. However, this does not seem to translate easily into a high correlation with the PHC (Fig. 7a), which is the main focus of this analysis. Indeed, Fig. 7b seems to show the opposite of the main conclusions of this manuscript: the PHC difference between the historical and hist-piAer experiments is much more strongly correlated with the change in sulphate between the historical and hist-piAer simulations, while the aerosol-forcing-sensitivity in Fig. 7a does not seem to show the negative correlation claimed in Line 433. In other words, the aerosol-forcing-sensitivity does not seem to explain the inter-model spread in the PHC bias. The correlations in Figure 8 of OSR_clearsky and total cloud fraction with sulphate loading, in combination with Figures 2-4, point clearly to the impact of the overestimated aerosol loading in the models, but they do not separate this from a forcing sensitivity. The separation between these two aerosol impacts (concentrations and forcing impacts) needs to be further developed and justified before the conclusions can be considered firm, or the text and figures need to be clarified if the separation is already sufficiently developed.
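For clarity, the quantities at issue can be written as follows. The notation is mine, based on the definitions summarized at the start of this comment; the symbol S_aer is introduced here for brevity and is not the authors' notation.

```latex
\Delta X \equiv X_{\mathrm{historical}} - X_{\mathrm{hist\text{-}piAer}}, \qquad
S_{\mathrm{aer}} \equiv \frac{\Delta \mathrm{OSR}}{\Delta \mathrm{loadSO4}}
\quad\Longrightarrow\quad
\Delta \mathrm{OSR} = S_{\mathrm{aer}}\,\Delta \mathrm{loadSO4}
```

Because ΔOSR is, by construction, the product of the sensitivity and the loading change, inter-model spread in ΔOSR (and in the associated cooling) can arise from either factor. Attributing the PHC spread to S_aer would therefore seem to require showing that its variation across models outweighs that of ΔloadSO4, which Fig. 7 does not appear to show.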
This concern carries over to the decomposition of the aerosol-forcing-sensitivity into ARI and ACI terms in Equation 1 and in the appendix. The variables used in this formula (OSR, SO4 loading, and cloud amount) do not appear to be independent of each other (see Fig. 8), yet they are treated as independent variables. This raises doubts about the validity of the linear decomposition presented here, and makes Equation 1 look like an over-fitted regression of the overestimated aerosol concentrations onto the radiative fluxes in the models. The manuscript would also benefit from a much clearer explanation of the origin of the terms in the appendix, and hence in Equation 1: which term corresponds to which process, and why each is included at each step of the derivation. This would also help show how Equation 1 differs from simply the effect of overestimated aerosol loading, and how ARI and ACI are distinguished from each other. Again, this needs to be further developed, or more clearly explained and justified, to firm up the conclusions of this manuscript.
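One simple diagnostic that might help address this concern (my suggestion, not something in the manuscript) is to report the inter-model correlation between the predictors entering Equation 1, for example the historical minus hist-piAer changes in SO4 loading and in cloud amount, together with the corresponding variance inflation factor (VIF). This assumes the ARI and ACI terms are estimated by some form of regression across models; I am not certain that is how they are obtained. A minimal sketch in Python, with hypothetical function names and made-up numbers for illustration only:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples (e.g. one value per ESM)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def variance_inflation(x: Sequence[float], y: Sequence[float]) -> float:
    """VIF for either predictor in a two-predictor linear regression: 1 / (1 - r^2).

    Values well above roughly 5-10 are commonly read as a sign that the
    separate regression coefficients are poorly constrained.
    """
    r = pearson_r(x, y)
    if abs(r) >= 1.0:
        return math.inf  # perfectly collinear: coefficients not identifiable
    return 1.0 / (1.0 - r * r)


# Hypothetical per-model changes (historical minus hist-piAer); made-up numbers.
d_load_so4 = [1.0, 2.0, 3.0, 4.0, 5.0]
d_cloud = [0.9, 2.1, 2.9, 4.2, 4.8]
print(pearson_r(d_load_so4, d_cloud), variance_inflation(d_load_so4, d_cloud))
```

If the VIF for the manuscript's actual predictors turned out to be large, that would support the concern that the ARI and ACI terms are not separately identifiable; if it is small, reporting it would help justify the decomposition.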

Specific Comments:
Lines 260-263: Has volcanic forcing been left in? If so, how are other major eruptions within the 1960-1990 period, such as Agung, treated?
Lines 165-167: Do differences between the two simulations in planetary albedo, clear-sky albedo, etc. need to be accounted for when decomposing the aerosol-forcing-sensitivity? Do they complicate the interpretation of the results presented in this paper, and why or why not?
Lines 170-173: The reference to Wilcox et al. (2015) seems out of place and needs to be more clearly related to the methods and results discussed here.
Lines 394-398: This largely repeats Lines 260-263, so the sentence is unnecessary here.
Line 401: Is there any indication of why this is the case for UKESM, or are the reasons still unknown?
Lines 526-563: What about the correlations for the other two models that provided effective droplet radius output? It would also be very interesting to contrast UKESM with MPI, since they have similar PHC biases but different aerosol-forcing-sensitivities. How much does the sensitivity matter if the two models produce the same temperature bias with different sensitivities? Or is the difference due to ARI or ACI (or simply to differences in aerosol loading)?

Technical Corrections:
Line 151: A second closing parenthesis is needed.
Line 151: The variable loadSO4 should be added to Table 1.
Line 164: The "is" is not needed.
Line 190: Should say "be estimated".
Line 628: Should say "dominant".
Fig. 6: The caption should say "Scatter plots".