Comment on acp-2021-654

1) The first novel aspect of the paper is the way in which volcanic SO2 is injected in the model. Previous studies have used a "point-source" approach, with SO2 injected in one model column over a range of altitudes, and a few studies also injecting over a range of latitudes for Pinatubo. In this study, the authors instead inject a "plume" consistent with spatially resolved satellite observations. First, I think that this novel aspect is not highlighted enough in the introduction and throughout the text, and it could be one of the key points of the manuscript. I also find your new method to be poorly explained and justified, in particular in section 5. On line 264, you say that the total amount of SO2 is calculated by integrating the SO2 profile, but you then mention that you add a 3-dimensional perturbation to the model, which confused me. In section 5, you also don't clearly state how these 3D plumes are obtained. My understanding from sections 3/4/5 is that:
- For each eruption, 3D SO2 plumes are obtained from time-averaged SO2 observations between the 8th and 17th day following the eruption.
- The 3D plumes, obtained from measurements 8-17 days after the eruption, are injected at the time of the eruption.
- The 3D plumes are injected at latitudes consistent with the measurements taken, but centered on the longitude of the volcano.
Did I get this right? It all needs to be crystal clear and more detailed in the text, as this is key to your method and a very unusual approach. You need to justify these choices better and show sensitivity tests for a large and a small eruption (or ideally a full 1990-2019 simulation) showing how this differs from a standard "point" injection at the volcano location/plume height with a mass of SO2 corresponding to the initial total SO2 (not the SO2 after 8-17 days). Such tests seem really critical to demonstrate that your proposed method is better than standard methods; otherwise any related claim is unfounded.
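As an illustration of the sensitivity test requested above, the following is a minimal sketch contrasting a single-cell "point" injection with a pattern-weighted "plume" injection. It is not the authors' implementation: the toy grid, the pattern values, both SO2 masses and all function names are hypothetical and chosen only to make the contrast explicit.

```python
# Illustrative only: toy grid and masses are hypothetical, not the manuscript's setup.

def empty_field(nlat, nlon, nlev):
    """Return a 3D list [lat][lon][level] filled with zeros (SO2 mass per cell, Tg)."""
    return [[[0.0] * nlev for _ in range(nlon)] for _ in range(nlat)]


def point_injection(shape, cell, mass_tg):
    """Standard approach: put the whole SO2 mass into one grid cell at the volcano."""
    field = empty_field(*shape)
    i, j, k = cell
    field[i][j][k] = mass_tg
    return field


def plume_injection(observed_pattern, mass_tg):
    """Plume approach: spread mass_tg over the grid in proportion to an observed 3D pattern."""
    total = sum(v for plane in observed_pattern for row in plane for v in row)
    if total <= 0:
        raise ValueError("observed pattern must contain a positive amount of SO2")
    return [[[mass_tg * v / total for v in row] for row in plane]
            for plane in observed_pattern]


def total_mass(field):
    """Sum the SO2 mass over all cells."""
    return sum(v for plane in field for row in plane for v in row)


def occupied_cells(field):
    """Count the cells that receive any SO2."""
    return sum(1 for plane in field for row in plane for v in row if v > 0)


# Hypothetical observed pattern on a 2 x 3 x 2 (lat, lon, level) grid, arbitrary units.
pattern = [
    [[0.0, 1.0], [2.0, 1.0], [0.0, 0.0]],
    [[0.0, 0.0], [3.0, 1.0], [0.0, 0.0]],
]
initial_mass_tg = 1.5      # hypothetical total SO2 shortly after the eruption
late_window_mass_tg = 0.6  # hypothetical SO2 still observed 8-17 days later

point = point_injection((2, 3, 2), cell=(0, 1, 0), mass_tg=initial_mass_tg)
plume = plume_injection(pattern, mass_tg=late_window_mass_tg)

print("point:", total_mass(point), "Tg in", occupied_cells(point), "cell(s)")
print("plume:", total_mass(plume), "Tg in", occupied_cells(plume), "cell(s)")
```

The sketch separates the two choices that the comment asks to be tested: where the SO2 is placed (one cell versus an observed pattern) and how much is injected (initial mass versus the mass remaining in the later window).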
One of the main justifications you provide for your injection strategy is that it removes any tropospheric SO2 that is not climatically relevant, but: i) you already only consider SO2 above a threshold height (which is not justified; e.g. why 14 km in the tropics instead of the tropopause height? If it's because of radiative heating and lofting, where does the threshold come from?), so why do you need further processing to remove potential "short-lived" SO2?; ii) the SO2 e-folding time is on the order of days to weeks (Carn et al. 2016, Fig. 14). Even for stratospheric SO2, one would expect a significant amount of SO2 to be already converted to aerosol by the end of your 8-17 day time window, in particular for lower stratospheric injections. So would your method not result in a large underestimation of the SO2 amounts injected? I can see reasons why your method could make sense, e.g. fast SO2 scavenging by ash during the first days to weeks, but I think it is still not justified enough in the paper. More importantly, you need to show a comparison between your approach and a standard point injection with the full SO2 mass to be able to really discuss the strengths and weaknesses of your strategy.
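To put a rough number on point ii), the sketch below assumes that SO2 is lost by pure first-order decay with a single e-folding time, ignoring transport, scavenging and any change of the loss rate with time. Under that assumption the mean fraction of the initial SO2 still present during a measurement window has a closed form. The function names and the set of e-folding times are illustrative assumptions; the 14-day value corresponds to the conversion time of about 2 weeks that the manuscript itself quotes.

```python
import math


def remaining_fraction(t_days, efold_days):
    """Fraction of the initial SO2 left after t_days, assuming pure first-order decay."""
    return math.exp(-t_days / efold_days)


def window_mean_fraction(start_day, end_day, efold_days):
    """Time mean of exp(-t/tau) over [start_day, end_day], from the analytic integral."""
    if end_day <= start_day:
        raise ValueError("end_day must be later than start_day")
    tau = efold_days
    integral = tau * (math.exp(-start_day / tau) - math.exp(-end_day / tau))
    return integral / (end_day - start_day)


# Assumed e-folding times in days; 14 matches the "about 2 weeks" conversion time.
for tau in (7.0, 14.0, 28.0):
    frac = window_mean_fraction(8.0, 17.0, tau)
    print(f"tau = {tau:4.0f} d: mean fraction in days 8-17 = {frac:.2f}, "
          f"scale-up to initial mass = {1.0 / frac:.1f}x")
```

With a 14-day e-folding time, the mean fraction over days 8-17 comes out at about 0.42, so under these simplifying assumptions the window-averaged mass would be less than half of the mass initially injected.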
2) Overall, your paper really lacks comparison with existing work, including that from Bruhl et al. (2015), and many key references are missing. As an example, on lines 245-247, you suggest that your SO2 mass estimates will be very different from those in the dataset by Carn et al. (2016). Why not show a figure, at least in the SI, comparing SO2 masses and heights for all events in common? This would be really informative. Regarding your simulations, you do not mention at all the work by Schmidt et al. (2018), who conducted exactly the same type of simulations, albeit with a different SO2 inventory and model. Citing it seems critical, and some of their time series (SAOD, radiative forcing) are likely available and could be compared to your model, which would really improve the discussion. Also, it would have been nice to see a comparison of your new simulations with the previous model version/inventory used by some of the co-authors (Bruhl et al. 2015) to get a sense of whether there is improved agreement with observations. Last, you compare your simulations with observations from multiple satellite instruments, which is welcome, but I was under the impression that the GloSSAC dataset (built using some of the data you use) is now the reference for the community (at least for CMIP6 forcing). Could you add a comparison to GloSSAC?
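The event-by-event inventory comparison suggested above could start from something as simple as the sketch below. It assumes each inventory can be reduced to a mapping from (volcano, eruption date) to SO2 mass in Tg; that keying, the function name and all entries are placeholders for illustration and do not come from the manuscript or from Carn et al. (2016).

```python
def compare_inventories(ours, reference):
    """Pair up events found in both inventories and report the SO2 mass ratio.

    Both arguments map an event key (volcano, eruption date) to SO2 mass in Tg.
    """
    rows = []
    for event in sorted(set(ours) & set(reference)):
        mass_ours = ours[event]
        mass_ref = reference[event]
        ratio = mass_ours / mass_ref if mass_ref > 0 else float("nan")
        rows.append((event, mass_ours, mass_ref, ratio))
    return rows


# Placeholder entries only; these are NOT values from either inventory.
ours = {("Volcano A", "2000-01-01"): 0.10, ("Volcano B", "2005-06-15"): 0.40}
reference = {("Volcano A", "2000-01-01"): 0.20, ("Volcano C", "2010-03-03"): 1.00}

for (volcano, date), m_ours, m_ref, ratio in compare_inventories(ours, reference):
    print(f"{volcano} {date}: ours={m_ours} Tg, reference={m_ref} Tg, ratio={ratio:.2f}")
```

In practice, matching events by exact date is likely too strict, since inventories may date the same eruption differently; a tolerance of a few days would be a natural refinement.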

Minor comments
Title: I think the title does not convey clearly enough the novelty of the new injection method; consider replacing "vertically-resolved satellite measurements" by something else? Maybe "Reconstructing volcanic forcing since 1990 using a comprehensive volcanic emission inventory and spatially resolved sulfur injection in a chemistry-climate model"? Your 3D plumes are not just vertically resolved?
Abstract: the long list of satellite instruments and their acronyms is not needed in an abstract? I find that the abstract does not highlight enough the novel and extensive character of the SO2 emission inventory nor the 3D plume injection method.
Abstract, lines 17-20: you say that your results "show" and that eruptions "are found to"; I would instead say that you "confirm" these results, as this has been shown by Schmidt et al. (2018)?
Line 36: is it important to specify at which level it affects Earth radiative balance? If so also mention surface level in addition to TOA and tropopause.
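Line 38: Multiple papers discuss how climate-volcano feedbacks could modulate future volcanic forcing though, and it may be a good place to mention it? See e.g. Swindles et al. (2017) (deglaciation effect on eruption frequency), Fasullo et al. (2018) (modulation of volcanic influence on surface temperature by changes in ocean stratification), Aubry et al. (2021) (impact of climate change on the volcanic stratospheric sulfate aerosol cycle).
Line 40: unless I misunderstand, I guess you are talking about (mostly CMIP5) simulations that did not account for this forcing? Many model studies have accounted for this forcing since then, including CMIP6 historical simulations that use GloSSAC, or e.g. Mills et al. (2016) and Schmidt et al. (2018)?
Line 43/44: please add references.
Line 46: do you mean "overlooked" instead of "underestimated"? If not, what was underestimated? Their radiative forcing? But does it not contradict the previous sentence?
Line 50: The SO2 emission and time-averaged volcanic forcing of degassing volcanoes and small eruptions are one order of magnitude larger than those of eruptions associated with stratospheric SO2 injections (e.g. Schmidt et al. 2012, Carn et al. 2016). So clarify what you mean by "smaller natural source of aerosols", as this seems wrong as written.
Introduction: I think the work of Mills et al. (2016) and Schmidt et al. (2018) (not cited) needs to be discussed more given the strong similarities with your study. Also, you don't mention ISA-MIP at all (Timmreck et al. 2018), whereas your simulations are obviously relevant to this MIP?
Introduction: Also see major comment 1: the two main novelties of your study are overall not motivated in your introduction (i.e. the new injection strategy and the improved SO2 inventory).
Section 2: could you group satellite instruments in terms of those used to constrain SO2 inputs in your model vs those used to evaluate the output of the model simulations? This would add a lot of clarity to this section. Also, why not use GloSSAC (Thomason et al. 2018, 2020)?
Line 119-120: as said in my major comment, I think you need to discuss the strengths and limitations of choosing such a time window, and in particular how it compares to the SO2 e-folding time and the fact that choosing this time window may result in neglecting a large portion of SO2 converted to aerosols (even though I understand the argument that an earlier time window could yield SO2 estimates that include SO2 that will be rapidly scavenged by co-injected ash or hydrometeors; but this all needs to be discussed carefully). Sensitivity tests for this time window and an understanding of its impact on your SO2 estimates would be welcome.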
Line 137: again this time window needs to be justified better. Also, I'm not at all a remote sensing expert, but I think it's the first time I see SO2 estimated from extinction coefficients at visible wavelengths? Is that a standard method? How is the effect of SO2 on radiation properties isolated from that of other species, in particular sulfate aerosols? It may be a standard technique that I'm not aware of, but it would be good to clarify.
Line 139: My understanding here is that you are saying that if there is a data gap during the peak perturbation, you scale up by an arbitrary factor to recover a reasonable peak value? How is that factor chosen? There is absolutely no explanation nor reference and it may deserve dedicated SI plots?
Line 139 and 170: about data gaps and how to treat them, I'm just wondering why you are not using GloSSAC, where the same problem had to be addressed and which is the reference dataset for the community? I understand you can't use it for SO2, but surely for aerosol properties it would make sense? The fact that major initiatives such as GloSSAC or ISA-MIP are not mentioned is a bit surprising.
Line 236: no apostrophe needed for Global Volcanism Program.
Line 241: The tropopause altitude varies between ca. 8-9 and 16-17 km depending on latitude and season, so why not use the model's diagnostic tropopause instead of the three thresholds used? Justify rigorously why you consider a threshold way below the tropopause height in the tropics but potentially way above it at high latitudes. Also, why do you need to mask tropospheric SO2? Would your model not account for the fact that tropospheric aerosol would have minimal impact on climate? I get that you don't want an overlap between the tropospheric and stratospheric volcanic SO2 inventories, but does the tropospheric SO2 inventory really account for emissions as high as 12-14 km, or is it only passively degassing volcanoes?
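The concern about fixed thresholds can be illustrated with a small sketch. It compares which levels of a tropical column count as "stratospheric" under the 14 km threshold questioned in this review and under a tropopause height. The level spacing and the 16.5 km tropopause (picked from the 16-17 km range quoted above) are assumptions for illustration, and the function is a hypothetical helper rather than a model diagnostic.

```python
def stratospheric_levels(level_heights_km, boundary_km):
    """Return the model levels (km) lying above the chosen troposphere/stratosphere boundary."""
    return [z for z in level_heights_km if z > boundary_km]


# Hypothetical 1-km level spacing in a tropical column.
levels_km = [float(z) for z in range(10, 26)]

fixed_threshold_km = 14.0      # tropical threshold questioned in the review
tropical_tropopause_km = 16.5  # assumed value within the ca. 16-17 km range quoted above

by_threshold = stratospheric_levels(levels_km, fixed_threshold_km)
by_tropopause = stratospheric_levels(levels_km, tropical_tropopause_km)

# Levels treated as stratospheric by the fixed threshold but tropospheric by the tropopause.
misclassified = [z for z in by_threshold if z not in by_tropopause]
print("levels between threshold and tropopause (km):", misclassified)
```

Any SO2 placed on the levels listed would be injected as "stratospheric" while sitting below the assumed tropopause, which is the overlap the comment asks the authors to justify.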
Lines 243-247: see my major comment #1.
Table 2: this table really must be made available as a csv file or something that researchers can download and read in scientific programming software. Remove the table from the body of the text as it is way too big.
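To show why a machine-readable table matters, here is a minimal sketch of how a reader could load such a file with the Python standard library. The column names and the two rows are hypothetical placeholders; the real layout of Table 2 may differ.

```python
import csv
import io

# Hypothetical column names and placeholder rows; the real Table 2 layout may differ.
CSV_TEXT = """volcano,date,latitude_deg,so2_tg
Volcano A,2000-01-01,-7.5,0.10
Volcano B,2005-06-15,15.1,0.40
"""


def read_inventory(handle):
    """Parse an emission inventory CSV into a list of dicts with numeric fields converted."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "volcano": record["volcano"],
            "date": record["date"],
            "latitude_deg": float(record["latitude_deg"]),
            "so2_tg": float(record["so2_tg"]),
        })
    return rows


inventory = read_inventory(io.StringIO(CSV_TEXT))
print(len(inventory), "events, total SO2 =", sum(r["so2_tg"] for r in inventory), "Tg")
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by an open file handle, e.g. `open(path, newline="")`.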
Lines 257-264: see my major comment #1. While I think this is at the moment poorly explained and that you have to show analyses demonstrating the advantages and challenges of this injection method, I do think that it is one of the most novel and important aspects of the paper (combined with your inventory) and that it should be highlighted and motivated a lot more.
Line 275-276: you either need a reference backing this claim or data analysis to support it (e.g. does the GVP database have a comparable number/frequency of VEI 3-5 eruptions during 1991-2002 relative to 2002-present day? Or was it really a more quiescent period?).
Lines 293-294: a brief comparison with observations in Carn et al. (2016) would be welcome here (I think they suggest an even lower UTLS e-folding time). Also, you say yourself here that the conversion time is about 2 weeks, which seems to strongly undermine your chosen 8-17 day time window to constrain SO2 emission from satellites?
Line 303-304: please clarify what you mean by "feedback to atmospheric dynamics" and cite appropriate references.
Line 309-310: the reader has to look at three different figures and compare them to verify this statement. It would be much better if you could present equivalent observation and model plots in the same figure, in different panels. This would greatly facilitate model-observation comparisons.
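For the check suggested under Line 275-276, a minimal sketch of the frequency comparison follows. It assumes the eruption catalogue has been reduced to (year, VEI) pairs; the records shown are placeholders rather than GVP data, and 2020 is an arbitrary stand-in for "present day".

```python
def eruptions_per_year(records, start_year, end_year, vei_min=3, vei_max=5):
    """Mean number of eruptions per year with vei_min <= VEI <= vei_max in [start_year, end_year)."""
    if end_year <= start_year:
        raise ValueError("end_year must be greater than start_year")
    count = sum(
        1 for year, vei in records
        if start_year <= year < end_year and vei_min <= vei <= vei_max
    )
    return count / (end_year - start_year)


# Placeholder (year, VEI) pairs; NOT taken from the GVP database.
records = [(1992, 3), (1994, 2), (1996, 4), (2003, 3), (2006, 4),
           (2008, 4), (2011, 5), (2015, 2)]

early = eruptions_per_year(records, 1991, 2002)
late = eruptions_per_year(records, 2002, 2020)
print(f"1991-2002: {early:.2f} per year; 2002-2020: {late:.2f} per year")
```

Comparing the two rates (ideally with an uncertainty estimate, given the small counts) would show whether 1991-2002 was genuinely quieter or only less well observed.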
Line 326: the vast majority of studies use SAOD at 550 nm like you (e.g. Schmidt et al. 2018), and also at 1020 nm (e.g. Aubry et al. 2021), which is another standard wavelength for some instruments? So this statement seems really not justified and should be removed or toned down.
Line 331: clarify that the AOD of 0.4 is in the tropics and isn't a global mean value.
Line 334: There could be other factors explaining model-observation differences in the post-Pinatubo period, including flaws in the model (as evident from the different decay timescales) and uncertainty in the SO2 mass, or at least the "climatically relevant" portion of it (you use 17 Tg, other studies use as little as 10 Tg, which should be briefly discussed; see Zhu et al. 2021, Mills et al. 2016, Schmidt et al. 2018).
Line 337: unless major eruptions are missing, is it really likely that imperfections in your inventory explain the large SAOD differences over 1993-1996?
Figure 11: it may be better to show horizontal bars (with a length of 1 year) instead of green crosses, as these are time-averaged measurements and it would facilitate comparison with your high-resolution output?
Figure 11: Here and in Figures 9 and 10, could you not show for comparison the simulations from at least Bruhl et al. (2015) and maybe Schmidt et al. (2018), assuming their data are available with the paper? Discussing the differences would really improve the discussion.
Legend of Figure 11: specify the time resolution of the ERBE data. Is there no other observational estimate of radiative forcing to complement the observations shown? E.g. CERES data?
Line 354: "previous studies" -> show their data and discuss the comparison? On that note, making sure that your key outputs (SAOD/radiative forcing time series) are easily available is important, and I don't think it's the case yet? Key outputs should not be made "available upon request" but should ideally be provided as SI or in a data repository.
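Line 359: For reference, can you indicate the SO2 mass for Merapi used in your and other (e.g. Carn et al. 2016) inventories? Overall, it would be really useful to have a comparison of your inventories with other standard ones, in particular those used in ISA-MIP (Timmreck et al. 2018). Another potentially useful reference, showing how different inventories affect the SAOD prediction by a simple model, is Aubry et al. (2020) (see Figure 8 there).
Figure 12: could you discuss how these results compare with recent studies, e.g. Rieger et al. (2020) or Stocker et al. (2019)?
Section 7: Overall I find that some of the most natural lines of discussion (and accompanying analyses) are completely missing, including: i) comparison of your new inventory with other ones, including Carn et al.; ii) comparison of your new simulations with other equivalent ones, including Schmidt et al. (2018) and Bruhl et al. (2015); iii) discussion of how your 3D-plume injection strategy compares to a point injection.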
Line 385: provide numbers (e.g. latitude resolution at equator) that make it easier for the reader to understand the difference between these resolutions.
Line 410: Missing reference?
Line 429-430: Zhu et al. (2020) should be cited here.
Line 429-432: On model differences/setup and how they may affect simulated aerosol properties, Clyne et al. (2021) is an important reference and should be discussed here and elsewhere.
Lines 448-455: This whole paragraph doesn't acknowledge the contribution of previous studies when most of the statements made are not really new. First, maybe you should refer to the AR6 report now that it is out instead of the AR5? Second, for the radiative forcing estimate, the contribution of Schmidt et al. (2018) should be acknowledged and you should compare in detail your forcing estimates to theirs. Third, for temperature effects, you should cite the papers by Santer and co-authors (2014, 2015) and Schmidt et al. (2018). I personally think that the novel aspects of your paper would be highlighted better if you ended it on key points related to the new inventory and the 3D plume injection method.