“ Evaluating the simulated radiative forcings , aerosol properties and stratospheric warmings from the 1963 Agung , 1982 El Chichón and 1991 Mt Pinatubo volcanic aerosol clouds

Accurate quantification of the effects of volcanic eruptions ✿✿✿✿✿✿✿✿✿ Accurately ✿✿✿✿✿✿✿✿✿✿ quantifying ✿✿✿✿✿✿✿ volcanic ✿✿✿✿✿✿✿ impacts on climate is a key requirement for better ✿✿✿✿✿ robust attribution of anthropogenic climate change. Here we use the UM-UKCA compositionclimate model to simulate the global dispersion of the volcanic aerofsol ✿✿✿✿✿✿ aerosol ✿ clouds from the three largest eruptions of the 20th century: 1963 ✿✿✿ Mt. Agung, 1982 El Chichón and 1991 ✿✿✿ Mt. ✿ Pinatubo. The model has interactive stratospheric chemistry and aerosol microphysics, with coupled aerosol-radiation interactions for realistic composition-dynamics feed5 backs. Our simulations align with the design of the Interactive Stratospheric Aerosol Model Intercomparison (ISA-MIP) ”Historical Eruption SO2 Emissions Assessment”. For each eruption, we perform 3-member ensemble model experiments with ✿✿ for ✿ upper, mid-point and lower estimates for ✿✿ of SO2 emission, each initialised to a meteorological state to match the observed phase of the quasi-biennial oscillation (QBO) at the times of the eruptions. We assess how each eruption’s emitted SO2 evolves ✿✿✿✿✿✿✿✿✿ translates into a tropical reservoir of volcanic aerosol and analyse the subsequent dispersion to 10 mid-latitudes. We compare the simulations to the three ✿✿✿✿✿✿✿ different ✿ volcanic forcing datasets used in historical integrations for the two most recent Coupled Model Intercomparison Project (CMIP) assessments: the Global ✿✿✿✿ (e.g. ✿ Space-based Stratospheric Aerosol Climatology (GloSSAC)for CMIP6, and the , ✿ Sato et al. (1993) and Ammann et al. (2003)datasets used in CMIP5. We also ✿ ) ✿✿✿✿ that ✿✿✿ are ✿✿✿✿✿ used ✿✿ in ✿✿✿✿✿✿✿✿✿ historical ✿✿✿✿✿✿✿✿✿✿ integrations ✿✿✿ for ✿✿✿✿ the ✿✿✿✿ two ✿✿✿✿✿✿ recent ✿✿✿✿✿✿✿✿ Coupled ✿✿✿✿✿✿ Model ✿✿✿✿✿✿✿✿✿✿✿✿✿✿ Intercomparison ✿✿✿✿✿✿✿ Project ✿✿✿✿✿✿✿ (CMIP) 15 ✿✿✿✿✿✿✿✿✿✿✿✿ assessments. ✿✿✿ We ✿ assess the vertical extent of the volcanic aerosol clouds by comparing simulated extinction to Stratospheric Aerosol and Gas Experiment II (SAGE II) v7.0 satellite aerosol data ✿✿✿✿✿✿✿✿ extinction ✿✿✿✿✿✿✿✿✿✿✿✿✿ measurements ✿ (1985-1995) for Pinatubo and El Chichón, and to 1964-65 northern hemisphere ground-based lidar measurements for Agung. As an independent test for the simulated volcanic forcing after Pinatubo, we also compare to the shortwave (SW) and longwave

consists of zonal-mean stratospheric AOD at 550 nm (sAOD 550 ) and column effective radius (Reff). The CMIP5 modelling groups used different approaches to apply this across the spectral wavebands of their model's radiative transfer module and to 90 redistribute the total optical thickness into their model vertical levels (e.g. Driscoll et al., 2012). Stenchikov et al. (1998) also constructed a forcing dataset for Pinatubo that included the variation in the forcings across wavebands in the SW and LW, combining SAGE II and Stratospheric Aerosol Measurement (SAM) II (McCormick, 1987) aerosol extinctions, as well as data from the Improved Stratospheric and Mesospheric Sounder (ISAMS) Grainger et al., 1993;Lambert et al., 1997), the Cryogenic Limb Array Etalon Spectrometer (CLAES) (Roche et al., 1993), 95 AVHRR, lidar and balloon observations.
Since then, a large number of chemistry-climate models (CCMs) have been developed, and applied to improve our understanding of past stratospheric change. Several co-ordinated hindcast integrations with the CCMs were designed and carried out via activities such as CCMVal (Eyring et al., 2005(Eyring et al., , 2008Morgenstern et al., 2010) and CCMI (Eyring et al., 2013;Morgenstern et al., 2017), with each of the models using different methods to include stratospheric heating from volcanic aerosol clouds, so 100 as to represent volcanically-forced changes in stratospheric trace species. Some CCMs prescribed pre-calculated zonal mean heating rate anomalies (e.g. Schmidt et al., 2006), whilst other derived the heating from prescribed aerosol datasets, either the 2-D GISS sAOD 550 data set or from 3-D prescribed aerosol surface area density (SAD). SPARC (2010, Chap. 8) presented a detailed analysis of lower stratospheric warming in CCMVal-2 simulations following Pinatubo eruptions, that showed a broad range in the simulated lower stratospheric warming (from 0 to 4 K) with SAD-derived warming tending to over-predict the 105 effect.
Another important volcanic forcing dataset is that from Ammann et al. (2003, hereafter, Ammann data), which was produced via a simple parameterisation for the dispersion of the volcanic aerosol from a specified number of major tropical eruptions determined by the seasonal cycle in the Brewer Dobson circulation. The peak aerosol optical depth for each eruption was scaled to match estimates of maximum aerosol loading from Stothers (1996); Hofmann and Rosen (1983b); Stenchikov et al. (1998), 110 assuming a fixed particle size distribution (Reff = 0.42 micron).
Recently, Arfeuille et al. (2014) created the most up-to-date volcanic forcing dataset to to enable models to include aerosolradiation interactions (aerosol optical properties) consistently with the additional heterogenous chemistry occurring on volcanic aerosol particles. This comprises three datasets, two for SW and LW aerosol optical properties, for each model to map the aerosol onto the wavebands in the radiative transfer module of the host climate model (see Luo, 2016). For the heterogeneous 115 chemistry, a third dataset of SAD was provided, the original version known as the 4λ dataset. An updated version of this dataset (3λ dataset) was produced specifically for the CMIP6 simulations (see ftp://iacftp.ethz.ch/pub_read/luo/CMIP6/). All three datasets were generated from simulations with a 2-D interactive stratospheric aerosol microphysics model (AER), including 26 separate eruptions for the 1600-2013 time period.
Here we analyse volcanic forcing experiments with the Unified Model -United Kingdom Chemistry and Aerosol (UM-120 UKCA) composition-climate model, which has interactive stratosphere-troposphere chemistry and aerosol microphysics. The model experiments simulate the volcanic aerosol clouds, and associated radiative forcings, from the three largest tropical eruptions over the past century: Mt. Agung (March 1963), El Chichón (April 1982) and Mt. Pinatubo (June 1991). Aligning with 4 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License. the design of the Interactive Stratospheric Aerosol Model Inter-comparison Project (ISA-MIP) co-ordinated multi-model ''Historical Eruption SO 2 Emissions Assessment" (Timmreck et al., 2018), we have carried out 3-member ensembles of simulations 125 with each of upper, low and mid-point best estimates for SO 2 injection for each eruption. Simulated aerosol properties of the volcanic aerosol plume are compared to range of observation-based datasets.
The UM-UKCA experiments includes the online radiative effects from both tropospheric as well as stratospheric aerosol simulated with same interactive aerosol microphysics module. There several important improvements in aerosol microphysics module since the original Pinatubo analysis presented in Dhomse et al. (2014), that are discussed in Brooke et al. (2017);130 Marshall et al. (2018,2019); Yoshioka et al. (2019). Section 3 provides the specifics of the model experiments, with section 4 describing the observational datasets. Model results are given in Section 5. Key findings and conclusions are presented in Section 6.

Model Experiments
We use the Release Job 4.0 (RJ4.0) version of the UM-UKCA composition-climate model (Abraham et al., 2012), which 135 couples the Global Atmosphere 4.0 configuration (Walters et al., 2014, GA4) of the UK Met Office Unified Model (UM v8.4) general circulation model with the UK Chemistry and Aerosol chemistry-aerosol sub-model (UKCA). The GA4 atmosphere model has a horizontal resolution of 1.875 • × 1.25 • (N96) with 85 vertical levels from the surface to about 85 km. The RJ4.0 configuration of UM-UKCA adapts GA4 with aerosol radiative effects from the interactive GLOMAP aerosol microphysics scheme and ozone radiative effects from the whole-atmosphere chemistry that is a combination of the detailed stratospheric 140 chemistry and simplified tropospheric chemistry schemes (Morgenstern et al., 2009;O'Connor et al., 2014).
The experiment design is similar to that in Dhomse et al. (2014), but with the volcanic aerosol radiatively coupled to the dynamics, as in Mann et al. (2015), transient atmosphere-only free running simulations. Briefly, the model uses the GLOMAP aerosol microphysics module, the scheme configured to be applied across the troposphere and stratosphere with stratospheretroposphere chemistry. Greenhouse gases (GHGs) and ozone-depleting substance (ODS) concentrations are from  ulation recommendations in the Chemistry-Climate Model Initiative (CCMI-1; Eyring et al. (2013); Morgenstern et al. (2017)) activity. Simulations are performed in atmosphere-only mode, and we use CMIP6 recommended sea-surface temperatures and sea-ice concentration that are obtained from https://esgf-node.llnl.gov/projects/cmip6/. The main updates since Dhomse et al. (2014) are: i) updated dynamical model (from HadGEM3-A r2.0 to HadGEM3 Global Atmosphere 4.0), hence improved vertical and horizontal resolution (N48L60 vs N96L85), ii) coupling between aerosol and radiation scheme , 150 iii) additional sulphuric particle formation pathway via heterogeneous nucleation on transported meteoric smoke particle cores (Brooke et al., 2017). The atmosphere-only RJ4.0 UM-UKCA model applied here is the identical model to that applied in Marshall et al. (2018) andMarshall et al. (2019), with the former run in pre-industrial setting for the VolMIP interactive Tambora experiment (see Zanchettin et al., 2016) and the latter in year-2000 timeslice mode for the perturbed injection-source-parameter ensemble analysed there. 155 5 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License.
Prior to each of the eruption experiments, we first ran 20-year time-slice simulations with GHGs and ODSs for the corresponding decade (1960for Agung, 1980for El Chichón and 1990 for Pinatubo), to allow enough time for the stratospheric circulation and ozone layer to adjust each composition-climate setting for that time period. Tropospheric aerosol and chemistry (primary and precursor) emissions were also set to interactively simulate the tropospheric aerosol layer and oxidising capacity for the corresponding decade. Discarding the first 10 years as spin-up, we then analysed the QBO behaviour in the second 10 160 years, selecting initialisation fields from three different model years that then ensure each ensemble member approximately matches the post-eruption QBO state seen in the ERA-interim re-analysis (Dee et al., 2011).
For each eruption then, a total of nine different volcanically-perturbed simulations were performed, three different "approximate QBO progressions" for each SO2 emission amount (see Table 1). The 9 control simulations had identical pre-eruption initial conditions and emissions, except the Pinatubo/El Chichon/Agung SO2 emission was switched off. For simplicity the 165 simulations do not use the simulated aerosol in the calculation of heterogeneous chemistry; the control simulations use climatological SAD values in the stratosphere (mean 1995-2006) while the other simulations use time-varying SAD from Arfeuille et al. (2013).

175
For the pre-satellite era , volcanic aerosol properties in CMIP6 data are constructed using results from the AER 2-D aerosol model (Arfeuille et al., 2014), considering injection heights in the literature and from plume-rise model, and from comparing to other forcing datasets, ice core sulphate deposition and ground-based solar radiation measurements within Sato et al. (1993) and Stothers (2001). Although the CMIP6 dataset consists primarily of the three parts explained in Introduction section (waveband-mapped aerosol optical properties in the SW and LW, plus surface area density), additional datasets are also 180 provided, including monthly zonal mean log-normal aerosol size distribution properties such as mean radius, volume density and extinctions at 550 nm. These datasets are provided at 0.5 km vertical resolution between 5 km and 40 km).
For 1979-2016, the CMIP6 dataset is replaced with the most up-to-date the Global Space-based Stratospheric Aerosol Climatology data known as the GloSSAC dataset (https://doi.org/10.5067/GloSSAC-L3-V1.0) described in (Thomason et al., 2018). GloSSAC combines stratospheric aerosol information from several different satellite instruments: SAGE I and II, Center (https://eosweb.larc.nasa.gov/ : last access March 10, 2020). During the El-Chichón period, GloSSAC is largely based on SAGE I (January 1979-November 1981 and SAM II (1978 extinction at 1000 nm, and the 550nm extinction derived 190 from fit to the variation in 550:1020 colour ratio that is derived from SAGE II measurements. One limitation is that the SAM II instrument only measures at high-latitudes. After the El-Chichón eruption (SAGE gap period, April 1982-October 1984 (Rosen et al., 1994), ground-based lidar measurements from Mauna Loa, Hawaii (19.5 • N, (Barnes and Hofmann, 1997)) from the NASA Langley lidar at Hampton, USA, and Camaguey, Cuba (23 • N, see Antuña, 1996)).
For the Agung aerosol cloud, observational data not readily available to evaluate the vertical extent of the aerosol cloud.

200
Hence we digitised the observations from optical radar at Lexington, Massachusetts (42 • 44' N, 71 • 15' W Grams and Fiocco, 1967). These aerosol backscatter observations at 694 nm are converted to extinction at 532nm, as described in supplementary material.
For the stratospheric aerosol optical depth (sAOD) comparison, we use three different observation-based datasets. CMIP6 extinctions at 550nm are integrated for all the levels above the tropopause to calculate sAOD 550 . As mentioned earlier, the most 205 widely used volcanic forcing data is from Sato et al. (1993) and is obtained from https://data.giss.nasa.gov/modelforce/strataer/.
Another important sAOD 550 evaluation dataset is based on a combination of simple representation of the dispersion and an assumption of the size distribution by Ammann et al. (2003), and is obtained via ftp://ftp.ncdc.noaa.gov/pub/data/paleo/climate_ forcing/volcanic_aerosols/ammann2003b_volcanics.txt. The lower stratospheric warming following each eruption is estimated by comparing 5-year temperature anomalies from the ERA-Interim reanalysis data (ERA-Int, available from www.ecmwf.int).

210
ERA-int data is available since 1979, hence for the Agung comparison, we use ERA40, an earlier version of ECWMF reanalysis datasets.

Results and Discussion
The temporal radiative forcing signature from a major tropical eruption is primarily determined by the evolution of the volcanic aerosol cloud in the stratosphere. An initial ''tropically confined phase" sees zonally-dispersing SO 2 and ash plume 215 transforming to layered aerosol cloud. Meridional transport in the subsequent "dispersion phase" then leads to a hemispheric or global cloud of mainly aqueous sulphuric acid droplets. The efficacy of such volcanic clouds' solar dimming, and the extent of any offset via long-wave aerosol absorption, is strongly linked to how large the sulphuric aerosol particles grow (their size distribution) as this large-scale dispersion progresses (e.g Lacis et al., 1992).
In the following subsections we assess, for each eruption, the simulated volcanic aerosol cloud for the upper, lower and 220 mid-point/best-estimate SO 2 emissions and compare to available observational constraints. Our focus here is primarily on 7 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License. aerosol optical properties, evaluating mid-visible stratospheric AOD, but also aerosol extinction, in both the mid-visible and near-infra-red, to understand how the altitude and vertical extent of the cloud varies for each eruption. In each case, we also compare the lower stratospheric warming with the temperature anomaly from the ERA-Interim/ERA-40 reanalyses.

225
In the Pinatubo case, satellite measurements are able to constrain the particle size evolution (in terms of effective radius), and hence here we also compare model-simulated effective radius to that provided with the CMIP6 dataset, which underpins each model's specified multi-wavelength aerosol optical properties. With Pinatubo by far the dominant external forcing in the 1990s, we compare simulated SW and LW forcings to the Earth Radiation Budget Experiment (ERBE) satellite data to gain direct insight into how the different SO 2 emission simulations evolve in terms of top-of-the-atmosphere (TOA) radiative forcings.  Baran and Foot (1994) analysed satellite observations of the Pinatubo aerosol cloud from the High-resolution Infrared Radiation Sounder (HIRS), converting the measured LW aerosol optical properties into a timeseries of global aerosol burden. In Dhomse et al. (2014), we used this observed global burden dataset to evaluate the model's simulated aerosol cloud, translating the peak global burden of 19 to 26 Tg from the HIRS measurements into a 3.7 to 6.7 Tg range for stratospheric sulphur, assuming the particles were 75% by weight aqueous sulphuric acid solution droplets. We identified an important inconsistency 235 in the model's predictions, when also considering satellite observations of volcanic SO 2 . The satellite measurements of SO 2 show that 7 to 11.5 Tg of sulphur was present in the stratosphere in the days after the eruption (14 to 23 Tg of SO 2 , Guo et al. (2004a)), so only around 50% of the emitted sulphur remained present at peak volcanic aerosol loading. In contrast, the model simulations showed that 90% of the sulphur emitted remained in the volcanic aerosol cloud at its peak global mass burden.
This inconsistency was also found in other interactive Pinatubo stratospheric aerosol model studies (Sheng et al., 2015a;Mills 240 et al., 2016), with number of models finding best agreement with observations for 10 to 14 Tg emitted SO 2 (5 to 7 Tg of sulphur), which is less than the lower bound from the TOMS/TOVS measurements. In Dhomse et al. (2014), we suggested the models may be missing some process or influence, which acts to redistribute the sulphur within the volcanic cloud, causing it then to be removed more rapidly. Figure 1a shows the timeseries of global stratospheric aerosol sulphur burden from current Pinatubo simulations compared 245 to the previous model simulations with 20 and 10 Tg SO 2 injection as presented in Dhomse et al. (2014). The 20, 14 and 10 Tg SO 2 Pinatubo clouds generate peak loadings of 8.3, 5.9 and 4.2 Tg of sulphur, translating into conversion efficiencies of 83, 84 and 84%, respectively. This continuing discrepancy with the satellite-derived 50% conversion efficiency might be due to accommodation onto co-emitted ash particles. Recently we have re-configured the UM-UKCA model to enable new simulations to test this hypothesis (Mann et al., 2019b). We consider the requirement to reduce model emitted SO 2 to be less 250 than that indicated by satellite measurements as an adjustment to compensate for a missing removal/redistribution process in the initial weeks after the eruption.
The simulated Pinatubo global stratospheric sulphur burden in runs Pin10 and Pin14 is in good agreement with the HIRS observations, both in terms of predicted peak burden, and the evolution of its removal from the stratosphere. In particular, the model captures a key variation in the HIRS measurements, namely that the removal of stratospheric sulphur was quite slow 255 8 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License.
in the first year after the eruption. The volcanic aerosol cloud retained a steady 4-5 Tg of sulphur for more than 12 months before its removal proceeded at much faster rate in late 1992 and early 1993. The corresponding simulations from Dhomse et al. (2014) (Pin10 and Pin20 ) show a simpler peak and decay curve, the removal from the stratosphere proceeding much faster and earlier than the HIRS measurements indicate.
As shown in Mann et al. (2015), and other studies (Young et al., 1994;Sukhodolov et al., 2018), when interactive strato-260 spheric aerosol simulations of the Pinatubo cloud include the heating effect from aerosol absorption of outgoing LW radiation (i.e. the radiative coupling of the aerosol to the dynamics), the resulting enhanced tropical upwelling greatly changes the subsequent global dispersion. In Mann et al. (2015), we also showed that this coupling improves the simulated tropical midvisible and near infra-red extinction compared to the SAGE II measurements. We identified that the SAGE II measurements are consistent with the combined effects of increased upwelling and later sedimentation, highlighting the need to resolve 265 composition-dynamics interactions when interactively simulating such major volcanic aerosol clouds.
Here we show that this effect also leads to a quite different global sulphur burden, with the later dispersion peak in the midlatitude sulphur becoming a greater contributor. This behaviour is explored further in Figure 1b, where we assess the e-folding timescale for the removal of stratospheric sulphur, derived by applyig least squares regression fit on 7-month running-mean mass burden values (3 monthly means either side). We find that a Pinatubo realisation that injects more sulphur produces 270 a volcanic aerosol cloud that is removed more rapidly, the effect apparent throughout the decay period. The timing of the accelerating removal occurs consistently across the 3 runs with residence times for Pin10, Pin14 and Pin20 decreasing from 9, 6 and 4 months in May 1992, to minima of 5, 3 and 2 months in February 1993.
Later (in Figure 4) we assess the behaviour of model-predicted effective radius, showing that it continues to increase steadily in the tropics throughout 1992, the maximum particle size at 20 km occurring in January 1993. That the maximum effective 275 radius occurs at exactly the same time as the minimum in e-folding time illustrates the importance for interactive stratospheric aerosol models to represent its increased size, sedimentation of particles proceeding faster as the particles grow larger. One thing to note however, is that although the different volcanic SO 2 amount is emitted at the same altitude, since the runs are freerunning, later we show that each different emission amount causes different amounts of heating, the resulting enhancements to tropical upwelling lofting the cloud to different altitudes.

280
The predicted stratospheric sulphur burdens in Pin10 and Pin14 compare well to the observations, suggesting a 10 Tg to 14 Tg SO 2 emission range will produce a volcanic aerosol cloud with realistic volcanic forcing magnitude. The comparison could provide a test for other interactive stratospheric models, to identify a model-specific source parameter calibration. It should be noted that such a reduction in emissions, to values below the SO 2 detected (Guo et al., 2004a), is a model adjustment, likely compensating for a missing sulphur loss/re-distribution process. 285 We also note some differences in sulphur burden between current and previous (Dhomse et al., 2014) Pinatubo simulations.
Firstly, the background burden in run Pin00 is much lower (0.11 Tg) than previous simulations (0.50 Tg) and now in reasonable agreement with (Hommel et al., 2011;Sheng et al., 2015b;Kremser et al., 2016), or with lower end of the ASAP report (SPARC, 2006) (0.12-0.18 for Laramie OPC balloon soundings and 0.12-0.22 Tg Garmisch lidar measurements respectively; there cited as 0.5-0.7 Tg and 0.5-0.9 Tg mass of 75% weight aqueous sulphuric acid solution, respectively). The main reason for the 290 9 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License. reduction in simulated quiescent stratospheric sulphur burden, compared to Dhomse et al. (2014), is the influence from meteoric smoke particles (MSP), forming meteoric-sulphuric particles (Murphy et al., 2014). One of the effects from simulating these particles, in addition to homogeneously nucleated pure sulphuric acid particles, is to reduce the sulphur residence time (Mann et al., 2019a). There are also some dynamical differences in the updated simulations here, which use an improved vertical and horizontal resolution model (N96L85 rather than N48L60), that might influence stratosphere-troposphere exchange and 295 stratospheric circulation.
Secondly, we also assess the simulated stratosphere into the 3rd post-eruption year (after June 1993). Although for the first two years, the model's global stratospheric sulphur in the simulations Pin10 and Pin14 tracks closely with HIRS estimates (Figure 1a), the satellite-derived S-burden drops off rapidly from about 3 Tg in January 1993 to 0.5 Tg by September 1993.
On the other hand, the simulated volcanic aerosol cloud does not disperse down to that value until September 1994. However, 300 this accelerated loss of stratospheric sulphur in the HIRS data seems to be inconsistent with other satellite measurements, for example SAGE II measurements (see Figure 3), as well as OPC measurements (Thomason et al., 1997) and CLAES observations (e.g. Bauman et al., 2003;Luo, 2016). This suggests that latter part of the HIRS data may be inaccurate, though it seems difficult to identify a driving mechanism for this. Each of the model experiments suggest the stratosphere remained moderately enhanced throughout 1993 and 1994. 305 Figure 2 shows, for each eruption magnitude, the zonal mean ensemble-mean stratospheric AOD at 550 nm (sAOD 550 ) from the UM-UKCA Pinatubo simulations (Pin10, Pin14, Pin20), compared to three different volcanic forcing datasets. For this period, the GloSSAC data should be considered the primary one, being based on the latest version of the SAGE II, as an update from the gap-filled dataset from the SPARC ASAP report (SPARC, 2006, Chapter 3).
As in the HIRS sulphur burden comparisons (Figure 1), the Pin20 simulation, which best matches the satellite-observed SO 2 310 estimates, strongly overpredicts the stratospheric AOD in the tropics and Northern Hemisphere (NH) mid-latitudes, compared to all three reference datasets. However, whereas the lower emissions runs Pin10 and Pin14 both closely track the observed global column sulphur variation, run Pin10 has best agreement with all three reference datasets for mid-visible sAOD. For this run Pin14 is high-biased in the tropics and NH mid-latitudes. In the tropics, all three emission-magnitude ensembles are higher than the reference datasets. 315 Figure 2 illustrates the well-established global dispersion pattern for the Pinatubo aerosol cloud: initially confined to the tropical reservoir region, then dispersing to mid-latitudes, following the seasonal variation in the Brewer-Dobson circulation.
The over-prediction in the tropics is a common feature among interactive stratospheric aerosol models. It is noticeable that this over-prediction is worst in the first 6-9 months after the eruption, which could indicate the source of the model's discrepancy.
Whereas an overly non-dispersive tropical pipe in the model could be the cause, the timing is potentially more consistent with a 320 missing loss pathway that is most effective in the initial months after the eruption. Co-emitted volcanic ash will also have been present within the tropical reservoir, as seen in the airborne lidar depolarisation measurements in the weeks after the eruption (Winker and Osborn, 1992), and remained present in the lowermost part of the mid-latitude aerosol cloud in both hemispheres (Young et al., 1992;Vaughan et al., 1994). The AOD high bias is consistent with the hypothesis that a substantial proportion of the emitted sulphur may have been removed from the stratosphere by accommodation onto the sedimenting ash. If this 325 10 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License. mechanism is causing such a vertical re-distribution within the tropical reservoir, it will increase the proportion of Pinatubo sulphur being removed into the troposphere via the rapid isentropic transport that occurred during the initial months in the lowermost stratosphere. Furthermore, stratospheric AOD is not a measure of sulphur, and the variations in sAOD will partly indicate changes in scattering efficiency that result from the gradient in effective radius that is disussed in later section.
The peak mid-visible AOD from AVHRR is higher than the SAGE II gap-filled satellite measurements (Long and Stowe, 330 1994). For example, as noted in Thomason et al. (2018), the peak mid-visible stratospheric AOD in the AVHHR dataset is around 0.4, compared to 0.22 in GloSSAC. However, other possible model biases cannot be ruled out. One consideration for these free-running simulations, even with each ensemble member initialised to approximate the period's QBO phase, is that nudging towards re-analysis meteorology would give more realistic representation of this initial phase of the plume dispersion (Sukhodolov et al., 2018). We chose to perform free-running simulations to allow the enhanced tropical upwelling 335 resulting from increased LW aerosol-absorptive heating, consistent with the SO 2 emission, known to be a strong influence on the subsequent simulated global dispersion (Young et al., 1994).
In contrast to the tropics and NH mid-latitudes, where run Pin14 agrees best with the reference datasets, run Pin20 compares best to the Southern Hemisphere (SH) sAOD 550 measurements in GLOSSAC. Runs Pin10 and Pin14 underestimate the cloud in this region. This difference may be highlighting the requirement for a more accurate simulation of the QBO evolution, likely 340 necessary to capture the Pinatubo cloud's transport to SH mid-latitudes (e.g. Jones et al., 2016;Pitari et al., 2016). One thing to note is that our simulations do not include the source of volcanic aerosol formed from the August 1991 Mount Hudson eruption in Chile. However, measurements from SAGE II (Pitts and Thomason, 1993) and ground-based lidar (Barton et al., 1992) indicate that the Hudson aerosol cloud only reached to around 12 km, with the Pinatubo cloud by far the dominant contributor to SH mid-latitude sAOD.

345
Overall, the sAOD 550 comparisons confirm the findings from Figure 1 that for UM-UKCA, consistent with other global microphysics models (Sheng et al., 2015a;Mills et al., 2016), Pinatubo aerosol properties are better simulated (acknowledging the discrepancy in the SH) with a 10 Tg to 14 Tg range in volcanic SO 2 emission.
Although Figure 2 suggests significant differences among the volcanic forcing datasets for the Pinatubo period, the GLOS-SAC data is the reference dataset while the 1991-4 period in Sato data is mostly based on an earlier version of the SAGE II 350 data. The GloSSAC data have been compared extensively with lidar measurements (Antuña et al., 2002;Antuña, 2003), and combined for the gap-filled dataset (SPARC, 2006) with improvements in the SAGE II aerosol extinction retrieval algorithm (version 7).
For historical climate integrations in CMIP5, some models used the Sato forcing dataset whilst others used Ammann and their differences affect interpretation of volcanic impacts among the models (Driscoll et al., 2012). For CMIP6, all models have 355 harmonised to use the same forcing dataset, with a dedicated VolMIP analysis to compare the climate response in each model and with the GloSSAC Pinatubo forcing applied to the pre-industrial control (Zanchettin et al., 2016).
After comparing the total sulphur burden and sAOD, Figure 3 shows UM-UKCA simulated mid-visible extinction at 3 different altitudes in the lower stratosphere, to evaluate the simulated vertical extent of the Pinatubo cloud through the global dispersion phase. For the tropics, extinction comparisons are shown at 24 km, 28 km and 32 km, whereas for SH (35 • S-360 60 • S) and NH (35 • N-60 • N) mid-latitudes the chosen levels are 20 km, 24 km and 28 km, to account for the higher tropical tropopause. Simulated extinctions are compared with raw SAGE v7.0 data (Damadeo et al., 2013) as well as the gap-filled GloSSAC product (Thomason et al., 2018) at 525 nm. As discussed previously, extinctions from Pin14 (and to some extent Pin10) show much better agreement with observational data for all three latitude bands. Most importantly, model extinction remain close or slightly lower in the mid-latitude compared to SAGE II extinction even after 4 years, suggesting that the sharp 365 decay in S-burden observed by Baran and Foot (1994) may be unrealistic. Interestingly, in the SH mid latitudes, extinction from Pin14 shows much better agreement with SAGE II extinctions at 20 and 24 km. This again confirms biases discussed in Figure 2 that could be attributed to the weaker lower stratospheric transport in the SH mid-latitudes. At 1020 nm, agreement is even better (See Supplementary Figure S1). Also as observed in Figure 1 and 2, extinction differences between runs Pin10, Pin14 and Pin20 are largest for the first few months after the eruption but extinction lines almost overlap within ensemble 370 variance from each eruption. This again confirms that the more SO 2 injection, the faster growth and removal within first few months after the eruption.
One of the key feature seen in Figure 3 that is not captured well in any model simulation is the plateau in the SAGE II (and GloSSAC) tropical peak extinction. For example, at 24 km (where the effect of instrument saturation should be minimal), after reaching peak values within first 3 months, extinction values remain almost flat for at least 6 months. At 20 km, this plateau in 375 extinction in the tropics is visible for almost 12 months in the GloSSAC data (not shown). Similar features are visible at 1020 nm extinction ( Figure S1). If indeed these plateau features are realistic in observational data, then they would be maintained by balance between tropical up-welling (upward branch of the Brewer-Dobson circulation as well as one from aerosol-induced heating) and sedimentation of particles that have grown via coagulation. On the other hand, model-simulated extinction shows more prominent seasonal cycle fluctuations during NH winter when the Brewer Dobson circulation (tropical upwelling ) is 380 strongest (e.g. Dhomse et al., 2008;Weber et al., 2011;Butchart, 2014). These plateau structures in extinctions are not so distinct at mid-latitudes in either hemisphere but seasonal cycle fluctuations that are determined by the wintertime circulation are visible in both SAGE II and model data. Another important difference is that modelled extinctions are low-biased (up to 50%) during pre-eruption months. This could be associated with low background S-burden in our model or slightly elevated stratospheric aerosol due to small volcanic eruptions (such as Kelud,1990) that are not present in our simulations. Another 385 explanation could be due to the fact that model not resolving the uptake of organics, with observations Murphy et al. (2007) and modelling (Yu et al., 2016) suggesting organic-sulphate particles (Murphy et al., 2014) are the dominant aerosol type in the tropical and mid-latitude upper troposphere and lower stratosphere.
Next, we compare effective radius (Reff) at similar altitudes. Figure 4 shows zonal mean Reff at 20 and 25 km from runs Pin10 and Pin14. along with observation-based Reff described in Luo (2016). As shown in previous sections, Pin20 clearly 390 shows a high bias compared to S-burden, sAOD 550 as well as extinction observations, hence it is excluded in Figure 4. Overall, the temporal and spatial evolution of Reff estimated using observational data seems to be well captured in Pin14 compared to Pin20 (although Reff magnitude is high-biased by about 10%). Another important feature is that Reff at 25 km in the model simulations persists much longer than CMIP6 Reff. It is important to note that Russell et al. (1996) Russell et al. (1996). And this clearly shows significant improvement since Dhomse et al. (2014) where Reff was underestimated by about 50%. The updated comparison to the These improvements were noticed during model development after Dhomse et al. (2014) to include meteoric smoke particles and their interactions at version 8.2 of the GLOMAP codebase (Brooke et al., 2017;Mann et al., 2019a;Marshall et al., 2019).
In the tropics, where Reff increases are largest, the timeseries of Reff is noticeably different at 20km and 25km. At 25km, the model simulations are somewhat counter-intuitive. Initially, they show decrease in Reff, likely due to this central part of 405 the volcanic cloud being younger (and smaller) particles formed as the oxidation of the volcanic SO 2 triggers extensive new particle formation. By contrast at 20km, below the altitude at which the volcanic plume detrains the SO 2 (injection height range is 21-23km) the effective radius shows a steady increase, as relatively larger particles sediment to these altitudes as the tropical volcanic aerosol reservoir progresses. There is a slow but substantial growth in the average particle size in this tropical Pinatubo cloud, with the 20km level reaching peak Reff values only during mid-1992, in contrast to the peak S-burden and 410 sAOD 550 which have already peaked at this time, being in decay phase since the start of 1992.
Whereas the simulated peak Reff enhancement occurs by mid-1992 in the tropics, in the NH mid-latitudes, the peak Reff occurs at the time of the peak meridional transport, the Reff variation there mainly reflecting the seasonal cycle of the BD circulation (Butchart, 2014). The different timing of the volcanic Reff enhancement in the tropics and mid-latitudes is important when interpreting or interpolating the in-situ measurement record from the post-Pinatubo OPC soundings from Laramie. 415 Russell et al. (1996) show the Reff values derived from the Mauna Loa ground-based remote sensing are substantially larger than those from the dust-sondes at Laramie. Model simulation confirms this inherent coupling between dynamics, circulation and microphysical growth processes causes a different relationship between the tropical to mid-latitude ratio in Reff in the upper and lower portions of the volcanic aerosol cloud.
An important aspect of volcanically enhanced stratospheric aerosol is that they provide surface area for catalytic ozone 420 loss (e.g. Hofmann and Solomon, 1989). Stratospheric sulphate area density comparison for three different months (December 91, June 1992 and December 1992) is shown in Figure 5. SAD derived using observational data (Arfeuille et al., 2014) also known as 3λ SAD is also shown. Again, Pin20 SAD shows a high biases, whereas Pin10 SAD seems to show good agreement with 3λ data. Our simulations do not include the SO 2 injection from the August 1991 Mt. Hudson eruption (Chile), and yet the model captures well the volcanic SAD enhancement in the SH mid-latitude stratosphere. The model does transport to the high latitudes (weaker subtropical barrier in the middle stratosphere) and/or quicker coagulation thereby faster sedimentation. Figure 6 shows the time series of observed SW and LW radiative near-global mean flux anomalies (60 • S -60 • N), with respect to a 1985 to 1989 (pre-Pinatubo) baseline. ERBE (black symbols) data is from Edition 3 Revision 1, non-scanner, wide field of-view observations (Wielicki et al., 2002). Coloured lines indicate ensemble mean forcings anomalies from three Pinatubo 435 SO 2 emission scenarios. The Pin10 simulation generates a peak solar dimming of 4 W/m2, matching well both the timing and magnitude of the peak in the ERBE SW anomaly timeseries. It is notable that if the ERBE SW anomaly is calculated relative the 1995-1997 baseline, we see a peak solar dimming of 5.5 W/m2 (not shown), which then compares best with the Pinatubo SW forcing from Pin14. Consistently with the the S-burden, sAOD550 and mid-visible extinction comparisons (Figures 1, 2 and 3), the Pin20 simulation also overpredicts the magnitude of the Pinatubo forcing compared to ERBE. It is important to 440 note here that the model Pinatubo forcings are not only from the volcanic aerosol cloud, but include also any effects from the simulated post-Pinatubo changes in other climate forcers (e.g. stratospheric ozone and water vapour). As expected run Pin20 shows largest anomalies in both SW and LW radiation and distinct differences between Pin10, Pin14 and Pin20 are visible until the end of 1992. For this 10 to 20Tg emission range, we find the global-mean SW forcing scales approximately linearly with increasing SO2 emission amount, the 40% increase from 10 to 14Tg and 43% increase from 14 to 20Tg causing the 445 Pinatubo SW forcing to be stronger by 34% (4.1 to 5.5 W/m2) and 36% (5.5 to 7.5 W/m2) respectively.
In contrast to the SW forcings, the magnitude of the anomaly in the peak LW forcing is best matched in the Pin20 simulation, although the Pin14 simulation also agrees quite well with the ERBE anomaly timeseries. Whereas the Pinatubo SW forcing will follow closely the mid-visible aerosol changes, the LW forcing is more complex to interpret, simulated LW aerosol absorption not analysed in this paper, and almost certainly having a different temporal variation than the 550 nm and 1020 nm extinction 450 variations analysed here. Also, the model LW forcing also includes effects from the dynamical changes in stratospheric water vapour which partially offsets the SW dimming (e.g. Joshi et al., 2003) adding to the LW aerosol effect. Our simulations do not include co-emission of water vapour, which might have influened stratospheric chemistry (e.g. LeGrande et al., 2016) and altered observed Pinatubo forcing. Another possible explanation for this discrepancy might be much weaker signal in LW radiation alongside ERBE temporal coverage (36 days vs 72 days). Again, as in the sulphur burden and extinction comparisons, 455 after January 1992 observed SW anomalies seem to decay at a faster rate compared to all the model simulations.
Another important volcanic impact is the aerosol-induced heating in the lower stratosphere as large particles absorb outgoing LW radiation. Since the ERA-interim analysis assimilates radiosonde observations from large number of sites in the tropics, we can compare the temperature anomaly to the model predictions, as a further independent test. However, exact quantification of this mechanism is somewhat compilcated as the ERA-interim stratospheric temperature anomalies also includes influence from 460 other chemical and dynamical changes such as variation in ozone and water vapour as well as QBO and ENSO related changes in tropical upwelling (e.g. Angell, 1997b;Randel et al., 2009). Assuming the 5-year anomalies will remove effects of some of the short-term processes. Modelled temperature anomalies are simply differences between the sensitivity (Pin10, Pin14 and Pin20) and control (Pin00) simulations. Although we compare the simulated Pinatubo warming (temperature difference) to ERA-interim temperature anomalies, this is only intended to provide an approximate observational constraint for the magnitude 465 14 https://doi.org/10.5194/acp-2020-344 Preprint. Discussion started: 6 May 2020 c Author(s) 2020. CC BY 4.0 License. of the effect and the altitude at which it reaches maximum. The Pin10 simulation best captures the magnitude of the ERAinterim post-Pinatubo tropical temperature anomalies, and the model simulations and re-analysis both show maximum warming occurred in the 30 to 50 hPa range around 3-4 months after the eruption. The model predicts Pinatubo aerosol cloud continued to cause a substantial warming (> 2 K) throughout 1992, that propagates downwards as in ERA-interim temperature anomalies.

El Chichón aerosol cloud 470
Whereas Pinatubo is often the main case study to evaluate interactive stratospheric aerosol models, El Chichón provides a different test for the models, its volcanic aerosol cloud dispersing almost exclusively to the NH. We also seek to understand whether the biases seen for Pinatubo (over-predicted tropical sAOD and discrepancy between literature estimates of SO2 emission and the peak global aerosol loading) are also seen for this alternative major eruption case.
Both El Chichón and Pinatubo eruptions occurred in the modern satellite era, however there are far fewer datasets available 475 for the evaluation of El Chichón aerosol properties as it occurred in the important gap period between SAGE-I and SAGE II (see Thomason et al., 2018). Although there are quite extensive observational data records for the El Chichón volcanic aerosol clouds (e.g McCormick and Swissler, 1983;Hofmann and Rosen, 1983a), combining these data with satellite datasets would greatly reduce large uncertainties about the evolution of the El-Chichón aerosol cloud (e.g. Sato et al., 1993;SPARC, 2006).
Here, our analysis focuses primarily on comparing simulated mid-visible stratospheric AOD at 550 nm (sAOD 550 ) to the 480 CMIP6 and Sato datasets. We also test the simulated vertical extent of the El Chichón cloud, comparing extinction at 20 km and 25 km to the SAGE II (and GloSSAC) data record, and compare the model's simulated warming in the tropical lower stratospheric to temperature anomalies in the ERA-Interim reanalyses. In the model, how deep the tropical volcanic aerosol reservoir that forms is closely linked to the altitude of the volcanic SO2 emission. We aligned our experiments with the ISA-MIP HErSEA experiment design (Timmreck et al., 2018), specifying a 24-27 km injection height based on the information from the airborne lidar measurements in the tropical stratosphere that provide the main constraint for the gap-filled dataset (see Figure 4.34 in the ASAP report (SPARC, 2006)). Balloon measurements from Southern Texas and Laramie (Hofmann and Rosen, 1983b), and the constraints from the airborne lidar survey flights in 495 July, September and October (McCormick and Swissler, 1983) will likely provide a good constraint for the interactive models, showing that a large part of the plume was transported to NH mid-high latitudes via middle branch of the Brewer Dobson circulation at around 25km, the lower altitudes of the cloud remaining confined to the tropical reservoir. The evolution of the cloud is complex and strongly influenced by several effects: the rate of SO 2 conversion to aerosol and the depletion of their interactions) and the downward propagating QBO. The multiple interacting processes within the tropical reservoir make analysing this early phase dispersion a complex problem, yet their combined net effects determines the subsequent transport of the aerosol to mid-latitudes, and the radiative forcing that results.
Due to significant differences observed in Figure 8, even with limited SAGE II observations, simulated extinctions are compared in Figure 9. Simulated extinctions for all three SO 2 emission scenarios show an excellent agreement with SAGE 505 II from October 1984 onwards. A sudden jump in the GloSSAC data at the start of the SAGE II period is evident, and other unexplained sudden increases in extinction earlier in the CMIP6 dataset, e.g. in the SH at 24km. On the other hand, somewhat elevated SAGE II extinction in the NH mid-latitudes compared to model extinctions highlight possible model discrepancy due to injection altitude leading faster removal. GloSSAC extinction in the SH mid-latitude shows very little seasonal variation, and the sudden changes seen at both 20 and 24km are surprising and difficult to reconcile with expected variation, and could 510 potentially be artefacts from the interpolation procedure. Overall, Figure 9 clearly suggests potential areas where combining with models may help improve the GloSSAC (and other) datasets, highlighting the need for combining observational data with El Chichón-related model simulations to better represent the consistency and variations within the El Chichón surface cooling included in climate models. Figure 10 shows the tropical warming of the stratosphere predicted by the model, comparing again to the ERA-interim tem-515 perature anomaly (compared to the mean for 1982-1986). As in Pinatubo case (Figure 7), the speed of downward propagation of these anomalies seems to be well captured by all the simulations. Peak warming of about 3 K observed in ERA-interim between 30-50 hPa seems to be well reproduced in Elc07. Warm anomalies (up to 1 K) visible in ERA-interim data between 10-20 hPa suggest the downward propagating westerly QBO contributed to up to 1 K warming, hence simulated warming will be about 1K less than the ERA-interim anomalies. Overall, Elc05 seems to reproduce El Chichón-related warming more 520 realistically but the slight warming persisting near 70 hPa until March 1983 is absent in this simulation. Again this suggest that for UM-UKCA, 5 Tg and 7 Tg are reasonable lower and upper limits of SO 2 injection required to simulate observed lower stratospheric warming.

Mt. Agung aerosol cloud
The El Chichon and Pinatubo eruptions occurred when satellite instruments were monitoring the stratospheric aerosol layer, 525 and the global dispersion of their volcanic aerosol clouds are relatively well characterised For the Agung period our knowledge of the global dispersion is less certain and primarily based on the synthesis of surface radiation measurements from Dyer and Hicks (1968). These measurements show the Agung cloud dispersed mainly to the SH, although aerosol measurements from 10 balloon-borne particle counter soundings from Minneapolis in 1963-65 (Rosen, 1964(Rosen, , 1968) and ground-based lidar from Lexington, Massachussetts in 1963(Grams and Fiocco, 1967 show substantial enhancement in the NH as well. For 530 this period, the Sato forcing dataset enacts solar dimming following the ground-based solar radiation measurements discussed in Dyer and Hicks (1968), whereas the CMIP6 dataset is based on simulations with a 2D interactive stratospheric aerosol model.  sparc-ssirc.org/data/datarescueactivity.html). Its primary aim is to gather and in some cases re-calibrate post-Agung aerosol measurements from major volcanic periods to provide new constraints for stratospheric aerosol models. For example, shipborne lidar measurements of the tropical volcanic aerosol reservoir after Pinatubo have recently been recovered (Antuna-Marrero et al., 2020;Mann et al., 2020). As part of this paper, we are contributing to this SSiRC activity and have recovered the Lexington post-Agung ground-based lidar measurement from Grams and Fiocco (1967) and use these to constrain the 540 vertical extent of the Agung aerosol cloud. Figure 11 compares sAOD 550 from model simulations with CMIP6, Sato and Ammann data. Both CMIP6 and Sato datasets suggest the tropical volcanic aerosol cloud dispersed rapidly, and almost exclusively, to the SH, consistent also with our understanding of QBO-dependent meridional transport (Thomas et al., 2009). This means that during the westerly QBO phase the volcanic plume is quickly transported towards the winter hemisphere whereas during the easterly phase the tropics-to-545 high-latitude transport is slower, hence some part of the plume is available for the wintertime transport into the opposite hemisphere. In contrast, the Ammann dataset suggests a significant part of the cloud was transported to the NH, the dispersion parameterisation considering only seasonal changes in stratospheric circulation. Hence, the modulation of meridional transport caused by the QBO, in the Agung case, increasing the export from low to mid-latitudes, is not represented in the Ammann dataset.
550 Figure 11 also shows that for the post-Agung period, there are very large differences in the sAOD 550 between CMIP6 and Sato data. Hence, climate simulations performed using these two forcing data sets would have significantly different response between two CMIP assessment. Overall, the CMIP6 dataset generates much stronger peak sAOD 550 than Sato, with a peak of around 0.4 in the tropics, a few months after the eruption. Sato data shows peak value of about 0.12, which suddenly drops to below 0.05 within couple of months. Then after there is a steady build-up with a local peak in sAOD 550 occurring in 555 November 1963, 8 months after the eruption. The Sato dataset then enacts a much stronger second peak in tropical sAOD 550 in August-September 1964 that must be based on measurements from Kenya and the Congo (Dyer and Hicks, 1968). By contrast, CMIP6, based on the AER-2D model, predicts the Agung cloud dispersed rapidly to the SH with the tropical reservoir reducing to sAOD 550 of less than 0.05 at that time. Our simulations predict the Agung aerosol dispersed to the SH with similar timing to the CMIP6 dataset, but with a larger proportion remaining in the tropical reservoir. Similar to CMIP6 datasets our simulations 560 also predict secondary sAOD 550 peak in SH mid-latitudes near 40 • S. Although a similar pattern is produced in almost all simulations, sAOD 550 from Agu12 seems to be in much better agreement with CMIP6 data.
These comparisons highlight that there is still substantial uncertainty about the global dispersion of the Agung cloud. However, there are extensive set of stratospheric aerosol measurements carried out during this period (see http://www.sparc-ssirc. org/data/datarescueactivity.html). Hence, there is potential to reduce this uncertainty combining these observations also with 565 interactive stratospheric model simulations (Timmreck et al., 2018). Dyer and Hicks (1968) discuss the transport pathways for the volcanic aerosol, in relation to seasonal export from the tropical reservoir. Stothers (2001) analysed a range of measurements to derive the turbidity of the Agung cloud, but they neglected measurement from Kenya and Congo sites in their analysis, attributing a lower accuracy in those data. It is notable those observations were during the dry season, when other sources of aerosol could potentially have caused additional solar dimming. In terms of modeling, Niemeier et al. (2019) discussed possible 570 implications of two separate Agung eruptions in 1963. They performed two model simulations, one with a single eruption and one with two separate eruptions on 17th March and 16th May with 4.7 Tg and 2.3 Tg SO 2 injection, respectively. They found significant differences between simulated aerosol properties and available evaluation datasets. They suggested that two separate eruptions are necessary to simulate the climatic impact. However, due to limited observational data they could not validate their model results extensively. They also discussed that simulated sAOD 550 differences with respect to evaluation data are 575 larger than the differences between their two simulations. Pitari et al. (2016) also present global mean sAOD 550 changes after the Agung eruption with single eruption (12 Tg on 16 May 1963), but they did not show the latitudinal extent of the Agung volcanic cloud dispersion. nearly all the model simulations show high bias compared to CMIP6 data and model extinction. On the other hand, at 20 km, tropical CMIP6 extinction seems to peak a bit later and there is better agreement in the mid-latitude extinction in both the hemispheres. The UM-UKCA extinctions reflect the primary influence from the QBO because of changing the sub-tropical edge of the tropical reservoir as well as peak wintertime meridional transport in either hemisphere. On the other hand, CMIP6 extinctions, show strong seasonal cycle in the tropics. The differences between our model and CMIP6 extinction must be 585 primarily due to injection altitude and the simplified aerosol microphysical model used to construct CMIP6 data. Similar evolution is observed in the extinctions at 1020 nm as shown in supplementary Figure S3. as presented in Grams and Fiocco (1967). The method used to convert lidar backscatter to extinction is described in the Supplementary Material. Although lidar data shows large variability, these single location measurements still provide better 590 insight about the transport of Agung aerosol cloud in the NH. At 16km, Agu09 seem to show better agreement with lidar data, although by spring 1965, simulated extinctions are lower than the lidar data, suggesting faster decay in the model at this level.
A similar pattern is observed at 20 km. The somewhat larger lidar extinction in spring 1965 compared to model simulations might be due to either weak model tropics-to-NH-mid-latitude transport (more transport to the SH), or aerosol removal is too fast in the simulations. Extinctions at 24 km are shown in supplementary Figure S4, and again confirm good agreement between 595 lidar and Agu09. Overall, the extinction comparison with Lexington lidar data suggests that transport of the Agung volcanic cloud and its vertical extent in to the NH mid-latitude is well represented in Agu09.
Finally, we compare tropical warming in Figure 12. As ERA-Interim reanalyses start in 1979, hence we calculate observationalbased anomalies from ERA-40 data. Bearing in mind that almost all the reanalysis datasets have significant inhomogeneities in the pre-satellite era, observation-based warming estimates should be treated carefully. However, as expected ERA-40 data 600 show almost 1 K warming in the middle stratosphere before the eruption indicating downward propagation of warmer anomalies associated with the westerly QBO. Using radiosonde data, Free et al. (2009) estimated about 1.5 K warming near 50 hPa, which is somewhat consistent with ERA-40 (after removing 1 K warming due to westerly QBO). However, almost all of the simulations show 1-2 K more warming compared to ERA-40 data as modelled temperature differences do not include QBO-related anomalies.  et al., 2018). The analysis is also designed to provide new "microphysically-consistent and observationally-constrained" volcanic forcing datasets for climate models, to represent each eruption's surface cooling more realistically.
Simulated aerosol optical properties are compared against a range of satellite datasets. The model captures the observed vari-615 ation in global stratospheric sulphur from 1991-3 HIRS measurements very well, and experiments Pin10 and Pin14 defining a model-specific 10 to 14 Tg emissions uncertainty range and identifying a potential weighting to define a best-fit forcing dataset for Pinatubo. Our simulations also show that the aerosol decay rate is inversely proportional to the SO 2 injection amount, illustrating how increased aerosol particle size causes faster sedimentation. The model ensembles compares very well to mid-visible and near-infra-red aerosol extinction from SAGE II measurements. Although, the model shows higher sAOD 620 biases in the tropics, it is common feature seen in interactive stratospheric aerosol models (e.g. Mills et al., 2016;Sukhodolov et al., 2018;Niemeier et al., 2019). We have also compared the Pinatubo ensembles to the three widely used forcing datasets (CMIP6-GloSSAC, Sato and Ammann) and we find that Pin14 model ensemble shows overall best agreement. A plateau in lower stratospheric tropical extinction seen in GloSSAC data for almost one year after the Pinatubo eruption, is not reproduced in our simulations and thus remains as an open scientific question. The 10-14Tg SO 2 emissions rage for the model is lower 625 than the 14-23 Tg observed to be present after the eruption (Guo et al., 2004b), and the tropical sAOD 550 high bias is consistent with the models missing an important removal process. Plausible suggestions for these are: a) the vertical redistribution of the volcanic cloud due to ash, b) changes in SO 2 oxidation due to OH decrease inside the plume, and c) too strong a subtropical barrier in the models.
The simulated Reff shows good agreement with CMIP6 data, the model simulates a deeper global layer of enhanced SAD 630 than in the 3λ dataset (Luo, 2016). Simulated global-mean SW forcing (solar dimming) in run Pin10 shows excellent agreement with the magnitude of the anomaly in the ERBE data, and the LW forcing in the model also matching well with the magnitude and shape of the ERBE anomaly. Assuming a 1 K colder temperature anomaly in ERA-Interim tropical temperatures due to the downward propagating QBO, a warming of 3 K near 50 hPa is well simulated in both Pin10 and Pin14 simulations.
Overall, most of the comparisons suggest that about 10-14 Tg SO 2 injection between 21-23 km is sufficient to simulate the 635 climate and chemical impact of the Mt. Pinatubo eruption.
period, hence El Chichón-related aerosol properties must be treated with caution. Based on comparisons of the lower stratospheric warming of about 2 K, 5 Tg and 7 Tg SO 2 injections seem to be reasonable lower and upper limits for what is required to simulate observed temperature changes.
Finally, evaluation of Mt. Agung aerosol is more complicated due to much larger differences in the observation-based datasets. Due to the westerly phase of QBO and timing of the eruption, CMIP6 data show a tropical peak in sAOD 550 within 645 a month of the March eruption which is transported to SH mid-latitudes by October. Sato dataset suggest two peaks in the tropics 8 and 14 months after the eruption. Run Agu06 shows reasonable agreement with limited amount of observational extinction data, although that is not conclusive. Comparison with the lidar measurements from Lexington suggests that the vertical extent of the Agung volcanic cloud in the NH mid-latitudes, is in good agreement with run Agu09. Comparisons with ERA-40 temperature anomalies also suggests that 3 K warming in the tropical stratosphere (2 K in the model simulation due 650 to westerly phase of QBO). Assuming CMIP6-simulated sAOD 550 is realistic, 6 Tg and 9 Tg SO 2 injection seem to be the best lower and upper estimates required to simulate Mt. Agung-related aerosol in the UM-UKCA.
Overall, we have validated the interactive stratospheric aerosol configuration of the GA4 UM-UKCA model, and have shown the simulated aerosol properties for the Pinatubo ensemble are consistently in good agreement to a range of satellitebased observational datasets. For Pinatubo, we have also compared to three different independent tests of the radiative effects 655 from the volcanic aerosol cloud: the ERBE flux anomaly timeseries in the SW and LW, and the stratospheric warming in the ERA-interim re-analysis. These comparisons confirm that a 10 to 14 Tg emission flux of SO 2 would accurately represent the effects the new forcing datasets would enact for Pinatubo in chemistry climate model integrations. For El Chichón and Agung, the magnitude of the volcanic forcing is highly uncertain, the volcanic aerosol datasets used in CMIP5 and CMIP6 historical integrations showing substantial differences. We contend there is substantial potential to improve on this situation, 660 by identifying consensus forcings from multi-model simulations (Timmreck et al., 2018), with comparison to additional in-situ and active remote sensing measurements such as those being initiated within the SSiRC data rescue activity (Antuna-Marrero et al., 2020;Mann et al., 2020).   show sAOD550 from CMIP6, Sato et al. (1993) and Ammann et al. (2003), respectively.  1991 1992 1993 1994 h) 24km 1991 1992 1993 1994 i) 20km      1982 1983 1984 1985 h) 24km 1982 1983 1984 1985 i) 20km  g) 16km 1963 1964 1965 1966 h) 20km 1963 1964 1965 1966 i) 16km  Pin10 10 As Pin00 21-23 As Pin00 Pin14 14 As Pin00 As Pin10 As Pin00 Pin20 20 As Pin00 As Pin10 As Pin00 Elc00 0 4 April 1982 NA Westerly Elc05 5 As Elc00 24-26 As Elc00 Elc07 7 As Elc00 As Elc05 As Elc00 Elc10 10 As Elc00 As Elc05 As Elc00 Agu00 0 17 March 1963 NA Westerly Agu06 6 As Agu00 20-22 As Agu00 Agu09 9 As Agu00 As Agu06 As Agu00 Agu12 12 As Agu00 As Agu06 As Agu00