The 2019 Raikoke volcanic eruption: Part 1 Dispersion model simulations and satellite retrievals of volcanic sulfur dioxide

. Volcanic eruptions can cause signiﬁcant disruption to society and numerical models are crucial for forecasting the dispersion of erupted material. Here we assess the skill and limitations of the Met Ofﬁce’s Numerical Atmospheric-dispersion Modelling Environment (NAME) in simulating the dispersion of the sulfur dioxide ( SO 2 ) cloud from the 21-22 June 2019 eruption of the Raikoke volcano (48.3 ◦ N, 153.2 ◦ E). The eruption emitted around 1 . 5 ± 0 . 2 Tg of SO 2 , which represents the largest volcanic emission of SO 2 into the stratosphere since the 2011 Nabro eruption. We simulate the temporal evolution of the 5 volcanic SO 2 cloud across the Northern Hemisphere (NH) and compare our model simulations to high-resolution SO 2 measurements from the Tropospheric Monitoring Instrument (TROPOMI) and the Infrared Atmospheric Sounding Interferometer (IASI) satellite SO 2 products. We show that NAME accurately simulates the observed location and horizontal extent of the SO 2 cloud during the ﬁrst 2-3 weeks after the eruption, but is unable, in its standard conﬁguration, to capture the extent and precise location of TROPOMI show a similar peak in SO 2 mass burden (1.4-1.6 Tg of SO 2 ) with an average SO 2 e-folding time of 14-15 days in the NH. Our work demonstrates the large potential of using high-resolution satellite retrievals to identify and rectify limitations in dispersion models like NAME, which will ultimately help to improve dispersion modelling efforts of volcanic SO 2 clouds.


Introduction
Volcanic activity can vary strongly in intensity, ranging from passive degassing volcanoes emitting sulfur into the lower troposphere to explosive eruptions that can release large amounts of ash and gases high into the stratosphere (e.g., Oppenheimer et al., 2011). It is well established that volcanic eruptions can impact Earth's climate system through changes in the energy balance (e.g., Robock, 2000;Schmidt et al., 2012;Stenchikov, 2016;Schmidt et al., 2018), which can affect the hydrological 30 cycle (e.g., Trenberth and Dai, 2007) and atmospheric dynamics (e.g., Shindell et al., 2004). Furthermore, volcanic air pollution events can lead to a severe and spatially widespread health hazard and increase excess mortality (e.g., Schmidt et al., 2011).
For the aviation industry, ash and gas emissions from volcanic eruptions can pose a flight safety hazard. Flying through volcanic ash is a well-recognised hazard as it can reduce visibility, damage the exterior of the aircraft and compromise the functionality of aircraft engines. Ingestion of volcanic ash can cause engine failure and permanently damage jet engines (Casadevall 35 et al., 1996;Prata and Tupper, 2009;Dunn, 2012;Prata et al., 2019). When sulfur dioxide (SO 2 ) oxidises to sulfuric acid and upon hydration forms sulfuric acid aerosol particles (see e.g., Hamill et al., 1977;Hofmann and Rosen, 1983), damage to the exterior of aircrafts can occur (e.g. window crazing) (Bernard and Rose, 1990). Through sulfidation, SO 2 can also cause serious damage to the interior of the engines. Sulfuric acid aerosol particles have been recorded to corrode nickel alloys in engine components (e.g. compressor blades) when alkali metal salts, like mineral dust or sea salt, are co-present (Eliaz et al., 2002; 40 Grégoire et al., 2018). While this effect has not been linked to immediate engine failures, it is a concern for the aviation industry as it increases maintenance costs. Apart from material damage, sulfurous odours can also cause distress of cabin passengers and aircrew.
The Raikoke volcano is located in the Kuril Island chain, near the Kamchatka Peninsula in Russia (48.3   ographic location of volcanic ash present in the atmosphere. Currently, the VAACs are only required to provide forecasts of 55 volcanic ash dispersion and therefore less development has been made on the forecasting of volcanic gas clouds. There is, however, increasing consensus among the scientific community that monitoring and simulating SO 2 clouds could be of interest to stakeholders, as volcanic SO 2 can pose a public health hazard and potentially affect the aviation industry (e.g. increase of aircraft maintenance costs) Schmidt et al., 2014;Carboni et al., 2016;Granieri et al., 2017;Grégoire et al., 2018). Volcanic SO 2 clouds are also frequently (but not always) co-located with ash clouds. Detecting ash clouds from 60 satellites retrievals remains a challenging task and therefore SO 2 clouds are potential tracers for the more difficult observable ash clouds (e.g., Sears et al., 2013;Kristiansen et al., 2015). As a result, the latest roadmap published by the IAVW (ICAO, 2019b) includes SO 2 forecasts as a core item to be implemented in the future.
The main tool used by VAACs to provide accurate forecasts of volcanic cloud characteristics is atmospheric dispersion models (ADMs), which are numerical models that simulate how air parcels disperse within the atmosphere. ADMs are used a strong research focus on improving the simulation of volcanic ash in ADMs and measuring volcanic ash using in-situ and satellite measurement techniques (e.g., Witham et al., 2007;Corradini et al., 2011;Prata and Prata, 2012;Mulena et al., 70 2016; Harvey et al., 2018;Webster et al., 2020). The skill of ADMs in simulating the evolution of SO 2 clouds has also been investigated (e.g., Eckhardt et al., 2008;Heard et al., 2012;Boichu et al., 2013;Schmidt et al., 2015), but to a much lesser extent as this has generally been the realm of global climate models interested in the climatic impacts of the periodic stratospheric injections from volcanoes (e.g., Haywood et al., 2010;Solomon et al., 2011;Schmidt et al., 2018).
Observations are vital in determining the skill of the dispersion models. While in situ observations are available for several 75 well-studied volcanoes (e.g., Pfeffer et al., 2018;Sahyoun et al., 2019;Whitty et al., 2020), they are only available for a limited number of locations. In recent decades, many high-resolution remote-sensing measurements have become available, providing a great data source on activity at even the most remote volcanoes across the globe. A large number of satellites now measure atmospheric SO 2 , with each newly launched instrument having an increased accuracy (see for example fig. 2 in Theys et al. (2019)). The TROPOspheric Monitoring Instrument (TROPOMI), which is part of the ESA's S5P satellite, has been operational 80 since the end of 2017 and measures atmospheric SO 2 concentration at an unprecedented resolution (Theys et al., 2017(Theys et al., , 2019. TROPOMI therefore provides a useful new source of information to evaluate model simulations of SO 2 clouds using ADMs (and climate models).
One major issue for the development of ADMs for volcanic clouds is the relatively small number of large-magnitude eruptions available since the start of the satellite era in 1979 in order to validate model output. While smaller-magnitude eruptions 85 take place more frequently, large-magnitude eruptions that can emit large amounts of SO 2 into the stratosphere are much more sporadic (e.g., Pyle, 1995;Carn et al., 2016;Schmidt et al., 2018). The 2019 Raikoke eruption is the first eruption with SO 2 emissions in excess of 1 Tg of SO 2 that has been observed by the TROPOMI instrument. Because of the amount of SO 2 emitted into the stratosphere, it provides an ideal test case to validate the skill of the Numerical Atmosphere-dispersion Modelling Environment (NAME) (Jones et al., 2007), which is the dispersion model used by the London VAAC (Witham et al., 2020). In 90 this paper we will focus on the evolution of the volcanic SO 2 cloud during the first 3 weeks after the 2019 Raikoke eruption and compare the output from NAME with the TROPOMI and the Infrared Atmospheric Sounding Interferometer (IASI) satellite SO 2 products. In the accompanying Part 2 paper (Osborne et al., 2020), a detailed assessment of the sulfate aerosol together with volcanic ash from this eruption are discussed as well as the effects from biomass burning aerosols that were emitted into the stratosphere from an unusually strong pyrocumulus event in continental north America. 95 The manuscript is structured as follows: after discussing the TROPOMI and the IASI satellite SO 2 products in sections 2.1 and 2.2, we briefly introduce all the relevant aspects of the NAME dispersion model in section 2.3. As part of this discussion, we present three vertical SO 2 emission profiles in sections 2.3.5 that we used for our NAME simulations, one of which is derived from combining the TROPOMI satellite SO 2 products and an initial 36 hour NAME simulation. Using the introduced input parameters for our 25-day long NAME simulations, we obtain a good qualitative comparison of the simulated SO 2 cloud 100 with the TROPOMI satellite SO 2 products during the first 3 weeks after the eruption, which we present in section 3.1. A more detailed analysis of the model skill is presented in sections 3.2 and 3.3, where we show that the Fractional Skill Score (FSS) and the SAL-score (both metrics are introduced in section 2.4) are powerful tools for assessing the skill of the model simulations in comparison to satellite measurements. The NAME simulation skill in terms of the NH-mean SO 2 mass burden throughout the first 25 days after the Raikoke eruption are presented in section 3.4, showing a large dependence of the mass burden evolution 105 on the vertical emission profile. We will finish with a discussion of our work (section 4) and present the main conclusions in section 5.

TROPOspheric Monitoring Instrument
TROPOMI is part of the ESA's S5P satellite launched on 13 October 2017 (Veefkind et al., 2012) and is a polar-orbiting, sun-110 synchronous, hyperspectral spectrometer that measures Earth reflected radiances in the ultraviolet (UV), visible, near-infrared and shortwave infrared parts of the spectrum. Atmospheric SO 2 vertical column density (VCD, expressed in Dobson units with 1 DU = 2.69 ×10 16 molecules cm −2 ) is retrieved by applying Differential Optical Absorption Spectroscopy (DOAS) (Platt and Stutz, 2008) to the measured ultraviolet spectra in three wavelength ranges (312 nm -326 nm, 325 nm -335 nm, and 360 nm -390 nm). For a more detailed description of the SO 2 retrieval, we refer the reader to Theys et al. (2017) and 115 Theys (2018).
In our study we use the TROPOMI satellite retrievals across the northern hemisphere (north of 25 • N) that cover the Raikoke SO 2 cloud between 22 June and 15 July 2019. Figure 1 shows an example of the TROPOMI daily overpasses over the northern hemisphere. Compared to its predecessors OMI and SCIAMACHY, TROPOMI has a higher horizontal pixel resolution (up to 3.5 km × 5.5 km), allowing for a more detailed characterisation of the small-scale features in volcanic SO 2 clouds (Theys 120 et al., 2019). The retrieved TROPOMI SO 2 VCD product is calculated by accounting for a large number of parameters, such as meteorological cloud fraction, surface albedo and the vertical distribution of absorbing trace gases (e.g. ozone) (Theys et al., 2017;Theys, 2018). As a result, the SO 2 retrievals from TROPOMI are sensitive to many assumptions which can lead to uncertainties of up to ±50%. For SO 2 in the stratosphere, the sum of the various uncertainties can be approximated to be around of ±30 % of the retrieved SO 2 VCDs. For a detailed discussion of the retrieval uncertainties see Theys et al. (2017).

125
One of the largest uncertainties of the TROPOMI SO 2 VCD product is the height of the SO 2 cloud. However, the sensitivity of the measurement with height is well characterised. The most reliable method to compare satellite retrievals with any atmospheric dispersion model output is by applying the column Averaging Kernel (AK) operator to the model data, thereby matching the model SO 2 VCDs to the TROPOMI product. The AKs are obtained alongside the TROPOMI SO 2 VCD product (Theys, 2018). The TROPOMI VCD data presented in this study is calculated assuming that the SO 2 layer is at 15 km above 130 ground level (agl). We have applied the corresponding column AK operator to the NAME simulation output, thereby enabling a one-to-one comparison. We have repeated the analysis using the AKs for an SO 2 layer at 7 km agl in TROPOMI, which affects the absolute SO 2 VCDs (not shown), but not our interpretation of the results or our overall conclusions.
To obtain a daily SO 2 mass estimate from the TROPOMI measurements, we grid the satellite data and combine all the overpasses during a 24-hour period starting at 12 UTC of any given day. In the case of multiple overpasses over a single 135 location, we average the SO 2 cloud at these grid locations to avoid double counting. For the mass estimate from TROPOMI During the initial stage of the eruption, it is likely that TROPOMI underestimates the SO 2 VCDs due to the presence of volcanic ash (e.g., Yang et al., 2010). To understand if ash-inference is likely to have affected our SO 2 estimates, we also retrieve the absorbing aerosol index (AAI) from the TROPOMI instrument (Zweers, 2016). Although the TROPOMI AAI 145 product should be used with care (see e.g., de Graaf et al., 2016;Kooreman et al., 2020), high index values (> 1) can indicate the presence of aerosol plumes from dust outbreaks, volcanic ash, and biomass burning. During the first 48 hours after the eruptions we found high peak AAI values within the volcanic cloud (> 18.9 on the 22 June and > 3.5 on the 23 June), indicating that volcanic ash had an impact on the SO 2 retrieval during this period.

150
The second satellite SO 2 dataset used in our analysis is acquired using IASI onboard of the Metop-A and Metop-B satellites.
These satellites operate in tandem on a polar orbit with a field-of-view (FOV) consisting of four circular footprints of 12 km diameter (at nadir) inside a square of 50 km × 50 km and provide a global coverage twice a day. For our analysis we use the SO 2 plume height estimates based on the IASI data, which is produced by applying the retrieval algorithm presented in Carboni et al. (2012) and Carboni et al. (2016). The IASI instrument also retrieves the SO 2 VCDs within the volcanic plume, 155 but uses a different set of assumptions in the retrieval algorithm (e.g. plume height) compared to TROPOMI. To compare SO 2 VCDs from NAME to the IASI data, one would therefore also need to apply a different scaling (i.e. AK). As the TROPOMI and IASI retrieval assumptions are satellite-specific, a comparison between the two VCD products is not straightforward and is not attempted here. While it would be an interesting exercise to apply our analysis also to the IASI data, we focus on the comparison of NAME with the TROPOMI SO 2 estimates and therefore no further analysis is done for the IASI VCD retrievals. The Numerical Atmospheric-dispersion Modelling Environment (NAME) is developed by the Met Office (Jones et al., 2007) and is the operational dispersion model used by the London VAAC to forecast the dispersion of volcanic clouds within European Airspace ( There is no radiative nor chemical interaction between the ash particles and the sulfur species in NAME; the ash particles and sulfate aerosols are thus considered to be externally mixed. In this section we focus on the dispersion of SO 2 and highlight the important aspects 6 https://doi.org/10.5194/acp-2020-889 Preprint. Discussion started: 7 October 2020 c Author(s) 2020. CC BY 4.0 License.
of NAME for this part of the research. More details on the modelling of ash particles within the model are discussed in the accompanying Part 2 paper (Osborne et al., 2020). 2.3.1 Simulating volcanic cloud dispersion using NAME Simulating the dispersion of a volcanic cloud with NAME relies on the tracing of air parcels through the atmosphere, each containing an ash, SO 2 and/or SO 4 mass. These air parcels are released from the 'source' location (volcano), where the user has to define the eruption source parameters (see section 2.3.4). NAME is an offline model, therefore each parcel is advected by an externally obtained wind field (e.g. a high resolution Numerical Weather Prediction (NWP) model). In our simulations, 175 we use the wind fields from the latest Global analysis of the Met Office Unified Model (MetUM), which have a horizontal resolution of around 10 km at mid-latitudes, 59 levels between the surface and 30 km asl and a 3 hourly temporal resolution.
The path of each trajectory is calculated using the following equation: where x(t) is the location of the parcel at time t, x(t + ∆t) the new location of the parcel at time t + ∆t, u(x(t)) the 3D-180 wind vector at location x(t) and u (x(t)) represents a stochastic perturbation to the parcels trajectory representing turbulence and unresolved sub-grid mesoscale wind variations in the dispersion model. In NAME, u consists of two parts representing atmospheric turbulence (u turb ) and sub-grid mesoscale diffusion (u meso ). The turbulence part represents the stochastic motions from the air parcels due to small-scale perturbations. The mesoscale diffusion represents the mesoscale motions in the atmosphere that are not captured by the resolution of the used NWP model. Each NWP has a limited spatial and temporal 185 resolution and as a result, part of the mesoscale features (e.g. eddies) are not captured by the NWP wind field provided. Both the turbulence and mesoscale diffusion within the free atmosphere (excluding the Planetary Boundary Layer, which has a more detailed scheme ) are calculated using: where K x is a 3D-diffusion vector defined separately for both components, using typical velocity and time length-scales σ and τ for the described perturbations. r represents a random number from a top-hat distribution within the range [-1,1]. The values for σ and τ are dependent on the NWP model used, as it is impacted by the resolution of the model Webster et al., 2018). The values for σ and τ used in this study are obtained from the analysis done by Webster et al. (2018) 195 and are shown in table 1. Note that for. K meso the vertical component (σ 2 w τ w ) is zero. To investigate the importance of the mesoscale diffusion parameter, we conduct two sensitivity simulations with two different SO 2 emissions profiles (see section 2.3.4 for discussion of these profiles) and a reduced value for K meso (see table 2). These simulations are indicated by the subscript rd throughout this study. Model Global analysis (10 km horizontal resolution, 59 levels) with a 3 hourly temporal resolution. Values for σ 2 are given instead of σ to be consistent with the values presented by Webster et al. (2018). Values are given for both the turbulence Kturb and the mesoscale diffusion Kmeso that are used in NAME for the free atmosphere (i.e. excluding the boundary layer).
Global MetUM analysis (0.140625 • × 0.09375 • , 59 levels, 3 hourly resolution) In our simulations, the SO 2 concentrations from all the individual air parcels in NAME are presented as hourly-means on a regular latitude-longitude grid by calculating the total mass of the SO 2 of all parcels in each grid box every hour. The NAME output is calculated using a grid size of 0.2 • latitude and 0.4 • longitude (approximately 20 km × 20 km at the latitude of the Raikoke volcano). The vertical resolution of the output is 500 m up to 15 km agl and 1 km resolution up to 20 km agl, giving a total of 35 levels.

205
To compare the daily SO 2 mass estimates from NAME and TROPOMI, we select the hourly NAME output corresponding to each individual TROPOMI overpass time and select only the grid boxes in NAME that are in the domain scanned by TROPOMI during that overpass. To calculate the SO 2 VCD, we apply the corresponding column AK operator obtained from TROPOMI (see section 2.1) to each grid cell of the NAME output. After multiplying each grid cell value by the area of the grid cell and summing the resulting mass in each column we obtain the VCDs estimate from NAME.

210
In all our NAME simulations we found that the SO 2 cloud is more diffuse than observed by TROPOMI. Therefore, removing all VCDs <0.3 DU in NAME (which is the detection threshold used for TROPOMI, see section 2.1) from the simulations would result in a negative bias within the NAME simulation mass estimates that are not related to the evolution of the cloud, but due to the stronger diffusion within the model. Therefore we have not included a detection threshold when determining the SO 2 mass estimates for the NAME simulations. Similarly to the TROPOMI estimate, the daily mass estimates from NAME are 215 calculated during a 24-hour period starting at 12 UTC of any given day. The SO 2 mass burden is defined as the total SO 2 mass (Tg) within the NH, north of 25 • N.

Chemistry within NAME
NAME contains an atmospheric chemistry scheme (Redington et al., 2009). The relevant chemistry for volcanic clouds is related to the conversion of SO 2 into sulfate (SO 2− 4 ). NAME accounts for the oxidation of SO 2 in the gas phase using the 220 following reaction where HSO 3 is then rapidly oxidised to H 2 SO 4 on formation. When water is present in the atmosphere, the oxidation can happen in the aqueous phase by both H 2 O 2 and O 3 through the following reaction: which is followed by Reactions R2-R6 dominate in cloudy conditions and only occur in model grid boxes when both the cloud fraction and liquid water content are non-zero. The concentrations of H 2 O 2 and O 3 in the atmosphere are pre-defined in the NAME model by using monthly-mean background fields obtained from a historical UM-UKCA model (Unified Model coupled to the United Kingdom Chemistry and Aerosol model) simulation that have been smoothed between months using interpolation.
In NAME the SO 2 and sulfate aerosol particles can be removed through dry and wet deposition. For our simulations we 235 found that the dry deposition had limited importance for the 2019 Raikoke eruption as most of the volcanic clouds are at high altitudes. Wet deposition in NAME is calculated using a standard depletion equation: with C representing the SO 2 air concentration and Λ the scavenging coefficient, which is calculated based on the rainfall rate 240 r (in mm/hr) and two scavenging parameters A and B. The parameters A and B vary for different types of precipitation (i.e., large-scale/convective and rain/snow) and for different wet deposition processes (i.e., rainout, washout and the seeder-feeder process). For more detailed information, including the values for A and B, we refer to Leadbetter et al. (2015) and references therein.

245
When simulating a volcanic eruption, the NAME dispersion model needs eruption source parameters (ESP) consisting of 1) location, 2) timing, 3) mass flux and 4) vertical emission profile, and for simulating volcanic ash also 5) particle density, shape and particle size distribution. Here we will discuss ESP 1-4. For information about the setup of the simulations including ash, we refer to the Part 2 paper (Osborne et al., 2020). For all simulations described in this manuscript, we released a total of 10 million air parcels in NAME within a column above the volcano, each parcel representing an equal amount of SO 2 mass. All   Figure 2. Estimated total emitted SO2 mass for the Raikoke eruption between 21 June 1800 UTC and 22 June 0300 UTC 2019. The initial emission profile was provided by the VolRes (Volcano Response) team, which is implemented in NAME for the 1.5 Tg SO2 simulation (VolRes1.5, orange). We also simulated the same profile for a 2.0 Tg SO2 emission (VolRes2.0, red). Also included is a new emission profile estimate (StratProfile, brown) based on the TROPOMI VCD cloud on 23 June (see section 2.3.5). This profile has a similar total mass emitted to VolRes1.5, but a larger fraction (69% instead of 43%) of its mass is emitted in the stratosphere (see table 2 km agl. The total SO 2 mass emitted based on the VolRes estimate was approximately 1.5 ± 0.2 Tg of SO 2 , of which 43% was released into the lower stratosphere. To investigate the impact of the emission profile on the results, apart from the initial run (VolRes1.5), we also conduct a sensitivity simulation in which we released a total of 2 Tg of SO 2 , using the same profile shape (VolRes2.0). In addition, we derive a different vertical profile based on the TROPOMI SO 2 VCD cloud (StratProfile, see section 2.3.5). Different from the VolRes profiles (which are based on the IASI satellite overpasses on 22 June), our TROPOMI estimated profile is based on 265 the 23 June overpasses, as this reduced the interference of ash during the initial stages of the eruption (see section 2.1) and also showed relatively modest VCDs compared to previous overpasses of TROPOMI. The derived StratProfile profile releases similar amounts of SO 2 into the atmosphere as the VolRes1.5 run (1.57 Tg), but a much higher fraction (69%) of the mass is released into the lower stratosphere with the main peak in SO 2 mass at 12-13 km altitude.
To understand what fraction of the total mass was emitted into the stratosphere, we calculated the tropopause height in the

270
MetUM global analysis using the World Meteorological Organization (WMO) temperature lapse rate definition. Using the spread in the 150 nearest grid points to the volcano location in the model, we get an average tropopause height of 11.2 ± 0.7 km during the first 36 hours after the eruption. To verify this tropopause height, we used radiosonde data from the Kamchatskij Observatory (see fig. 1), which is the nearest radiosonde location to the Raikoke volcano (data can be retrieved from http://weather.uwyo.edu/upperair/sounding.html). Using the same tropopause criteria for the radiosondes released from this 275 location, we estimate an average tropopause altitude of 10.5 ± 0.7 km during the first 36 hours, showing that the MetUM simulated the tropopause height within the expected range.

Stratospheric vertical emission profile
The VolRes vertical emission profile introduced in the previous section is mainly derived from the IASI satellite overpasses  Figure 3d shows the averaged SO 2 VCDs between 48-52 • N along section I-II in fig. 3a for the clouds shown in panels a-c.
Especially along the northern part of the cloud as seen in fig. 3d, we see between 170-175 • E that the VolRes1.5 simulation is underestimating the VCDs from TROPOMI by up to a factor of 8. Figure 4 shows the vertical cross section from the VolRes1.5 simulation through the SO 2 cloud along section I-II in fig. 3a, together with the available estimated cloud heights from IASI for all pixels between 49-50 • N. The cloud height from the IASI 290 retrieval is estimated using the method described in Carboni et al. (2016). Figure 4 shows that the SO 2 cloud between 170-175 • E is simulated in NAME between 11-14 km, which coincides with the lower stratosphere in the MetUM Global model (see fig. 2). The altitude of the SO 2 cloud for the NAME VolRes1.5 simulation along the cross section shown in fig. 4 is within  displacing errors due to timing. The black dots represent the available height estimates including error bars from the IASI retrieval of the SO2 cloud for all pixels between 49-50 • N and is estimated following the method described in Carboni et al. (2016). The blue up-down arrow on the right of the figure indicates the range of cruise altitudes for heavy long-haul aircraft (11.9-13.7 km). the uncertainty range of the IASI height estimates and gives us confidence that NAME simulated cloud heights are realistic. in the simulated heights, but rather due to an underestimation of the fraction of SO 2 released into the stratosphere.
To investigate the sensitivity of the results to the SO 2 mass fraction that is emitted into the stratosphere, we derive a new emission profile based on the TROPOMI vertical column density on 23 June in combination with the 36-hour NAME VolRes1.5 simulation. Using fig. 3d, we scale the VolRes1.5 VCD estimates at each longitude to represent the TROPOMI VCD values along cross section I-II and using fig. 4 we simultaneously derive the corresponding mass averaged cloud height for the  fig. 3c and fig. 3d and, as expected, show a better comparison with the TROPOMI satellite SO 2 VCD retrievals at this particular time. A more detailed vertical emission profile could be constructed using sophisticated inverse modelling techniques (e.g., Eckhardt et al., 2008;Kristiansen et al., 2010;Moxnes et al., 2014), but this is beyond the scope of this paper and is not attempted here.

310
Assessing the model's skill in representing satellite measurements of SO 2 requires appropriate metrics. Similar comparisons should be possible in almost near-real time for VAACs when investigating future eruptions. Therefore, apart from being able to show the details of the model-satellite comparison, it is also important that the metric is easily interpretable by end-users. In the following sub-sections we introduce two metrics for identifying the skill of the simulations: (1) the Fractional Skill Score (FFS), and (2) the SAL-score. The Fractional Skill Score was originally developed to determine the skill of weather forecast models to represent radar rainfall observations (Roberts, 2008;Roberts and Lean, 2008;Mittermaier et al., 2013), but has been since used to also describe the skill of dispersion models in representing volcanic clouds Harvey and Dacre, 2016). For volcanic SO 2 clouds, the Fractional Skill Score is calculated using the ratio between the model-simulated (M k ) and observed (O k ) fractional 320 coverage of the SO 2 cloud at each location (neighbourhood) in the domain investigated. When considering a neighbourhood of N grid points (or pixels), the Fractional Skill Score is calculated using: 13 https://doi.org/10.5194/acp-2020-889 Preprint. Discussion started: 7 October 2020 c Author(s) 2020. CC BY 4.0 License.
The FSS is calculated from the Fractions Brier Score (FBS), which is a variation on the Brier Score (Brier, 1950), and FBS ref is the largest FBS score one can obtain from multiple non-zero fractions within the domain when there is no overlap between the two fields. In the case that observations and simulation are perfectly aligned, FSS is equal to one. In the case of a total mismatch FSS equals to zero. In general for the FSS, a model-simulation is considered to have skill when FSS > 0.5 (see e.g., Harvey and Dacre, 2016).

330
The FSS metric is very suitable for studying the skill of a model in capturing the volcanic cloud's spatial extent. One advantage of using the FSS metric is that it relaxes the requirement for exact matching of the spatial features in the simulations with the observations. Instead when the fractional coverage of the SO 2 cloud within a studied region (i.e. a neighbourhood of size N) is the same for the observations and the simulation, this metric counts it as a correct forecast. By using different sizes of neighbourhoods, one can also determine at which spatial resolution the simulation is skilful (i.e. for which N is FSS > 335 0.5) at any given time, which helps to determine at which spatial scale features of the SO 2 cloud can be considered realistic.
Note however that the method does not consider mass concentrations -it only considers a 'hit' or 'miss' for each location.
By applying the same FSS metric to increasing mass concentration thresholds (i.e. sub-regions of the cloud), one can obtain information about model skill at simulating volcanic SO 2 cloud structures with varying mass densities.

340
The SAL-score is a metric that is composed of three components, which describe the Structure (S), Amplitude (A) and Location (L) of an investigated feature within a specified domain. The metric was originally developed to compare the structure of modelsimulated precipitation fields with observations (Wernli et al., 2008), but has since been adapted to also describe other fields, including volcanic clouds (e.g., Dacre, 2011;Wilkins et al., 2016;Radanovics et al., 2018). Here we will adopt this metric to describe the evolution of the SO 2 cloud. For a detailed description of the equations used to calculate each individual component 345 we refer the reader to Wilkins et al. (2016).
Briefly, to calculate the S and L scores (not needed for the A score), we have to identify all the individual simulated and observed SO 2 clouds. In our analysis each cloud is identified as a group of adjacent grid cells which have a VCD value above a certain threshold. From Theys et al. (2019) we deduce that the detection limit of the satellite measurements for individual pixels is approximately 1 DU. All of the analysis in our study is done at the highest resolution that is available for all fields, which is 350 the NAME model output (0.2 • latitude and 0.4 • longitude). Due to the higher spatial resolution of the satellite product, we have to average the TROPOMI output of multiple pixels within each NAME grid box (on average 9 TROPOMI pixels per NAME output grid box at each given time step) to get both datasets on the same output grid. As a result, we have used a lower detection threshold of 0.3 DU when identifying all grid points that are part of a SO 2 cloud for the (re-gridded) TROPOMI retrievals and the NAME simulations. To remove additional spurious data from the TROPOMI satellite product, we also include a minimum 355 size of each identified SO 2 cloud to be 100 km 2 (approximately the NAME grid box size at 50 • N) before considering in our analysis. Simulated and observed SO 2 VCD values below either of these thresholds are excluded from all S, A, and L calculations.
To interpret the SAL-score, let us first assume a single idealised 2D-gaussian shaped cloud for both the simulated and observed SO 2 VCDs. Looking at the schematic cross-section presented in fig. 5, three characteristics are represented by the S,

360
A and L scores. The S score compares the shapes of each individual SO 2 clouds in terms of the horizontal extent (width) and maximum concentrations within the cloud, by comparing the normalised shape of the clouds (i.e. total mass of the simulated and the observed clouds are made equal, see fig. 5a). A negative S score indicates that the simulated SO 2 clouds are too narrow or have too high peak VCD values when compared to the observed cloud (leptokurtic). When the simulated SO 2 clouds are too wide spread or have too low peak VCD values, this is indicated by a positive S score (platykurtic).

365
The A score represents the comparison between the simulated and the observed total mass of SO 2 within the entire studied domain and is independent on the number of individual SO 2 clouds. Negative A scores represents an underestimate of the total SO 2 mass in the simulation when compared to the observations ( fig. 5b), while a positive value shows that the simulation is overestimating the total SO 2 mass in the domain.
Finally the L-score represents the distribution of the individual simulated and observed SO 2 clouds within the domain and 370 consists of two parts L 1 and L 2 (Wernli et al., 2008). L 1 represents the normalised distance between the domain-averaged centre of mass of all the simulated and observed SO 2 clouds, where a higher positive value represents a larger distance between the simulated and observed domain-averaged centres of mass (see fig. 5c). In the case of multiple SO 2 clouds, L 2 represents the differences in the distribution of individual clouds around the domain-averaged total centre of mass. L 2 is calculated by considering the distance between the centre of mass of each individual cloud and the total domain-averaged centre of mass.

375
In the case of a single object, L 2 is equal to 0, as the centre of mass in the domain is the same as the centre of mass of the individual object.
When the simulation and observations compare perfectly, the score of each of the components is 0. For the S and A score the values are all between ±2, where a value of -1 represents a factor of 3 underestimate of the simulation compared to the observations and +1 represent a factor of 3 overestimate of the simulation. For the L 1 and L 2 scores, values are between [0,1], 380 with the worst possible score being 1 representing a distance equal to the maximum distance within the domain. Similar to Wernli et al. (2008), we will present the 3 components of this metric in a SAL-diagram ( fig. 5d), where the horizontal axis represent the S score, the vertical axis the A score and the colour of each point represents the L score (sum of L 1 and L 2 ). The simulation and observations compare best when all the points are near the origin and have the dark purple colour.

385
First we qualitatively discuss the spatial pattern of the SO 2 cloud and its dispersion across the northern hemisphere during the first week after the eruption. We then discuss the FSS and the SAL-scores for the SO 2 cloud, followed by a discussion of the SO 2 mass burden evolution during the first 25 days after the eruption. A video of the volcanic SO 2 and SO 4 VCDs as simulated by NAME for the VolRes1.5 and StratProfile emission profiles can be found in the video supplements (de Leeuw, 2020). the low-pressure system, the volcanic cloud within the troposphere (below 11 km) moved predominantly in a south-eastward direction along the south-flank of the cyclone until it started to wrap around the centre on 23 June. For the cloud layers at higher altitudes within the stratosphere, the main wind direction was more zonal, resulting in the observed split as is visible in fig. 3. The VolRes1.5 and the StratProfile simulations show the same spatial pattern, but have different SO 2 VCDs within the cloud (see e.g. fig. 3d). As expected, the StratProfile ( fig. 3c) simulation shows a better agreement of the TROPOMI SO 2 VCD 400 values within the cloud (see also section 2.3.5).
By 25 June 2019, a large part the cloud moves in a north-western direction, spreading over the Asian continent as seen in fig.   6a and fig. 6b for both the VolRes1.5 simulation and TROPOMI. Due to the variation in emission heights between the VolRes1.5 and the StratProfile simulations ( fig. 2), we can identify the parts of the cloud in the NAME simulations that are mainly within the troposphere and the stratosphere by comparing their differences. The results are shown in fig. 6c, which shows that the 405 north-western part of the cloud is mainly within the troposphere, while the stratospheric parts of the cloud remain centred around the low-pressure system. Calculating the difference between the VolRes1.5 simulation and the TROPOMI retrievals in fig. 6d, we find that the pattern is very similar to fig. 6c. This shows that the VolRes1.5 simulation mainly overestimates the SO 2 mass of the cloud in the troposphere and underestimates the stratospheric part of the cloud.
On 27 June 2019, the SO 2 cloud starts to spread also at higher altitudes, leading to a complex spatial pattern as shown in 410 fig. 7. While the large-scale structure of the cloud on 27 June has become much more complex, both the VolRes1.5 and the StratProfile simulations capture the general VCD structure of the retrieved TROPOMI cloud well. Note that the small-scale eddies observed by TROPOMI in the centre of the cloud are not simulated by NAME as a result of the limited (spatial and temporal) resolution of the NWP input. Therefore, the small-scale variability cannot be captured by the model, but instead are parameterised by the diffusion parameters as a random perturbation on the wind field (see section 2.3.1). This results in the 415 spreading of the SO 2 cloud with a smoother pattern in the NAME simulations without the high peak values. This also explains the patchy variations shown in figs. 6b and 6c within the centre of the cloud. Averaging the VCDs over the whole domain shown in fig. 7 (thereby removing the small-scale features from TROPOMI), the average VCD values for the VolRes1.5 simulation are 20% lower than measured by TROPOMI. This is also evident from the dominant blue colours in fig. 7b. For the StratProfile simulation ( fig. 7c) the domain-average mass is within 0.01 Tg of SO 2 of the TROPOMI SO 2 mass estimate (i.e. StratProfile 420 SO 2 mass estimate is <1% lower than TROPOMI).
Finally, we also find a larger spread of the cloud in both the VolRes1.5 and StratProfile simulations as seen in figs. 7b and 7c by the red values outside the 1 DU TROPOMI contour. We only included the values > 1 DU in this plot for clarity of the figure. When including lower values (0.3-1 DU), the overestimation of the spread of the cloud in NAME is even larger (not shown here). The VolRes1.5 simulation is able to capture the overall outline of the cloud well for this period, but is struggling to simulate the peak VCD values within the retrieved TROPOMI SO 2 cloud. Focussing on the VCD > 1 DU points in fig. 8a, the simulation has skill for up to 12.5 days after the eruption start. This shows that the simulation is capturing the overall dispersion of the 435 cloud well, as it is able to distinguish well between areas with and without any SO 2 VCDs across the northern hemisphere.
For SO 2 VCDs greater than 30 DU, which correspond to small-scale features within the volcanic cloud, the simulation has no significant skill within 2.5 days after the start of the eruption. This agrees with the fact that the VolRes1.5 simulation was not able to capture the peak values on the 25 June 2019 observed by TROPOMI as shown in fig. 6.
The FSS values for the StratProfile simulation ( fig. 8b) reveal that this simulation performs better than VolRes1.5 and has 440 skill on a longer timescale for all of the SO 2 VCDs. For the lower VCDs (<1 DU), the StratProfile simulation remains skilful 2 days longer than the VolRes1.5 simulation (12.5 days versus 14.5 days). For the VCDs above 30 DU, the FSS skill timescale The timescales for which the NAME simulations show skill (compared to the TROPOMI retrievals) in terms of FSS are 445 shown in table 3 and fig. 9. Independent of the neighbourhood size, the StratProfile simulation has the highest skill for all VCDs. Figure 9 shows that the StratProfile simulation is skilful on timescales twice as long for VCD values above 10 DU compared to the VolRes1.5 and VolRes2.0 simulations. Interestingly the change in neighbourhood size (i.e. averaging region) has only a limited impact on the skill timescales for low VCDs (below 5 DU). This shows that all of the simulations are able to capture the horizontal extent of the SO 2 cloud well on spatial scales similar to our smallest output grid used (0.2 • × 0.4 • ) 450 and on timescales of 2-3 weeks after the start of the eruption.
The reduction in FSS scores for high VCDs are influenced by two factors: 1) does the simulation capture high VCDs? and if so 2) is the location of the high VCD features in the SO 2 cloud (see e.g. fig. 7c) correct? Due to the dispersion of the SO 2 cloud with time, we expect a decrease in the FSS values in time for the higher VCDs as these concentrations are not present anymore in either the TROPOMI retrievals nor the NAME simulations (resulting in FSS=0). For increasing VCDs, the skill 455 timescales are therefore expected to reduce. However, compared to the VolRes1.5 and VolRes2.0 simulations, the StratProfile simulations contains higher SO 2 VCDs throughout the simulation period, resulting in relative higher FSS values.
For high VCDs, the FSS metric depends more on the used neighbourhood sizes as the corresponding SO 2 cloud features get smaller. Using a larger neighbourhood size compares the presence of small-scale features over a larger region, reducing the impact of any misplacement and result in a higher FSS (see section 2.4.1). The VCD values at which model skill increases 460 for a different neighbourhood sizes is therefore linked to a displacement error. In fig. 9, a doubling in skill timescales is found for the larger neighbourhood size (hashed versus non-hashed) at VCDs >10 DU for the VolRes1.5 and VolRes2.0 simulations, while for the StratProfile similar differences are evident for VCDs above 30 DU. This shows that the VolRes1.5 and VolRes2.0 Table 3. The FSS-based timescales estimates for which the NAME simulations have skill compared to TROPOMI at the given spatial resolution N. The given values represent the number of days after start of eruption (rounded by 0.5 days) where the FSS comparison between the NAME simulations and the TROPOMI retrievals drops below 0.5 for various neighbourhood N sizes (value in brackets shows the corresponding model resolution for which the values would be valid), maximum included vertical column density contours and NAME simulations. When calculating the daily FSS value, we only include overpasses where the satellite retrieved total mass was above 0.2 Tg to remove noise. When the model still has skill FSS > 0.6 on the last day where we could determine FSS (17 days  DU are still present up to 4 days. However, these features are slightly displaced, as evident from the increase in the FSS skill timescales from increasing the neighbourhood size from 0.2 • to 1 • . On timescales longer than 5 days, all the NAME simulations show a strong diffusion in the SO 2 cloud (related to the diffusion 470 parameterisations). As a result none are capturing the high VCDs retrieved by TROPOMI, which reduces the FSS quickly to 0. This shows that high VCDs within the SO 2 cloud, which are related to the small-scale eddies, are only skilfully simulated on timescales less than 5 days. VolRes2.0 simulation, fig. 10b shows a more positive A score during the first 5 days than the VolRes1.5 simulation as the former overestimates the total SO 2 mass retrieved by TROPOMI. After 26 June, the A score for VolRes2.0 remains close to the A=0 line in fig. 10b, showing that the total SO 2 mass compares better with TROPOMI for the VolRes2.0 simulation than the VolRes1.5 simulation after 5 days. For the StratProfile simulation (panel 10c), the comparison with TROPOMI is even better throughout the entire simulation, as is evident by the low A score as well as the low L score. The comparison between the 495 StratProfile simulation and TROPOMI for each individual TROPOMI overpass (i.e. each dot in fig. 10c) is close to the A=0 line in the diagram, showing that NAME is able to capture the total SO 2 mass very well. Also a lower L score (darker colour of the squares and circles) indicates that the model is capturing the location of the cloud even more accurately than both the VolRes simulations. These results are consistent with the results shown in figs. 6-9.

The SAL-score
Reducing the diffusion parameter by 75% (K meso , see sections 2.3.1) in the StratProfile rd simulation (panel 10d), reveals 500 a relative decrease in the S score during the first week and no change in the A and L scores when compared to fig. 10c. This behaviour is expected, as a decreased diffusion will not alter the total mass (A score) nor the centre of mass of the individual SO 2 clouds (L score), but it will result in a more concentrated SO 2 clouds (reduction of S). As a result, the StratProfile rd simulation shows the best comparison with the TROPOMI retrievals for the first 5 days of the eruption. After that the S score quickly increases for all the simulations independent of the diffusion parameterisation, as diffusion related to uncertainties 505 from the (fine-scale) meteorological conditions start to dominate the signal. Figure 11a shows the SO 2 mass burden evolution calculated from the TROPOMI satellite retrievals and three NAME simulations (VolRes1.5, VolRes2.0 and the StratProfile) for the 25 days between the start of the eruption and 15 July 2019. The best comparison is obtained using the StratProfile simulation, which captures both the peak value and the long-term evolution 510 remarkably well and falls well within the uncertainty range of the TROPOMI estimate. To obtain the mass burden, we have excluded the VCDs of TROPOMI below 0.3 DU to reduce noise but included all mass for NAME simulations as discussed in Figure 11. The daily evolution of a) the total SO2 mass (Tg of SO2) and b) the total SO4 mass (Tg) for the 2019 Raikoke eruption for different ESP in NAME. We have included the TROPOMI SO2 mass estimate (blue dashed line) as well as the evolution of 3 NAME runs:

Sulfur dioxide mass burden
VolRes1.5, VolRes2.0 and StratProfile (see fig. 2). The dotted lines in the figures show the corresponding daily deposition of SO2 and SO4 (Tg) from the simulations. The peak values in the NAME SO2 mass distribution are slightly higher than the mentioned total emission values in table 2, which is the result from applying the 15 km AK to the dispersion model data. The total SO2 mass burden for TROPOMI is calculated using all locations where the vertical column densities are above 0.3 DU, while for the NAME simulations we include all mass.
The blue shading represents the standard error estimate for the TROPOMI product. The grey bars show the TROPOMI estimated 0.1 × max(AAI) value inside the volcanic cloud for the first 5 days after the eruption. The high AAI values during the first 48 hours indicate high concentrations of ash, thereby affecting the TROPOMI SO2 retrievals during this period.
the methods (see sections 2.1 and 2.3.2). It is likely that TROPOMI underestimates the SO 2 VCDs and thus SO 2 mass during the initial phase of the eruption due to the presence of volcanic ash, which is supported by large values of AAI obtained from TROPOMI during the first 2 days after the eruption (see grey bars fig. 11a and also section 2.1).

515
Consistent with figs. 3, 6 and 7, the VolRes1.5 simulation captures the peak in total SO 2 mass within the uncertainty of the TROPOMI estimate, but is underestimating the TROPOMI SO 2 mass between 27 June 2019 and 15 July 2019 on average by 0.3 Tg. The calculated e-folding times for SO 2 also highlight that the VolRes1.5 simulation loses SO 2 mass at a much faster rate than that calculated based on TROPOMI during the first 6 days after the eruption (e-folding time of ≈9 days versus ≈21 days in TROPOMI between 23-27 June 2019). As a result, this leads to a maximum underestimation of 25% (0.33 Tg) of the VolRes1.5 simulation compared to TROPOMI on 28 June. From 27 June the loss rate for both TROPOMI and VolRes1.5 are similar, with an e-folding time of ≈14-15 days, which is at the high-end of the range reported by others for eruptions of similar magnitude (e.g., Höpfner et al., 2015;Carn et al., 2016).
After 27 June 2019, the total SO 2 mass burden evolution from TROPOMI is best captured by the VolRes2.0 and the Strat-Profile simulations, related to a larger amount of mass emitted into the stratosphere. From the total SO 2 mass emitted into the 525 stratosphere calculated in table 2, we see that both the VolRes2.0 and the StratProfile respectively emit 0.2 and 0.45 Tg of SO 2 mass more into the stratosphere than the VolRes1.5 simulation. For the VolRes2.0 simulation the overall evolution of the SO 2 mass profile is similar to that obtained from the VolRes1.5 simulation (e-folding time of ≈9 day during the first week, e-folding time of 14-15 days afterwards). However, due to the increased total SO 2 emissions (0.5 Tg more than VolRes1.5), the SO 2 mass evolution of VolRes2.0 also overestimates the TROPOMI peak mass by more than 0.5 Tg on 23 June. As expected, given 530 that TROPOMI is used as our baseline metric for initialising our StratProfile simulations, the best overall comparison with TROPOMI is obtained for the StratProfile simulation.
A possible cause for the strong reduction in total SO 2 mass during the first week for the VolRes1.5 and VolRes2.0 simulations might be too strong a conversion of SO 2 into sulfate aerosols during the start of the simulation by the processes presented in section 2.3.2. To test this hypothesis, we also calculated the mass evolution of SO 4 in NAME, which are shown in fig. 11b.

535
From this we can conclude that the chemical conversion into SO 4 is realistic within the NAME simulations. The daily rate of production of SO 4 is less than 0.03 Tg/day, which is a factor of 3 lower than the average daily decrease in SO 2 mass in the VolRes1.5 and VolRes2.0 simulations during the first week (0.1 Tg/day).
The daily total SO 2 mass deposition from the NAME simulations is shown by the dotted lines in fig. 11a. For the 23 and 24 June, the total daily wet deposition dominates the removal of SO 2 from the atmosphere, as it is responsible for 89-90% of the 540 SO 2 mass reduction for the VolRes1.5, the VolRes2.0 and the StratProfile simulations. Atmospheric conditions during the first week of the eruption can explain this relatively large contribution from wet deposition. During the first week of the eruption, the SO 2 cloud is moving within a region of moist air in the warm conveyor belt on the southern edge of the cyclone (see fig. 3).
This favours the chemical conversion of SO 2 into SO 4 through aqueous-phase chemistry and also the removal of SO 2 through wet deposition, resulting in the peak deposition values we show in fig. 11a. The cyclone is mainly a tropospheric phenomenon 545 and as a result wet deposition occurs mostly in the tropospheric part of the SO 2 cloud. As the VolRes2.0 simulation emits the largest amount of SO 2 into the troposphere (see fig. 2), this also explains the highest removal rate (red dotted line fig. 11a peaks at 12% of the NH-mean daily SO 2 mass burden on 23 June) and also the highest conversion rate during the first week of the simulation (evident from the largest SO 4 mass burden in fig. 11b in this period). The wet deposition is lowest for the StratProfile simulation during 23 and 24 June (peaks at 4-5% of the NH-mean daily SO 2 mass burden on 23 June), as less mass 550 is emitted into the troposphere for this profile.
The results from fig. 11 show that there is a high sensitivity of the mass burden evolution in NAME to the vertical emission profile used for this particular eruption, which straddled the tropopause. Due to different atmospheric conditions within the troposphere and stratosphere, the SO 2 mass burden evolution is different within the two layers, resulting in significant differences in the total SO 2 mass burden evolution in our simulations. The average e-folding time of SO 2 is ≈ 10 days in the 555 upper troposphere (e.g., Krotkov et al., 2010;Carn et al., 2016), consistent with the e-folding time simulated during the first days of the VolRes1.5 and VolRes2.0 simulations. After 10 days a large fraction of the tropospheric SO 2 mass is removed from the atmosphere through wet deposition or converted into SO 4 and the remaining signal in fig. 11 is dominated by the stratospheric layers within the cloud. The part of the SO 2 cloud within the stratosphere is much less affected by the cyclone and the environment contains much less moisture. Therefore SO 2 is removed at a much lower rate (SO 2 deposition is <1%) 560 and is mainly converted through the gas-phase reaction with OH, resulting in the longer e-folding time of approximately 14-15 days. The similarity in e-folding time obtained from TROPOMI and all the simulations between 27 June and 15 July suggests that the chemistry scheme in NAME is realistic.
Overall, the total SO 2 mass burden obtained using the StratProfile emission profile compares best with TROPOMI. Based on this comparison, we estimate that the 2019 eruption of Raikoke emitted approximately 0.9-1.1 Tg of SO 2 into the lower 565 stratosphere (11-18 km). With a maximum SO 2 mass burden of 1.5-1.6 Tg in the atmosphere, it follows that approximately 0.4-0.7 Tg was emitted into the upper troposphere (8-11 km).

Discussion
Our study shows that the NAME simulations compare very well with the TROPOMI satellite retrievals of the SO 2 cloud during the first three weeks after the 2019 Raikoke eruption. Despite the increasing complexity of the SO 2 cloud's horizontal structure While the NAME model was not developed specifically to simulate stratospheric volcanic SO 2 clouds, our results show that the model is suitable to be used by VAACs to issue forecasts on the evolution of volcanic SO 2 clouds in the upper 580 troposphere/lower stratosphere. Currently after a volcanic eruption, VAACs provide information on the areas in the atmosphere where volcanic ash is forecasted up to 18 hours into the future. In the case of the Raikoke 2019 eruption, we have shown that a similar approach to produce a forecast for the presence of a SO 2 cloud would have been accurate on this and even longer timescales. However, future SO 2 cloud forecasts will more likely be based on designated SO 2 concentration thresholds. From our simulations we found that NAME is able to capture the horizontal extent of the 1 DU VCD contour on a spatial resolution 585 of 0.2 • × 0.4 • during the first 17 days after the eruption. Assuming that the obtained VCD values are from a cloud at 12 km altitude with a thickness of 2 km (e.g. estimated from fig. 4), 1 DU would correspond to an average SO 2 concentration of 0.02 ppm within this cloud. As reference, based on sulfur dioxide Acute Exposure Guideline Levels (AEGL) (National Research Council Committee, 2010), an extended exposure (> 10 minutes) to SO 2 concentrations of 0.2 ppm (lowest AEGL) can lead to some respiratory irritation, while concentrations above 0.75 ppm can lead to long-lasting adverse health effects. For our 590 example, the lowest AEGL threshold would therefore correspond to VCDs above 10 DU. We find that NAME is capable of capturing the spatial distribution of the features within the SO 2 cloud where the VCDs are larger than 10 DU on the order of 7-10 days (see fig. 9).
Our work highlights that accurate information on ESPs are key when comparing model simulations to satellite retrievals.
Our analysis of the 2019 Raikoke eruption shows that the skill of the NAME simulations is strongly dependent on the emission 595 height of the volcanic SO 2 , with best results obtained when approximately two-thirds of the total SO 2 mass is emitted into the stratosphere. Interestingly while the fractional split of the mass between the stratosphere and troposphere is important, the detailed vertical distribution of SO 2 within each of these two layers has only a minor effect on model performance. To test the sensitivity of the model results to the details of the vertical distribution of the emissions within the troposphere and stratosphere, we performed a simulation where we emitted 1 Tg evenly between 11-14 km and 0.5 Tg evenly between 9-11 600 km (not shown). The general conclusions were the same as shown for the StratProfile, illustrating that, for the 2019 Raikoke eruption, it is key to establish the fraction of the SO 2 mass that was emitted into the stratosphere.
The vertical SO 2 emission profile is also the main driver of the differences found between the NAME simulations using either the StratProfile or the VolRes1.5 vertical emission profile. For the VolRes1.5 simulations, only 43% of the total SO 2 mass is emitted above the tropopause as defined in the MetUM NWP, while in the StratProfile 69% of total SO 2 mass is 605 emitted into the stratosphere. What follows is that an error in the definition of the tropopause height within the MetUM (or any other NWP) model can have a large influence on the skill of the dispersion simulations. If, for example, the MetUM modelestimated tropopause height is 2 km above the actual observed tropopause height, this would lead to a wrong placement of the majority of the SO 2 mass in the troposphere for the VolRes1.5 simulation. Using a different NWP model with a lower tropopause height in the NAME simulations would result in a higher fraction of the SO 2 mass of the VolRes1.5 profile being 610 emitted into the stratosphere and potentially give a better comparison with TROPOMI than the StratProfile estimate (which would overestimate the stratospheric SO 2 mass in that case). However, we found the average tropopause height of 11.2 ± 0.7 km diagnosed in the model at the eruption site is in good agreement with the observed tropopause height of 10.5±0.7 km from a nearby radiosonde location (see fig. 2), which gives us confidence that our StratProfile emission estimates for the stratosphere and troposphere are suitable and that the NAME results are not strongly biased by a wrong tropopause height within the NWP 615 fields.
While we have investigated the impact of changing the vertical SO 2 emission profile (see fig. 2), we have not investigated the effect of any uncertainty related to the atmospheric conditions in the NWP wind fields used as input for the NAME simulation.
A study by Dacre and Harvey (2018) shows that the impact of the atmospheric conditions on the NAME simulations can be large, especially in conditions of large horizontal flow separation in the atmosphere. The specific atmospheric conditions for 620 this particular eruption (i.e. the low-pressure system east of the eruption site) shows isobars that are parallel to each other during the first days after the eruption in the region of the cloud (e.g. fig. 3). Therefore, it is expected that the impact of flow separation is limited during the initial stages of the cloud evolution. After several days, the flow separation becomes more pronounced in various regions in the domain (see e.g. fig. 6). This effect is also reflected in the decrease in the S score value of the SAL diagram shown in fig. 10, as trajectories start diverging in these regions of enhanced flow separation enhancing 625 diffusion parameterisation in the model, which smooths the signal (i.e. random perturbation) of the small-scale eddies within the cloud that are not present in the input wind field (see sections 2.3.1). As uncertainties from the (fine-scale) meteorological conditions used as input for the simulations gradually accumulate, this leads to a larger spread in the SO 2 cloud over time than what is observed in reality. This gives the tendency for the model to be more diffuse on longer timescales, as revealed by the S score increase in figs. 10a-c. 635 We find that the key reason for the increasing S score values on shorter timescales (i.e. 1-5 days) is related to the diffusion parameter K meso used for the model simulations. NAME v8.0 uses a single value for the diffusion coefficient within the free atmosphere (see table 1). The diffusion parameter values currently used in NAME have been determined using observational datasets near the surface . It is therefore possible that the mesoscale diffusion values used in the model might be unsuitable for the higher levels in the atmosphere, especially in the stratosphere and thereby cause too much diffusion 640 from the start of the eruption. To test this hypothesis, we investigated several simulations with a smaller diffusion coefficient (see table 2), where we reduce the mesoscale diffusion value K meso by 75%. The resulting values for the FSS-score and SALdiagram for the StratProfile rd are shown in table 3 and fig. 10d and indicate that the simulations are better able to capture the structure (i.e. peak values and horizontal extent) of the cloud during the first 5 days of the simulations. It is however currently impossible to determine a more precise value for the diffusion parameters, due to the lack of case studies and limited available 645 observations for these high altitudes.
A previous study by Harvey et al. (2018) used a multi-level emulation approach to better understand the influence of model parameters on the accuracy of NAME output for volcanic ash concentration during the 2010 Eyjafjallajökull eruption. Their study showed a limited impact of the mesoscale diffusion parameter K meso on their simulations for the 2010 Eyjafjallajökull eruption. This limited effect of K meso is partly explained by the lower resolution (40 × 40 km) of their model output.

650
This leads to an averaging of small-scale eddies and a reduced impact of the diffusion parameterisation, something we also observed in our study when using a larger neighbourhood size N in our calculations of the FSS score (see fig. 10 and table 3).
Furthermore, the 2010 Eyjafjallajökull emissons were at lower altitude than for the Raikoke eruption, so a larger K meso might be more appropriate. However, the reduced K meso values in our StratProfile rd simulations are within the range of realistic values for the free atmosphere (see table 1 in Harvey et al. (2018)) and so this motivates more research to investigate the 655 potential improvement of the model by having a more detailed representation of the mesoscale diffusion in the model at higher altitudes. Part of ongoing work is to investigate the impact of a new space and time-varying free-atmospheric turbulence scheme that is included in the latest version of NAME (Dacre et al., 2015), which was not available for the simulations presented in this paper.
been observed by the high-spatial resolution TROPOMI instrument. Nonetheless, our work shows the large potential of using TROPOMI SO 2 retrievals (in combination with the correct AKs) to identify and rectify issues in dispersion modelling efforts of volcanic eruptions. Comparing high-resolution satellite measurements and dispersion model simulations is a valuable exercise that can help improve both the volcanic dispersion modelling tools and the satellite retrievals of volcanic plumes. By combining the information obtained from NAME, TROPOMI and IASI for Raikoke 2019, we are able to give a more detailed picture of 665 the eruption source parameters and the dispersion of the volcanic SO 2 cloud than ever before. Even though we only considered this one case study, it becomes clear that improvements in ADMs simulating volcanic eruptions can be expected when more volcanic eruptions are investigated using this or similar frameworks.
While this study has focussed almost entirely on representing the evolution of the gas-phase sulfur in the form of SO 2 , it is acknowledged that this is only part of the story. The Part 2 companion paper (Osborne et al., 2020) focusses on assessing 670 the fidelity of the NAME model in representing the resulting sulfate aerosol together with volcanic ash and any confounding effects from biomass burning aerosol that were emitted into the stratosphere from an unusually strong pyrocumulus event in continental north America.

Conclusions
Volcanic eruptions can pose a large threat to society and in particular the aviation industry. In this study we simulated the 2019

675
Raikoke eruption using the Met Office's Numerical Atmospheric-dispersion Modelling Environment (NAME). The 21-22 June 2019 Raikoke eruption emitted 1.5 ± 0.2 Tg of SO 2 into the upper troposphere/lower stratosphere. We evaluated the skills and limitations of NAME to simulate the dispersion of the resulting volcanic SO 2 cloud by comparing our simulations to high-resolution satellite measurements from the Tropospheric Monitoring Instrument (TROPOMI). A detailed assessment of the sulfate aerosol together with volcanic ash from this eruption can be found in the Part 2 companion paper (Osborne et al.,680 2020). Based on our analysis we conclude that: -NAME accurately simulates the observed location and horizontal extent of the SO 2 cloud during the first 2-3 weeks after the eruption (figs. 6 and 7). NAME performs less well when SO 2 vertical column densities (VCD) exceed 20 DU ( fig. 9), which are predominantly observed as small-scale features within the SO 2 cloud during the early phases of the eruption. Based on the Fractional Skill Score (FFS), we find that our simulations have skill between 12-17 days when 685 considering the 1 DU VCD contour (table 3 and fig. 10). For VCDs larger than 20 DU, predominantly observed as small-scale features within the SO 2 cloud, the model has skill on the order of 2-4 days only.
-Based on both the FFS score and SAL-score ( fig. 10), we find that the model-simulated SO 2 cloud in NAME is more diffuse than in the TROPOMI measurements, in particular for VCDs exceeding 20 DU. The diffusion parameterisation in NAME, which is developed for the lower free troposphere, results in too much horizontal spread in the lower stratosphere too large in the upper troposphere and lower stratosphere and different values should be considered when investigating dispersion processes near the tropopause and in the stratosphere. However, we are not able to determine exact values for the diffusion parameters on the basis of this single volcanic eruption case study with one NWP representation and one set of detailed observations. In future, as more eruption case studies become available, there is a potential to constrain the values of these diffusion parameters in order to better represent the diffusion of upper tropospheric and stratosphere 700 volcanic SO 2 clouds in NAME and other ADMs.
-For the 2019 Raikoke eruption, we find that the skill of the model strongly depends on the eruption source parameter (ESP) used for the simulation. Using the Volcanic Response team profile (VolRes 1.5 fig. 2), we find that NAME removes too much SO 2 mass from the atmosphere during the first week of the simulations ( fig. 11), resulting in a shorter e-folding time in the simulation than estimated based on TROPOMI data (e-folding time of 9 days for VolRes1.5 versus 21 days Determining the vertical SO 2 emission profile for any volcanic eruption from observations is one of the most difficult and challenging tasks the community faces. Our analysis shows that determining the details of the vertical profiles is essential in particular for eruptions that only just straddle the stratosphere in order to accurately forecast the dispersion of volcanic SO 2 . This study demonstrates that combining observational datasets with dispersion model estimates can be beneficial to obtain 715 a more detailed estimate of the volcanic SO 2 flux. The attempts in this paper to give more details on the vertical emission profile are basic and more sophisticated near realtime estimates are currently being developed by the scientific community (e.g., Kristiansen et al., 2010;Moxnes et al., 2014;Pardini et al., 2018). With the current efforts, we expect that estimates of volcanic SO 2 fluxes will become more detailed and will very likely lead to a significant improvement of model simulations of future eruptions.

720
Finally, the 2019 Raikoke eruption demonstrates that volcanic eruptions of this magnitude can have a significant impact on the aviation industry. The average cruise altitudes for heavy long-haul aircraft (11.9-13.7 km) falls well within the altitudes where the volcanic SO 2 cloud for the 2019 Raikoke eruption was observed (see fig. 4). Having reliable dispersion models to simulate volcanic clouds are crucial to better understand and mitigate their potential impacts. We have shown that the FSS and the SAL-score metrics are potentially very powerful tools when assessing the skill of the model simulations in comparison 725 with satellite measurements. The FSS score gives insight into the timescales over which the model has skill and also shows at what resolution results are significant. The SAL-score gives a more detailed overview on three different aspects of the cloud properties (shape, location and mass) and helps to identify what aspects of the eruption cloud are well represented in a model.
Using the two metrics in tandem gives a good overview on the strength and weaknesses of the simulation and helps to interpret the results of the forecast model in more detail. While we have applied the metrics to the NAME model and the TROPOMI 730 retrievals, they can also be easily applied to any combination of dispersion models and spatial observations. It could therefore also be a useful tool to inter-compare skills of satellite observation products and/or multiple dispersion models.
Code and data availability. Code and simulation data used in this manuscript may be requested from the corresponding author and can be downloaded from https://github.com/J-de-Leeuw/Raikoke. The TROPOMI satellite data can be downloaded from the ESA website (https://s5phub.copernicus.eu). I.T. and R.G. plan to archive the Oxford IASI SO2 products; in the meantime these can made available on re-735 quest from Isabelle Taylor (isabelle.taylor@physics.ox.ac.uk). Radiosonde data are available at http://weather.uwyo.edu/upperair/sounding.html.
The NAME code is available under license from the Met Office.
Video supplement. Videos of the SO2 and SO4 NAME VCD simulations for the VolRes1.5 and the StratProfile emission profiles can be found at http://doi.org/10.5281/zenodo.3992052. (de Leeuw, 2020) Author contributions. J.dL., A.S. and C.W. designed the project. J.dL. set-up and performed all the NAME simulations and analysed the