Satellite retrievals of column mass loading of volcanic ash are incorporated into the HYSPLIT transport and dispersion modeling system for source determination, bias correction, and verification of probabilistic ash forecasts of a short eruption of Bezymianny in Kamchatka. The probabilistic forecasts are generated with a dispersion model ensemble created by driving HYSPLIT with 31 members of the NOAA global ensemble forecast system (GEFS). An inversion algorithm is used for source determination. A bias correction procedure called cumulative distribution function (CDF) matching is shown to reduce bias very effectively. Evaluation is performed with rank histograms, reliability diagrams, the fractions skill score, and precision recall curves. Particular attention is paid to forecasting the end of life of the ash cloud, when only small areas are still detectable in satellite imagery. We find indications that the simulated dispersion of the ash cloud does not represent the observed dispersion well, resulting in difficulty simulating the observed evolution of the ash cloud area. This can be ameliorated with the bias correction procedure. Individual model runs struggle to capture the exact placement and shape of the small areas of ash left near the end of the cloud's lifetime. The ensemble tends to be overconfident but does capture the range of possibilities of ash cloud placement. Probabilistic forecasts such as ensemble-relative frequency of exceedance and agreement in percentile levels are suited to strategies in which areas with certain concentrations or column mass loadings of ash need to be avoided with a chosen amount of confidence.

We describe modeling efforts to provide quantitative probabilistic forecasts of concentrations of volcanic ash for use by the aviation sector with an emphasis on forecast verification in which the properties of the joint distribution of forecasts and observations are explored through the use of various statistical measures. No standard set of verification metrics for probabilistic volcanic ash forecasts is currently in use.

Currently, forecasts for ash issued by Volcanic Ash Advisory Centers (VAACs) consist of polygons denoting the area of discernible ash. However, within the next 5 years, VAACs may be producing gridded products of ensemble-relative frequency of exceedances of prescribed concentration thresholds

Verification and evaluation metrics for probabilistic forecasts of a set of discrete predictands will guide model development and the construction of dispersion model ensembles and provide valuable information to end users of the forecasts. For instance,

Ash forecasts often have a large bias largely due to uncertainties in the source term. For binary predictands (e.g., forecasts which indicate areas of ash above a given threshold), a frequency bias leads to either over- or underprediction of the extent of the above-threshold area. We discuss measures of bias and use a bias correction procedure called cumulative distribution function (CDF) matching to successfully reduce bias.

Section

Bezymianny began erupting on 21 October 2020 shortly after 20:00 UTC.
Initial estimates reported that the ash cloud reached a plume height of around 9 km

Meteorological conditions above the vent are discussed in Appendix

The eruption makes a particularly good test case for a few reasons. The emission is relatively short and uncomplicated, and thus the source term has less uncertainty than in longer duration eruptions which may consist of many emissions of varying intensities and eruption column heights. The resulting ash cloud forms a complicated three-dimensional structure as it is stretched and folded by the wind field over the course of less than 1 d. As shown later, the exact location and shape of these structures is difficult to forecast.

In many larger eruptions, such as Kasatochi

The short duration of this Bezymianny eruption makes running and testing many different simulation setups more tractable. Perhaps most importantly, eruptions of this size occur fairly frequently. One aspect of concern to aviation which has not received much attention in the literature is how well forecasts predict the dissipation of the ash cloud, by which we mean its gradual disappearance. This question becomes more urgent as areas of detectable ash become larger due to improved detection methods and as more reliance may be placed on models to estimate when concentrations drop below a certain threshold.

Satellite retrievals were produced as part of the Volcanic Cloud Analysis Toolkit (VOLCAT). For this eruption, satellite retrievals are available every 10 min from 21 October 2020, 20:40 UTC, to 22 October 2020, 21:10 UTC. The data were from Himawari-8 Advanced Himawari Imager (AHI) full-disk scans. Satellite retrievals provide information on atmospheric column mass loading (hereafter just referred to as mass loading), cloud top height, and effective radius. The mass loading is the field most utilized here.

The VOLCAT ash detection algorithm

Figure

A rough calculation of mass eruption rate (MER) from the change in mass in the satellite retrievals during this time (Fig.

Maximum retrieved heights are around 10 km during the initial eruption period and increase to around 12 km. It is expected that these heights would be somewhat lower than those from the geometric estimation in

For comparison to model output, the satellite data are parallax-corrected using estimated cloud top heights and then composited by first regridding to a regular 0.1

Cumulative distribution functions of composited VOLCAT retrievals. The dark purple line on the right is for satellite data
from 21 October 2020, 21:00–22:00 UTC. Colors repeat every 5 h. The dark purple line on the left is for satellite data from 22 October 2020, 17:00–18:00 UTC. The vertical black lines denote 0.2, 2, and 20

HYSPLIT is a widely used Lagrangian atmospheric transport and dispersion model (ATDM) developed and maintained by the Air Resources Laboratory (ARL) at NOAA

HYSPLIT was run as a Lagrangian particle model and choice of computational particle number is discussed more in Appendix

The time for different particle sizes to produce significantly different ash cloud distributions was tested using the method described in

Table

Table describing HYSPLIT runs.

HYSPLIT utilizes wind fields and other information from a numerical weather prediction (NWP) model as inputs

RunA is a control run. The source term was initially estimated using methods similar to those currently employed in an operational setting. The start time of the eruption, duration, eruption column height, and eruption column width were
all determined by human interpretation of available observations.
Emission start was 21 October 2020 at 20:00 UTC. Emission duration was 2 h.
Initial mass distribution was uniform throughout a cylinder centered at the vent with a width of 1 km. The base of the cylinder was at 2.88 km, the vent height. The top of the cylinder was at 12.88 km. This is similar to current default operational settings at Washington and some other VAACs

Initially the constant emission rate
was set at

The emission rate in the model is representative of the mass eruption rate of fine ash (MER

The inversion algorithm described in

Each individual HYSPLIT run used in constructing the

The HYSPLIT runs covered a vertical area from the vent at 2.88 to 12.88 km. The HYSPLIT emissions covered a time period from 21 October 2020, 19:00, to 22 October 2020, 00:00 UTC. No emissions outside of this time period or above 12.88 km were considered in the rest of the analysis.

A few inversions were performed with heights up to 16.88 km as

Clear-sky observations are pixels where no ash was detected. Several options for the inclusion of clear-sky observations in the inversion were considered. Figure

Emissions determined from the inversion algorithm for several
runs as indicated in the text. The left column

When all clear-sky observations were included, emissions tended to be lower, especially when using later time periods. The vertical profile was also flatter, with the mass release distributed more evenly throughout the column. Very little mass was emitted at 19:00 UTC, below 4 km, or above 12 km.

When no clear-sky observations were included, estimated emissions tended to be higher. When using later time periods, the peak in the emissions also tended to occur earlier, around 19:00 UTC, which was the earliest time period considered for possible emissions. In the vertical profile there was a strong peak between 7.5 and 10 km, but significant mass was also emitted below 4 km and above 12 km. Early emissions at 19:00 UTC and significant amounts of mass below 4 km or above 12 km are considered unlikely, and the presence of mass at these times and locations indicates that the clear-sky observations are important constraints.

A balance was struck by excluding near-field clear-sky observations within a certain distance, such as two or three pixels, of the observations of ash. Excluding the near-field clear-sky observations also made the emission estimates less dependent on the time periods used in the inversion.
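For illustration, the height- and time-resolved emission estimate can be sketched as a regularized least-squares problem relating candidate unit emissions to the retained observations, including the clear-sky zeros. The function name, the Tikhonov regularization, and the clipping to nonnegative values below are our assumptions for a minimal sketch and differ in detail from the inversion algorithm actually used.

```python
import numpy as np

def invert_emissions(tcm, obs, reg=0.01):
    """Regularized least-squares emission estimate (illustrative sketch).

    tcm: (nobs, nsources) modeled mass loading per unit emission for each
         candidate source (height/time pair).
    obs: (nobs,) observed mass loadings; retained clear-sky pixels enter
         as zeros and penalize sources that place ash where none was seen.
    reg: Tikhonov regularization strength (assumed; keeps solution stable).
    """
    n = tcm.shape[1]
    a = tcm.T @ tcm + reg * np.eye(n)   # normal equations with damping
    b = tcm.T @ obs
    return np.clip(np.linalg.solve(a, b), 0.0, None)  # crude nonnegativity
```

Dropping the rows of `tcm` and `obs` corresponding to near-field clear-sky pixels reproduces the exclusion strategy described above.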

Increasing the width of the initial cloud
increased the estimated emissions slightly when some clear-sky pixels were utilized as shown in
Fig.

The difference between using 20 and 6

Figure

Emissions determined from the inversion algorithm for
RunB. Panels

For RunM, using the GEFS, we look only at source terms determined using observations up to 00:00 UTC and some clear-sky observations as shown in
Fig.

Emissions determined from the inversion algorithm utilized
with all 31 members of the GEFS.

To correct bias, we employ a procedure called CDF matching

There are several practical considerations in adding a constant value to, or subtracting one from, the simulated mass loading or concentration values.
Propagating the additive correction to ash concentrations would involve some assumptions such as dividing the correction evenly among the number of vertical levels containing ash.

Subtracting a value (positive intercept) is similar to applying a threshold, as negative values must be converted to 0. When adding a value (negative intercept), the value is only added to modeled values which have above zero mass loading to begin with. As these procedures can decrease and increase the spread of the forecast cloud, respectively, the intercept can be loosely interpreted as an indicator of how well the spread of the forecast cloud matches that of the observed one.
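The CDF matching described above can be sketched as a linear fit between paired quantiles of the modeled and observed mass loading distributions; the quantile levels and the least-squares fit below are illustrative assumptions rather than the exact fitting choices used.

```python
import numpy as np

def cdf_match(model, obs):
    """Fit modeled quantiles as a linear function of observed quantiles.

    Returns (slope, intercept) with model_q ~ slope * obs_q + intercept,
    so that a positive intercept is subtracted from the model (similar to
    applying a threshold) and a negative intercept adds a constant,
    matching the sign convention in the text.
    """
    q = np.linspace(0.05, 0.95, 19)           # quantile levels (assumed)
    mq = np.quantile(model, q)                # modeled quantiles
    oq = np.quantile(obs, q)                  # observed quantiles
    slope, intercept = np.polyfit(oq, mq, 1)  # least-squares line
    return slope, intercept

def apply_correction(model, slope, intercept):
    # Invert the fit on above-zero mass loadings only; clip negatives
    # to zero, mirroring the thresholding behavior described above.
    corrected = np.where(model > 0, (model - intercept) / slope, 0.0)
    return np.clip(corrected, 0.0, None)
```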

In the next sections we describe the bias correction for RunA and RunM.
In Sect.

Figure

Demonstration of CDF matching for RunA.

The intercept is a fairly large negative number at early times and increases to a small positive number at later times.
Negative intercepts shift the CDF to the right by adding a constant.
Figure

RunA2 was initialized with a 20 km cloud diameter to see if this would create a more realistic initial condition. The qualitative behavior of the slope and intercept of the fits was the same as shown in Fig.

The modeled horizontal dispersion of the cloud is too fast. As
discussed in Sect.

As discussed in more detail later, even the ensemble produced using the source terms derived from the inversion algorithm, RunM, was biased. We investigated whether utilizing the CDF matching technique could effectively reduce the bias for this case.

Figure

Demonstration of CDF matching for RunM. Panels

The trend for the intercepts is very similar to that seen for RunA. Values start negative for the times which have been utilized in the inversion algorithm (before 00:00 UTC) and then become positive for most of the ensemble members. This indicates that even with the improved source term, the modeled cloud is initially more compact than observed and then becomes more dispersed than observed. In later sections we will see that the positive values of the intercept at later times are indicative of a high bias in the ensemble.

We make the assumption that verification of modeled column mass loading values can be used as a proxy for the verification of forecast concentrations. The reason for this is practical as column mass loading values are now generally widely available for many eruptions of this size and larger, while measurements of concentrations are not. The validity of this assumption may be investigated in the future by employing data from lidar which can give information on ash cloud thicknesses or by utilizing any in situ measurements that may be available from aircraft flights.

Table

Summary of measures categorized by the main aspect of forecast performance they evaluate.

The concentration thresholds of interest for aviation are
0.2, 2, 5, and 10

A qualitative comparison between observed and modeled data provides context for discussion of the verification metrics. Consequently, we start the discussion with a side-by-side comparison of composited VOLCAT observations and HYSPLIT model output of ash column mass loadings shown in Fig.

Evolution of observations and model results for RunB for emissions determined from the inversion algorithm incorporating 3 h of observations from 21 October, 21:00 UTC, through 22 October, 00:00 UTC. The time stamp for each row indicates the beginning of a 1 h average. Thus the top row is a 1 h forecast, the second row is a 4 h forecast, and so forth. The black triangle shows the location of the vent. Note that the color scale changes for each time period. The left panel shows composited VOLCAT data. The middle panel shows model output. The left and middle panels have the same color scale. The right panel shows composited VOLCAT data in dark green, while the modeled data are in light purple. Units on all color bars are grams per square meter.

Part of the ash cloud is stretched in the north–south direction. This piece is initially located to the west of the volcano but gradually moves to the east. As it moves east, it becomes longer and thinner and forms a bow, as the parts to the north and south of the volcano move more slowly. By 12:00 UTC, this line of ash has broken into three small areas: one just to the east of the volcano, which has the highest retrieved top heights of around 10–11 km; one to the south, with the lowest retrieved top heights of 5–7 km; and one to the northwest, with retrieved top heights of 7–9 km. Presumably, ash at very low concentrations may still connect the pieces.

At 00:00 UTC, another part of the cloud is located to the southeast of the volcano. This portion breaks off from the piece discussed above by 04:00 UTC and moves to the southeast. By 08:00 UTC, this piece is no longer observed.

The model run shown reproduces the general trend fairly well. However, the placement and shape of the line of ash are not reproduced perfectly. In many of the model runs, including this one, the simulation does not stretch the line far enough north, nor are the placement and shape of the southeast piece of ash correct. The two pieces of ash remain attached in the model runs for a longer period of time, which results in a V shape for the modeled cloud. The southeast piece of ash remains above the threshold for much longer in the simulation.

In general, all the individual model runs follow these trends. Model runs which utilize the inversion algorithm for source determination tend to show better qualitative agreement. However, the exact evolution of the ash cloud is not faithfully reproduced by any run.

For probabilistic forecasts we follow

Observations and model forecast for 22 October 2020, 06:00 UTC.

A binary predictand can be recovered from the ATL by applying a probability threshold. In Fig.

Some sources indicate that it is dosage rather than concentrations that are the relevant factor for airlines

If velocity is constant, then

For the probability of exceeding a critical dosage

The most straightforward way to verify forecast

The probability of exceeding a dosage through a certain grid cell,

A dosage could also be computed from an applied percentile level as in

If dosage is the relevant quantity, then predicting the extent of the ash cloud accurately is critical. Assuming airspeed is around

If this is the case, even if the largest concentrations that are expected in the distal ash cloud are 10

Under these circumstances, accurately predicting the spatial
location of small areas of ash cloud may not be particularly
important. Instead predicting the time at which the ash cloud is small enough to no longer be of concern becomes important.
Simply comparing the area of the observations to the area of the simulations over time as in Fig.

Area above the threshold as a function of time. The left column shows the area above 0.2

Applying the bias correction brings the simulated areas of both the lower and higher mass loadings in line with the observed area (Fig.

Applying the bias correction calculated at 00:00 UTC to the forecasts also improves the estimation of the total area above the threshold significantly for many of the ensemble members. For some members in which the bias correction has a positive shift at 00:00 UTC, the forecast area does become too large. This can be ameliorated by applying only the multiplicative correction when the shift is positive.

We will see later that allowing the positive shift helps ensure that a high probability of detection can be achieved, which is important for a strategy of total avoidance of areas of ash.

The reliability diagram, which is made up of the refinement distribution and calibration function, illustrates how well calibrated the modeled probabilities are. The diagram is computed from the ATL field, which shows the modeled probability of exceeding a given threshold at each observation point.
The modeled probabilities take on values of

The refinement distribution is a histogram of how often each modeled probability,

Refinement distributions for various time periods and mass loading threshold levels are shown in the right-hand columns of Figs.
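Both pieces of the reliability diagram can be computed from the ATL field and the binary observed field. In the sketch below, the helper name and the choice to bin at the discrete ensemble probability levels are our assumptions.

```python
import numpy as np

def reliability(atl, obs_binary):
    """Calibration function and refinement distribution (sketch).

    atl: modeled probability of exceeding the mass loading threshold at
         each observation point (values k/nens for k = 0..nens).
    obs_binary: 1 where the observation exceeds the threshold, else 0.
    Returns the discrete probability levels, the observed relative
    frequency at each level (calibration function), and how often each
    level was forecast (refinement distribution).
    """
    levels = np.unique(atl)
    counts = np.array([(atl == p).sum() for p in levels])
    obs_freq = np.array([obs_binary[atl == p].mean() for p in levels])
    return levels, obs_freq, counts
```

A well-calibrated ensemble gives observed frequencies close to the 1:1 line, while a sharp forecast concentrates the refinement counts near probabilities of 0 and 1.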

Rank histograms (left column), calibration function (middle column), and refinement distribution (right column) for RunM without bias correction (blue bars and solid lines) and with bias correction (red bars and dotted lines). Bias correction determined at 00:00 UTC and allowing positive shifts.
Rank histograms are for all points above 0.1

Rank histograms (left column), calibration function (middle column), and refinement distribution (right column) for RunA without bias correction (blue bars and solid lines) and with bias correction (red bars and dotted lines). Bias correction determined at 00:00 UTC and allowing positive shifts.
Rank histograms are for all points above 0.1

The calibration function is

The rank histogram evaluates whether the ensemble satisfies the consistency condition

Without bias correction, the rank histograms for RunM and RunA show a high over-forecasting bias at all time periods. Too frequently the observation is the lowest or one of the lowest values, which is often 0. The bias correction procedure reduces the over-forecasting bias significantly. The use of the inversion algorithm for source determination also improves the rank histogram.
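The rank histograms above follow the standard construction: at each verification point the observation is ranked among the ensemble values, with ties (common when many members and the observation are all zero) broken at random. A minimal sketch, with assumed array shapes:

```python
import numpy as np

def rank_histogram(ens, obs, rng=None):
    """Histogram of observation ranks within the ensemble (sketch).

    ens: (nens, npoints) ensemble mass loadings; obs: (npoints,).
    Returns counts for ranks 0..nens. A flat histogram is consistent
    with a reliable ensemble; overpopulated low ranks indicate an
    over-forecasting bias, and a U shape indicates overconfidence.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    nens = ens.shape[0]
    below = (ens < obs).sum(axis=0)            # members strictly below obs
    ties = (ens == obs).sum(axis=0)            # members tied with obs
    ranks = below + rng.integers(0, ties + 1)  # random placement among ties
    return np.bincount(ranks, minlength=nens + 1)
```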

With the bias correction in place, the rank histogram exhibits a U-shape at the earliest times. The U-shape indicates that the ensemble is overconfident. The ensemble members overlap with each other more than they overlap with the observations. This shows up in the reliability diagram as a line which tends to be flatter than the

At later forecast times, the rank histogram with bias correction becomes flatter towards the left of the distribution and then tends to smoothly decrease toward the right half of the distribution, with an uptick toward the second to last bin and an abrupt decrease in the last bin. The flatness on the left-hand side is mainly due to areas in which the observation and many ensemble members show below-threshold mass loadings. The smooth decrease with the slight uptick toward the right is due to the ensemble members overlapping more with each other than with the observation. The uptick occurs in the second to last bin rather than the last bin because a few ensemble members had a high-frequency bias at later times due to the addition of a constant in the bias correction. Thus the observation was rarely the highest value but often the second highest value. If the bias correction was performed so as not to allow the addition of a constant, then the uptick occurred in the last bin.

As forecast time increases, the ensemble members overlap less and less with each other, as indicated in the refinement distribution. The calibration function becomes flat, with all simulated probabilities corresponding to a low actual probability. The rank histogram becomes quite flat on the lower end, indicating a large number of points with below-threshold values for the observations and more than half the ensemble members. This is due to increasing difficulty in predicting the location of the ash.

Utilizing the inversion algorithm to determine the source term improved the rank histogram and reliability diagram.

As time passes, the forecast approaches but does not reach a situation in which none of the ensemble members overlap with each other or with the observation. In such a situation, the rank histogram will be perfectly flat if the average area covered by the ensemble members is the same as the area covered by the observations. If on average the ensemble members covered more area but there was still no overlap, then the first bin in the rank histogram would be populated more, indicating over-forecasting bias, while if the ensemble members covered less area on average then the last bin in the rank histogram would be populated more, indicating under-forecasting bias.

The rank histogram might look very good in such a situation, indicating that the ensemble is providing accurate information on the size of the ash cloud. However, the reliability diagram would reveal that the ensemble is not able to provide information on the actual location of the ash cloud.
The refinement distribution would show that the forecast has no sharpness. Values of

So far we have seen that, especially with bias correction, the model ensemble can capture the area and qualitative structure of the observed ash cloud but struggles to capture the exact placement and shape. By utilizing the fractions skill score (FSS) we can investigate whether the forecasts become more skillful at a different spatial scale.

FSS determines the resolution at which the forecast has skill. It was developed to evaluate precipitation forecasts

The MSE described in

When computing the FSS, it is standard for the reference forecast to be defined as the largest possible MSE that can be obtained from the forecast and observed fractions
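A minimal FSS sketch under these definitions follows; the modeled input may be a single run's binary field or an ensemble probability field such as the NEP. The neighborhood averaging here uses valid windows only, which is an implementation assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fractions(field, n):
    """Fraction of above-threshold cells in each n x n neighborhood."""
    w = sliding_window_view(np.asarray(field, dtype=float), (n, n))
    return w.mean(axis=(2, 3))

def fss(model_field, obs_binary, n):
    """Fractions skill score at neighborhood size n (grid cells).

    The MSE compares the two fraction fields; the reference MSE is the
    largest value attainable from them, so FSS = 1 is a perfect match
    and FSS = 0 indicates no skill at this scale.
    """
    mf = fractions(model_field, n)
    of = fractions(obs_binary, n)
    mse = np.mean((mf - of) ** 2)
    mse_ref = np.mean(mf ** 2) + np.mean(of ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

Sweeping n from 1 upward traces FSS as a function of neighborhood size and identifies the scale at which the forecast first exceeds a chosen skill criterion.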

Although the FSS generally has been used with a deterministic forecast as a starting point, there is no reason that the modeled field,

At some value of

If there is no frequency bias, then

Figure

AFSS. Top row

FSS vs. neighborhood size for RunM.
Yellow, orange, and pink lines not shown in the legend are the results for individual ensemble members.
Each row shows FSS for a different time period on 22 October 2020 indicated in the gray box. Each column shows FSS for a different bias correction indicated by the text at the top of the column. Note the change in scale on the

FSS vs. neighborhood size for RunA.
Yellow, orange, and pink lines not shown in the legend are results for individual ensemble members.
Each row shows FSS for a different time period on 22 October 2020 indicated in the gray box. Each column shows FSS for a different bias correction indicated by the text at the top of the column. Note the change in scale on the

The bias correction has a significant impact on FSS, mainly by decreasing the frequency bias and thus increasing the AFSS. A bias correction procedure which does not allow the addition of positive values (Figs. 14 and 15, right column) produced significantly better FSS scores for the NEP and ensemble mean than one that did allow the addition of positive values (Figs. 14 and 15, middle column) at larger scales but slightly worse FSS scores at the smaller scales. Allowing the addition of positive values in the bias correction led to more spread in the FSS values of the individual ensemble members.

As expected, skill decreases with time. The scale at which the NEP became greater than the uniform forecast in the third column reached about 2.5

Many evaluation measures in use are based on a

Evaluation statistics which employ

The following steps are used to create contingency tables for the probabilistic forecasts. First, the ATL field is computed for a given mass loading threshold. The ATL field consists of values from 0 % to 100 %. To convert the ATL field to a binary field, a probability threshold is applied as shown in Fig.
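These steps can be sketched directly; the function names are ours, and the ATL field is assumed to be expressed as fractions from 0 to 1.

```python
import numpy as np

def contingency(atl, obs_binary, prob_thresh):
    """2 x 2 contingency table from an ATL field (sketch).

    The probability threshold converts the ATL field to a binary
    forecast, which is compared with the binary observed field.
    """
    fcst = atl >= prob_thresh
    obs = obs_binary.astype(bool)
    hits = np.sum(fcst & obs)
    misses = np.sum(~fcst & obs)
    false_alarms = np.sum(fcst & ~obs)
    correct_neg = np.sum(~fcst & ~obs)
    return hits, misses, false_alarms, correct_neg

def scores(hits, misses, false_alarms, correct_neg):
    # POD (probability of detection), POFD (probability of false
    # detection), and precision, with conventional values when a
    # denominator is empty.
    pod = hits / (hits + misses) if hits + misses else 0.0
    pofd = false_alarms / (false_alarms + correct_neg) if false_alarms + correct_neg else 0.0
    precision = hits / (hits + false_alarms) if hits + false_alarms else 1.0
    return pod, pofd, precision
```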

The receiver operating characteristic (ROC) is a commonly used graphical forecast verification tool which plots POD vs. POFD for various probability thresholds applied to a probabilistic forecast

The precision recall curve (PRC) is a more appropriate evaluation tool

The area under the curve (AUC) for either the ROC or PRC can be utilized to compare different forecasts. For both cases, an area closer to 1 is indicative of a better forecast.
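Sweeping the probability threshold yields the ROC (POD vs. POFD) or the PRC (precision vs. POD). A sketch of the PRC and a trapezoidal AUC, with conventional values assumed for empty categories:

```python
import numpy as np

def precision_recall(atl, obs_binary, thresholds):
    """Precision and recall (POD) at each probability threshold (sketch)."""
    obs = obs_binary.astype(bool)
    prec, rec = [], []
    for t in thresholds:
        fcst = atl >= t
        hits = np.sum(fcst & obs)
        fa = np.sum(fcst & ~obs)
        miss = np.sum(~fcst & obs)
        prec.append(hits / (hits + fa) if hits + fa else 1.0)
        rec.append(hits / (hits + miss) if hits + miss else 0.0)
    return np.array(prec), np.array(rec)

def auc(x, y):
    """Trapezoidal area under a curve after sorting by x."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    return float(np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2.0))
```

For the PRC, the AUC is computed with recall on the x axis and precision on the y axis.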

Precision recall curves are shown in Fig.

Top row

As the probability threshold increases, the POD either stays the same or decreases, while the precision can either increase or decrease. For this case precision tends to increase sharply and then level off, while POD decreases.
In addition, the area under the PRC curve drops rather quickly with increasing forecast time. This is due mainly to a decrease in precision but a decrease in POD is also a significant factor. For the time period from 09:00–10:00 UTC, a low-probability threshold of

The bias correction only has a small effect on the PRC curve. It tends to increase precision, mostly at higher-probability thresholds. For the bias correction shown in the figures, which allows positive shifts, the POD is sometimes improved at low-probability thresholds due to a few ensemble members with positive shifts covering a much larger area. If no positive shifts are allowed in the bias correction, then the POD is generally decreased.

The bias correction on the individual ensemble members is not effective at improving the PRC curve because it does not reduce the spatial spread of the entire ensemble. While the area covered by individual ensemble members is significantly decreased for many of the ensemble members by the bias correction, the area covered by the lower-probability thresholds does not change much.

What does improve the PRC curve is using a coarser spatial scale. In order to look at different spatial scales, we utilize the NEP, which was introduced in Sect.

This method has some advantages over simply using a spatial average of the observed and modeled fields. First, we saw that the NEP performed fairly well as measured by the FSS. Second, spatial averaging tends to decrease mass loadings, especially for the small areas of ash considered in this case, and would result in large areas and, at some time periods, the whole observed and/or simulated cloud being below the 0.2

Note that

The flow field stretches and folds the ash cloud into complex three-dimensional shapes, the exact placement and shape of which can be quite difficult to predict. Due to the chaotic nature of the flow field, a probabilistic approach is necessary. As there is little turbulent mixing high in the atmosphere, these areas are likely to have a high concentration gradient, and thus fairly high concentrations of ash may exist within a small area. The location of those areas may be highly uncertain, and to obtain a high POD, a low probability threshold for the ensemble frequency of exceedance may be needed. A low precision may be the trade-off. However, if dosage rather than concentration is the relevant quantity, then the ensemble-relative frequency of exceedance, ATL, or APL should not be utilized; instead, the probability of exceeding the dosage should be computed from the dosage calculation for each ensemble member. If dosage is the relevant quantity, then predicting the exact location of small areas of ash may be less important than predicting their extent.
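As a concrete illustration of the dosage reasoning, a per-member dosage along a flight path can be accumulated cell by cell, with residence time set by the cell length and airspeed; the units and function name here are illustrative assumptions.

```python
import numpy as np

def dosage(conc_along_path, airspeed, cell_length):
    """Dosage accumulated flying through successive grid cells (sketch).

    conc_along_path: concentrations (e.g., mg m^-3) in the cells crossed;
    airspeed in m s^-1; cell_length in m. Dosage is the sum of
    concentration times residence time (cell_length / airspeed) per cell.
    """
    return float(np.sum(conc_along_path) * cell_length / airspeed)
```

Evaluating this for each ensemble member and counting the fraction of members exceeding a critical dosage gives a probability of exceeding the dosage, as opposed to applying a probability threshold to the ATL field.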

We utilized one simple case that is representative of common medium-sized eruptions in the northern latitudes with the goal of developing a workflow which includes source determination, bias correction, and forecast verification for probabilistic forecasts of ash for aviation. The workflow could be relevant for other applications in which gridded observations of the entire pollutant cloud are available.

When satellite retrievals of column mass loading are available, an inversion algorithm to determine height and time-resolved emissions above the vent is an effective method of improving the forecast. In agreement with

This case illustrated the effectiveness of the CDF matching bias correction technique. The method is simple and fast, does not rely on spatial overlap between the simulated and observed fields, and can be utilized to improve the short-term forecast.
One factor left to be determined is whether allowing positive shifts in the forecast values is desirable. Allowing positive shifts can increase the POD but may increase bias in some ensemble members at later forecast times and decrease the ensemble skill at larger neighborhood sizes (see Sect.

The fit from the CDF matching may also simply be used for identifying how far apart the modeled and observed CDFs are in a similar fashion to the Kolmogorov–Smirnov (KS) parameter that has been used

We introduced a suite of verification measures specifically for probabilistic forecasts of volcanic ash. The FSS was used to evaluate both the ensemble mean and NEP and indicates the spatial scale at which the ensemble has skill, while the AFSS measures bias. The NEP, which combines ATL or ensemble-relative frequency of exceedance with the probability of finding ash within a neighborhood,

Verification was performed on forecasts which utilized observations up to 00:00 UTC on 22 October 2020 for the source determination and observations up to 01:00 UTC for the bias correction. The time periods for verification were within the 12:00–13:00 UTC forecast time, at which point the observed ash cloud covered less than 100 pixels of the size

This work paves the way for future investigations and development.
We found evidence similar to the findings of

Considerable work may be done to improve the construction of the ensemble.
For instance, using an ensemble reduction or weighting technique

Profiles of some of the model winds and temperature as well as time series of the planetary boundary layer height (PBLH) and precipitation are shown in Fig.

The temperature profile indicates that the tropopause is around 300 hPa, which is approximately the height of injection estimated from the inversion algorithm in Sect.

Time series of precipitation

Three HYSPLIT runs were performed with all inputs identical except for the particle size. Particle diameters of 0.6, 6, and 20

Score,

Comparison of computational particle positions for 6 and 20

The basic output of a Lagrangian particle model is the position of computational particles and the amount of mass each represents. This information is transformed into a concentration field by density estimation. HYSPLIT utilizes a simple bin-counting density estimation scheme in which the total mass in the user-defined bins is found by summing over the residence-time-weighted mass of each particle in the bin. The time-averaged concentration is then obtained by dividing by the bin volume. Although this scheme generally requires more particles than others, it has the advantage that the number of particles needed for a simulation can be estimated in a straightforward way for many model configurations
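The bin-counting scheme can be sketched as follows, omitting the residence-time weighting for brevity: the particle masses are summed into the user-defined bins and divided by the cell volume to give concentration.

```python
import numpy as np

def bin_concentration(pos, mass, edges):
    """Bin-counting density estimation (simplified sketch).

    pos: (nparticles, 3) particle positions (x, y, z);
    mass: (nparticles,) mass represented by each computational particle;
    edges: three arrays of bin edges. Residence-time weighting for the
    time average is omitted here for brevity.
    """
    total, _ = np.histogramdd(pos, bins=edges, weights=mass)
    dx, dy, dz = (np.diff(e) for e in edges)
    vol = dx[:, None, None] * dy[None, :, None] * dz[None, None, :]
    return total / vol
```

Because each cell's concentration is a sum over independent particle contributions, the relative sampling error per cell scales roughly with the inverse square root of the particle count per cell, which is what makes the required particle number straightforward to estimate.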

We suppose that the quantity of 0.1

To test we created runs identical to RunB but with

For RunA,

The situation for RunM is somewhat more complicated. For the individual runs for the inversion algorithm,

HYSPLIT code is available at

AC completed all the model runs, designed and performed the analysis, and wrote the first draft of the paper. TC provided expertise and software for inverse modeling. BW, BS, and CPL provided the GEFS data in a format suitable to ingest into HYSPLIT. BW assisted with the statistical analysis. AR contributed to code to process the observations and perform the statistical analysis. JS and MP provided satellite retrievals and expertise on their use. BS and MP provided expertise on forecasting for aviation. MP helped secure financial support for the work. All authors carried out review and editing of the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to thank Jamie Kibler and Jeff Osiensky for providing information on VAAC operations. We thank Arnau Folch and the anonymous reviewer for their careful review of the paper. Their feedback greatly improved the readability and conclusions of the paper.

This research has been supported by the National Environmental Satellite, Data, and Information Service (grant no. NA19NES4320002).

This paper was edited by Stefano Galmarini and reviewed by Arnau Folch and one anonymous referee.