the Creative Commons Attribution 4.0 License.

the Creative Commons Attribution 4.0 License.

# Estimation of power plant SO_{2} emissions using the HYSPLIT dispersion model and airborne observations with plume rise ensemble runs

### Xinrong Ren

### Fong Ngan

### Mark Cohen

### Alice Crawford

The SO_{2} emission rates from three power plants in North Carolina are estimated using the HYSPLIT Lagrangian dispersion model and
aircraft measurements made on 26 March 2019. To quantify the underlying modeling uncertainties in the plume rise calculation, dispersion simulations are carried
out in an ensemble using a total of 15 heat release parameters. For each heat release, the SO_{2} emission rates are
estimated using a transfer coefficient matrix (TCM) approach and compared with the Continuous Emissions Monitoring Systems (CEMS) data. An
“optimal” member is first selected based on the correlation coefficient calculated for each of the six segments that delineate the plumes from the
three power plants during the morning and afternoon flights. The segment influenced by the afternoon operations of Belews Creek power plant has
negative correlation coefficients for all the plume rise options and is first excluded from the emission estimate here. Overestimations are found
for all the segments before considering the background SO_{2} mixing ratios. Both constant background mixing ratios and several
segment-specific background values are tested in the HYSPLIT inverse modeling. The estimation results by assuming the 25th percentile observed
SO_{2} mixing ratios inside each of the five segments agree well with the CEMS data, with relative errors of 18 %, −12 %, 3 %,
93.5 %, and −4 %. After emission estimations are performed for all the plume rise runs, the lowest root mean square errors (RMSEs) between the
predicted and observed mixing ratios are calculated to select a different set of optimal plume rise runs which have the lowest RMSEs. Identical
plume rise runs are chosen as the optimal members for Roxboro and Belews Creek morning segments, but different members for the other segments
yield smaller RMSEs than the previous correlation-based optimal members. It is also no longer necessary to exclude the Belews Creek afternoon
segment that has a negative correlation between predictions and observations. The RMSE-based optimal runs result in much better agreement with
the CEMS data for the previously severely overestimated segment and do not deteriorate much for the other segments, with relative errors of 18 %,
−18 %, 3 %, −9 %, and 27 % for the five segments and 2 % for the Belews Creek afternoon segment. In addition, the RMSE-based
optimal heat emissions appear to be more reasonable than the correlation-based values when they are significantly different for CPI Roxboro
power plant.

- Article
(13721 KB) - Full-text XML
- BibTeX
- EndNote

_{2}emissions using the HYSPLIT dispersion model and airborne observations with plume rise ensemble runs, Atmos. Chem. Phys., 23, 12907–12933, https://doi.org/10.5194/acp-23-12907-2023, 2023.

Both Eulerian and Lagrangian atmospheric transport models have been widely used to provide forecasts or analyses of atmospheric components for a wide range of purposes varying from emergency response to climate change predictions. However, in many applications, such as volcanic eruptions, wildfire events, accidental radionuclide releases from nuclear power plants, and climate change predictions, emissions are the most critical model input parameters but are mostly unknown and difficult to quantify. Even when emission inventories are made available through bottom-up approaches, some of the emissions are often associated with large uncertainties and systematic biases due to outdated databases, inaccurate emission factors, and invalid assumptions regarding operations, processes, and/or activities (throughput) during the bottom-up emission estimation. Therefore, various inverse modeling methods using so-called top-down approaches have been developed in order to estimate emissions by combining direct observations and the accumulated knowledge already built into atmospheric transport models. Lagrangian particle dispersion models are particularly suited to applications related to point source emission estimations because they effectively avoid calculation outside air pollutant plumes and do not have the numerical diffusion problems of most Eulerian models. Many source term estimation applications have been developed using various dispersion models and inverse modeling schemes (e.g., Stohl et al., 2012; Winiarek et al., 2012, 2014; Saunier et al., 2013; Chai et al., 2015; Bieringer et al., 2017; Hutchinson et al., 2017; Chai et al., 2018; Kim et al., 2020).

The National Oceanic and Atmospheric Administration (NOAA) Air Resources Laboratory's (ARL) HYSPLIT Lagrangian model is one of the most extensively used atmospheric transport models to simulate the atmospheric transport, dispersion, and deposition of pollutants and hazardous materials (Draxler and Hess, 1997; Stein et al., 2015). A HYSPLIT inverse system based on 4D-Var data assimilation and a transfer coefficient matrix (TCM) was developed and applied to estimate the cesium-137 source from the Fukushima nuclear accident using global air concentration measurements (Chai et al., 2015). The system was further developed to estimate the effective volcanic ash release rates as a function of time and height by assimilating satellite mass loadings and ash cloud top heights (Chai et al., 2017). More recently, the HYSPLIT-based Emissions Inverse Modeling System (HEIMS) was developed to estimate wildfire emissions from the transport and dispersion of smoke plumes captured by geostationary satellite aerosol optical depth observations (Kim et al., 2020). In another HYSPLIT inverse system study with Cross-Appalachian Tracer Experiment (CAPTEX) data collected from six controlled releases, Chai et al. (2018) found that adding model uncertainty terms was able to improve source estimate results.

The source term estimation problem proves to be challenging because of the chaotic nature of the atmospheric flow. In addition, the observations from routine monitoring networks are typically sparse and often do not provide enough information to determine emission sources. Many field campaign studies have been carried out with airborne measurements by research aircraft in order to estimate certain air pollutant and greenhouse gas emission sources. Both traditional mass balance methods (e.g., Mays et al., 2009; Cambaliza et al., 2014; Liggio et al., 2016; Ren et al., 2018) and various inverse modeling methods which take advantage of atmospheric transport models (e.g., Karion et al., 2019; Angevine et al., 2020; Pitt et al., 2022; Lopez-Coto et al., 2022) have been applied to quantify different emissions. While many inverse modeling applications have been carried out and compared with bottom-up emission inventories, large uncertainties are still associated with top-down estimations. Karion et al. (2019) showed an intercomparison study using both the inventory scaling method and Bayesian inversion with several dispersion models and meteorological inputs for emission estimation with flight observations. They found significant variabilities (up to a factor of 3) between different models and between different days and indicated that further work was needed to evaluate and improve vertical mixing in tracer dispersion models.

To better evaluate the top-down estimates of emissions, Angevine et al. (2020) studied a power plant with Continuous Emissions Monitoring Systems (CEMS) data
as the known emissions. They used a model-assisted mass balance method and examined the estimate uncertainties with an ensemble of HYSPLIT runs with
different meteorological inputs and concluded with reasonably large (30 %–40 %) uncertainties for the top-down estimates of
emissions. However, a constant heat release of 85 MW as the main plume rise parameter used in the Briggs formulation was specified for all the
simulations. This could have caused an underestimation of the uncertainties. Gordon et al. (2018) and Akingunola et al. (2018) found that the Briggs plume rise
algorithm (Briggs, 1984) significantly underestimated plume rise, in contrast to the majority of past plume rise measurement studies. A recent study
by Kim et al. (2023) to estimate power plant SO_{2} emission rates with aircraft measurements also highlighted the large uncertainties caused by
the plume rise calculation when using a Gaussian footprint approach.

Fathi et al. (2021) investigated the impact of storage and release due to meteorological variability on mass balance emission rate retrieval accuracy using virtual aircraft sampling of a regional chemical transport model output. The storage-and-release events contributed to the mass balance emission estimate errors ranging from −25 % to 24 % in their tests. They recommended repeat flights around the given facility and/or time-consecutive upwind and downwind vertical profiling during the sampling period. However, inverse modeling methods using a dispersion model without assuming constant meteorological fields are expected to perform better than the mass balance method.

In this study, the HYSPLIT inverse modeling system is tested with flight observations collected in 2019 by the University of Maryland Cessna 402B
research aircraft to estimate SO_{2} point source emissions from three power plants in North Carolina, USA. An ensemble of model runs with a
range of emission heat release parameters is used to quantify the forward model simulation uncertainties due to the plume rise calculation. The paper
is organized as follows. Section 2 describes the flight observations as well as the HYSPLIT model configuration and the source term
inversion method. Section 3 presents emission inversion results, and a summary is given in Sect. 4.

## 2.1 Observations

A suite of airborne measurements was collected using an instrumented small research aircraft, the University of Maryland Cessna 402B, on 26 March 2019. A
morning flight started from 13:45 to 17:38 UTC and an afternoon flight lasted from 19:31 to 23:33 UTC. The flight tracks and the locations of the
power plants are shown in Fig. 1. The flights were intended to sample downwind plumes originating from three coal-fired power plants in
North Carolina: Roxboro (36.4833^{∘} N, 78.0731^{∘} W), CPI Roxboro (36.4350^{∘} N, 78.9619^{∘} W), and Belews Creek
(36.2811^{∘} N 80.0603^{∘} W). Note that another power plant, Mayo (36.5278^{∘} N 78.8917^{∘} W), is also in the region but
did not operate on the day. Measurement of SO_{2} mixing ratios was made with a Thermo Environment model 43S pulsed fluorescence
analyzer. Calibration of the SO_{2} analyzer was conducted before and after the field study with an SO_{2} standard that is traceable to
National Institute of Standards and Technology (NIST) reference standards. Additional measurements were also made, including aircraft locations, wind
speed, wind direction, temperature, pressure, relative humidity, and mixing ratios of several other gas species, as well as some aerosol optical
properties. More details related to the aircraft instruments and measurements can be found in Ren et al. (2018).

To better compare the HYSPLIT model results with the observations, the original 1 s data are averaged inside each four-dimensional (4D)
HYSPLIT sampling grid box, i.e., 0.01^{∘} longitude by 0.008^{∘} latitude, 100 m in altitude, and 1 min in time in this
application. It should be noted that the aircraft typically travels several three-dimensional (3D) grid boxes within a minute. The original
1 s data inside each 3D grid box are averaged separately so that multiple 1 min records would result from such a 4D averaging. For
brevity, the 4D averaged data are still referred to as 1 min data hereafter.

## 2.2 HYSPLIT model

In this study, SO_{2} plumes originating from the power plants are modeled using the HYSPLIT model (Version 5.2.0) in its particle mode in which
three-dimensional (3D) Lagrangian particles released from the source location passively follow the wind field. Random velocity components based on
the meteorological data are added to the mean advection velocities to simulate the dispersion process. The details of the model can be found in
Draxler and Hess (1997, 1998) and Stein et al. (2015). Green et al. (2019) found that the SO_{2} oxidation rates during the day from power plants were
0.22–0.71 % h^{−1} using 13 flights from 6 February to 15 March 2015 over the eastern United States. The measurements were made during a clear-sky day on
26 March 2019 and the travel time of the measured air parcels from the stacks is less than 3 h. So it is reasonable to treat SO_{2} as
a passive tracer and ignore its oxidation. A particle release rate of 20 000 per hour is used for all calculations. The meteorological data used to
drive HYSPLIT are from the Weather Research and Forecasting (WRF; version 4.0.1) model (Powers et al., 2017). The WRF model was configured for three nested
domains with horizontal grid spacing of 27 km (D01), 9 km (D02), and 3 km (Fig. 2). A total of 33 vertical
layers were defined with a higher resolution near the surface and 100 hPa for the model top. There were 20 layers below 850 hPa with the
first mid-layer height of the model at around 8 m. The simulations for D01 were initialized by using the North American Regional
Reanalysis (Mesinger et al., 2006) with 32 km grid spacing and availability every 3 h. Then, the WRF results from the coarser domains provided the
initial and boundary conditions for the inner domains. The daily WRF runs had a 30 h duration including 6 h a spin-up period (i.e.,
starting at 18:00 UTC on the previous day). The physics options for the WRF simulations were the rapid radiative transfer model for radiation
parameterization (Iacono et al., 2008), WSM6 for microphysics (Lim and Hong, 2010), the Grell 3D ensemble for the sub-grid cloud
scheme (Grell and Devenyi, 2002), the Noah land surface model (Chen and Dudhia, 2001), and the Mellor–Yamada–Nakanishi–Niino 2.5 level turbulent kinetic energy (TKE) scheme for the planetary boundary
layer (PBL) parameterization and its corresponding surface layer scheme (Nakanishi and Niino, 2006). In the WRF simulations, 3D grid nudging of winds is applied
in the free troposphere and within the PBL. Figure 3 shows that the WRF wind speed data mostly agree well with the aircraft
observations. However, at the beginning of the afternoon flight the 1 min observations show large variations in wind direction that the
5 min WRF data cannot represent. The WRF TKE data are used to calculate the turbulent velocity variances. The
ratios of the vertical to the horizontal turbulence are set as 0.18 for both daytime and nighttime. The boundary layer
stability is computed from the heat and momentum fluxes from the meteorological data. The WRF mixed layer depth is directly used in the HYSPLIT model.

The dry deposition velocity of SO_{2} is calculated using the resistance method following Wesely (1989), Chang et al. (1990), and
Walmsley and Wesely (1996). Note that the canopy resistance component depends upon a number of plant physiological and ground surface characteristics which are
provided to the HYSPLIT model by a land use input file. The molecular weight, diffusivity ratio, and effective Henry's law constant are specified as
64 g mol^{−1}, 1.9, and 1 × 10^{5} $\mathrm{mol}\phantom{\rule{0.125em}{0ex}}{\mathrm{L}}^{-\mathrm{1}}\phantom{\rule{0.125em}{0ex}}{\mathrm{atm}}^{-\mathrm{1}}$, respectively. The actual Henry's constant of 1.24 $\mathrm{mol}\phantom{\rule{0.125em}{0ex}}{\mathrm{L}}^{-\mathrm{1}}\phantom{\rule{0.125em}{0ex}}{\mathrm{atm}}^{-\mathrm{1}}$ is used to
define the wet removal process for SO_{2} as a soluble gas. The sampling grid is defined to be 0.01^{∘} longitude by
0.008^{∘} latitude and 100 m in altitude from the surface to 2000 m above ground level. Mass mixing ratios are output every minute by
setting the HYSPLIT parameter ICHEM to 6 to divide output mass by air density. They are later converted to volume mixing ratios by multiplying by the
molecular weight ratio of air to SO_{2}.

## 2.3 Plume rise

The plume rise calculation in HYSPLIT is based on the Briggs formula derived from dimensional analysis for buoyancy-dominated plumes from power plant
stacks (Briggs, 1969, 1984). Equation (1) shows the formulas used in the HYSPLIT model for the final plume rise Δ*H* in different
meteorological conditions following Arya (1999):

where *F*_{b} is the buoyancy flux term, $\stackrel{\mathrm{\u203e}}{u}$ is the mean wind speed, *u*_{∗} is the friction velocity, and *s* is the static
stability parameter as defined in Eq. (2).

Here *g* is gravitational acceleration. *T*_{v} is the moist air virtual temperature, and ${\stackrel{\mathrm{\u203e}}{\mathit{\theta}}}_{\mathrm{v}}$ is the mean virtual
potential temperature. Note that the stability parameter is calculated using the surface conditions of the meteorological data in the HYSPLIT model. A
recent study by Akingunola et al. (2018) suggests a layered buoyancy approach that allows stability to change with height for the Briggs plume rise
calculation. However, the layered approach is not implemented in the HYSPLIT model yet. The buoyancy flux term *F*_{b} is approximated by
Eq. (3) (Briggs, 1969):

where *c*_{p}, *ρ*, and *T* are the specific heat at constant pressure, average density, and temperature of ambient air,
respectively. *Q*_{H} is the heat emission from the stack. Assuming standard atmosphere, *Q*_{H} is the only user input parameter
besides meteorological conditions that affects the final plume rise height Δ*H*. It is possible to calculate *Q*_{H} when the relevant
parameters such as the flow rate and gas temperature at the stack exit are available. However, the exit gas temperature of the three stacks during the
study period cannot be obtained. Note that even if *Q*_{H} can be accurately estimated, the Δ*H* calculation through Eq. (1)
is still subject to significant uncertainties due to some assumptions for simplification. In addition, when certain parameters are not readily
available, it is preferable to assume them as unknown to allow better applicability for the source term estimation method. Thus we use a range of
*Q*_{H} values for plume rise height calculation to form an ensemble of dispersion runs, and the “optimal” plume rise runs that best match
the observations will be selected afterwards. In detailed studies at six Tennessee Valley Authority locations over many years, it was found that heat
emissions ranged from 20 to 100 MW per stack with one to nine stacks operating (Briggs, 1969). For each stack in operation, 15 heat emission
values uniformly distributed from 10 to 150 MW are tested in HYSPLIT simulations. During the study period, only one stack was operating at
each of the three power plants.

## 2.4 Inverse modeling method

Similar to previous HYSPLIT inverse modeling applications (e.g., Chai et al., 2015, 2017, 2018; Kim et al., 2020; Crawford et al., 2022), a transfer coefficient matrix (TCM)
approach is used for the inverse modeling application. After a stack heat emission scenario is specified, 24 independent HYSPLIT Lagrangian model runs
with unit hourly emissions starting from 00:00 to 23:00Z on 26 March 2019 are made at each power plant to form a TCM using the 4D averaged
1 min airborne SO_{2} observations. A transfer coefficient at row *m* and column *n* of the TCM represents the source–receptor
sensitivity of observation *m* with respect to the *n*th unit-emission run from a certain source location and release hour. The unknown emissions can
be solved by minimizing a cost function that integrates the differences between model predictions and observations, deviations of the final solution
from the first guess (a priori), and other relevant penalty terms if needed (Daley, 1991). Following Chai et al. (2018), a cost function
normalization scheme is introduced and the cost function ℱ is defined as

where *q*_{ij} is the discretized source term at hour *i* and location *j* for which an independent HYSPLIT simulation has been run and recorded in a
TCM. ${q}_{ij}^{\mathrm{b}}$ is the first guess or a priori estimate, and ${\mathit{\sigma}}_{ij}^{\mathrm{2}}$ is the corresponding error variance. We assume the
uncertainties of the release at each time and location are independent of each other so that only the diagonal term of the typical a priori error
variance ${\mathit{\sigma}}_{ij}^{\mathrm{2}}$ appears in Eq. (4). *c*^{h} and *c*^{o} denote HYSPLIT-predicted and measured mixing ratios,
respectively. The observational errors *ϵ*_{m} are assumed to be uncorrelated. Since the term ${\mathit{\u03f5}}_{\mathrm{m}}^{\mathrm{2}}$ is essentially
used to weight $({c}_{\mathrm{m}}^{\mathrm{h}}-{c}_{\mathrm{m}}^{\mathrm{o}}{)}^{\mathrm{2}}$ terms, the uncertainties of the model predictions and the representative
errors are included besides the observational uncertainties.

To consider *ϵ*^{2} in a simplified way, it is formulated as

As the additive term parameters *a*^{o} and *a*^{h} affect the *ϵ*^{2} in a similar way, the representative errors caused by
comparing the measurements with the predicted concentrations averaged in a grid can be included in either *a*^{h} or *a*^{o}. The
multiplying factor applied to the second term in Eq. (4) is the normalization to avoid having a zero source as a spurious solution when a logarithmic metric is used
in the cost function. ${\mathit{\u03f5}}_{\mathrm{m}}^{\mathrm{b}}$ represents the total uncertainties when ${q}_{ij}^{\mathrm{b}}$ is initially used in the model
predictions. The details of the normalization can be found in Chai et al. (2018).

Chai et al. (2018) show that the logarithmic metric yields better inversion results than the original air concentration metric. In this application, the
metric variable in Eq. (4) is changed to ln(*c*), i.e., replacing $({c}_{\mathrm{m}}^{\mathrm{h}}-{c}_{\mathrm{m}}^{\mathrm{o}})$ with
$\text{ln}\left({c}_{\mathrm{m}}^{\mathrm{h}}\right)-\text{ln}\left({c}_{\mathrm{m}}^{\mathrm{o}}\right)$. In such a case, ${\mathit{\u03f5}}_{\mathrm{m}}^{\text{ln}\left(c\right)}$ is comprised of
two parts, as

Note that a constant small mixing ratio of 10^{−6} ppbv is added to denominators ${c}_{\mathrm{m}}^{\mathrm{o}}$ and ${c}_{\mathrm{m}}^{\mathrm{h}}$
to avoid division by zero.

## 3.1 Transfer coefficient matrix

As mentioned in Sect. 2.4, a TCM approach is used in the inverse modeling. The time-varying model predictions of each independent
HYSPLIT Lagrangian model run with unit hourly emissions at all the receptor time and locations are recorded as the transfer coefficients (TCs). The
transfer coefficients from a set of model runs can be combined to generate a transfer coefficient matrix (TCM). Figure 4 shows a TCM with
72 columns separated into three parts representing the three power plants. Each of the 24 columns for a power plant represents a HYSPLIT run with
unit hourly SO_{2} emissions specified for a single hour on 26 March 2019. Each row indicates a 1 min 4D SO_{2} observation with
at least a nonzero transfer coefficient obtained from the 72 HYSPLIT runs. The stack heat emission *Q*_{H} = 50 MW is specified
for all 72 runs. A total of 464 out of 1503 1 min 4D SO_{2} observations are affected by the three power plants during this test
period, according to this set of HYSPLIT runs. Among those 464 observations, the first 234 1 min observations belong to the morning flight and
the next 230 observations are from the afternoon flight. Most of the observations with zero transfer coefficients for all 72 HYSPLIT runs have low
SO_{2} mixing ratios, which are likely due to SO_{2} background caused by minor sources other than the three power plants. Note that the
background SO_{2} mixing ratio may vary from one location to another and from one hour to the next.

Figure 4 shows that the emissions before 15:00Z or after 21:00Z of the day from any of the three power plants do not contribute to the
predicted SO_{2} plumes along the tracks of the morning or afternoon flights. Apparently the SO_{2} emitted from the power plant stacks
before 15:00Z had been transported out of the region when the aircraft measurements were made along the flight routes. Figure 1b shows
that aircraft left the domain of interest at 21:00Z so that SO_{2} emitted after 21:00Z was not sampled either. For 463 of the 464 indexed
observation rows in Fig. 4, the nonzero transfer coefficients only appear in one of the three parts. That is, all observations except one
are only affected by a single power plant for the current set of model runs. The only exception (*I*_{obs} = 369) for the
1 min observations is influenced by both Roxboro and CPI Roxboro. When stack heat emission *Q*_{H} = 60 MW or a higher value
is applied, the plumes from the three power plants are all separate without any overlap. This implies a decoupled system in which the emission
sources from the three different power plants can be solved separately. However, with lower heat emissions (*Q*_{H} = 10, 20, 30,
40 MW) some isolated 1 min observations may be influenced by both Roxboro and CPI Roxboro. The largest number of such observations
appears when *Q*_{H} = 10 MW is applied to all three power plants where 6 of the 479 observations with nonzero transfer
coefficients are affected by both Roxboro and CPI Roxboro. It is found that estimating the emissions from each power plant separately by ignoring the
coupling effect or by removing such rare observations yields nearly identical solutions.

It is also found that the observations from the morning flights (*I*_{obs} = 1–234) and afternoon flights (*I*_{obs} = 235–464)
are affected by a different set of hourly emissions. That is, none of the 72 hourly emission HYSPLIT runs contribute to both the morning flights and
afternoon flights. The observations of the morning flight help to constrain the hourly emissions at 15:00Z, 16:00Z, and 17:00Z from Roxboro, the
hourly emissions at 16:00Z and 17:00Z from Belews Creek, and the hourly emissions at 15:00Z and 16:00Z from CPI Roxboro, while the observations of the
afternoon flight help to constrain the hourly emissions at 19:00Z and 20:00Z from Roxboro, the hourly emissions at 18:00Z and 19:00Z from Belews Creek,
and the hourly emissions at 19:00Z and 20:00Z from CPI Roxboro. However, some of hourly emissions will not be well-constrained. For instance, the
hourly emissions at 18:00Z from Roxboro can only be constrained by six 1 min SO_{2} mixing ratio observations, and the hourly emission at
18:00Z from Belews Creek can only constrained by five observations. Figure 4 shows that each observation row has only one or two nonzero TC
values. If there are two nonzero TC values for any observation row, they are in two consecutive columns which represent two HYSPLIT runs with hourly
emissions at two consecutive hours. Instead of trying to estimate the emissions at the individual hours from each power plant, here we will only
estimate the average emissions of the two or three consecutive hours that can be constrained by the morning or afternoon flights. With this decoupling
approach, the cost function minimization becomes a very simplified problem.

## 3.2 Stack heat emission

As described in Sect. 2.3, when other meteorological parameters are fixed, the stack heat emission *Q*_{H} becomes the single user
input parameter to affect plume rise calculation with the Briggs formula being used in HYSPLIT. A total of 15 *Q*_{H} values from an expected
range of 10 to 150 MW are tested. For each heat emission value, 24 independent HYSPLIT Lagrangian model runs with unit hourly emissions
starting from 00:00 to 23:00Z on 26 March 2019 are made at each power plant, resulting in a total of 1080 model simulations. Figure 5
shows some of the plume rise results at the three different power plant locations. Note that the plume rise is added to the stack height listed in
Table 2 for the virtual release height used in the model. The plume rise mostly goes up during the day, following the PBL
development. Figure 5 also shows that the WRF PBL heights appear to be underestimated when compared with the two observation-based PBL
heights estimated using the vertical potential temperature profiles. Because Roxboro and CPI Roxboro are close to each other, both the PBL heights and
the plume rise results with the same *Q*_{H} = 50 MW are quite similar. Increasing heat emissions from *Q*_{H} = 50 to
100 MW at Belews Creek results in almost doubled plume rise. Conversely, a decreased heat
emission from *Q*_{H} = 50 to 20 MW drastically reduced the plume rise.

For each heat emission value applied to a power plant, the 24 HYSPLIT simulations with unit hourly emissions can be combined together to generate the
SO_{2} plume patterns for the particular power plant. Unless there are significant hourly emission variations the correlation coefficient (*r*)
between the combined plume and the observations is a good metric to evaluate the model performance without the need to estimate the emission
magnitudes. Figures 6 shows the correlation coefficients between 1 min aircraft SO_{2} observations and the unit-emission
HYSPLIT simulations with different heat emissions from the three power plants. When calculating model counterparts of the observations, both
horizontal nearest-neighbor and interpolation approaches are used. Note that the horizontal interpolation will increase the number of nonzero
transfer coefficients in TCMs. For instance, the number of nonzero rows of the TCM in Fig. 4 increases to 570 with horizontal interpolation
from the previous 464 with the nearest-neighbor option. In addition, the interpolation helps to smooth the gridded predictions. Figure 6
shows that correlation coefficients typically improve by up to 0.1 using the interpolation option. All the results presented later are with horizontal
interpolation when calculating model counterparts of the observations. For the Roxboro plume, the HYSPLIT simulation with *Q*_{H} between 60 and
90 MW yields a fairly good correlation between the crude predictions and observations, with *r* equal to or better than 0.6. The
best *Q*_{H} for CPI Roxboro that generates better pattern matches with the observed SO_{2} mixing ratios is probably between 40 and
90 MW, with *r* close to 0.5. However, the simulated plume from Belews Creek only reaches reasonable correlation coefficients of *r*=0.5
when *Q*_{H} is between 120 and 140 MW. When *Q*_{H} is below 80 MW, low and even negative correlation coefficients
appear between the predictions and observations. This will be investigated later by separating the morning and afternoon flights.

Table 1 shows the correlation coefficients between 1 min aircraft SO_{2} observations from the morning and afternoon
flights and the model counterparts using the unit-emission HYSPLIT simulations with different heat emissions from the three power plants. For Roxboro,
the HYSPLIT simulation with *Q*_{H} = 70 MW yields the best correlation coefficient *r*=0.68 for the morning flight, but the best
correlation coefficient *r*=0.64 for the afternoon is obtained when *Q*_{H} is given as 100 MW. In fact, the power plant emissions had
variations among the operation hours during the day. The HYSPLIT predictions of the morning and afternoon flight observations are contributed by the
unit hourly emission runs from 15:00 to 17:00Z and 19:00 to 21:00Z, respectively. The CEMS SO_{2} hourly emissions at Roxboro are 582, 345,
and 360 kg h^{−1}
for 15:00, 16:00, and 17:00Z, respectively, and 465 and 486 kg h^{−1} for 19:00 and 20:00Z, respectively. The lower average hourly
emission (429 kg h^{−1}) contributing to the morning flight than the average hourly emission (476 kg h^{−1}) contributing to
the afternoon flight suggests a higher *Q*_{H} for the afternoon flight than the morning flight since the emissions of SO_{2} and
heat emissions are expected to be proportional to each other for a particular stack. This agrees with the findings here; i.e., a higher optimal
*Q*_{H} (100 MW) is needed for a better simulation of the afternoon flight than the optimal *Q*_{H} (70 MW) for the
morning flight.

For the Belews Creek plume, the model results with *Q*_{H} = 80 MW seem to capture the plume pattern recorded by the morning
flight, with a correlation coefficient *r*=0.87 between the 1 min observations and the HYSPLIT counterparts. However, the correlation
coefficients between HYSPLIT predictions and the afternoon flight observations are all negative with all 15 *Q*_{H} values. This implies
problems other than plume height calculation with HYSPLIT. As shown in Fig. 3, there are large discrepancies between the WRF wind
directions and the aircraft measured ones at the beginning of the afternoon flight near Belews Creek (see Fig. 1). An attempted
assimilation of aircraft wind measurements using the WRF observational nudging is not quite effective to correct the wind direction biases. In
addition, successful predictions of the measured SO_{2} require wind field measurements at the upwind locations in an earlier time period,
which are not available for the current case. No optimal plume rise will be selected for this segment before Sect. 3.3.3.

The HYSPLIT simulations with *Q*_{H} = 90 MW and *Q*_{H} = 140 or 150 MW are found to correlate best with the CPI
Roxboro SO_{2} plumes measured during the morning and afternoon flights, with correlation coefficients of 0.75 and 0.44, respectively. The CEMS
SO_{2} hourly emissions at Roxboro CPI are 281 and 300 kg h^{−1} for 15:00 and 16:00Z, respectively, and 316 and
295 kg h^{−1} for 19:00 and 20:00Z, respectively. While the fact that the optimal *Q*_{H} is higher in the afternoon
corresponds well with the higher average SO_{2} emission from the CPI Roxboro power plant at 306 kg h^{−1} for 19:00–20:00Z versus
291 kg h^{−1} for 15:00–16:00Z, the much lower correlation coefficient of *r*=0.44 for the afternoon plume indicates large prediction errors
even with the optimal *Q*_{H} (140 or 150 MW).

Table 1 also shows that the model simulation generally performs better in the morning than in the afternoon. This is probably related to the fact that the wind directions in the afternoon are more variable than in the morning, as shown in Fig. 3. The meteorological variability may cause storage-and-release events, which make successful emission estimation more difficult to obtain, especially for the mass balance method (Fathi et al., 2021).

## 3.3 Inversion results

It has been shown that the current problem can be decoupled among the three different power plants. In addition, the SO_{2} measurements from
the power plant plumes during the morning and afternoon flights are affected by emissions of distinctive periods of 2 to 3 h. Thus six
segments are considered independently. Considering the very limited number of 1 min observations to constrain the emissions at certain hours
as discussed in Sect. 3.1, constant emissions are assumed for each of the six segments.

When pre-processing the observations, multiple 1 s SO_{2} values are averaged to generate 1 min observations. The standard deviation of
the multiple original 1 s observations is calculated to represent the observational uncertainty. The parameters in Eq. (5) are
found using linear regression as *f*^{o}=0.1 and *a*^{o}=0.05 ppbv. Chai et al. (2018) found that the inversion results were
not very sensitive to the observation uncertainty estimates. They also showed that setting the model uncertainty parameter to *f*^{m}=0.2
yielded good results when compared with the known emission sources in the case study. Here the model uncertainty parameter of *f*^{m}=0.2 is
also assumed and the additive term *a*^{m} is set as 0.05 ppbv, identical to *a*^{o}.

### 3.3.1 Zero background

Inversion estimations are first carried out without subtracting any background SO_{2} mixing ratios from the observations. That is, the
observations are assumed to originate only from the three power plant sources. Emission estimation results of the three power plants obtained by
minimizing the cost functions using the morning and afternoon flights separately are listed in Table 3 with 15 different assumed heat
emissions.

Based on the morning flight, the estimated Roxboro SO_{2} emission varies from 701.5 kg h^{−1} with *Q*_{H} = 10 MW
to 473.2 kg h^{−1} with *Q*_{H} = 150 MW. With the optimal *Q*_{H}=70 MW, SO_{2} emission is
estimated as 551.9 kg h^{−1}, 29 % greater than the average CEMS between 15:00 and 17:00Z. Table 2 shows that the emissions at
14:00 and 15:00Z are both 582 kg h^{−1}, while the emissions at 16:00 and 17:00Z decrease to 345 and 360 kg h^{−1} before going up
again to 509 kg h^{−1} at 18:00Z. The average emission from 19:00 to 20:00Z estimated based on the afternoon flight with the optimal
*Q*_{H} = 100 MW is 520.9 kg h^{−1}, 9 % larger than the average CEMS value. Contrary to the morning flight, the
estimated emissions are generally greater with increasing emission heat. The estimated emissions are 875.7 and 449.3 kg h^{−1} with
*Q*_{H} = 150 and 10 MW, respectively.

Using the morning flight observations, Belews Creek SO_{2} emissions between 16:00 and 17:00Z are overestimated with all 15 heat
emissions. With the optimal *Q*_{H} = 80 MW, the estimated emission is 1417.3 kg h^{−1}. Although this is 57 %
larger than the average CEMS emission, it is better than the estimates with other *Q*_{H} values except *Q*_{H}=100 and 120 MW,
which yield slightly lower emissions (1411.6 and 1406.0 kg h^{−1}).

It is also noted that estimated Belews Creek SO_{2} emissions using the afternoon flight observations are mostly within a factor of 2 when
compared with the average hourly CEMS emissions, while significant negative correlations are found between the observations and the model
predictions. The worst underestimations when *Q*_{H} = 40–70 MW are associated with lower absolute correlations ($\left|r\right|<\mathrm{0.3}$). At
*Q*_{H} = 110 MW when the most extreme anticorrelation ($r=-\mathrm{0.68}$) occurs, the estimated SO_{2} emission of
697.3 kg h^{−1} is very close to the average hourly CEMS emission of 794 kg h^{−1} between 18:00 and 19:00Z. The inverse
correlation is caused by the plume misplacement mostly due to wind direction error. The high absolute correlation indicates that the model probably
predicts the mixing ratio gradient relatively well but misplaces the plume relative to the actual plume. Since the model predicts higher mixing ratios when
observation values are low but predicts lower mixing ratios when observation values are higher, neither lower nor higher emissions would improve the
agreement between the predictions and observations. Thus, no significant biases arise from such cases. Nonetheless, the negative correlations between
the model and observations indicate model deficiencies and require special attention.

The CPI Roxboro emission estimates based on the morning and afternoon flights with the optimal *Q*_{H} values (90 and 140 MW) are
712.8 and 384.4 kg h^{−1}, respectively. They are overestimated over the CEMS by 145 % and 26 %. The CPI Roxboro emission estimates
based on the morning flight increase significantly when *Q*_{H} is above 100 MW. Overestimations of the SO_{2} emissions by
factors of 18 and 15 are found with *Q*_{H} set as 140 and 150 MW, respectively. Table 1 shows that the two heat
emissions yield correlation coefficients of 0.10, a significant drop from *r*=0.75 when *Q*_{H} is assumed to be 90 MW. Although the
highest correlation coefficient between observations and unit-emission HYSPLIT predictions for a specific flight segment may not produce the best
emission estimates, a low correlation coefficient typically indicates modeling deficiencies very effectively.

### 3.3.2 SO_{2} background

With zero background SO_{2} mixing ratios, the emission estimates based on the optimal heat emission are all greater than the CEMS
emissions. This indicates that it is necessary to consider the SO_{2} background mixing ratios. The HYSPLIT simulated mixing ratios are
actually the enhancements over the background mixing ratios. As shown in Table 4, there are 810 1 min observations, which is more than half
of the 1503 1 min SO_{2} observations not residing in any of the HYSPLIT simulated plumes originating from the three power plants with
any of the 15 heat emissions. It has to be noted that the flight patterns could have been better constructed. Sampling upwind as well as downwind or
in closed-shape flight patterns which enclose the sources (see, e.g., Ryoo et al., 2019, Fathi et al., 2021, and Kim et al., 2023) would have significantly helped
in the estimation of the SO_{2} background mixing ratios.

At first, the median value of the missed SO_{2} observations (0.199 ppbv) is assumed to be the background SO_{2} mixing ratio. This
value is subtracted from all the observations unless the values are below this background value, where the observations are set as zero. Using the
adjusted observations, the emission estimation results are listed in Table 5. Compared to the estimates with zero background mixing
ratios, the estimated emissions are all reduced, as expected. The Roxboro emissions are estimated to be 436 kg h^{−1} for 15:00–17:00Z and
403.3 kg h^{−1} for 19:00–20:00Z. The morning segment estimate agrees
much better with the CEMS than the previous estimate without considering the
background SO_{2} mixing ratios. The estimated Belews Creek emission of 1259.7 kg h^{−1} is significantly improved as well. The CPI
Roxboro emission during the 15:00–16:00Z period is overestimated by 89 %, but it is not as severe as the previous 145 % overestimation. The
estimated CPI Roxboro emission for the 19:00–20:00Z period is within 4 % of the CEMS value.

Table 4 shows the statistical distribution values of the six different segments, i.e., the morning and afternoon plumes from the three
power plants. The highest 1 min SO_{2} mixing ratio of 7.249 ppbv is inside the Belews Creek plume measured during the morning
flight. The observed SO_{2} mixing ratios inside the Belews Creek plumes are much higher than those from the other plumes. It is beneficial to
assume different background values for the six different segments of the observations. The minimum, the 5th percentile, the 10th percentile, and the
25th percentile mixing ratios of the morning and afternoon observations inside the plumes from three different power plants are assumed to be
segment-specific background mixing ratios. After subtracting the assumed background values from the observations, the emission estimation results are
listed in Table 5. The estimated emissions decrease with increasing background values. With the segment-specific 25th percentile as the
background, the Belews Creek emission estimation of 929.7 kg h^{−1} is within 3 % of the CEMS values, and the other estimates are
comparable to the results by assuming a constant background mixing ratio of 0.199 ppbv.

### 3.3.3 Root mean square errors (RMSEs)

Up to now, the best heat emission parameters have been selected based on the correlation coefficients between the observations and predicted counterparts for each segment of the observations after an ensemble of HYSPLIT runs with 15 different heat emissions. This can be performed before the emissions are estimated since the correlation coefficients are not affected by the magnitudes of the emissions when emissions for each segment are assumed to be constant. After the emission magnitudes are estimated, model performance can be evaluated using other statistical metrics.

The correlation-based emission estimations using all 15 different heat emission parameters by assuming the segment-specific 25th percentile
observation to be the background mixing ratios are listed in Table 6. The root mean square errors (RMSEs) of the HYSPLIT predicted morning
and afternoon plumes from the three power plants with all the plume rise ensemble runs are listed in Table 7. The optimal heat
emissions that yield the best correlation coefficients also result in the smallest RMSEs for two segments, the morning plumes from Roxboro and Belews
Creek. The afternoon plume from Roxboro predicted with *Q*_{H} = 90 MW and the estimated emission of 389.9 kg h^{−1} has
the smallest RMSE of 0.428 ppbv. The emission is underestimated by 18 %. However, for both the morning and afternoon plumes from CPI
Roxboro, optimal heat emissions associated with the highest correlation coefficients are quite different from the heat emissions that produce the
smallest RMSEs. If the model runs associated with the smallest RMSEs are selected, the estimated CPI Roxboro SO_{2} emissions are
265.1 kg h^{−1} for 15:00–16:00Z and 389.1 kg h^{−1} for 19:00–20:00Z, which are 9 % underestimated and 27 % overestimated compared to
CEMS. While the 19:00–20:00Z emission is worse than the result based on the best correlation, the 15:00–16:00Z emission estimation is much closer to
the CEMS than the correlation-based result, which is 94 % overestimated. For the plume from Belews Creek observed during the afternoon flight,
*Q*_{H} = 140 MW yields the lowest RMSE of 1.874 ppbv, which is more than 3 times the median SO_{2}
observation in the segment. The lowest RMSE of 0.859 ppbv for the Belews Creek morning segment is smaller than the 25th percentile value of the
observation (1.041 ppbv). For the other four segments, the best RMSEs are slightly larger than the median of the observations. This
indicates the poor performance of the Belews Creek afternoon model simulation. However, the emission inversion with
*Q*_{H} = 140 MW still yields a very good estimate of 811.6 kg h^{−1}, which is only 2 % overestimated.

Figure 7 shows the comparison of both the RMSE-based and correlation-based optimal predictions with the morning and afternoon flight
observations in the HYSPLIT predicted plumes from the three power plants. Identical results are obtained using the smallest RMSE and the highest
correlation coefficient for the morning segments from Roxboro and Belews. For both cases, the predicted SO_{2} mixing ratios agree well with
the observations. Note that here the SO_{2} predictions include both the predicted SO_{2} enhancement with the estimated emissions and
the assumed segment-specific background values, which are chosen as the 25th percentile observations inside the particular plumes. For the other cases,
the RMSE-based predictions tend to produce lower mixing ratios for the observed high SO_{2} values. Thus the linear regression lines for the
RMSE-based predictions tend to have flatter slopes. However, the RMSE-based emission can still be larger, such as the CPI Roxboro afternoon case. The
scatter plot for the Belews Creek afternoon case clearly shows anticorrelation as indicated by the negative correlation coefficients listed in
Table 1. This is caused by plume misplacement due to wind direction errors. Although the predicted high and low mixing ratios are
opposite to the observations, the minimization of the cost function defined by Eq. (4) is still capable of reaching an estimate close to
the actual emission rate. The observations appear to have a good representation of the mixing ratio distribution for the plume at the distance from
the source. Even if the model misplaced the plume location, predicted mixing ratios that have a similar distribution of the low and high values still
have the minimal cost function. That is, the inverse modeling method is not very sensitive to plume misplacement. If
*Q*_{H} = 110 MW that generates the highest negative correlation of −0.68 is chosen as the optimal plume rise parameter, the
estimated emission for Belews Creek during the 18:00–19:00Z period is 709.9 kg h^{−1}, which is only 11 % lower than the CEMS value of
794 kg h^{−1}. It might still be possible to have reasonable emission inversion results even when plumes are misplaced by the model.

Figure 8 shows the optimal predictions based on the highest correlation coefficients and minimal RMSEs at 800 $\mathrm{m}\phantom{\rule{0.125em}{0ex}}\mathrm{a}.\mathrm{g}.\mathrm{l}.$ at
17:00 and 19:00Z. Continuous vertical profiles along the flight track, or “curtain” plots, of the correlation-based and RMSE-based optimal
predictions are shown in Fig. 9 and enlarged in Figs. 10 and A1–A9. For the
morning flight, the optimal predictions of the Roxboro and Belews Creek plumes based on the highest correlation coefficient and minimal RMSE are
identical. The prediction results agree well with the observed plume placement and width, as well as the mixing ratios. On the other hand, for the CPI
Roxboro morning plume, the RMSE-based optimal prediction with *Q*_{H} = 30 MW is quite different from the correlation-based
optimal prediction with *Q*_{H} = 90 MW. The center of the RMSE-based plume is at a lower altitude than the correlation-based
plume (Figs. A1, A2, and A4). The lower-placed plume is also associated with lower mixing
ratios that match the observations better. Figure 8c shows a wider CPI Roxboro plume of the RMSE-based result than the correlation-based
result in Fig. 8a. The larger extent of the RMSE-based CPI Roxboro plume results in an extra appearance of the plume under the flight
track in the curtain plot (Fig. A3).

For the Roxboro plume captured during the afternoon flight, the correlation-based optimal prediction with *Q*_{H} = 100 MW and
SO_{2} emissions of 418.9 kg h^{−1} shows very similar spatial structures and mixing ratios as the RMSE-based optimal prediction
with *Q*_{H} = 90 MW and 389.9 kg h^{−1}. Figures 8, 10,
and A7–A9 show little difference between the two, and both agree well with the 1 min aircraft
observations. Figure 10 shows that both predictions underestimated the observed peak values along the flight, but the peak location
and the width of the Roxboro plumes match well between the predictions and observations. Note that the SO_{2} emissions are underestimated by
both of the optimal selections for this segment.

As shown in Table 1, strong anticorrelation is found between predicted and observed SO_{2} mixing ratios of the Belews Creek
afternoon plume. The prediction with *Q*_{H} = 110 MW that has highest absolute correlation coefficient is selected here as the
correlation-based solution. Figure A6 shows that it is not very different from the RMSE-based result with
*Q*_{H} = 140 MW. Both cases clearly misplaced the first transect of the plume and predicted wider transects than the
observations. It is found that the second transect shown in Fig. A6 is well-predicted with *Q*_{H} = 110 MW.

For the CPI Roxboro plume observed during the afternoon flight, the correlation-based optimal prediction with *Q*_{H} = 140 MW
and the RMSE-based optimal prediction with *Q*_{H} = 50 MW appear drastically different in Figs. 8,
10, and A7–A9, as expected. Figures 10 and
A7–A9 show that the RMSE-based optimal prediction has wider plume transects and has them placed at lower
altitudes than the correlation-based results. The predicted mixing ratios match the observations much better than the correlation-based results,
although the estimated emission of 389.1 kg h^{−1} is not closer to the CEMS emission of 306 kg h^{−1} than the correlation-based
estimation of 294.1 kg h^{−1}. In addition, Fig. 10 shows that the RMSE-based solution captures an observed narrow CPI
Roxboro plume transect that the correlation-based solution fails to reproduce. The results here indicate the need to have more observations at different
altitudes in future flight planning.

An ensemble of HYSPLIT runs with various heat release parameters for the Briggs plume rise algorithm is made to estimate SO_{2} emissions from
three power plants. Using a TCM approach for the inverse modeling, independent HYSPLIT Lagrangian model runs with unit hourly emissions are carried
out for each heat release value. The SO_{2} emissions from the three power plants during the morning and afternoon flight periods on 26 March
2019 are estimated separately through six different segments.

Initially the “optimal” plume rise runs are selected based on the highest correlation coefficients between predictions and observations. A segment
with negative correlations is excluded. It is found that the SO_{2} emissions are overestimated for all the remaining segments if background
mixing ratios are not considered. Several different assumptions of background values are then tested. Assuming the 25th percentile observed
SO_{2} mixing ratio inside each segment to be the background SO_{2} mixing ratios yields good emission estimates, with relative errors
of 18 %, −12 %, 3 %, 93.5 %, and −4 % when compared with the CEMS data (see Fig. 11). Note that the ranges of
the inverted emissions with 10 MW above and below the optimal heat emissions are used to indicate the sensitivities of the results to the heat
emissions. While the differences between the emission estimates and the known CEMS data provide some confidence to the results, quantification of the
uncertainties associated with the method probably requires further investigation in the future.

Using the same segment-specific SO_{2} background assumption, optimal plume rise runs are later selected to have the smallest RMSEs between
the predicted and observed mixing ratios. The previously excluded segment that has negative correlation coefficients between predictions and
observations is also included in the emission inversion. While identical plume rise runs are chosen as the optimal members for Roxboro and Belews
Creek morning segments, different runs are selected for the other three segments than the previous correlation-based results. In addition, emission
inversion for the previously excluded segment that has negative correlation coefficients between predictions and observations is also carried out. The
relative errors are 18 %, −18 %, 3 %, −9 %, and 27 % for the five segments and 2 % for the Belews Creek afternoon
segment. Figure 11 shows that the RMSE-based estimate of SO_{2} emissions from CPI Roxboro at 15:00–16:00Z agrees much better
with the CEMS data than the correlation-based estimate does. The RMSE-based SO_{2} emission estimates of Roxboro at 19:00–20:00Z and CPI
Roxboro at 19:00–20:00Z appear to deteriorate slightly. However, the associated HYSPLIT predictions show better agreement with the observations than
the correlation-based optimal runs because of their smaller RMSEs.

While the stack exit gas temperature data are not available for this study, a single constant stack exit temperature is provided for each facility in
the 2020 National Emissions Inventory (NEI) (Personal communication with George Pouliot at the U.S. EPA). Using the average measured air temperature
as the ambient temperature and the other CEMS data (United States Environmental Protection Agency (U.S. EPA), 2022), including hourly exit airflow rates, the morning and afternoon heat emissions are
estimated as 52–59 and 49–56 MW, 80–92 and 76–87 MW, and 13–13 and 12–13 MW for Roxboro, Belews Creek, and CPI Roxboro,
respectively. Note that the heat emission estimation is sensitive to the stack exit temperature, which is expected to vary from hour to hour, similar
to the exit airflow rates and the SO_{2} emissions. Nonetheless, these estimated values indicate the reasonable ranges of the heat
emissions. When correlation-based and RMSE-based methods agree with each other in their optimal heat emission for Roxboro and Belews Creek morning
segments, the optimal heat emissions are very close to the estimated stack heat emissions here. When the two methods disagree, the
correlation-based optimal heat emissions of 90 and 140 MW for CPI Roxboro in the morning and afternoon are unreasonably high, but the RMSE-based
optimal emissions of 30 and 50 MW could still be reasonable. This suggests that the RMSE-based results are probably more reliable.

While the uncertainty of the heat emission is the focus here, there are a lot of other uncertainties associated with the emission estimates. For
instance, uncertainties in many parameters, such as the assumed background SO_{2} mixing ratios, the meteorological data input such as the wind
direction and speed, and some of the HYSPLIT turbulence parameterizations related to the turbulent mixing, will all affect the final results. Even if
the hourly exit temperatures were available, the plume rise calculated using the Briggs algorithm may still misplace the plume. It is likely that the
optimal heat emissions chosen here have compensated for other errors in the model.

The relatively low resolution of heat emissions with an increment of 10 MW for the plume rise ensemble runs may result in significant errors for some cases. For instance, Fig. 11 shows large ranges of the emission estimates when using 10 MW above and below the correlation-based and RMSE-based optimal heat emissions for the CPI Roxboro afternoon segment. Since it is not easy to select the best-performing plume rise run based on the limited observations, it is probably better to use several ensemble members to quantify the uncertainties of the model simulation as well as the emission estimates. This is indicated in Fig. 11 but needs to be further explored in the future.

Negative correlation is found between predictions and observations for the Belews Creek plume captured by the afternoon flight due to the wind
direction errors of the meteorological data. However, the RMSE-based SO_{2} emission estimate is only 2 % above the CEMS value. More
surprisingly, if the plume rise run with the highest absolute correlation coefficient is selected, the SO_{2} estimate of
715.6 kg h^{−1} is very close to the CEMS average emission rate of 794 kg h^{−1}. We speculate that the inverse modeling is not very
sensitive to the plume misplacement because the cost function minimization would favor an unbiased population distribution even when misplacement by
the model is present. However, special care is needed for such situations in which large RMSEs indicate model deficiencies.

It has to be noted that the current dispersion simulation directly places the pollutant release points with the calculated plume rises elevated above the stacks, while the actual plumes reach their apexes gradually. Thus the dispersion model is not able to accurately reproduce the exact plume shapes at locations close to the source. The afternoon flight around Belews Creek power plant is closer to the source than the other segments. This probably makes this case more difficult to simulate accurately than the other segments.

This study shows that RMSE is a better metric than the correlation coefficient in choosing the best ensemble member for the SO_{2} emission
inversion. While the RMSE-based optimal plume rise runs appear to agree better with the observations than the correlation-based optimal runs,
observations are often missing when and where the optimal runs are significantly different. Additional measurements at multiple altitudes would
have been really helpful. In future flight planning for similar top-down emission estimation studies more vertical profiles of the target pollutant
should be measured. In addition, more upwind measurements are also recommended in order to better quantify the background concentrations caused by
many other emission sources. It is also wise to choose relatively steady meteorological conditions for the flight campaign since unsteady conditions
such as frequent wind direction changes pose great challenges not only for the inverse modeling but also for the meteorological simulation and the
dispersion modeling. The current study shows the value of ensemble simulations when certain model parameters are difficult to determine, such as
stack heat emissions as shown here.

HYSPLIT code is available at https://www.ready.noaa.gov/HYSPLIT.php (NOAA, 2023). The observation data are available upon request.

TC designed and performed the model analysis and wrote the first draft of the paper. XR conducted the measurement collection and analysis. FN completed the WRF runs to generate meteorological data. MC provided expertise for the HYSPLIT modeling and plume rise algorithm. AC conducted the initial SO_{2} simulations. All authors contributed to the paper editing and revision.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This study was supported by NOAA award NA16OAR4590121 at the NOAA Air Resources Laboratory in collaboration with the Cooperative Institute for Satellites Earth System Studies (CISESS), University of Maryland, College Park, MD 20740, USA.

This research has been supported by NOAA Research (NOAA award no. NA16OAR4590121).

This paper was edited by Joshua Fu and reviewed by two anonymous referees.

Akingunola, A., Makar, P. A., Zhang, J., Darlington, A., Li, S.-M., Gordon, M., Moran, M. D., and Zheng, Q.: A chemical transport model study of plume-rise and particle size distribution for the Athabasca oil sands, Atmos. Chem. Phys., 18, 8667–8688, https://doi.org/10.5194/acp-18-8667-2018, 2018. a, b

Angevine, W. M., Peischl, J., Crawford, A., Loughner, C. P., Pollack, I. B., and Thompson, C. R.: Errors in top-down estimates of emissions using a known source, Atmos. Chem. Phys., 20, 11855–11868, https://doi.org/10.5194/acp-20-11855-2020, 2020. a, b

Arya, S. P.: Air pollution meteorology and dispersion, Oxford University Press, New York, NY, ISBN 0-19-507398-3, 1999. a

Bieringer, P. E., Young, G. S., Rodriguez, L. M., Annunzio, A. J., Vandenberghe, F., and Haupt, S. E.: Paradigms and commonalities in atmospheric source term estimation methods, Atmos. Environ., 156, 102–112, https://doi.org/10.1016/j.atmosenv.2017.02.011, 2017. a

Briggs, G. A.: Plume Rise, AEC Critical Review Series TID-25075, U. S. Atomic Energy Commission, Division of Technical Information, Oak Ridge, Tennessee, 1969. a, b, c

Briggs, G. A.: Plume rise and buoyancy effects, in: Atmospheric Sciences and Power Production, edited by: Randerson, D., DOE/TIC-27601 (DE84005177), pp. 327–366, United States Dept. of Energy Technical information Center, Springfield, VA, USA, 1984. a, b

Cambaliza, M. O. L., Shepson, P. B., Caulton, D. R., Stirm, B., Samarov, D., Gurney, K. R., Turnbull, J., Davis, K. J., Possolo, A., Karion, A., Sweeney, C., Moser, B., Hendricks, A., Lauvaux, T., Mays, K., Whetstone, J., Huang, J., Razlivanov, I., Miles, N. L., and Richardson, S. J.: Assessment of uncertainties of an aircraft-based mass balance approach for quantifying urban greenhouse gas emissions, Atmos. Chem. Phys., 14, 9029–9050, https://doi.org/10.5194/acp-14-9029-2014, 2014. a

Chai, T., Draxler, R., and Stein, A.: Source term estimation using air concentration measurements and a Lagrangian dispersion model – Experiments with pseudo and real cesium-137 observations from the Fukushima nuclear accident, Atmos. Environ., 106, 241–251, 2015. a, b, c

Chai, T., Crawford, A., Stunder, B., Pavolonis, M. J., Draxler, R., and Stein, A.: Improving volcanic ash predictions with the HYSPLIT dispersion model by assimilating MODIS satellite retrievals, Atmos. Chem. Phys., 17, 2865–2879, https://doi.org/10.5194/acp-17-2865-2017, 2017. a, b

Chai, T., Stein, A., and Ngan, F.: Weak-constraint inverse modeling using HYSPLIT-4 Lagrangian dispersion model and Cross-Appalachian Tracer Experiment (CAPTEX) observations – effect of including model uncertainties on source term estimation, Geosci. Model Dev., 11, 5135–5148, https://doi.org/10.5194/gmd-11-5135-2018, 2018. a, b, c, d, e, f, g

Chang, J. S., Middleton, P. B., Stockwell, W. R., Binkowski, F. S., and Byun, D.: The regional acid deposition model and engineering model, in: Acidic deposition: State of science and technology, Vol I, Emissions, Atmospheric Processes, and Deposition, PB-92-100403/XAB, USA, 1990. a

Chen, F. and Dudhia, J.: Coupling an advanced land surface-hydrology model with the Penn State-NCAR MM5 modeling system. Part I: Model implementation and sensitivity, Mon. Weather Rev., 129, 569–585, 2001. a

Crawford, A., Chai, T., Wang, B., Ring, A., Stunder, B., Loughner, C. P., Pavolonis, M., and Sieglaff, J.: Evaluation and bias correction of probabilistic volcanic ash forecasts, Atmos. Chem. Phys., 22, 13967–13996, https://doi.org/10.5194/acp-22-13967-2022, 2022. a

Daley, R.: Atmospheric Data Analysis, Cambridge University Press, Cambridge, ISBN 0-521-38215-7, 1991. a

Draxler, R. and Hess, G.: Description of the HYSPLIT_4 modeling system, Tech. Rep. NOAA Technical Memo ERL ARL-224, National Oceanic and Atmospheric Administration, Air Resources Laboratory, Silver Spring, Maryland, USA, 1997. a, b

Draxler, R. and Hess, G.: An overview of the HYSPLIT_4 modeling system for trajectories, dispersion and deposition, Aust. Meteor. Mag., 47, 295–308, 1998. a

Fathi, S., Gordon, M., Makar, P. A., Akingunola, A., Darlington, A., Liggio, J., Hayden, K., and Li, S.-M.: Evaluating the impact of storage-and-release on aircraft-based mass-balance methodology using a regional air-quality model, Atmos. Chem. Phys., 21, 15461–15491, https://doi.org/10.5194/acp-21-15461-2021, 2021. a, b, c

Gordon, M., Makar, P. A., Staebler, R. M., Zhang, J., Akingunola, A., Gong, W., and Li, S.-M.: A comparison of plume rise algorithms to stack plume measurements in the Athabasca oil sands, Atmos. Chem. Phys., 18, 14695–14714, https://doi.org/10.5194/acp-18-14695-2018, 2018. a

Green, J. R., Fiddler, M. N., Holloway, J. S., Fibiger, D. L., McDuffie, E. E., Campuzano-Jost, P., Schroder, J. C., Jimenez, J. L., Weinheimer, A. J., Aquino, J., Montzka, D. D., Hall, S. R., Ullmann, K., Shah, V., Jaegle, L., Thornton, J. A., Bililign, S., and Brown, S. S.:
Rates of Wintertime Atmospheric SO_{2} Oxidation based on Aircraft Observations during Clear-Sky Conditions over the Eastern United States, J. Geophys. Res., 124, 6630–6649, https://doi.org/10.1029/2018JD030086, 2019. a

Grell, G. and Devenyi, D.: A generalized approach to parameterizing convection combining ensemble and data assimilation techniques, Geophys. Res. Lett., 29, 1693, https://doi.org/10.1029/2002GL015311, 2002. a

Hutchinson, M., Oh, H., and Chen, W.-H.: A review of source term estimation methods for atmospheric dispersion events using static or mobile sensors, Inf. Fusion, 36, 130–148, 2017. a

Iacono, M. J., Delamere, J. S., Mlawer, E. J., Shephard, M. W., Clough, S. A., and Collins, W. D.: Radiative forcing by long-lived greenhouse gases: Calculations with the AER radiative transfer models, J. Geophys. Res., 113, D13103, https://doi.org/10.1029/2008JD009944, 2008. a

Karion, A., Lauvaux, T., Lopez Coto, I., Sweeney, C., Mueller, K., Gourdji, S., Angevine, W., Barkley, Z., Deng, A., Andrews, A., Stein, A., and Whetstone, J.: Intercomparison of atmospheric trace gas dispersion models: Barnett Shale case study, Atmos. Chem. Phys., 19, 2561–2576, https://doi.org/10.5194/acp-19-2561-2019, 2019. a, b

Kim, H. C., Chai, T., Stein, A., and Kondragunta, S.: Inverse modeling of fire emissions constrained by smoke plume transport using HYSPLIT dispersion model and geostationary satellite observations, Atmos. Chem. Phys., 20, 10259–10277, https://doi.org/10.5194/acp-20-10259-2020, 2020. a, b, c

Kim, J., Seo, B.-k., Lee, T., Kim, J., Kim, S., Bae, G.-N., and Lee, G.:
Airborne estimation of SO_{2} emissions rates from a coal-fired power plant using two top-down methods: A mass balance model and Gaussian footprint approach, Sci. Total Environ., 855, 158826, https://doi.org/10.1016/j.scitotenv.2022.158826, 2023. a, b

Liggio, J., Li, S.-M., Hayden, K., Taha, Y. M., Stroud, C., Darlington, A., Drollette, B. D., Gordon, M., Lee, P., Liu, P., Leithead, A., Moussa, S. G., Wang, D., O'Brien, J., Mittermeier, R. L., Brook, J. R., Lu, G., Staebler, R. M., Han, Y., Tokarek, T. W., Osthoff, H. D., Makar, P. A., Zhang, J., Plata, D. L., and Gentner, D. R.: Oil sands operations as a large source of secondary organic aerosols, Nature, 534, 91–94, https://doi.org/10.1038/nature17646, 2016. a

Lim, K.-S. S. and Hong, S.-Y.: Development of an Effective Double-Moment Cloud Microphysics Scheme with Prognostic Cloud Condensation Nuclei (CCN) for Weather and Climate Models, Mon. Weather Rev., 138, 1587–1612, https://doi.org/10.1175/2009MWR2968.1, 2010. a

Lopez-Coto, I., Ren, X., Karion, A., McKain, K., Sweeney, C., Dickerson, R. R., McDonald, B. C., Ahn, D. Y., Salawitch, R. J., He, H., Shepson, P. B., and Whetstone, J. R.: Carbon Monoxide Emissions from the Washington, DC, and Baltimore Metropolitan Area: Recent Trend and COVID-19 Anomaly, Environ. Sci. Technol., 56, 2172–2180, https://doi.org/10.1021/acs.est.1c06288, 2022. a

Mays, K. L., Shepson, P. B., Stirm, B. H., Karion, A., Sweeney, C., and Gurney, K. R.: Aircraft-Based Measurements of the Carbon Footprint of Indianapolis, Environ. Sci. Technol., 43, 7816–7823, https://doi.org/10.1021/es901326b, 2009. a

Mesinger, F., DiMego, G., Kalnay, E., Mitchell, K., Shafran, P., Ebisuzaki, W., Jovic, D., Woollen, J., Rogers, E., Berbery, E., Ek, M., Fan, Y., Grumbine, R., Higgins, W., Li, H., Lin, Y., Manikin, G., Parrish, D., and Shi, W.: North American regional reanalysis, B. Am. Meteorol. Soc., 87, 343–360, 2006. a

Nakanishi, M. and Niino, H.: An improved Mellor–Yamada level-3 model: Its numerical stability and application to a regional prediction of advection fog, Bound.-Lay. Meteorol., 119, 397–407, 2006. a

NOAA: HYSPLIT model, NOAA Air Resources Laboratory, NOAA [code], https://www.ready.noaa.gov/HYSPLIT.php (last access: 23 September, 2023), 2023. a

Pitt, J. R., Lopez-Coto, I., Hajny, K. D., Tomlin, J., Kaeser, R., Jayarathne, T., Stirm, B. H., Floerchinger, C. R., Loughner, C. P., Gately, C. K., Hutyra, L. R., Gurney, K. R., Roest, G. S., Liang, J., Gourdji, S., Karion, A., Whetstone, J. R., and Shepson, P. B.: New York City greenhouse gas emissions estimated with inverse modeling of aircraft measurements, Elementa: Science of the Anthropocene, 10, 00082, https://doi.org/10.1525/elementa.2021.00082, 2022. a

Powers, J. G., Klemp, J. B., Skamarock, W. C., Davis, C. A., Dudhia, J., Gill, D. O., Coen, J. L., Gochis, D. J., Ahmadov, R., Peckham, S. E., Grell, G. A., Michalakes, J., Trahan, S., Benjamin, S. G., Alexander, C. R., Dimego, G. J., Wang, W., Schwartz, C. S., Romine, G. S., Liu, Z., Snyder, C., Chen, F., Barlage, M. J., Yu, W., and Duda, M. G.: THE WEATHER RESEARCH AND FORECASTING MODEL Overview, System Efforts, and Future Directions, B. Am. Meteorol. Soc., 98, 1717–1737, https://doi.org/10.1175/BAMS-D-15-00308.1, 2017. a

Ren, X., Salmon, O. E., Hansford, J. R., Ahn, D., Hall, D., Benish, S. E., Stratton, P. R., He, H., Sahu, S., Grimes, C., Heimburger, A. M. F., Martin, C. R., Cohen, M. D., Stunder, B., Salawitch, R. J., Ehrman, S. H., Shepson, P. B., and Dickerson, R. R.: Methane Emissions From the Baltimore-Washington Area Based on Airborne Observation: Comparison to Emissions Inventories, J. Geophys. Res., 123, 8869–8882, https://doi.org/10.1029/2018JD028851, 2018. a, b

Ryoo, J.-M., Iraci, L. T., Tanaka, T., Marrero, J. E., Yates, E. L., Fung, I., Michalak, A. M., Tadić, J., Gore, W., Bui, T. P., Dean-Day, J. M., and Chang, C. S.:
Quantification of CO_{2} and CH_{4} emissions over Sacramento, California, based on divergence theorem using aircraft measurements, Atmos. Meas. Tech., 12, 2949–2966, https://doi.org/10.5194/amt-12-2949-2019, 2019. a

Saunier, O., Mathieu, A., Didier, D., Tombette, M., Quélo, D., Winiarek, V., and Bocquet, M.: An inverse modeling method to assess the source term of the Fukushima Nuclear Power Plant accident using gamma dose rate observations, Atmos. Chem. Phys., 13, 11403–11421, https://doi.org/10.5194/acp-13-11403-2013, 2013. a

Stein, A. F., Draxler, R. R., Rolph, G. D., Stunder, B. J. B., Cohen, M. D., and Ngan, F.: NOAA's HYSPLIT atmospheric transport and dispersion modeling system, B. Am. Meteorol. Soc., 96, 2059–2077, 2015. a, b

Stohl, A., Seibert, P., Wotawa, G., Arnold, D., Burkhart, J. F., Eckhardt, S., Tapia, C., Vargas, A., and Yasunari, T. J.: Xenon-133 and caesium-137 releases into the atmosphere from the Fukushima Dai-ichi nuclear power plant: determination of the source term, atmospheric dispersion, and deposition, Atmos. Chem. Phys., 12, 2313–2343, https://doi.org/10.5194/acp-12-2313-2012, 2012. a

United States Environmental Protection Agency (U.S. EPA): Field Audit Checklist Tool (FACT), version 1.6.0.3 https://www.epa.gov/airmarkets/field-audit-checklist-tool-fact (last access: 23 September, 2023), 2022. a, b

Walmsley, J. L. and Wesely, M. L.: Modification of coded parametrizations of surface resistances to gaseous dry deposition, Atmos. Environ., 30, 1181–1188, https://doi.org/10.1016/1352-2310(95)00403-3, 1996. a

Wesely, M. L.: Parameterization of surface resistances to gaseous dry deposition in regional-scale numerical models, Atmos. Environ., 23, 1293–1304, https://doi.org/10.1016/0004-6981(89)90153-4, 1989. a

Winiarek, V., Bocquet, M., Saunier, O., and Mathieu, A.: Estimation of errors in the inverse modeling of accidental release of atmospheric pollutant: Application to the reconstruction of the cesium-137 and iodine-131 source terms from the Fukushima Daiichi power plant, J. Geophys. Res.-Atmos., 117, D05122, https://doi.org/10.1029/2011JD016932, 2012. a

Winiarek, V., Bocquet, M., Duhanyan, N., Roustan, Y., Saunier, O., and Mathieu, A.: Estimation of the caesium-137 source term from the Fukushima Daiichi nuclear power plant using a consistent joint assimilation of air concentration and deposition observations, Atmos. Environ., 82, 268–279, 2014. a

_{2}emissions of three power plants are estimated using aircraft observations and an ensemble of HYSPLIT dispersion simulations with different plume rise parameters. The emission estimates using the runs with the lowest root mean square errors (RMSEs) and the runs with the best correlation coefficients between the predicted and observed mixing ratios both agree well with the Continuous Emissions Monitoring Systems (CEMS) data. The RMSE-based plume rise appears to be more reasonable.

_{2}emissions of three power plants are estimated using aircraft observations and an...