Evaluating the Impact of Storage-and-Release on Aircraft-based Mass-Balance Methodology Using a Regional Air Quality Model

We investigate the potential for aircraft-based top-down emission rate retrieval overand under-estimation using a regional chemical transport model, the Global Environmental Multiscale-Modeling Air-Quality and CHemistry (GEM-MACH). In our investigations we consider the application of the mass-balance approach in the Top-down Emission Rate Retrieval Algorithm (TERRA). Aircraft-based mass-balance retrieval methodologies such as TERRA require relatively constant meteorological conditions and source emission rates to reliably estimate emission rates from aircraft observations. Avoiding cases where 5 meteorology and emission rates change significantly is one means of reducing emissions retrieval uncertainty, and quantitative metrics that may be used for retrieval accuracy estimation are therefore desirable. Using these metrics has the potential to greatly improve emission rate retrieval accuracy. Here, we investigate the impact of meteorological variability on mass-balance emission rate retrieval accuracy by using model simulated fields as a proxy for real world chemical and meteorological fields, in which virtual aircraft sampling of the GEM-MACH output was used for top-down mass balance estimates. We also explore 10 the impact of upwind emissions from nearby sources on the accuracy of the retrieved emission rates. This approach allows the state of the atmosphere used for top-down estimates to be characterized in time and 3D space; the input meteorology and emissions are “known”, and thus potential means for improving emission rate retrievals and determining the factors affecting retrieval accuracy may be investigated. We found that emissions retrieval accuracy is correlated with three key quantitative criteria, evaluated a priori from forecasts and/or from observations during the sampling period: (1) changes to the atmospheric 15 stability (described as the change in gradient Richardson number), (2) variations in the direction of transport, as a result of plume vertical motion and in the presence of vertical wind shear, and (3) the combined effect of the upwind to downwind concentration ratio and the upwind to downwind concentration standard deviations. We show here that cases where these criteria indicate high temporal variability and/or high upwind emissions can result in “Storage-and-Release” events within the sampled region (control volume), which decrease emission rate retrieval accuracy. Storage-and-release events may contribute the bulk 20 of mass-balance emission rate retrieval underand over-estimates, ranging in the tests carried out here from -25% to 24% of the known (input) emissions, with a median of -2%. Our analysis also includes two cases with unsuitable meteorological conditions and/or significant upwind emissions to demonstrate conditions which may result in severe storage, which in turn cause emission rate underestimates by the mass-balance approach. We also introduce a flight strategy whereby the emission rate re1 https://doi.org/10.5194/acp-2021-343 Preprint. Discussion started: 14 June 2021 c © Author(s) 2021. CC BY 4.0 License.


Introduction
Aircraft-based measurements of air pollutants provide an effective top-down approach for estimating total integrated emissions of sources ranging from individual stacks to large industrial complexes and cities. These top-down estimates are commonly employed for the evaluation of bottom-up emissions reported to inventories and can provide insights on air quality and health impacts of anthropogenic emissions. Top-down estimates can also be used to provide highly resolved emissions data for input 35 into air quality models. The mass-balance technique and Gauss's divergence theorem can be employed to infer point and area source emission rates from aircraft measured data. Several studies have utilized aircraft-based mass balance emission rate retrievals in the past (Kalthoff et al., 2002;Mays et al., 2009;Turnbull et al., 2011;Karion et al., 2013Karion et al., , 2015Cambaliza et al., 2014;Gordon et al., 2015;Nathan et al., 2015;Tadić et al., 2017;Conley et al., 2017;Ryoo et al., 2019). Flight patterns from these studies include single-height transects, single-screen flights and box flights. Box flight refers to flight patterns with 40 enclosed shapes (polygonal, cylindrical) and is accomplished by flying in closed paths (loops) at multiple altitudes around the emission source while making measurements of meteorological fields and pollutant concentrations. Kalthoff et al. (2002) used aircraft measured data to estimate CO and NOx emissions of the city of Augsburg in southern Germany, using data from two research aircrafts at several altitudes (polygonal box) around the boundaries of the city during the EVA project (Slemr et al., 2002). Emission rate estimations were made using a mass-balance approach applied to aircraft measurements, and estimates 45 were used in improving emissions input data for an air quality model (KAMM/DRAIS). Through model simulations, Panitz et al. (2002) analyzed the validity of the assumptions made by Kalthoff et al. (2002) and also studied the contributions of all relevant processes (advective fluxes, deposition, turbulent transports, and chemical transformations) to the mass budget of chemical species (e.g. CO and NOx). They concluded that, in an aircraft-based mass-balance approach, 85% and 95% of source emissions can be determined from the advective fluxes. Gordon et al. (2015) conducted top-down aircraft-based emission rate 50 retrievals by flying polygonal (e.g. rectangular) box flights around individual oil sands facilities in Alberta, Canada. This study also involved development of a methodology for extrapolation of aircraft measured data below the lowest flight level based on various assumptions and the analysis of the associated uncertainties. Gordon et al. (2015) estimated uncertainty in aircraftbased estimates as 2%-30%, with the source of this uncertainty attributed mainly to extrapolation of aircraft measurements below the lowest flight track. Conley et al. (2017) and Ryoo et al. (2019) made aircraft-based emission estimations by flying 55 cylindrical box flights around different point/area emission sources. The study by Conley et al. (2017) also involved evaluation representation. The details of the GEM-MACH configuration used here appear in Table A1, Appendix A.
For this work, a series of retrospective high-resolution nested air-quality simulations were made for the period of the JOSM summer 2013 aircraft intensive campaign (JOSM, 2011). The GEM-MACH model grid configuration consisted of a coarse parent domain with a horizontal resolution of 10km covering North America, and a nested 2.5km high resolution domain including Canadian provinces of Alberta and Saskatchewan (see Fig. 1a). To prevent the model high-resolution meteorology 130 from drifting chaotically from observed meteorology, the GEM-MACH 10km simulations were updated every 24 hours at 6 UT (local mid-night). Most of the JOSM 2013 cases examined here take place during the late mornings and early afternoons in local time (LT); by choosing to update the driving meteorology in the model at 6 UT sudden shifts in the model meteorological fields were avoided during the local daytime, and more than sufficient spin-up time (6h time-frame) was provided for model meteorological fields (in particular the cloud microphysics fields) to reach equilibrium before local noon. 135 Flux box data, including meteorological fields and SO 2 concentrations, were extracted from the model output database for the period of nine JOSM 2013 flights over three different facilities (Fig. 1b). The mass-balance approach was employed to retrieve model input emissions using the extracted flux box data. Earlier work (Fathi, 2017) suggests that the use of regional air-quality models as proxies for observations must be taken with care, with three main considerations in their use for this purpose: 140 1. Air-quality models usually require the use of some form of "mass conservation" correction for their advection algorithms (de Grandpré et al., 2016), which can otherwise create spurious negative concentrations. However, mass conservation algorithms which conserve mass over the entire model domain, but do not conserve mass locally, effectively include spurious mass transport -in turn affecting emissions retrievals using model output. A local mass correction algorithm (3 shell ILMC approach, Sørensen et al., 2013) was therefore used in the GEM-MACH simulations carried out here. Model 145 resolution also impacts mass conservation and needs to be sufficiently high that the sources may be resolved. 2. Regional air-quality models provide instantaneous output at every grid cell on a fixed (2 min) time-step -however, if these are interpolated to a finer time resolution (e.g. along an aircraft flight track), errors in interpolation may affect the use of the model output in retrieval algorithms. Values of model variables at 3-D grid-points, in the vertical columns making up the emissions "box", were therefore used in the analysis which follows to avoid these (temporal and spatial) 150 interpolation errors in the use of model output as a proxy for observations. 3. In aircraft-based sampling (both in real world aircraft measurements and in virtual sampling with model predicted data), temporal lag between data points collected along flight tracks may represent an additional source of error. This may introduce additional errors if the "static atmosphere" assumption is not met. Here, instantaneous model output at each time-step (2min) was used for mass-balance estimates, to avoid the potential for interpolation from the model resolution 155 (2.5km) and time-step to influence the retrievals generated here.
For this purpose, GEM-MACH model four-dimensional data (3D space, 1D time) were extracted at every 2 min model chemistry time-step for the flux boxes over individual oil sands facilities for our nine studied cases. Aircraft-based mass-balance retrievals were simulated by employing the divergence theorem in analyzing the model extracted data. The estimated emission rate time-series and flight-time (1.5-2.5 hours) averages were then compared to model input emissions (MIE), to evaluate 160 top-down mass balance retrievals.

Divergence Theorem, the Mass-balance technique and Emission Rate Retrievals
During an actual aircraft campaign, measurements are made along a flight trajectory on the lateral walls of a flux box (control volume); measured data on a single 2D screen (closed surface) comprised of box lateral walls is all that is available for emission walls as our 2D screen ("TR" approach, section 2.2.2).
The temporal evolution of the system can be studied by increasing the number of flights around the same facility. Here, this approach is explored with GEM-MACH simulated fields by analyzing multiple 2D screens at consecutive model time-steps (3D fields) and generating mass balance estimates ("TR*" approach, section 2.2.3).
A truly comprehensive analysis of the tracer mass budget within the flux box is only possible with 3D-volumetric data over 170 the entire extent of the box and over time (4D fields). However, collection of 4D (volumetric time-series) data is not practically feasible during aircraft measurements as the aircraft cannot sample the entire volume of the box even with increased flights during a time window of only a few (e.g. 2-3) hours. One advantage of the use of a model such as GEM-MACH as a proxy for observations is that 4D forecast fields can be generated, to provide complementary information for mass balance retrieval investigation. Here we use GEM-MACH model 4D chemical and meteorological fields to conduct a comprehensive analysis 175 of the transport of the emitted mass from sources within the control volume and to further investigate uncertainties associated with storage-and-release events in top-down mass balance retrievals ("DR" approach, section 2.2.1).
The Gauss's divergence theorem states that the total integrated divergence of a vector field (e.g. fluid flow) F in a region of space (control volume) V equals to the total flux of F through the enclosing boundary (surface) S, (1)

180
A mass-balance approach (conservation of mass) based on the divergence theorem can be employed to estimate chemical (pollutant) emission rates by sources (e.g. stack emissions) within a control volume V where atmospheric dispersion and transport takes place. In order for mass to be conserved, the inflow of mass of species C into V (E C,in ) plus the rate of generation/emission of mass (E C ) within V , must equal the rate of mass removal from the control volume (E C,out ) by the outgoing flux and processes such as surface deposition and the rate at which mass is accumulated/stored within the volume of 185 the box (E C,S ); written in a format linked to the divergence theorem (Eq. 1), Note that the left hand side and the terms in brackets on the right hand side of Eq.
(2), comprise the implementation of the divergence theorem in the mass-balance equation. In steady state conditions, i.e., under the assumption of time-invariant meteorology and input emissions, the net accumulation (rate) of mass can be assumed zero and therefore the divergence 190 theorem can be utilized to equate E C to the net flux through the boundaries of the box (E C,out − E C,in ). The divergence theorem is commonly utilized in aircraft-based studies with flight paths approximating flux boxes of polygonal (e.g., Gordon et al., 2015;Liggio et al., 2019) and/or cylindrical (e.g., Ryoo et al., 2019)  box flight top height), were analyzed using the mass balance approach to estimate the model input emissions. We note that this approach incorporates information not available for aircraft observation-based retrievals (i.e. volumetric time-series data), but may be used to analyze the importance of storage-release, and as a standard against which the other methods which follow may 210 be evaluated and compared.
The total mass of compound C within the control volume V (B C (t, V )) is calculated at each model time-step (∆t) by integrating over the entire volume of the box. The rate of change in mass can then be calculated by taking the time derivative of B C (t, V ) and following the chain rule, where M R is the ratio of the compound molar mass (e.g. 64.07 g/mol for SO 2 ) to the molar mass of air (28.97 g/mol), χ C (t, V ) and ρ air (t, V ) are compound mixing ratio and air density at each model grid point and time-step, respectively; with the 3D integrand volume element dV = dxdydz. The first integral on the right hand side of Eq. (3) represents the rate of change in compound mass within V due to variations in compound mixing ratio χ C (t, V ) independent of the changing air density, The second integral represents the apparent change in compound mass (which is calculated as a product of air density and compound mixing ratio) due to temporal variations in air density ρ air (t, V ), caused by meteorological processes, and does not represent an actual change in compound mass within the control volume, The minus sign signifies that this is an apparent change in compound mass, which is accounted for in the mass balance equation.

225
In applying the mass balance technique, we set the net rate of change of Eq.
(3) as equal to the sum of all flux and chemical terms changing the compound mass (emissions, horizontal and vertical advection, vertical diffusion and chemical production).
Solving for E C,S , the processes contributing to the net change in compound mass within the control volume V can be described with the following expression: 230 where E C represents the emission rate of compound C from sources within the control volume. E C,H and E C,V represent the net horizontal and vertical fluxes through the lateral and top walls of the flux box, respectively (derivations appear in Appendix B, Eqs. B1-B3). E C,V D represents the surface deposition rate, which is available as an output variable from GEM-MACH.
Change in compound mixing ratio due to variations in air density can be accounted for by calculating E C,M from Eq. (5).
E C,X represents the rate of change (i.e. the formation rate of C) due to chemical processes, and can be assumed negligible for 235 a tracer compound such as SO 2 relative to the emission rates for these facilities since it has a lifetime on the order of days.
In the context of emissions estimation using observed concentrations and winds along an aircraft trajectory, Eq. (4) can't be estimated directly, since it requires a volume integral over the box enclosing the emission source; data that are not available along the flight path. However, when a regional air-quality model is used as a proxy for observations, and/or a strategy of repeat flights are used (see Sect. 2.2.3), E C,M and E C,S can be estimated directly. When model values are used these terms may be 240 generated from equations (4) and (5). For example E C,S is estimated using the mixing ratio rate of change (∂χ C /∂t) multiplied by the air density (ρ air ) at each model grid point and integrating over the volume of the emissions enclosure box, with a similar approach used for E C,M . E C,S represents the net rate of increase/decrease of tracer mass within the box; a residual rate of change associated with the difference given in Eq. (6). We therefore refer to it as a "storage" term for tracer mass. Observationbased emission rate estimate studies (e.g. TERRA) assume E C,S = 0, while in reality a non-zero E C,S term is incorporated 245 in the E C term; here we examine the potential magnitude of E C,S using the GEM-MACH model's concentration predictions as a proxy for aircraft observations, and use the combination of mass-balance approach and GEM-MACH to fully investigate the potential impacts of storage of tracer mass on emission rates retrieval over/underestimates. We also use this combination of retrieval algorithm and regional air-quality model to generate meteorological forecast model variables which predict conditions in which the storage term may be of concern, reducing its influence on future emissions analysis (Sect. 4), and to demonstrate 250 flight procedures which may be adopted a priori to reduce its influence on emissions retrievals (Sect. 2.2.3). Rearranging Eq.
(6) and neglecting E C,X as being insignificant over the short distances (5-20km) between source and emissions box wall, compound emission rates can be estimated based on the other terms as, Eq. (7) equates the total integrated emission rate, originating from sources (point and/or area) within the box (control volume), and employing a second order central Finite Difference (FD) scheme (i.e. ∂χ C /∂t ∼ = (χ C (t + ∆t) − χ C (t − ∆t))/2∆t); the choice of FD scheme was made based on optimized performance and accuracy in our computations and it was determined that higher order schemes did not provide further advantages in terms of accuracy. A similar set of equations, lacking sources, sinks and a separate storage term, can be written for the air mass changes within the emissions box (see Appendix B), which lead to the mass balance expression for air

265
Eqs. (4), (5) and (7) (with other terms in Appendix B) were utilized in analyzing GEM-MACH model 4D flux box data for the nine cases, and DR method estimates were compared to the model input emissions (MIE). Results are discussed in Sect. 3.1.

TERRA Retrieval (TR) approximation; Using GEM-MACH 2D Fields
Aircraft-based retrieval methods such as TERRA rely on measured data along the lateral walls of the box. In observation-based retrievals, the horizontal flux E C,H can be calculated (directly) from aircraft collected data and the other terms need to be 270 estimated based on several assumptions about the atmospheric conditions and source properties (Gordon et al., 2015). Here, we explore the use of regional ( to variations in air density over flight time, E air,M and E C,M , based on observed meteorological trends in air density ( ∂ ∂t ρ air ) from nearby weather stations (towers) and average species mixing ratios (χ C ) around the box. Here, we substituted the weather 290 data with a single column (GEM-MACH predicted) air density temporal rate of change ∂ ∂t ρ air (t, z) at point (x o , y o ) located at the horizontal center of the box for each of the nine studied cases. Here, single column ∂ ∂t ρ air (t, x o , y o , z) is assumed to represent the spatial average rate of change in air density over the region of interest through the assumption of horizontal homogeneity, as in observational TERRA applications. Our comparisons between the GEM-MACH box average and the single column rate of change in air density indicate < 5% difference between the two trends for the time periods of our studied cases.

295
This gives where A is the base area of the box. χ C (t, z) is the average GEM-MACH predicted species mixing ratio around the box along the flight path (s) at height z, which is the same as in aircraft-based applications of TERRA.

300
Furthermore, due to lack of measurements at the box top, the vertical flux E C,V in TERRA in observation-based applications is estimated in two steps: (a) the vertical air mass flux (E C,V ) is estimated based on the other terms by rearranging the mass balance expression for air (Eq. 8), and (b) the species mixing ratio (e.g. SO 2 ) at the box top is assumed to be uniform and equal to the average (along flight path s) screen top mixing ratio (χ C,top ) (Gordon et al., 2015). Here, the same procedure was employed in deriving the vertical flux through box top using GEM-MACH output fields, As there are no volumetric data available in aircraft measurement applications of TERRA, the storage rate within the box cannot be inferred directly. Emission rate estimations in aircraft-based methods such as TERRA are made with the underlying assumption of steady state conditions and therefore E C,S is assumed to be negligible, effectively incorporating it into E C .
Thus, for aircraft-observation TERRA applications, mimicked here, Eq. (7) is reduced to, Estimations of emission rates were made for each model time-step (using instantaneous 2D screens) during the time periods of the nine studied JOSM cases by using GEM-MACH output to approximate the input data for TERRA retrievals (TR) as described above. Note that the TR method, as described in this section, makes use of GEM-MACH 2D data at each time t to generate an estimate (one estimate at each time-step t during the entire sampling period ) while making approximate estimations 315 of the temporal gradient terms (Eqs. 9 and 10) at time t by following procedures similar to aircraft-based applications of TERRA. Flight-time averaged estimates are then compared to the flight-time averaged model input emissions (MIE), Sect. 3.2.

Revised TERRA Retrieval (TR*); Using GEM-MACH 3D Fields
Utilizing time-consecutive instantaneous screens around an emission source, the temporal evolution of the system and its impact on mass-balance estimates can be studied in more detail, as discussed at the beginning of this section (Sect. 2.2). By 320 using air density screens at every 2 min model time-step (2D space, 1D time), as opposed to using a single column (vertical) profile in the TR method, estimations of E air,M and E C,M can be improved (by 1-3%, determined through comparisons with GEM-MACH box averaged temporal trends): where ρ air (t, z) and χ C (t, z) are the average GEM-MACH predicted air density and species mixing ratio around the box (along s) at height z. A similar procedure (using time-consecutive screens) can be used for estimating the storage term E C,S . The main difference between DR (Eq. 7) and TR (Eq. 12) retrievals is that the former explicitly accounts for and calculates the storage term E C,S . As discussed later in Sect. 3.3, the results from the Direct Retrieval (DR) method demonstrate the potentially significant contribution of the storage term E C,S to the total integrated estimated emission rates E C (Eq. 7). Ignoring this term, 330 as in Eq. (12), could result in substantial over/underestimates, if retrievals are attempted for conditions where the underlying assumption of time-invariant meteorology is not met and/or when significant upwind emissions (from other sources) enter the flux box. As noted in the Introduction, a priori metrics may be used to avoid these conditions. The key information for calculating E C,S , is the estimate of the temporal gradient in species mixing ratio (∂χ C /∂t) as in Eq. (4). This is achieved in the DR method by utilizing GEM-MACH model 4D data (volumetric time-series) and integration over the entire volume of the box 335 at each model time-step. These 4D data are not available for aircraft applications of retrieval algorithms, since instantaneous datasets cannot be created from flight information, and collection of volumetric data may not be operationally feasible in aircraft campaigns, where flight durations may be limited to a few hours. However, data from time-consecutive 2D screens may be used to approximate E C,S , leading to improved estimates of emission rates -by providing ancillary information for the study of temporal trends in chemical and meteorological fields. As we show later in Sect. 3.3, estimates of E C,S from time-340 consecutive 2D screens are strongly correlated (P r = 0.9) with the corresponding DR estimates of E C,S using volumetric (3D) time-series. This high correlation suggests that E C,S can be approximated using ∂χ C /∂t generated from time-consecutive 2D screens (3D fields) as representative of the rate of change in mass within the volume of the box. The approximation of E C,S using screen data alone is given by, where, same as in Eq. (14), ρ air (t, z) and χ C (t, z) are the average GEM-MACH predicted air density and species mixing ratio along s (around the box) at height z. Note that the key difference between Eq. (4) and Eq. (15) is that the former integrates over the volume, using consecutive time steps of chemical transport model output, while the latter integrates over average values along s at each height level z on the screen walls using consecutive time steps, and multiplies the result by the box horizontal area. However, Eq. (15) may also be generated from aircraft observations, something which is not possible for Eq. (4). For the 350 case of aircraft observations, a repeat flight would be required: rather than just a single flight path starting near the surface and working upwards along the box boundary, the aircraft would execute (at least) two consecutive box flights around the same facility. A better approach would be the use of two (multiple) aircrafts, or UAVs as they are becoming more popular for air quality measurements (e.g., Elston et al., 2015;Mölders et al., 2015;Barchyn et al., 2018;Li et al., 2018), flying in tandem following the same flight path in order to reduce time lag between samplings (in turn reducing the chance for meteorological 355 variability to be a concern). Differencing between the fields obtained from the two consecutive flights would be used to estimate the value of E * C,S . The correlation between E C,S and E * C,S also indicates that the latter may be used to estimate, from repeat flight aircraft observations, the extent to which within-box mass storage may influence emission rate retrievals. We note that the correlation between E C,S and E * C,S was deduced here through GEM-MACH model simulations with model grid point resolution (in the horizontal) of 2.5km; this correlation (and the corresponding time/length scales) can be further investigated 360 through model simulations at higher resolutions (e.g. 50m-250m) and/or field observations (e.g. controlled tracer release).
Eqs. (14) and (15) along with the other terms in Eq. (12), give the revised TERRA Retrieval (TR*) method,   (12), for each case. Also shown in Fig. 3 for reference are the ±50%, 30% and 20% error limits, with the ±30% being the maximum estimated uncertainty for TERRA applications in Gordon et al. (2015). Estimates for most of the flight cases studied here using GEM-MACH output fell within ±30% of MIE, exceptions being cases on August 26th and 28th (cases 5 and 8). The latter two flights were not analyzed in past work due to unfavorable conditions for reliable application of TERRA, namely, light variable winds (case 5) and high upwind emissions (case 8). However, they have been included 375 here using GEM-MACH predictions in mass-balance estimates, as extreme cases in which storage-and-release events have a significant influence on resulting retrievals. We herein refer to cases 5 and 8 as the rejected cases. Using the DR method, we were able to determine the relative contribution of the different terms to the net estimate of E C (Fig. 4). The main contributing terms were EC,H and EC,S (> 97%). The other terms contributed < 3%, combined.

Direct Retrieval (DR)
DR estimations were made by analyzing model 4D data for flux boxes approximating the nine cases. The numerical integration 380 expressions for calculating each term, using model data, are described in Table B1. Estimates of E C for SO 2 emission rates for the three oil sands facilities (Syncrude, CNRL and Suncor) were made by substituting the calculated terms from Table B1 in Eq. (7). Figure 4 demonstrates the contribution of each of the terms in Eq. (7) to the total integrated emission rate E C by direct retrieval (DR) from the GEM-MACH model 4D data for the nine flight cases. As shown in Fig. 4 the main contribution came from two terms, the net integrated horizontal flux E C,H and the storage term E C,S , together contributing > 97% of the net 385 change in mass. For 7 out of 9 studied cases, E C,H accounted for > 80% of the total emissions (E C ). This is in agreement with another model based study by Panitz et al. (2002), where the contribution of advective fluxes (E C,H and E C,V ) were estimated as 85% and 95% with no estimation of the storage term. The combined contribution of the other terms (E C,V , E C,V D and E C,M ) were < 1% for all cases except for cases 6 and 8 (August 26th and 28th) were it was 1.2% and 2.9%, respectively.
For 2 out of 9 studied cases, the contribution of the storage term (E C,S ) was > 30%, the maximum estimated uncertainty for 390 cases analyzed in Gordon et al. (2015). As discussed earlier, these flights were not analyzed in past work -but they have been included here as good examples of cases in which time-variant meteorology and/or high upwind emissions result in significant over/underestimates due to storage-and-release events. We also show later in Sect. 3.3 how these estimates may be improved.
DR method estimates were compared to the model input emissions (MIE), as shown in Figs. 3 and 5 and Table 2. Estimates were within 1-23% of MIE for the nine studied cases. We note that in some cases (4, 6 and 9), negative E C,S values reduced 395 the net E C estimate. Rejected cases 5 and 8 were the two cases with the largest storage term (E C,S ) contributions of 43% and 156% (averaged over flight time) of the total estimated emissions (E C ), respectively. Once corrected for storage (i.e. DR method), estimates for these two cases were within 12% and 23% of MIE. Cases 3 and 4 had the lowest storage rates (2% and -3%) where storage-corrected estimates were within 1% and 2% of MIE, respectively.
For the majority of our studied cases, our results indicate that the storage term (if not accounted for) contributes the bulk of 400 emission rate over/underestimates for the variables investigated here, hence methods to predict and/or reduce its influence are desirable for aircraft retrieval algorithms. We note that the impact of storage on emission retrieval estimates will depend on the relative magnitude of the estimated emissions themselves in comparison to other sources of uncertainty. That is, an emissions estimate double that of previous estimates, which has a ±30% over or underestimate associated with storage, may still be concluded as being significantly higher than the previous estimates (i.e. the emissions change is greater than the uncertainty 405 associated with the retrieval itself). However, the storage term as defined here has been shown to be the main contributor to emissions over/underestimates, and will therefore be examined in more detail in the analysis which follows. We also note that the impact of E C,S is sometimes negative; a net release of mass from the box. Note that net releases of stored mass from the box would result in emissions overestimates; net storage of mass within the box would result in emissions underestimates.
Storage-and-release events occur when mass is accumulated within the volume of the box and released at a later time through

TERRA Retrieval (TR) approximations
By limiting the analysis to the extracted model data along box lateral walls (Fig. 2), TR estimates of model input emissions 425 (MIE) were made. Terms in Eq. (12) were solved numerically using model data, with E C,H calculated as described in Table   B1; numerical expressions for calculating the rest of the terms are described in Table B2. Model surface deposition rates for the base area of the box (A) were used in TR estimations, similar to the DR method, the main difference between these two methods is that, unlike Eq. (7), Eq. (12) does not account for the storage term. TR estimates of SO 2 emission rates were within 4-39% of MIE for 8 out of 9 studied cases, see Table 2. For cases 3 and 4, the two cases with the lowest net storage, results  24th (case 2) and September 2nd (case 6) included two consecutive box flights around the same facility, shown as shaded areas on their respective time-series plots. Note that the DR and TR* methods, unlike TR method, account for the storage rate (EC,S and E * C,S ).

455
Here we show that change in tracer mass within the flux box can also be observed on box lateral walls over time (using time-consecutive screens), as described in Sect. 2.2.3. Figure 6 demonstrates the correlation between the 4D (volume) and 3D (screens) estimates of the temporal change in species mixing ratio (∂χ C /∂t) for all the data points (2 min resolution) corresponding to the period of the nine studied cases. The side plots show the range (and distribution) of the estimated ∂χ C /∂t by 3D (horizontal axis) and 4D (vertical axis) calculations. ∂χ C /∂t range is higher for 4D estimates (2.52 ppbv/hr) than for 460 3D estimates (0.98 ppbv/hr), but the two variables are correlated. The correlation coefficient P r = 0.7 (with slope m = 2.98 and b = 0.09 intercept for the least-square fit line), indicates strong correlation between 4D and 3D estimates. The exception being rejected case 8 where the correlation is low due to relatively high upwind emissions during the time period of this case (on August 28th). Excluding case 8 from this analysis, increases the correlation between 3D and 4D estimates to P r = 0.9. As discussed earlier, the lack of information about the storage rate was the primary source of under/overestimates in TR 465 method estimates for the majority of our studied cases. Here we demonstrate the use of an estimate of the storage term (E * C,S , see Sect. 2.2.3) in a revised retrieval approach. We note that this term may be estimated from consecutive screens (Eq. 15) or tandem aircraft samples for ambient atmosphere applications. Data from time-consecutive screens (e.g. 2, 3) can be used to numerically solve Eqs. (14) and (15) (see Table B3 for discrete integral expressions). For GEM-MACH-driven retrievals as used here, these time-consecutive screens originate from consecutive model output times. Here, we show the effect of the 470 E * C,S estimate on the resulting E C estimates using GEM-MACH fields for top-down emission rate retrievals. Improved/revised TERRA Retrievals (TR*) can be made by plugging in E * C,M and E * C,S in Eq. (16). TR* method estimates are compared to the model input emissions (MIE) in Figs. 3 and 5 and Table 2. For 8 out of 9 cases, TR* estimates were within 2% to 34% of MIE.
TR* estimates showed improvement over the TR method for all the cases (including the rejected case 8) by 2-52% of MIE, see Table 2. TR* estimations rely on the availability of temporal data for estimating change in tracer concentrations (∂χ C /∂t) 475 around the box for each emission rate retrieval case, which in our analysis with simulated fields were readily available with the GEM-MACH model data at each time-step. Again, for aircraft applications, a repeat box flight procedure can be used, where an aircraft carries out a second box flight immediately after the first flight, or two aircraft (or UAVs) follow the same flight path in tandem, one positioned a fixed distance behind the other.

480
The main processes contributing to the change in SO 2 mass within the volume of the box are illustrated in Fig. 2a ). An SO 2 emission source continuously adds mass to the box at emission rate E C . Most of the mass is advected out of the box through the top and lateral walls (E C,V and E C,H ), and some is deposited to the ground surface (E C,V D ). All the while, mass is continuously stored within the volume of the box and may later be released at net storage rate E C,S (positive to negative). Under time-invariant conditions (e.g. wind field, atmospheric stability, emission rate), the rate 485 of generation of mass (E C ) is at net (mass) balance with the other mass-removing processes (E C,H , E C,V and E C,V D ) and the total accumulated mass within the box volume (B C ) remains relatively constant over time, and the storage rate negligible (E C,S 0). The mass-balance approach based on the divergence theorem (Eq. 12) can therefore accurately equate the source emission rate to the net integrated flux leaving the box through the enclosing surfaces. However, localized (spatial) variations in meteorological fields (e.g. atmospheric stability, wind) and chemical fields (e.g. temporal changes in source emission rate (E C ) 490 and/or significant incoming upwind emissions) during the retrieval time, can shift the mass balance towards E C,S = 0 (Eq. 7)i.e. storage-and-release events. The scale and duration of the storage-release events contribute to over-and under-predictions in estimated emissions based on the mass-balance approach. Using GEM-MACH model 4D data we carried out extensive analysis of the storage-release events during our studied cases and devised three quantitative predictors for the severity of these events.
In the following subsections, using results from DR, TR and TR* estimates, we discuss these parameters and their correlation 495 with over-and under-estimates in top-down mass balance emission rate retrievals due to storage-and-release events.

Gradient Richardson number (Ri) and changing atmospheric stability
Aircraft-based emission retrieval methods such as TERRA, utilize the mass-balance approach with the underlying assumption of mean steady-state conditions over the region of study (∼ 100 × 100 km 2 ) during the flight time (∼2-3 hours). However, localized and transient variations in atmospheric stability at the vicinity of the stack/source location can impact the pollutant 500 plume rise and transport within the box and result in storage-release events. Horizontal wind velocity usually increases logarithmically with height. The height the plumes reach in the atmosphere depend on the stack parameters (exit velocity, exit temperature, stack height) and on the local stability conditions (temperature and wind profile near the stack location). Thus plume height may change with varying stack parameters and stability conditions during the course of a retrieval. This in turn would move the mass to a different flow regime -e.g. higher horizontal velocities if the plume height has increased, and lower 505 horizontal velocities if the plume height has decreased. These changes in plume height, in response to changes in atmospheric stability, may thus lead to a change in the rate at which emitted mass is transported out of the box. Stability changes are thus a good predictor of the possibility for mass storage and release. A measure of the change in dynamic stability is therefore desirable as a predictor for conditions under which this aspect of the "static meteorology" condition, required for reliable top-down emission rate retrievals, is not met.

510
Atmospheric dynamic stability can be described in terms of the gradient Richardson number (Ri), which is a dimensionless ratio, related to static stability and the shear induced turbulence (AMS, 2020),  (Mailhot and Benoit, 1982), which is also used as a parameter (in addition to emission stack/source information such as temperature, exit velocity) to calculate plume rise height . Changes in Ri with respect to time 520 are thus an indicator of the potential for the plume to change height during an aircraft emission rates retrieval; Ri may be predicted using a weather forecast model as part of pre-flight planning.
For our analysis, 4D fields of Ri were extracted from the model output dataset for each of the nine studied cases. Figure 7 shows the case for the JOSM flight on 20 August (case 4) with vertical profile time-series at the location of the emitting stack, and compares Ri in (a), with the SO 2 mixing ratio in (b), the net horizontal flux E C,H in (c) and the storage rate E C,S in (d).

525
As shown in Fig. 7a, the region of unstable air (blue) was gradually increasing in height during the sampling time, resulting in an increase in the vertical extent of the mixing layer. Note that the color-scale is symmetric around the critical gradient Richardson number Ri C = 0.25 in Fig. 7a. The increase in the vertical extent of the mixing layer during the period of case 4, resulted in the SO 2 plume mixing into heights with higher wind speeds (Fig. 7b). The incremental plume rise resulted in a sequence of fast-release events, that is, the simulated plume was moved into a faster (higher level) horizontal flow, and SO 2 530 mass was carried to the boundaries of the box faster -the net rate of mass loss through the downwind walls thus increased during the sampling time period ( ∂ ∂t (E C,H ) > 0, Fig. 7c; ∂ ∂t (E C,S ) < 0, Fig. 7d). Note the anti-correlation between E C,S and E C,H in Fig. 7c, d. We also note that although faster moving flow can result in more mixing and the subsequent plume dilution, the decrease in compound mixing ratio is not necessarily enough to compensate for the increased exiting flux. If emissions retrievals are calculated under these conditions without accounting for the storage term (negative in this case), emissions estimates will be biased high (overestimation). These fast-release events temporarily disturb the required balance between the addition of mass to the box volume by source emissions (E C ) and removal of the mass through vertical and horizontal fluxes and surface deposition, causing a negative change in the total stored mass (B C ) within the volume of the box (E C,S < 0). These release events can be seen as troughs (peaks) in the E C,S (E C,H ) time-series plot (Fig. 7c, d). While the plume remained at a constant height, the balance was restored over time (E C,S → 0); this can be seen as peaks/plateaus in E C,S time-series in Fig.   540 7d. After these periods of changing stability, the plume started to rise again, and the balance was shifted, resulting in negative E C,S values. This process was repeated several times over the period of case 4, resulting in a sequence of release events. Note that the trough to peak duration (period) and scale (magnitude) of each event are inversely proportional to the wind speed at plume height, and they are reduced as the plume ascends into faster moving layers of air resulting in a damped oscillatory storage and release sequence. In this particular example, this periodic storage-release during the sampling time, with subsequent 545 equilibrium intervals, resulted in relatively small net storage of E C,S = −3% over the sampling time and TR estimates within 6% of MIE (slight overestimation) for case 4. When consecutive storage-release pairs are captured during the flight/sampling time as in case 4, the average error due to storage term (E C,S ) is reduced. However, if the storage or release is of longer duration, so that an unpaired event occurs during observations, the net error increases, as was the case for flight case 9 where an unpaired release event (one storage-release pair and one additional release event) resulted in net storage of E C,S = −27% 550 and consequent TR overestimates. Monotonic storage or release events occur when mass is continuously stored (e.g. when a plume descends into slower moving air throughout a sampling interval; ∂ ∂t (E C,S ) > 0 and ∂ ∂t (E C,H ) < 0) or released (e.g. release of previously stored mass during a sampling interval; ∂ ∂t (E C,S ) < 0 and ∂ ∂t (E C,H ) > 0); such events result in TR under/overestimates as in case 2 where net storage of E C,S = 20% (i.e. storage event: E C,S > 0 and ∂ ∂t (E C,H ) < 0) resulted in TR underestimations.

555
The average value Ri = −0.14 at plume height, indicates unstable conditions (Ri < Ri C ) during the period of case 4the average value was calculated by spatially averaging Ri at the vertical layer containing the plume center (maximum concentration in the vertical) over the horizontal extent of the box at each time-step and then averaging over the sampling time.
Atmospheric stability was further decreased at an average rate of ∆ t Ri = −0.25 h −1 over the period of this case. Change in atmospheric stability (represented by ∆ t Ri) is thus related to plume vertical motion and mixing, which in turn contributes to 560 storage-release events and a corresponding over-or under-estimate in TR retrievals; ∆ t Ri is thus a useful forecast metric to use as an a priori indicator of the potential for changing atmospheric stability affecting emission rates retrieval accuracy, and may also be calculated from retrieved aircraft meteorological data. Table 2 lists (flight time) average Ri and ∆ t Ri values for each of the nine studied cases. Our analysis indicates a strong positive correlation between the absolute value |∆ t Ri| and the NRMS error for DR, TR and TR* methods with the Pearson correlation coefficients (P r ) of 0.6, 0.7 and 0.8, respectively. Figure 8   565 shows the correlation between |∆ t Ri| and NRMS error for each method. Note that the rejected case 8 with ∆ t Ri = −0.34 h −1 (see Table 2) was excluded from correlation analysis as an outlier, due to the extreme impact of high upwind emissions (and not meteorological variability) on emission rate estimates (further discussed in Sect. 4.3). ∆ t Ri can be calculated from aircraft-measured data along the lateral walls of the box and the measured data from additional profiling spiral flights upwind and downwind the source, such as those conducted during JOSM 2013 campaign. Analysis of ∆ t Ri can provide insights 570 regarding the change in atmospheric stability and the probability of plume vertical motion and the resulting storage-release events during flight time; the uncertainty levels associated with the storage rate (E C,S ) in the retrieved emissions can therefore be quantified.

Plume vertical motion and variations in the direction of the transport
The direction of the wind experienced by the plume may change when the plume remains at one altitude, or may change as 575 the plume rises or falls in the atmosphere (e.g. via the well-known "Ekman spiral" of wind direction changes with increasing height, AMS 2020). Changes in wind direction for a fixed plume height would result in the rejection (for top-down retrievals) of the flight -here we note that changes in wind direction due to plume vertical motion may also be a potential reason for flight rejection. Using GEM-MACH predicted wind fields for the nine studied cases (chosen a priori for relatively constant wind speeds at each height), we here investigated the extent of variations in wind direction at the time-varying vertical level 580 containing the plume center, where σ α is the standard deviation in wind direction α wind time-series (spatially averaged over the horizontal extent of the box including lateral walls at each time step) and ∆t SP is the Sampling Period time duration. In aircraft-based retrievals (e.g. TERRA), ∆ t α can be estimated from aircraft measurements around the box at plume center-point height. Our analysis of the 585 wind fields for the nine studied cases indicates a positive correlation (P r = 0.6) between the rate of variation in wind direction (∆ t α) and the rate of change in atmospheric stability (∆ t Ri). Variations in the direction of the flow at plume height (∆ t α) can result in the re-circulation of the plume, accumulation of the emitted mass within the volume of the box and its release at a later time (storage-release event). Such an event occurred during our GEM-MACH model simulations for the period of the JOSM flight on 26 August around CNRL (case 5). Figure 9 shows horizontal cross-sections of the SO 2 mixing ratio data at example of how wind direction changes may interact with changes in stability: while the wind direction is relatively static at each height, changes in stability may cause the plume to rise to different heights, where the wind direction may change or even reverse relative to the level below. The changes in turn will alter (slow) the rate of mass release from the walls of the box ( ∂ ∂t (E C,H ) < 0); a storage event, with ∂ ∂t (E C,S ) > 0. For this particular case (5), between 20:20UT and 20:50UT the plume center was at H 1 and was advected in a weak south-easterly wind; between 20:50UT and 21:00UT, the plume mixed upwards 600 and rose to H 2 where wind direction was north north-westerly and remained there until 21:30UT (which afterwards descended back to H 1 ). This change in wind direction at plume height (∆ t α) as a consequence of plume vertical motion due to change in atmospheric stability (∆ t Ri = 0), resulted in a net storage of E C,S = 43% during case 5 and consequently an underestimation in emission rate using the TR method.
Cases 6 and 9 (both release events) along with rejected case 5 (storage event) were the three cases most impacted by 605 variations in wind direction at plume-center height. Flight time average ∆ t α values for all nine cases are given in Table 2. Our results indicate a positive correlation between ∆ t α and NRMS error in DR, TR and TR* estimates, similar to ∆ t Ri; Figure 10 demonstrates this correlation for the three methods, with P r scores 0.4, 0.7 and 0.6, respectively. As mentioned earlier, there is also a positive correlation between ∆ t α and ∆ t Ri (P r = 0.6); the combined effect of change in the direction of the transport and changing stability increases the uncertainty in the retrieved emission rates. In aircraft-based emission rate retrievals, wind 610 shear strength in terms of ∆ t α can be included in uncertainty analysis associated with plume vertical motion due to change in atmospheric stability (∆ t Ri). Aircraft-measured data along the box walls combined with measurements during additional spiral flights can be used to evaluate these metrics.

Weighted upwind to downwind concentration ratio (φ)
In applying the mass-balance technique, the mass flux associated with upwind emissions (and regional background levels) 615 entering the box volume (inflow) is subtracted from the downwind outgoing flux containing the emissions within the box (outflow) to estimate the net emission rate from sources within the box volume. The mass inflow associated with regional background levels (e.g. for CH 4 and CO with relatively high background concentrations) is balanced out with the mass flux out of the box due to spatial (horizontal) homogeneity of background concentrations. However, emissions from upwind sources carried through the sampling region (box volume) are spatially heterogeneous (plumes) and therefore their inflow and outflow through a control volume is not necessarily equal and canceling. For instance, converging horizontal wind fields and/or plume vertical motion and mixing can result in temporary accumulation of the mass of these upwind plumes within the box (control volume). If the emissions from upwind sources are large relative to the within-box source(s), the emission rates retrieval may therefore become less accurate. That is to say, all the processes (including the ones described above in terms of ∆ t Ri and ∆ t α) that contribute to storage-release of the mass emitted by the source(s) within the flux box, may also affect the mass emitted 625 from upwind sources which are entering the box through the upwind wall(s) -leading to further increased uncertainty in the retrieved emission rates for sources within the box. The box upwind-to-downwind concentration ratio is in strong correlation with the relative magnitude of upwind emissions entering the box to the emissions from a given source within the box, with P r ≥ 0.8 as determined for our nine studied cases. To distinguish upwind emissions from regional background levels, this ratio can be weighted by temporal standard deviations in upwind and downwind concentrations. Hence we define, where χ C,u is the screen average upwind concentration and χ C,p is the plume average concentration; here the plume is defined as downwind concentrations that are one standard deviation above the mean. σ χ C ,u and σ χ C ,d are upwind and downwind concentration standard deviations. Note that while the first ratio in Eq. (19) represents the magnitude of upwind to source emissions, the second ratio distinguishes upwind plumes from background concentrations: in the absence of upwind plumes, σ χ C ,u 635 approaches zero due to the uniform nature of background concentrations. Parameter φ can therefore provide a quantitative indication of the level of confidence in the retrieved emissions based on the mass-balance approach, due to this factor. Table 2 lists φ values in % for the nine studied cases, determined from our GEM-MACH simulated fields.
For an isolated source with relatively low upwind emissions (which in turn depends on the mean wind direction and the location of the nearby sources) φ would be small and the application of the mass-balance technique would be expected to 640 yield more accurate results. This was true for cases 3 and 4 with φ < 0.2% (see Fig. 2d for case 4). JOSM 2013 cases 3 and 4 were conducted over CNRL and Syncrude oil sands facilities on 20 August and 3 September 2013. For the period of these two flight cases, wind directions were also consistently from the south-west, a condition that was accurately simulated in our GEM-MACH model runs for the same period. CNRL and Syncrude are the two facilities located on the western side of the Athabasca oil sands complex, with no facilities to their west and therefore no upwind sources of emissions when wind direction 645 is from the west (Fig. 1b). These conditions resulted in low E C,S rates of 2% and -3% for cases 3 and 4. For these two cases, TR estimates were within 4-6% of the model input emissions (MIE). Our TR estimates of SO 2 emissions for CNRL were in strong agreement with TERRA SO 2 emission rate estimates by Gordon et al. (2015) using aircraft-measured data during JOSM 2013 campaign, where estimates were 4.8% higher than minute-by-minute emissions reported by CNRL. Our analysis indicates that about 3% of this overestimation could be due to release of the stored mass during the flight on 20 August (case 650 4).
Small φ values represent favorable conditions for applying the mass-balance technique for emission rate retrievals. For 7 out of 9 studied cases, φ was less than 10% with NRMS errors between 4-28% for TR estimates. The opposite was true for rejected cases 5 and 8 with φ scores 11% and 13%, with NRMS errors 39% and 166% for TR estimates. This under performance was a combined result of upwind emissions which were higher than the source emissions, consistent converging wind fields 655 (slower wind speeds within the flux box relative to the upwind wall throughout the flight time), and plume vertical motion due to change in atmospheric stability (∆ t Ri = 0). For rejected case 5, there was an additional contribution from variations in the direction of plume transport (∆ t α = 46°, sampling time average). Rejected case 8 was conducted around the Suncor facility on 28 August 2013, during which the mean wind direction was from the west; as a result large upwind SO 2 plumes originating from the Syncrude facility, located on the west side of Suncor (see Fig. 1b), continuously entered the flux box during the period 660 of this case. SO 2 mass was accumulated within the volume of the box resulting in a flight/sampling time average storage rate of 156% of the total retrieved emission rates (see Fig. 4). Note that a large upwind source was a condition previously identified for rejecting this flight from emissions calculations (which is why emissions from this flight were not estimated in previous work). The φ predictor introduced here allows this effect to be quantified (from model forecasts used to plan aircraft flights, and the analysis of aircraft measured pollutant data time-series). Figure 11 shows (a) air flux, (b) SO 2 mixing ratio (χ SO2 ) and 665 (c) SO 2 mass flux screens around Suncor at the start of rejected case 8; note the strong negative (inwards) flux on the west wall (originating from upwind emissions from the Syncrude facility) and the weaker positive (outwards) flux on the east wall (Fig.   11). By accounting for the storage rate (E * C,S ), using model 3D data from time-consecutive screens, NRMS error was reduced by 52% of MIE in TR* estimates for this case. Direct retrieval (DR) of the Suncor emission rates for the period of rejected case 8, where storage was estimated directly from the model 4D data (volumetric time-series), was within 23% of MIE. φ = 13% Figure 12. Correlations between the normalized root-mean-square (NRMS) error and the forecast parameter φ, for DR, TR and TR* estimates. The shaded areas indicate the 90% confidence region. Case ID is indicated for each data point. Pearson correlation coefficients (Pr) for the three methods are also noted on the graph. Results indicate a strong correlation between NRMS error and the φ parameter.
for rejected case 8 implies unsuitable conditions (> 30% uncertainty) for the application of the mass-balance technique, as the TR method failed to retrieve facility emissions (E C < 0); the φ parameter allows a quantification of the conditions for rejection of flight data for emission rates retrieval purposes.
Our results indicate a strong positive correlation between φ and the NRMS error (strongest among the three forecast parameters), independent of the method. Figure 12 shows this correlation for the three retrieval methods, note that rejected case 8 was 675 excluded from correlation analysis as an outlier (extreme storage event) since its inclusion would dominate the correlation and obscure the information from the other data points. Pearson correlation coefficients (P r ) for DR, TR and TR* were calculated as 0.6, 0.9 and 0.7, respectively. That is, a significant proportion of the variability in the emission retrieval NRMS error can be accounted for by each of the metrics proposed here. In aircraft-based retrievals, analysis of φ can provide insights on the expected reliability of the retrieved emissions and the probability of storage-release events. Pre-flight model forecasts of φ can 680 be used for the same purpose. For φ scores greater than 10%, the uncertainty in the retrieved emission rates due to storagerelease is over 30%. For such conditions, it is advised that the storage rate (E C,S ) be estimated from time-consecutive screens of data. This can be achieved by conducting multiple box flights in order to estimate E * C,S for improved top-down emission rate retrievals (TR*, see Sect. 2.2.3 and Sect. 3.3).

685
The three forecast parameters ∆ t Ri, ∆ t α and φ as discussed above provide a new approach for analyzing the uncertainty levels associated with potential storage-release events during aircraft-based emission rate retrieval flight time, either using a priori meteorological forecasts to provide guidance for emission retrieval flight decision-making, or in post-processing of already completed flights. Table 1 summarizes the correlations between these parameters and the NRMS error in estimates by TR method in terms of P r correlation coefficients and the slopes (m) of the best fit lines (least squares) in Figures 8, 10 and 12.

690
The last row in Table 1, demonstrates the correlation between the changes in the two meteorological forecast parameters ∆ t Ri and ∆ t α and change in NRMS error levels; accordingly, net changes of 0.57 h −1 in |∆ t Ri| and 60°h −1 in ∆ t α during the flight/sampling time may result in 30% (or more when both contribute) uncertainty in the estimates, the maximum estimated uncertainty (30%) for TERRA SO 2 emission rate retrievals in Gordon et al. (2015). The combined effects of |∆ t Ri| = 0.62 h −1 and ∆ t α = 46°h −1 during the ∼2 h flight time of case 5, contributed to a net error of 39% in TR estimates for this case.

695
Further, for any forecasted/estimated rate of change in meteorology in terms of ∆ t Ri and ∆ t α, shorter sampling durations (e.g. two flights in tandem) may result in more confidence in the estimated emission rates; small values of ∆ t Ri and ∆ t α correspond to steady state conditions, whereas higher values correspond to increased uncertainty in the estimates due to storage-release events. The NRMS errors in TR estimates for cases with φ < 10% (7 out of 9), were less than 30%. φ > 10% correspond to over 30% uncertainty in the estimates, as for rejected cases 5 and 8. Note that cases 5 and 8 were excluded (rejected) from 700 TERRA retrievals in previous work due to unfavorable conditions for the application of TERRA. Here, using GEM-MACH predicted fields, we were able to quantify the impact of changing meteorology and/or high upwind emissions in mass-balance emission rate retrieval for these two cases in terms of the three forecast parameters ∆ t Ri, ∆ t α and φ as described above. Table 2     We considered the simulations of nine JOSM 2013 emission estimation flight cases, over three oil sands facilities with high SO 2 emissions (CNRL, Syncrude and Suncor), and used GEM-MACH model 4D output data as a surrogate for aircraft measurements in emission rate estimations to evaluate the application of the mass-balance approach in top-down methodologies.

715
Although the emphasis was on SO 2 emissions, the explored concepts and the three forecast parameters developed here are also applicable to species with high background levels (e.g. CH 4 , CO).

725
Results were compared to the model input emissions (MIE) and the performance of the three methods was analyzed in terms of Normalized (to MIE) Root-Mean-Square (NRMS) error. DR method estimates were within 1-23% of MIE for all 9 studied cases. TR and TR* estimates for 7 out of 9 cases were within 4-28% and 2-17% of MIE, respectively. This range of emissions retrieval error was similar to the range (±30%) estimated from aircraft observations in previous work (Gordon et al, 2015), with the remaining 2 flights (rejected cases 5 and 8) excluded in previous work due to unsuitable meteorological Storage-release events occur when temporal and spatial variations in meteorological conditions and/or source emission rate result in a temporary imbalance between the addition of mass to the volume of the box through source emissions and removal of mass by vertical and horizontal fluxes through box top and lateral walls and the deposition to ground surface. The transient storage of the emitted mass within box volume and its later release contribute to emission rate over-and under-estimations 745 based on the mass-balance technique. We introduced a correction term, E C,S , in the mass-balance equation to represent the rate of storage-release. Utilizing GEM-MACH model output data, we investigated the contribution of this term to the total integrated retrieved emission rates. The storage rate term (E C,S ) contributed to over 20% of the total retrieved emissions for 5 out of 9 cases, signifying the impact of storage-release events in emission rate retrievals based on the mass-balance approach.
The corresponding uncertainties in estimated emission rates were quantified for each of the nine studied cases, and conditions 750 giving rise to storage-release events were extensively explored.
We have introduced an approach for estimating the storage term as the TR* method, based on estimations of the rate of change in the stored mass within the box volume from data on time-consecutive 2D screens around the same emission source (3D data-set). We have shown that by estimating the storage term (E * C,S ) in this way, estimates can be improved by 2-52% of MIE. Three forecast parameters were introduced for predicting/identifying the probability of storage-release 755 events: variations in atmospheric dynamic stability represented by change in gradient Richardson number (∆ t Ri), shift in the direction of transport represented by variations in wind direction at plume height (∆ t α), and the weighted upwind to downwind concentration ratio (φ). We have demonstrated the strong correlation between these three parameters and the NRMS error in mass-balance estimates and discussed the potential of utilizing the three forecast parameters in analysis of the uncertainty levels in aircraft-based top-down emission rate retrieval methodologies. Model-based forecasts of these parameters can also 760 provide practical advice for the planning of future airborne measurement campaigns.
The investigation carried out here has resulted in the creation of quantitative measures for the extent to which storage-and release-events may impact on aircraft emissions retrieval accuracy. Retrieval uncertainties, using regional air-quality model output, were shown to be of similar magnitude to previously published values, with the exception of cases where the underlying assumption of time-invariant meteorological conditions was not valid and/or where significant upwind emissions impacted the 765 estimates. We have also devised a methodology to reduce the impact of this form of emissions retrieval error by estimating the storage rate (E * C,S ) through the use of repeat flights, around the same facility, either with a single or multiple aircraft(s).
Appendix A Table A1. Gas-phase chemistry Acid Deposition and Oxidant Mechanism, version 2 (ADOM-II) represents gas -phase chemistry for 42 gas species, integrated using a Young and Boris solver. Stockwell and Lurmann (1989) Particle microphysics Sectional approach -8 particle species (sulphate, nitrate, ammonium, primary organic carbon, secondary organic carbon, black carbon, sea salt, crustal material), and 12 particle bins Gong et al. (2002Gong et al. ( , 2003 Aqueous chemistry and gas and aerosols scavenging Cloud scavenging of gases and aerosols, aqueous phase chemistry using a Young and Boris solver (combined time-resolved and steady-state chemistry). Gong et al. (2015) Deposition Gas (Robichaud scheme) and particle dry deposition (Zhang scheme) as Large stack data derived from Continuous Emissions Monitoring (CEMS). Coats (1996); Zhang et al. (2018) The net horizontal flux exiting through lateral walls of the box (E C,H ) is calculated by extracting compound mixing ratio (χ C ), air density (ρ air ) and the wind normal to the lateral walls of the box from the model output, and integrating over the surface (2D screen) comprised of box lateral walls at each time-step, where χ C,top (t, x, y) and W top (t, x, y) are compound mixing ratio and the vertical wind component (positive upwards) at the box top. In the observational application of the retrieval methodology, the top of the box corresponds to an aircraft measurement altitude which is above the plume.
The horizontal air flux through the screen (positive outwards) is calculated from extracted air density and wind fields along 785 the screen at time t as, E air,H (t) = ρ air (t, s, z) U ⊥ dsdz (B4) The rate of change in the air mass (accumulation) within the box due to changing air density (compressible fluid) can be estimated as, The vertical air flux leaving the box (positive upwards) is calculated similar to Eq. (B3), E air,V (t) = ρ air,top (t, x, y) W top (t, x, y) dxdy (B6)

B2 Discrete Integral Expressions
The numerical integration expressions for calculating the terms in Eqs. (7), (12) and (16), using model data, are described in Tables B1, B2 and B3. Throughout, the 2nd order central finite difference scheme (Jacobson, 2005) was used to numerically solve the time derivatives (∆/∆t) with the residual error of order O(∆t 2 ), where ∆t is equal to the 2 min (1/30 h) model time-step. Surface deposition rates (E t C,V D ) were extracted directly from the model output for the area covered by the base of the flux box for each case, to eliminate deposition estimates as a potential source of error in retrievals. Note that in Table B2, the species vertical mass flux (E t C,V ) is calculated using the estimated vertical air mass flux (E t air,V ) from the mass-balance equation for air (Eq. 8) and the box top average mixing ratio (χ t C,top ).    (14) and (15)  Competing interests. The authors have the following competing interests: John Liggio is an editor for ACP.