A New Inverse Modeling Approach for Emission Sources based on the DDM-3D and 3DVAR techniques: an application to air quality forecasts in the Beijing– Tianjin–Hebei Region

. We develop a new inversion method which is suitable for linear and nonlinear emission sources (ES) modeling, based on the three-dimensional decoupled direct (DDM-3D) sensitivity analysis module in the Community Multiscale Air Quality (CMAQ) model and the three-dimensional variational (3DVAR) data assimilation technique. We established the explicit observation operator matrix between the ES and receptor concentrations, and the background error covariance (BEC) matrix 20 of the ES which can reflect the impacts of uncertainties of the ES on assimilation. Then we constructed the inversion model of the ES by combining the sensitivity analysis with 3DVAR techniques. We performed the simulation experiment using the inversion model for a heavy haze case study in the Beijing-Tianjin-Hebei (BTH) region during December 27-30, 2016. Results show that the spatial distribution of sensitivities of SO 2 and NO X ES to their concentrations, as well as the BEC matrix of ES, 25 are reasonable. Using a posteriori inversed ES, underestimations of SO 2 and NO 2 during the heavy haze period are remarkably improved, especially for NO 2 . Spatial distributions using a posteriori inversed ES are consistent with in-situ observations at 45 stations over the BTH 30 region, and simulation errors decrease significantly. These results are of great significance for: studies on the formation mechanism of heavy haze; reducing uncertainties of ES and its dynamic updating; providing accurate ‘virtual’ emission inventories for air-quality forecasts and decision-making services for optimization control of air pollution. and as


Introduction
Since the implementation of the Air Pollution Prevention and Control Action Plan in September 2013, urban air quality in China has improved overall. However, heavy haze frequently occurs over Beijing-Tianjin-Hebei (BTH) and the surrounding region in winter. In recent years, many researchers have studied the formation mechanism of heavy haze in the BTH region (Huang et al., 2014;Cheng et 40 al., 2016;Liu et al., 2016). These studies have shown that rapid conversion from primary gas pollutants to particulates is an internal triggering factor for the "explosive" and "persistent" heavy haze (Wang et al., 2014), and secondary particulate concentrations, such as sulfate and nitrate, account for a significant percentage of PM2.5. Thus, effectively controlling the emissions of precursors of secondary aerosols (such as SO2 and NOx) is important for reducing environmental, economic, and human health 45 problems caused by PM2.5 concentrations (Huang et al., 2014).
Emission inventories provide important fundamental data for investigating the causes of air pollution, and atmospheric chemical transport model (ACTM). Uncertainties in ES are a major factor in determining the simulated and forecast accuracy of the ACTM, and these uncertainties can greatly affect the design of ES control strategies (Tang, 2006). The methods for establishing an emission 50 inventory include the bottom-up approach based on human activities, energy consumption statistics, and various emission factors, as well as top-down inversion modeling of ES based on monitoring data of air pollutants using satellite remote sensing and ground observations. Many studies have established various ES inventories in China using the bottom-up approach, e.g., Bai et al., 1996;Streets et al., 2003;Zhang et al., 2009Zhang et al., , 2012Cao et al., 2011;Zhao et al., 2012;Zhao et al., 2015; 55 However, the ES estimated by this method differ greatly due to large uncertainties in the statistical data, emission factors, and spatiotemporal apportionment coefficients (Ma et al., 2004). Moreover, real-time updates of emission inventories are difficult to achieve because of its rapid spatiotemporal variations 3 due to high-speed urbanization, and a delay in the release of statistical data of approximately 1-2 years.
The top-down approach is a useful supplement to bottom-up estimates, which are subject to 60 uncertainties in emissions factors and activities (Streets et al., 2003). Inverse modeling, in which emissions are optimized to reduce the differences between simulated and observed data, is a powerful method that eliminates the problems of the bottom-up approach.
Results show that using an inversion modeling approach to retrieve the spatial distribution of ES can greatly improve air quality simulations and forecasts by the ACTM. Many studies have achieved a certain amount of improvement using the EnKF and 4DVAR methods. The advantage of the EnKF 80 method is that the observation operator is implicit in the assimilation process of ES, and it avoids developing the tangent linear and adjoint models. For example, the fully coupled "online"Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) are used as the forward model to relate the SO2 emissions to the simulated concentration, and efficiently update the emissions based on the routine surface SO2 observations (Dai et al.,2021).However, this method has stricter requirements 85 for error perturbations in ES and the construction of bias-correction models. In addition, the large 4 number of ensemble members in the EnKF method leads to a huge computational cost. Some studies adopted the 4DVAR method to inverse the ES of NOx and CO based on the Goddard Earth Observing System (GEOS)-Chem adjoint model. However, the GEOS-Chem model is often used to simulate large-scale physical and chemical processes and rarely utilized in urban air quality forecasts because its 90 spatial resolution is too coarse. This method also has high computational costs due to the gradient calculation of the objective function.
The 3DVAR method is a generalization of optimal interpolation methods. It has the advantages of conveniently adding dynamical constraints and directly assimilating unconventional observation data (Li et al., 2013). 3DVAR is widely-used in the assimilation of meteorological and atmospheric chemical 95 data due to simplicity, ability to use complex observations operators, and low computational cost.
However, this method has two requirements: assimilated variables must remain relatively stationary within the assimilation window, and the method must be coordinated between the assimilated initial field and the iterative integration of the model. To apply the 3DVAR method to inverse ES, it is necessary to construct an inversion model that can satisfy the aforementioned requirements. Firstly, 100 although ES have monthly, seasonal, and annual variations, the variation of ES is constant within a short period (e.g., for an assimilation window of 1 hour). Secondly, the assimilation effect of ES depends on the quality of observation data and the consistency between observed and simulated values.
To ensure consistency between observations and simulations, the sensitivity of the receptor's concentrations with respect to the ES should be accurately calculated (Hu et al., 2009). Using the 105 three-dimensional decoupled direct (DDM-3D) sensitivity analysis method within the CMAQ model, reasonable sensitivity coefficients between ES and the receptor's concentrations can be calculated. This coefficient matrix is then used in the 3DVAR assimilation process, which ensures consistency between the ES and the modelled results. Thus, the top-down 3DVAR constraint methods for ES based on the first-or high-order DDM-3D sensitivity analysis techniques can maintain the coordination between 110 assimilated field of ES and simulated concentration of air pollutants.
The primary methods used to calculate the S-R relationship include the brute force, the adjoint, and the DDM-3D method. Many studies have shown that these methods can improve ES inventories constructed by bottom-up methods for NOx (Napelenok et al., 2008), CO (Bergamaschi et al., 2000 and5 Heald et al., 2004), NH3 (Gilliland et al., 2003), and EC (Hu et al., 2009). The adjoint method is a 115 backward-sensitivity calculation method, while brute force and DDM-3D are forward-sensitivity calculation methods. For inverse modeling of pollution sources with single receptor, the backward-sensitivity method is more suitable, with low computation costs for certain grid sizes in a given time period, but it is not suitable for ES with multiple receptors, which result in high computational costs (Hu et al., 2009;Wang et al., 2013). The forward-sensitivity calculation method is 120 more suitable for inversing the ES based on observed data from satellites or multiple surface stations (Hu et al., 2009). Cohan et al. (2002) introduced the DDM-3D method to the CMAQ model, and created the CMAQ-DDM-3D module for low-order sensitivity calculations in early 2010. In 2014, they added a high-order calculation module for particles (High-Order DDM-3D for Particular Matter; HDDM3D/PM) in the newly released version of CMAQ model. Wang et al. (2013) claim that the sensitivity calculation 125 results using the DDM-3D method are more reasonable than the brute force method. Some studies have used the DDM-3D method (Napelenok et al., 2008;Hu et al., 2009)

or a combination of the DDM-3D
and a discrete Kalman-filter method (Wang et al., 2013) in conjunction with measurements from satellite and ground observations to inverse BC and NOx ES in the United States. Because inverse modeling of ES based on discrete Kalman-filtering is more suitable for linear systems, we use the 130 DDM-3D method to calculate the S-R linear and non-linear relationship.
The results of inverse modeling are very sensitive to uncertainties in the ES of NOx, NH4 and inorganic aerosols . Impact of uncertainties in the ES on the assimilation effects need to be considered in the top-down inversion model. The top-down 3DVAR inversion methods developed in this study can include the impacts of ES uncertainties by the BEC matrix of ES based on 135 multiple sets of ES. We developed a new inverse modeling approach for ES that combines the DDM-3D sensitivity analysis method with the 3DVAR assimilation technique, and then applied it to a case study during a typical heavy haze episode. This paper is organized as follows: Section 2 describes the inversion model and presents results of sensitivity analysis and the BEC; Section 3 provides details of the WRF-CMAQ model, and configurations and experiments of simulation; Section 4 presents the 140 results of the control and experiment simulations with a priori and a constrained posteriori ES, respectively; finally, the discussions and conclusions are provided in Section 5. 6

Model and data
We used an offline modeling system that includes two components: the Weather Research and 145 Forecasting (WRF) model (Michalakes et al., 2004) and the CMAQ model (Dennis et al., 1996;hereafter referred to as WRF-CMAQ). This study focuses on the BTH region with 5 x 5-km grid spacing, 32 vertical layers of varying thickness (between the surface and 50 hPa), and an output interval of 1 h. The

Constructing BEC matrix
To construct the BEC matrix for the inversion model, we combined the National Meteorological Center's (NMC) technique (Parrish and Derber, 1992) with the SMOKE model based on uncertainty analysis of 7 the ES inventories. We created the BEC matrix by four steps, as follows: (1)Determine the total errors of ES from a priori bottom-up inventory.
Uncertainty analyses of ES require detailed information of activities and emission factors from a priori MEIC emission inventory. The relevant data collected in the China Environmental Yearbook are limited, and do not satisfy with the requirements of uncertainty analysis. Therefore, we used the available 175 research results relating to SO2 and NOx ES, and conducted uncertainty analysis for four types of major sources (industry, power plants, residents, and transportation) based on activities and emission factors from the references Zheng et al.,2018;Peng et al.,2019) using the AuvTool Software (Frey et al, 2002), and determining the error ranges in total emission rates of SO2 and NOx (Table 1). Uncertainties in SO2 industry and power plant ES are slightly greater than those for NOx, while 180 the opposite is true for emissions from the residential and traffic sectors.
(2)Generate multiple sets of inventories using the random perturbation technique.
Based on the aforementioned error ranges in total emission rates, we generated 30 sets of inventories for SO2 and NOx with the same resolution as MEIC for each month using a random perturbation method (Kerry et al., 2007). Firstly we obtain the probability distribution of errors of ES based on uncertainty 185 analysis for four sections, respectively. Then we conduct thirty times of random perturbation on uncertainties of four sections of ES according to the probability distributions using the same perturbation coefficients for every perturbation. Lastly we calculate thirty total emission rates using random uncertainties of four sections for every sets of inventories, respectively.
(3) Process the 3-D gridded ES as input to the CMAQ model.

190
We used the SMOKE model, national population and road network distribution data in 2016, the temporal apportionment coefficients in the BTH region (Zhang et al., 2007, Simpson et al., 2003, and Wang et al., 2010, and the CB05-ae06-aq chemical species data in the CMAQ model, to process thirty sets of nationwide emission inventories into 3-D gridded ES with a grid spacing of 5×5 km. Each grid has 124×130 points, with 12 vertical levels.
Finally, the NMC method was used to calculate the BEC matrix of the 3-D gridded ES for each month, including horizontal and vertical correlation coefficients and standard deviations. The background error 8 is defined as the difference between thirty sets of 3-D gridded ES generated by the random perturbation method, and the 3-D gridded background ES directly processed from the original MEIC emission 200 inventory with the SMOKE model, at every hour (24 hours diurnal variation of ES for every month).
According to the literature Li et al., 2013;Zang et al., 2016), the approximate calculation of the BEC matrix is as follows:

205
where is the perturbation field and is the background field of a priori ES. Eq. (1) can be written as follows: where D is the standard deviation (SD) matrix and C is the correlation coefficient matrix. With this factorization, D and C can be calculated separately. D is a diagonal matrix whose elements are SD of 210 all state variables in the 3-D grids. C is used to improve the ability of the 3DVAR in representing the impacts of local emissions at one grid on other grids; these impacts vary in the vertical direction, and they are heterogeneous in the horizontal direction.
where Sj is the sensitivity of the pollutant j to the parameter pj, pj is a priori ES of the pollutant j, C is the concentration of the pollutant j, and is the perturbation coefficient of the ES. Theoretically, the DDM-3D method truly captures the sensitivities of pollutant concentrations to ES, and results are more accurate than the brute force method, for the BTH region (Wang et al, 2013

Observation operators
The relationship between pollutant source and the receptor's concentration is established according to Eq. (3). Next, we create the observation operator matrix between ES and receptor concentrations as 10 follows:

255
where H is the observation operator matrix, E are a posteriori ES, which can be written as the product of the perturbation coefficient and a priori ES during the assimilation window time, and E0 are a priori ES. For primary pollutants such as SO2 and NO2, Sj is a first-order sensitivity coefficient, and H is a linear observation operator between the ES and the receptor concentration. For secondary pollutants such as PM2.5 and O3, Sj is a high-order sensitivity coefficient, and H is a nonlinear observation operator.

260
In this study, we use the first-order sensitivity coefficient to calculate H for SO2 and NOx ES.

Observational error covariance
We firstly performed quality control on the observed SO2 and NO2 concentration data. This process involved three steps: (1) Redundant data removal, and matching the density of observation data to the model grid. For 265 some grids with more than one observation station, we used the average of those stations.
(2) Extrema control, i.e., filtering out data exceeding three times of SD of observation data.
(3) Anomaly removal, i.e., data that remained constant for 24 consecutive hours, as well as any negative data, were removed.

270
Data that passed quality control still contained observation or instrument errors. These errors are related to many factors such as instrument type, calibration design, and environmental conditions. In addition, in the variational assimilation process, representation errors caused by the forward-calculation and variational processes must be considered. Higher-resolution models produce smaller representation errors. Representation error, , can be expressed as follows (Pagowski et al., 2010): where is the amplification factor, which is used to adjust the instrument error, is related to the SO2 and NO2 concentrations, and Δ is grid distance of the model. Note that Ls is usually smaller in urban areas and larger in suburban areas. The amplification parameters of the observing stations in cities, suburbs, and rural areas are 2.5 km, 5 km, and 10 km, respectively (Zang et al., 2016). Finally, the 280 total observation error for the SO2 and NO2 concentrations, , is written as:

3DVAR inversion model
We introduce a cost function with respect to the ES in accordance with 3DVAR: 285 where c is the observation variable, R is the observation error matrix, and e is the inversing variable of an a posteriori ES. The optimal inversion of ES for SO2 and NOx are obtained using Eq. (7). The 3DVAR solves for the minimum value of J(e) to determine the inversing variable e. This process typically employs a gradient propagation method, with the increment of an ES defined as follows: 290 Accordingly, the innovation vector of pollutant concentration is defined as: Therefore, Eq. (7) can be written in gradient form: After conditionally processing the cost function, a finite-memory quasi-Newton method was 295 used to conduct iterative minimization. The background field was set as the initial iteration values. The maximum number of steps at the end of the iteration and the minimum gradient for convergence were predetermined. The iteration was finished when one of these conditions was met, and the optimal analysis increment, , was obtained. Finally, the optimal assimilation analysis field of the ES, = + , was obtained. The result was a three-dimensional variational inversion model of the ES, using 300 the uncertainty analysie of the ES and sensitivity coefficients between the ES and receptor's concentrations; the overall framework is shown in Figure 5.

Results and discussion
A typical heavy haze event occurred in the BTH region at the end of December 2016. We applied the 305 3DVAR inversion model to constrain a hourly posteriori ES of SO2 and NO2 using measurements from 45 and 129 stations, respectively, on December 27, 2016. We validated simulations from the control and experiment run using observational data during December 28-30, 2016. concentrations with respect to ES are consistent, therefore the sensitivity coefficients can reasonably reflect the impacts of the ES on concentrations. However, simulated SO2 and NO2 concentrations with a underestimated priori ES are all significantly lower than observations during the heavy haze period.
Thus, it is important to improve a priori ES using the inversion model.

325
In general, SO2 and NO2 concentrations simulated using a posteriori ES are closer to observations than a priori ES, and regional differences in improvements for SO2 and NO2 exist. For SO2, the improvement is noticeable in the BTH region. However, the simulated concentrations in Beijing with a posteriori ES are overestimated. This may be related to greater uncertainties in SO2 sources and the impacts of regional transport from surrounding areas. For NO2, simulated differences with a priori and 330 a posteriori ES are significant in major cities such as Beijing, Tianjin, Shijiazhuang, Baoding, Xingtai, Handan, and Jinan. The simulated concentrations of NO2 using a posteriori ES are more consistent with measurements, while those with a priori ES are significantly underestimated.
We also investigated temporal variations in regionally-averaged SO2, NO2, and O3 concentrations simulated using a priori and a posteriori ES, and observations from the 45 stations over the BTH region 335 during December 28-30, 2016 ( Figure 11). In general, simulated SO2, NO2, and O3 concentrations using a posteriori ES are closer to measurements, while the SO2 and NO2 concentrations simulated by a priori ES are significantly lower than observations, and the modelled O3 concentrations are obviously 13 higher than measurements. In addition, the peak of SO2 simulations with a posteriori ES are close to measurements, but the peak of NO2 and valley of O3 simulations are lower and higher than observations, 340 respectively. This may be related to the absence of inverse modeling of volatile organic compound (VOC) ES and uncertainties of sensitivity coefficients calculation. In this study, we used only first-order sensitivity coefficients, but the relationship between ES of precursors of O3 such as VOCs and NOx, and their receptor's concentrations are nonlinear, and O3 is generated from both NOx and VOCs ES. Therefore, higher-order sensitivity coefficients are necessary for inverse modeling of ES of 345 NOx and VOCs.