The potential for geostationary remote sensing of NO2 to improve weather prediction

Observations of winds in the planetary boundary layer remain sparse, making it challenging to simulate and predict the atmospheric conditions that are most important for describing and predicting urban air quality. Short-lived chemicals are observed as plumes whose location is affected by boundary layer winds and whose lifetime is affected by boundary layer height and mixing. Here we investigate the application of data assimilation of NO2 columns, as will be observed from geostationary orbit, to improve predictions and retrospective analysis of wind fields in the boundary layer.


Introduction
Data assimilation methods are a fundamental tool for numerical weather prediction (NWP), with observations of temperature, pressure, winds, humidity, etc. used as constraints on initial conditions and time evolution of atmospheric energy and winds (e.g., Bauer et al., 2015). With the exception of water vapor and ozone, observations of atmospheric constituents are generally not used in current NWP, although the field is shifting focus to include these data (Xian et al., 2019). Importantly, the tools of data assimilation are increasingly the focus of a variety of off-line chemical transport models (CTMs) that aim to improve regional air quality forecasts and to enhance the understanding of emissions of gases and aerosol into the atmosphere (e.g., Lahoz et al., 2007; Bocquet et al., 2015; Miyazaki et al., 2014, 2017, 2020), and there is growing interest in on-line assimilation of other chemicals and aerosol (e.g., Gelaro et al., 2017; Inness et al., 2013; Baklanov et al., 2014; Dee et al., 2014; Flemming et al., 2015; Inness et al., 2015, 2019). Meteorology and chemical constituents are not independent. Coupled chemistry-meteorology models such as WRF-Chem include explicit feedback between chemical constituents and meteorological parameters (Grell and Baklanov, 2011). This development in numerical modeling offers the opportunity to study the interaction and feedback between atmospheric physics, dynamics and composition such as the impact of air constituents on incoming radiation, the modification of weather (cloud formation and precipitation) by natural and anthropogenic aerosol, and the impact of climate change on the frequency and strength of events with poor air quality (e.g., Fiore et al., 2012). In parallel with this advance in modeling capability, observations of gases and aerosols from space-based instruments are providing an unprecedented view of constituents from the surface to the mesosphere.
Space observations of column NO2 have been applied in the verification of point-source emissions (e.g., Beirle et al., 2011; Russell et al., 2012), quantification of uncertain sources (such as biogenic and soil emissions) (e.g., Lin, 2012), and detection and characterization of episodic events, such as wildfire and lightning (e.g., Mebust et al., 2011; Miyazaki et al., 2014; Zhu et al., 2019).
The combination of these two advances sets the stage for the joint assimilation of both meteorology and chemistry in which chemical observations can improve the representation of dynamical motions in the atmosphere. In concept, it is easy to see the potential benefit of assimilating composition observations. For example, modeled winds might be transporting material to the southwest while an observed plume is moving to the west. In this example, the chemical observations would cause the assimilated model to alter the wind direction, thus aligning the predicted meteorology with the observed flow of chemicals. This is just one example. Chemical observations are also sensitive to wind speed (Valin et al., 2013; Laughner et al., 2016) and planetary boundary layer (PBL) dynamics. Examples of the beneficial information flow across the two subsystems include the improvement in cloud distributions after assimilating aerosols (Saide et al., 2012) and the potential for improvement in temperature, wind and cyclone development during dust storms via assimilation of aerosol optical depth (AOD) (Reale et al., 2011, 2014). Improvement in stratospheric winds by assimilating chemical tracers has also been demonstrated (Peuch et al., 2000; Semane et al., 2009; Milewski and Bourqui, 2011; Allen et al., 2013; Chu et al., 2013). Examples of joint chemistry-meteorology assimilation in simpler models include studies by Allen et al. (2014, 2015), Haussaire and Bocquet (2016), Emili et al. (2016), Ménard et al. (2019), and Tondeur et al. (2020). Among the challenges that must be addressed as we begin to understand the potential benefits of joint assimilation of physical state variables and composition is identifying the aspects of the two linked subsystems (meteorology and chemistry) that can be most efficiently improved by linking them to observed chemical fields.
Aerosols, CO and CO2 have been the focus of most prior chemical assimilation (Saide et al., 2014; Mizzi et al., 2016). In our first analysis of NO2 assimilation we examined the potential for assimilation of high-spatial-resolution (∼ 3 km) and high-temporal-resolution (hourly) NO2 columns, as will be provided by future geostationary observations, to improve the representation of NOx emissions. NOx has a lifetime of approximately 5 h within the boundary layer and thus exhibits variation in concentrations at spatial scales on the order of 50-75 km downwind of emissions. Those fine temporal and spatial scales make NOx variations more strongly coupled to short-timescale meteorological parameters than other, longer-lived chemical tracers such as aerosol or CO. In our initial research, we focused on the retrieval of the NOx emissions. We found that using the column NO2 to constrain emissions accurately required simultaneous meteorology and chemical assimilation. The strongest constraints were found in regions with high emissions and using hourly assimilation of meteorological observations. Our assimilation anticipates the launch of a geostationary satellite for column NO2 observations, Tropospheric Emissions: Monitoring of Pollution (TEMPO), scheduled for launch in 2022. Related instruments include the Korean GEMS instrument launched in early 2020 and the ESA Sentinel 4 instrument to be launched in the near future. The TEMPO observations will have two features that will make them a significant advance compared to current instruments in low earth orbit. First, the instrument will make measurements with hourly repeats during the sunlit portion of the day. Second, the instrument will have approximately 3 × 3 km pixels, a substantial increase in spatial resolution compared to the Ozone Monitoring Instrument (OMI) and an improvement over the TROPOMI instrument (Zoogman et al., 2017).
That spatial resolution is also sufficient to quantify gradients in NO2 that result from the combined effects of emissions, chemistry and transport.
Here we focus on winds. We expect the influence of NO2 column assimilation on wind fields to be at the spatial scale of 75 km set by the NO2 chemical lifetime and the average wind speed (e.g., Laughner and Cohen, 2019). We begin by describing the data assimilation tools and a simulator for future geostationary satellite observations of NO2 columns (Sect. 2). In Sect. 3, we describe assimilation experiments that provide insight into the constraints that the column NO2 observations will have on winds. In Sect. 4, we discuss the improvements to the accuracy of the modeled winds and assess the potential benefits of this approach to data assimilation. We conclude in Sect. 5.
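As a rough check on the length scales quoted above, the distance a plume travels during one chemical lifetime is approximately the wind speed multiplied by the lifetime. The following is a minimal sketch assuming simple advection at a constant wind speed; the function name and the sample wind speeds are illustrative and are not taken from this study.

```python
def plume_length_km(wind_speed_m_s: float, lifetime_h: float) -> float:
    """Distance (km) an air parcel travels during one chemical lifetime."""
    seconds = lifetime_h * 3600.0
    return wind_speed_m_s * seconds / 1000.0


LIFETIME_H = 5.0  # approximate boundary-layer NOx lifetime quoted in the text

# Illustrative wind speeds (assumed values, not results from the paper).
for speed in (2.0, 3.0, 4.0, 5.0):
    print(f"{speed:.1f} m/s -> {plume_length_km(speed, LIFETIME_H):.0f} km")
```

Under this simplification, a 5 h lifetime and winds of roughly 3 to 4 m/s give distances within the 50-75 km range mentioned in the text.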

Methodology
The data assimilation system comprises the forecast model WRF-Chem and the Data Assimilation Research Testbed (DART) as described in Mizzi et al. (2016) and Liu et al. (2017). The WRF-Chem/DART setup, TEMPO simulator and meteorological observations are described in more detail in Liu et al. (2017). Here we briefly describe the updated data assimilation system that allows NO2 observations to influence winds.

WRF-Chem model
We use WRF-Chem version 3.7 with a one-way nested domain (Fig. 1). The outer domain of 12 km resolution covers the western United States, and the inner domain of 3 km resolution covers Denver and the mountain region to its west. On the outer domain the initial and boundary conditions are driven by weather reanalysis data (the North American Mesoscale Forecast System, NAM, or the North American Regional Reanalysis, NARR) for meteorology and by the Model for OZone And Related chemical Tracers (MOZART) for chemistry (Emmons et al., 2010). After a 1-month simulation on the outer domain, the inner domain is initialized with the initial and boundary conditions taken from the outer domain simulations.
The anthropogenic emissions are taken from the National Emission Inventory (NEI) 2011, which describes the hourly emissions for a typical summertime weekday. Biogenic emissions are parameterized using Model of Emissions of Gases and Aerosols from Nature (MEGAN) (Guenther et al., 2006). Gas-phase reactions are simulated using the regional acid deposition model version 2 (RADM2).

DART assimilation system
WRF-Chem-DART is a regional multivariate data assimilation system developed by the National Center for Atmospheric Research (NCAR) to analyze meteorological and chemical states simultaneously (Anderson and Collins, 2007; Anderson et al., 2009). In this study we use the DART toolkit configured as the ensemble adjustment Kalman filter (EAKF) (Anderson, 2001). We apply adaptive spatially and temporally varying inflation to the prior state to maintain the ensemble spread (Anderson et al., 2009). We use horizontal localization to reduce the influence from spurious correlations (Anderson, 2012). A Gaspari and Cohn (1999) weighting function is applied with weights diminishing to zero 20 km away from the observation location. As in Liu et al. (2017), sensitivity tests show that NO2 data assimilation with a 1 h assimilation window performs best using the weighting function with a width of 20 km. The analyzed chemical states are NO2 concentrations. The analyzed meteorological states include winds (U, V, W), temperature (T), cloud and cloud water properties (QVAPOR, QCLOUD, QRAIN, QICE, QSNOW), and other variables as described in Table 2 of Romine et al. (2013). The analysis is updated using DART from continuously cycled 1 h, 30-member ensemble WRF-Chem forecasts. The DART configuration details are provided in .
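The localization weight described above can be illustrated with the fifth-order piecewise rational function of Gaspari and Cohn (1999). This is a minimal sketch, not the DART implementation: the function name and the `cutoff_km` parameter are hypothetical, and the 20 km value is interpreted here as the distance at which the weight reaches zero.

```python
import numpy as np


def gaspari_cohn(distance_km, cutoff_km=20.0):
    """Gaspari-Cohn (1999) compactly supported localization weight.

    Returns 1 at zero distance and 0 at and beyond cutoff_km.
    cutoff_km is an assumed interpretation of the 20 km width in the text.
    """
    c = cutoff_km / 2.0  # half-width; the function reaches zero at 2c
    r = np.abs(np.atleast_1d(np.asarray(distance_km, dtype=float))) / c
    w = np.zeros_like(r)

    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)

    rn = r[near]
    w[near] = (-0.25 * rn**5 + 0.5 * rn**4 + 0.625 * rn**3
               - (5.0 / 3.0) * rn**2 + 1.0)

    rf = r[far]
    w[far] = (rf**5 / 12.0 - 0.5 * rf**4 + 0.625 * rf**3
              + (5.0 / 3.0) * rf**2 - 5.0 * rf + 4.0 - 2.0 / (3.0 * rf))
    return w


# Weights at a few sample distances (km) from an observation.
print(gaspari_cohn([0.0, 5.0, 10.0, 15.0, 20.0]))
```

In an ensemble filter, each state increment produced by an observation is multiplied by this weight, so grid points beyond the cutoff are not updated by that observation.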
Previous studies that assimilate chemistry and meteorology simultaneously apply the variable localization approach (Arellano et al., 2007; Kang et al., 2011; Liu et al., 2017), which zeroes out the covariance between chemistry and some of the meteorology variables without taking advantage of the information related to meteorology carried by the chemical tracers. In this study, we partially turn off the variable localization and allow the assimilated NO2 observations to influence horizontal wind (U and V). With this setup, the advection scheme in the WRF-Chem model predicts downwind NO2 evolution based on the wind fields. The EAKF computes the covariances between the predicted NO2 distribution and wind variables. These sensitivities are utilized to refine the model state toward one that best fits the NO2 observations considering the confidence in both the observations and model prediction.
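To make the cross-variable update concrete, the sketch below applies the two-step scalar form of the EAKF (an observation-space adjustment followed by linear regression onto an unobserved state variable) to a toy problem. It is an illustration under stated assumptions, not the WRF-Chem/DART code: the function name, the five-member ensemble and the linear relation between wind and NO2 column are all hypothetical.

```python
import numpy as np


def eakf_update(state_ens, obs_ens, obs_value, obs_err_var, loc_weight=1.0):
    """Update a state ensemble (e.g., U wind) from one scalar observation
    (e.g., an NO2 column) with the two-step EAKF approach."""
    state_ens = np.asarray(state_ens, dtype=float)
    obs_ens = np.asarray(obs_ens, dtype=float)
    prior_mean = obs_ens.mean()
    prior_var = obs_ens.var(ddof=1)

    # Step 1: update the observed quantity (product of two Gaussians),
    # then shift and contract the ensemble to match the posterior moments.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_err_var)
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_err_var)
    obs_post = np.sqrt(post_var / prior_var) * (obs_ens - prior_mean) + post_mean
    obs_incr = obs_post - obs_ens

    # Step 2: regress the observation increments onto the state variable
    # using the ensemble covariance, scaled by the localization weight.
    cov = np.cov(state_ens, obs_ens, ddof=1)[0, 1]
    gain = loc_weight * cov / prior_var
    return state_ens + gain * obs_incr, obs_post


# Toy ensemble: stronger wind gives a smaller column (assumed relation).
u_prior = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
no2_prior = 10.0 - 2.0 * u_prior
u_post, no2_post = eakf_update(u_prior, no2_prior, obs_value=2.0, obs_err_var=1.0)
print("mean U prior:", u_prior.mean(), "posterior:", u_post.mean())
```

In this toy case the observed column is lower than the prior mean column, so the negative wind-column covariance shifts the ensemble mean wind upward. Setting `loc_weight` to zero reproduces variable localization, in which the NO2 observation leaves the wind unchanged.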

Initial and boundary condition ensembles
We add random perturbations to the temperature field of a single initial state to produce an ensemble of perturbed meteorological initial conditions. The perturbations were generated by sampling the Global Forecast System (GFS) background error covariance using the WRF data assimilation system (WRFDA) (Barker et al., 2012). (For those trying to repeat exactly, we used cv_option = 3.) The statistics are estimated with the differences of 24 and 48 h GFS forecasts with T170 (∼ 75 km) resolution, valid at the same time for 357 cases, distributed over a period of 1 year. The ensemble member lateral boundary condition perturbations are generated based on random variations within the initial ensemble (using the DART pert_wrf_bc program). Updating the boundary conditions so that the analysis time matches the analysis states from DART requires care in labeling (the DART update_wrf_bc program is used).

Synthetic observations
To generate synthetic TEMPO NO2 retrievals, we use the TEMPO NO2 simulator developed in Liu et al. (2017) as the observation operator to compute the observed column from a model prediction. It includes a layer-dependent box air mass factor (BAMF) for each observation pixel. BAMFs are atmospheric scattering weights that depend on parameters including viewing geometry, surface (terrain or cloud) pressure and surface reflectivity. The parameters used to compute BAMFs are sampled from a model run with hourly frequency and clear sky conditions. Details for the TEMPO simulator and observation error generation are described in Liu et al. (2017). Note that we developed this simulator prior to the TEMPO science team creating its own product.
We perform observing system simulation experiments (OSSEs) to analyze the wind constraints from synthetic NO2 observations. We initialized the WRF-Chem nature run (NR) on a 12 km resolution domain (d01) on 1 June 2014 at 00:00 UTC. The meteorological initial and boundary conditions are taken from the NAM, and the chemistry simulation is constrained by MOZART output (see https://www.acom.ucar.edu/wrf-chem/mozart.shtml to download MOZART data, last access: 1 July 2020). After a 1-month simulation on d01, the NR on the 3 km domain (d02) is initialized from the d01 model simulation on 2 July 2014 at 15:00 UTC. Its meteorological and chemical boundary conditions are provided by NAM reanalysis data and the d01 simulation, respectively. We have a parallel model simulation labeled control run (CR), which is performed in the same way as the NR except that its meteorological simulation is initialized and constrained by a different forecast model, the NARR. Constrained by different reanalysis data, the NR- and CR-simulated winds in the boundary layer differ and thus lead to discrepancies in the NO2 transport processes. We perform data assimilation on d02 from 3 July 2014, 13:00 UTC, to 6 July 2014, 00:00 UTC, with an hourly assimilation window.
This timing allows for analyses of three complete daytime cycles. In our OSSE, the NR simulations are considered as the "true atmosphere" from which synthetic NO2 observations are generated using the TEMPO simulator. After a 1 h forecast the prior ensemble is combined with synthetic NO2 observations to produce the posterior ensemble. The difference in wind and NO2 simulations between the NR and the ensemble mean results from the utilization of two different sets of reanalysis data as meteorological constraints and from the assimilation, while we apply the same forecast model, emission input, and model physics and chemistry scheme. The posterior ensemble will be used as the initial conditions to forecast the next hour. We evaluate the data assimilation performance by comparing the mean of the posterior estimate with the NR simulations.
We designed a series of six experiments to evaluate the potential of geostationary observations of column NO2 to improve wind fields. First, we conduct a free model run (FREE) with 30 ensemble members derived from the CR without data assimilation. This will set the baseline performance and will be compared with cases that assimilate observations to evaluate the benefit of data assimilation to improve the winds. In the second experiment (CHEM), we assimilate synthetic TEMPO NO2 observations over the 12 h daytime to constrain the winds in the ensemble. By comparing with FREE, we can evaluate the improvement in wind simulations as a result of assimilating NO2 observations. In experiment (T, RH), we assimilate hourly observations of temperature and humidity, which can indirectly update winds via the covariances of temperature and humidity against wind states. In experiment (T, RH, CHEM), we assimilate synthetic TEMPO NO2 observations together with temperature and humidity observations. In this case, wind analyses are constrained by the multiple indirect observations via covariances with temperature, humidity and NO2. In the experiment (MET), we assimilate all meteorological observations including wind, temperature and humidity. This is representative of the current weather observing systems' representation of boundary layer winds. Finally, we assimilate synthetic TEMPO NO2 observations in addition to the meteorological observations in the experiment (MET, CHEM) to assess the influence of NO2 observations on winds under the circumstances of a full meteorology assimilation.
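The six experiments can be summarized as a mapping from run name to the observation types assimilated. The sketch below is only a bookkeeping aid: the dictionary, the short labels ("T", "RH", "WIND", "NO2") and the helper function are hypothetical, and MET is reduced here to the three observation types named in the text.

```python
# Observation types assimilated in each run, as described in the text.
EXPERIMENTS = {
    "FREE": frozenset(),
    "CHEM": frozenset({"NO2"}),
    "(T, RH)": frozenset({"T", "RH"}),
    "(T, RH, CHEM)": frozenset({"T", "RH", "NO2"}),
    "MET": frozenset({"T", "RH", "WIND"}),
    "(MET, CHEM)": frozenset({"T", "RH", "WIND", "NO2"}),
}


def no2_pairs(experiments):
    """Pair each run without NO2 with the run that adds only NO2 columns."""
    pairs = []
    for name, obs in experiments.items():
        if "NO2" in obs:
            continue
        for other, other_obs in experiments.items():
            if other_obs == obs | {"NO2"}:
                pairs.append((name, other))
    return pairs


for baseline, with_no2 in no2_pairs(EXPERIMENTS):
    print(f"{baseline} vs {with_no2}")
```

Each pair isolates the effect of the synthetic TEMPO NO2 columns against a different meteorological baseline.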

Results and discussion
We compare the assimilation results with the NR states to evaluate the assimilation performance. The RMSE of the observed quantities is calculated as
$$\mathrm{RMSE} = \sqrt{\frac{1}{l}\sum_{i=1}^{l}\left(x_i^{m} - x_i^{t}\right)^{2}},$$
where $x_i^{m}$ and $x_i^{t}$ are the model and true values at the ith model grid point, respectively, and $l$ is the total number of grid points of interest. For the analyzed wind variables, the grid points of interest are all the points located within a model subdomain as shown in Fig. 2, containing the lowest five model levels vertically (∼ 250 m). We find that the horizontal transport of urban NO2 is most sensitive to the winds in the lowest five model levels, and the top of the shallow boundary layer in the morning is as low as the fifth model level. The likely scenario that power plant stacks result in emissions outside this vertical window was not explored in this study. We also analyze the uncertainty (spread) of the prior and posterior estimates. The uncertainty is calculated as the 1σ standard deviation of the ensemble.
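The two diagnostics described here, the RMSE of the ensemble mean against the nature run and the ensemble spread, can be sketched as follows. The function names and the small sample arrays are illustrative. How the per-grid-point spread is aggregated over the subdomain is not stated in the text, so the root-mean ensemble variance used below is an assumption.

```python
import numpy as np


def rmse(model, truth):
    """Root-mean-square error over all grid points of interest."""
    model = np.asarray(model, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((model - truth) ** 2)))


def ensemble_spread(ensemble):
    """1-sigma ensemble spread for an array of shape (n_members, n_points).

    Aggregation over grid points (root of the mean ensemble variance)
    is an assumed choice, not taken from the paper.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    variance = ensemble.var(axis=0, ddof=1)  # variance across members
    return float(np.sqrt(variance.mean()))


# Illustrative values only: 3 members, 4 grid points of U wind (m/s).
truth = np.array([2.0, 3.0, 4.0, 5.0])
ens = np.array([[1.5, 2.5, 4.5, 5.5],
                [2.5, 3.5, 3.5, 4.5],
                [2.0, 3.0, 4.0, 6.5]])
print("RMSE of ensemble mean:", rmse(ens.mean(axis=0), truth))
print("ensemble spread:", ensemble_spread(ens))
```

A well-calibrated ensemble has a spread comparable in magnitude to the RMSE of its mean, which is the comparison made for the prior NO2 columns in the following section.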

NO2 assimilation
The performance of ensemble-based assimilation is determined by the representation of the ensemble uncertainty. In OSSEs we test how well the ensemble system represents the uncertainty by comparing the ensemble spread with the RMSE computed with respect to the true observations. Figure 3 shows the temporal evolution of the RMSE and the spread for synthetic TEMPO NO2 column observations in FREE and the three experiments with synthetic TEMPO observations assimilated. We find that in all experiments the variation in the prior ensemble spread follows the fluctuations of the prior RMSE with a similar magnitude after the first day of assimilation. This indicates that the ensemble system develops a good amount of spread for NO2 states and wind states as well because the NO2 spread results from the wind differences among ensemble members.
For all the experiments assimilating synthetic TEMPO NO2 observations, the diurnal variation in the prior RMSE and spread is related to the NO2 column variation with the peaks in the morning and evening rush hours and local minima in the early afternoon. The errors in the comparison to the synthetic TEMPO NO2 columns are reduced by 78 % on average from the prior to the posterior estimates. The temporal average of the posterior RMSE varies from 2.6 to 2.9 × 10^14 molec. cm^-2, which is very similar to the NO2 assimilation results in our previous experiment ENS.1 as shown in Fig. 4 of Liu et al. (2017). Experiment CHEM shows lower prior RMSE of TEMPO NO2 than FREE for two reasons. First, assimilation of TEMPO in CHEM reduces the errors in the posterior NO2 of the last cycle, which results in better forecasts of prior NO2. Second, assimilation of NO2 improves the wind forecast in models (as shown in Sect. 4.2) and thus reduces the NO2 transport errors. This demonstrates that in places without wind observations, assimilating synthetic TEMPO NO2 observations can reduce the errors in the NO2 forecast by allowing NO2 observations to improve wind simulations in models.

Using synthetic TEMPO NO2 observations to constrain the winds
Errors of the winds in models affect the horizontal advection of NO2 and result in differences between observed and modeled NO2 vertical column density that can be used to correct the winds. In this ensemble assimilation system, we examine the impact of assimilating synthetic TEMPO NO2 observations on the winds in the boundary layer when different sets of meteorological observations are assimilated. Figure 4 shows the hourly evolution of the posterior RMSE of wind state variable U for all six experiments. Results for V are similar. We exclude the first daytime point in our analysis because it takes time for the assimilation system to equilibrate. Without any constraint on winds, FREE shows varying wind RMSE with higher values at night than during the daytime. With the assimilation of TEMPO only, CHEM shows error reduction in the posterior wind analysis in each daytime cycle (Fig. 4a). Table 1 compares the temporal average of the posterior wind RMSE for the six runs during daytime. The daytime average posterior RMSE is reduced by 0.44 m s^-1 (15.70 %) and 0.41 m s^-1 (15.45 %) for U and V wind, respectively, from FREE to CHEM. We find that the reduction in wind RMSE resulting from daytime assimilation disappears after the first night cycle (Fig. 4). This is because the daytime error reduction is only observed in regions with abundant NO2 concentrations; wind errors in regions with little NO2 remain high during the day and quickly propagate into the regions with high daytime NO2 during the night once there is no longer any NO2 assimilation to constrain the error. As a result, the nighttime average RMSE of CHEM is very close to that of FREE, independent of the improvement of wind simulations from daytime. In the transition from night to daytime, the influence of assimilating NO2 observations on winds begins with the first daytime cycle. This demonstrates that the covariance of wind and NO2 develops and remains during the night.
Figure 2a and b show the difference in U wind between the CHEM run and the truth at 13:00 MST on 4 July before and after assimilation. The incremental change in U wind after assimilation is plotted in Fig. 2c. The difference between the truth and the prior NO2 column amounts viewed by the TEMPO simulator is also shown in Fig. 2d. Because the U wind is underestimated in the prior, the modeled NO2 plume in the prior is more concentrated at the source and more dispersed to the east than in truth. After the assimilation of the synthetic TEMPO NO2 columns, we observe that the wind increases at the top and middle of the domain, where it was most underestimated prior to assimilation (Fig. 4). Averaged over the domain, the U wind RMSE is reduced from 2.32 to 1.56 m s^-1 from the prior to the posterior.
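The error reductions quoted in this section can be put on a common footing with simple arithmetic. The sketch below only rearranges numbers stated in the text; the function names are illustrative, and the implied FREE baselines are back-of-envelope estimates subject to rounding, not values read from Table 1.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction in RMSE from `before` to `after`."""
    return (before - after) / before


def implied_baseline(abs_reduction: float, pct_reduction: float) -> float:
    """Baseline RMSE implied by an absolute and a percentage reduction."""
    return abs_reduction / (pct_reduction / 100.0)


# Single-cycle example at 13:00 MST on 4 July (prior -> posterior, m/s).
single_cycle = relative_reduction(2.32, 1.56)
print(f"single-cycle U RMSE reduction: {100.0 * single_cycle:.1f} %")

# Daytime averages, FREE -> CHEM (absolute reduction in m/s, percent).
free_u = implied_baseline(0.44, 15.70)
free_v = implied_baseline(0.41, 15.45)
print(f"implied FREE daytime RMSE: U ~ {free_u:.2f} m/s, V ~ {free_v:.2f} m/s")
```

The single-cycle reduction at one afternoon hour is roughly twice the daytime-average reduction, which is consistent with the hour-to-hour variability described for Fig. 4.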
In the next experiment we assimilate observations of temperature and humidity (hereafter T and RH, respectively) in the (T, RH) run to adjust the wind variables. As shown in Table 1, (T, RH) shows 13.91 % and 15.10 % error reduction in posterior U and V winds during daytime compared to the unconstrained run FREE. These are improvements to winds from assimilating temperature and humidity observations using the covariances between meteorological variables. In addition, we find the averaged daytime posterior wind RMSE of (T, RH) is very close to that of the CHEM run. This demonstrates that TEMPO NO2 columns, as indirect chemical observations of winds, can be used to constrain winds as well as temperature and humidity observations, which are also indirect observations of winds. However, the temporal variations in the daytime posterior wind RMSE between the two runs are different (Fig. 4b). At the beginning of the daytime cycles, the (T, RH) run shows lower posterior wind RMSE than CHEM as temperature and humidity observations are assimilated during the night, resulting in lower nighttime wind errors, whereas no nighttime NO2 TEMPO observations are available to be assimilated. In the later daytime cycles, the posterior wind RMSE in CHEM becomes lower than that in (T, RH) due to the assimilation of TEMPO NO2. When we assimilate TEMPO NO2 together with temperature and humidity observations in (T, RH, CHEM), we find further error reductions in posterior wind during the third day compared with (T, RH) (Fig. 4b). This is because (T, RH) shows no error reductions in posterior winds in the afternoon of the third day, while the assimilation of TEMPO NO2 alone can successfully reduce wind errors (Fig. 4a). There are only minor differences between the (T, RH) and (T, RH, CHEM) runs during the second daytime.
This is because assimilating temperature and humidity observations alone has reduced the wind errors to the extent that assimilation of additional NO2 observations cannot provide further improvements. Furthermore, Fig. 5 shows the wind speed in the afternoon is mostly between 2 and 4 m s^-1 on the second day (4 July) and between 4 and 6 m s^-1 on the third day (5 July). When the wind is stagnant, we do not expect strong covariances between winds and NO2 because the horizontal transport of NO2 due to wind is not strong. When wind speed is higher on the third day, it increases the ensemble covariances between wind and NO2 to achieve further improvement on wind.
The MET experiment has the lowest RMSE in the prior estimates of NO2 because it has the lowest wind errors, and thus NO2 transport errors, as a result of the assimilation of direct wind observations (Figs. 4c and 5). Nevertheless, even in this run there is a small benefit to assimilating NO2 columns as can be seen in the reduced RMSE of the wind on the third day.

Conclusions
The assimilation of column NO2 is explored as a constraint on boundary layer winds. Compared with assimilations of temperature and humidity, assimilations of column NO2 are as effective as a constraint on winds during the daytime. Column NO2, which is only available in sunlight, is less effective than T and RH in the morning but more effective in the afternoon. In addition, we find that assimilating column NO2 as will be provided by the TEMPO satellite instrument does not degrade the results of assimilating temperature and humidity observations to constrain winds, and it improves on wind reanalysis, especially when wind speeds are above 4 m s^-1. Including all available data (T, RH, winds and column NO2) makes it more difficult to discern the improvement from the NO2 column assimilation. Nevertheless, we observe improvements in wind reanalysis even under these circumstances (Table 1). This initial experiment covers a small domain surrounding the city of Denver and only a few days. With this initial study suggesting the method has promise, a larger-scale experiment should now be evaluated. We hope this study will inspire such research.
Figure 5. Winds for successive hours from 18:00 to 23:00 UTC on the second (4 July, top row) and third (5 July, bottom row) days of the assimilation.
Code availability. The data used in this paper and the associated software packages are available via open GitHub repositories at the following links, with descriptions (last access: 20 June 2020): https://github.com/NCAR/DART (Anderson and Collins, 2007; the ensemble Kalman filter solver and the WRF-Chem/DART interface code), https://github.com/NCAR?q=wrf&type=&language=&sort= (Skamarock et al., 2008; WRF model, as well as access to utilities used by WRF-Chem-DART, and ensemble spread for meteorology), and https://github.com/NCAR/WPS (Skamarock et al., 2008; the WRF pre-processor for taking large-scale meteorology fields and putting them into WRF input format).
Data availability. Data for this analysis have not been archived as their size is too large, but they can be recreated with the software described above.
Author contributions. XL and RCC conceived the project, and XL developed and executed the numerical experiments. APM, JLA, IF and RCC contributed to the design of the experiments and interpretation of the results. XL and RCC prepared the manuscript with contributions from all co-authors.