Impacts of global NO x inversions on NO 2 and ozone simulations

Tropospheric NO 2 and ozone simulations have large uncertainties, but their biases, seasonality, and trends can be improved with NO 2 assimilations. We perform global top-down estimates of monthly NO x emissions using two Ozone Monitoring Instrument (OMI) NO 2 retrievals (NASAv3 and DOMINOv2) from 2005 to 2016 through a hybrid 4D-Var/mass balance inversion. Discrepancy in NO 2 retrieval products is a major source of uncertainties in the top-down NO x emission estimates. The different vertical sensitivities in the two NO 2 retrievals affect both magnitude and seasonal variations of top-down NO x emissions. The 12-year averages of regional NO x budgets from the NASA posterior emissions are 37 % to 53 % smaller than the DOMINO posterior emissions. Consequently, the DOMINO posterior surface NO 2 simulations greatly reduced the negative biases in China (by 15 %) and the US (by 22 %) compared to surface NO 2 measurements. Posterior NO x emissions


Introduction
Tropospheric ozone is a harmful secondary air pollutant affecting human health, sensitive vegetation, and ecosystems (NRC, 1991; Monks et al., 2015). Long-term ozone (O 3 ) exposure is estimated to cause 1.04-1.23 million respiratory deaths in adults (Malley et al., 2017). Short-term exposure to high ambient ozone concentrations is associated with respiratory and cardiovascular mortality (Turner et al., 2016; Fleming et al., 2018). Accurate simulations of ozone in highly polluted regions are important for better pollution forecasts and more effective emission regulations. Tropospheric ozone is formed through photochemical reactions between nitrogen oxides (NO x = NO + NO 2 ), carbon monoxide (CO), methane (CH 4 ), and volatile organic compounds (VOCs) in the presence of sunlight (Crutzen, 1973; Derwent et al., 1996). These precursor gases are mainly emitted from fossil-fuel combustion, biomass burning, oil and gas production, industry, agriculture, and biogenic activities. Tropospheric ozone can also be transported from the stratosphere through stratosphere-troposphere exchange (Stohl et al., 2003; Hsu and Prather, 2009; Stevenson et al., 2006; Lu et al., 2019), but this source is smaller than chemical production by a factor of 5-7 (Young et al., 2013). Ozone is removed from the troposphere through deposition (Fowler et al., 2009), photodissociation, and reactions with HO 2 , NO 2 , unsaturated VOCs, halogens, and aerosols (Crutzen, 1973).
From 1850 to 2000, the global mean tropospheric ozone burden increased by 29 % (Young et al., 2013). Human activities are major sources of ozone precursor gases, contributing to a 9 % (24.98 Tg) increase of the global tropospheric ozone burden from 1980 to 2010. Ozone formation and trends depend nonlinearly on the local relative abundances of NO x and VOCs and the radiative regime in which these occur. Previous studies have shown that changes in surface ozone are dominated by regional emission trends of precursor gases. At the global scale, 77 % of NO x emissions are from anthropogenic sources, according to the HTAP 2010 inventory (Janssens-Maenhout, 2015). Anthropogenic NO x emissions have been decreasing in North America and Europe due to transportation and energy transformations (Simon et al., 2015) but were increasing in China up until 2011 according to bottom-up emission inventories (Liu et al., 2016; Hoesly et al., 2018). Top-down NO x emission estimates using satellite observations from the Ozone Monitoring Instrument (OMI) showed a similar turning point in China (Miyazaki et al., 2017; Qu et al., 2017), but there was a slowdown in reductions in the US compared to bottom-up estimates (Miyazaki et al., 2017; Jiang et al., 2018). However, in India and the Middle East, where ozone production is more efficient than in higher-latitude regions, NO 2 column densities from OMI are continuing to increase.
Top-down methods can update emissions in a more timely fashion than bottom-up approaches; still, top-down estimates can differ substantially from one another and carry large uncertainties. For instance, the magnitudes of tropospheric NO 2 column densities from two global retrievals from the National Aeronautics and Space Administration (NASA) and the Royal Netherlands Meteorological Institute (KNMI) differ by 50 % and have different trends at the regional scale (Zheng et al., 2014; Canty et al., 2015; Qu et al., 2017). These differences in column densities can propagate to differences in top-down NO x emission estimates (e.g., Miyazaki et al., 2017; Qu et al., 2017). In this study, we assess the importance of these discrepancies in NO x emissions for the simulation of ozone. We derive global top-down NO x emissions from 2005 to 2016 using two widely used products (OMNO 2 v3 and Dutch OMI NO 2 (DOMINO) v2) based on the same inversion process for consistent evaluations (Sect. 3). We also evaluate a new OMI NO 2 retrieval product, the Quality Assurance for the Essential Climate Variables (QA4ECV), and apply it to derive monthly NO x emissions in 2010. We do not repeat our entire set of ozone evaluations with this product given that its magnitude and seasonality do not significantly differ from the other two products. We further explore the impact of adjusting NO x emissions on ozone simulations by evaluating the ozone simulations produced from bottom-up and top-down NO x emissions against global surface measurements from the Tropospheric Ozone Assessment Report (TOAR) database and the China National Environmental Monitoring Center (CNEMC) network.
In addition to local sources, the lifetime of ozone (∼ 22 d on global average) is long enough for intercontinental transport (UNECE, 2010). Consequently, every country is an exporter as well as an importer of ozone pollution. Transport from East Asia can be an important contributor to ozone exceedances in the western US (Goldstein et al., 2004; Zhang et al., 2009, 2014; Fiore et al., 2014; Verstraeten et al., 2015; Lin et al., 2017; Jaffe et al., 2018). The influence of intercontinental ozone transport is strongest in spring and summer, when background ozone concentrations reach 50 ppbv at the west coast of the US. The impact of background ozone is increasingly important and challenging due to the decreased local sources of precursor gases in the US (Hoesly et al., 2018) and the recent stricter ozone standard in the US, under which the limit on the annual fourth-highest maximum daily 8 h average ozone concentration was lowered from 75 to 70 ppbv in 2015. Optimization of NO x emissions in the upwind regions can improve remote ozone simulations in downwind regions after transport of intercontinental pollution plumes from the free troposphere to the surface (Zhang et al., 2008; Verstraeten et al., 2015). Therefore, we also evaluate the model simulations of remote ozone at the west coast of the United States using bottom-up and top-down NO x emissions in Sect. 4.

GEOS-Chem and its adjoint model
The GEOS-Chem adjoint model (Henze et al., 2007) v35k is used to derive global NO x emission estimates at 2° × 2.5° resolution. It was developed for inverse modeling of aerosol and gas emissions using the 4D-Var method by Henze et al. (2007, 2009) based on version 8 of GEOS-Chem, with bug fixes and updates up to version 10. Simulations in this study are driven by Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) meteorological fields from the NASA Global Modeling and Assimilation Office (GMAO). Anthropogenic emissions of NO x , SO 2 , NH 3 , CO, NMVOCs (non-methane volatile organic compounds), and primary aerosol from the HTAP 2010 inventory version 2 (Janssens-Maenhout et al., 2015) are used to drive all prior simulations from 2005 to 2017. The diurnal variation of NO x emissions is derived from EDGAR hourly variations (http://wiki.seas.harvard.edu/geos-chem/index.php/Scale_factors_for_anthropogenic_emissions3Diurnal_Variation) and is not optimized in the inversion. The use of non-anthropogenic emissions and other setups follows Qu et al. (2017, 2019a). In the following analyses, we refer to this model as "GC-adj." GC-adj does not include several halogen chemistry mechanisms that affect ozone depletion primarily over the oceans (Sherwen et al., 2016a; Wang et al., 2019) and in high-altitude regions (Sherwen et al., 2016a). Given their impact on the global background ozone concentrations, we also use GEOS-Chem v12.1.1 to evaluate ozone simulations at 2° × 2.5° resolution driven by the MERRA-2 meteorological fields.
The chemistry updates include the stratospheric chemistry from the unified tropospheric-stratospheric chemistry extension (UCX) (Eastham et al., 2014), halogen chemistry (Bell et al., 2002; Parrella et al., 2012; Sherwen et al., 2016a, b; Schmidt et al., 2016; Sherwen et al., 2017), and updated isoprene and monoterpene chemistry (Fisher et al., 2016; Marais et al., 2016; Travis et al., 2016). The Harvard-NASA Emissions Component (HEMCO) is employed to process emissions in this version of GEOS-Chem (Keller et al., 2014). We use a vertical grid with 72 levels and global anthropogenic emissions from the Community Emissions Data System (CEDS) (Hoesly et al., 2018). Top-down NO x emissions derived by using GC-adj are also input to this model to evaluate the impact of NO 2 data assimilation on ozone simulations under different chemical mechanisms. We refer to this model as "GCv12" in this article.
For each NO x emission dataset, the model spin-up time is 6 months, starting from July 2005. Therefore, we derive NO x emissions from 2005 but only evaluate simulations with measurements from 2006. To avoid high biases when comparing simulated ozone averaged over the first vertical model layer (∼ 100 m in box height) with surface measurements, 2 m ozone mixing ratios are calculated by scaling simulated ozone mixing ratios in the first layer using adjusted dry deposition velocities at 2 m following Zhang et al. (2012) and Lapina et al. (2015).

Satellite observations and global top-down NO x emissions
We estimate global top-down NO x emissions at the surface from 2005 to 2016 at 2° × 2.5° resolution using tropospheric NO 2 column densities from OMI. OMI is an ultraviolet/visible nadir solar backscatter spectrometer aboard the NASA Aura satellite. It has a local overpass time of about 13:45 LT and a nadir resolution of 13 km × 24 km. OMI was launched in July 2004 and has provided operational data products since October 2004. Two level-2 NO 2 retrieval products are used to derive long-term top-down NO x emissions in this study: the NASA standard product OMNO 2 version 3 and the DOMINO version 2 from KNMI (Boersma et al., 2011). A new OMI NO 2 retrieval, the Quality Assurance for the Essential Climate Variables (QA4ECV), has recently become available. This product is jointly developed by KNMI, the Belgian Institute for Space Aeronomy (BIRA-IASB), University of Bremen, Max Planck Institute for Chemistry, and Wageningen University. We evaluate the magnitude of NO 2 column densities and the seasonality of posterior NO x emissions in 2010 from this product. We screen all OMI NO 2 retrievals using data quality flags and by the criteria of positive tropospheric column, cloud fraction < 0.2, solar zenith angle < 75°, and viewing zenith angle < 65°. We exclude all retrievals that are affected by the row anomaly.
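The screening criteria listed above can be written as a simple per-pixel filter. The following is a minimal sketch only: the `Pixel` record and its field names are hypothetical and do not correspond to variable names in the OMNO 2, DOMINO, or QA4ECV files, and the quality and row-anomaly flags are assumed to have been decoded to booleans beforehand.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    """One OMI NO2 retrieval (hypothetical field names, for illustration)."""
    trop_column: float      # tropospheric NO2 column, molec cm^-2
    cloud_fraction: float   # unitless, 0-1
    sza_deg: float          # solar zenith angle, degrees
    vza_deg: float          # viewing zenith angle, degrees
    quality_ok: bool        # product quality flag already decoded
    row_anomaly: bool       # True if the pixel is affected by the row anomaly


def passes_screening(p: Pixel) -> bool:
    """Apply the thresholds stated in the text to a single pixel."""
    return (
        p.quality_ok
        and not p.row_anomaly
        and p.trop_column > 0.0
        and p.cloud_fraction < 0.2
        and p.sza_deg < 75.0
        and p.vza_deg < 65.0
    )


def screen(pixels):
    """Keep only the pixels that pass every criterion."""
    return [p for p in pixels if passes_screening(p)]
```

Because each threshold is a strict inequality, a pixel with a cloud fraction of exactly 0.2 is rejected, matching the "< 0.2" wording of the criteria.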
We convert GEOS-Chem NO 2 vertical column densities (VCDs) to slant column densities (SCDs) using scattering weights from the OMI retrievals and then compare GEOS-Chem SCDs with SCDs retrieved from OMI. The scattering weights are the product of the averaging kernels and the air mass factor (AMF) (Palmer et al., 2001; Chance and Martin, 2017). A cost function is defined as the observation-error-weighted differences between simulated and retrieved NO 2 SCDs plus the prior-emission-error-weighted departure of the emission scaling factors from the prior estimates. We minimize the cost function using the quasi-Newton L-BFGS-B gradient-based optimization technique (Byrd et al., 1995; Zhu et al., 1994), in which the gradient of the cost function with respect to the control parameter is calculated using the adjoint method. Details of the assimilation of NO 2 SCDs, how vertical sensitivities of satellite retrievals are accounted for, and the hybrid 4D-Var/mass balance inversion of NO x emissions are described in Qu et al. (2017). We use top-down NO x emissions estimated from the NASA standard product and the DOMINO product (Qu et al., 2020a, b) in the evaluations of ozone simulations.
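The two ingredients of this step, the VCD-to-SCD conversion and the cost function, can be illustrated with a small sketch. It is not the GC-adj implementation: it assumes diagonal error covariances, linear emission scaling factors whose prior value is 1, and a regularization weight `gamma`; all function and parameter names are hypothetical. Details of the actual formulation are in Qu et al. (2017).

```python
def vcd_to_scd(partial_columns, scattering_weights):
    """Model slant column: layer-by-layer sum of scattering weight times
    the model partial vertical column in that layer."""
    return sum(w * c for w, c in zip(scattering_weights, partial_columns))


def cost_function(scd_model, scd_obs, sigma_obs,
                  scale_factors, sigma_prior, gamma=1.0):
    """Observation term plus prior (penalty) term.

    Assumptions for this sketch: uncorrelated errors, a single prior
    uncertainty `sigma_prior` for all scaling factors, prior factor = 1.
    """
    obs_term = sum(((m - o) / s) ** 2
                   for m, o, s in zip(scd_model, scd_obs, sigma_obs))
    prior_term = sum(((f - 1.0) / sigma_prior) ** 2 for f in scale_factors)
    return 0.5 * obs_term + 0.5 * gamma * prior_term
```

In the inversion itself this scalar is minimized iteratively, with the gradient with respect to the scaling factors supplied by the adjoint model rather than computed analytically as it could be for this toy form.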

Surface measurements
We evaluate surface NO 2 simulations with measurements from the Environmental Protection Agency (EPA) Air Quality System (AQS) in the US and the China National Environmental Monitoring Center (CNEMC) network in China. The city monitoring sites included in the analysis represent either urban background or the averaged pollutant concentrations over the city. Simulated ozone mixing ratios from 2006 to 2016 are compared to surface observations from the TOAR Surface Observation Database (Schultz et al., 2017a) at the global scale and the CNEMC network in China. TOAR has produced a relational database of global surface ozone observations at all available sites; see Gaudel et al. (2018) for illustrations of the global coverage of the TOAR data. Pre-compiled TOAR data (https://doi.org/10.1594/PANGAEA.876108, available from 1995 to 2014, Schultz et al., 2017b) at each individual site are used in this study. Given the sparse TOAR data coverage of only 32 sites over China, hourly surface ozone measurements from the CNEMC (http://106.37.208.233:20035/, last access: 1 November 2020) are used to evaluate simulations in China from 2014 to 2016. The CNEMC national network was designed for urban and suburban air pollution monitoring. The archive contains hourly observations of ozone, carbon monoxide, nitrogen dioxide, sulfur dioxide, and fine particulate matter across mainland China since 2013.

Ozonesonde measurements
Ozone profile measurements from the Intercontinental Chemical Transport Experiment Ozonesonde Network Study (IONS-2010) (Cooper et al., 2011) (Ryerson et al., 2013) and was a continuation of previous IONS experiments to measure tropospheric ozone variability across North America (Thompson et al., 2008; Cooper et al., 2007). Balloon-borne electrochemical cell sensors were used to measure ozone profiles with an accuracy of ±10 % in the troposphere (Johnson et al., 2002; Smit et al., 2007). All six sites in California from IONS-2010 (referred to as Trinidad Head, Point Reyes, Point Sur, San Nicolas, Joshua Tree, and Shasta) are included in this study. These measurements were made in the mid-afternoon (95 % occurring between 14:00 and 16:59 LT) over a 6-week period from 10 May to 19 June 2010. There are 34-37 profiles for all sites except for San Nicolas Island, where only 26 profiles are available due to multiple instrument failures. Measurements made between 700 and 800 hPa are used to evaluate remote-ozone simulations.

Magnitude, seasonality and trend of NO x emissions, surface NO 2 , and surface ozone
Differences between the prior and posterior NO x emission estimates are mainly driven by the differences between simulated and retrieved tropospheric NO 2 vertical column densities (VCDs), which are compared in Sect. S1 in the Supplement. The GEOS-Chem NO 2 SCDs converted using scattering weights from the NASA product are larger than the SCDs calculated using the DOMINO scattering weights and the same GEOS-Chem VCDs (see Fig. S2). This difference can be explained by the use of different surface albedo and cloud products in the two retrievals. The retrieved NO 2 SCDs from the NASA product are mostly smaller than those from the DOMINO retrieval except for some regions between 40 and 60° N in January 2010. The smaller magnitude in OMI SCD and the larger magnitude in GEOS-Chem SCD using the NASA scattering weights lead to a smaller magnitude of posterior NO x emissions than inversions from the DOMINO product. The cost function is reduced by 6 %-29 % in the monthly inversions.

Annual average
As shown in Table 1 As shown in Fig. 1, the NASA posterior NO x emissions are less than the prior NO x emissions in the northeast US, northeast China, and southeast China. The DOMINO posterior NO x emissions are larger than the prior emissions in most regions except for northern Mexico and most parts of the tropics. The QA4ECV posterior NO x emissions have more consistent negative increments in eastern China with the NASA posterior emissions and more consistent positive increments in the United States, India, Europe, and Australia with the DOMINO posterior emissions. At the regional scale, NASA posterior increments are −3 % in China, −1 % in the US, +0.3 % in India, and −1 % in western Europe. The increments from the DOMINO posterior emissions are +21 % in China, +31 % in the US, +28 % in India, and +38 % in western Europe. The different changing directions in the above two posterior NO x emissions are consistent with the reportedly higher magnitude of NO 2 column densities in the DOMINO product than the NASA product in densely populated and industrial regions (Zheng et al., 2014;Canty et al., 2015;Qu et al., 2017). The increments from the QA4ECV posterior emissions are +5 % in China, +19 % in the US, +18 % in India, and +14 % in western Europe.
To evaluate the magnitude of the posterior NO x emissions, we compare simulations of surface NO 2 concentrations using the NASA- and DOMINO-based NO x emissions with surface measurements in the US and China. Surface NO 2 simulations at coarse resolution are usually biased low compared to measurements at urban sites, due to the short lifetime of NO x . We therefore start by analyzing this resolution error by generating high-resolution pseudo surface measurements at 0.1° × 0.1° and comparing them with low-resolution model simulations at 2° × 2.5°. We generate high-resolution surface NO 2 concentrations by scaling simulated surface NO 2 concentrations at 2° × 2.5° grid cells by the ratio of the OMI NO 2 column density gridded at 0.1° × 0.1° to the OMI NO 2 column density gridded at 2° × 2.5°. We identify 0.1° × 0.1° grid cells that include surface monitoring sites and treat downscaled surface NO 2 concentrations at these grid cells as the pseudo surface measurements. Comparisons of pseudo surface measurements and NO 2 simulations at 2° × 2.5° purely reflect differences caused by comparing NO 2 concentrations at 2° × 2.5° with higher-resolution surface measurements in urban regions. The mean of the pseudo NO 2 measurements is 32 % higher than the low-resolution simulations in the US, and it is 18 % higher than the low-resolution simulations in China. The real surface measurements, which represent a single point within the 0.1° × 0.1° grid cell, are expected to have even larger biases than the values calculated here, where we assume the measurements are at 0.1° × 0.1° grid cells. The smaller bias in China in comparison to the US is related to the higher background NO 2 concentrations in China. Figure 2 shows the comparisons of annual mean surface NO 2 concentrations in 2015 from measurements and simulations using different NO x emission inputs.
The selection of this year is due to the limited availability of nationwide surface NO 2 measurements in China. Surface NO 2 concentrations in both China and the US are measured by chemiluminescence analyzers, each equipped with a molybdenum converter, which converts additional NO y compounds to NO and leads to a positive bias in NO 2 measurements (Dunlea et al., 2007; Steinbacher et al., 2007). We therefore calculate a correction factor following Lamsal et al. (2008) for each GEOS-Chem simulation and divide the simulated NO 2 concentrations by this correction factor to convert simulated NO 2 to the measured species. The correction factors are generally higher in the US than in China but have similar seasonality (see Fig. S3 in the Supplement). Subtracting the resolution bias from the statistics shown in Fig. 2, the equivalent normalized mean bias (NMB) of surface NO 2 concentrations using the NASA posterior is −54 % in China and −41 % in the US. The equivalent NMB using the DOMINO posterior is −38 % in China and −19 % in the US. These remaining negative biases reflect the unrepresentativeness of 0.1° pseudo measurements for real point measurements for resolution bias correction, comparison of NO 2 concentrations averaged over 2° × 2.5° simulation to limited measurements, the underestimates of NO 2 retrievals using coarse-resolution prior information, and the inability of data assimilation to increase emissions at grid cells where NO 2 retrievals are below the detection limit of OMI. Although we have not performed a NO x emission inversion using the QA4ECV product for 2015, we expect its bias to lie between the results from the NASA and DOMINO products, based on the magnitude of NO x emissions in 2010. We evaluate the simulated ozone concentrations with global surface measurements from the TOAR database using three ozone metrics: maximum daily 8 h average (MDA8) ozone, daytime average ozone (08:00-20:00 LT), and 24 h average ozone.
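For the surface NO 2 evaluation, the resolution-bias estimate described in this section (scaling the coarse simulation by the ratio of fine to coarse OMI columns, then comparing the means) can be sketched as follows. The function names are hypothetical, and the sketch treats one monitoring-site grid cell at a time; it makes no claim about how the gridding itself was done.

```python
def pseudo_surface_no2(sim_surface_coarse, omi_col_fine, omi_col_coarse):
    """Downscale a coarse (2 x 2.5 degree) simulated surface NO2 value to a
    0.1 x 0.1 degree cell using the ratio of OMI columns at the two scales."""
    if omi_col_coarse <= 0.0:
        raise ValueError("coarse OMI column must be positive")
    return sim_surface_coarse * omi_col_fine / omi_col_coarse


def resolution_bias_percent(pseudo_values, coarse_values):
    """Percent by which the mean pseudo measurement exceeds the mean
    coarse-resolution simulation sampled at the same sites."""
    mean_pseudo = sum(pseudo_values) / len(pseudo_values)
    mean_coarse = sum(coarse_values) / len(coarse_values)
    return 100.0 * (mean_pseudo - mean_coarse) / mean_coarse
```

Under this definition, the values reported in the text correspond to `resolution_bias_percent` returning about 32 for the US sites and about 18 for the sites in China.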
In addition to the GC-adj simulation, with which we derived top-down NO x emissions, we also input the same top-down emissions to GCv12 and evaluate ozone simulations from this more recent version of the GEOS-Chem that includes updated halogen and isoprene chemistry.
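The three ozone metrics used in this evaluation can be computed from 24 hourly values for one local day, as in the sketch below. The windowing convention is an assumption: here the 8 h running means are restricted to windows that lie entirely within the day (start hours 00:00 to 16:00 LT), whereas regulatory and TOAR definitions may let windows extend into the next day and impose data-completeness rules.

```python
def mda8(hourly):
    """Maximum daily 8 h average from 24 hourly values (index = local hour)."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    # Windows start at hours 0..16 so that each window stays within the day.
    return max(sum(hourly[s:s + 8]) / 8.0 for s in range(17))


def daytime_mean(hourly):
    """Mean over 08:00-20:00 LT, i.e. the hours starting at 8 through 19."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(hourly[8:20]) / 12.0


def daily_mean(hourly):
    """24 h average."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(hourly) / 24.0
```

Because MDA8 picks out the highest 8 h window while the 24 h mean also weights nighttime values, a model with a realistic afternoon peak but too little nighttime ozone loss can score well on one metric and poorly on the other.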
All GC-adj simulations of 2 m ozone concentrations have a high bias compared to the TOAR measurements in 2010. NMB and normalized mean square error (NMSE) are largest for 24 h ozone concentrations. Simulations using posterior NO x emissions have slightly better agreement with the measurements from TOAR in 2010 (Fig. 3). In particular, simulations using the DOMINO posterior NO x emissions have the smallest NMB in all ozone metrics and the smallest NMSE in all metrics except for the Northern Hemisphere (NH) summertime MDA8 ozone. Simulations using the NASA posterior NO x emissions have the best spatial correlation when compared with measurements for all metrics except for the NH summer daytime ozone and annual MDA8 ozone, for which DOMINO posterior simulations have the largest correlation coefficient (Fig. S4).
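For reference, NMB and NMSE are sketched below using commonly used definitions; the exact formulas used in this study are not given in the text, so these forms are an assumption. Here `model` and `obs` are paired values sampled at the same sites and times.

```python
def nmb(model, obs):
    """Normalized mean bias: sum(M - O) / sum(O).
    Positive values mean the model is biased high."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def nmse(model, obs):
    """Normalized mean square error: mean((M - O)^2) / (mean(M) * mean(O))."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    mse = sum((m - o) ** 2 for m, o in zip(model, obs)) / n
    return mse / (mean_m * mean_o)
```

The two statistics are complementary: compensating high and low errors can cancel in NMB but still contribute to NMSE, which is why one emission dataset can have the smallest NMB without having the smallest NMSE.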
In comparison, GCv12 simulations have a low bias in daytime ozone but a high bias in 24 h average ozone, reflecting the potential underestimate of ozone loss at night. The impact of NO 2 assimilation on improving estimates of surface ozone simulations in GCv12 depends upon the ozone metric, as shown in Fig. 3c. Simulations using the DOMINO posterior emissions have the smallest NMB for annual mean daytime ozone; simulations using bottom-up NO x emissions have the smallest NMB for annual mean MDA8 ozone; simulations using the NASA posterior emissions have the smallest NMB for annual mean 24 h averaged ozone. These results suggest that the simulated diurnal variations of surface ozone concentrations may not be correct. The current constraints on NO x emissions use observations from OMI, which overpasses the same location approximately once per day. The diurnal variations of NO x emissions are constrained to be those of the prior emissions. The daily NO 2 column densities from OMI are smaller than the diurnally varying ground-based retrievals (Herman et al., 2019). Assimilating NO 2 observations from instruments overpassing at different times of the day (e.g., Boersma et al., 2008; Miyazaki et al., 2017) and using hourly constraints from geostationary satellite data (e.g., Geostationary Environmental Monitoring Spectrometer (GEMS), Tropospheric Emissions: Monitoring of Pollution (TEMPO) (Zoogman et al., 2017), and Sentinel-4) have the potential to improve simulations of ozone diurnal variations and different ozone metrics, although the ratios of NO 2 column densities from satellites that overpass in the morning and afternoon are generally lower than the same ratios from surface measurements (Penn and Holloway, 2020). Simulated MDA8 ozone values are mostly biased low in NH summer but biased high in annual mean concentrations, reflecting different seasonal variations in simulated and measured ozone concentrations, which will be further discussed in Sect. 3.2.
Evaluations with the CNEMC ozone measurements in China are in Sect. S2 in the Supplement.

Seasonal variation
The seasonal variations of monthly NO x emissions are consistent between the prior emissions and the NASA posterior emissions (Fig. 4). The DOMINO posterior emissions show different seasonal variations in several regions. In China, the prior emissions and the NASA posterior NO x emissions show summer peaks, which are mainly caused by the increase of natural sources when temperatures are high and lightning occurs more often (Qu et al., 2017). The DOMINO posterior emissions have the largest values in January and June in China, consistent with the posterior seasonality from Miyazaki et al. (2017) constrained by the same OMI NO 2 product. The June peak in China has been attributed to crop residue burning (Stavrakou et al., 2016). The peak of the DOMINO posterior NO x emissions in the United States and Mexico shifts earlier in the year, to June and July, compared to the prior emissions and NASA posterior emissions, similar to the results from Miyazaki et al. (2017). The peak in DOMINO posterior emissions corresponds to the time of high soil NO x emissions, which are reported to be underestimated in high-temperature agricultural systems in the bottom-up inventory (Oikawa et al., 2015; Miyazaki et al., 2017). The differences between the DOMINO posterior and the other two sets of emissions are especially large during the springtime in India, when biomass-burning activity increases (Miyazaki et al., 2017; Venkataraman et al., 2006). These retrieval products have similar numbers of observations and spatial distributions of observation densities after the filtering. The different seasonal variations in the posterior NO x emissions may reflect the AMF structural uncertainties when the retrieved NO 2 column densities use different ancillary data (Lorente et al., 2017).
For instance, the GEOS-Chem NO 2 SCDs converted using the scattering weights from the NASA product have larger seasonal variations than the SCDs converted using the DOMINO scattering weights in the US, reflecting the different seasonal variations of vertical sensitivities from the two retrievals. The seasonal variations of simulated surface NO 2 concentrations are similar to those of the measurements in China and the US (see Fig. S6).
Seasonal variations of 2 m ozone concentrations simulated by the GC-adj are also similar despite different NO x emission inputs: the differences in correlation coefficients of the simulated and the measured monthly ozone concentrations are less than 9 %. The simulations of 2 m ozone concentrations from GCv12 show better seasonality when using the posterior NO x emissions than using the prior, as shown in  with CNEMC measurements in China, simulations using the prior emissions have the most consistent seasonal variations and smallest NMSE. All simulations have smaller seasonal variations than the measurements in daytime ozone.

Interannual variations
The three different versions of NO x emissions have differ- We evaluate the trend of simulated surface NO 2 concentrations in the US with AQS measurements due to its availability throughout the study period (Fig. 7). From 2006 to 2016, the surface NO 2 concentrations show consistent decreases in the AQS measurements (by 32 %) and GC-adj simulations (by 26 % using the NASA posterior, by 10 % using the DOMINO posterior, and by 7 % using the prior emissions). Since we use the same anthropogenic emissions throughout 2006-2016 in the prior simulations, the variations in the black line reflect changes from natural sources and the impact of meteorological factors (e.g., temperature, humidity, wind, etc.). Surface NO 2 simulations using the NASA posterior NO x emissions also have the largest correlation coefficient when compared to the measurements (R 2 = 0.93 for the NASA posterior, R 2 = 0.81 for the DOMINO posterior, and R 2 = 0.74 for the prior). The more consistent trends and correlations in surface NO 2 simulations using the NASA posterior emissions are consistent with the larger decrease of NASA posterior NO x emissions in the US (by 20 % or for comparison a decrease of 1 % in the DOMINO posterior) from 2006 to 2016, as shown in Fig. 6.
The interannual variability of global simulations of 2 m ozone sampled at the TOAR locations is similar between GC-adj and GCv12. During the NH summer, simulations using the DOMINO posterior NO x emissions have the most consistent trend in daytime and 24 h average ozone in both models (see Table S1 in the Supplement); GC-adj simulations using the NASA posterior emissions have the best consistency with the measured trend of MDA8 ozone. The different performances of NO x emission datasets for different ozone metrics are a consequence of the hard constraint on NO 2 diurnal variations within the assimilation (and the lack of sufficient observations to constrain this). This can lead to better agreement of mean ozone concentration with measurements over particular hours but worse mean concentrations averaged over other hours. Detailed analyses of global ozone trends are in Sect. S3. At the regional scale, shown in Fig. 8, surface ozone measurements from TOAR mostly fall within the ranges of assimilation results. The interannual variations of simulated ozone over the whole region (black dotted lines) are generally smaller than the ones at grid cells that include surface measurements (solid black lines). The number of years for which ozone measurements are available in each grid cell is shown in Fig. S8. The overlap of solid black and green lines in Fig. 8 suggests that interannual variations of anthropogenic NO x emissions from CEDS do not have a large impact on surface ozone simulations.
The trends of simulated annual MDA8 ozone concentrations are correlated with impacts from meteorology and non-NO x sources based on simulations (shown as green lines) that use the same anthropogenic NO x emissions for all years and simulations that use interannually varied anthropogenic NO x emissions, leading to ozone changes of up to 4 ppbv (China), 5 ppbv (South Korea), 1 ppbv (US), 2 ppbv (Mexico), 1 ppbv (South America), 1 ppbv (Australia), 1 ppbv (western Europe), and 6 ppbv (Africa) from one year to the next. The trends of simulated MDA8 ozone are similar when using the NASA and the DOMINO posterior NO x emissions as inputs. The DOMINO-derived MDA8 ozone concentrations are higher than the NASA-derived ones in all studied regions, represented by the upper and lower limits of the error bars, respectively. GCv12 simulated ozone concentrations are smaller than simulations from GC-adj, especially over relatively less polluted regions, consistent with the inclusion of halogen chemistry in GCv12, which depleted ozone. The simulated MDA8 ozone trends in grid cells that include measurements in the US and Australia are more consistent with the TOAR measurements than the other regions, with coefficients of determination (R 2 ) larger than 0.45. The larger differences in ozone between the prior and posterior emissions as well as variability between the two top-down NO x emissions in GCv12 suggest a larger responsiveness of the ozone chemistry to changes in NO x . We do not expect simulated ozone trends to be completely consistent with the measurements in the TOAR database due to errors in the model's transport, chemical mechanism, and VOC emissions.
We further separate the ozone trends in grid cells that include measurements into changes caused by NO x emissions and changes caused by meteorology and non-NO x sources. The second trend is calculated through simulations that use constant NO x emissions throughout the studied years. It is similar in GCv12 and GC-adj, as shown by the green lines in Fig. 9. The trend caused by NO x emissions is obtained by subtracting the second trend from the ozone trend simulated using NO x emissions at each corresponding year. The ozone trends due to changes in meteorology and non-NO x sources (green lines) are moderately correlated (R > 0.5) with measurements from TOAR in Australia, the US, South America, and India. The ozone trends due to changes in posterior NO x emissions (red and blue lines) only have positive correlations with TOAR measurements in both GC-adj and GCv12 simulations in Africa and Australia. Ozone measurements in 2014 decreased compared to the 2006 level in the US and Mexico. GC-adj simulations do not have significant trends in these regions, whereas GCv12 simulations show increases in China, the US, and Mexico. Meteorological and non-NO x sources lead to larger interannual variations in ozone simulations than those driven by NO x emissions in South America, Australia, and Africa, where anthropogenic activity is much lower than in the other regions. These results underscore the challenges of attributing observed variations in ozone to changes in NO x emissions at regional scales.
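The separation described above amounts to a year-by-year subtraction of two simulations, followed by a correlation against the measurements. A minimal sketch, with hypothetical function names and assuming annual mean ozone series aligned on the same years:

```python
import math


def nox_driven_component(ozone_varying_nox, ozone_constant_nox):
    """Ozone change attributed to NO x emissions: the simulation with
    year-specific NO x emissions minus the simulation with constant NO x
    emissions (which carries meteorology and non-NO x sources)."""
    return [a - b for a, b in zip(ozone_varying_nox, ozone_constant_nox)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

This subtraction treats the two drivers as additive. Because ozone chemistry is nonlinear in NO x , the difference should be read as the NO x contribution given that year's meteorology rather than as a fully independent term.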

Western US remote ozone
Assimilations of ozone precursor gases have the potential to improve remote-ozone simulations, which can be used to provide boundary conditions for regional air quality models and to quantify and attribute sources of background ozone. We therefore focus specifically on remote regions in the US in this section to evaluate the vertical profile and surface concentrations of ozone simulations.

Evaluations with ozonesonde profiles
Field campaigns and routine observations of ozone concentrations along the west coast of the US have provided opportunities to understand regional and intercontinental influences on surface air quality. Evaluations with the IONS-2010 measurements in Fig. 10 show that the GCv12 simulations of ozone vertical profiles have negative biases (NMB between −8 % and −32 %) above all six sites. The standard deviations of ozonesonde and simulated profiles overlap with each other (see Fig. S9). The GC-adj simulations have positive biases at San Nicolas and Trinidad Head and have smaller negative biases (NMB between −3 % and −11 %) at the remaining sites than the GCv12 simulations. The magnitudes of the NMSE and NMB of the GCv12 simulations at 700-900 hPa are also larger than those of the GC-adj simulations (see Fig. S10). The prior simulations in GCv12 apply NO x emissions at different altitudes, whereas the posterior GCv12 and all GC-adj simulations apply all NO x emissions to the surface. This leads to different transport and formation of ozone at different model layers and therefore causes larger differences in ozone simulations in the upper troposphere. The air masses at this altitude in the eastern Pacific are demonstrated to impact inland near-surface ozone concentrations (Cooper et al., 2011; Lin et al., 2012; Yates et al., 2015). The different biases in ozone simulations close to the surface can be explained by the usage of different emission inventories (e.g., different biogenic emissions) and different boundary layer mixing schemes (nonlocal mixing in GCv12 and full mixing in GC-adj). The different chemical mechanisms in the two model versions contribute to the different model biases, especially in the upper troposphere. For instance, inclusion of halogen chemistry and additional chlorine chemistry in GCv12 leads to 19 % and 7 % decreases of global tropospheric ozone burden, respectively (Sherwen et al., 2016a; Wang et al., 2019).
Figure 9. Changes of regional mean annual MDA8 ozone concentrations compared to 2006 from TOAR measurements (magenta line), due to changes in bottom-up NO x emissions (black), due to changes in top-down NO x emissions (blue lines for simulations from GC-adj and red lines for simulations from GCv12), and due to changes in meteorology and non-NO x emissions (green lines). Only sites that have continuous measurements throughout the 9 years are included. The vertical bars represent the spread of changes from simulations using the NASA and the DOMINO posterior NO x emissions. The impacts of meteorology and natural sources are removed from black, blue, and red lines by subtracting simulations using 2010 bottom-up anthropogenic emissions for all years from simulations that use bottom-up NO x emissions corresponding to each year.
GCv12 simulations using the CEDS emissions have smaller NMSE and NMB than the simulations using the posterior NO x emissions in all six sites in 2010. In comparison, the GC-adj simulations using the DOMINO posterior NO x emissions have the smallest NMSE and NMB at all sites except for San Nicolas and Trinidad Head, where the prior simulations have the smallest error and bias. Further evaluations with ozonesondes at Trinidad Head in 2016 are shown in Sect. S4 in the Supplement.
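The two evaluation statistics used in this comparison, NMB and NMSE, can be illustrated with a short Python sketch. The formulas below are the commonly used definitions (bias normalized by the sum of observations; mean square error normalized by the product of the model and observation means) and are assumed here rather than taken from this section. The function names and sample values are hypothetical.

```python
def nmb(model, obs):
    """Normalized mean bias: sum(model - obs) / sum(obs)."""
    return sum(m - o for m, o in zip(model, obs)) / sum(obs)


def nmse(model, obs):
    """Normalized mean square error.

    Assumed definition: mean((model - obs)^2) / (mean(model) * mean(obs)).
    """
    n = len(obs)
    mean_model = sum(model) / n
    mean_obs = sum(obs) / n
    mse = sum((m - o) ** 2 for m, o in zip(model, obs)) / n
    return mse / (mean_model * mean_obs)


# Hypothetical ozone values (ppbv) at three pressure levels of one profile
simulated = [40.0, 45.0, 50.0]
ozonesonde = [50.0, 50.0, 50.0]
print(f"NMB  = {100 * nmb(simulated, ozonesonde):.1f} %")
print(f"NMSE = {nmse(simulated, ozonesonde):.4f}")
```

A negative NMB, as in this made-up profile, corresponds to a model low bias of the kind reported for the GCv12 simulations.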

Evaluations with TOAR surface ozone measurements at remote sites
To further evaluate the model performance under different geographical scenarios, we compare surface ozone simulations from GC-adj and GCv12 with observations from simple to complex environments. These include (1)  daytime, representing high-elevation rural sites during well-mixed daytime conditions. The coefficients of determination (see Table S2) between the simulations and the measurements are larger than 0.6 for all daytime ozone comparisons except for Mt. Bachelor Observatory. The correlation coefficients are smaller than 0.5 for all nighttime comparisons, reflecting the need to further improve simulations of nighttime chemistry and atmospheric processes. In Fig. 11, the surface ozone concentrations from both GC-adj and GCv12 simulations have low biases compared to the surface measurements at remote sites. These low biases in the GCv12 simulations are consistent with its performance when evaluated with ozonesondes from IONS-2010 and with daytime surface ozone at the global scale. However, the low biases in the GC-adj simulations are different from its high biases when compared with the global surface ozone concentrations and the ozone profiles at San Nicolas and Trinidad Head. This demonstrates the different biases in ozone simulations at rural and urban sites. Simulations using the DOMINO posterior emissions have the smallest NMSE and NMB at all remote sites except for the GCv12 simulations at Mauna Loa at night and Great Basin National Park during the daytime.
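As an illustration of the daytime and nighttime comparison, the following Python sketch splits paired hourly values by local hour and computes the coefficient of determination as the squared Pearson correlation. The daytime window (08:00 to 20:00 local time) and the function names are assumptions made for this example; the section does not specify the hours used.

```python
def r_squared(model, obs):
    """Squared Pearson correlation between simulated and measured values."""
    n = len(obs)
    mean_model = sum(model) / n
    mean_obs = sum(obs) / n
    cov = sum((m - mean_model) * (o - mean_obs) for m, o in zip(model, obs))
    var_model = sum((m - mean_model) ** 2 for m in model)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov * cov / (var_model * var_obs)


def split_day_night(hours, values, day_start=8, day_end=20):
    """Separate values into daytime and nighttime lists by local hour.

    The default window [08:00, 20:00) is a hypothetical choice.
    """
    day = [v for h, v in zip(hours, values) if day_start <= h < day_end]
    night = [v for h, v in zip(hours, values) if not (day_start <= h < day_end)]
    return day, night
```

Applying `split_day_night` to both the simulated and the measured series with the same `hours` list, and then calling `r_squared` on each pair, gives separate daytime and nighttime values that can be compared in the way described above.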

Discussion and conclusions
We performed global inversions of NO x emissions from 2005 to 2016 using two widely used OMI NO 2 retrievals from NASA (OMNO 2 v3) and KNMI (DOMINO v2). Different vertical sensitivities from the two retrievals are a major cause of the discrepancies in the posterior emissions. The DOMINO posterior NO x emissions have a larger magnitude than the prior emissions and the NASA posterior emissions. Consequently, GC-adj simulations using the DOMINO posterior NO x emissions have the smallest negative bias in surface NO 2 and the smallest positive bias in 2 m ozone. The impact of NO 2 assimilations on improving estimates of the GCv12 surface ozone simulations depends upon the ozone metrics, suggesting inaccurate diurnal variations in the surface ozone simulations. GEOS-Chem simulations using the DOMINO posterior emissions have the largest coefficients of determination for summertime daytime (R 2 = 0.81) and summertime 24 h (R 2 = 0.96) ozone. Simulations using the NASA posterior emissions have the smallest bias and error for all ozone metrics and the largest correlation for summertime MDA8 ozone (R 2 = 0.88). Ozone simulations with GEOS-Chem v12.1.1 using the DOMINO posterior NO x emissions lead to the most consistent seasonality in 24 h average ozone (R 2 = 0.99) with TOAR measurements, while the NASA posterior emissions lead to the best agree-
Posterior NO x emissions lead to improved simulations of ozone at several remote sites in the western US. The GC-adj simulations using the DOMINO posterior emissions have the smallest NMSE and NMB compared to ozonesonde measurements during IONS-2010, except for the San Nicolas and Trinidad Head sites. At the remote surface sites evaluated in this study, surface ozone simulations using the DOMINO posterior emissions have the best performance except for GCv12 simulations at Mauna Loa at night and Great Basin National Park during the daytime.
The reduced negative biases in daytime surface ozone simulations using the DOMINO posterior emissions at these remote sites and at most IONS-2010 sites are consistent with the increases of daytime remote ozone in the western US through NO 2 and ozone data assimilation in Huang et al. (2015). Simulations using the DOMINO posterior emissions are demonstrated to provide more precise magnitudes at these remote sites and can potentially be used as boundary conditions for regional air quality models for further air pollution and health studies.
The remaining differences between simulated and measured ozone can be explained by the roles of VOCs, errors in satellite retrievals, and uncertainties in the chemical and physical processes in the model simulations. In addition to NO x , emissions of other ozone precursors also impact the accuracy of ozone simulations. For instance, inversion of isoprene emissions over the southeast US decreases surface ozone simulations by 1-3 ppbv (Kaiser et al., 2018). Inversion of non-methane VOC emissions changes surface afternoon ozone simulations by up to 10 ppbv in China (Cao et al., 2018). Assimilation of multiple species (e.g., ozone, CO, HNO 3 , and SO 2 ) together with NO 2 may improve posterior ozone simulations, but the performance of posterior simulations may depend on the chemical transport model, as shown in Miyazaki et al. (2020), where the GEOS-Chem adjoint model v35 shows mixed performance in correcting the bias between ozonesonde and posterior simulations between 850 and 500 hPa at different latitude bands. Both OMI NO 2 retrievals employed in this study use NO 2 vertical shape factors from coarse-resolution simulations and therefore are biased low compared to in situ measurements (Goldberg et al., 2017). These retrievals also have not explicitly accounted for the aerosol optical effects, which are demonstrated to degrade the accuracy of NO 2 column concentrations when aerosol optical depth (AOD) is very high (Chimot et al., 2016; Liu et al., 2019; Cooper et al., 2019). The differences in the magnitude of ozone concentrations from GC-adj and GCv12 reflect the impact of other species emissions and chemical mechanisms on the bias of ozone simulations. Previous studies also show that global simulations at coarse resolution are not able to capture the observed persistence of chemical plumes in the free troposphere on intercontinental scales, leading therefore to underestimates of remote-ozone concentrations (Zhuang et al., 2018).
Although biases, errors, seasonalities, and interannual variations of ozone simulations have been improved in several cases through constraints on NO x emissions, there are still large discrepancies in the vertical profile and diurnal variations between ozone simulations and measurements. For instance, the different performances of each set of NO x emissions on the simulations of different ozone metrics reflect errors in the ozone diurnal simulations. The differences in ozone vertical profiles suggest errors in vertical transport in the model. These discrepancies could not be improved by adjusting only surface NO x emissions using observations at one time of the day, as performed in this study. Future geostationary satellite observations will provide opportunities to update NO x emissions at every hour. Separately constraining NO x emissions from the surface (e.g., anthropogenic sources) and the upper atmosphere (e.g., lightning sources, Pickering et al., 2016) as well as implementing these posterior NO x emissions at their corresponding vertical levels can potentially improve the vertical profile of ozone simulations.
Author contributions. ZQ, DKH, ORC, and JLN designed the research; ZQ performed the research and prepared the paper with help from all authors.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. Part of the computing resources supporting this work was provided by the NASA High-End Computing (HEC) program through the NASA Advanced Supercomputing (NAS) division at Ames Research Center. Zhen Qu would also like to acknowledge high-performance computing support from Cheyenne (https://doi.org/10.5065/D6RX99HX, NCAR, 2020) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation.
Financial support. This research has been supported by the National Aeronautics and Space Administration (grant nos. HAQAST NNX16AQ26G and ACMAP NNX17AF63G).
Review statement. This paper was edited by Robert Harley and reviewed by two anonymous referees.