European NOx emissions in WRF-Chem derived from OMI: impacts on summertime surface ozone

Ozone (O3) is a secondary air pollutant that negatively affects human and ecosystem health. Ozone simulations with regional air quality models suffer from unexplained biases over Europe, and uncertainties in the emissions of the ozone precursor group nitrogen oxides (NOx = NO + NO2) contribute to these biases. The goal of this study is to use NO2 column observations from the OMI satellite sensor to infer top-down NOx emissions in the regional meteorology-chemistry model WRF-Chem, and to evaluate the impact on simulated surface O3 with in situ observations. We first perform a simulation for July 2015 over Europe and evaluate its performance against in situ observations from the AirBase network. The spatial distribution of mean ozone concentrations is reproduced satisfactorily. However, the simulated maximum daily 8-hour ozone concentration (MDA8 O3) is underestimated (mean bias error (MBE) = -14.2 μg m-3), and its spread is too low. We subsequently derive satellite-constrained surface NOx emissions using a mass balance approach based on the relative difference between OMI and WRF-Chem NO2 columns. The method accounts for feedbacks through OH, NO2's dominant daytime oxidant. Our optimized European NOx emissions amount to 0.50 Tg N for July 2015, which is 0.18 Tg N higher than the bottom-up emissions (which lacked agricultural soil NOx emissions). Much of the increase occurs across Europe, in regions where agricultural soil NOx emissions dominate. Our best estimate of soil NOx emissions in July 2015 is 0.1 Tg N, much higher than the bottom-up 0.02 Tg N natural soil NOx emissions from the MEGAN model. A simulation with satellite-updated NOx emissions reduces the systematic bias between WRF-Chem and OMI NO2 (slope = 0.98, r2 = 0.84), and reduces the low bias against independent surface NO2 measurements from -2.5 to -1.1 μg m-3 (-56%). Following these NOx emission changes, daytime ozone is strongly affected, since NOx emission changes particularly affect daytime ozone formation.
Monthly averaged simulated daytime ozone increases by 6.0 μg m-3, and increases of >10 μg m-3 are seen in regions with large emission increases. With respect to the initial simulation, MDA8 O3 has an improved spatial distribution, expressed by an increase in r2 from 0.40 to 0.53, and a decrease of the mean bias by 7.4 μg m-3 (48%). Overall, our results highlight the dependence of surface ozone on its precursor NOx and demonstrate that simulations of surface ozone benefit from constraining surface NOx emissions by satellite NO2 column observations.


Introduction
Ozone (O3) is an air pollutant that affects human and ecosystem health (Lelieveld et al., 2015; Ainsworth et al., 2012). It also affects radiative forcing directly as a greenhouse gas (IPCC, 2013), and indirectly by impacting ecosystem carbon uptake via deposition (Sitch et al., 2007). Despite decreases in ozone concentrations in Europe since 2000 (Chang et al., 2017), peak ozone concentrations still exceed the WHO air quality guideline of 100 µg m-3 and the European long-term objective of 120 µg m-3 (EMEP/CCC, 2016). For example, 87% of European air quality stations did not meet this long-term objective (EEA, 2017) in 2015, and vegetation exposure thresholds were exceeded in large parts of the continent during this year, particularly in Southern and Central Europe (Rouïl and Meleux, 2018).
The formation of ozone in the lower troposphere is a photochemical process that depends nonlinearly on concentrations of its precursor species nitrogen oxides (NOx = NO + NO2) and volatile organic compounds (VOCs) (e.g. Sillman et al., 1990). In NOx-limited conditions, ozone production increases with NOx emissions and is less sensitive to VOC emissions.
However, ozone production under NOx-saturated conditions increases with VOC emissions, but decreases with increasing NOx emissions. European NOx emissions are dominated by the anthropogenic contribution from fossil fuel combustion for transportation, electricity generation and industry. In summer, there are additional contributions from soils and lightning, which together comprise 40% of the total European NOx emission budget (Jaeglé et al., 2005). Soil NOx emissions in turn have an anthropogenic component, since nitrogen-containing fertilizers are partly re-emitted to the atmosphere as NOx (Steinkamp and Lawrence, 2011).
Anthropogenic emissions in Europe have decreased due to air pollution abatement measures and the economic crisis that started in 2008 (Castellanos and Boersma, 2012). Bottom-up anthropogenic emission inventories suggest a continued reduction of NOx emissions in more recent years. This is consistent with the ongoing development of European air quality conditions towards the NOx-limited regime (Jin et al., 2017), which is projected to continue in the future (Beekmann and Vautard, 2010).
On the other hand, a decrease in European anthropogenic and natural NOx emissions is not supported by trend analysis of remote sensing and in situ NO2 observations (Jiang et al., 2019, submitted), although this potentially reflects a growing relative contribution from natural NOx emission sources (Silvern et al., 2019). Nevertheless, downward anthropogenic emission trends have been suggested as an important driver of the decreasing trend in peak ozone concentrations in Europe (ETC/ACM, 2016).
Regional air quality (AQ) models are important tools for studying and forecasting ozone pollution. These models simulate processes relevant for ozone pollution at a resolution that can better capture observed spatial gradients compared to coarser global models. Regional AQ models can therefore be applied to simulate polluted conditions in or surrounding urban areas, or for air quality impact assessments. Coupled (or "online") meteorology-chemistry models resolve meteorology, transport, chemical transformation and removal of pollutants at the same spatial and temporal resolution. The coupled treatment of meteorology and chemistry is mandatory, because ozone concentrations depend on feedbacks between meteorological and chemical processes: 1) O3 sources such as chemical formation depend on radiation, temperature and water vapour (Pusede et al., 2015; Coates et al., 2016), and 2) O3 sinks, such as dry deposition, also largely depend on meteorological drivers (Clifton et al., 2017; Kavassalis and Murphy, 2017). However, coupled regional air quality models are subject to several sources of uncertainty. These uncertainties are related to the limited knowledge of ozone precursor emissions (Kuenen et al., 2014; Pouliot et al., 2015), the representation of boundary conditions, tropospheric chemistry in the chemical mechanism, and the land surface and its feedbacks with tropospheric chemistry (Baklanov et al., 2014).
Many regional AQ models have been applied to simulate NOx and O3 in European summers, for research and forecasting purposes. Models tend to underestimate summertime NOx compared to rural background in situ observations (Mar et al., 2016). Comparison against satellite NO2 column observations also revealed underestimations at regional scales (Huijnen et al., 2010; Aidaoui et al., 2015). Another study found both positive and negative biases, which were attributed to the coarse resolution of the emission inventories (Pope et al., 2015).
AQ models satisfactorily reproduce the spatial distribution in summer O3. However, mean O3 can be under- or overestimated depending on the model and chemical mechanism (Mar et al., 2016). In addition, many models consistently underestimate peak ozone values that typically occur in the afternoon (Tuccella et al., 2012; Solazzo et al., 2012; Marécal et al., 2015; Im et al., 2015). This is problematic for air pollution impact assessments, since the peak ozone values are important for determining the detrimental effects on human health and ecosystems.
The sensitivity of O3 to its precursor NOx, which is particularly pronounced in summer (e.g. Jin et al., 2017), suggests that there is good potential to improve O3 simulations by constraining simulated NOx with observations. The past 20 years have seen the development of methods to estimate NOx emissions with satellite-based NO2 columns in a mass balance approach, where biases between the model-simulated and satellite-observed NO2 columns are used to update NOx emissions. The technique has been applied in global models (Martin et al., 2003; Lamsal et al., 2008; Vinken et al., 2014a), and more recently also in regional models (e.g. Ghude et al., 2013). Applications of the technique include emission trend analysis (e.g. Lamsal et al., 2011) and source-specific constraints on NOx emissions (e.g. Ghude et al., 2013; Vinken et al., 2014a, b; Verstraeten et al., 2015). Changes in NOx emissions impact tropospheric chemistry, and therefore changes in O3 are expected. This was shown by Ghude et al. (2013), who found local changes in surface O3 mole fractions up to 10 ppb over India after satellite-based NOx emission scaling. Verstraeten et al. (2015) reported ozone increases up to 8 ppb at 800 hPa (approximately 1.5 km) in China after scaling local NOx emissions with OMI observations, and found that simulated free-tropospheric ozone between 3-9 km was in better agreement with tropospheric O3 columns observed by the Tropospheric Emission Spectrometer. However, to our knowledge, ozone changes at the surface after constraining NOx emissions with satellite observations have thus far not been evaluated with in situ data.
Considering the importance of NOx for simulations of ozone and the previously reported ozone changes after applying satellite-based NOx emissions, we here investigate the potential improvement in simulated surface ozone concentrations over Europe due to the application of satellite observations of NO2 to adjust NOx emissions. To this end, we use the WRF-Chem meteorology-chemistry model (Grell et al., 2005) to simulate surface ozone in Europe in July 2015, at the approximate peak of the ozone season. We first perform a model evaluation with AirBase in situ NO2 and O3 observations (EEA, 2018) and OMI NO2 column measurements from the recently released QA4ECV dataset (Boersma et al., 2017a). We subsequently derive a new, OMI-based ("top-down") NOx emission inventory, and evaluate its effects on WRF-Chem simulations of surface NO2 and O3 with the independent AirBase observations.
The structure of the paper is as follows. We describe the model set-up and observations in section 2. Section 3 presents the method to calculate OMI-derived NOx emissions. In section 4, we evaluate a WRF-Chem set-up with bottom-up emissions against in situ and column observations, and in section 5 we describe the derived modified surface NOx emissions. We evaluate the impacts on surface NOx and O3 with independent in situ observations in section 6. We conclude with a discussion (section 7) and summarize our conclusions in section 8.

WRF-Chem
We perform simulations with the coupled meteorology-chemistry model WRF-Chem, version 3.7.1 (Grell et al., 2005). The model domain consists of 170 by 170 cells at 20×20 km2 horizontal resolution covering Europe, centered at 51.98° N and 5.66° E. Vertically, the domain extends from the Earth's surface up to 50 hPa, and consists of 27 layers with 13 layers in the lowermost 1500 m. Chemistry simulations of O3 and its precursor groups NOx and VOCs are performed with the CBM-Z gas-phase chemical mechanism (Zaveri and Peters, 1999). Simulations of atmospheric chemistry with this mechanism compare well with the European multi-model mean for summer O3 in a gas-phase mechanism comparison study.
A complete list of parameterization options adopted in our WRF-Chem setup can be found in Table 1 of the Supplement. Our simulations were performed with a time step of 180 s for a period of 38 days (24 June - 31 July 2015), allowing a 1-week spin-up to analyze the model output for July. An evaluation of large-scale meteorological performance with ERA-Interim reanalysis fields can be found in Sect. 2 of the Supplement.
We used anthropogenic emissions from the TNO-MACC-III inventory (Kuenen et al., 2014) for 2011, the most recent inventory available when the model experiments were performed. TNO-MACC-III contains anthropogenic emissions for lumped species groups NOx and VOCs. NOx emissions were partitioned assuming that 97% is emitted as NO and 3% as NO2. VOC emissions were divided over 15 emission categories in CBM-Z, following the VOC speciation by Archer-Nicholls et al. (2014). This speciation procedure is further described in Table 3 of the Supplement. Point source emissions were distributed over the five lowermost model layers following sector-specific emission altitude profiles (Bieser et al., 2011).
Biogenic emissions of VOCs and soil NOx were calculated online with the MEGAN model implementation within WRF-Chem (Guenther et al., 2006, 2012). The domain-total biogenic isoprene emissions are 1.82 Tg of isoprene, which is slightly below the 9-year range of 2-4.5 Tg isoprene for July, based on an inverse modeling study using OMI HCHO column measurements for 2005-2013 (Bauwens et al., 2016). We simulate lightning NOx emissions using a parameterization based on cloud-top height (Price and Rind, 1993; Wong et al., 2013), using a NOx production of 80 mol flash-1 based on a recent satellite-based estimate (Pickering et al., 2016). Simulations with higher production values of 500 mol flash-1 (Ott et al., 2010) and 310 mol flash-1 (Miyazaki et al., 2014) resulted in overestimated upper-tropospheric contributions to the NO2 columns relative to OMI.
Anthropogenic emissions are the dominant NOx source over Europe in July with a total monthly emission strength of 304 Gg N (76%). Minor contributions are associated with lightning (81.4 Gg N; 20%) and soils (15.0 Gg N; 4%). We note that especially soil NOx emissions are low compared to previous studies, in which soils, including agricultural areas, have been estimated to contribute 40% to the total European NOx emission budget (Jaeglé et al., 2005; Ganzeveld et al., 2010).
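The percentage shares quoted above follow directly from the three emission totals. The short sketch below reproduces that arithmetic; the dictionary keys and the function name `source_shares` are illustrative choices, not part of the model setup.

```python
# Bottom-up July 2015 NOx emissions over the model domain, in Gg N,
# as quoted in the text above.
emissions_gg_n = {
    "anthropogenic": 304.0,
    "lightning": 81.4,
    "soil": 15.0,
}

def source_shares(emissions):
    """Return each source's percentage share of the total emission."""
    total = sum(emissions.values())
    return {name: 100.0 * value / total for name, value in emissions.items()}

shares = source_shares(emissions_gg_n)
for name, share in shares.items():
    print(f"{name}: {share:.0f}%")
```

Rounded to whole percentages, the shares match the 76%, 20% and 4% given in the text.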

AirBase NO2 and O3 in situ measurements
Surface measurements are taken from the European Air Quality Data Portal operated by the European Environment Agency, hereafter referred to as AirBase (EEA, 2018). We used all data at rural background stations from the validated E1a data stream. The large number of stations allows us to apply strict data-availability criteria. For monthly averages, we discard stations if data are missing for more than 24 hours. Stations used for the evaluation of monthly averages at 12:00 h UTC may have a maximum data gap of 1 data point. This resulted in a final selection of 184-397 stations, depending on the performance metric (see Table 1). In our analysis of O3 and NO2 we evaluate monthly time series and mid-day (12:00 h UTC) concentrations (denoted as [O3]12h and [NO2]12h, respectively). We additionally calculate the maximum daily 8-hour mean ozone concentration (MDA8 O3), a widely applied metric for O3 health impacts.
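To make the MDA8 O3 metric concrete, the following minimal sketch computes it from one day of hourly values. It is an illustration only: the function name `mda8`, the restriction to windows within a single calendar day, and the `min_valid` threshold for missing hours are assumptions made here; operational definitions differ in how windows crossing midnight and data gaps are handled.

```python
def mda8(hourly, min_valid=6):
    """Maximum daily 8-hour mean of 24 hourly values (None marks a missing hour).

    The window alignment and the minimum number of valid hours per window
    (min_valid) are illustrative assumptions, not taken from the paper.
    """
    best = None
    # Slide an 8-hour window over the day: hours 0-7, 1-8, ..., 16-23.
    for start in range(len(hourly) - 7):
        window = [v for v in hourly[start:start + 8] if v is not None]
        if len(window) < min_valid:
            continue  # too many gaps to form a meaningful mean
        mean = sum(window) / len(window)
        if best is None or mean > best:
            best = mean
    return best

# Synthetic diurnal cycle in µg m-3: low at night, peaking in the afternoon.
example_day = [40] * 8 + [60, 80, 100, 110, 120, 120, 110, 100] + [70] * 8
print(mda8(example_day))
```

Because the metric averages over eight hours, it is lower than the hourly peak but still tracks the afternoon maximum that drives health impacts.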

OMI NO2 column measurements
We use tropospheric NO2 columns from the Ozone Monitoring Instrument (OMI) onboard NASA's EOS Aura mission (Levelt et al., 2006). The polar-orbiting instrument detects radiation backscattered from the Earth's atmosphere. Retrieval of tropospheric vertical column densities (VCDs) from space follows a three-step procedure. First, total slant columns (SCDs; i.e., columns along the average light path through the atmosphere) are obtained from a spectral fit to the OMI-measured reflectance spectra in the visible wavelength range using the Differential Optical Absorption Spectroscopy (DOAS) method. Then, the stratospheric contribution is separated from the total NO2 column via data assimilation in the TM5 global chemistry transport model (Dirksen et al., 2011). The final step is to obtain tropospheric VCDs by dividing the tropospheric SCDs by a tropospheric Air Mass Factor (AMF) that describes the vertical sensitivity of the instrument to atmospheric NO2 (Eskes and Boersma, 2003). The AMF is a function of satellite viewing geometry, surface albedo, terrain height, cloud properties, and a priori NO2 profile.
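The three retrieval steps can be condensed into a single relation. The symbols below are illustrative shorthand rather than the notation of the cited retrieval papers: N_s is the total slant column from the DOAS fit, N_s,strat the stratospheric slant column estimated through data assimilation, and M_trop the tropospheric AMF.

```latex
N_{v,\mathrm{trop}} = \frac{N_{s} - N_{s,\mathrm{strat}}}{M_{\mathrm{trop}}}
```

Since M_trop depends on the a priori NO2 profile, any change in the assumed profile shape propagates directly into the retrieved tropospheric vertical column.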
The recent EU FP7 project Quality Assurance for Essential Climate Variables (QA4ECV) has led to the development of a new OMI NO2 data product (Boersma et al., 2017a). The underlying consortium retrieval algorithm is based on the NO2 column retrieval principles described in Boersma et al. (2007), but with improvements in the three aforementioned steps. Zara et al. (2018) described how better wavelength calibration, and inclusion of liquid water absorption and an intensity offset correction reduced uncertainties in NO2 SCDs to 0.7-0.8 × 10^15 molec. cm-2 (up to ±35 %). Lorente et al. (2017) improved the AMF calculation method via the extension of the AMF look-up table with more reference points, and a correction for the sphericity of the atmosphere. The ancillary data for the AMF calculation have also improved relative to earlier algorithms such as DOMINO v2: surface albedo from the 5-year OMI albedo climatology (Kleipool et al., 2008), cloud information from the improved OMI O2-O2 algorithm (Veefkind et al., 2016), and a priori NO2 profiles from TM5-MP at 1° × 1° (Williams et al., 2017). The study by Lorente et al. (2017) also showed that substantial differences between AMFs arise when different a priori NO2 profiles (as well as surface albedo and cloud properties) are used in the retrieval. This underlines that a re-calculation of the tropospheric AMFs based on simulated WRF-Chem NO2 profiles at 20 × 20 km2, replacing the coarse TM5-MP 1° × 1° NO2 profiles, may help to reduce model-satellite differences (Lamsal et al., 2010; Vinken et al., 2014b), and we will explore this further below.

2.4 AMF re-calculation
We take care to remove inconsistencies in the model-satellite comparison introduced by different assumptions about the vertical NO2 profile in the satellite product compared to the model. The AMF calculation requires assumptions about the vertical profile of NO2 to convert slant columns into vertical columns. We replace the a priori TM5-MP NO2 profiles (at 1° × 1°) by WRF-Chem NO2 profiles at a 20×20 km2 resolution. This has two advantages: 1) model-satellite comparisons are no longer affected by differences in model assumptions between WRF-Chem and TM5-MP that lead to different vertical NO2 profiles, and 2) the higher resolution WRF-Chem setup resolves spatial gradients in the a priori profile that are not appropriately captured in TM5-MP due to the coarser model resolution. Single-orbit results indicate that re-calculation of the AMFs leads to retrieved columns that are 1 × 10^15 molec. cm-2 higher in densely populated areas, and lower or unaffected in surrounding non-urban regions. This effect has been seen in earlier studies (Huijnen et al., 2010; Heckel et al., 2011; Russell et al., 2011; Maasakkers, 2013; Vinken et al., 2014b).
We apply the method described by Lamsal et al. (2010) and Boersma et al. (2016) to replace the TM5-MP vertical NO2 profile by the WRF-Chem profile in the calculation of the air mass factor (AMF; Eq. 1). In Eq. 1, M_trop is the tropospheric AMF based on an assumed profile from WRF-Chem or TM5, A_trop,l is the tropospheric averaging kernel element for layer l, x_l,WRF-Chem is the NO2 column density in model layer l, and L is the uppermost TM5-MP layer in the troposphere. The tropospheric averaging kernel in Eq. 1 is defined following Boersma et al. (2017b), with M and M_trop referring to the AMF and the tropospheric AMF, respectively. Note that the WRF-Chem vertical NO2 profile has been sampled at the TM5-MP vertical layer structure, so l refers to TM5-MP model layers.
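For reference, a form of the AMF replacement and of the tropospheric averaging kernel that is consistent with the symbol definitions above, and with the profile-replacement approach as it is commonly written for averaging-kernel-based retrievals, is sketched below. It should be read as a reconstruction under stated assumptions rather than the exact notation of the cited papers: the subscripts distinguishing the WRF-Chem and TM5 based AMFs and the symbol A_l for the total-column averaging kernel element are introduced here for illustration.

```latex
M_{\mathrm{trop,\,WRF\text{-}Chem}}
  = M_{\mathrm{trop,\,TM5}} \,
    \frac{\sum_{l=1}^{L} A_{\mathrm{trop},l} \; x_{l,\mathrm{WRF\text{-}Chem}}}
         {\sum_{l=1}^{L} x_{l,\mathrm{WRF\text{-}Chem}}},
\qquad
A_{\mathrm{trop},l} = A_{l} \, \frac{M}{M_{\mathrm{trop}}}
```

In this form the new AMF is the original AMF weighted by the kernel-averaged WRF-Chem profile, so a profile with relatively more NO2 near the surface, where the instrument is less sensitive, yields a smaller AMF and hence a larger retrieved vertical column.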

Top-down NOx emissions: methods
Satellite-detected NO2 columns are sensitive to NOx emissions at the surface. We exploit this dependence to derive satellite-based surface NOx emissions using local OMI NO2 columns. We apply an improved version of the mass balance procedure (Martin et al., 2003; Lamsal et al., 2011; Vinken et al., 2014b), which accounts for non-linear feedback from NOx emission changes on NO2 concentrations via OH (Eq. 2). In Eq. 2, E_bu and E_td represent NOx emissions from the bottom-up inventory (bu) and the satellite-based top-down estimate (td), respectively. C_WC,bu represents the monthly-averaged NO2 vertical column density (VCD) simulated by WRF-Chem, and C_OMI,bu is the monthly averaged modified QA4ECV OMI NO2 VCD using air mass factors based on the original WRF-Chem NO2 vertical profile (C_WC,bu, see Section 2.4). WRF-Chem NO2 VCDs are co-sampled with valid OMI observations. We only use OMI and WRF-Chem data for pixels with valid satellite observations for at least 4 days in July 2015 to minimize the random error in the satellite retrieval.
We account for the nonlinear NOx-OH chemistry feedback via a dimensionless scaling factor β, for which we performed a perturbation simulation with surface emissions increased by 20% (Eq. 3). In Eq. 3, C_bu are the NO2 columns from a WRF-Chem simulation with bottom-up NOx emissions, and ∆C_bu,1.2 is the change in NO2 columns after perturbing bottom-up NOx emissions by +20%. In low-NOx environments, this perturbation leads to higher OH levels and thus to more efficient NOx loss to HNO3, so that a β > 1 is needed to achieve column agreement. In NOx-rich environments, however, OH levels are suppressed by enhanced NOx emissions so that the relative increase in NO2 columns is larger than 20%, resulting in a β < 1. The use of β to account for the sensitivity of the NO2 column to local emissions is essentially a linearization of non-linear effects due to chemistry.
Application of Equations 2 and 3 would lead to updated NOx emissions, and consequently also to modifications in the WRF-Chem NO2 profile shapes in response to the updates (e.g. Vinken et al., 2014b). This is accounted for via γ, which we also obtain from the simulation with +20% perturbed emissions (Eq. 4). In Eq. 4, C_WC represents the WRF-Chem NO2 vertical column density (VCD), and C_OMI represents the OMI NO2 VCD retrieved using WRF-Chem NO2 vertical profiles from the bottom-up simulation (C_WC), for the bottom-up (subscript bu) and emission perturbation simulation (subscript 1.2), respectively. Our approach to calculate γ differs from Vinken et al. (2014b), who derived γ from a separate simulation after accounting for β. Our approach requires one less forward simulation and is thus computationally more efficient, with little impact (<3%) on total derived emissions compared to the approach by Vinken et al. (2014b).
We calculate the scaling factors β and γ based on monthly mean NO2 columns for all land-based WRF-Chem cells and for ocean-based cells with emissions above a threshold value of 1 mol km-2 h-1. The latter cells include shipping lanes and offshore oil platforms. OMI-inferred emission changes are calculated locally, i.e. for each individual model cell for which the aforementioned data availability criteria are fulfilled. This differs from previous work where these factors were calculated for regions containing multiple model cells (Vinken et al., 2014a, b) or for individual pixels in global models with a coarse resolution (e.g. Lamsal et al., 2011).
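The role of β in the mass balance can be illustrated with a small per-cell calculation. The sketch below is a simplified reading, not the full procedure: it assumes the commonly used finite-difference form in which the bottom-up emission is scaled by the β-weighted relative column difference, and it deliberately leaves out the γ correction for profile-shape changes. The function names and the example numbers are hypothetical.

```python
def beta_factor(column_bu, column_perturbed, perturbation=0.2):
    """Relative emission change divided by the relative column response."""
    relative_column_change = (column_perturbed - column_bu) / column_bu
    return perturbation / relative_column_change

def top_down_emission(e_bu, column_model, column_omi, beta):
    """Scale bottom-up emissions by the beta-weighted relative column difference.

    Assumed form: E_td = E_bu * (1 + beta * (C_omi - C_model) / C_model).
    The gamma correction for profile-shape changes is deliberately left out.
    """
    relative_difference = (column_omi - column_model) / column_model
    return e_bu * (1.0 + beta * relative_difference)

# Hypothetical grid cell: columns in 1e15 molec. cm-2, emissions in Mg N per month.
c_bu, c_perturbed, c_omi = 2.0, 2.3, 2.5
beta = beta_factor(c_bu, c_perturbed)   # column rises by 15% for +20% emissions
e_td = top_down_emission(10.0, c_bu, c_omi, beta)
print(round(beta, 3), round(e_td, 3))
```

In this example the column responds less than proportionally to the emission perturbation, so β exceeds 1 and the emission increase needed to close a 25% column gap is larger than 25%, as described for low-NOx environments above.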
We neglect the effect of transport of NO2 away from the source region ('smearing'). In July, solar intensity in Europe is close to its annual peak, which means that the NO2 lifetime is short due to efficient oxidation. Therefore, the clear-sky monthly mean NO2 column difference between model and satellite is indicative of local NOx emission updates. Previous studies showed that this method reduces the model-satellite NO2 column difference but does not resolve it completely (e.g. Vinken et al., 2014b; Ghude et al., 2013) as a result of the linearization that is applied in the perturbation calculation. Nonetheless, we will show in this study that the systematic bias between WRF-Chem and OMI NO2 columns is largely removed after application of Eqs. 2-4.
Our results are in agreement with previous regional chemistry model evaluations for Europe. Such studies typically focus [...] Table 1. We find that WRF-Chem reproduces the spatial distribution well, with peak NO2 occurring in Northwest Europe and North Italy. In these regions with high NOx emissions, however, average WRF-Chem-simulated concentrations are underestimated by up to 10 µg m-3 compared to observations. AirBase concentrations show a region with elevated NO2 concentrations in Southwest Germany. WRF-Chem also shows elevated NO2 concentrations in this region, but does not reach the observed levels. Overall, WRF-Chem shows more spatial heterogeneity in surface NO2 concentrations than is apparent from the observations. Observed NO2 concentrations in background areas in Spain, France and Eastern Europe are 2-5 µg m-3 or higher, while the model consistently simulates values <2 µg m-3 in these regions. This overall underestimation is also seen in Fig. 8, where the simulated daily mean NO2 concentration is shown against AirBase observations. The model performance of our WRF-Chem setup is in line with previous WRF-Chem studies. Mar et al. (2016) found small overestimations (0.67-2.96 µg m-3) in mean NO2. Another study found an annual average mean bias of -0.9 µg m-3, caused by underestimations of peak NO2 in WRF-Chem (Tuccella et al., 2012).
A comparison between WRF-Chem and AirBase monthly-averaged 12:00 h UTC NO2 concentrations is presented in Figure 2c and d and Table 1. We find that WRF-Chem on average strongly underestimates mid-day NO2 concentrations by 2.96 µg m-3 (38.5%).

NO2 VCD
Before we perform a comparison between NO2 VCDs from WRF-Chem and OMI, we first discuss the effect of the NO2 profile shape on the OMI-retrieved columns. Figure 3 shows the change in the monthly-averaged OMI NO2 column density after replacing TM5-MP NO2 profiles by WRF-Chem profiles using the procedure described in Sect. 2.4. The OMI NO2 VCDs change most prominently over urban/industrial areas such as the Netherlands, Paris, Berlin, Madrid, Milan and Rome.
The background areas are largely unaffected, or show small (±0.2 × 10^15 molec. cm-2) NO2 VCD increases (e.g. Spain) or decreases (regions in France, Germany, Poland, Ukraine and Romania). The vertical NO2 profile over sea regions in western Europe strongly peaks at the surface, because shipping NOx in WRF-Chem is emitted in the lowermost model layer. Overall, the average NO2 column change over non-land regions is small (<2%).
We subsequently compare WRF-Chem to this modified OMI product. The monthly-averaged NO2 vertical column densities from WRF-Chem and OMI are displayed in Fig. 4. The model is sampled at 12:00 h UTC, close to the OMI overpass time of approximately 13:30 h LT, and is co-sampled with valid satellite observations. There is good agreement in the spatial distribution of monthly-averaged NO2 VCDs (r2 = 0.68). NO2 columns are underestimated by 0.3 × 10^15 molec. cm-2 on average, with strong underestimations of up to 2 × 10^15 molec. cm-2 in urban and industrial northwestern Europe. WRF-Chem overestimates NO2 columns in some isolated urban areas with high NOx emissions such as London, Madrid, Rome, and in parts of Eastern Europe.
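The two summary statistics used in this comparison, the mean bias and r2, can be computed from co-sampled model and observation pairs as sketched below. This is a generic illustration with hypothetical values; here r2 is taken as the squared Pearson correlation, which is an assumption about how the statistic is defined in this study.

```python
def mean_bias(model, obs):
    """Mean bias error: average of (model - observation)."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

def r_squared(model, obs):
    """Squared Pearson correlation between model and observations."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov ** 2 / (var_m * var_o)

# Hypothetical co-sampled monthly mean columns (1e15 molec. cm-2).
omi = [1.0, 2.0, 3.0, 4.0]
wrf = [0.8, 1.6, 2.4, 3.2]   # model at 80% of the observed column everywhere
print(mean_bias(wrf, omi), r_squared(wrf, omi))
```

The example shows why both statistics are reported: a model that is uniformly 20% too low has a perfect spatial correlation but still carries a systematic negative bias.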
We note that Fig. 4 shows small underestimations of the simulated NO2 VCD compared to OMI (approximately 0.2 × 10^15 molec. cm-2) in background regions (e.g. the Alps, rural Spain and France, Scandinavia) and over the oceans. Simulated NO2 columns therefore show stronger spatial gradients than OMI-retrieved columns, which is in line with Huijnen et al. (2010). Other [...]
In the following section, we will derive satellite-constrained NOx emissions and discuss potential reasons for this mismatch.

Top-down emissions
We derive top-down NOx emissions using the method described in Section 3. Fig. 5 shows the July total bottom-up and top-down surface NOx emissions and their difference. Top-down NOx emissions amount to 498 Gg N, which is 56% higher than the bottom-up inventory, and increases occur across the domain (Fig. 5c). NOx emissions are reduced in several isolated grid cells that generally correspond to urban areas. The difference between top-down and bottom-up emissions is larger than the 16% increase reported by Miyazaki et al. (2017), although that study found strong (40-67%) local increases in areas with high NOx emissions such as Belgium, western Germany and northern Italy.
Our top-down emissions are much higher than the bottom-up emissions over Germany and Poland. Over Belgium and the Netherlands, the difference between top-down and bottom-up emissions is also substantial, but notably smaller despite larger differences between OMI and WRF-Chem NO2 columns over the Low Countries (Fig. 4c). This reflects the chemical regime with very high bottom-up NOx emissions in this region, resulting in suppressed mid-day OH concentrations, and consequently, longer NO2 lifetimes (as diagnosed by low β values over northwestern Europe in Supp. Fig. 1).
We subsequently replace bottom-up emissions with our observation-constrained top-down NOx emissions and perform a new WRF-Chem simulation. As expected, the new NO2 columns agree much better with the OMI NO2 columns than those from the simulation with bottom-up emissions (Fig. 6). WRF-Chem with bottom-up emissions generally underestimates OMI NO2 columns by 23.4%. With top-down emissions, the slope of 0.98 between the new WRF-Chem and OMI NO2 columns (Fig. 6b) suggests that the systematic underestimation in the model is effectively resolved. The mean relative error is reduced to -7.5%, and the spatial correlation (r2) between WRF-Chem and OMI NO2 also improves considerably (from 0.68 to 0.84).
[...] discrepancy between the WRF-Chem-simulated and OMI-observed NO2 VCD prompts us to assess how much of this discrepancy can be attributed to this model's representation of soil NO emissions.

Attribution to emission sources
To separate the soil NOx contribution from the anthropogenic emission updates, we perform a simple budget calculation as a first-order constraint on the partitioning of the top-down emissions between their anthropogenic and soil-based sources. We assume that the relative difference in anthropogenic sources is uniform over the emission bins in Fig. 7. This factor is calculated as the median of the relative change in emissions for the three highest bins (>50 Mg N cell-1 for July, see Fig. 7), and amounts to 0.22. This allows us to attribute the remaining emission difference to soils. Based on this crude first estimate, we derive top-down soil NOx emissions to be 112 Gg N month-1, versus WRF-Chem/MEGAN-simulated bottom-up soil NO emissions of only 15 Gg N month-1. The anthropogenic enhancement factor is relatively uncertain, but does not strongly impact our derived posterior soil NOx emission estimate: if, instead of the median (m = 0.22), we use the mean relative change in emissions for the three highest bins (µ = 0.41), our soil contribution is still a factor >4 larger (69.0 Gg N month-1) compared to WRF-Chem's simulated bottom-up soil NO source. Therefore, this first-order estimation suggests that a substantial fraction (43-69%) of the NOx emission increment after optimization can be attributed to soils.
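The budget calculation can be written out as a short residual computation. The sketch below is one plausible reading of the bookkeeping, which the text does not spell out in full: it subtracts a uniform anthropogenic increase from the total surface emission increment and assigns the remainder to soils. The function name `soil_residual` and the treatment of the bottom-up soil term are assumptions made for illustration.

```python
def soil_residual(e_td_total, e_bu_anthro, e_bu_soil, anthro_factor):
    """Emission increment left for soils after a uniform anthropogenic increase.

    All inputs in Gg N per month. anthro_factor is the assumed relative
    increase of anthropogenic emissions (0.22 for the median case).
    """
    # Total increase of surface (anthropogenic + soil) emissions after optimization.
    increment = e_td_total - (e_bu_anthro + e_bu_soil)
    # Part of the increase explained by scaling anthropogenic emissions uniformly.
    anthro_increase = anthro_factor * e_bu_anthro
    return increment - anthro_increase

# Totals quoted in the text: 498 Gg N top-down, 304 Gg N anthropogenic and
# 15 Gg N soil bottom-up emissions, and the median enhancement factor of 0.22.
residual = soil_residual(498.0, 304.0, 15.0, 0.22)
print(round(residual))
```

With the median factor of 0.22 the residual is about 112 Gg N, the same magnitude as the soil estimate quoted above; a larger anthropogenic factor leaves a correspondingly smaller share for soils.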
To evaluate the derived total soil NOx emissions, we perform a comparison with literature-based estimates in Table 2. We find that bottom-up soil NOx emissions are underestimated by a factor 5-7 compared to previous studies. In some of those studies (e.g. Ganzeveld et al., 2010), land use management practices (fertilizer and manure application) provide a substantial contribution to European soil NO emissions, a feature that appears to be missing in the representation of soil NO emissions in WRF-Chem. This supports our hypothesis that a substantial fraction of the increase in surface NOx emissions may be attributed to soils. We will discuss this further in Sect. 7.
[...] still slightly underestimates the monthly averaged observed NO2 concentrations, as indicated by a slope of 0.89. However, the low bias in WRF-Chem surface NO2 concentrations with respect to AirBase improves from -2.5 to -1.1 µg m-3.
Compared to the monthly average, we find little improvement in WRF-Chem's skill to predict surface NO2 at 12:00 h UTC. The model's low bias in NO2 is reduced from -3.0 to -2.6 µg m-3 and the index of agreement improves by only 0.02 (4%). This more modest improvement in performance can be understood from mid-day surface NO2 concentrations being more strongly driven by photochemical removal processes and boundary layer development than the 24-hour mean NO2 levels, which are more sensitive to NOx emissions due to strongly reduced mixing and photochemistry at night.
Fig. 9 shows the relative biases between WRF-Chem and observed NO2 as a function of (binned) bottom-up anthropogenic NO emission strength. Both the WRF-Chem simulation with bottom-up emissions (Fig. 9a) and the simulation with top-down emissions (Fig. 9b) show a low bias against OMI and AirBase for regions with low emissions, and a positive relative bias in regions with stronger emissions. The relative bias is, however, considerably reduced in the simulation with top-down NOx emissions, both at the surface and in the column. Still, WRF-Chem displays a stronger relative bias compared to AirBase than compared to OMI. This feature can likely be attributed to a difference in spatial scales between the 20 × 20 km2 resolution model and the footprint area of local AirBase measurements, which can be easily influenced by a nearby NOx source that is less well captured in the model due to instantaneous mixing over a larger volume. Another potential explanation for the lower relative bias of WRF-Chem compared to AirBase than compared to OMI is interference of in situ measurements with molybdenum converters (see Sect. 2.2). This is in line with our previous finding that the slope of the top-down NO2 column regression fit approaches 1, while the slope of the fit for in situ NO2 observations is still below 1. We also note that the spread in the relative bias compared to AirBase increased for the top-down simulation, with more positive relative bias values for all bins. Nonetheless, the results shown in Fig. 9 provide confidence regarding application of the model as a tool to reconcile local-scale bottom-up emissions and concentrations with larger-scale remote sensing-based NO2 measurements.
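For readers who wish to reproduce evaluation statistics of the kind quoted above, the following Python sketch computes a mean bias error and an index of agreement for paired model and observation values. It is a minimal sketch under stated assumptions: the index of agreement is taken here to be Willmott's original formulation, which may differ from the variant used in this study, and the function names and sample values are hypothetical.

```python
def mean_bias_error(model, obs):
    """Mean of (model - observation); negative values indicate a low bias."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)


def index_of_agreement(model, obs):
    """Willmott's index of agreement d (assumed formulation).

    d = 1 - sum((M - O)^2) / sum((|M - mean(O)| + |O - mean(O)|)^2)
    d equals 1 for a perfect match; the denominator is zero only if all
    model and observed values equal the observed mean.
    """
    obs_mean = sum(obs) / len(obs)
    squared_error = sum((m - o) ** 2 for m, o in zip(model, obs))
    potential_error = sum(
        (abs(m - obs_mean) + abs(o - obs_mean)) ** 2 for m, o in zip(model, obs)
    )
    return 1.0 - squared_error / potential_error


# Hypothetical surface NO2 values (µg m-3), for illustration only.
observed = [10.0, 20.0, 30.0]
simulated = [8.0, 18.0, 28.0]
print(mean_bias_error(simulated, observed))
print(index_of_agreement(simulated, observed))
```

The same two functions can be applied to monthly means, to 12:00 h UTC values, or to MDA8 O3, provided that model and observation series are paired per station before the statistics are computed.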

Ozone
Next, we address our main question: whether the improved simulation of NO2 leads to better model performance for surface ozone simulations. We find that WRF-Chem with top-down emissions improves upon the bottom-up simulation for the 24-hour mean as well as for the 12:00 h UTC and MDA8 ozone metrics. The model index of agreement improves by 0.08-0.11

Discussion
In this study we demonstrate the added value of deriving satellite-based NOx emissions in (regional) air pollution models for simulations of summertime ozone, focusing on July 2015 over Europe. We use a modified version of the mass balance approach
The mass balance approach that we used to derive observation-constrained European NOx emissions has several important advantages over more formal inversion methods that are applied in the literature (e.g. Miyazaki et al., 2014, 2017). The method is highly traceable due to the simple calculation of scaling parameters from model output for a baseline and a perturbation simulation, and from column NO2 measurements. However, the linearization (see Sect. 3) oversimplifies the nonlinearity of the NOx-O3 chemistry, which means that the model-satellite discrepancy is not resolved completely after one iteration. Additionally, the approach is only applicable on a pixel basis when the NOx lifetime is sufficiently short to discard the contribution of transport from adjacent model NO2 columns. The model-satellite difference for a simulation we performed for March 2015 (not shown) shows less spatial heterogeneity over regions with a diffuse spatial distribution of NOx sources (e.g. Germany). These shortcomings can be resolved by averaging the signal over multiple grid cells, or by applying more formal inversion methods.
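To make the scaling step concrete, the Python sketch below shows a commonly used per-grid-cell formulation of a mass balance update with a sensitivity factor derived from a perturbation simulation. It is an assumption that this generic form resembles the modified approach of Sect. 3; the function names, the 10% perturbation size, and the sample numbers are hypothetical and serve illustration only.

```python
def sensitivity_factor(column_base, column_perturbed, emission_perturbation=0.1):
    """Relative emission change per relative NO2 column change (often called beta).

    column_base and column_perturbed are modeled NO2 columns from a baseline
    run and from a run with emissions scaled by (1 + emission_perturbation).
    Values different from 1 indicate a nonlinear column response to emissions,
    for instance through chemical feedbacks on the NO2 lifetime.
    """
    relative_column_change = (column_perturbed - column_base) / column_base
    return emission_perturbation / relative_column_change


def top_down_emission(emission_bottom_up, column_model, column_satellite, beta):
    """Scale bottom-up emissions by the relative model-satellite column difference."""
    relative_difference = (column_satellite - column_model) / column_model
    return emission_bottom_up * (1.0 + beta * relative_difference)


# Hypothetical single grid cell: columns in 1e15 molecules cm-2, emissions in Mg N.
beta = sensitivity_factor(4.0, 4.32)
print(top_down_emission(10.0, 4.0, 5.0, beta))
```

Because the update is linear in the column difference, a single application does not necessarily close the model-satellite gap; repeating the two steps with the updated emissions is one way to iterate.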
Our results demonstrate that surface NOx emissions in our WRF-Chem configuration are increased substantially after applying an emission scaling approach. In a first-order budget calculation we derive that 43-69% of this total increase can be attributed to soil NOx. This is diagnosed from the notably higher relative increase in emissions in regions with moderate anthropogenic emissions compared to regions with low and high anthropogenic emissions. We therefore conclude that the contribution of soil NOx to total surface emissions is likely underestimated in our model set-up. Additionally, our top-down soil NOx emission estimate, derived with a budget calculation, agrees well with previous estimates for European summer (Table 2). Our findings are in line with a previous study (Oikawa et al., 2015) that, using WRF-Chem with MEGAN soil NOx emissions, found a strong underestimation of NOx emissions in a high-temperature agricultural region.
Several studies previously investigated the relation between soil NOx emissions and O3 formation. For example, one study estimated that European soil NOx emissions contribute 4 ppb to the daily maximum concentration (Stohl et al., 1996). A sensitivity study indicates that a strong up-scaling of soil NOx emissions by a factor of 5 indeed leads to a better representation of the peak ozone concentration. It has further been shown that an improved process-based representation of soil NOx emissions leads to MDA8 O3 changes of up to 6 ppb (Rasool et al., 2016), and a reduced mean bias for ozone concentrations, particularly in agricultural areas (Rasool et al., 2019). Together, these findings provide support for the hypothesis that underestimated soil NOx emissions, in particular those from agricultural areas, contribute to underestimated peak ozone concentrations.
The comparison against in situ NO2 observations from the AirBase network may be hindered by interference of reactive N species for measurements with molybdenum converters. The type of converter is not reported in the database. Literature-reported estimates of measurement overestimations due to this interference are 22% (Dunlea et al., 2007) and 5-18% (Boersma et al., 2009) at urban sites, and 20-42% at a rural site (Steinbacher et al., 2007). A correction factor can be applied to obtain corrected NO2 measurements from observations using a molybdenum converter, which is on average 0.4-0.6 in summer, but with a large spread (0.2-0.8) (Lamsal et al., 2008, 2010). The strongest corrections of molybdenum-based in situ NO2 measurements are needed in remote environments, where NOx is a relatively smaller component of the total reactive nitrogen budget compared to areas closer to NOx sources (Lamsal et al., 2008). We hypothesize that this can partially explain the remaining model-observation mismatch for NO2 after the use of top-down emissions.
Despite the demonstrated improvement in ozone simulations, our simulation with OMI-derived top-down NOx emissions still misrepresents the high tail of the ozone distribution. We believe that there is a potential explanatory role for local to regional meteorological processes. The representation of several mesoscale phenomena requires a higher model resolution than 20 × 20 km2. For example, Millán et al. (1997) demonstrated that residual air masses aloft, containing elevated O3 lofted during previous days, can re-circulate locally, be entrained into the boundary layer, and contribute substantially to air pollution episodes in southern Europe. This is supported by an analysis of measured ozone and ozone precursors in northeast Spain by Querol et al. (2017), where this mesoscale circulation pattern was found to contribute to concentrations that exceed the information threshold value set by the European Union (180 µg m-3), alongside contributions from locally emitted NOx and biogenic VOCs.
Simulations of surface ozone in AQ models are also impacted by the choice of chemical parameterization. Recently, several studies have investigated the influence of the chemical mechanism on simulated NOx and O3 concentrations. Regarding ozone chemistry, chemical mechanisms differ predominantly in two aspects: 1) the grouping of VOC species in species categories ("lumping") according to their chemical structure or number of C-atoms, and 2) the inorganic rate coefficients involved in the catalytic cycling of NOx, HOx and Ox. Especially the latter aspect has a strong influence on simulated NO2 concentrations, and can therefore influence the derivation of top-down emission estimates using satellite observations (Stavrakou et al., 2013). Coates et al. (2016) investigated the maximum ozone formation potential in different chemical mechanisms and found that mechanisms with lumped VOC categories led to lower ozone mixing ratios compared to a mechanism with a near-explicit treatment of VOCs. Knote et al. (2015) found small differences in inorganic rate constants among mechanisms and thus concluded that VOC representation was the dominating source of uncertainty among mechanisms. However, Mar et al. (2016) performed a WRF-Chem sensitivity study where MOZART inorganic rate constants were applied within RADM2, leading to mean O3 concentration differences of 8 µg m-3 between those mechanisms.
In order to test the importance of inorganic NOx-HOx-Ox reaction rates for ozone formation, we implemented inorganic rate constants from three different mechanisms (CBM-Z, RADM2 and MOZART) in a mixed layer model with simplified chemistry (Janssen et al., 2012). Further details are given in Sect. 5 of the Supplement. Our analysis shows that varying the temperature-dependent rate constant of HNO3 formation (k_NO2+OH) can lead to a spread of 2 ppb for end-of-afternoon ozone values on a typical summer day in a polluted boundary layer. CBM-Z uses the lowest k_NO2+OH among the considered mechanisms, and thus leads to a longer NO2 lifetime and more O3 formation than the other mechanisms. Therefore, we conclude that modification of inorganic reaction rate constants has a modest effect on simulated O3, but is not likely to lead to increases in simulated O3 in our WRF-Chem configuration. Nevertheless, the model representation of ozone chemistry should be carefully considered in NOx and O3 air quality studies, besides the representation of NOx emissions.
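The link between the rate constant and the NO2 lifetime can be made explicit with a short worked relation. The expression below considers only the daytime loss of NO2 through reaction with OH and neglects other loss pathways; the numerical values of the rate constant and the OH concentration are illustrative assumptions, not values taken from CBM-Z, RADM2 or MOZART.

```latex
% NO2 lifetime against oxidation by OH (other sinks neglected)
\tau_{\mathrm{NO_2}} = \frac{1}{k_{\mathrm{NO_2+OH}}\,[\mathrm{OH}]}

% Illustrative (assumed) values:
% k_{NO2+OH} = 1 \times 10^{-11} cm^3 molecule^{-1} s^{-1},
% [OH] = 5 \times 10^{6} molecules cm^{-3}
\tau_{\mathrm{NO_2}} = \frac{1}{(1 \times 10^{-11})(5 \times 10^{6})}\ \mathrm{s}
                     = 2 \times 10^{4}\ \mathrm{s} \approx 5.6\ \mathrm{h}
```

Since the lifetime scales inversely with the rate constant, a 10% lower k_NO2+OH lengthens the lifetime by about 11% at fixed OH, which is the direction of the effect described for CBM-Z.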
Several studies have considered the resolution dependence of air quality simulations. This is especially relevant for NO2, since NOx emissions display strong variation on the 20 × 20 km2 scale applied in this study. Increasing model resolution leads to better representation of these local gradients and therefore improves simulations of NO2 concentrations. Valin et al. (2011) found that an accurate representation of mid-day NO2 columns from highly localized sources requires a high model resolution, but regions with more diffuse sources can be simulated at a coarser resolution of approximately 10 × 10 km2. Although ozone production regimes do not strongly depend on the model resolution in regional models, high resolution models perform better at simulating local O3 titration in freshly emitted NO plumes (Cohan et al., 2006).
Besides the representation of meteorological processes, there is an additional uncertainty related to surface-atmosphere exchange of pollutants. Dry deposition constitutes 17% of the tropospheric sink of ozone, and is the second most important removal process after chemical removal (Hu et al., 2017). Several studies have recently investigated the role of meteorological drivers that determine ozone removal at the surface. However, these meteorological controls are oversimplified in deposition parameterizations. The vapour pressure deficit strongly controls stomatal uptake of ozone, thereby affecting surface ozone levels in spring to summer in the United States (Kavassalis and Murphy, 2017). Analysis of 10-year O3 flux observations in the northeastern United States revealed that the removal of ozone by the land surface exhibits a strong inter-annual variability, which is not captured in dry deposition parameterizations (Clifton et al., 2017). Lastly, soil moisture has been proposed as a regulator of surface ozone uptake (Tawfik and Steiner, 2013), but it is often neglected in parameterizations of dry deposition, even though a recent study found that it can significantly reduce simulated ozone uptake (Anav et al., 2017). Improving the biophysical representation of the dry deposition process in WRF-Chem will be one of our foci in the future.
Future studies that apply satellite-based constraints on surface NOx emissions can benefit from observations from the recently launched TROPOMI instrument (Veefkind et al., 2012), which delivers NO2 column data at an unprecedented resolution of 7 × 3.5 km2. This has the potential to lead to important improvements in satellite-constrained NOx emissions. Recent work (Lorente et al., 2019, in review) has applied TROPOMI observations in a column model study to derive emissions from Paris. The resolution of the instrument additionally enables a focus on more local areas with one dominating source, such as soils in agricultural or bare-soil regions.

Conclusions
We performed a WRF-Chem simulation of NOx and ozone over Europe for July 2015 and assessed its performance with AirBase in situ observations and OMI NO2 column measurements. We find that WRF-Chem underestimates high surface
A WRF-Chem simulation with optimized NOx emissions removes the model's systematic bias with respect to OMI NO2, and leads to an improved spatial agreement (slope = 0.98, r2 = 0.84). An evaluation against AirBase NO2 reveals that the top-down simulation improves particularly in the monthly average, where the systematic mismatch is reduced (slope = 0.89 instead of 0.73) and the mean bias is reduced by 50%. For ozone, the model skill improves particularly for mid-day and MDA8 O3, when local ozone formation occurs and the sensitivity of ozone formation to NOx concentrations is highest. On average, surface O3 concentrations increase by 6 µg m-3 (6%). Still, peak (mid-day) ozone values are underestimated after NOx emission optimization.

Overall, our findings demonstrate that air quality model simulations combined with in situ and remote sensing observations can be used to infer missing sources of NOx at the surface. By optimizing NOx emissions with satellite observations, substantial improvements in simulated ozone can be achieved. Our work shows that this helps to reduce the persistent biases in O3 that most air quality models suffer from. Projected decreasing trends in anthropogenic NOx emissions mean that the contribution of soils to total European NOx emissions will likely increase in the future, and thus deserves careful attention in (European) air quality assessments, along with detailed assessments of emissions of volatile organic compounds and wildfires, boundary layer mixing, and chemistry.
Code and data availability. WRF-Chem output and re-calculated OMI NO2 columns are available upon request, as well as scripts to recalculate the tropospheric AMF and the resulting changes in satellite NO2 columns.
Author contributions. AV, KFB and LG designed the experiment. AV performed the model simulations and analysis, with support from all co-authors. AV wrote the manuscript, with contributions from all co-authors.
Competing interests. The authors declare no competing interests.